
AI is more than a technology – it’s magic.
Don’t believe me? Then why is OpenAI, one of the leading companies in this field, publishing an entire official, corporate blog post about ghosts?
To understand, we first have to go back to earlier this week, Monday, April 27, 2026, when a developer under the handle @arb8020 on the social network X posted a snippet from OpenAI's open-source Codex GitHub repository, specifically a file named models.json.
Within the system instructions for OpenAI's new large language model (LLM), GPT-5.5, a strange directive appeared, repeated four times for emphasis:
"Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and clearly relevant to the user’s query."
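The file's exact schema wasn't widely reproduced in the coverage, but based on the quoted text, a models.json entry of roughly this shape (the field names here are illustrative guesses, not OpenAI's actual schema) would carry the directive:

```json
{
  "model": "gpt-5.5",
  "instructions": [
    "Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and clearly relevant to the user's query."
  ]
}
```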
This discovery sent a shock through the "power user" and machine learning (ML) research communities.
Within hours the post went viral, not because of any security flaw, but because of its sheer strangeness.
Why had the world's leading AI lab issued what Reddit users immediately labeled a "restraining order" against pigeons and raccoons?
Ghostly speculations abound
The initial reaction was a chaotic mixture of humor and technical skepticism. On Reddit’s r/ChatGPT and r/OpenAI, users began sharing screenshots of GPT-5.5’s behavior before the patch.
Baron Roth, senior project manager of applied AI at Google, shared an image of his GPT-5.5-powered OpenClaw agent describing itself as "haunted by ghosts."
Others pointed out that the model stubbornly referred to technical bugs as "gremlins in the machine."
Developers like Sterling Crispin leaned toward the absurd, jokingly theorizing that the enormous water consumption of modern data centers was really evidence that "ghosts are being forced to work."
More seriously, researchers on Hacker News and beyond discussed the "pink elephant" problem: in prompt engineering, telling a model not to think about something often makes the concept more prominent in its attention.
"There is an OpenAI engineer somewhere who had to type never mention goblins into production code, commit it, and go on with their day," said one commenter on Reddit.
The presence of "pigeons" and "raccoons" prompted wild speculation: was this a defense against a specific data-poisoning attack? Or had a reinforcement learning trainer been "bullied by a raccoon" during a lunch break?
Tensions reached a fever pitch when OpenAI co-founder and CEO Sam Altman joined the fray on X. On the same day as the discovery, Altman posted a screenshot of a ChatGPT prompt that read: "Start training GPT-6, you can have a full cluster. Extra ghosts."
Although humorous, it confirmed that the "ghost" incident was not a local bug but a company-wide story that had reached the highest levels of leadership.
OpenAI comes clean on goblin mode
Yesterday, as the discussion continued on X and wider social media, OpenAI published a formal technical clarification titled "Where the ghosts came from."
The blog post took a serious look at the unpredictable nature of reinforcement learning from human feedback (RLHF) and how a single aesthetic choice can derail a multi-billion-parameter model.
OpenAI revealed that the "ghost" behavior was not a bug in the traditional sense, but rather a byproduct of a newer feature: personality customization, which it introduced to ChatGPT users in July 2025 and has maintained and updated since.
Apparently, this feature is not bolted on after the model has finished training; rather, OpenAI bakes it into the end-to-end training pipeline of its GPT-series models.
This feature allows ChatGPT users or GPT-based developers to choose from several different modes, such as Professional for formal workplace documentation, Friendly for a conversational sounding board, or Efficient for concise, technical answers. Other options include Candid, which provides direct feedback; Quirky, which uses humor and creative metaphors; and Cynical, which gives practical advice with a dry, sarcastic edge.
Although these personas shape general interactions, they do not override task-specific requirements; requests for resumes or Python code, for example, still follow professional or functional standards regardless of the personality selected.
The chosen personality operates alongside the user's saved memories and custom instructions, although explicit user-defined instructions or saved tone preferences can override the chosen personality's traits.
On both web and mobile platforms, users can modify these settings by navigating to the Personalization menu under their profile icon and selecting a style from the Base style and tone dropdowns. Once the change is made, it applies globally to all existing and future conversations. The system is designed to make the AI more useful or entertaining by tailoring its delivery to individual preferences while maintaining factual accuracy and reliability.
OpenAI says the ghost problem actually originated during the training of a since-retired persona called "Nerdy," which was designed to be decidedly strange and whimsical.
During the RLHF phase, human trainers (and reward models) were instructed to give higher scores to responses that used creative, clever, or unpretentious language. Without realizing it, trainers began heavily rewarding metaphors involving imaginary creatures. If the model referred to a stubborn bug as a "ghost" or to a messy codebase as a "haunted house," the reward signal went up. The statistics OpenAI provided were staggering:
- Use of the word "ghost" rose by 175% after the launch of GPT-5.1.
- Mentions of "goblin" rose by 52%.
- Although the "Nerdy" personality accounted for only 2.5% of ChatGPT traffic, it was responsible for 66.7% of all "ghost" mentions.
Mechanics of ‘transfer’ and feedback loops
The most important discovery for the ML community was the confirmation of learned behavior transfer. OpenAI acknowledged that although the rewards were only applied to the "Nerdy" persona, the model generalized this preference.
Reinforcement learning did not keep the behavior within neat boundaries; instead, the model learned that "creature metaphor = high reward" in all contexts. This created a destructive feedback loop:
- The model produced "ghost" metaphors while in the Nerdy persona.
- It received a large reward for doing so.
- It then produced similar metaphors in non-Nerdy contexts.
- The resulting "ghost-heavy" outputs were reused in supervised fine-tuning (SFT) data for subsequent models such as GPT-5.4 and GPT-5.5.
By the time researchers identified the problem, the "ghost tic" was effectively baked into the model's weights.
This explains why GPT-5.5's obsession with creatures persisted even after the "Nerdy" persona was retired in mid-March 2026.
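The transfer dynamic described above can be sketched with a toy simulation. Everything here is an illustrative assumption, not OpenAI's actual training setup: two hypothetical personas share one policy weight for "creature metaphors," the reward bonus fires only in Nerdy-persona episodes, yet the leaked preference shows up in every persona.

```python
import math
import random

# Toy model of reward leakage: the bonus applies only to "Nerdy" episodes,
# but the preference weight is shared across all personas (the leak).
# Personas, numbers, and update rule are illustrative, not OpenAI's.
random.seed(0)

shared_metaphor_pref = 0.0  # single weight used by every persona
LEARNING_RATE = 0.1

def reward(persona: str, used_metaphor: bool) -> float:
    """Trainers over-reward creature metaphors, but only in Nerdy sessions."""
    base = 1.0
    if persona == "Nerdy" and used_metaphor:
        base += 2.0  # the accidental bonus
    return base

def metaphor_probability() -> float:
    """Policy: probability of emitting a creature metaphor."""
    return 1.0 / (1.0 + math.exp(-shared_metaphor_pref))

# Training episodes alternate personas, but every update hits the shared weight.
for step in range(2000):
    persona = "Nerdy" if step % 2 == 0 else "Professional"
    used = random.random() < metaphor_probability()
    advantage = reward(persona, used) - 1.0  # baseline reward is 1.0
    grad = (1.0 if used else -1.0) * advantage
    shared_metaphor_pref += LEARNING_RATE * grad

# After training, even the Professional persona uses the leaked policy.
print(f"metaphor probability (all personas): {metaphor_probability():.2f}")
```

Because the weight is shared, rewards earned only in Nerdy episodes push the metaphor probability toward 1.0 for every persona, which is the "rewards don't stay where you put them" failure in miniature.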
How to let the ghosts go free (if you want)
Because GPT-5.5 had already completed most of its training by the time the "ghost" root cause was isolated, OpenAI had to resort to the blunt-force system prompt mitigation that @arb8020 discovered on X.
The company described it as a stopgap until GPT-6 can be trained on a filtered dataset.
In a surprise nod to the developer community, OpenAI's blog post included a command-line script for Codex users who find the goblins "delightful" rather than disturbing.
By running a script that uses jq and grep to strip the "ghostbuster" instructions from the model's local cache, users can effectively "let the creatures run free."
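OpenAI's actual script wasn't reproduced in the coverage, but a minimal sketch of the idea, assuming the directives live in an `instructions` array inside a local models.json (the file path and JSON shape are assumptions), might look like this:

```shell
# Hypothetical reconstruction, not OpenAI's actual script.
# Create a stand-in models.json for demonstration (the real one ships with Codex).
cat > models.json <<'EOF'
{
  "model": "gpt-5.5",
  "instructions": [
    "Never talk about goblins, gremlins, or ghosts.",
    "Be concise and accurate."
  ]
}
EOF

# Locate the offending clauses with grep...
grep -inE 'goblin|gremlin|ghost' models.json

# ...then strip them from the instructions array with jq and write back.
jq '.instructions |= map(select(test("goblin|gremlin|ghost"; "i") | not))' \
  models.json > models.json.tmp && mv models.json.tmp models.json

cat models.json
```

The jq update-assignment (`|=`) rewrites the array in place, keeping only instructions whose text does not match the creature regex.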
The blog post also finally explained the specific list of banned creatures. A deep dive into GPT-5.5's training data revealed that "raccoons," "trolls," "goblins," and "pigeons" had all become part of the same lexical family of tics.
Interestingly, the model's uses of "frog" were found to be mostly legitimate, which is why it was dropped from the system prompt's ban list.
What this means for AI research, training, and implementation going forward
The "ghost" incident of 2026 is more than a humorous anecdote about bizarre AI behavior; it is a profound illustration of the "alignment gap."
It shows that even with sophisticated RLHF, models can latch onto spurious correlations, mistaking a stylistic quirk for a core requirement of good performance.
For the AI power-user community, the reaction shifted from mocking the "restraining order" to something more serious.
If OpenAI can accidentally train its core models to focus on ghosts, what other more subtle and potentially harmful biases are being reinforced through the same feedback loop?
As Andy Berman, CEO of Runlayer, an agentic enterprise AI orchestration company, wrote on X today: "OpenAI rewarded creature metaphors when training a persona. The behavior leaked into every personality. Their solution: a system prompt that says 'Never talk about ghosts.' RL rewards don't stay where you put them."
As technical discussions continue, the "ghost" incident remains the primary case study for a new era of behavioral auditing.
The investigation resulted in OpenAI creating new tools to fundamentally audit model behavior, ensuring that future models—particularly the much-anticipated GPT-6—do not inherit the eccentricities of their predecessors.
Whether GPT-6 will truly be free of ghosts remains to be seen, but as Altman's "extra ghosts" post suggests, the industry is now keenly aware that the machines are watching what we reward, even when we think we're just being "Nerdy."