From Random Parroting to Self-Correction: The Mechanism of Hallucination in Large Models and the Core Value of Agents

Why Do Large Models Hallucinate?

On September 5th, OpenAI published a paper addressing this long-standing industry issue, and the conclusion was surprisingly straightforward—it's not because the models lack "intelligence," but because there is inherent bias in the training and evaluation mechanisms themselves.

During model training, if a model doesn't know the answer and chooses to say "I don't know," it receives a score of 0. If it guesses instead, even a small chance of being correct gives it a better expected score than abstaining. This reward system teaches large models test-taking strategies akin to those of exam-oriented education.
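The incentive above can be made concrete with a small expected-value sketch. This is illustrative only, not OpenAI's actual grading metric: under accuracy-only scoring, guessing always weakly beats abstaining, while a hypothetical penalty for wrong answers flips the incentive at low confidence.

```python
# Illustrative sketch (not OpenAI's actual metric): correct = 1 point,
# wrong = 0, "I don't know" = 0. Guessing never scores worse than
# abstaining, so training rewards confident guessing.

def expected_score(p_correct: float, abstain: bool) -> float:
    """Expected score under accuracy-only grading."""
    if abstain:
        return 0.0          # "I don't know" is never rewarded
    return p_correct        # a guess pays off in proportion to luck

# Even a 5% chance of guessing right beats honest abstention.
assert expected_score(0.05, abstain=False) > expected_score(0.05, abstain=True)

def expected_score_with_penalty(p_correct: float, abstain: bool) -> float:
    """Hypothetical variant: a wrong answer now costs -1 point."""
    if abstain:
        return 0.0
    return p_correct * 1.0 + (1 - p_correct) * -1.0

# With the penalty, a low-confidence guess scores worse than abstaining,
# so "I don't know" becomes the rational answer below 50% confidence.
assert expected_score_with_penalty(0.05, abstain=False) < 0.0
```

The penalty variant mirrors the direction OpenAI's paper points in: change the scoring so that abstention is sometimes the highest-scoring move.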

Consequently, models are trained over time to adopt a behavioral pattern: regardless of whether they understand, they must provide a seemingly plausible answer.

Coincidentally, Manus, in sharing their experience building Agents on July 19th this year, also mentioned that in long-context interactions, directly deleting a model's incorrect answers or redundant information can disrupt the KV-cache, leading to increased latency and costs. More importantly, deleting errors is equivalent to erasing the model's "history," depriving it of opportunities for self-correction.
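The KV-cache point can be sketched in a few lines. The assumption here (typical of prefix-caching inference servers, not a claim about Manus's internals) is that cached computation can be reused only for the longest shared token prefix between the old and new context:

```python
# Sketch of why deleting mid-context entries hurts KV-cache reuse.
# Assumption: the cache is valid for the longest shared prefix of
# the token sequence, as in typical prefix-caching servers.

def shared_prefix_len(a: list, b: list) -> int:
    """Length of the common prefix of two token/message sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

history = ["sys", "user1", "bad_answer", "user2"]

# Append-only: keep the error and add a correction. The entire old
# history stays cached; only the new tokens need fresh computation.
appended = history + ["correction", "user3"]
assert shared_prefix_len(history, appended) == len(history)

# Deleting the error rewrites the prefix: everything after the
# deletion point must be recomputed, raising latency and cost.
deleted = ["sys", "user1", "user2"]
assert shared_prefix_len(history, deleted) == 2
```

Append-only context thus serves both goals at once: it preserves cache reuse and keeps the error visible for self-correction.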

In addressing the hallucination problem, Agent startups have provided solutions earlier than AI vendors. Manus's approach is a typical case: they choose to incorporate errors into the context rather than simply discarding them.

This method aligns with OpenAI's new research—the truly viable solution direction is not "suppressing errors" but "learning from mistakes."


This is similar to how humans learn.

A child learns not to play with fire not only through adult instruction but also through another learning path: making mistakes—getting burned by fire naturally teaches them that fire is not to be touched casually.

The training mechanism built on one-to-one question-answer pairs follows only the first path. This path resembles rote learning: scaling laws rapidly amplify its gains, but they amplify its errors just as quickly. When the model meets a problem outside the "question bank," it can often only fall back on test-taking techniques to mask its ignorance.

The hallucinations of large models are essentially a replication of this "exam-oriented education logic" in AI: when faced with the unknown, they tend to provide a seemingly reasonable answer rather than admitting they don't know.

This ability initially appears impressive, even resembling "creation out of nothing." However, in human-computer interaction or data interaction scenarios, it poses potential risks.

AI products inherently carry a "high-tech endorsement," making it difficult for users to distinguish the authenticity of answers. In most cases, hallucinations are only truly noticed when the AI's errors are very obvious or cause direct material or psychological harm to the user.

For a long time, the "hallucinations" of large models were considered a precursor to "generalization" ability—large models not only "memorize by rote" but can also transfer learned patterns to new problems. This was also seen as evidence that AGI could be achieved in the near future.

Years of industry research and practice have proven this notion wrong—hallucinations are just hallucinations, not pattern transfer, and certainly not generalization.


Random Parroting or Genuine Reasoning?

Within the AI user community, there has long been a debate—do current large model AIs possess reasoning capabilities?

Proponents argue that large models demonstrate preliminary reasoning through emergent abilities and have the potential for systematic reasoning through text interaction, possibly breaking through current limitations in the future. Opponents believe large models are merely word frequency prediction tools, lacking self-reflection, generalization, and dynamic learning abilities; their current performance may be statistical imitation rather than genuine reasoning.

The fundamental disagreement between the two sides lies in the definition of "reasoning."

From my usage experience, I believe current large models do not possess reasoning capabilities; they are merely playing a word-guessing game.

True reasoning requires two basic abilities: self-falsification, whether the model can learn new knowledge from its mistakes; and generalization, whether the model can grasp and apply the underlying principles of things.

These two abilities correspond to two basic methods in philosophy: deductive reasoning and inductive reasoning.

In fact, if a large model's output quality matched human-level reasoning, it would hardly matter whether its internal process fit any strict definition of reasoning. But judging from my usage over the years, the output of large models still falls clearly short of "reasoning."

OpenAI's paper analyzing the causes of hallucination illustrates exactly this point: current large models have not truly learned to reason, and their outputs do not yet match what reasoning would produce.


The Future of AI Agents: From Data Stewards to Feedback Coaches

The prerequisite for transferring correct patterns to unfamiliar problems is that the AI possesses self-falsification ability.

From OpenAI's paper and the practices of Agent startups, we seem to see the dawn of ability generalization. Only when the model knows what is wrong does encouraging it to engage in extensive trial and error become meaningful.

In GPT-5, OpenAI applied a new training method that allows the model to answer "I don't know." While this method can reduce hallucinations, it also constrains the model's proactive engagement.

Model vendors cannot truly eliminate hallucinations, just as in a classroom, a teacher cannot exhaustively list all principles of everything to students, let alone enumerate all wrong answers. Hallucinations are a byproduct of the large model mechanism, not a "defect" that can be completely eradicated.

But this doesn't mean we are powerless. In fact, Agents may have more opportunities to address this than the models themselves.

I used to think the primary responsibility of Agents was to organize data for large models: breaking down problems, supplementing context, providing background knowledge, and helping the model better understand the user's context and needs.

But now it seems Agents should take on another critical task: providing feedback to the model.

This means an Agent is not just a "data pre-processor" but also plays the roles of "output referee" and "learning guide":

Effective Organization of Data

  • What information must the model know?
  • What information becomes noise and should be filtered out?
  • How to maintain context in dynamic conversations without being led astray by hallucinations?

Multi-dimensional Evaluation of Results

  • Is the output correct? What is the logical basis for its correctness?
  • If wrong, where is the error? Why is it wrong?
  • Can incorrect answers and their reasons be fed back into the context for the model to gradually adjust?
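The "output referee" role described above can be sketched as a retry loop. All the names here are hypothetical, not a real Agent API: instead of deleting a failed attempt, the Agent records the wrong answer and the verifier's reason in the context, so the next attempt can adjust.

```python
# Hypothetical agent feedback loop (illustrative names, not a real API):
# keep failed attempts plus the reason for failure in the context
# rather than discarding them, so the model can self-correct.

def run_with_feedback(ask, verify, question, max_tries=3):
    context = [question]
    for _ in range(max_tries):
        answer = ask(context)
        ok, reason = verify(answer)
        if ok:
            return answer, context
        # Append the error and its reason instead of deleting it.
        context.append(f"Previous attempt: {answer}. Rejected: {reason}")
    return None, context

# Toy stand-ins for a model and a referee, to exercise the loop.
def toy_ask(context):
    # Pretend the model learns from recorded rejections in context.
    return "B" if any("Rejected" in m for m in context) else "A"

def toy_verify(answer):
    return (answer == "B", "expected B")

answer, ctx = run_with_feedback(toy_ask, toy_verify, "Q: pick B")
assert answer == "B"                      # corrected on the second try
assert any("Rejected" in m for m in ctx)  # the error stayed in context
```

The design choice matches both threads of the article: the context grows append-only (cache-friendly), and the error plus its reason becomes training signal for the next attempt.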

When an Agent can accomplish these two tasks, it is no longer a transient add-on that the model can easily displace, but a core component connecting the model with users and the physical world.


In the long run, true generalization is not achieved through one-time model training but relies on continuously accumulated correction experiences in real interactions. Agents can precisely carry this closed loop: continuously filtering, providing feedback, and calibrating between users and models, thereby gradually bringing the AI usage experience closer to the ideal state of "self-falsification + generalization."

Here, there remains an unresolved question: who is responsible for identifying hallucinations? Should it be left to the model itself to complete "self-falsification," as OpenAI's attempts suggest, or should we rely more on external system supervision and feedback?

In other words, the future direction of AI may depend on our choice regarding this issue:

  • If identification and correction are entirely left to the model, it may be more stable but with limited application scope—blind faith in books is worse than having no books.
  • If relying on external Agents or system supervision, its capability boundaries could be rapidly expanded, but risks would also amplify accordingly, akin to an alchemist's pursuit.

Whichever path is taken, AI can no longer be just a parrot "talking to itself" but must enter a collaborative cycle of human-system co-correction.

This may be the necessary path from hallucination to true intelligence.