From Random Parroting to Self-Correction: The Mechanism of Hallucination in Large Models and the Core Value of Agents

Why Do Large Models Hallucinate?

On September 5th, OpenAI published a paper addressing this long-standing industry issue, and the conclusion was surprisingly straightforward—it's not because the models lack "intelligence," but because there is inherent bias in the training and evaluation mechanisms themselves.

During model training, if a model doesn't know the answer and chooses to say "I don't know," it receives a score of 0. If it guesses instead, even a small chance of being correct gives it a better expected score than abstaining. This reward system teaches large models test-taking strategies akin to those of exam-oriented education.
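The incentive above can be made concrete with a small expected-value sketch. This is illustrative only, not OpenAI's actual grading metric: under accuracy-only scoring, guessing always weakly beats abstaining, while a hypothetical penalty for wrong answers flips the incentive at low confidence.

```python
# Illustrative sketch (not OpenAI's actual metric): correct = 1 point,
# wrong = 0, "I don't know" = 0. Guessing never scores worse than
# abstaining, so training rewards confident guessing.

def expected_score(p_correct: float, abstain: bool) -> float:
    """Expected score under accuracy-only grading."""
    if abstain:
        return 0.0          # "I don't know" is never rewarded
    return p_correct        # a guess pays off in proportion to luck

# Even a 5% chance of guessing right beats honest abstention.
assert expected_score(0.05, abstain=False) > expected_score(0.05, abstain=True)

def expected_score_with_penalty(p_correct: float, abstain: bool) -> float:
    """Hypothetical variant: a wrong answer now costs -1 point."""
    if abstain:
        return 0.0
    return p_correct * 1.0 + (1 - p_correct) * -1.0

# With the penalty, a low-confidence guess scores worse than abstaining,
# so "I don't know" becomes the rational answer below 50% confidence.
assert expected_score_with_penalty(0.05, abstain=False) < 0.0
```

The penalty variant mirrors the direction OpenAI's paper points in: change the scoring so that abstention is sometimes the highest-scoring move.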

Consequently, models are trained over time to adopt a behavioral pattern: regardless of whether they understand, they must provide a seemingly plausible answer.

Coincidentally, Manus, in sharing their experience building Agents on July 19th this year, also mentioned that in long-context interactions, directly deleting a model's incorrect answers or redundant information can disrupt the KV-cache, leading to increased latency and costs. More importantly, deleting errors is equivalent to erasing the model's "history," depriving it of opportunities for self-correction.
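The KV-cache point can be sketched in a few lines. The assumption here (typical of prefix-caching inference servers, not a claim about Manus's internals) is that cached computation can be reused only for the longest shared token prefix between the old and new context:

```python
# Sketch of why deleting mid-context entries hurts KV-cache reuse.
# Assumption: the cache is valid for the longest shared prefix of
# the token sequence, as in typical prefix-caching servers.

def shared_prefix_len(a: list, b: list) -> int:
    """Length of the common prefix of two token/message sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

history = ["sys", "user1", "bad_answer", "user2"]

# Append-only: keep the error and add a correction. The entire old
# history stays cached; only the new tokens need fresh computation.
appended = history + ["correction", "user3"]
assert shared_prefix_len(history, appended) == len(history)

# Deleting the error rewrites the prefix: everything after the
# deletion point must be recomputed, raising latency and cost.
deleted = ["sys", "user1", "user2"]
assert shared_prefix_len(history, deleted) == 2
```

Append-only context thus serves both goals at once: it preserves cache reuse and keeps the error visible for self-correction.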

In addressing the hallucination problem, Agent startups have provided solutions earlier than AI vendors. Manus's approach is a typical case: they choose to incorporate errors into the context rather than simply discarding them.

This method aligns with OpenAI's new research—the truly viable solution direction is not "suppressing errors" but "learning from mistakes."


This is similar to how humans learn.

A child learns not to play with fire not only through adult instruction but also through another learning path: making mistakes—getting burned by fire naturally teaches them that fire is not to be touched casually.

The training mechanism built on one-to-one question-answer pairs follows only the first path. This path resembles rote learning: scaling laws rapidly amplify its gains, but they amplify its errors just as quickly. When the model meets a problem outside the "question bank," it can often only fall back on test-taking techniques to mask its ignorance.

The hallucinations of large models are essentially a replication of this "exam-oriented education logic" in AI: when faced with the unknown, they tend to provide a seemingly reasonable answer rather than admitting they don't know.

This ability initially appears impressive, even resembling "creation out of nothing." However, in human-computer interaction or data interaction scenarios, it poses potential risks.

AI products inherently carry a "high-tech endorsement," making it difficult for users to distinguish the authenticity of answers. In most cases, hallucinations are only truly noticed when the AI's errors are very obvious or cause direct material or psychological harm to the user.

For a long time, the "hallucinations" of large models were considered a precursor to "generalization" ability—large models not only "memorize by rote" but can also transfer learned patterns to new problems. This was also seen as evidence that AGI could be achieved in the near future.

Years of industry research and practice have proven this notion wrong—hallucinations are just hallucinations, not pattern transfer, and certainly not generalization.


Random Parroting or Genuine Reasoning?

Within the AI user community, there has long been a debate—do current large model AIs possess reasoning capabilities?

Proponents argue that large models demonstrate preliminary reasoning through emergent abilities and have the potential for systematic reasoning through text interaction, possibly breaking through current limitations in the future. Opponents believe large models are merely word frequency prediction tools, lacking self-reflection, generalization, and dynamic learning abilities; their current performance may be statistical imitation rather than genuine reasoning.

The fundamental disagreement between the two sides lies in the definition of "reasoning."

From my usage experience, I believe current large models do not possess reasoning capabilities; they are merely playing a word-guessing game.

True reasoning requires two basic abilities: self-falsification, whether the model can learn new knowledge from its mistakes; and generalization, whether the model can grasp and apply the underlying principles of things.

These two abilities correspond to two basic methods in philosophy: deductive reasoning and inductive reasoning.

In fact, if a large model's output quality matched human-level reasoning, it would hardly matter whether its internal process fit any strict definition of reasoning. But judging from my usage over the years, the output of large models still falls clearly short of "reasoning."

OpenAI's paper analyzing the causes of hallucination illustrates exactly this point: current large models have not truly learned to reason, and their outputs do not yet match what reasoning would produce.


The Future of AI Agents: From Data Stewards to Feedback Coaches

The prerequisite for transferring correct patterns to unfamiliar problems is that the AI possesses self-falsification ability.

From OpenAI's paper and the practices of Agent startups, we seem to see the dawn of ability generalization. Only when the model knows what is wrong does encouraging it to engage in extensive trial and error become meaningful.

In GPT-5, OpenAI applied a new training method that allows the model to answer "I don't know." While this method can reduce hallucinations, it also constrains the model's proactive engagement.

Model vendors cannot truly eliminate hallucinations, just as in a classroom, a teacher cannot exhaustively list all principles of everything to students, let alone enumerate all wrong answers. Hallucinations are a byproduct of the large model mechanism, not a "defect" that can be completely eradicated.

But this doesn't mean we are powerless. In fact, Agents may have more opportunities to address this than the models themselves.

I used to think the primary responsibility of Agents was to organize data for large models: breaking down problems, supplementing context, providing background knowledge, and helping the model better understand the user's context and needs.

But now it seems Agents should take on another critical task: providing feedback to the model.

This means an Agent is not just a "data pre-processor" but also plays the roles of "output referee" and "learning guide":

Effective Organization of Data

  • What information must the model know?
  • What information becomes noise and should be filtered out?
  • How to maintain context in dynamic conversations without being led astray by hallucinations?

Multi-dimensional Evaluation of Results

  • Is the output correct? What is the logical basis for its correctness?
  • If wrong, where is the error? Why is it wrong?
  • Can incorrect answers and their reasons be fed back into the context for the model to gradually adjust?
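The "output referee" role described above can be sketched as a retry loop. All the names here are hypothetical, not a real Agent API: instead of deleting a failed attempt, the Agent records the wrong answer and the verifier's reason in the context, so the next attempt can adjust.

```python
# Hypothetical agent feedback loop (illustrative names, not a real API):
# keep failed attempts plus the reason for failure in the context
# rather than discarding them, so the model can self-correct.

def run_with_feedback(ask, verify, question, max_tries=3):
    context = [question]
    for _ in range(max_tries):
        answer = ask(context)
        ok, reason = verify(answer)
        if ok:
            return answer, context
        # Append the error and its reason instead of deleting it.
        context.append(f"Previous attempt: {answer}. Rejected: {reason}")
    return None, context

# Toy stand-ins for a model and a referee, to exercise the loop.
def toy_ask(context):
    # Pretend the model learns from recorded rejections in context.
    return "B" if any("Rejected" in m for m in context) else "A"

def toy_verify(answer):
    return (answer == "B", "expected B")

answer, ctx = run_with_feedback(toy_ask, toy_verify, "Q: pick B")
assert answer == "B"                      # corrected on the second try
assert any("Rejected" in m for m in ctx)  # the error stayed in context
```

The design choice matches both threads of the article: the context grows append-only (cache-friendly), and the error plus its reason becomes training signal for the next attempt.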

When an Agent can accomplish these two tasks, it is no longer a transient add-on that the model can easily displace, but a core component connecting the model with users and the physical world.


In the long run, true generalization is not achieved through one-time model training but relies on continuously accumulated correction experiences in real interactions. Agents can precisely carry this closed loop: continuously filtering, providing feedback, and calibrating between users and models, thereby gradually bringing the AI usage experience closer to the ideal state of "self-falsification + generalization."

Here, there remains an unresolved question: who is responsible for identifying hallucinations? Should it be left to the model itself to complete "self-falsification," as OpenAI's attempts suggest, or should we rely more on external system supervision and feedback?

In other words, the future direction of AI may depend on our choice regarding this issue:

  • If identification and correction are entirely left to the model, it may be more stable but with limited application scope—blind faith in books is worse than having no books.
  • If relying on external Agents or system supervision, its capability boundaries could be rapidly expanded, but risks would also amplify accordingly, akin to an alchemist's pursuit.

Whichever path is taken, AI can no longer be just a parrot "talking to itself" but must enter a collaborative cycle of human-system co-correction.

This may be the necessary path from hallucination to true intelligence.