Bottlenecks of Conversational AI and Interaction Considerations for Next-Generation Agent Applications

Over the past two years, I have frequently used various AI language models to solve problems in my daily work and life, essentially replacing Google search with ChatGPT. However, the more I used them, the more issues I noticed.

Compared to traditional search engines, AI does not require users to provide precise keywords, which makes searching more efficient.

But when users ask AI to complete more rigorous tasks, they need to organize sufficient, structured prior information to give the AI adequate context, and they also need to state precise demands; otherwise, its output will miss the mark.

If LLM-type AI is used only as a search engine, the efficiency gain is very high relative to the token cost. Rigorous tasks are different: there the AI has to answer for the "correctness" of its output, which is precisely its weakness.

Thus, a barrier arises: "how to ask AI the right question," which is an extremely user-unfriendly requirement. Even though current services like ChatGPT can use short-term memory and historical data (likely with intent recognition models) to infer user intent and provide more reliable answers, from a product methodology perspective this is still insufficient.

In the mobile internet era, apps with bounded service scopes provided information and transaction support for specific scenarios. Users could enter different apps (different scenarios) according to their needs to look up information and complete transactions. Going "mobile" turned low-frequency needs into high-frequency use.

If, in the AI era, every service starts from "requiring users to fully express their needs," we would squander the vast legacy of the mobile internet era. AI would become a rambling emotional companion, unable to deliver information to users more quickly or help them complete transactions more efficiently, which would be quite a failure.

In my ideal form, an Agent should complete a full closed loop:

Automatic data perception → self-organization → analysis → decision-making → solution recommendation → execution → generation of new data.

This chain emphasizes not "dialogue" but "autonomy." That is, before the user feels the "pain," AI can first identify the user's "pain point."
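
The chain above can be sketched as a sequence of small functions. This is only an illustration under stated assumptions: the inventory scenario, the function names (`perceive`, `organize`, `analyze`, `decide`, `recommend`, `execute`, `run_loop`), and the stock threshold are all hypothetical and do not come from any real system.

```python
def perceive(source):
    """Automatic data perception: read records without waiting for a user prompt."""
    return list(source)


def organize(records):
    """Self-organization: index stock levels by item name."""
    return {r["item"]: r["stock"] for r in records}


def analyze(stock, threshold):
    """Analysis: find items whose stock is below the threshold."""
    return sorted(item for item, qty in stock.items() if qty < threshold)


def decide(low_items):
    """Decision-making: act only when something needs attention."""
    return "reorder" if low_items else "wait"


def recommend(decision, low_items):
    """Solution recommendation: turn the decision into concrete proposals."""
    if decision != "reorder":
        return []
    return [{"action": "reorder", "item": item} for item in low_items]


def execute(proposals):
    """Execution: a real agent would call an external system here.
    This sketch only returns the new data that execution would generate."""
    return [{"item": p["item"], "event": "reorder_placed"} for p in proposals]


def run_loop(source, threshold=5):
    """One pass of the loop; the returned records are new data for the next pass."""
    records = perceive(source)
    stock = organize(records)
    low_items = analyze(stock, threshold)
    decision = decide(low_items)
    proposals = recommend(decision, low_items)
    return execute(proposals)
```

Nothing in this loop waits for a question. The user never has to describe the shortage; the pass starts from the data and ends by producing new data.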


Three Types of Data: The Foundation of AI Applications

In AI-driven Agent applications, data should at least be divided into three categories:

  • Business Data: Data streams from APIs, search engines, external systems. It is rational factual information.
  • User Data: Input directly generated by the user. It is emotional experiential information and also the user's state information.
  • AI-Generated Data: Intermediate results and new information derived by the model during operation. It is the solution formed by information fusion.

Only when these three types of data can be effectively managed in terms of time and demand hierarchy can the Agent possess complete perception and decision-making capabilities.
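
One way to picture this is a single record type that tags each piece of data with its category, a timestamp, and a demand level. The sketch below is an assumption for illustration only: the names `Source`, `Record`, and `current_view`, and the convention that a lower `priority` number means a more urgent demand, are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    BUSINESS = "business"    # facts from APIs, search engines, external systems
    USER = "user"            # input generated directly by the user
    AI_GENERATED = "ai"      # intermediate results derived by the model


@dataclass(frozen=True)
class Record:
    source: Source
    content: str
    timestamp: float   # when the record was produced, in seconds
    priority: int      # hypothetical demand level: lower number = more urgent


def current_view(records, now, max_age):
    """Drop records older than max_age, then order by priority and recency."""
    fresh = [r for r in records if now - r.timestamp <= max_age]
    return sorted(fresh, key=lambda r: (r.priority, -r.timestamp))
```

With all three categories in one timeline, the Agent can ask "what is still valid, and what matters most right now?" before it decides anything.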


Why is Conversational Interaction a Bottleneck?

Today's mainstream AI interaction is still "dialogue." The problems are:

  • Chaos of Time and Demand: The model cannot determine which tasks are outdated or which demands have been abandoned, so it often re-executes them and accumulates errors, like a game of "whack-a-mole."
  • Probability-Driven Rather Than Logic-Driven: Dialogue generation relies on statistical probability. If a user says "Hi," the AI most likely replies "Hello." This is efficient at producing a plausible reply ("thinking efficiency"), but not at producing the right answer ("answer efficiency").
  • Too High a Threshold for User Questions: Asking a "good question" itself requires a clear cognitive framework. Users who can do this have already solved half the problem, reducing their dependence on AI.

Therefore, conversational interaction is both difficult to use and unreliable for most ordinary users.


The Lesson from the Mobile Internet

The greatness of the mobile internet lies in the evolution of interaction. Each app, through information display, search, filtering, and sorting, reduced users' cognitive costs and made services accessible.

In contrast, today's AI application experience remains at "bare dialogue," without any constraints or structure. The result is:

  • Output results spread infinitely.
  • Users lack effective means of constraint.
  • Overall efficiency decreases, even making AI seem "not easy to use."

Users not knowing how to ask questions is not the users' problem; it's the industry's problem.


Interaction Principles for Next-Generation AI Applications

If AI is to truly become widespread, application forms must break through the limitations of conversational interaction, achieving at least the following:

State Management and Memory Layer

  • Maintain task states, so that tasks are neither forgotten nor executed repeatedly.
  • Know what is outdated and what takes priority.
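
A minimal sketch of such a memory layer follows. It assumes a hypothetical `TaskMemory` class in which each task has a deadline and a priority (lower number = more important); none of these names come from an existing framework.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    DONE = "done"
    ABANDONED = "abandoned"


@dataclass
class Task:
    name: str
    priority: int      # lower number = more important
    deadline: float    # after this time the task counts as outdated
    status: Status = Status.PENDING


class TaskMemory:
    def __init__(self):
        self._tasks = {}

    def add(self, task):
        # A task that is already known (even if done) is not queued again,
        # which prevents repeated execution.
        self._tasks.setdefault(task.name, task)

    def complete(self, name):
        self._tasks[name].status = Status.DONE

    def status_of(self, name):
        return self._tasks[name].status

    def next_task(self, now):
        # Mark pending tasks past their deadline as abandoned.
        for task in self._tasks.values():
            if task.status is Status.PENDING and now > task.deadline:
                task.status = Status.ABANDONED
        pending = [t for t in self._tasks.values() if t.status is Status.PENDING]
        # Return the most important pending task, or None if nothing is left.
        return min(pending, key=lambda t: t.priority, default=None)
```

The point of the sketch is that "outdated" and "already done" are explicit states the Agent can check, rather than something it has to infer from a chat transcript.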

Multimodal Perception Replacing Text Interaction

  • Proactively "see problems" from business data and user behavior.
  • No longer wait for users to piece together complex questions.
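
As a deliberately simple illustration of "seeing problems" from behavior, the sketch below flags actions that a user keeps retrying without success. The event shape (`action`, `ok`), the function name `detect_pain_points`, and the retry threshold are assumptions made for this example; a real system would draw on much richer signals.

```python
from collections import Counter


def detect_pain_points(events, min_repeats=3):
    """Return actions that failed at least min_repeats times, sorted by name."""
    failures = Counter(e["action"] for e in events if not e["ok"])
    return sorted(action for action, n in failures.items() if n >= min_repeats)
```

If a user fails to apply a coupon three times in a row, the Agent can step in before the user ever types a question.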

Semi-Structured Interaction Design

  • Use GUI interaction methods such as forms, checkboxes, filters, and drag-and-drop so that AI operates within a bounded, reasonable space.
  • Users only need to confirm and choose, not construct perfect questions.
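
The "confirm and choose" idea can be sketched as the AI proposing a pre-filled set of bounded choices, which the user either accepts or overrides. The names `Choice` and `resolve`, and the travel-booking fields in the usage note, are hypothetical and exist only to illustrate the pattern.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Choice:
    name: str
    options: tuple    # the only values the interface allows
    suggested: str    # the option the AI pre-selects


def resolve(choices, user_picks):
    """Apply user picks over AI suggestions, rejecting values outside the options."""
    result = {}
    for choice in choices:
        value = user_picks.get(choice.name, choice.suggested)
        if value not in choice.options:
            raise ValueError(f"{value!r} is not a valid option for {choice.name}")
        result[choice.name] = value
    return result
```

For example, given a `cabin` choice with options `("economy", "business")` and a `seat` choice with options `("aisle", "window")`, a user who changes only the seat passes `{"seat": "window"}` and leaves everything else as suggested. The user never writes a prompt, and the output cannot drift outside the listed options.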

Execution Closed Loop

  • AI not only answers but can also directly execute.
  • After execution, feedback data forms a closed loop for iteration.

When OpenAI launched ChatGPT, built on GPT-3.5, it showed the world how striking AI dialogue could be. But what truly determines the adoption of AI is not the size of the model parameters but the evolution of interaction methods.

Future AI applications should be more like "apps with an AI core" rather than "AI chatboxes within apps."

I believe this is the industry's next inflection point and the prerequisite for Agents to become truly practical in real products.