Bottlenecks of Conversational AI and Interaction Considerations for Next-Generation Agent Applications

Over the past two years, I have frequently used various AI language models to solve daily work and life problems, essentially replacing Google searches with ChatGPT. However, as I continued to use it, I gradually noticed some issues.

Compared to traditional search engines, AI does not require users to supply precise keywords, which makes searching more efficient.

But when you ask AI to perform more rigorous tasks, the user has to organize sufficient, structured background information to give it adequate context, and also state the requirements precisely; otherwise, its output will miss the mark.

When an LLM is used only as a search engine, the efficiency gain is very high relative to the token cost. Rigorous tasks are different: there the AI itself has to bear the "correctness" metric for its output, which is precisely its weakness.

This creates a barrier of "how to ask AI the right questions," which is extremely unfriendly to general users. Current services like ChatGPT can use short-term memory and historical data (likely with intent recognition models) to infer user intent and provide more reliable answers, but from a product methodology perspective this is still insufficient.

The mobile internet era supported information and transactions for specific scenarios through limited service boundaries. Users could enter different apps (different scenarios) according to their needs to look up information and complete transactions, and "mobility" turned low-frequency needs into high-frequency use.

If, in the AI era, every service starts by "requiring users to fully express their needs," we will squander the vast legacy of the mobile internet era and reduce AI to a verbose emotional companion that can neither deliver information to users faster nor help them complete transactions more efficiently. That would be a significant failure.

My ideal Agent should complete a full closed loop:

Automatic data perception → self-organization → analysis → decision-making → solution recommendation → execution → generation of new data.

This chain emphasizes not "dialogue" but "autonomy." That is, AI can identify the user's "pain points" before the user feels the "pain."
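
As a rough illustration of the chain above, here is a minimal Python sketch of one pass through the loop. Everything in it is an assumption made for the example: the inventory scenario, the function names (perceive, organize, analyze, decide, execute), and the threshold and target values. It is not a description of any existing Agent framework, and the decision and recommendation steps are collapsed into a single function for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LoopState:
    """Data the agent accumulates across passes (hypothetical structure)."""
    observations: list[dict] = field(default_factory=list)
    generated: list[dict] = field(default_factory=list)


def perceive(stock_levels: dict[str, int]) -> list[dict]:
    # Automatic data perception: read business data without a user prompt.
    return [{"item": name, "stock": qty} for name, qty in stock_levels.items()]


def organize(observations: list[dict]) -> list[dict]:
    # Self-organization: order observations so the scarcest items come first.
    return sorted(observations, key=lambda o: o["stock"])


def analyze(observations: list[dict], threshold: int) -> list[dict]:
    # Analysis: flag items whose stock has fallen below the threshold.
    return [o for o in observations if o["stock"] < threshold]


def decide(problems: list[dict], target: int) -> list[dict]:
    # Decision-making and recommendation: propose a restock amount per item.
    return [{"item": p["item"], "order": target - p["stock"]} for p in problems]


def execute(plan: list[dict], stock_levels: dict[str, int]) -> list[dict]:
    # Execution: apply the plan and emit new data describing the outcome.
    results = []
    for step in plan:
        stock_levels[step["item"]] += step["order"]
        results.append({"item": step["item"], "new_stock": stock_levels[step["item"]]})
    return results


def run_once(state: LoopState, stock_levels: dict[str, int],
             threshold: int = 5, target: int = 20) -> list[dict]:
    """One pass of the loop; the new data is stored for the next pass."""
    state.observations = organize(perceive(stock_levels))
    problems = analyze(state.observations, threshold)
    plan = decide(problems, target)
    new_data = execute(plan, stock_levels)
    state.generated.extend(new_data)
    return new_data
```

Note that nothing in run_once waits for a question: the loop starts from the data, and the user would at most confirm the proposed plan before the execution step.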


Three Types of Data: The Foundation of AI Applications

In AI-driven Agent applications, data should be categorized into at least three types:

  • Business Data: Data flows from APIs, search engines, and external systems. This is rational factual information.
  • User Data: Input directly generated by users. This is emotional experiential information and also reflects the user's state.
  • AI-Generated Data: Intermediate results and new information derived by the model during operation. This is the solution formed by information fusion.

Only when these three types of data are effectively organized by time (what is still current) and by demand hierarchy (what matters most) can an Agent possess complete perception and decision-making capabilities.
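
One possible way to make this concrete is to tag every record with its source, a timestamp, and a priority, and to filter and order records before they reach the model. The sketch below is only an assumption about how that could look: the Record fields, the "lower number means more important" convention, and the build_context helper are all hypothetical names invented for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    BUSINESS = "business"          # APIs, search engines, external systems
    USER = "user"                  # input generated directly by the user
    AI_GENERATED = "ai_generated"  # intermediate results derived by the model


@dataclass(frozen=True)
class Record:
    source: Source
    content: str
    timestamp: float  # seconds on any consistent clock
    priority: int     # lower means more important (hypothetical convention)


def build_context(records, now, max_age):
    """Drop stale records, then order by priority and, within a priority, by recency."""
    fresh = [r for r in records if now - r.timestamp <= max_age]
    return sorted(fresh, key=lambda r: (r.priority, -r.timestamp))
```

For example, with a cutoff of 120 seconds, an AI-generated summary from 190 seconds ago is dropped even if it had top priority, while a recent user request is placed ahead of older business data.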


Why Is Conversational Interaction a Bottleneck?

Today's mainstream AI interaction is still "conversation." The problems are:

  • Chaos in Time and Demand: The model cannot tell which tasks are outdated or which demands have been abandoned, so it often re-executes them, and errors pile up like a game of whack-a-mole.
  • Probability-Driven Rather Than Logic-Driven: Dialogue generation relies on statistical probability. If the user says "Hi," the AI most likely responds "Hello." This yields a kind of "thinking efficiency," because a plausible reply comes quickly, but not "answer efficiency," because plausible is not the same as correct.
  • Too High a Barrier for User Questions: Asking a "good question" itself requires a clear cognitive framework. Users who can do this have already solved half the problem, reducing their dependence on AI.

Therefore, conversational interaction is both difficult to use and unreliable for most ordinary users.


The Lesson from the Mobile Internet

The greatness of the mobile internet lies in the evolution of interaction. Each app reduced users' cognitive costs through information display, search, filtering, and sorting, making services accessible.

In contrast, today's AI application experience remains at "bare dialogue," without any constraints or structure. The result is:

  • Outputs sprawl without bounds.
  • Users lack effective means of constraint.
  • Overall efficiency drops, to the point that AI seems "not easy to use."

If users cannot ask questions well, it's not the users' problem; it's the industry's problem.


Interaction Principles for Next-Generation AI Applications

If AI is to truly become widespread, application forms must break through the limitations of conversational interaction and at least achieve the following:

State Management and Memory Layer

  • Maintain task state to avoid endless forgetting or repeated execution.
  • Know what is outdated and what is a priority.
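
A minimal sketch of such a state layer, assuming a simple in-memory store: the TaskStatus values, the TaskStore class, and its methods are hypothetical names chosen for illustration, and a real application would need persistence and expiry rules on top of this.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class TaskStatus(Enum):
    PENDING = "pending"
    DONE = "done"
    ABANDONED = "abandoned"  # the user dropped this demand; do not revive it


@dataclass
class Task:
    name: str
    priority: int  # lower means more urgent (hypothetical convention)
    status: TaskStatus = TaskStatus.PENDING


class TaskStore:
    def __init__(self) -> None:
        self._tasks: dict[str, Task] = {}

    def add(self, name: str, priority: int) -> Task:
        # Re-adding a known task is a no-op, which prevents repeated execution.
        if name not in self._tasks:
            self._tasks[name] = Task(name, priority)
        return self._tasks[name]

    def mark(self, name: str, status: TaskStatus) -> None:
        self._tasks[name].status = status

    def next_task(self) -> Task | None:
        # Only pending tasks are candidates; done and abandoned ones are skipped.
        pending = [t for t in self._tasks.values() if t.status is TaskStatus.PENDING]
        return min(pending, key=lambda t: t.priority, default=None)
```

The point of the design is that "what is outdated" and "what is a priority" become explicit fields the application can query, rather than something the model has to infer from a long conversation history.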

Multimodal Perception Replacing Text Interaction

  • Proactively "see problems" from business data and user behavior.
  • No longer wait for users to piece together complex questions.

Semi-Structured Interaction Design

  • Use GUI interaction methods like forms, checkboxes, filters, and drag-and-drop to keep the AI operating within a bounded, reasonable space.
  • Users only need to confirm and choose, not construct perfect questions.
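
To show what "confirm and choose" might mean in practice, here is a small sketch in which a form definition constrains what can be sent to the model. The restaurant scenario, the ChoiceField type, the FORM contents, and build_request are all assumptions made up for this example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ChoiceField:
    name: str
    options: tuple[str, ...]
    default: str


# A hypothetical form the GUI would render as dropdowns or checkboxes.
FORM = (
    ChoiceField("cuisine", ("any", "italian", "japanese"), "any"),
    ChoiceField("budget", ("low", "medium", "high"), "medium"),
)


def build_request(selections: dict[str, str]) -> dict[str, str]:
    """Turn the user's selections into a complete, validated request."""
    unknown = set(selections) - {f.name for f in FORM}
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    request = {}
    for f in FORM:
        # Anything the user did not touch falls back to the default.
        value = selections.get(f.name, f.default)
        if value not in f.options:
            raise ValueError(f"{value!r} is not a valid {f.name}")
        request[f.name] = value
    return request
```

With this form, a user who only picks "japanese" still produces a fully specified request, because the defaults fill in the rest, and a value outside the listed options is rejected before it ever reaches the model.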

Execution Closed Loop

  • AI not only answers but can also directly execute.
  • After execution, feedback data forms a closed loop for iteration.

When OpenAI launched ChatGPT on GPT-3.5, it showed the world how awe-inspiring AI dialogue could be. But what truly determines how widely AI is adopted is not the size of the model parameters but the evolution of interaction methods.

Future AI applications should resemble "apps with an AI core" more than "AI chatboxes within apps."

I believe this is the industry's next inflection point and the prerequisite for Agents to truly take off.