Current Design Flaws of AI Agents

From Chat UI to AI Native, we may be creating "intelligence" in the wrong way.
This article mainly discusses the current interaction design issues of AI Agents, focusing on the following points:

Human in the loop mistakenly pulls users into engineering decisions.

Chat UI compresses system state into a mudslide of language.

Role templates compress the AI's multi-role system capability into a single-role persona.

Many people are amazed when they first experience an AI Agent. It can write code, operate a computer, browse your chat history to help you deal with unwanted attention, or act as your cyber partner. In the words of some irresponsible media, it can even help you "make money," post things here and there, increase passive income, achieve early retirement, and head towards financial freedom.

But if you actually use it for a while, you'll find that the sci-fi stories touted by the media are almost entirely illusions. You haven't become a successful OPC (One Person Company) boss, nor have you added a single zero to your account. All you're left with is a token bill that drains your wallet and an increasingly fragmented and weary attention span.

I am a heavy user of AI products and spend a significant amount of time each day conversing with AI. So I know very well how inflated these stories that put AI on a pedestal really are.

For many people who are not professional engineers (like myself) but frequently use Agents for vibe coding, the most realistic experience is that the Agent is very capable but exhausting to use. It either asks for permission for everything or goes off on its own.

In current mainstream Agent design, "human in the loop" (HITL) is a core design philosophy. The main idea is to deeply involve the user in the Agent's workflow, having them review and correct the Agent's output at critical points.

But the problem is that many HITL designs are built for programming scenarios. The Agent asks about permissions, responsibility, and data risks during the programming process. These are precisely the things most non-professional users are indifferent to: they can neither assess the risks nor would they actually stop the Agent. Meanwhile, the Agent tends to run wild in exactly the areas the user truly cares about.

So the final result is often unsatisfactory.

It doesn't know what matters to you or what is beyond your understanding. So when faced with most of its "requests," I just approve them blindly. But the final result never fails to infuriate me. Either the interaction experience is flawed, or the UI is a mess of chaotic colors: a full set of OpenAI-style muddy yellow and blue-purple, paired with Google's signature giant rounded corners, leaving me oscillating between physical discomfort and mental meltdown.

As a product manager, I neither care about database operation risks nor have the ability to judge them. I only focus on interaction experience and UI design. Asking me to make decisions about permission risks is like asking a blind person about colors. In other words, many Agents are not designed for users at all; they are programming tools for engineers.
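One way to picture the alternative is an approval router that only interrupts the user for decisions in domains they can actually judge. The sketch below is a minimal illustration, not how any existing Agent works: the `Domain` categories, the `route` function, and the idea of an "apply_default" fallback policy are all hypothetical names introduced here.

```python
from dataclasses import dataclass
from enum import Enum


class Domain(Enum):
    # Hypothetical decision categories, for illustration only.
    PERMISSIONS = "permissions"
    DATA_RISK = "data_risk"
    INTERACTION = "interaction"
    VISUAL = "visual"


@dataclass(frozen=True)
class Decision:
    domain: Domain
    question: str


def route(decision: Decision, user_cares_about: set[Domain]) -> str:
    """Ask the user only about domains they can judge; otherwise fall back
    to a preset default policy (assumed to exist elsewhere)."""
    if decision.domain in user_cares_about:
        return "ask_user"
    return "apply_default"


# A product manager who wants a say in design, not in database risk.
pm_profile = {Domain.INTERACTION, Domain.VISUAL}

pending = [
    Decision(Domain.PERMISSIONS, "Allow write access to the database?"),
    Decision(Domain.VISUAL, "Use 24px rounded corners on cards?"),
]

routed = {d.question: route(d, pm_profile) for d in pending}
for question, action in routed.items():
    print(f"{action}: {question}")
```

With this profile, the database question is handled by the default policy and only the rounded-corner question reaches the user, which is the reverse of the behavior described above.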

Since the launch of ChatGPT, Chat UI seems to have become the default starting point for all AI tool interactions. Whether it's LLMs or various text-to-image and text-to-video tools, they all start with a huge dialog box and a blank canvas. This design habit has even carried over to Agents.

Before Deep Research, chat was a valid interaction model. But with Agents, its shortcomings are fully exposed.

The core of an Agent is not to answer questions, but to execute tasks. To maintain a task flow, the most important thing is not to describe it in "human language," but to use various graphical forms like tags, icons, charts, and wireframes to express the states, processes, permissions, tool call results, and external system operation results within the task flow.

For example, in a conversation, every real-time state change takes a block of text to describe. Next to a status icon that conveys "done or not" at a glance, text is heavy and cumbersome, and key information is easily lost in a sea of words. Conversation alone flattens a complex system into a mudslide of information that is hard to take in.

What an Agent does is essentially advance a goal within a constantly changing task space, but Chat UI forces that work into round after round of question-and-answer fragments. As a result, system state is split up, task flows are interrupted, and users have to reread the conversation history again and again to confirm what the system is doing.
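To make the contrast concrete, here is a minimal sketch of task state kept as structured data rather than as chat messages. The `TaskState`, `Step`, and `Status` names are hypothetical, and the bracketed text marks are only a stand-in for the icons or badges a real interface might render.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"


@dataclass
class Step:
    name: str
    status: Status = Status.PENDING


@dataclass
class TaskState:
    goal: str
    steps: list[Step] = field(default_factory=list)

    def set_status(self, name: str, status: Status) -> None:
        """Update one step in place; no new message is appended anywhere."""
        for step in self.steps:
            if step.name == name:
                step.status = status
                return
        raise KeyError(name)

    def summary(self) -> str:
        """One compact line a UI could show as badges instead of prose."""
        marks = {
            Status.PENDING: " ",
            Status.RUNNING: "~",
            Status.DONE: "x",
            Status.FAILED: "!",
        }
        return " ".join(f"[{marks[s.status]}] {s.name}" for s in self.steps)


state = TaskState("Ship landing page", [Step("design"), Step("build"), Step("deploy")])
state.set_status("design", Status.DONE)
state.set_status("build", Status.RUNNING)
print(state.summary())  # [x] design [~] build [ ] deploy
```

The point of the design is that a state change overwrites a field instead of adding another paragraph to the transcript, so the current state is always readable in one place.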

Another more subtle problem is the current over-reliance on "persona templates" in Agent design.

Many so-called Agent products embed a role setting in the system prompt, even packaging the entire Agent as a fixed persona. On the surface, this is to optimize the model's output stability, making it appear more like an "expert" on a specific task.

But the problem is that many tasks are composite. Take my experience with vibe coding as an example. Creating an application involves multiple stages, such as interaction design, visual design, and programming, each normally handled by a different professional role. Designers don't care whether the code is elegant, and engineers don't care about the radius of the rounded corners. Imposing a single role template only "dumbs down" the Agent on complex tasks; it is like forcing the engineer to sign off on the design mockup.

It is precisely because individual capabilities are limited that the Industrial Revolution was so great. It allowed people with different skills to be organically integrated within an organization.

But AI is different. It is a tool trained on data from across society's networks, so it comes with a broad range of skills built in. The Persona mechanism forcibly compresses this general model into a single role's perspective, making it reason from one identity across the entire task space.

When facing simple problems, like writing a report, this constraint is efficient because it reduces the search space and increases convergence speed. However, in complex tasks, its side effects are very obvious. The model cannot switch perspectives between different sub-problems and can only solve everything with the same "expert identity."

You can think of AI as an expert team, and the Agent as a production line configured for that team. The value of the production line lies in enabling different capabilities to collaborate in the correct sequence, not forcing every workstation to be completed by the same role.
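The production-line idea can be sketched as a pipeline in which each stage runs under its own role instead of one fixed persona for the whole task. This is an illustrative assumption about how such a pipeline might look: `Stage`, `PIPELINE`, `run_pipeline`, and `fake_model` are hypothetical names, and `fake_model` is a stand-in for whatever model call a real system would make.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Stage:
    role: str
    instruction: str


# Hypothetical stages, following the vibe coding example above.
PIPELINE = [
    Stage("interaction designer", "Define the user flow."),
    Stage("visual designer", "Choose colors, spacing, and corner radii."),
    Stage("engineer", "Implement the approved design."),
]


def run_pipeline(
    task: str,
    stages: list[Stage],
    call_model: Callable[[str, str], str],
) -> list[str]:
    """Run each stage under its own role, passing earlier output forward."""
    outputs: list[str] = []
    context = task
    for stage in stages:
        system_prompt = f"You are the {stage.role}. {stage.instruction}"
        result = call_model(system_prompt, context)
        outputs.append(result)
        # Later stages see what earlier roles produced.
        context = f"{context}\n[{stage.role}] {result}"
    return outputs


def fake_model(system_prompt: str, context: str) -> str:
    # Stand-in for a real model call; reports which role handled the stage.
    return f"handled under: {system_prompt.split('.')[0]}"


results = run_pipeline("Build a to-do app", PIPELINE, fake_model)
for line in results:
    print(line)
```

The role changes per workstation while the task context accumulates, so the visual design is never judged from the engineer's perspective or the other way around.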

There are other criticisms I won't write due to space and patience constraints. In summary, the biggest problem with current Agents is that they pull AI back into the familiar industrial era, packaging it as an "assistant," shaping it into an "expert," or even treating it like a "top university intern" to be controlled like a "person."

Ask yourself, which intern knows everything from astronomy to geography, can do industry analysis, and then write code for 24 hours straight?

AI Native should not treat AI as a "smarter employee." Instead, it should be seen as a systemic capability that transcends the individual.

If earlier software consisted of tools designed around the limits of human capability, a truly AI Native Agent should be able to advance toward a goal layer by layer, with cross-role collaboration and dynamic execution happening inside the Agent itself.

OPC (One Person Company) is the wrong definition. The Agent itself is a company.
