Current Design Flaws of AI Agents

From Chat UI to AI Native, we may be building "intelligence" in the wrong way.
This article mainly discusses the current issues in AI Agent interaction design, focusing on the following points:

Human in the loop mistakenly pulls users into engineering decisions.

Chat UI flattens system state into a hard-to-read mudslide of language.

Role templates compress AI's multi-role system capability into a single-role persona.

Many people are amazed the first time they use an AI Agent. It can write code, operate a computer, read through your chat history to help you fend off unwanted suitors, or act as your cyber lover. According to some unscrupulous self-media accounts, it can even help you "make money": send this and that, grow your passive income, retire early, and reach financial freedom.

But if you actually use it for a while, you'll find that the sci-fi stories hyped by self-media are almost entirely illusions. You haven't become a successful OPC (One-Person Company) boss, nor have you added a single zero to your account. All that's left is a token bill burning a hole in your pocket and an increasingly fragmented, exhausted attention span.

I'm a heavy user of AI products and spend a lot of time every day talking to AI, so I know just how inflated these stories that put AI on a pedestal are.

For many non-engineers (like me) who frequently use Agents for vibe coding, the most realistic experience is that Agents are powerful but exhausting to use. They either ask for permission on everything or run wild on their own.


In current mainstream Agent design, "human in the loop" is a core design philosophy. The main idea is to deeply involve users in the Agent's workflow, having them review and correct the Agent's output at key points.

But the problem is that most HITL checkpoints are designed around programming concerns. The Agent stops to ask about permissions, responsibilities, and data risks during coding, which are precisely the areas most non-professional users neither understand nor care about. Users cannot assess the risk, and they will not actually stop the Agent. Meanwhile, the Agent tends to run wild on the things users truly care about.

So the final results are often unsatisfactory.

The Agent doesn't know what matters to you or which questions are beyond your understanding. So when faced with most of its "requests," I just blindly approve them. But the results it gives me in the end often make me want to tear my hair out. Either the interaction experience is flawed, or the UI is a mess with ugly colors: a whole set of OpenAI-style muddy yellow/blue-purple tones, paired with Google's signature giant rounded corners, leaving me oscillating between physical discomfort and a headache.

As a product manager, I neither care about database operation risks nor have the expertise to judge them. I only focus on interaction experience and UI design. Asking me to make decisions about permission risks is like asking a blind person for directions. In other words, many Agents are not designed for users; they are programming tools built for engineers.
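
One way to picture the alternative is a checkpoint router that only interrupts the user for decisions in domains they can actually judge. The following is a minimal sketch, not a description of how any existing Agent works. Every name in it (`Domain`, `Decision`, `UserProfile`, `route`) is hypothetical, and it assumes the Agent can tag each pending decision with a domain and a conservative default.

```python
from dataclasses import dataclass
from enum import Enum


class Domain(Enum):
    """Hypothetical categories an Agent might tag its decisions with."""
    PERMISSIONS = "permissions"
    DATA_RISK = "data_risk"
    INTERACTION = "interaction"
    VISUAL = "visual"


@dataclass(frozen=True)
class Decision:
    domain: Domain
    question: str
    safe_default: str  # conservative choice used when the user is not asked


@dataclass(frozen=True)
class UserProfile:
    name: str
    cares_about: frozenset[Domain]  # domains this user is able to judge


def route(decision: Decision, user: UserProfile) -> str:
    """Ask the user only about decisions in domains they can judge."""
    if decision.domain in user.cares_about:
        return "ask_user"
    # Assumption: outside the user's expertise, fall back to the safe default
    # instead of collecting a blind approval.
    return "apply_safe_default"


pm = UserProfile("product manager", frozenset({Domain.INTERACTION, Domain.VISUAL}))
db_question = Decision(Domain.DATA_RISK, "Run this migration on the live database?", "skip")
ui_question = Decision(Domain.VISUAL, "Which color palette should the app use?", "neutral")

print(route(db_question, pm))  # apply_safe_default
print(route(ui_question, pm))  # ask_user
```

With this kind of routing, the product manager in the example would be consulted on the palette and never on the migration, which is the reverse of what I experience today.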


Since the launch of ChatGPT, Chat UI seems to have become the default starting point for every AI tool. Whether it's an LLM or a text-to-image or text-to-video tool, they all open with a huge dialog box and a blank canvas, and this design habit has even carried over to Agents.

Before Deep Research, chat-style interaction worked well enough, but with Agents its shortcomings are fully exposed.

The core of an Agent is not to answer questions but to execute tasks. To maintain a task flow, the most important thing is not to describe it in "human language" but to use various graphical forms like tags, icons, charts, and wireframes to express the states, processes, permissions, tool call results, and external system operation results within the task flow.

For example, in a conversation, every real-time state change takes a large block of text to describe. Compared with a status icon that simply says "yes" or "no," text is heavy and cumbersome, and key information easily drowns in a sea of words. Pure conversation only flattens a complex system into a mudslide of information that is hard to take in.

What an Agent does is essentially advance goals within a constantly changing task space, but Chat UI forces it into a series of question-and-answer, round-by-round conversation fragments. Thus, system states are fragmented, task flows are interrupted, and users can only repeatedly read conversation logs to confirm what's happening in the system.
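
To make the contrast concrete, here is a rough sketch of task state kept as structured data that a UI could render as tags or icons, rather than as paragraphs in a chat log. The `Status`, `Step`, and `TaskState` names are illustrative assumptions, not taken from any real Agent framework.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"


@dataclass
class Step:
    name: str
    status: Status = Status.PENDING
    detail: str = ""  # short tool result or error, shown on demand


@dataclass
class TaskState:
    goal: str
    steps: list[Step] = field(default_factory=list)

    def update(self, name: str, status: Status, detail: str = "") -> None:
        """Change one step in place; the UI re-renders just that badge."""
        for step in self.steps:
            if step.name == name:
                step.status = status
                step.detail = detail
                return
        raise KeyError(name)

    def summary(self) -> dict[str, int]:
        """Count steps per status, e.g. to drive a progress indicator."""
        counts = {status.value: 0 for status in Status}
        for step in self.steps:
            counts[step.status.value] += 1
        return counts


state = TaskState("Build a to-do app", [Step("design"), Step("code"), Step("deploy")])
state.update("design", Status.DONE)
state.update("code", Status.FAILED, "build error")
print(state.summary())  # {'pending': 1, 'running': 0, 'done': 1, 'failed': 1}
```

The point of the sketch is that one failed step becomes a single changed field the interface can highlight, instead of a sentence buried somewhere in the conversation history.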


Another more subtle issue is the over-reliance on "persona templates" in current Agent design.

Many so-called Agent products embed a role setting in the system prompt, even packaging the entire Agent as a fixed persona. On the surface, this approach aims to optimize the model's output stability, making it behave more like an "expert" on specific tasks.

But the problem is that many tasks are composite. In my experience with vibe coding, creating an application involves multiple stages such as interaction design, visual design, and programming, each handled by a different professional role. Designers don't care whether the code is elegant, and engineers don't care about the radius of rounded corners. Forcing a fixed role template only makes the Agent "dumber" on complex tasks; it is like making engineers review design drafts.

It is precisely because individual capabilities are limited that the Industrial Revolution mattered so much: it allowed people with different skills to work together within a single organization.

But AI is different. It's a tool trained on internet data from across society, so it inherently carries all of these skills. The Persona mechanism forcibly compresses a global model into a single-role perspective, allowing it to reason as only one identity across the entire task space.

When dealing with simple problems like writing a report, this constraint is efficient because it reduces the search space and speeds up convergence. But in complex tasks, its side effects are very obvious. The model cannot switch perspectives between different sub-problems and can only solve them with the same "expert identity."

You can think of AI as an expert team, and the Agent as a production line configured for that team. The value of a production line lies in enabling different capabilities to collaborate in the correct order, rather than forcing every workstation to be completed by the same role.
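
The production-line idea can be sketched as a pipeline in which each stage carries its own role prompt and hands its output to the next stage. This is only an illustration under stated assumptions: `Stage`, `run_pipeline`, and `echo_model` are hypothetical names, and `echo_model` is a stand-in for a real model call so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical model interface: (role_prompt, task_input) -> output text.
Model = Callable[[str, str], str]


@dataclass(frozen=True)
class Stage:
    name: str
    role_prompt: str  # the role changes per stage instead of being fixed for the whole Agent


def run_pipeline(stages: list[Stage], brief: str, model: Model) -> dict[str, str]:
    """Run stages in order, feeding each stage's output into the next one."""
    artifact = brief
    outputs: dict[str, str] = {}
    for stage in stages:
        artifact = model(stage.role_prompt, artifact)
        outputs[stage.name] = artifact
    return outputs


def echo_model(role_prompt: str, task_input: str) -> str:
    """Stub model that only records which role handled which input."""
    return f"[{role_prompt}] {task_input}"


stages = [
    Stage("interaction", "interaction designer"),
    Stage("visual", "visual designer"),
    Stage("code", "engineer"),
]
results = run_pipeline(stages, "to-do app", echo_model)
print(results["code"])
# [engineer] [visual designer] [interaction designer] to-do app
```

A single-persona Agent corresponds to running every stage with the same `role_prompt`; the pipeline simply makes the switch of perspective explicit at each workstation.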


There are more complaints I could make, but limited by space and patience, I'll stop here. Overall, the biggest problem with current Agents is that they pull AI back into the industrial era we're familiar with, packaging it as an "assistant," shaping it into an "expert," or even treating it like a "top university intern" to be controlled like a human.

Ask yourself: which intern knows everything from astronomy to geography, can do industry analysis one moment and write code for 24 hours straight the next?

AI Native should not treat AI as a "smarter employee" but rather as a system capability that transcends the individual.

If previous software was designed around the boundaries of human capabilities, a truly AI Native Agent inherently possesses the ability to advance toward goals layer by layer, completing cross-role collaboration and dynamic execution within the Agent itself.

OPC (One-Person Company) is the wrong framing. The Agent itself is a company.