From "China's AI Shining Star" to lightning layoffs and fleeing to Singapore: in a frenzy of just 130 days, Manus became the first star startup of the Agent era to land safely ashore, securing $75 million in Series B funding from Benchmark.
Almost simultaneously, the blade from Silicon Valley has already been unsheathed.
On July 18th, OpenAI launched ChatGPT Agent, sounding the horn for major model vendors to enter the AI Agent arena: with a user's simple command like "plan a wedding itinerary," the Agent automatically calls a browser to compare hotel prices, generates a travel guide with maps, recommends matching attire, and delivers a task that traditionally took hours in just 25 minutes. It deeply integrates Operator's graphical operations with Deep Research's long-chain reasoning, completing the "think-execute" loop in a virtual computer, turning large language models into "digital labor" capable of searching, clicking buttons, filling forms, and writing code.
This seems to signal that the AI Agent startup arena is beginning to enter its second half, with capital entering and model vendors reaping the harvest. Successful startups that have broken through are starting to be held for the highest bidder, while companies that failed to gain attention are swept into the corner.
And all of this was actually foreseen at the beginning of the year.
The "Cultivation Game" of the Agent Era: Giants Cast the Net, Startups Walk In
Those who have been following the AI Agent arena closely might still remember that earlier this year, Sam Altman and Sundar Pichai successively proclaimed the slogan "Year of the Agent." This was a meticulously designed hunt—giants create the concept, startups sprint forward, capital enters to rapidly inflate valuations.
Now this game has finally reached its final step: giants harvest the market, capital cashes out profits.
The Giants' Open Scheme
Since ChatGPT ignited the AI arena at the end of 2022, this playbook has been repeating.
Open-sourcing models to usher in the "Artificial Intelligence Era," launching GPTs and thereby driving the rise of prompt engineering and the digital human arena, releasing Sora and opening up a brand-new text-to-video track: all of these are standard moves by giants using their industry influence to shape the AI ecosystem step by step. Chinese giants did the same in the mobile internet era. They kept creating new business models, lured startups in to experiment and build out the ecosystem, and finally harvested the market through their advantages in capital and user scale, feeding their own businesses and gaining an edge over rival giants.
In the AI era, this logic still applies.
The Life-or-Death Sprint for Startups
For most AI Agent startups, the so-called "Year of the Agent" is actually a pseudo-trend driven by external narratives.
Major model vendors tout "general agents will change everything" while releasing model capabilities via APIs. But for entrepreneurs, this means they must piece together a "decent-looking Agent" without real demand validation, with chaotic product structures, and under high model costs.
This pushes startups to an extreme pace from the very beginning: either explode in popularity in the short term, or exit rapidly. Manus, Kto, AI Pin, Rabbit R1: each project was hyped, and each exposed its core problems very quickly. No general Agent product has truly crossed the line from "demo" to "daily use."
Often, so-called "user retention" and "high-frequency use cases" are not real but data illusions created through community operations and early incentive mechanisms. The current AI user ecosystem has zero loyalty; users go with whoever is cheaper or has more accurate outputs. Once growth slows, or a trendier competitor appears, or models iterate, user behavior collapses immediately.
Capital Enters the Game
Capital follows a different logic from product. Investors are not betting on whether the market can truly produce a working Agent product; driven by traffic thinking, they are betting on who can become "the first project on stage in the next wave of AI," so that valuations can be cashed out before the giants harvest.
Manus is the lucky one that placed its bet during this window. Its product is no stronger than other Agents in the industry and even has significant flaws in functionality and usability. But it caught a rare combination of conditions: a patriotic narrative at home, and giants' Agents that had not yet launched. Domestically, it timed its launch right after the DeepSeek moment, amid the frenzy for China's self-developed AI, and ignited traffic through scarcity marketing and patriotic narratives pushed by community influencers. Globally, it rode the "Year of the Agent" slogan by shipping a product first, drawing worldwide attention.
From the start, Manus's product packaging, launch format, and overall tone had a strong Silicon Valley flavor—this wasn't a product for users, but promotion for American investors.
And what capital cares about is the exit path, not the product path. For investors, Manus, which seized the first-mover position in Agents and captured global traffic, was the most premium investment target. Benchmark's eventual bet on it and its choice to flee to Singapore add up to a story of "successful exit." Even if the project later fails, it can be packaged as "the cost of market education."
Capital clearly understands that the general Agent track is essentially an extension of the battlefield of giants' model capabilities. The giants will inevitably cover this track; investors' real goal is to make their first pot of gold and cash out as soon as possible.
The Real Competition in AI is Competition of Technical Routes: Visual Sandbox, Data Pipelines, and Ecosystem Integration
In the practical implementation of Agents, three major technical routes have gradually taken shape: OpenAI bets on the "visual sandbox," Anthropic delves deep into "data semantic pipelines," and Google leverages ecosystem integration to build application closed loops. What they represent is not just differences in interaction forms, but different understandings of the path to implementing general intelligence.
OpenAI's Visual Interaction: The General Execution Entity Evolving in a Sandbox
- General Operation Model Based on Web DOM
OpenAI builds visual Agents through the Operator module, simulating user actions like clicking, inputting, and scrolling in a browser environment. Its advantage lies in not needing custom adaptation, being able to widely adapt to consumer-level web tasks like price comparison, ticket purchasing, and itinerary organization.
- Capability Boundaries Limited by the Sandbox Environment
To ensure safety and controllability, this mode is strictly confined to running in a virtual environment. While avoiding system-level permission issues, it also means difficulty accessing local files, corporate intranets, and desktop-level tools, limiting its application depth in professional scenarios. Simultaneously, high computational cost and task latency remain currently unsolved problems.
Anthropic's Data Route: Long-Context Understanding and Vertical Application Deep Cultivation
- Complex Document Analysis Driven by Context Window
From version 3.7 to 4.0, Claude has relied on a context window of up to 200K tokens and stable semantic retention to show advantages in handling structured and unstructured data in vertical industries such as programming, healthcare, law, and pharmaceuticals. For example, in pharmaceutical patent analysis, it can achieve high-accuracy document extraction and summarization.
- Emphasis on Both Compliance and Controllability
Anthropic builds a safety baseline for high-compliance industries through "Constitutional AI" and preset rules, earning it a good reputation among financial and pharmaceutical clients. This path emphasizes "precision and safety first," representing a relatively stable force in current enterprise application implementation.
- MCP Leverages External Ecosystem
The landing point of MCP is to construct a new paradigm of "model as the main controller, software as the called unit." For Anthropic, this is an attempt at paradigm restructuring no less significant than the plugin system. Claude is no longer just an add-on intelligent assistant for some SaaS, but is attempting to become the main operating system for enterprise software stacks. This approach naturally aligns with its directions in data compliance, task precision, and industry deep cultivation, forming a closed loop.
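To make "model as the main controller, software as the called unit" concrete, here is a minimal sketch. It is not the MCP wire protocol or any Anthropic API: `Tool`, `ToolRegistry`, `run_plan`, and the two toy tools are hypothetical names, and a hard-coded plan stands in for the model's decisions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Tool:
    """A piece of software exposed to the model as a callable unit."""
    name: str
    description: str
    handler: Callable[..., Any]


class ToolRegistry:
    """Hypothetical in-process registry; a real MCP server speaks a wire protocol."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, tool: Tool) -> None:
        self._tools[tool.name] = tool

    def list_tools(self) -> List[Dict[str, str]]:
        # What the controller (the model) would see when deciding what to call.
        return [{"name": t.name, "description": t.description}
                for t in self._tools.values()]

    def call(self, name: str, arguments: Dict[str, Any]) -> Any:
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name].handler(**arguments)


def run_plan(registry: ToolRegistry, plan: List[Dict[str, Any]]) -> List[Any]:
    """Execute tool calls that a model is assumed to have produced."""
    return [registry.call(step["tool"], step["arguments"]) for step in plan]


registry = ToolRegistry()
registry.register(Tool("crm_lookup", "Look up a customer by id (toy data).",
                       lambda customer_id: {"id": customer_id, "tier": "gold"}))
registry.register(Tool("draft_email", "Draft a short email for a customer tier.",
                       lambda tier: f"Hello, valued {tier} customer."))

# A hard-coded plan stands in for the model's output in this sketch.
plan = [
    {"tool": "crm_lookup", "arguments": {"customer_id": 42}},
    {"tool": "draft_email", "arguments": {"tier": "gold"}},
]
results = run_plan(registry, plan)
```

The point of the design is the direction of control: the software only declares what it can do, and the caller decides which unit to invoke and in what order.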
Google's Ecosystem Integration Strategy: Toolchain Micro-Integration and Cost Balancing
- Native Embedded Agent Experience
Compared to simulating user behavior, Google chooses to directly embed Agent functionality into tools like Workspace, forming "micro-Agent" chains. For example, extracting email content in Gmail to generate schedules, or linking Sheets with BigQuery for data analysis. This method improves stability and efficiency, suitable for enterprise users' daily workflows.
- Leveraging Cloud Ecosystem for Cost Offensives
Google, with its massive cloud infrastructure, can offer Agent access capabilities at highly competitive prices (like Gemini Flash's low-cost token pricing), creating significant pressure on small and medium-sized entrepreneurs at the market level.
Behind the Route Divergence Lies Different Perceptions of the "Agent" Essence
- OpenAI views Agent as an extension of the "general execution model," focusing on how the model actively operates interfaces to complete tasks, adapting to current interaction methods by making the model imitate human behavior;
- Anthropic believes the core lies in data and semantic understanding, emphasizing stable capabilities in structured and compliance scenarios, attempting to connect data interactions between applications with the MCP open protocol, making the model the interaction hub at the data level;
- Google places more value on the practical integration of "AI + toolchain," first making Agent a part of familiar tools, then gradually enhancing intelligence, driving its own product ecosystem with models.
The three directions are not ranked superior or inferior; they reflect the natural extension of each company's technical foundations and product philosophy. In the short term, these heterogeneous approaches will continue to coexist, but the Agent landscape will only settle once someone closes the loop between "generalization capability" and "application necessity."
And Manus took OpenAI's visual route, using KV-cache and file system compression to reduce costs and improve hit rates, essentially still engineering optimization wrapped around AI. But isn't this exactly what OpenAI can do and is doing?
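Why a KV-cache "hit rate" matters can be shown with a toy calculation. The sketch below is not Manus's implementation. It assumes a cache that can only reuse the leading tokens a call shares with the previous call, uses whitespace splitting as a stand-in for real tokenization, and all function names are hypothetical.

```python
from typing import List


def shared_prefix_len(a: List[str], b: List[str]) -> int:
    """Number of leading tokens two sequences have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def cache_hit_rate(prompts: List[List[str]]) -> float:
    """Fraction of tokens that could reuse the previous call's prefix cache."""
    reused = total = 0
    for prev, cur in zip(prompts, prompts[1:]):
        reused += shared_prefix_len(prev, cur)
        total += len(cur)
    return reused / total if total else 0.0


system = "You are an agent .".split()
steps = ["open page", "click search", "read results"]

# Append-only context: each call extends the previous one, so the prefix is stable.
append_only = [system + " ".join(steps[:i]).split()
               for i in range(1, len(steps) + 1)]

# A changing value at the very front invalidates the shared prefix on every call.
timestamped = [[f"t={i}"] + p for i, p in enumerate(append_only)]

stable_rate = cache_hit_rate(append_only)    # 16 of 20 tokens reusable
unstable_rate = cache_hit_rate(timestamped)  # nothing reusable
```

Under these assumptions the append-only layout reuses 80% of tokens while the timestamped one reuses none, which is the kind of engineering gain that any model vendor can also apply on its own side.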
The Fatal Flaws of General Agent Startups and Windsurf's Crazy 72 Hours
The Manus case exposes the fatal weaknesses of general Agent startups:
- Hollow Technology
Manus founder Ji Yichao explained in a retrospective blog post that the team chose context engineering over self-developed large models, a decision motivated by his first failed startup, whose self-developed model was made obsolete by the release of GPT-3. In practice, Manus uses the KV cache to optimize costs and simulates "infinite memory" through a file system. Most general Agent startups use similar engineering tricks, reorganizing data to improve output efficiency. This lightweight engineering approach can reduce costs, but it relies heavily on the underlying large models and cannot build a moat of its own, so it remains vulnerable to the giants.
- Cost Imbalance
Single-task costs reached $2 (5 times OpenAI's) and computational consumption exceeded the industry average by 500%, a cost structure that translates directly into a degraded user experience. Manus adopted a credit-based consumption model: Pro membership costs $199 per month and, ideally, allows only about 15 to 20 uses per day. Simple tasks don't need it and complex tasks are slow to respond; with price compounded by response speed, the cost of using a general Agent is extremely high compared with using the model directly.
- No Moat
General Agent services depend heavily on underlying large models (such as Claude and GPT) for inference capabilities and update pace. Computing power and capabilities are entirely in the hands of the model API providers, so an Agent cannot be optimized independently of their roadmap. This dependency means Agents cannot build differentiated advantages through accumulated underlying technology. Nor can they accumulate high-value, transferable user data assets: user task records, semantic chains, and even behavioral paths on an Agent are not fundamentally different from those on any general large model platform. Such data cannot form exclusive understanding models or preference profiles and cannot train proprietary systems, which makes compounding data assets hard to build.
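The "infinite memory through a file system" trick mentioned under Hollow Technology can be sketched in a few lines. This is an illustration of the general idea, not Manus's code: `FileMemory`, its `inline_limit` threshold, and the `[stored:...]` reference format are all hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import List


class FileMemory:
    """Hypothetical helper: large observations go to disk, short references stay in context."""

    def __init__(self, root: Path, inline_limit: int = 80) -> None:
        self.root = root
        self.inline_limit = inline_limit
        self.context: List[str] = []  # what would actually be sent to the model

    def observe(self, text: str) -> str:
        if len(text) <= self.inline_limit:
            entry = text
        else:
            # Content-addressed file name, so identical observations share a file.
            name = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12] + ".txt"
            (self.root / name).write_text(text, encoding="utf-8")
            entry = f"[stored:{name} chars={len(text)}]"
        self.context.append(entry)
        return entry

    def recall(self, entry: str) -> str:
        """Restore the full text behind a reference; inline entries pass through."""
        if entry.startswith("[stored:"):
            name = entry[len("[stored:"):].split(" ", 1)[0]
            return (self.root / name).read_text(encoding="utf-8")
        return entry


with tempfile.TemporaryDirectory() as tmp:
    memory = FileMemory(Path(tmp))
    short_ref = memory.observe("page title: Pricing")
    long_page = "lorem ipsum " * 500
    long_ref = memory.observe(long_page)
    restored = memory.recall(long_ref)
    context_chars = sum(len(e) for e in memory.context)
```

Nothing here requires a proprietary model, which is exactly the weakness the section describes: the technique lowers cost but is easy for anyone to reproduce.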
In stark contrast to general Agents like Manus is the counter-trend growth of a batch of vertical domain Agents:
- Hippocratic AI builds a closed loop in medical dialogue and screening. According to official information, this year its multi-channel interventions raised colorectal cancer FIT screening participation among Spanish-speaking patients to 2.6 times that of English-speaking patients. Its improvement in clinical assistance accuracy is equally significant, rising from about 80% in early versions to 99.38% in the latest version, with severe misdiagnosis rates dropping to 0. All this benefits from the continuous accumulation of structured case data and feedback execution processes.
- PathChat and its upgraded version PathChat+, which target pathological imaging, have been published in top journals such as Nature, reporting a diagnostic accuracy of 87% in image Q&A tasks. This is completely different from general visual Agents: training and evaluation focus on a few high-value scenarios, giving the results industry-level reference value.
- Genspark is praised by users in engineering practice for high execution efficiency and excellent cost-effectiveness. Reddit users pointed out that Genspark can complete more tasks than Manus in the same scenarios, with overall lower costs. Comparative estimates suggest Genspark user experience is "insanely fast," far surpassing the higher-priced Manus.
- Claude Code focuses on programming scenarios, supporting context continuous tracking, code debugging, and explanation, highly praised by the developer community.
- Salesforce Agentforce embeds into the CRM ecosystem, naturally awakening AI capabilities in key workflows, seamlessly integrating into tools salespeople use daily.
Simultaneously, unsettling news emerged from Silicon Valley: Windsurf was rapidly dismantled within 72 hours.
OpenAI had offered $3 billion to acquire it, but the deal broke down after being blocked by Microsoft; Google then acqui-hired CEO Varun Mohan and other executives for $2.4 billion and also obtained a non-exclusive technology license; hours later, Cognition (the team behind Devin) acquired the remaining team, IP, and brand, completing Windsurf's strategic restructuring within 72 hours.
This incident reveals a dangerous signal for the industry: acqui-hire has become the mainstream liquidation path in the Agent field; the fusion of capital and technology is eroding startup trust mechanisms. CEOs can jump ship anytime due to "technology transferability," leaving venture capital and employees in a precarious position. This casts a shadow over the general Agent startup track.
For a period in the future, general Agent startups may enter a more intense and turbulent elimination round.
The AI Interaction Revolution: Survival Space Left for Startups
The struggles of general Agents and the dismemberment of Windsurf expose the illusion of the general Agent track and also throw light on its real watershed: structure and closed loops.
After this fever, Agent startups will diverge. One part will remain obsessed with "stronger, more intelligent Agents," trying to replicate the human brain's general path, but they will continue to be plagued by computing power, costs, and error rates; another part will return to rationality, seeking answers again from the original concept of Agent.
As early as 1995, in "Intelligent Agents: Theory and Practice," Nicholas R. Jennings and Michael Wooldridge systematically established the concept of the AI Agent: an intelligent agent should be autonomous, reactive to its environment, proactive in pursuing goals, and socially able to interact with other agents. They also emphasized that agent architecture design must support this closed loop of decision and execution.
I believe these four attributes correspond to the following engineering capabilities:
- Autonomy: Can operate independently, possessing multi-source heterogeneous data fusion and parsing capabilities.
- Reactivity: Perceives and responds to environmental changes, can respond to specific situations, and remains silent otherwise.
- Proactiveness: Takes proactive actions based on goals, possesses cross-system perception and task planning capabilities.
- Social Ability: Capable of collaboration and communication, synchronous and asynchronous coordination with multiple models and tools.
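The four attributes above can be read as four methods on one object. The sketch below is only an illustration of that mapping: `SketchAgent` and its fields are hypothetical names, not taken from the 1995 paper, and the "planning" is a fixed list.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class SketchAgent:
    """Toy agent; every name here is illustrative."""
    plan: List[str]  # ordered steps toward the goal
    rules: Dict[str, Callable[[dict], str]] = field(default_factory=dict)
    done: List[str] = field(default_factory=list)
    inbox: List[str] = field(default_factory=list)

    def react(self, event: dict) -> Optional[str]:
        """Reactivity: respond to known event types, stay silent otherwise."""
        rule = self.rules.get(event.get("type", ""))
        return rule(event) if rule else None

    def next_step(self) -> Optional[str]:
        """Proactiveness: pick the next unfinished step without being asked."""
        for step in self.plan:
            if step not in self.done:
                return step
        return None

    def tell(self, peer: "SketchAgent", message: str) -> None:
        """Social ability: hand a message to another agent."""
        peer.inbox.append(message)

    def run(self) -> List[str]:
        """Autonomy: work through the plan with no external driver."""
        while (step := self.next_step()) is not None:
            self.done.append(step)
        return self.done


worker = SketchAgent(plan=["fetch", "parse", "report"],
                     rules={"price_drop": lambda e: f"alert: {e['item']}"})
reviewer = SketchAgent(plan=[])

alert = worker.react({"type": "price_drop", "item": "hotel"})
ignored = worker.react({"type": "heartbeat"})  # no rule, so the agent stays silent
finished = worker.run()
worker.tell(reviewer, "report ready")
```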
From the application cases of vertical Agents, we can speculate that the future AI Agent is not a "smarter" person, but a "more controllable" structure. It must possess the following capabilities:
- Structured Input/Output: Define task boundaries well, control Agent capabilities.
- Structured Behavioral Paths: Embed into workflows, providing stable process feedback.
- Structured Feedback Data: Continuously iterate training, walking out a closed-loop path.
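A small sketch shows how structured input/output and structured feedback fit together. Everything in it is an assumption made for illustration: the ticket-triage task, the `ALLOWED_LABELS` boundary, and the keyword rule that stands in for a model call.

```python
import json
from dataclasses import asdict, dataclass
from typing import List

ALLOWED_LABELS = {"refund", "shipping", "other"}  # hypothetical task boundary


@dataclass
class TicketInput:
    ticket_id: str
    text: str


@dataclass
class TicketOutput:
    ticket_id: str
    label: str

    def validate(self) -> None:
        # Structured output: anything outside the boundary is rejected, not shown.
        if self.label not in ALLOWED_LABELS:
            raise ValueError(f"label outside task boundary: {self.label}")


@dataclass
class Feedback:
    ticket_id: str
    predicted: str
    corrected: str


def classify(ticket: TicketInput) -> TicketOutput:
    """Stand-in for a model call; a keyword rule keeps the sketch runnable."""
    label = "refund" if "refund" in ticket.text.lower() else "other"
    out = TicketOutput(ticket.ticket_id, label)
    out.validate()
    return out


def record_feedback(log: List[str], out: TicketOutput, corrected: str) -> None:
    """Append one JSON line per human correction; this log is the data asset."""
    entry = Feedback(out.ticket_id, out.label, corrected)
    log.append(json.dumps(asdict(entry), sort_keys=True))


feedback_log: List[str] = []
result = classify(TicketInput("T-1", "Where is my package?"))
record_feedback(feedback_log, result, corrected="shipping")
```

The correction log is the part a competitor cannot copy: it only exists because the Agent sits inside a real workflow where someone fixes its mistakes.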
Past internet interaction methods will be broken. The logic of organizing information around an individual or organization as a pivot, or information distribution centered on fixed processes or algorithms, will transform into a completely decentralized mode led by AI.
The second half of Agent startups is a competition of "structural capabilities." It's not about who understands large models deeper, but about who can use structural tasks to open up feedback closed loops, compressing "model capabilities" into a system that is implementable, controllable, and capable of growth.
Agent startups must build private data moats. Only by integrating deeply into the business and accumulating unique data context and behavioral feedback can they train Agents that others cannot imitate. This data cannot be obtained by scraping; it is an organizational asset accumulated through long-term embedding in business processes, earned in exchange for repeated trust, efficiency, and feedback.
The final result of this training is like the difference between a chef's and a firefighter's reaction to seeing fire: the chef will flip the wok, while the firefighter will pull the fire hose.
If general Agent companies hope that through model generalization capabilities, they will one day become a lever to move social production, it's most likely like fishing for the moon in the water. Even AI models themselves cannot exhaustively list all roles in the world; what qualification do Agents built on top of models have to accomplish this grand ambition?
Creating a true Agent is inevitably a long and thorny road. But as the success of Hippocratic AI reveals, by focusing efforts from a vertical niche, redefining the vitality of "small but beautiful" in the cracks of giants' ecosystems, there is a chance to give birth to AI Agents that truly solve problems.