
A friend wanted a meeting audio transcription tool. She had been using Feishu, but after the subscription fee increased, she found it a bit hard to justify and asked me if there were any good open-source alternatives.
I looked at the current open-source ASR models and found that the latest FunASR already handles meeting scenarios quite well: speaker diarization, far-field recognition, and recognition in noisy environments are no longer obstacles. So I quickly put together a working version for her.
However, what really caught my attention while building it wasn't the functionality itself. Precisely because the "implementation" was so straightforward, I couldn't help but wonder: if this application were stripped of its reliance on ASR, what would remain?
We all know that a product's viability rests on a strong value connection between it and its users.
What I built for my friend is an application resting entirely on model capabilities. If the model is good enough, the application works. My work was mostly translating her needs into model calls and packaging the results behind an interface. That process has some value, but it is thin, and the relationship between the application and its user is very fragile.
Take this speech-to-text tool as an example. Converting speech to text is useful, but only as a function. A valuable meeting should follow these principles:
Meetings should have discussions, discussions should lead to decisions, decisions should lead to actions, actions should yield results, and results should be useful.
Speech-to-text is merely a means. The value of a meeting lies in the chain from discussion to outcome, not in a neatly organized transcript. So a speech-to-text tool can hardly be called a product at all; it is more of a shell application.
Similarly, many so-called AI products on the market are little more than model shells. Chatbots are shells for LLMs, voice assistants are shells for speech models, and STT tools are shells for ASR models. They invest in interaction and UI but never define the product's value.
It seems we easily mistake "capability usability" for "product viability."
In the era of traditional software, value was distributed throughout development and maintenance. In the AI era, however, production has become exceptionally cheap. We have to face the reality that value is migrating and rediscover where products actually connect with users. If we keep blindly reproducing old tools or churning out shells, manufacturing offers a clear lesson: when production capacity is in surplus and nobody has a real barrier to entry, profits quickly shrink to zero.
Painkillers work only while there is pain, and they never cure the underlying problem. As an employee, my friend is obliged to produce meeting minutes after every meeting. A speech-to-text tool makes that task more efficient for her, but the business value a meeting is supposed to carry isn't reflected anywhere in the application.
For her and her company, pointless and chaotic meetings haven't become any rarer, and meeting minutes that count as "read" the moment they are received keep piling up. Even the best AI tool can't fix the reality of bullshit jobs, and a messy management system stays a mess.
Therefore, I believe a proper AI product should aim to even out differences in users' capabilities. In the past, we built all kinds of workflow-management and office software to improve record-keeping and information sharing, but how well it worked depended heavily on employees' professionalism, management skills, and ability to execute. In other words, for most companies and employees lacking those capabilities, office software merely replaced handwriting with typing, nothing more.
But AI has the potential to solve this problem. It can build high-value business rules into the product itself. Work that once required people to stitch different tools together can now be linked up by AI on its own. Employees with limited capabilities can rely on the product's built-in value system to make large efficiency gains, while top performers can free themselves from tedious administrative tasks and focus on higher-value goals.
The true power of AI applications lies in their ability to sharply reduce friction between product and user, absorbing the heavy cost of user education into automated logic and giving everyone a solid baseline of capability. This is the value AI products should carry.
Back to the ASR application I put together for my friend: simply converting speech to text won't bring her any closer to her work goals. The Feishu subscription fee may hurt, but a free open-source solution won't save her any mental energy either.
A truly viable AI product should embody the principle that "meetings should have discussions, and results should be useful." Only when high-value business rules are built into the product's core can we avoid becoming mere conduits for model vendors.
What we should deliver to users is a coherent set of capabilities that run on their own, not one scattered efficiency tool after another. Only then can we truly find our value connection with users in the AI era and redefine where our own value lies.