To build an "omnipotent and ubiquitous" AI, why does Baidu start with the "operating system"?

Super productivity that is both smart and capable.

Author: Cool Geek

Large models can summarize what has happened in China over the past five thousand years, yet they cannot tell you what time it is now; they can explain what quantum mechanics is, but they struggle to create a professional-level PPT with rich graphics and text.

Why do large models seem omnipotent but always fall short in practical use?

The reason is simple: being smart and knowledgeable does not equate to being capable of getting work done.

Being smart requires a large model to train on vast amounts of knowledge until it develops a sophisticated brain that answers questions well.

To be both smart and capable, however, that brain must be paired with agile limbs to achieve "deep thinking + deep delivery."

How to move large models from merely thinking well to being both "smart and capable" has therefore become the decisive factor in whether this wave of large models turns out to be a passing moment or a historic shift.

Baidu has provided a prototype.

On April 25, at the Create 2025 Baidu AI Developer Conference, Baidu founder Robin Li introduced Cangzhou OS, billed as the world's first operating system for the content domain and jointly launched by Baidu Wenku and Baidu Wangpan.

By fully integrating the underlying technologies, capabilities, and data that Baidu Wenku and Wangpan have accumulated, Cangzhou OS lets those capabilities flow like water across different scenarios, delivering low-barrier, high-quality, end-to-end results in the most suitable form and through the most convenient interface.

With Cangzhou OS, Baidu Wenku and Wangpan aim for true one-stop, end-to-end delivery anytime, anywhere, and on any device, making AI "omnipotent and ubiquitous."

01

Cangzhou OS: Leading AI to an Operating System-Level Evolution

There is a consensus in the tech industry that any technology must travel the long Gartner hype cycle on its way from the lab into ordinary households.

In the first phase of this curve, growth depends mainly on enthusiastic market expectations driven by technical progress. When the technology's practical results fall short, growth quickly turns into decline. Only when the conditions for deployment gradually mature and take the form of nearly zero-barrier, omnipotent, and ubiquitous infrastructure does the second phase, an ecosystem explosion, begin.

In the software industry, one hallmark of this second phase is typically the emergence of a mature operating system, such as Windows for the computer industry and iOS for the mobile phone industry.

So how do we define a mature operating system? About 15 years ago, there was a debate in the global tech industry: why are Apple's phones and other smartphones a fundamentally different species from earlier feature phones, when both can offer touch screens and large displays, make calls, take photos, play music, and send texts?

One core reason is that iOS inherited kernel-level stability and multitasking from Mac OS X and turned them into an open ecosystem, letting developers freely build on Apple's underlying capabilities to create their own innovative applications. This transformed the mobile phone from the domain of a few giants such as Motorola and Nokia into a vast industry with endless possibilities and ecosystem-wide participation, opening the door to more than a decade of mobile internet development.

Technology keeps advancing, but business stories tend to repeat in similar rhythms. The underlying logic validated by mobile operating systems still applies to building an OS for the era of large models.

In summary, there are three key elements: complete underlying capabilities, flexible central scheduling, and a thriving application service ecosystem. These map directly onto the three-layer architecture of Cangzhou OS: foundational infrastructure, central system, and application services. The only difference is that the bridge between applications and the central and foundational layers has shifted from traditional APIs to the more standardized, lower-barrier MCP (Model Context Protocol).

Within the foundational infrastructure, the MCP Server part has a core component called Chatfile plus. Its main job is to decompose and analyze content of different modalities, forms, and formats at the element level through a knowledge framework, together with a set of tool-framework components for multimodal understanding, multimodal retrieval, and file transcoding and parsing.

At the same time, Baidu Wenku and Wangpan have built three major repositories: a public knowledge base, a private knowledge base, and a memory base. The public knowledge base is the public knowledge data Baidu Wenku has accumulated over the years; the private knowledge base is the data in Wangpan that users have authorized for use; and the memory base holds the commands, usage habits, and history that users generate in Wenku or Wangpan.

This data comes in different modalities, forms, and formats. The public knowledge base provides general knowledge, while the private knowledge base and memory base store personalized user data.

In the knowledge framework, Cangzhou OS vectorizes and labels the multimodal content in the "three major repositories," transforming unstructured data such as images, text, video, audio, and documents into multidimensional vector data that computers can process, represented as a set of tokens.
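To make "vectorize and label" concrete, here is a minimal Python sketch. It is not Baidu's implementation: the `Item` fields, the function names, and the word-count embedding are all assumptions made for illustration, and a production system would use learned multimodal embeddings rather than word counts.

```python
import math
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional


def tokenize(text: str) -> List[str]:
    """Split text into lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


@dataclass
class Item:
    text: str        # text extracted from a document, caption, or transcript
    modality: str    # label, e.g. "document", "image", "video" (hypothetical)
    repository: str  # label: "public", "private", or "memory"
    vector: List[float] = field(default_factory=list)


def build_vocab(items: List[Item]) -> Dict[str, int]:
    """Assign one vector dimension to every distinct word in the corpus."""
    vocab: Dict[str, int] = {}
    for item in items:
        for word in tokenize(item.text):
            vocab.setdefault(word, len(vocab))
    return vocab


def embed(text: str, vocab: Dict[str, int]) -> List[float]:
    """Toy embedding: word counts over the vocabulary, L2-normalized."""
    vec = [0.0] * len(vocab)
    for word in tokenize(text):
        if word in vocab:
            vec[vocab[word]] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def index(items: List[Item]) -> Dict[str, int]:
    """Vectorize every item in place and return the vocabulary used."""
    vocab = build_vocab(items)
    for item in items:
        item.vector = embed(item.text, vocab)
    return vocab


def search(query: str, items: List[Item], vocab: Dict[str, int],
           repository: Optional[str] = None) -> List[Item]:
    """Rank items by cosine similarity, optionally filtered by repository label."""
    q = embed(query, vocab)
    pool = [i for i in items if repository is None or i.repository == repository]
    return sorted(pool, key=lambda i: sum(a * b for a, b in zip(q, i.vector)),
                  reverse=True)
```

The labels matter as much as the vectors: filtering on `repository` is what would let a query draw on a user's private data only when that is appropriate, while the vectors make content from different sources comparable.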

On the central system side, Baidu Wenku and Wangpan have independently developed "three major tools," which include an integrated editor (for editing documents, PPTs, etc.), a reader (for reading documents and PPTs), and a player (for audio and video playback).

In addition, Cangzhou OS uses a "scheduling hub" that combines interactive components, intent models, and transmission infrastructure with user memory and profile data, so that models can understand user intent and agents can be allocated and scheduled efficiently.
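The scheduling idea can be sketched in a few lines of Python. Everything here is hypothetical: the agent names, the keyword table standing in for an intent model, and the `memory` dictionary standing in for the memory base. It only illustrates the flow of "recognize intent, then dispatch to agents with user context."

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Request:
    text: str
    memory: Dict[str, str] = field(default_factory=dict)  # stand-in for user memory/profile


# Hypothetical agents; each receives the request plus the user's remembered preferences.
def ppt_agent(req: Request) -> str:
    return f"[ppt] style={req.memory.get('style', 'default')}: {req.text}"


def poster_agent(req: Request) -> str:
    return f"[poster] style={req.memory.get('style', 'default')}: {req.text}"


def notes_agent(req: Request) -> str:
    return f"[notes] {req.text}"


AGENTS: Dict[str, Callable[[Request], str]] = {
    "ppt": ppt_agent,
    "poster": poster_agent,
    "notes": notes_agent,
}

# A keyword table stands in for the intent model described in the article.
INTENT_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "ppt": ("plan", "slides", "ppt"),
    "poster": ("invitation", "poster"),
    "notes": ("notes", "review"),
}


def detect_intents(text: str) -> List[str]:
    """Return every intent whose keywords appear in the request text."""
    lowered = text.lower()
    return [intent for intent, words in INTENT_KEYWORDS.items()
            if any(word in lowered for word in words)]


def schedule(req: Request) -> List[str]:
    """Dispatch the request to one agent per detected intent."""
    return [AGENTS[intent](req) for intent in detect_intents(req.text)]
```

A single sentence can trigger several agents at once, which is the behavior the wedding example later in the article depends on.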

At the top level is a series of AI agents. Cangzhou OS integrates hundreds of Wenku and Wangpan AI agents, including PPT creation, AI storybooks, AI mind maps, AI posters, AI notes, AI scanning, and AI listening notes. They cover modalities such as images, text, video, and audio, and address scenarios across learning, work, and entertainment. The agents draw on the editing, modification, and fine-tuning capabilities of the integrated editor to improve the quality of retrieval and content generation, so that results better match each user's actual, personalized task.

02

On Cangzhou OS, Creating More "Smart and Capable" Agents

At the top-level application services layer, Baidu Wenku and Baidu Wangpan have launched hundreds of AI agents that have been validated by hundreds of millions of users, and have also integrated a large number of third-party professional agents to expand the application ecosystem.

As a "one-stop AI content acquisition and creation platform," Baidu Wenku has over 40 million paid users, with AI monthly active users reaching 97 million. Baidu Wangpan has also upgraded to a "one-stop content service platform," serving over 1 billion users, with a total storage space exceeding 100 billion GB and AI monthly active users exceeding 80 million. Baidu Wenku and Baidu Wangpan have become true "super productivity" tools in the era of large models.

At the conference, Baidu Wenku and Wangpan also showcased new capabilities developed based on "Cangzhou OS": "GenFlow Super Partner" and "AI Notes."

GenFlow Super Partner is a multi-agent collaboration capability launched in the Baidu Wenku app. Supported by Cangzhou OS, it can generate content as parallel tasks and complete a variety of deliverables by drawing on comprehensive, professional online information as well as the user's own habits and preferences.

For example, suppose a user wants to plan a wedding but provides only a simple input: "I want to hold an outdoor wedding in Hainan during the May Day holiday, help me create a plan and invitation."

The request looks simple enough to be handled by filling in an old template. To satisfy the user, however, the system has to understand the user's aesthetic preferences, budget expectations, and preferred ceremony flow, and it has to know the weather, crowd levels, and venue distribution in Hainan during the May Day holiday. It then has to combine the visuals and this knowledge with PPT tools to generate a complete plan and, finally, create a complete wedding invitation poster based on the plan and the user's aesthetic preferences.

Doing all this means orchestrating the user's chat history, browsing history, intent recognition, comprehensive online search, and PPT tools: analyzing the user's intent, understanding the user's preferences, and freely combining tools, so that the user ultimately receives a specific, complete plan covering the schedule, dates, venue, budget, theme, execution details, style, and staffing.

At the same time, the planning document and the poster need to complement each other, which requires all information to stay consistent and both outputs to be produced in parallel on the same operating system.
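One way to picture "consistent and in parallel" is to gather a single shared context first and then let both generators read from it concurrently. The Python sketch below uses `asyncio` for this; the function names and the hard-coded preference and venue values are placeholders invented for illustration, not real search results or Baidu APIs.

```python
import asyncio
from typing import Dict


async def user_preferences() -> Dict[str, str]:
    # Placeholder for a lookup in the user's memory base.
    await asyncio.sleep(0)
    return {"theme": "seaside", "budget": "mid-range"}


async def online_facts() -> Dict[str, str]:
    # Placeholder for an online search (weather, crowds, venues).
    await asyncio.sleep(0)
    return {"venue": "beach lawn", "date": "May 2"}


async def gather_context(request: str) -> Dict[str, str]:
    """Collect preferences and facts concurrently into one shared context."""
    prefs, facts = await asyncio.gather(user_preferences(), online_facts())
    return {"request": request, **prefs, **facts}


async def make_plan(ctx: Dict[str, str]) -> str:
    return f"Plan: {ctx['theme']} wedding at {ctx['venue']} on {ctx['date']} ({ctx['budget']} budget)"


async def make_poster(ctx: Dict[str, str]) -> str:
    return f"Invitation: {ctx['theme']} wedding, {ctx['venue']}, {ctx['date']}"


async def deliver_async(request: str) -> Dict[str, str]:
    ctx = await gather_context(request)
    # Both deliverables are generated in parallel from the same context,
    # so venue, date, and theme cannot drift apart between them.
    plan, poster = await asyncio.gather(make_plan(ctx), make_poster(ctx))
    return {"plan": plan, "poster": poster}


def deliver(request: str) -> Dict[str, str]:
    """Synchronous entry point for callers that are not already async."""
    return asyncio.run(deliver_async(request))
```

Consistency comes from the single context object rather than from any coordination between the two generators, which keeps the parallel step simple.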

Of course, AI cannot generate results that satisfy everyone in one go, which means that both the wedding plan and the poster need to have editable capabilities, supported by the integrated editor capabilities of "Cangzhou OS."

From deep thinking to deep delivery, GenFlow Super Partner is arguably almost the only true "multi-agent collaboration" product on the market. It addresses the problems common to such products: high cost, long generation time, low efficiency, unstable delivery, and the inability to refine results over multiple rounds of dialogue. It is also embedded directly in mature products and combined with user-authorized private data, giving AI a real opportunity to become "omnipotent and ubiquitous."

Baidu Wangpan's AI Notes is a powerful tool for countless office workers and those preparing for exams.

AI Notes is the industry's first multimodal AI note-taking tool, capable of embedding various study videos and note pages stored in Baidu Wangpan into the same interface, achieving smooth interactivity, with video content and notes being strongly correlated. From watching videos to generating AI notes, summarizing AI mind maps, and finally generating AI questions to test learning outcomes, it covers the entire learning cycle for users.

For example, the difficulty of the English paper in the postgraduate entrance exam has recently become a hot topic. Suppose a user wants to focus on reviewing for that paper. AI Notes first searches the relevant materials stored in the user's Wangpan and also queries publicly available materials to compile the exam points. It does not stop there: it validates the generated exam points against past exam questions, and only validated points are used to generate mind maps and predicted questions, helping users learn faster.
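The validation step can be illustrated with a small Python filter. This is only a sketch under simple assumptions: exam points and past questions are plain strings, and a point counts as "validated" if it appears in at least `min_hits` past questions. The function name and threshold are hypothetical, and a real system would need semantic matching rather than substring checks.

```python
from typing import Dict, List


def validate_points(candidate_points: List[str],
                    past_questions: List[str],
                    min_hits: int = 1) -> Dict[str, int]:
    """Keep only exam points that occur in at least `min_hits` past questions.

    Returns a mapping from each validated point to its hit count.
    """
    validated: Dict[str, int] = {}
    for point in candidate_points:
        hits = sum(point.lower() in question.lower() for question in past_questions)
        if hits >= min_hits:
            validated[point] = hits
    return validated


def build_outline(validated: Dict[str, int]) -> List[str]:
    """Order validated points by how often they appeared, most frequent first."""
    return sorted(validated, key=lambda point: (-validated[point], point))
```

Only the output of `validate_points` would feed downstream steps such as mind maps or predicted questions, so an unverified point never reaches the user.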

Throughout this process, the tools involved are no fewer than those needed for wedding planning. Finding exam points and past questions requires comprehensive online search; past questions often come as PDFs or images, and expert analyses as videos, so the system must be able to parse multimodal content. Generating the final mind maps and predicted questions requires the reasoning capabilities of large models, multimodal content generation, and the ability to map and relate different pieces of content, all while ensuring the accuracy of what is generated.

Behind this is the empowerment of "Cangzhou OS."

Baidu is also helping developers fully embrace MCP, so Cangzhou OS does not serve Baidu's internal ecosystem alone. What matters most for an operating system's growth is openness to the outside world, which unleashes the creativity of a broad community of developers.

Therefore, to maximize the value of the ecosystem and its applications, Baidu Wenku and Baidu Wangpan have taken the lead in using MCP to link products and ecosystem on top of Cangzhou OS, building a three-layer MCP Server-Client-Host system. The capabilities of Wenku and Wangpan are opened up as MCP Servers, and the MCP Client SDK makes it easy for more enterprise users, developers, and agent applications to connect as MCP Hosts.
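For readers unfamiliar with MCP: its messages are JSON-RPC 2.0, and a client invokes a server-side tool with the `tools/call` method. The Python sketch below shows that exchange in-process, with no network transport. The tool names `upload_file` and `share_file` are hypothetical and are not the actual tools exposed by Baidu Wenku or Wangpan.

```python
import json
from typing import Any, Callable, Dict


# --- Server side: a hypothetical tool registry -------------------------------
def upload_file(path: str) -> str:
    return f"uploaded {path}"


def share_file(path: str) -> str:
    return f"share link created for {path}"


TOOLS: Dict[str, Callable[..., str]] = {
    "upload_file": upload_file,
    "share_file": share_file,
}


def handle(raw_request: str) -> str:
    """Handle one JSON-RPC 2.0 request and return the JSON response."""
    req = json.loads(raw_request)
    base = {"jsonrpc": "2.0", "id": req.get("id")}
    if req.get("method") != "tools/call":
        return json.dumps({**base, "error": {"code": -32601, "message": "Method not found"}})
    params = req.get("params", {})
    tool = TOOLS.get(params.get("name"))
    if tool is None:
        return json.dumps({**base, "error": {"code": -32602, "message": "Unknown tool"}})
    text = tool(**params.get("arguments", {}))
    result = {"content": [{"type": "text", "text": text}], "isError": False}
    return json.dumps({**base, "result": result})


# --- Client side: build a tools/call request ----------------------------------
def call_tool(request_id: int, name: str, arguments: Dict[str, Any]) -> str:
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })
```

In the three-layer picture, `handle` plays the MCP Server, `call_tool` the MCP Client, and whatever application decides when to call it (a phone assistant, for instance) the MCP Host.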

Among them, the most representative case is Samsung phones. Samsung phones are integrating multiple MCP servers for file uploading, downloading, retrieval, sharing, and content understanding from Baidu Wenku and Wangpan.

On one hand, users can directly achieve functions such as file upload backup to the cloud, cloud sharing, document summarization, and content Q&A through voice commands in the phone's voice assistant interface.

On the other hand, these servers can enrich the cloud storage capabilities of Samsung's phone system, solving the problem of the phone's difficulty in batch backing up and sharing large files or multiple files.

For example, a user can invoke the voice assistant in the phone's photo gallery and say, "Back up the photos taken yesterday at Aosen to Baidu Wangpan, and send Xiao Ming's photo to him." The relevant photos are uploaded to the user's authorized Wangpan account and a sharing link is generated. The phone assistant then looks up Xiao Ming in the contact list and sends him the link by SMS. By tapping the link, the recipient can open Baidu Wangpan directly to view or transfer the files.

The reliability of an OS's underlying capabilities is not determined by how many tools it piles up or how many high-tech features it lists. The usability, maturity, and richness of the top-level application service ecosystem are the best test of an OS's capabilities.

03

The story of the OS has no endpoint

In the capital market, the kind of company investors value most is called a "friend of time."

A "friend of time" is a company that, having got one thing right, only needs to keep doing it for its performance to keep growing, so that developers in its ecosystem keep benefiting as well.

An operating system is a typical example of a perpetual market. As long as the markets for computers and phones still exist, the story of operating systems belonging to Microsoft, Apple, and Google will have no endpoint.

The same applies to large models. When "deep thinking + deep delivery + public and private data + MCP ecosystem" come together and eventually become the omnipotent, ubiquitous AI of a new era, new species will keep emerging in an explosion comparable to the Cambrian.

In this process, looking down, we see Baidu Wenku, Baidu Wangpan, and others opening up their own capabilities. By actively embracing the ecosystem, they become creators of new large-model species and makers of new rules.

Looking up, countless new agents are created and recognized based on "Cangzhou OS," forming a magnificent and surging new application service ecosystem.

And at this moment, the story has only just begun.

