MCP Creators Talk About the Origins, Architectural Advantages, and Future of MCP


An article worth reading about MCP.

Author: FounderPark

The MCP protocol, released by Anthropic late last year, has suddenly become the hottest protocol in the AI field this year, driven by the surge of Manus and AI agents. Major companies like OpenAI, Microsoft, and Google have all adopted the protocol, and Chinese companies like Alibaba Cloud and Tencent Cloud have quickly followed suit by launching platforms for rapid deployment.

However, there is also considerable controversy: many question how MCP actually differs from an API, whether Anthropic's engineers have real expertise in internet protocols, and whether the protocol's simplicity creates security issues, among other concerns.

Who better to address these questions than the inventors of the MCP protocol themselves?

In a recent episode of the Latent Space podcast, the hosts invited the creators of the MCP protocol from the Anthropic team—Justin Spahr-Summers and David Soria Parra—to discuss the origins of MCP and their thinking behind it: why MCP was launched, how it differs from existing APIs, and how to make better use of tools with MCP, among other topics. The conversation is information-dense and well worth saving and reading.

Guest Introductions:

  • Alessio Fanelli (Host): Partner and CTO of Decibel

  • swyx (Host): Founder of Smol AI

  • David Soria Parra: Engineer at Anthropic

  • Justin Spahr-Summers: Engineer at Anthropic

TLDR:

  • The "flash of inspiration" for MCP came from the Language Server Protocol (LSP). While working on an internal LSP-related project at Anthropic, the two engineers were inspired to consider whether they could create something similar to standardize communication between AI applications and their extensions.

  • A core design principle of MCP is that tools are not just about the tools themselves: they are closely tied to client applications and, through them, to users. Users should retain complete control over MCP operations. Tools are model-controlled, meaning they are called by the model rather than explicitly invoked by the user (except for prompting purposes).

  • OpenAPI and MCP are not mutually exclusive but highly complementary. The key is to choose the tool that best fits the specific task: if the goal is rich interactions within AI applications, MCP is more suitable; if the aim is for models to easily read and interpret API specifications, OpenAPI is the better choice.

  • AI-assisted coding is a very good way to build MCP servers quickly. In the early stages of development, pasting snippets of MCP SDK code into the LLM's context window and letting the LLM help build the server often yields good results, with details optimized later. This is a good way to quickly implement basic functionality and iterate. Anthropic's MCP team also places great emphasis on keeping server construction simple enough for LLMs to participate in it.

  • AI applications, ecosystems, and agents will trend toward statefulness, which was also one of the most contested topics within Anthropic's core MCP team. After many discussions and iterations, the conclusion was that while the team is optimistic about a stateful future, it is essential not to stray from existing paradigms and to balance the ideal of statefulness against the complexity of operating stateful systems in practice.


01 How was MCP born?

swyx (Host): First, what is MCP?

Justin: The Model Context Protocol, abbreviated as MCP, is essentially a design we created to help AI applications expand themselves or integrate plugin ecosystems. Specifically, MCP provides a set of communication protocols that allow AI applications (which we call "clients") and various external extensions (which we call "MCP servers") to collaborate with each other. Here, "extensions" can be plugins, tools, or other resources.

The purpose of MCP is to enable everyone to easily incorporate external services, functionalities, or retrieve more data when building AI applications, allowing the applications to have richer capabilities. The naming includes the concept of "client-server" primarily to emphasize the interaction model, but essentially it is a universal interface for "making AI applications more extensible."

However, it is important to emphasize that MCP focuses on AI applications rather than the models themselves, which is a common misunderstanding. Additionally, we liken MCP to the USB-C interface for AI applications, as it serves as a universal interface connecting the entire ecosystem.

swyx (Host): The characteristics of clients and servers imply that it is bidirectional, much like a USB-C interface, which is interesting. Many people are trying to conduct related research and build open-source projects. I feel that Anthropic is more proactive than other labs in attracting developers. I'm curious if this is influenced by external factors or if you two came up with it in a room together?

David: In fact, most of it was just the two of us having a lightbulb moment in a room. This wasn't part of a grand strategy. In July 2024, shortly after I joined Anthropic, I was mainly responsible for internal developer tools. During that time, I was thinking about how to get more employees to deeply integrate existing models, as these models are excellent and have great potential, so naturally, I wanted everyone to use our models more.

In my work, given my background in development tools, I quickly became a bit frustrated: Claude Desktop had limited functionality and could not be extended, while the IDE lacked Claude Desktop's practical features, so I was constantly copying content back and forth between the two. Over time, I realized this was an MxN problem—many applications each needing to integrate with many external systems—exactly the kind of problem a protocol is suited to solve. At the time, I was also working on an internal project related to LSP (Language Server Protocol) that wasn't making much progress. After mulling these ideas over for a few weeks, I landed on the thought of building a protocol: could we create something similar to LSP to standardize communication between AI applications and their extensions?

So, I approached Justin, shared this idea, and fortunately, he was very interested, and we began working on it together.

From the moment the idea was conceived, it took about a month and a half to build the protocol and complete the first integration. Justin did a lot of the work in the initial integration with Claude Desktop, while I did many proofs of concept in the IDE to demonstrate the application of the protocol within the IDE. Before the official release, one could find many details by looking at the relevant code repositories; this is roughly the origin story of MCP.

Alessio (Host): What was the timeline like? I know November 25 was the official release date. When did you start working on this project?

Justin: Around July, shortly after David proposed the idea, I was excited to start building MCP with him. The first few months were slow because there was a lot of foundational work to establish the communication protocol that included clients, servers, and SDKs. But once things could communicate through the protocol, it became exciting, as we could build all sorts of wonderful applications.

Later, we held an internal hackathon where some colleagues used MCP to code a server that could control a 3D printer, as well as extensions that implemented "memory functions." These prototypes were very well received, which made us believe that this idea had great potential.

swyx (Host): Back to building MCP, what we see is just the final result, which is clearly inspired by LSP, and you both acknowledge this. I want to ask how much work went into the construction? Was the process mainly about writing a lot of code, or was there a lot of design work? I feel like design work played a significant role, such as choosing JSON-RPC; how closely did you follow LSP? What parts were particularly challenging?

Justin: We drew a lot of inspiration from LSP. David has extensive experience with LSP in development tools, while I primarily worked on product or infrastructure, so LSP was new to me.

From a design principle perspective, LSP addressed the M x N Problem that David mentioned. Previously, different IDEs, editors, and programming languages operated independently; you couldn't use JetBrains' excellent Java support in Vim, nor could you use Vim's excellent C language support in JetBrains. LSP created a common language for all parties to "communicate," unifying the protocol so that "editor-language" pairs only needed to implement it once. Our goal is similar, but the scenario has shifted to the connection between "AI applications and extensions."

In terms of specific details, after adopting JSON-RPC and the concept of bidirectional communication, we went in a different direction. LSP focuses on how functionality is presented: it thinks through and provides distinct primitives rather than prescribing semantics, and we applied the same mindset to MCP. We then spent a lot of time thinking about each primitive in MCP and why they differ, which involved a significant amount of design work. Initially, we wanted to support three languages—TypeScript, Python, and Rust for the Zed integration—building SDKs that included clients and servers so we could create an internal experimental ecosystem and stabilize the local MCP concept (launching subprocesses, and so on).

We took various criticisms of LSP into account and tried to improve on them in MCP. For example, some of LSP's practices on top of JSON-RPC are too complex, so we implemented more straightforward approaches. When building MCP, we chose to innovate in specific areas while borrowing established patterns elsewhere: treating low-stakes choices like JSON-RPC as settled and concentrating our design effort on the primitives served us very well.
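As a rough sketch of what this framing looks like, here is a minimal JSON-RPC 2.0 request/response pair in the spirit of MCP's `tools/list` method (the payload fields are simplified for illustration, not the exact spec):

```python
import json

def make_request(req_id, method, params=None):
    """Build a JSON-RPC 2.0 request envelope, the framing shared by LSP and MCP."""
    msg = {"jsonrpc": "2.0", "id": req_id, "method": method}
    if params is not None:
        msg["params"] = params
    return msg

# A client asking an MCP server to enumerate its tools:
request = make_request(1, "tools/list")
wire = json.dumps(request)

# The server's reply reuses the same id, so responses can be matched
# to requests even over a bidirectional connection.
response = {"jsonrpc": "2.0", "id": 1,
            "result": {"tools": [{"name": "query_db",
                                  "description": "Run a read-only SQL query"}]}}
```

The `id` correlation is what lets both sides issue requests concurrently over one connection, which is exactly the bidirectionality discussed above.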

swyx (Host): I am interested in protocol design, and there is a lot to unpack here. You have already mentioned the M x N Problem, which anyone working on developer tools has encountered, also known as the "Universal Box" problem.

The fundamental issue and solution in infrastructure engineering is to connect M things to N different entities, creating a "Universal Box." Companies and projects like Uber, GraphQL, Temporal (where I worked), and React all face this kind of problem. I'm curious—did you solve this N-times-N problem while at Facebook?

David: To some extent, yes. This is a great example. I have dealt with many such issues in version control systems and similar areas. It involves consolidating problems into something that everyone can read and write, building a "Universal Box" to solve them. Such problems are ubiquitous in the developer tools field.

swyx (Host): Interestingly, those who build "Universal Boxes" face the same issues, such as composability, remote versus local problems, etc. The functional presentation issue mentioned by Justin involves similar underlying concepts that need to be clearly defined to present them differently.

02 The Core Concepts of MCP: Tools, Resources, and Prompts are Indispensable

swyx (Host): When I looked at the MCP documentation, I had this question: why should these two things be distinguished? Many people treat tool calls as a universal solution, but in reality, different types of tool calls have different meanings; sometimes they are resources, sometimes they perform operations, and sometimes they serve other purposes. I want to understand which concepts you categorize as similar and why you emphasize their importance.

Justin: We think about each primitive from the perspective of application developers. Whether you are developing an IDE, Claude Desktop, or an agent interface, thinking from the user's perspective makes it much clearer what functionality they want to obtain from an integration—and tool calls alone are not enough, since different kinds of functionality need to be differentiated.

Thus, the initial core basic concepts of MCP have since been expanded:

  • Tool: This is the core. It directly adds tools to the model, allowing the model to decide when to call them. For application developers, this is akin to a "function call," but initiated by the model.

  • Resource: This essentially refers to data or background information that can be added to the model context and controlled by the application. For example, the model might automatically search for and find relevant resources to incorporate into the context; or there might be a clear user interface function set in the application that allows users to make selections through dropdown menus or paperclip-style menus, making it part of the information sent to the LLM. These are all application scenarios for resources.

  • Prompt: Specifically designed to be initiated or replaced by the user as text or messages. For instance, in an editor environment, it is like a slash command or similar to an autocomplete feature, such as a macro that you want to insert directly.

Through MCP, we have our own insights into the different presentations of these contents, but ultimately it is up to application developers to decide. For application developers, having these concepts expressed in different ways is very useful, as it allows them to determine the appropriate experience and create differentiation. From the perspective of application developers, they do not want their applications to be uniform; when connecting to an open integration ecosystem, unique approaches are needed to create the best experience.
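The division of control described above can be sketched as a small registry. This is an illustrative toy, not the MCP SDK's actual API; the names and handlers are made up:

```python
# Hypothetical in-memory registry illustrating the three MCP primitives
# and, in the comments, who controls each one.
REGISTRY = {
    "tools": {},      # model-controlled: the model decides when to call
    "resources": {},  # application-controlled: the app attaches context
    "prompts": {},    # user-controlled: invoked explicitly, e.g. slash commands
}

def register(kind, name, handler, description=""):
    """Attach a handler under one of the three primitive kinds."""
    REGISTRY[kind][name] = {"handler": handler, "description": description}

register("tools", "run_query", lambda sql: f"rows for: {sql}",
         "Execute a read-only SQL query")
register("resources", "schema://main", lambda: "CREATE TABLE users (...)",
         "Database schema exposed as context")
register("prompts", "summarize", lambda: "Summarize the attached data briefly.",
         "User-invoked summarization macro")
```

The point of the sketch is that the same handler mechanism serves all three primitives; what differs is which party—model, application, or user—decides when each one fires.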

I think there are two aspects. The first is that tool calls currently account for over 95% of integrations, and I hope more clients will make use of resources and prompts. Prompts are worth implementing first: they are very practical, and a prompt-driven MCP server puts the user in charge of deciding when to import information, which is better than waiting for the model to do it. I also hope more MCP servers will use prompts to demonstrate tool usage.

On the other hand, the resource aspect also has great potential. Imagine an MCP server that publicly documents databases and other resources, with clients building a complete index around these. Because resource content is rich and not driven by the model, you may have far more resource content than what is actually available in the context window. I look forward to applications better utilizing these basic concepts in the coming months to create richer experiences.

Alessio (Host): With a hammer, people tend to treat everything as a nail, using tool calls to solve all problems. For example, many people use it for database queries instead of resource calls. I'm curious about the pros and cons of using tools and resources when there are API interfaces (like databases). When should tools be used for SQL queries? When should resources be used to process data?

Justin: The way we distinguish between tools and resources is that tools are called by the model, which determines the appropriate tool to apply. If you want the LLM to run SQL queries, it makes sense to set it as a tool.

Resource usage is more flexible, but currently many clients do not support it, which complicates matters. Ideally, content like database table schemas would be exposed as resources. Users can point the application at relevant information to start a conversation, or AI applications can search for resources automatically. Whenever there is a need to list entities and read them, modeling them as resources is reasonable. Resources are uniquely identified by URIs and can act as universal converters—for example, an MCP server can interpret a user-supplied URI. The Zed editor is a nice cross-application example: its prompt library interacts with an MCP server to fill in prompts, with both sides agreeing on the URI and data format.
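The URI-based resource idea can be sketched as follows. The `db://` scheme and the table contents are invented for illustration; a real server would enumerate live data:

```python
# Hypothetical resources, each identified by a URI. Because listing and
# reading are separate operations, a client can index all URIs up front
# and only read a resource's content when it is actually needed.
RESOURCES = {
    "db://tables/users": "id INTEGER, name TEXT, created_at TEXT",
    "db://tables/orders": "id INTEGER, user_id INTEGER, total REAL",
}

def list_resources():
    """Return every available resource URI, as a client index would see them."""
    return sorted(RESOURCES)

def read_resource(uri):
    """Fetch one resource's content on demand."""
    if uri not in RESOURCES:
        raise KeyError(f"unknown resource: {uri}")
    return RESOURCES[uri]
```

This separation is what lets the resource catalog be far larger than the context window: only the URIs the user or application actually selects are ever read in.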

Returning to the application developer's perspective: consider your needs and apply this thinking in practice. For instance, look at your existing application features and ask which of them could be split out and implemented by an MCP server. Essentially, any IDE feature with an attachment menu can naturally be modeled as a resource—the interaction pattern already exists.

swyx (Host): Yes, when I see the @ symbol in Claude Desktop, I immediately recognize how it works because it functions like Cursor's, and now other users can pick it up just as easily. That design goal is great: because the pattern already exists, people can easily understand and use it. I shared that chart earlier, and I think it is very helpful—it should be placed on the homepage of the documentation.

Justin: Would you be willing to submit a PR (Pull Request) for that? We really like this suggestion.

swyx (Host): Sure, I will submit it.

As a developer relations person, I have always been committed to providing clear guidance, such as first listing key points and then spending two hours explaining in detail. Therefore, using a chart to cover the core content is very helpful. I really appreciate your emphasis on prompts. In the early development of ChatGPT and Claude, many people tried to create prompt libraries and prompt manager libraries similar to those on GitHub, but none of them really became popular.

Indeed, more innovation is needed in this field. People expect prompts to be dynamic, and you have provided that possibility. I strongly agree with your mention of the multi-step prompt concept, which indicates that sometimes, to get the model to function properly, a multi-step prompting approach or breaking through some limitations is necessary. Prompts are not just single dialogue inputs; sometimes they are a series of dialogue processes.

swyx (Host): I think this is precisely where the concepts of resources and tools converge, because you now mention that sometimes a certain degree of user control or application control is needed, while at other times, you want the model to control. So, are we now just selecting a subset of tools?

David: Yes, I think that is a reasonable concern. Ultimately, this is a core design principle of MCP, which is that the concept of tools is not just about the tools themselves; it is closely related to client applications and, in turn, closely connected to users. Through MCP operations, users should have complete control. We say tools are controlled by the model, meaning they are called solely by the model and not actively specified by the user to use a particular tool (of course, this excludes cases for prompting purposes, but this should not be a regular user interface function).

However, I believe it is entirely reasonable for client applications or users to decide to filter and optimize the content provided by the MCP server. For example, client applications can obtain tool descriptions from the MCP server and optimize their display. Under the MCP paradigm, client applications should have complete control. Additionally, we have a preliminary idea: to add functionality in the protocol that allows server developers to logically group these basic elements of prompts, resources, and tools. These groups can be seen as different MCP servers, which users can then combine according to their needs.

03 MCP and OpenAPI: Competition or Complementarity?

swyx (Host): I want to discuss the comparison between MCP and OpenAPI, as this is clearly one of the issues that everyone is very concerned about.

Justin/David: Fundamentally, the OpenAPI specification is a very powerful tool, and I often use it when developing APIs and their clients. However, for the application scenarios of large language models (LLMs), the OpenAPI specification seems too detailed; it does not adequately reflect higher-level, AI-specific concepts, such as the basic concepts of MCP we just mentioned and the mindset of application developers. Compared to merely providing a REST API for the model to freely operate, the model can gain more benefits from tools, resources, prompts, and other basic concepts specifically designed for it.

On the other hand, when designing the MCP protocol, we intentionally made it stateful. This is because AI applications and interactions are inherently more inclined towards statefulness. While statelessness always has its place to some extent, statefulness is becoming increasingly popular with the rise of interaction modes (such as video, audio, etc.), making stateful protocols particularly useful.

In fact, OpenAPI and MCP are not mutually exclusive; each has its strengths, and they complement each other well. I think the key is to choose the tool that best fits the specific task. If the goal is to achieve rich interactions between AI applications, MCP is more suitable; if the aim is for the model to easily read and interpret API specifications, OpenAPI is the better choice. There have already been early efforts to bridge the two, with tools that convert OpenAPI specifications into MCP servers and vice versa, which is great.
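The bridging direction mentioned here—mechanically deriving an MCP-style tool description from an OpenAPI operation—can be sketched like this. The field names are simplified and the `getWeather` operation is made up; real converters handle far more of the spec:

```python
# Toy OpenAPI operation, as a parsed dict rather than a full YAML document.
openapi_op = {
    "operationId": "getWeather",
    "summary": "Get current weather for a city",
    "parameters": [
        {"name": "city", "schema": {"type": "string"}, "required": True},
    ],
}

def openapi_to_tool(op):
    """Map one OpenAPI operation onto an MCP-style tool descriptor:
    operationId -> tool name, summary -> description, parameters -> input schema."""
    props = {p["name"]: {"type": p["schema"]["type"]} for p in op["parameters"]}
    required = [p["name"] for p in op["parameters"] if p.get("required")]
    return {
        "name": op["operationId"],
        "description": op.get("summary", ""),
        "inputSchema": {"type": "object", "properties": props,
                        "required": required},
    }

tool = openapi_to_tool(openapi_op)
```

Because both formats describe typed, named operations, the mechanical translation is straightforward; what it cannot add is the higher-level curation (which operations to expose, how to describe them for a model) that a hand-built server provides.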

Alessio (Host): I co-hosted a hackathon at AGI Studio. As a personal agent developer, I saw someone build a personal agent that could generate an MCP server: just by inputting the URL of the API specification, it could generate the corresponding MCP server. What do you think of this phenomenon? Does it mean that most MCP servers are merely adding a layer on top of existing APIs without much unique design? Will this continue in the future, primarily relying on AI to interface with existing APIs, or will there be entirely new and unprecedented MCP experiences?

Justin/David: I think both scenarios will exist. On one hand, the need to "bring data into applications through connectors" is always valuable. Although currently, tool calls are more commonly used by default, in the future, other basic concepts may be more suitable for addressing such issues. Even if it remains a connector or adapter layer, adapting different concepts can still add value.

On the other hand, there are indeed opportunities to create interesting application scenarios that build MCP servers that do more than just act as adapters. For example, a memory MCP server could allow the LLM to remember information across different conversations; a sequential thinking MCP server could enhance the model's reasoning capabilities. These servers are not integrated with external systems but provide the model with entirely new ways of thinking.

Regardless, using AI to build servers is entirely feasible. Even if the functionalities to be implemented are not about adapting other APIs but are original, the model can usually find ways to achieve them. Indeed, many MCP servers will be API wrappers, which is both reasonable and effective, helping you make significant progress. However, we are still in the exploratory phase, continuously investigating the possibilities that can be realized.

As client support for these basic concepts continues to improve, rich experiences will emerge. For instance, an MCP server that can "summarize a subreddit's content" has not yet been built, but the protocol itself can fully support it. I believe that when people's needs shift from "I just want to connect what I care about to the LLM" to "I want a real workflow, a genuinely richer experience where the model interacts deeply," you will see these innovative applications emerge. However, there is indeed a chicken-and-egg problem between the capabilities clients support and the functionality server developers want to build.

04 How to Quickly Build an MCP Server: Programming with AI

Alessio (Host): I feel that there is another aspect of MCP that people discuss relatively less, which is server construction. What advice do you have for developers who want to start building MCP servers? As a server developer, how do you find the best balance between providing detailed descriptions (for the model to understand) and directly obtaining raw data (leaving it for the model to process automatically later)?

Justin/David: I have some suggestions. One advantage of MCP is that building simple functionality is very easy; a server can be set up in about half an hour. It may not be perfect, but it is enough to meet basic needs. The best way to get started is to choose your preferred programming language and use the corresponding SDK if one exists; pick a tool you want the model to interact with; set up the MCP server; add the tool to it; write a simple description of the tool; connect it to your favorite application over the standard input/output (stdio) transport; and then observe how the model uses it.
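A minimal sketch of the request-handling core such a server needs, assuming a hypothetical `add` tool and omitting the stdio transport loop and the real SDK (which you should use in practice):

```python
import json

# Hypothetical tool table: name -> description + Python handler.
TOOLS = {
    "add": {"description": "Add two integers",
            "handler": lambda a, b: a + b},
}

def handle(raw):
    """Dispatch one JSON-RPC message: list tools, or call one by name."""
    req = json.loads(raw)
    if req["method"] == "tools/list":
        result = {"tools": [{"name": n, "description": t["description"]}
                            for n, t in TOOLS.items()]}
    elif req["method"] == "tools/call":
        tool = TOOLS[req["params"]["name"]]
        value = tool["handler"](**req["params"]["arguments"])
        # Results go back as plain text content for the model to read.
        result = {"content": [{"type": "text", "text": str(value)}]}
    else:
        return json.dumps({"jsonrpc": "2.0", "id": req["id"],
                           "error": {"code": -32601, "message": "method not found"}})
    return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})
```

Wiring `handle` to a loop that reads lines from stdin and writes replies to stdout is all that remains for a local server, which is why a basic one stays so small.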

For developers, being able to quickly see the model acting on what they care about is very appealing, which can ignite their enthusiasm and prompt them to think deeply about what other tools, resources, and prompts are needed, as well as how to evaluate effectiveness and optimize prompts. This is a process that can be explored continuously, but starting with simple tasks and seeing how the model interacts with what you care about is itself a lot of fun. MCP adds an element of enjoyment to development, allowing the model to quickly take action.

I also tend to leverage AI to assist in coding. In the early stages of development, we found that we could place snippets of MCP SDK code into the LLM's context window and let the LLM help build the server, and the results were often quite good, with details optimized later. This is a good way to quickly implement basic functionality and iterate. From the beginning, we focused on simplifying the server construction process to make it easy for LLMs to participate. Today, a basic MCP server might only require 100 to 200 lines of code, which is indeed very simple. If there is no ready-made SDK, you can also give the model the relevant specification or another language's SDK and let it help you build the functionality. Making tool calls in your preferred language is usually also very straightforward.

Alessio (Host): I find that server builders largely determine the final format and content of the returned data. For example, in the case of tool calls like Google Maps, which attributes are returned is determined by the builder. If a certain attribute is missing, the user cannot override or modify it. This is similar to my dissatisfaction with some SDKs: when people build SDKs that wrap APIs, if they miss a newly added parameter in the API, I cannot use those new features. What do you think about this issue? How much intervention power should users have, or should it be entirely up to the server designer?

Justin/David: Regarding the Google Maps example, we may bear some responsibility since it is a reference server we published. Generally speaking, at least for now, we intentionally designed the results of tool calls not to necessarily be structured JSON data, nor do they need to match a specific pattern, but rather to be presented in the form of messages like text or images that can be directly input into the LLM. In other words, we tend to return a large amount of data and trust that the LLM can sift through it and extract the information it cares about. We have made significant efforts in this regard, aiming to allow the model to flexibly obtain the information it needs, as that is its strength. Our focus is on how to fully leverage the potential of the LLM rather than overly restricting or specifying it, thus avoiding becoming difficult to scale as the model improves. Therefore, in the example server, the ideal state is that all result types can be passed directly from the called API without alteration, with data automatically passed by the API.
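The "pass raw data through and let the model sift" approach described here might look like the following. The payload fields are invented for illustration; the point is that nothing is curated out:

```python
import json

def tool_result_from_api(payload):
    """Serialize an upstream API's raw payload verbatim as text content,
    trusting the model to extract whichever fields it needs."""
    return {"content": [{"type": "text",
                         "text": json.dumps(payload, indent=2)}]}

# Hypothetical geocoding response, passed through unmodified.
raw = {"formatted_address": "1600 Amphitheatre Pkwy",
       "geometry": {"lat": 37.422, "lng": -122.084},
       "place_id": "abc123"}
result = tool_result_from_api(raw)
```

The trade-off is token cost versus flexibility: the response is larger, but no server-side schema decision can silently hide a field the user turns out to need.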

Alessio (Host): It is indeed a difficult decision to draw that line.

David: Here, I may need to emphasize the role of AI a bit. Many example servers are written by Claude, which is not surprising. Currently, people often tend to use traditional software engineering methods to address problems, but in reality, we need to relearn how to build systems for LLMs and trust their capabilities. With significant advancements in LLMs every year, it is wise to delegate data processing tasks to models that excel in this area. This means we may need to let go of the traditional software engineering practices we have relied on for the past two or three decades, or even forty years.

From another perspective on MCP, the speed of AI development is astonishing, both exciting and somewhat concerning. For the next wave of capabilities in models, the biggest bottleneck may lie in their ability to interact with the external world, such as reading external data sources and taking stateful actions. While working at Anthropic, we placed great importance on safe interactions and implemented corresponding controls and calibration measures. As AI develops, people will expect models to possess these capabilities, and connecting models to the outside world is key to enhancing AI productivity. MCP is indeed a bet on our vision for future development and its importance.

Alessio (Host): That's right; I feel that any API attribute with the word "formatted" should be removed. We should obtain raw data from all interfaces. Why does it need to be pre-formatted? The model is certainly smart enough to format information like addresses on its own. So this part should be left to the end user to decide.

05 How to Enable MCP to Call More Tools Better?

swyx (Host): I also want to ask a question: how many related functionalities can an MCP implementation support? This involves the issue of breadth and depth, and is directly related to what we just discussed about MCP nesting.

When Claude launched its first million-token context example in April 2024, it stated that it could support 250 tools, but in many practical cases, the model cannot effectively use that many tools. In a sense, this is a breadth issue, as there is no situation where tools call other tools; there is only the model and a flat layer of tools, which can easily lead to tool confusion. When the functionalities of tools are similar, the model may call the wrong tool, resulting in unsatisfactory outcomes. What recommendations do you have for the maximum number of MCP servers that can be enabled at any given time?

Justin: To be honest, there is no absolute answer to this question. It partly depends on the model you are using and whether the naming and descriptions of the tools are clear enough for the model to understand accurately and avoid confusion. The ideal state is to provide all information to the LLM and let it handle everything, which is also the envisioned future blueprint of MCP. However, in practical applications, client applications (i.e., AI applications) may need to do some supplementary work, such as filtering the toolset or using a small and fast LLM to first filter out the most relevant tools before passing them to the larger model. Additionally, some MCP servers can be set up as proxies for other MCP servers to perform filtering.

At least for Claude, supporting hundreds of tools is relatively safe. However, the situation for other models is still unclear. Over time, things should improve, so we need to be cautious about imposing limits that could hinder this development. The number of tools that can be supported largely depends on how much the descriptions overlap. If the servers' functionalities are distinct and the tool names and descriptions are clear and unique, you can support more tools than in cases where servers with similar functionality coexist (for example, connecting both a GitLab and a GitHub server at once).

Additionally, this is also related to the types of AI applications. When building highly intelligent applications, you might reduce the number of questions posed to users and the configurability of the interface; however, when constructing programs like IDEs or chat applications, it is entirely reasonable to allow users to choose their desired set of functionalities at different times rather than having all features enabled at all times.

swyx (Host): Finally, let's focus on the Sequential Thinking MCP Server. It has branching capabilities and can provide "more writing space," which is very interesting. Additionally, Anthropic released a new engineering blog last week introducing their Thinking Tool, and there has been some confusion in the community about whether there is overlap between the Sequential Thinking Server and this Thinking Tool. In fact, this is just different teams doing similar things in different ways, as there are many ways to achieve the same goal.

Justin/David: As far as I know, the Sequential Thinking Server does not have a direct common origin with Anthropic's Thinking Tool. However, it does reflect a general phenomenon: there are many different strategies to enable LLMs to think more comprehensively, reduce hallucinations, or achieve other goals, allowing for a more comprehensive and reliable presentation of effects from multiple dimensions. This is precisely the strength of MCP—you can build different servers or set up different products or tools within the same server to achieve diverse functionalities, allowing the LLM to apply specific thinking patterns to obtain different results.

So, there is no ideal, prescribed way for LLMs to think.

swyx (Host): I believe different applications will have different uses, and MCP allows you to achieve this diversity, right?

Justin/David: Exactly. I think some of the approaches taken by MCP servers precisely fill the gaps in the model's capabilities at the time. Training, preparing, and researching models takes a long time to gradually enhance their capabilities. Take the Sequential Thinking Server, for example: it may look simple, but it is not trivial; even so, it can be built in just a few days, whereas implementing such complex thinking functions directly within the model is not something that can be accomplished in a few days.

For instance, if the model I am using is not very reliable, or if someone feels that the results generated by the current model are generally not reliable enough, I can envision building an MCP server that allows the model to attempt to generate three results for a query and then select the best one from those. With MCP, this recursive and composable LLM interaction can be achieved.
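The "generate three results and pick the best" idea described here is a classic best-of-n pattern. The sketch below is a hypothetical illustration: `generate` and `judge` are stand-ins for real model calls (in an MCP server they could both be sampling requests back to the client), so only the selection logic is real.

```python
import random

def generate(prompt):
    # Stand-in for a model call; in an MCP server this could be a
    # sampling request sent back to the client application.
    return f"candidate scored {random.random():.3f}"

def judge(prompt, candidate):
    # Stand-in scorer; in practice this could be another model call
    # that rates the candidate's quality for the given prompt.
    return float(candidate.rsplit(" ", 1)[-1])

def best_of_n(prompt, n=3):
    """Generate n candidate answers and return the highest-scoring one."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: judge(prompt, c))

print(best_of_n("Summarize this discussion thread", n=3))
```

Because both steps are just functions of the prompt, such a server composes naturally: the "judge" itself could be delegated through MCP to whatever model the user has selected.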

06 What is the Difference Between Complex MCP and Agent?

Alessio (Host): Next, I want to ask about the concept of composability. What do you think about the idea of introducing one MCP into another? Are there any related plans for this? For example, if I want to build an MCP for summarizing Reddit section content, this might require calling an MCP corresponding to the Reddit API and another MCP that provides summarization functionality. How should I construct such a "super MCP"?

Justin/David: This is a very interesting topic that can be viewed from two perspectives.

On one hand, consider building components like summarization functionality. While it may call the LLM, we hope it can remain agnostic to specific models. This involves the bidirectional communication capabilities of MCP. For example, Cursor manages the interaction loop with the LLM. Server developers can request the client (i.e., the application where the user is) to perform certain tasks through Cursor, such as having the client summarize using the model currently selected by the user and returning the results. In this way, the choice of summarization model depends on Cursor, and developers do not need to introduce additional SDKs or API keys on the server side, achieving a model-agnostic construction.
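The server-to-client sampling request described here has a concrete shape in the MCP specification: a `sampling/createMessage` JSON-RPC request. The sketch below builds such a message; the exact field set shown (the `messages` array and `maxTokens`) follows my reading of the spec and should be checked against the current revision.

```python
import json

def make_sampling_request(request_id, text, max_tokens=200):
    """Build a JSON-RPC request that an MCP server sends to its client,
    asking the user's currently selected model to produce a completion."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "sampling/createMessage",
        "params": {
            "messages": [
                {"role": "user", "content": {"type": "text", "text": text}}
            ],
            "maxTokens": max_tokens,
        },
    }

req = make_sampling_request(1, "Summarize the following document: ...")
print(json.dumps(req, indent=2))
```

The key point is the direction of the call: the server asks, the client chooses the model and pays for the tokens, so the server needs no SDK or API key of its own.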

On the other hand, it is entirely possible to build more complex systems using MCP. You can envision an MCP server that supports services like Cursor or Windsurf, while this server itself also acts as an MCP client, calling other MCP servers to create richer experiences. This reflects a recursive characteristic, which is also evident in aspects like authorization in specifications. You can link these applications that are both servers and clients together, and even use MCP servers to construct a Directed Acyclic Graph (DAG) to achieve complex interaction processes. Intelligent MCP servers can even leverage the capabilities of the entire MCP server ecosystem. Relevant experiments have already been conducted in this regard. If we also consider features like automatic selection and installation, there are many possibilities that can be realized.

Currently, our SDK still needs to add more details so that developers can more easily build applications that are both clients and recursive MCP servers, or more conveniently reuse the behaviors of multiple MCP servers. These are areas for future improvement, but they already showcase some applications that are feasible but have not yet been widely adopted.

swyx (Host): This sounds very exciting, and I believe many people will gain a lot of ideas and inspiration from it. So, can this MCP that is both a server and a client be considered a type of Agent? To some extent, an Agent is when you make a request, and it performs some underlying operations that you may not fully understand. There is an abstraction layer between you and the ultimate source of raw data. What unique insights do you have about Agents?

Justin/David: I believe it is indeed possible to build an Agent through MCP. Here, it is important to distinguish between an MCP server that merely chains a server and a client together and a true Agent. For example, within an MCP server, you can enrich the experience using a sampling loop provided by the client and allow the model to call tools, thus constructing a true Agent, which is a relatively straightforward approach.

In terms of the relationship between MCP and Agents, we have several different lines of thought:

First, MCP may be a good way to express the capabilities of an Agent, but perhaps it currently lacks some features or functionalities that could enhance user interaction experiences, which should be considered for inclusion in the MCP protocol.

Second, MCP can serve as a foundational communication layer for building Agents or allowing different Agents to combine with each other. Of course, there are other possibilities, such as believing that MCP should focus more on integration at the AI application level rather than overly concentrating on the concept of Agents themselves. This remains a topic under exploration, with each direction having its trade-offs. Returning to the earlier analogy of a "universal box," one point we need to be particularly careful about when designing protocols and managing ecosystems is to avoid overly complex functionalities that attempt to encompass everything, as this may lead to poor performance in various aspects. The key question is to what extent Agents can naturally integrate into existing models and paradigms, or to what extent they should exist as independent entities, which is still an unresolved issue.

swyx (Host): I believe that when achieving bidirectional communication, allowing the client and server to merge and delegate work to other MCP servers, it becomes more like an Agent. I appreciate that you always keep the importance of simplicity in mind and do not try to solve every problem.

07 Next Steps for MCP: How to Make the Protocol More Reliable?

swyx (Host): Recent updates regarding the transition from stateful servers to stateless servers have sparked interest. You chose Server-Sent Events (SSE) as the publishing protocol and transport method, and support for pluggable transport layers. What is the reasoning behind this? Was it influenced by Jared Palmer's tweet, or was it already in preparation?

Justin/David: No, we discussed the challenges related to statefulness and statelessness publicly on GitHub months ago and have been weighing our options. We believe that the future development direction of AI applications, ecosystems, and Agents leans towards statefulness. This is one of the most controversial topics within the core MCP team, having undergone multiple discussions and iterations. The final conclusion is that while we are optimistic about the future of statefulness, we cannot deviate from existing paradigms; we must find a balance between the concept of statefulness and the complexity of practical operations. Because if we require MCP servers to maintain long-term persistent connections, the difficulty of deployment and operation would be very high. The initial design of SSE transmission is based on the idea that once you deploy an MCP server, clients can connect and maintain an almost indefinite connection, which is a high demand for anyone needing to operate at scale; it is not an ideal deployment or operational model.

Therefore, we are considering how to balance the importance of statefulness with the ease of operational maintenance. The streaming HTTP transport method we introduced, including SSE, is designed to be incremental. The server can be a regular HTTP server that retrieves results via HTTP POST requests. Then, functionalities can be gradually enhanced, such as supporting streaming results and even allowing the server to actively request from the client. As long as both the server and client support Session Resumption, it is possible to achieve convenient scalability while accommodating stateful interactions and better handling unstable network conditions.
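Under this incremental design, a plain HTTP server first and streaming later, a minimal client interaction is just a JSON-RPC body in an HTTP POST. The sketch below builds such a request; the `Mcp-Session-Id` header name follows my reading of the streamable HTTP transport draft and should be verified against the current spec.

```python
import json

def tools_call_request(request_id, tool_name, arguments, session_id=None):
    """Build the headers and JSON-RPC body for a single stateless POST
    to an MCP endpoint; the server may answer with plain JSON or
    upgrade the response to an SSE stream."""
    headers = {
        "Content-Type": "application/json",
        # The client advertises that it can consume either reply form.
        "Accept": "application/json, text/event-stream",
    }
    if session_id is not None:
        # Resuming a stateful session is opt-in via a header.
        headers["Mcp-Session-Id"] = session_id
    body = json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })
    return headers, body

headers, body = tools_call_request(7, "get_weather", {"city": "Berlin"}, "abc")
print(headers["Mcp-Session-Id"], json.loads(body)["method"])
```

A server that ignores the session header and answers every POST with plain JSON is still a valid, trivially scalable deployment; statefulness layers on top only when both sides want it.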

Alessio (Host): Yes, it also includes session IDs. What are your plans for future authentication? Currently, for some MCPs, I only need to paste my API key in the command line. What do you think the future direction will be? Will there be something like a dedicated configuration file for MCP to manage authentication information?

Justin/David: In the next draft revision of the protocol, we have already included authentication specifications. The current focus is on user-to-server authorization, using OAuth 2.1 or its modern subset. This approach works well, and everyone is building on it. It addresses many issues because you certainly don't want users to paste API keys carelessly, especially considering that most servers in the future will be remote servers that require secure authorization between them.

In a local environment, since the authorization spec is defined at the transport layer, it assumes HTTP-style framing (request headers), which plain standard input/output (stdin/stdout) transport cannot express directly. However, when running programs that use standard input/output locally, operation is very flexible, and you can even open a browser to handle the authorization process. Regarding whether to use HTTP for authorization locally, we have not fully settled this internally; Justin tends to support it, while I personally disagree, so it remains a point of contention.

For authorization design, I believe, as with other aspects of the protocol, we strive to keep it streamlined: address actual pain points, simplify first, and then expand gradually based on real needs to avoid over-design. Designing protocols requires great caution because mistakes are basically irreversible, since they can undermine backward compatibility. Therefore, we only accept additions that have been thoroughly considered and validated, letting the community experiment through an extension mechanism until there is broad consensus that a feature belongs in the core protocol and that we have the capability to support it over the long term.

Taking authorization and API keys as an example, we have brainstormed extensively. The current authorization method (OAuth 2.1 subset) is already capable of meeting the use cases for API keys. An MCP server can act as an OAuth authorization server and add related functionalities, but if you access its "/authorize" webpage, it might just provide a text box for you to enter your API key. While this may not be the most ideal way, it does conform to existing patterns and is feasible at present. We are concerned that if too many other options are added, both the client and server will need to consider and implement more scenarios, which would instead increase complexity.
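The "text box on the /authorize page" idea can be illustrated with a trivial sketch. This is a hypothetical fragment, not anything from the MCP spec: it just shows how an MCP server acting as its own OAuth authorization server could render an API-key prompt instead of a full login flow, then redirect back with an authorization code as usual.

```python
import html

def authorize_page(redirect_uri, state):
    """Render a minimal /authorize page that asks the user to paste an
    API key; on submit the server would mint a code and redirect back."""
    return f"""
<form method="post" action="/authorize">
  <input type="hidden" name="redirect_uri" value="{html.escape(redirect_uri)}">
  <input type="hidden" name="state" value="{html.escape(state)}">
  <label>API key: <input type="password" name="api_key"></label>
  <button type="submit">Authorize</button>
</form>"""

print(authorize_page("https://client.example/callback", "xyz123"))
```

From the client's point of view nothing changes: it still walks the standard OAuth redirect flow, which is exactly why this pattern "conforms to existing patterns" without new protocol surface.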

Alessio (Host): Have you considered the concept of scopes? Yesterday, we had a show with the creator of Agent.ai, Dharmesh Shah. He gave an example about email: he owns all his emails and hopes to have more granular scopes control, such as "you can only access these types of emails" or "only access emails sent to this person." Nowadays, most scopes are typically designed based on REST APIs, meaning which specific endpoints you can access. Do you think future models might understand and utilize scopes layers to dynamically limit the data being transmitted?

Justin/David: We recognize the potential demand for scopes and have discussed it, but adding it to the protocol requires great caution. Our standard is to first identify actual problems that the current implementation cannot solve, then prototype based on the scalability of the MCP structure, and only consider formally incorporating it into the protocol after demonstrating that it can provide a good user experience. The situation with authentication is different; it is more top-down designed.

Every time we hear descriptions of scopes, we find them reasonable, but we need specific end-to-end user cases to clarify the shortcomings of the current implementation before we can further discuss it. Considering the design philosophy of composability and logical grouping, we generally recommend designing MCP servers to be relatively small, with many different functionalities best implemented by independent, discrete servers, and then combined at the application layer. There are also opposing views that do not support a single server taking on authorization tasks for multiple different services, arguing that these services should correspond to their own independent servers, which are then combined at the application level.

08 Security Issues in MCP Server Distribution

Alessio (Host): I think one outstanding design of MCP is its programming language agnosticism. As far as I know, Anthropic does not have an official Ruby SDK, nor does OpenAI. Although developers like Alex Rudall have excelled in building these toolkits, with MCP, we no longer need to adapt SDKs for various programming languages; we just need to create a standard interface recognized by Anthropic, which is fantastic.

swyx (Host): Regarding the MCP Registry, there are currently five or six different registries, and the official registry that was initially announced has ceased operations. The service model of registries (download counts, likes, ratings, and trust mechanisms) naturally evokes traditional software package repositories like npm or PyPI, but it makes me a bit uneasy. Even with social proof, the next update can turn a previously trusted package into a security threat; the very signals meant to establish trust can be exploited, making the trust system self-defeating. Therefore, I prefer to encourage people to use MCP Inspector, since it only needs to monitor communication traffic, and many security issues could be discovered and resolved that way. What are your thoughts on the security issues and supply chain risks of registries?

Justin/David: That's right; you are completely correct. This is indeed a typical supply chain security issue that all registries may face. There are different solutions in the industry for this problem. For example, a model similar to the Apple App Store can be adopted, where software undergoes strict reviews, and automated systems and manual review teams are established to complete this work. This is indeed one way to address such issues and is feasible in certain specific scenarios. However, I believe this model may not be suitable in open-source ecosystems, as open-source ecosystems typically adopt decentralized or community-driven approaches like MCP registries, npm package managers, and PyPI (Python Package Index).

swyx (Host): These repositories essentially face the issue of supply chain attacks. Some core servers that have already been released in the official codebase, especially special servers like memory servers and reasoning/thinking servers, seem to not just simply encapsulate existing APIs but may be more convenient to use than directly operating the APIs.

Taking the memory server as an example, while there are some startups focused on memory functionality in the market, using this MCP memory server requires only about 200 lines of code, making it very simple. Of course, if more complex extensions are needed, more mature solutions may be required. But if you just want to quickly introduce memory functionality, it provides a very good implementation, possibly eliminating the need to rely on those companies' products. Do you have any special stories to share about these non-API encapsulated special servers?
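As a sense of scale for that "about 200 lines": the reference memory server essentially persists a small knowledge graph of entities and observations. The toy below is a hypothetical, drastically reduced sketch of the same idea: store observations per entity, then recall them by name.

```python
class MemoryStore:
    """Toy in-memory analogue of the reference memory server's core:
    named entities, each with a growing list of observations."""

    def __init__(self):
        self.entities = {}

    def remember(self, name, observation):
        # Append an observation, creating the entity on first mention.
        self.entities.setdefault(name, []).append(observation)

    def recall(self, query):
        # Case-insensitive substring match over entity names.
        q = query.lower()
        return {name: obs for name, obs in self.entities.items()
                if q in name.lower()}

store = MemoryStore()
store.remember("Justin", "works on MCP at Anthropic")
store.remember("Justin", "hobbyist game developer")
print(store.recall("justin"))
```

Wrapping `remember` and `recall` as two MCP tools is most of what the real server does; persistence and relations between entities are the parts that push it toward 200 lines.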

Justin/David: Actually, there aren't many special stories. Many of these servers originated from the hackathons we mentioned earlier. At that time, people were very interested in the idea of MCP, and some engineers within Anthropic who wanted to implement memory functionality or try related concepts could quickly build prototypes that were previously difficult to achieve using MCP. You no longer need to be an end-to-end expert in a specific field or have specific resources or private codebases to add functionalities like memory to your applications or services. Many servers were born this way. At the same time, we also considered how broad a range of functional possibilities to showcase at the time of release.

swyx (Host): I completely agree. I think this has contributed to the success of your release to some extent, providing rich examples for people to directly copy and paste and build upon. I also want to highlight the file system MCP server, which provides the functionality to edit files. I remember Eric showcasing his excellent bench project in a previous podcast, and the community was very interested in the open-source file editing tool within it. There are some related libraries and solutions on the market that regard this file editing capability as core intellectual property, while you directly open-sourced this functionality, which is really cool.

Justin/David: The file system server is one of my personal favorite features. It addressed a practical limitation I encountered at the time; I had an amateur game project and was very eager to associate it with cloud services and the "artifacts" David mentioned earlier. Being able to interact with cloud services and local machines is very significant, and I really like this feature.

This is a typical example; the birth of this server stemmed from the frustrations we encountered during the creation of MCP and the demand for such functionality. From encountering problems to developing MCP and this server, there is a clear and direct evolutionary thread, which Justin feels particularly strongly about. Therefore, it holds a special place in our hearts and can be seen as a spiritual origin point of this protocol.

09 MCP is Now a Large Project Involving Multiple Companies

swyx (Host): The discussion about MCP is very lively. If people want to participate in these debates and discussions, what channels should they use? Is it directly on the discussion page of the specification's code repository?

Justin/David: It is relatively easy to express opinions on the internet, but putting them into practice requires effort. Justin and I are both traditional supporters of open-source principles, and we believe that actual contributions are crucial in open-source projects. If you demonstrate your ideas through actual work, with concrete examples and effort invested in the extension functionality you want in the software development kit (SDK), they are more likely to be adopted by the project. If you only stay at the level of expressing opinions, your voice may be overlooked. We certainly value discussion, but given limited time and energy, we will prioritize those who have invested more actual work.

The discussions and notifications related to MCP are quite extensive, and we need to find a more scalable architecture to interact with the community to ensure that the discussions are valuable and effective. Running a successful open-source project sometimes requires making difficult decisions that may not please everyone. As maintainers and managers of the project, it is essential to clarify the project's actual vision and steadfastly move in the established direction, even if some people disagree, because there will always be projects that may better align with their philosophies.

Taking MCP as an example, it is just one of many solutions addressing issues in the general domain. If you do not agree with the direction chosen by the core maintainers, the advantage of open source is that you have more choices; you can choose to "fork" the project. We do expect to receive feedback from the community and strive to make the feedback mechanism more scalable, but at the same time, we will also make intuitive decisions that we believe are correct. This may spark a lot of controversy in open-source discussions, but sometimes this is the essence of such open-source projects, especially in rapidly evolving fields.

swyx (Host): Fortunately, you seem to be no strangers to making tough decisions. Facebook's open-source projects provide a lot of experience to draw from; even without direct involvement, one can understand the practices of the participants. I have been deeply involved in the React ecosystem and previously formed a working group where the discussion process was open. Every member of the working group had a voice, and they were all people with actual work and significant contributions, which was very helpful for a time. Regarding GraphQL, its development trajectory and early popularity are somewhat similar to the current MCP. I experienced the development process of GraphQL, which Facebook eventually donated to an open-source foundation.

This raises a question: Should MCP do the same? This question is not a simple "yes" or "no," as there are trade-offs involved. Currently, most people are satisfied with Anthropic's work on MCP, after all, you created and manage it. But as the project grows to a certain scale, it may encounter bottlenecks and realize that it is a company-led project. People will eventually expect that true open standards should be driven by non-profit organizations, with multiple stakeholders and good governance processes, such as those managed by the Linux Foundation or the Apache Foundation. I know it may be too early to discuss this issue now, but I would like to hear your thoughts on it.

Justin/David: Governance in the open-source realm is indeed an interesting and complex issue. On one hand, we are fully committed to making MCP an open standard, open protocol, and open project, welcoming all interested parties to participate. Progress is going well; for example, many ideas for streaming HTTP have come from different companies like Shopify, and this cross-company collaboration has been very effective. However, we are indeed concerned about official standardization, especially through traditional standardization bodies or related processes, as these processes may significantly slow down project development in a rapidly evolving field like AI. Therefore, we need to find a balance: how to address any concerns or issues they may have regarding governance models while maintaining the active participation and contributions of existing parties, and finding the right future direction without undergoing repeated organizational changes.

We sincerely hope that MCP is a truly open project. Although it was initiated by Anthropic, and both David and I work at Anthropic, we do not want it to be seen merely as "Anthropic's protocol." We hope that various AI labs and companies can participate or utilize it. This is very challenging and requires balancing the interests of all parties to avoid falling into the trap of "committee decision-making leading to project stagnation." There are various successful management models in the open-source field, and I believe most of the subtleties revolve around corporate sponsorship and the influence of corporations in the decision-making process. We will properly address these related issues, and we absolutely hope that MCP ultimately becomes a true community project.

In fact, many non-Anthropic employees already have commit and management rights for the MCP code. For example, the Pydantic team has commit rights for the Python SDK; companies like Block have made significant contributions to the specifications; SDKs for languages like Java, C#, and Kotlin are handled by different companies such as Microsoft, JetBrains, and Spring AI, and these teams have full management rights. So, if you look closely, it is actually already a large project involving multiple companies, with many people contributing, not just the two of us having commit rights and related privileges over the project.

Alessio (Host): Do you have any special "wish list" items for future MCP servers or clients? Are there any features you particularly hope people will build that have not yet been realized?

Justin/David: I would like to see more clients that support sampling. I also hope someone builds specific servers, such as one that summarizes Reddit discussion threads or one that retrieves last week's updates from EVE Online. I especially hope the former (a sampling-capable client) can be model-agnostic: not that I don't want to use models other than Claude (since Claude is currently the best), but I simply hope for a client framework that supports sampling.

More broadly, it would be great to have more clients that support the complete MCP specifications. We considered the possibility of gradual adoption during the design phase, and it would be fantastic if these well-designed foundational concepts could be widely applied. Reflecting on my initial motivation for participating in MCP and my excitement about the file system server—

In my spare time, I am a game developer, so I would love to see an MCP client or server integrated with the Godot engine (which I used to develop games). This would make it very easy to integrate AI into games or allow Claude to run and test my games. For example, let Claude play Pokémon games. There is already a foundation for realizing this idea. Further, starting now, how about letting Claude use Blender to build 3D models for you?

swyx (Host): To be honest, even things like shader code could theoretically be implemented. This really goes beyond my area of expertise. But when you provide developers with support and tools, the things they can accomplish are truly amazing. We are preparing a "Claude plays Pokémon" hackathon with David Hersh. I initially did not plan to incorporate MCP into it, but now it seems worth considering.
