State, and long-lived vs. short-lived connections · modelcontextprotocol/modelcontextprotocol · Discussion #102
Context
MCP is currently a stateful protocol, with a long-lived connection between client and server. This allows us to support behaviors like:
- Notifications about changes—e.g., changes to resources, or tools being added/removed. (These can occur in either direction too.)
- Server-initiated sampling at any point, enabling agentic workflows.
- Passing arbitrary server logging through to the client.
- … more stuff in future? …
The connection is restartable with fairly little recovery cost (it's not catastrophic, like losing data), but the protocol is definitely not designed around repeatedly opening a connection, issuing one semantic request, then closing.
Problem
This is fairly limiting for serverless deployments, which frequently autoscale up and down, and generally aren't designed around long-lived requests (for example, typically there's a max request lifetime measured in minutes).
Deploying to a Platform-as-a-Service is really nice and convenient as a developer, so not being very compatible with this model creates an impediment to broader MCP adoption.
Possible solutions
I can imagine a few different answers here, each with their own tradeoffs:
Option 1: encapsulate state into a state or session token
Any stateful interaction over a long-lived connection could instead be modeled as independent requests (e.g., webhooks) by passing back and forth some sort of token that either:
- Encodes all session state itself, or…
- Merely identifies the stateful session, with server and client having some persistent storage associated with that session token.
Pros:
- This is the simplest, incremental evolution from where MCP is today.
- Implementable just at the transport layer—protocol and application logic can be (mostly) agnostic to this, I think.
Cons:
- Somewhat annoying for servers to implement (and a key design principle of MCP is that servers should be really easy to implement):
- Requires complex state serialization/deserialization. The SDKs could do some of this, but probably not all.
- May require servers to be deployed with persistent storage.
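For concreteness, here is a minimal sketch of the two token styles (purely illustrative; the `SessionState` shape and helper names are made up, not part of any proposal):

```typescript
import { createHmac, randomUUID } from "node:crypto";

// Hypothetical per-session state a server might carry between requests.
interface SessionState {
  subscribedResources: string[];
  logLevel: string;
}

// (a) The token encodes all session state itself, signed so clients can't tamper.
function encodeStateToken(state: SessionState, secret: string): string {
  const payload = Buffer.from(JSON.stringify(state)).toString("base64url");
  const sig = createHmac("sha256", secret).update(payload).digest("base64url");
  return `${payload}.${sig}`;
}

// (b) The token merely identifies the session; state lives in persistent storage.
const sessionStore = new Map<string, SessionState>(); // stand-in for Redis/a DB
function createSessionId(initial: SessionState): string {
  const id = randomUUID();
  sessionStore.set(id, initial);
  return id;
}
```

Style (a) keeps the server stateless at the cost of token size and serialization complexity; style (b) keeps tokens small but reintroduces the persistent-storage requirement noted above.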
Option 2: offer "stateless" and "stateful" variants of the protocol
Continue supporting all the behaviors I listed up top, but only when used in "stateful" mode. Offer a "stateless" mode that doesn't have those things.
It's possible that some transports could implement this in a fairly gradated way—e.g., HTTP could be stateful if client -> server can use SSE, but gracefully degrade to stateless by just using POSTed webhooks.
Pros:
- Stateless servers could be implemented very simply.
- No additional complexity for stateful servers.
- Backwards compatible.
Cons:
- Adds complexity to the spec and SDKs, and creates bifurcation.
- Plausibly adds complexity to MCP hosts, which may need to connect to both kinds of servers.
Option 3: make all of MCP "stateless"
Make sweeping changes to completely revamp MCP into a fully stateless protocol. Drop all features that require statefulness, like those mentioned up top.
Pros:
- Simplest for spec, SDKs, and client/server/host implementations (where "simple" here means "fewest concerns," not necessarily "easiest" or "best").
Cons:
- Agents and "live" behaviors like subscriptions get significantly more difficult to implement.
- Any server -> client communication would need to use something like web hooks (although this is transport dependent). Importantly, this means that the client would need to be internet-accessible, which isn't currently a requirement.
- Very backwards incompatible. Massive change right after the initial release.
Thoughts?
I'd welcome all of:
- Feedback on whether you think this is a problem worth solving in the first place.
- Feedback or questions about the options listed here.
- Suggestions for other options I didn't think of and list!
Claude's thoughts:
Based on the MCP documentation and the problem you've outlined, I'll share my analysis of the situation and the proposed solutions:
First, I agree this is a problem worth solving. The ability to deploy MCP servers in serverless environments would significantly lower the barrier to entry and increase adoption, especially for developers already working in cloud environments.
Let's analyze each option:
Option 1 (State/Session Token):
- This feels like the most pragmatic solution while preserving MCP's current capabilities
- The main concern is the implementation complexity for server authors, which goes against MCP's principle of being easy to implement
- However, this could potentially be mitigated by:
  - Having the SDKs handle most of the state management complexity
  - Providing simple default implementations for common cases
  - Making state persistence optional - servers could choose to only support ephemeral sessions
Option 2 (Stateless/Stateful Variants):
- This is an interesting middle ground that provides flexibility
- The graceful degradation approach with transports is particularly compelling
- However, the bifurcation could lead to:
  - Confusion about which mode to use
  - Inconsistent behavior across different server implementations
  - Increased complexity in host applications that need to handle both modes
Option 3 (Fully Stateless):
- While this would simplify the protocol, it seems too restrictive
- The loss of server->client communication would significantly limit MCP's capabilities
- The requirement for clients to be internet-accessible is particularly problematic for desktop applications and IDE integrations
Additional Considerations:
- Hybrid Approach:
  - What if we kept the protocol stateful but added a "connection pooling" layer in the transport?
  - The transport could maintain a pool of connections and handle reconnection/state recovery transparently
  - This could allow serverless deployments while preserving the protocol's capabilities
  - Servers could optionally implement more sophisticated state management if needed
- Event-Sourcing Pattern:
  - Instead of maintaining direct state, servers could maintain an append-only log of events
  - State could be reconstructed from this event log when needed
  - This could simplify state management while preserving functionality
  - Could be particularly well-suited for serverless environments with managed event stores
My recommendation would be to explore a combination of Option 1 and the hybrid approach above:
- Add session token support at the transport layer
- Implement connection pooling in the SDKs
- Provide simple default implementations for common state management patterns
- Keep it optional - servers can choose to be completely stateless if they don't need the additional capabilities
This would preserve MCP's powerful features while making it more accessible for serverless deployments. What are your thoughts on this hybrid approach?
I don't think "connection pooling" really solves anything. The problem with serverless architectures is that you often don't have good ways to coordinate across instances and do this kind of sticky routing (even if possible, it's annoying).
Event sourcing is a good callout that I didn't think of, though. The main question is what the signal-to-noise ratio would be on those logged events, i.e., how many of them would be redundant. Another way to think about this: what would be the ideal ratio between "checkpoints" and events in the log? If the checkpoints are very frequent, then just using that as the backing state seems better.
Also, although my post almost entirely focuses on protocol state, we could also imagine servers that have application state, like stateful agents. Those might naturally lend themselves to a long-lived connection anyway, even aside from the stateful features in the protocol.
Speaking from Vercel's perspective option 1 is probably best.
You could consider a hybrid version where
- You do have a session token
- Clients can use it to resume
- Servers can say "I have stateful load-balancing and I won't restart, so I'll just keep state in memory", hence not increasing complexity for such servers
- But even serverful solutions would benefit from full resumability
I do agree that statefulness is hard to avoid in general (even a trivial stream could be interrupted and ideally the client could resume it), so I would not shy away from it as a general feature–just implement it in a way that is easy for distributed systems to achieve.
What are some stateful use cases that you've seen?
For example, if you want to be notified of remote resources appearing or disappearing, this is effectively state. It could be delivered via web hooks or other means, but that's a question of how to represent it.
Hi everyone, I've been following this discussion with interest. I already have a working prototype solution for tool invocation, based on a stateful client-server connection, as you can see in the attached diagram.
To truly achieve the full potential of serverless and make it usable in all dimensions, we really just need a robust AAA (Authentication, Authorization, Accounting) mechanism. This would allow both the MCP and the server to recognize:
1A - Who (Authentication) - who is the user or client initiating the connection (or continuing it in the Nth subsequent session...).
2A - What (Authorization) - what resources and operations this user/client has access to (it would be nice to have such a mechanism and have 1000 tools, but not always all tools need to be exposed to the client).
3A - How (Accounting) - how are the resources being used? For example, has there been any conversation exchange within a given tool? What was the result? Was there a previous error? Has the last operation completed? etc.
For me, maintaining context across multiple invocations is actually quite important... one tool writes, another one executes, and a third one analyzes (and all of this on the server side).
I'm planning to use an Agent as a Tool (Swiss Army Knife/Multitool). Although, as of today, the solution is simple because, as a user, I'm working in my own space / the Docker container is entirely for me.
So, if you want MCP to be stateless also - that's a great idea... but I only ask for one thing... please don't abandon the stateful concept because I think that with the AAA function, it would be a beautiful solution for communicating with much more complex scenarios.
Option 2: +1 vote ! 😉
> For me, maintaining context across multiple invocations is actually quite important... one tool writes, another one executes, and a third one analyzes (and all of this on the server side).
In particular, it could be valuable to maintain context across multiple invocations within a trusted environment (e.g., the server) without requiring sampling calls to a potentially untrusted LM client. This would enable the server to process intermediate states securely (to "privately think") before exposing final results to the client—an extension of AAA that requires statefulness.
As a motivating extension, a naive form of this exists in ad-tech clean rooms, where advertisers and publishers privately combine data for targeting and measurement with agreed-upon egress rules. This allows compute on data with more manageable privacy and competitive risks. [Seems quite likely similar dynamics will appear in MCP applications.]
> - Feedback on whether you think this is a problem worth solving in the first place.
> - Feedback or questions about the options listed here.
> - Suggestions for other options I didn't think of and list!
Some thoughts that come to mind are:
- Is a max request timeout measured in minutes an actual problem? If so, then I think it would be worth getting more explicit about the specific challenges that introduces. For example, if the efficiency/scalability of data transfer when resuming a session is the key problem, then it could be worth getting more specific about that with use cases. (But even if the session length were in days, I'm not sure it would change that problem dramatically.)
- My assumption is the average duration of an LLM user session is probably also measured in minutes.
- Even a stateful transport session can disconnect due to poor network connectivity. Are resumable stateful sessions also a requirement? I would say probably not? But the point is that clients already need to handle reconnecting for many reasons.
- Taking on session state management at a higher level may have additional follow-on problems, e.g. garbage collecting old sessions, managing secondary resources associated with a session, leaking connections, etc. The session will need some notion of lifetime/TTL/keep-alive, and at that point are we back to a similar question about the max request timeout?
- My impression is that today you can solve this by building a local MCP server that provides a custom transport to use any protocol. Then another way to think about this problem is improving distribution of local MCP servers. This may be an existing problem to solve anyway.
My take is that it is not clear this is worth solving given the current motivation/problem statement.
A pretty common use case would be having your MCP server deployed using Kubernetes with multiple pods due to high traffic. The client connects to 1 pod for the SSE connection, but subsequent POST requests get round-robined to any other pod which totally breaks the protocol.
> A pretty common use case would be having your MCP server deployed using Kubernetes with multiple pods due to high traffic. The client connects to 1 pod for the SSE connection, but subsequent POST requests get round-robined to any other pod which totally breaks the protocol.
Sure, but what are you proposing? I'm aware that's how SSE works today. The SSE implementation in the python-sdk already handles this by encoding a session ID in the follow-up POST URLs and associating it with the original request. That doesn't need an additional protocol extension, as it's already supported with external state if needed... unless you are arguing for removing state from the protocol.
Progressive enhancement could be an option.
At the base level, keep a very simple mental model: tool call === procedure call === JSON-RPC.
If a client needs real-time notifications, it would call some sort of "subscribe" method to get back an SSE URL (the presence of which can be negotiated with MCP's capability negotiation model).
i.e. Don't tightly couple JSON-RPC with SSE. JSON-RPC is very simple to implement. Don't sabotage this simplicity by coupling it with SSE.
Also, tool calling should ideally be stateless / idempotent: Tool calls are just procedures that take a bunch of arguments and return some result. Application state should only be part of it by including a chunk of grounding data / conversation context as an argument to a tool call. Tools would remain portable/reusable with that. EDIT: Not so sure about this.
AAA can be an envelope/tunnel around MCP and not overcomplicate MCP itself. JSON-RPC and SSE URLs could be signed URLs generated by off-MCP API requests that establish AAA.
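As a sketch of what that subscribe step might look like on the wire (the `notifications/subscribe` method and result shape here are hypothetical, not part of the spec):

→ POST body:
```json
{ "jsonrpc": "2.0", "id": 2, "method": "notifications/subscribe", "params": { "transports": ["sse"] } }
```
← Response 200:
```json
{ "jsonrpc": "2.0", "id": 2, "result": { "sseUrl": "https://example.com/mcp/events?sig=abc123" } }
```
The returned URL could be the signed URL mentioned above, carrying AAA without any custom headers.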
Apologies if all this is too terse / abstract. Just wanted to quickly dump things at the top of my mind.
This could work for notifications and anything directly tied to an issued request, but doesn't provide an easy way for the server to make requests of the client, which is a core feature (e.g., for sampling).
That is where the parameters to `notifications.*` come in. Bi-directional messaging over HTTP where the message is not in response to a request is difficult. Both SSE and web sockets provide a pattern that works over an established connection; however, if the connection is severed, then all bets are off. We can add parameters that make reconnecting a disconnected session (stream?) possible. I'm going to call it a session for the time being. Stream might be a better way to think of it though.
I was working on a response to the "tool calling should ideally be stateless" comment that I abandoned. There is a similar problem there, since many tools need to identify/cache/stash information that is local to the session. Perhaps we could create a session ID in the protocol that is included in messages. It could be optionally included in the `capabilities.notifications.SSE` param of `initialize` to resume a disconnected session. The server could create a new session and return the identifier in the response. The same identifier could be passed into tool calling somehow (haven't fleshed out what this would look like yet). The idea is to not necessarily mandate a session storage mechanism but to enable one for clients, servers, and tools that opt in. Having document-level properties would work nicely here.
This could work for notifications and anything directly tied to an issued request, but doesn't provide an easy way for the server to make requests of the client, which is a core feature (e.g., for sampling).
```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "initialize",
  "params": {
    "capabilities": {
      "notifications": {
        "SSE": {},
        "websocket": {},
        "polling": {
          "interval": {number}
        }
      }
    }
  }
}
```
Riffing on the idea, what if the client was informed that it could "check in" with the server for state of some sort? The server could then inform the client that it needs to make a request. There might be some piggybacking that's possible with the expectation of regular-interval pings.
In this world, the client can always send standard requests/notifications as needed (using the allowed notifications) to the server. With polling, it can check in with the server to identify if it wants to request anything. This would only happen during the duration of the session as determined by the client - no different from the expectations for SSE/WS. The server can then inform the client that it wants a status update if it's stale on something it needs.
This might also help as a means of graceful degradation for servers if/when there are issues with connection management.
> This could work for notifications and anything directly tied to an issued request, but doesn't provide an easy way for the server to make requests of the client, which is a core feature (e.g., for sampling).

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "initialize",
  "params": {
    "capabilities": {
      "notifications": {
        "SSE": {},
        "websocket": {},
        "polling": {
          "interval": {number}
        }
      }
    }
  }
}
```
SSE/polling makes sense for long-running operations. I would also like to see support for regular synchronous request/response. I would assume many tools would just return the output right away or within a meaningfully small interval.
```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "initialize",
  "params": {
    "capabilities": {
      "notifications": {
        "SSE": {},
        "websocket": {},
        "http": {},
        "polling": {
          "interval": {number}
        }
      }
    }
  }
}
```
I agree: statefulness should be negotiated, and the client should assume it's not available unless negotiated with the server.
That way, even on the STDIO transport, you could just call one-shot commands (grep, jq, etc.) without needing to add wrappers around them.
On a side note, LibreChat currently provides an option to feed the tools with an OpenAPI spec (https://swagger.io/specification/) and it's a real time saver; suddenly hundreds of SaaS products with a REST API become accessible without any wrapper.
Big fan here 👋
Option 2 feels right and is in line with the other conversations around authorization that enable multiple paths depending on the server's capabilities. The trade-off is the additional complexity in the architecture itself. That said, this feels solvable by treating the protocol payloads as separate from the delivery mechanism and letting the delivery mechanisms abide by a separate contract layer, which itself could be decoupled from clients as SDKs. With that, the protocol shouldn't bifurcate, assuming we solve for sufficient parity.
What's more, we should probably decouple the notifications/communication from how we represent state as well. Even for WS/SSE this will be useful, but it is a little different from the problem of how we provide sufficient request capabilities from client to server and from server to connected client.
Thank you @jspahrsummers and others for this discussion.
From my perspective (I'm quite new to MCP, so please correct me if I'm wrong), the most interesting would be:
Option 2: offer "stateless" and "stateful" variants of the protocol
Why?
- It keeps the full functionality of the stateful server, for those that need it.
- It opens the door for a "simplified" server (with reduced functionality) to be implemented in a stateless manner.
I think stateless enables easier scaling, federation of MCP servers, "tools discovery", and MCP proxies. If we want to create some "HuggingFace" for MCP tools, "MCP Stateless" would make it easier.
I currently see the whole AAA layer as completely independent from the Stateful / Stateless discussion.
I'll be happy to help / contribute if "Stateless MCP" becomes a thing.
Just out of curiosity, what does the decision process to change / improve the MCP protocol look like? I mean, other than creating a change in the spec and SDK code changes, how are these proposals reviewed and approved / rejected?
> Just out of curiosity, what does the decision process to change / improve the MCP protocol look like? I mean, other than creating a change in the spec and SDK code changes, how are these proposals reviewed and approved / rejected?
That's how. 🙂 We have a core set of maintainers that we are looking to expand over time; they are responsible for reviewing proposals and offering feedback. Changes to the spec specifically require the approval of multiple maintainers.
I'm building a hosting platform for deploying MCPs and SSE makes it hard to scale remote MCPs because we can't use serverless.
I did more research into this, and it seems like there's no way to properly route a connection under a protocol like SSE because all the POST requests are independent (REST is stateless, after all). So if you scale up any server to multiple replicas (even in a non-serverless way using VMs or Kubernetes), it's a pain to figure out which spun-up instance to route the messages to.
Actually, statefulness isn't the issue here - it's SSE. One way to side-step this is via gRPC or WebSockets due to how they retain the connection on subsequent requests (there's a sense of session affinity). Is there a reason why WS or gRPC wasn't chosen as the primary transport and SSE was chosen instead? Just want to fully understand the motivations.
@jspahrsummers
iirc the reason we went for SSE over websockets/gRPC is because SSE exists within standard HTTP, and we figured that adopting websockets on average would probably be a bigger lift than supporting SSE for existing web stacks.
It's a good point regarding the additional complexity of routing subsequent requests back to the container/instance that is holding open the SSE connection. This is another complexity/barrier to the deployment of servers. Some off-the-cuff ways to solve this (see the sketch after this list):
- Use something like Redis to route messages to the correct places
- During the configuration of the SSE transport, the server specifies the endpoint on which it will listen for messages in the session - this endpoint would be used to route back to the correct server instances
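A rough sketch of the Redis idea (hypothetical channel naming; the pub/sub calls are node-redis v4):

```typescript
import { createClient } from "redis";
import type { Response } from "express";

const sub = createClient();
const pub = createClient();

// On the instance holding the open SSE stream for a session: subscribe to a
// per-session channel and forward anything published there into the stream.
async function bindSession(sessionId: string, sseResponse: Response) {
  if (!sub.isOpen) await sub.connect();
  await sub.subscribe(`mcp:session:${sessionId}`, (message) => {
    sseResponse.write(`data: ${message}\n\n`);
  });
}

// On whichever instance receives the side-channel POST: publish the JSON-RPC
// message into the session's channel instead of handling it locally.
async function routeMessage(sessionId: string, jsonRpcMessage: unknown) {
  if (!pub.isOpen) await pub.connect();
  await pub.publish(`mcp:session:${sessionId}`, JSON.stringify(jsonRpcMessage));
}
```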
That being said, I think stateful/stateless discussion is still relevant - as supporting long lived websockets in a webapp would still necessitate solving all the same issues with stateful/long lived connections
> iirc the reason we went for SSE over websockets/gRPC is because SSE exists within standard HTTP, and we figured that adopting websockets on average would probably be a bigger lift than supporting SSE for existing web stacks.
> It's a good point regarding additional complexity of routing of subsequent requests back to the container/instance that is holding open the SSE connection. This is another complexity/barrier to the deployment of servers.
> That being said, I think stateful/stateless discussion is still relevant - as supporting long lived websockets in a webapp would still necessitate solving all the same issues with stateful/long lived connections
Thanks for clarifying the motivation behind SSE.
I think having both options (WS/SSE) officially documented would be great, since I think for practical purposes WS is more efficient and designed specifically for these long-lived bidirectional connections and avoids requiring server-side complexities.
Plus, I noticed WS is already in the SDKs, so it would be good if that's formalized.
MCP servers can already be seen as proxies that require new development. I'm not convinced by the argument that SSE is preferred for the simplicity of being native to HTTP.
WS seems far more natural for MCP and is (relatively) trivial to add to SDKs whose purpose is to already abstract development of MCP servers.
I’m surprised this hasn't received more discussion (catching up so maybe it does further in the discussion)
I've been mulling this over a bit and wanted to share my (candid and somewhat rambly) thoughts on this.
A bit of a recap of the problem
The key issue with the statefulness is the scaling characteristics of long lived connections / inability to use serverless deployments. There is also an issue with the SSE transport where the "side channel" post requests need to be routed to the server instance holding open the SSE stream.
The reason we have a stateful bidirectional protocol is to enable some really nice features (quoting justin):
> - Notifications about changes—e.g., changes to resources, or tools being added/removed. (These can occur in either direction too.)
> - Server-initiated sampling at any point, enabling agentic workflows.
> - Passing arbitrary server logging through to the client.
> - … more stuff in future? …
I think these (+ future bidirectional) features will be important in the long run to achieve great UX in user facing apps and rich and efficient communication between agents (somewhat speculative, but I can definitely imagine graphs of agents being well served by stateful bidirectional communication). It's still very early days, but most servers and clients aren't properly leveraging these features. I suspect this is because they are harder to implement, and there aren't many good examples of clients in the wild that support the features.
It's important for adoption that we don't add undue complexity/friction to client and server developers early on, but it's also important that we don't close doors on the aspects of the protocol that will enable the long-tail of great features.
The direction I'm currently leaning in
I really like @atesgoral's approach of progressive enhancement:
> If a client needs real-time notifications, it would call some sort of "subscribe" method to get back an SSE URL
I feel like we could update the SSE transport (or just make a new transport) where:
All client->server messages go through HTTP POST requests (including initialization), with responses returned directly in the HTTP response, i.e.:
→ POST body contains:
```json
{
  "method": "tools/call",
  "params": {
    "name": "string",
    "arguments": {...}
  }
}
```
← Response 200 contains:
```json
{
  "content": [
    // Array of TextContent, ImageContent, or EmbeddedResource
  ],
  "isError": false // Optional, defaults to false
}
```
→ POST body contains:
```json
{
  "method": "resources/read",
  "params": {
    "uri": "resource-uri"
  }
}
```
← Response 200 contains:
```json
{
  "contents": [
    // Array of TextResourceContents or BlobResourceContents
  ]
}
```
(Note: In the current SSE implementation all server->client messages come through the open SSE channel)
And all server initiated messages (i.e. notifications and sampling requests) come through an SSE stream that the client can optionally subscribe to.
The implementation of the SSE channel is optional for servers, allowing server implementers to get some value from MCP (tool calls, read resources, evaluate prompts, resource/prompt completions) without needing to support long lived connections.
Then, when server implementers and clients decide to implement the richer stateful features, they can implement the SSE channel and tackle the scaling implications.
These SSE channels could also be best effort, and it's okay for them to occasionally disconnect (i.e. when a deployment occurs).
Pros:
- Easy for server implementers to deploy MCP at scale
- Only tackle the complexity of long-lived connections if they want/need the additional features
- Don't close the door to the richer bidi features of the protocol
Cons:
- Most servers won't support the richer features
- Some additional complexity of the transport implementation for clients
There are probably other issues with this that I haven't thought through
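To illustrate how small the stateless half could get, here's a sketch of a POST-only endpoint (the dispatch table and handler bodies are illustrative, not SDK API):

```typescript
import express from "express";

const app = express();
app.use(express.json());

// Illustrative method dispatch; a real server would validate params, handle
// notifications (no id), etc.
const handlers: Record<string, (params: any) => Promise<unknown>> = {
  "tools/call": async (p) => ({
    content: [{ type: "text", text: `called ${p.name}` }],
    isError: false,
  }),
  "resources/read": async (p) => ({ contents: [] }),
};

app.post("/mcp", async (req, res) => {
  const { id, method, params } = req.body;
  const handler = handlers[method];
  if (!handler) {
    res.json({ jsonrpc: "2.0", id, error: { code: -32601, message: "Method not found" } });
    return;
  }
  res.json({ jsonrpc: "2.0", id, result: await handler(params) });
});

app.listen(3000);
```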
What's more, the real-time notification URL could also be a `ws:` URL, telling the client to either:
- Open a unidirectional WebSockets stream for notifications
- Move over to bidirectional WebSockets entirely. This is fun because it will effectively be a 2-step upgrade, from JSON-RPC to HTTP to WebSockets (using the Upgrade/Connection headers).
This direction makes sense to me! I think this threads the needle well between simplicity while still supporting these features that we believe will be important—especially for agents.
Perhaps we could make SSE support a stronger requirement on servers, but allow disconnecting it at any time? Then a short-lived interaction that still supports bidirectionality could look like:
- Client periodically sends requests/notifications to server via HTTP POST
- Client periodically connects to server over SSE
- Server delivers any "queued" requests/notifications over SSE
- If the server doesn't wish to support a long-lived connection, it then cleanly disconnects the SSE stream
- [repeat]
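A sketch of what that client-side drain loop might look like (the endpoint shape and `handleServerMessage` are assumptions, not spec):

```typescript
declare function handleServerMessage(chunk: string): void; // app-specific

async function drainQueuedMessages(baseUrl: string, sessionId: string) {
  // Connect periodically; the server flushes any queued requests/notifications
  // and may cleanly disconnect if it doesn't want to hold the stream open.
  const res = await fetch(`${baseUrl}/events?session=${sessionId}`, {
    headers: { Accept: "text/event-stream" },
  });
  const reader = res.body!.getReader();
  const decoder = new TextDecoder();
  while (true) {
    const { done, value } = await reader.read();
    if (done) break; // server closed the stream; reconnect on the next cycle
    handleServerMessage(decoder.decode(value, { stream: true }));
  }
}
```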
I think we all tend to agree that the crux of the server scaling problem is not so much about "state", but more about "long lived connections" (i.e. SSE).
As @jerome3o-anthropic, @jspahrsummers, @atesgoral , and others mentioned, if the SSE connections are "relatively short", and the servers are "allowed to disconnect" (i.e. best effort), the scaling issues should be minimized.
It seems to me that we are reaching the conclusion that SSE connections should be alive only during a "transaction cycle", e.g. a request to an "agent / tool" where the agent uses sampling capabilities. Within this back and forth (which may last a few seconds, or up to a couple of minutes), we keep the SSE channel open, but then it's closed when the end of the "agent / tool cycle" is reached.
IMO the protocol "as is" allows this behaviour (perhaps with some minor additions to #153). The changes in specification are probably in the clarifications and examples on "how to implement" the client and server. Some changes will also be needed in the reference implementations, which I'm happy to contribute to if there is a decision to move forward.
This is a very good direction!
At Automattic we are trying to use MCP for WordPress.
I managed to hack together native WordPress support for the SSE approach, but because PHP is kind of stateless, I had to:
- Throw in a bunch of `php.ini` directives to hack together a working SSE connection
- Store responses in the database
- Read them in a loop in the SSE endpoint
This is very sub-optimal and I would love for the HTTP transport to just be query/response. The current SSE implementation frankly sounds like the STDIO approach ported directly to the web. HTTP/REST is a stateless protocol, and yet there are a lot of hacks to make it behave statefully by using databases, memcache, etc.
> The implementation of the SSE channel is optional for servers, allowing server implementers to get some value from MCP (tool calls, read resources, evaluate prompts, resource/prompt completions) without needing to support long lived connections.
YES PLEASE
> I think these (+ future bidirectional) features will be important in the long run to achieve great UX in user facing apps and rich and efficient communication between agents (somewhat speculative, but I can definitely imagine graphs of agents being well served by stateful bidirectional communication)
> Cons:
> Most servers won't support the richer features
@jerome3o-anthropic I don't think the current SSE implementation is helping here. Again, it looks like a direct port of the STDIO approach to the web, which is not how most web apps operate.
The agentic workflows and larger systems will require long-running intermittent connections to operate, and the way to do those would be to introduce:
- Webhooks
- Message endpoints that would be polled /checked periodically.
Taking inspiration from payments
Come to think of it, these workflows could have latency similar to payment flows, where authorizing a credit card could take up to a minute, invoicing could take minutes, and renewals could take years - it's a mix of short- and long-term connections.
So from Stripe API we have:
- Webhooks that you can set up on Stripe https://docs.stripe.com/api/v2/core/event_destinations
- And an endpoint to retrieve all events https://docs.stripe.com/api/v2/core/events/list
Then a web-based client could send a webhook, or poll the server for an event.
I know this all requires clients to be a little more complicated, but I also think we'll end up in a world where there are more servers than clients; thus we should optimize a bit for the ease of use of servers vs. the clients.
In any event, your proposal of the SSE channel being optional is great.
Hello, I have been following the discussions with great interest and I really like this approach too.
A few remarks about the server response in a stateless situation as described in the OP (maybe it's already covered somewhere else that I missed).
First, I think we should still use the JSONRPC format for the response instead of using another format.
Also, instead of only sending the final response directly, I believe it would be useful for the server to be able to stream back messages to the client in order to inform of progress during long running operations.
I understand it's somehow still server-sent events, but in a transactional / serverless use-case. People using LLMs are used to it, as most LLM APIs are streaming.
It could be configured by a client header (`Accept: application/json-rpc-stream`) or in `clientCapabilities`; if not supported, the server would just discard all `notifications/*` messages.
(Maybe it's just simpler to always stream the answer)
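From the client side, that negotiation could be as simple as (using the hypothetical media type above):

```typescript
// A server that supports it streams notifications/* messages before the final
// response; one that doesn't just returns a single JSON-RPC response.
const res = await fetch("https://example.com/mcp", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Accept: "application/json-rpc-stream",
  },
  body: JSON.stringify({
    jsonrpc: "2.0",
    id: 1,
    method: "tools/call",
    params: { name: "slow_tool", arguments: {} },
  }),
});
```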
Very exciting stuff anyway !
Thank you for this discussion!
If I'm understanding the current spec correctly, I think there are two categories of server->client communication to solve for over short-lived and/or interruptible connections, but today they are not distinguished from each other in the spec. I'm wondering if they should be, and if they should happen over distinct connections between client and server, instead of over one monolithic streaming connection.
My rough stab at how that might look, without perfectly understanding the spec today:
Category 1. Notifications about changes to what the server can provide to the client
Examples: Resource/prompt/tool list changes, resource content changes
Use case: As the client application, I need to keep track of the resources/prompts/tools that a server can provide to me, so I can reason about using those resources/prompts/tools and/or present that list to the user. Streaming notifications from the server help me keep my local list of resources/prompts/tools up-to-date in real time. If I get disconnected, I can re-build my local list of resources/prompts/tools by calling the server's List/Get APIs, and then connect to a stream for updates. If a server does not support streaming updates, I can poll the server's List/Get APIs periodically to keep my local list up-to-date.
For servers that don't support streaming (or clients who don't want to stream):
- Client periodically sends requests/notifications to server via HTTP POST
For servers that support streaming:
1. Client discovers server's resources/prompts/tools via HTTP POST
2. Client subscribes to change notifications over SSE
3. Server delivers change notifications over SSE
4. SSE connection disconnects (this could be a network issue, the server going through a deployment or scale down, etc)
5. Repeat 1-4
Category 2. Requests/notifications that are (hopefully?) directly related to some work that the client requested
Examples: Sampling requests, tool progress notifications, (logging?), (roots?)
Use case: As the client application, I want to use prompts, tools, and agents from a server. In the course of completing my request to the server, the server may need additional information from me (like LLM samples). Or, it may want to send me occasional updates like progress notifications and logs. I establish a bidirectional communication stream with the server, so that the server can send me the information and requests it needs to complete my work. If the stream is disconnected mid-way, the server may not be able to complete my request and I may need to start a new request.
In the spec today, there doesn't seem to be any kind of a "session ID" or "job ID" associated with a request that might take a while to complete and might require some back-and-forth communication. For example, sampling requests and progress notifications from server>client don't seem to be directly associated to the original tool call request initiated from client>server. It seems like today it is technically valid for a server to spam the client with sampling requests and root requests over the long-running connection, without the client ever actually using the server.
Let's assume that some kind of session ID is introduced that is assigned to requests from the client for using prompts/tools/agents. For certain types of server>client requests, they must be within the context of a session ID. The original request from the client can be upgraded to a stream for bidirectional communication for that session only. The server completes the session when it has completed the requested work. (I think this pattern is similar to the "transactions" @pcingola was describing in his comment above)
For servers that don't support streaming:
1. Client sends a tool call request to server via HTTP POST.
2. Connection is held open until the server returns the result.
3. If the connection breaks in the middle, the client must send a new request.
For servers that support streaming:
- Client sends a tool call request to server via HTTP POST
- The request is upgraded to SSE
- Server delivers requests/notifications to the client via SSE
- Server delivers final result over SSE
- Server closes the SSE connection
If the connection breaks in the middle, the client must send a new request.
Optional: For servers that persist session state (for example, by session ID):
For non-streaming servers, the client could poll the server by session ID for any requests the server has for the client.
If the connection is broken (for either streaming or non-streaming servers), the client can make a request to get the results of a session ID. The result comes back immediately if the session is already complete. The request is resumed if the session ID exists. The behavior then depends on whether the server supports streaming, as above - either the HTTP request is held open until the result is ready, or a stream is started for bidirectional communication.
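On the wire, the non-streaming poll described above might look something like this (method names and shapes are hypothetical):

→ POST body:
```json
{ "jsonrpc": "2.0", "id": 7, "method": "session/poll", "params": { "sessionId": "abc-123" } }
```
← Response 200, when the server has a pending request for this session:
```json
{ "jsonrpc": "2.0", "id": 7, "result": { "pending": [{ "method": "sampling/createMessage", "params": {} }] } }
```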
We have a solution: we manage a transport and server per connection.
We can handle multiple remote SSE sessions this way.
```typescript
import express, { RequestHandler } from "express";
// SSEServerTransport was missing from the original snippet; it comes from the
// official TypeScript SDK.
import { SSEServerTransport } from "@modelcontextprotocol/sdk/server/sse.js";
import createServer from "./server.js";
// Add Winston logger for better logging
import winston from 'winston';
// Configure logger
const logger = winston.createLogger({
level: 'info',
format: winston.format.combine(
winston.format.timestamp(),
winston.format.json()
),
transports: [
new winston.transports.Console(),
]
});
// Add uncaught exception handler
process.on('uncaughtException', (error) => {
logger.error('Uncaught Exception:', { error: error.message, stack: error.stack });
// Give time for logs to be written before potential pod restart
setTimeout(() => process.exit(1), 1000);
});
process.on('unhandledRejection', (reason, promise) => {
logger.error('Unhandled Rejection:', { reason, promise });
});
// Keep one transport (and server) per SSE session, keyed by session ID
const transportMap = new Map<string, SSEServerTransport>();
const app = express();
// Add request logging middleware
app.use((req, res, next) => {
const start = Date.now();
res.on('finish', () => {
const duration = Date.now() - start;
logger.info('Request completed', {
method: req.method,
path: req.path,
statusCode: res.statusCode,
duration,
userAgent: req.get('user-agent')
});
});
next();
});
const sseHandler: RequestHandler = async (req, res) => {
const transport = new SSEServerTransport("/messages", res);
const server = createServer();
res.setHeader('X-Accel-Buffering',"no");
try {
transportMap.set(transport.sessionId, transport);
res.on('close', async () => {
logger.info('SSE connection closed', { sessionId: transport.sessionId });
await server.close()
transportMap.delete(transport.sessionId);
});
await server.connect(transport);
logger.info('SSE connection established successfully', { sessionId: transport.sessionId });
} catch (error) {
logger.error('Failed to establish SSE connection', {
sessionId: transport.sessionId,
error: error instanceof Error ? error.message : 'Unknown error',
stack: error instanceof Error ? error.stack : undefined
});
transportMap.delete(transport.sessionId);
res.status(500).end();
}
};
const messageHandler: RequestHandler = async (req, res) => {
const sessionId = req.query.sessionId as string;
res.setHeader('X-Accel-Buffering',"no");
logger.debug('Handling message', { sessionId });
if (!sessionId) {
logger.error('Message received without sessionId');
res.status(400).json({ error: 'sessionId is required' });
return;
}
const transport = transportMap.get(sessionId);
if (!transport) {
logger.error('No active transport found for session', { sessionId });
res.status(404).json({ error: 'No active connection found for this session' });
return;
}
try {
logger.debug('Received message', { sessionId, body: req.body });
await transport.handlePostMessage(req, res);
logger.debug('Message handled successfully', { sessionId });
} catch (error) {
logger.error('Error handling message', {
sessionId,
error: error instanceof Error ? error.message : 'Unknown error',
stack: error instanceof Error ? error.stack : undefined
});
res.status(500).json({ error: 'Internal server error' });
}
};
app.get("/sse", sseHandler);
app.post("/messages", messageHandler);
const port = process.env.PORT || 3001;
// Add error handling for server startup
const httpServer = app.listen(port, () => {
logger.info(`Server started`, {
port,
nodeEnv: process.env.NODE_ENV,
pid: process.pid,
memory: process.memoryUsage()
});
});
// Handle server-specific errors
httpServer.on('error', (error: Error) => {
logger.error('Server startup error:', {
error: error.message,
stack: error.stack,
port,
pid: process.pid
});
// Exit process on critical errors
if ((error as any).code === 'EADDRINUSE') {
logger.error('Port is already in use, exiting process');
process.exit(1);
}
});
// Handle process termination
process.on('SIGTERM', () => {
logger.info('SIGTERM received, shutting down gracefully');
httpServer.close(() => {
logger.info('Server closed');
process.exit(0);
});
// Force close if graceful shutdown fails
setTimeout(() => {
logger.error('Could not close server gracefully, forcing shutdown');
process.exit(1);
}, 10000);
});
process.on('SIGINT', () => {
logger.info('SIGINT received, shutting down gracefully');
httpServer.close(() => {
logger.info('Server closed');
process.exit(0);
});
// Force close if graceful shutdown fails
setTimeout(() => {
logger.error('Could not close server gracefully, forcing shutdown');
process.exit(1);
}, 10000);
});
```
At Shopify, we're so far mostly using what we call "MCP Lite": just regular, transactional (POST and get the result in the HTTP response) JSON-RPC, often just implementing the MCP `tools/call` method. This meets most of our current use cases, since we have a fledgling ecosystem of internal tools with no immediate need for dynamic server/tool discovery. But that need is fast approaching!
We have in fact done a PoC implementation of the JSON-RPC-SSE transport when it first came out, but as others in this thread have pointed out, it's awkward to implement: in podded deployments we are forced to use an inter-process message-passing mechanism to link the JSON-RPC POST request to the SSE stream.
I proposed progressive enhancement above, without pictures. Time for some pictures.
Selective notification subscription
"MCP Lite", using plain JSON-RPC. No SSE in sight. Very simple for adoption:
```mermaid
sequenceDiagram
    participant C as MCP Client
    participant S as MCP Server
    C->>+S: POST JSON-RPC tools/call
    S-->>-C: tool result
```
Important points:
- `initialize` can be called for capability negotiation, but maybe there are smart/assumed defaults that make this step optional
- `tools/list` can be called to discover tools, but if servers talking to each other already know what tools exist at the destination, they can skip this
Discovering notification URLs during initialization, splitting the MCP Server's JSON-RPC and notification endpoints for clarity:
```mermaid
sequenceDiagram
    participant C as MCP Client
    box MCP Server
    participant J as JSON-RPC Endpoint
    participant N as Notification Endpoint
    end
    C->>+J: POST JSON-RPC initialize
    J-->>-C: Notification URLs
    C->>N: Start streaming from a notification URL above
    N-->>C: event 1
    C->>+J: POST JSON-RPC tools/call
    J-->>-C: tool result
    N-->>C: event 2
```
Important points:
- The notification stream returned from initialize can be zero or more transports supported by the server. It could also be a subset based on a capability query from the client, making it easy for the client to pick the first one that it ranks as preferable. HTTP poll URL, SSE URL, WebSocket URL, Kafka URL, ...
- The JSON-RPC tool/call is completely unhindered by any notification streaming happening on some other pipe. They are on different planes of existence.
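For illustration, an `initialize` result along these lines might look like (shape entirely hypothetical):

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "notificationUrls": [
      { "transport": "sse", "url": "https://example.com/mcp/events?sig=abc123" },
      { "transport": "websocket", "url": "wss://example.com/mcp/events?sig=abc123" }
    ]
  }
}
```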
Sampling without streaming (borderline crazy idea)
In an "MCP Lite" world (see above), how can MCP-server-initiated sampling work?
Borrowing from HTTP, where servers can emit different response codes to ask clients to take certain actions (e.g. provide credentials, redirect away and forget this URL, I'm busy, back off, etc.), the `tools/call` method could return a special result to prompt the client to process sampling, and then return the result to the MCP server as a "tool call continuation":
```mermaid
sequenceDiagram
    participant C as MCP Client
    participant S as MCP Server
    participant U as User
    participant L as LLM
    C->>+S: POST JSON-RPC tools/call
    S-->>-C: sampling request, continuation payload
    C->>+U: Get user approval
    U-->>-C: Go ahead
    C->>+L: Perform completion
    L-->>-C: Completion
    C->>+U: Get user approval
    U-->>-C: Go ahead
    C->>+S: POST JSON-RPC tools/continue
    S-->>-C: tool result
```
Assumption: The MCP Server will never send an unsolicited sampling request to the client, but these will all be as a response to tool calls.
Abstractly, this treats the tool as a finite state machine. When sampling is needed, the state of the tool is bounced back to the client and the client can progress the state of the tool by passing it the state + completion to transition the state back to running. This "state" could simply be a tool call reference if the MCP Server is stateful and can persist the paused tool state on its side.
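A sketch of the wire shapes this implies (both the special result and `tools/continue` are the proposal above, not spec):

← Response to the initial `tools/call`:
```json
{ "jsonrpc": "2.0", "id": 3, "result": { "samplingRequest": { "messages": [] }, "continuation": "opaque-state-blob" } }
```
→ Follow-up POST once the client has performed the completion:
```json
{ "jsonrpc": "2.0", "id": 4, "method": "tools/continue", "params": { "continuation": "opaque-state-blob", "completion": { "role": "assistant", "content": "..." } } }
```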
We've been struggling with this too. Long-lived connections are problematic for the reasons others have listed.
It seems like robust tool calling needs to satisfy two constraints:
- Simple tools should be fast and familiar.
- Long-running tools should be cancellable, streamable, and reliable.
Most cloud APIs solve this by having two types of endpoints:
- Standard endpoints that return the result immediately, without support for streaming or cancellation.
- Job endpoints that return a reference to a job that is cancellable, reliable, and streamable.
Good examples of this pattern are Google's AIP-151 for Long-Running Operations and Fal AI's Queue Endpoint. FAL's Queue API is a good reference implementation for long-running operations for models and tools that have streaming output.
Modifying this to MCP's JSON-RPC protocol would be relatively straightforward.
For simple tools:
```
tool/call -> immediate result
```
For long-running tools:
```
tool/call -> returns Operation reference (job_id)
operation/stream?id=xxx -> SSE stream of results
operation/cancel?id=xxx -> cancel operation
operation/get?id=xxx -> get current state/result
```
This gives you a stable job id that you can cancel and reconnect to regardless of connection stability.
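A minimal sketch of the job-manager side (in-memory storage as a stand-in for something durable; cancellation is cooperative):

```typescript
import { randomUUID } from "node:crypto";

interface Operation {
  id: string;
  done: boolean;
  cancelled: boolean;
  result?: unknown;
}

const operations = new Map<string, Operation>();

// tool/call for a long-running tool: start the work, return the reference.
function startOperation(run: (op: Operation) => Promise<unknown>): Operation {
  const op: Operation = { id: randomUUID(), done: false, cancelled: false };
  operations.set(op.id, op);
  run(op).then((result) => {
    op.result = result;
    op.done = true;
  });
  return op; // the Operation reference (job_id) returned to the client
}

// operation/get returns the current state; operation/cancel flips the flag
// the running tool is expected to check between steps.
const getOperation = (id: string) => operations.get(id);
const cancelOperation = (id: string) => {
  const op = operations.get(id);
  if (op) op.cancelled = true;
  return op;
};
```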
This is slightly more complicated than just upgrading to SSE on the initial call, as proposed in previous comments, but is easy to understand. I guess you could also support upgrading to SSE directly if optimizing was a priority, but conceptually there is a job.
Sequence diagrams end up like:
```mermaid
sequenceDiagram
    participant Client
    participant MCP as MCP (Job Manager)
    participant Tool
    %% Simple Tool Flow
    Client->>MCP: tool/call (simple tool)
    MCP->>Tool: Execute simple tool
    Tool-->>MCP: Result
    MCP-->>Client: Immediate response
    %% Long-running Tool Flow
    Client->>MCP: tool/call (long-running tool)
    MCP->>MCP: Create job record
    MCP->>Tool: Start job execution
    Note right of MCP: MCP tracks job state
    MCP-->>Client: Return Operation reference (job_id)
    Client->>MCP: operation/stream?id=xxx
    Tool-->>MCP: Job progress updates
    MCP-->>Client: Stream updates via SSE
    %% Optional Get/Cancel Flow
    opt Get Operation State
        Client->>MCP: operation/get?id=xxx
        MCP-->>Client: Current state/result
    end
    opt Cancel Operation
        Client->>MCP: operation/cancel?id=xxx
        MCP->>Tool: Cancel job execution
        Tool-->>MCP: Execution cancelled
        MCP-->>Client: Cancellation confirmed
    end
```
If there are needs for other types of notifications than job progress updates that seems like a separate Events API. I'd lean toward making that be done via reliable webhook delivery vs. a single long-lived SSE connection.
I also believe Option 1 makes sense as a way to disconnect sockets from sessions.
I have a transport proposal #182 that goes into some of the implications of bi-directional traffic, such as the lifecycle and security.
From the discussion post, I would argue that we can keep the management of session context and state management up to the server to decide upon.
We can enable flexibility by being generous and unprescriptive about the "session id" format.
Whether a server wants to encode data or include a Shared Access Signature as a pointer, this should be opaque to the client.
In terms of extensibility, a few additions that could be great but are not required to solve the long-running sessions:
- we would not lock ourselves out of defining extensibility points for clients and servers to support an additional SSE "subscribing" channel to listen on a session.
- Potentially allow the ability to "return early" with a response body when full async is not required for a given call.
When a server always returns within the same HTTP request, this would essentially define a stateless server. The server maintainer could decide at any time to start leveraging the callback URI as the server evolves and starts operating more asynchronously.
Coming to this thread a bit late, but speaking for Cloudflare Workers:
Statefulness is just fine for us. Durable Objects are all about handling stateful protocols. The original stateful MCP protocol over a WebSocket transport should be a great fit for MCP servers built on Workers.
A protocol involving session IDs would also be OK -- it's trivial for Workers to route requests with the same session ID to the same Durable Object, where its state is tracked. The main problem is lifecycle: if the MCP client disappears without explicitly ending the session, how does the MCP server decide when it can clean up? WebSockets are nice because you naturally clean up when the connection is closed. So MCP servers built on Workers would probably prefer a stateful WebSocket-based protocol, but could also live with session IDs.
I am not sure how a session token that "Encodes all session state itself" would work exactly, but it sounds like complexity that wouldn't benefit Workers users.
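For context, the session-ID routing described here is very small on Workers; a sketch (binding and header names are made up):

```typescript
export interface Env {
  MCP_SESSION: DurableObjectNamespace; // hypothetical binding name
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Same session ID -> same Durable Object, which holds the session state.
    const sessionId =
      request.headers.get("Mcp-Session-Id") ?? crypto.randomUUID();
    const id = env.MCP_SESSION.idFromName(sessionId);
    return env.MCP_SESSION.get(id).fetch(request);
  },
};
```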
Going back to the motivation behind my suggestion above: After a regular POST initialize handshake, the returned WS stream URL can be a signed URL that can embody anything from authentication to some sort of session state (and thus no need for custom headers).
This is super useful information.
One thing we currently ran into is that we really want a potential Websocket connection to work with WebAPIs. It turns out the browser/webapi for WS doesn't support adding custom headers for authentication, etc. I am curious what people think are potential options to ensure if we do Websockets that we can have arbitrary headers (as allowed per HTTP standard), while keeping webapi compat.
What I usually recommend is to use the `protocols` parameter from the WebAPI and send `Authorization` and the token as subprotocols.
That way you can deny the WebSocket connection and send a 401 HTTP response, because the connection was not upgraded yet.
Disclaimer: I maintain uvicorn and starlette, the server dependencies of the MCP package in Python.
> This is super useful information.
> One thing we currently ran into is that we really want a potential Websocket connection to work with WebAPIs. It turns out the browser/webapi for WS doesn't support adding custom headers for authentication, etc. I am curious what people think are potential options to ensure if we do Websockets that we can have arbitrary headers (as allowed per HTTP standard), while keeping webapi compat.
From the JVM perspective - so speaking entirely selfishly since I operate mostly in that space🙃 - websocket server implementations are fairly badly featured/inconsistent/untestable/trickier to secure, so I believe requiring them would definitely hinder adoption in that space. Our current SDK model is to provide standard protocol options for SSE + WS + straight JsonRpc (no streaming) + StdIo.
That said, a regular POST plus redirect to Websocket makes good sense here from a usability and consistency viewpoint - ie if you want streaming then expect an endpoint to be returned from the initial call.
It's indeed annoying that the in-browser WebSocket API does not allow you to set headers. IMO they should allow you to set the `Authorization` header at least. But we don't control that.
- I suspect most non-browser WebSocket implementations will let you set headers, since the specific security concerns motivating this restriction don't really apply outside a browser. Certainly Cloudflare Workers supports setting headers on outgoing WebSockets. Do we expect MCP clients to be running in-browser or do we expect this to be server-to-server communications?
- The "recommended" approach is of course to send credentials as the first message on the WebSocket. Of course, this doesn't always fit nicely into HTTP frameworks that would like to do authorization as middleware.
- A simple compromise is to just put the token in the URL. This might have some security concerns due to the fact that some systems may assume URLs are not secrets and might decide to log them and such, but perhaps that can be mitigated by using a single-use short-lived token? (I think this is what @daviddenton is suggesting, the initial POST would be authenticated and then redirect to a single-use secret URL.)
- I haven't heard of @Kludex's approach before of using the `protocols` parameter, but that does seem like a neat hack and I can't think of a real problem with it!
My feeling right now is: support the regular `Authorization` header, and also support stuffing the token into the `protocols` parameter as a work-around for browsers. Show what we've done to browser-makers and hope that it bothers them enough that they just start supporting setting the `Authorization` header directly in the API.
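A server-side sketch of that dual path, using the popular Node `ws` package; the subprotocol convention and `verifyToken` are placeholders of mine, not established conventions:

```typescript
import type { IncomingMessage } from "node:http";
import { WebSocketServer } from "ws";

// Hypothetical token check; substitute real JWT or session validation.
function verifyToken(token: string | undefined): boolean {
  return !!token && token.length > 0;
}

function extractToken(req: IncomingMessage): string | undefined {
  // Path 1: a regular Authorization header (non-browser clients).
  const header = req.headers.authorization;
  if (header?.startsWith("Bearer ")) return header.slice("Bearer ".length);
  // Path 2: a credential smuggled as a subprotocol (browser clients).
  const protos = String(req.headers["sec-websocket-protocol"] ?? "")
    .split(",")
    .map((p) => p.trim());
  return protos
    .find((p) => p.startsWith("authorization.bearer."))
    ?.slice("authorization.bearer.".length);
}

const wss = new WebSocketServer({
  port: 8080,
  // Echo back only the "real" subprotocol, never the credential.
  handleProtocols: (protocols) => (protocols.has("mcp") ? "mcp" : false),
  // Reject before the upgrade completes, so the client sees a plain 401.
  verifyClient: ({ req }: { req: IncomingMessage }, done: (ok: boolean, code?: number, msg?: string) => void) =>
    done(verifyToken(extractToken(req)), 401, "Unauthorized"),
});
```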
In a multi-tenant setup, what defines a session's duration: the user, the thread, the connection, the turn, or the tool call?
we're working on solving internal operations things over at SST for our users, and letting them ship tools in a lambda is super important. it becomes a no-brainer vs something they have to think about if it has to be containerized
option 2 is obviously the simplest for us - and we actually already built this in the short term so we can get moving: a bridge MCP server that can talk to a stateless implementation of the MCP protocol hosted at some URL
Totally agreed.
Following the discussions, I think there are a few ways to go about this:
- Replace SSE with gRPC or WebSockets. Given that the current protocol specification requires SSE for the notification mechanism (asking the client for LLM output, which also acts like a human in the loop), removing it will be complicated, especially with many current server implementations dependent on it. Also, WebSockets are already part of the SDKs, so working around it for Lambda functions should be straightforward.
- Build a stateful MCP server and give it access to a serverless environment as a tool. This also seemingly satisfies the original motivation for this discussion. The core idea is to build an MCP server with the current stateful approach and then build a comprehensive set of tools the MCP server has access to with the help of the client library. This will provide support for execution environments, prompt-based serverless function invocation, etc. This also aligns with the MCP roadmap for providing support for server sandboxing, which will work out of the box if this approach is adopted.
For client -> server - Just remove the SSE transport from the spec and have everyone use stdio. Developers are free to implement any protocol they wish to connect to their web service and then expose the client as an MCP-server.
This is the "paving the cowpaths" way, it's what most MCP servers in the wild are already doing (e.g Dax's comment), and it leaves developers to come up with the best solution for their needs. This also leaves the door open to future standardization on (possibly multiple) protocols more suited to client -> server.
(MCP Host / Client) --- MCP/STDIO ---> (standardized MCP Server/Web Client) --- Some new thing ---> (Web Service)
The SSE transport could still be used - but now via a standard client: `npx mcp-see https://my-cool-tool.ai/mcp`. Similar clients could be made for websockets etc.
I think server -> server is a completely different problem (i.e. the host application/mcp client is a web app) - but here tbh I think a completely different protocol would make more sense, so you can take advantage of standard conventions like http callbacks.
For server->server we were working under the assumption that each server can simply connect to the other as a client - that would be simple and mean that each participant had access to the full range of tools from the other.
Server-to-server, a server that hosts an agent (drawn with a red process boundary in the diagram accompanying the original comment) is also an MCP client.
Good morning, folks!
Maybe I’m too unfamiliar with this subject to offer a fully informed opinion, but I can share my experience with MCP as a developer user.
From my perspective, I’d go all in with HTTP requests. It could significantly increase the number of available servers since it opens up opportunities for people to monetize them. In my experience with MCP, a single request is usually enough to get what I need—I don’t have to listen for ongoing updates. This makes synchronous communication simple to implement and straightforward to use.
I suggest keeping the current SSE approach but adding this new HTTP-based option, each with its own pros and cons. The server’s developer can then decide which protocol best suits their needs.
Just sharing my two cents—keep rocking!
From the perspective of those of us building out MCP SDKs: adding straight stateless JSON-RPC (without negotiation, and hence no notifications or sampling) was actually quite simple, as was WS - so it seemed a shame not to speculatively add it. 🙃
From my understanding, the biggest issue with supporting standard HTTP endpoint calls is that there isn't a means for the server to do `sampling`, same-connection resource change subscriptions, or general server-initiated communication. But what if that is the trade-off we're asking the spec to make? Every MCP server that wants to do server-initiated communication should use SSE/WS-style transports. This is just the common expectation for the web in general: you either implement polling/SSE/WS or you're looking at client-initiated transactions only.
I can only think of a handful of use cases that would want to support `sampling`, but countless that only want to expose their capabilities reliably using the systems they understand today. Will this change in the future? Maybe... but we should build a system that supports developers today, with the option to evolve it to support future capabilities. This is why there is a capabilities negotiation layer.
I agree that looking in the ecosystem today it seems like >90% of all MCP servers are doing stateless things that do not require subscriptions or ongoing connections. Most of them are tools or prompts.
To that end, it seems like Option 2 could be pursued -- capabilities negotiation can determine whether a stateful connection is required, and servers and clients both can avoid the added complexities of persistent connections if they don't need it. Option 1 could be pursued in parallel to make it easier to build stateful connections on stateless infra (along with e.g. a webhook transport spec).
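The capability objects in the spec already have the right knobs for this. Purely as a sketch, a "stateless" server could answer `initialize` with everything that implies a live channel switched off (field names follow the spec; which capabilities a stateless mode would actually permit is exactly what Option 2 would have to define):

```typescript
// Illustrative initialize result for a server that never pushes anything.
const statelessInitializeResult = {
  protocolVersion: "2024-11-05",
  serverInfo: { name: "my-stateless-server", version: "1.0.0" },
  capabilities: {
    tools: { listChanged: false },                       // no tool-list push
    prompts: { listChanged: false },                     // no prompt-list push
    resources: { subscribe: false, listChanged: false }, // no subscriptions
    // no logging, and nothing else that requires server-initiated traffic
  },
};
```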
Exactly. Consider game creation in AI coding agents. It'll be pretty wizzy for the agent to be able to convert a user's plain English request into media asset discovery / generation & procurement, directly into the project. At best it's stateless, and at worst it needs an auth token for context just like normal web services.
👍
HTTP has this right:
- Stateless by default
- Session token on initial connect if stateful
- Use existing UPGRADE mechanism to switch to websocket when desirable
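Sketched as code, under invented names (the `X-MCP-Session` header and the upgrade URL are placeholders, not spec):

```typescript
// Stateless by default: every semantic request is an independent POST.
async function rpc(baseUrl: string, body: unknown, session?: string) {
  const res = await fetch(`${baseUrl}/mcp`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      ...(session ? { "X-MCP-Session": session } : {}), // hypothetical header
    },
    body: JSON.stringify(body),
  });
  // A stateful server can mint a session token on first contact.
  return {
    json: await res.json(),
    session: res.headers.get("X-MCP-Session") ?? session,
  };
}

// Upgrade to WebSocket only when server push is actually desirable.
function openPushChannel(baseUrl: string, session: string): WebSocket {
  return new WebSocket(`${baseUrl.replace(/^http/, "ws")}/mcp?session=${session}`);
}
```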
@cramforce I agree with this completely.
If we stick closely to stateless HTTP (w/ WebSocket upgrade), then MCPs are "just another kind of server" from the perspective of quite a bit of infrastructure. This is a very valuable property, and it's hard to find any MCP constraints that make it so special that it needs a special kind of network protocol.
Personally, I'd go with Option 3.
MCP is supposed to make it easy for AI agents to integrate with tools and resources. This is a data integration problem, and the industry standard for integrating data across platforms is REST APIs. That is what 99% of companies will already have up and running. The burden of integration for MCP falls largely on server developers - and expecting them not only to create a new set of endpoints but to run their software in an entirely different way (requiring long-running servers) feels absurd to me.
You could argue that it is to support additional capabilities. But the two main capabilities I am seeing above are 'sampling' and the server informing the client about updated resources/capabilities. The latter is easily solved 90% of the time by the client polling the server - and for the last 10%, the server can simply reply with a 400-level error.
As for 'sampling': I believe this is an anti-pattern and should be out of scope for MCP. If servers need AI capabilities to properly respond to tool/resource requests, they should implement that behind their API; they shouldn't have to depend on the unpredictable AI capabilities of an unknown client. I don't think this capability should even be something servers are able to do. It creates security issues, since servers can covertly request sensitive data that clients may have, and it adds unnecessary risk for client developers, since servers can effectively spend the client's AI tokens. I'm not sure why a client developer would even build support for sampling given these concerns (what do they really have to gain?) - speaking of which, none of the currently documented clients support sampling: https://modelcontextprotocol.io/clients
Any other more complex server-client interactions should be handled by multiple separate tool/resource calls.
In my opinion, a stateless version is an absolute must. Many developers are using serverless solutions and long-running servers/connections are a non-option for them. So at a minimum, we should go with Option 2. But I would go a step further and simplify the protocol by removing features which (in my opinion) shouldn't be there in the first place.
It's not a dumb comment, and it's true; however, the spec isn't oriented around these stateless approaches. Another challenge is that local computers would require a webhook to execute this as well. That itself is not an insurmountable problem, but it would open up a lot of new problems around allowing servers to call back into local clients. For web
Sampling is the one part of the spec that didn't make sense to me. What are some examples as to why you would use sampling instead of treating the tool as an independent sub-agent with its own access to run a model and context? If it is simply about saving the tool from configuring/running a model then I wonder if the MCP Host could provide a signed URL that the tool could make a simple request to.
@richard-rance, to answer your question: you can imagine a tool that acts like a wizard in order to satisfy an agent request. Let's say I provide a GitHub MCP server, and I want to expose a tool that allows agents to set up their own GitHub repos. There are countless ways you could configure a base GitHub repo, given the number of templating tools, licenses, directory structures, languages, build tools, and so forth. You could handle all of these degrees of freedom with the set of parameters to the tool, but this would likely cause your agent to have a high error rate when using the tool, since that many configuration options naturally invites mistakes.
An alternative approach would be to guide the agent to its desired GitHub repo configuration via step-by-step questions. You might start by asking high-level questions like "what programming language?" or "what license?", and then go narrower throughout the flow. The host agent is still best positioned to answer these questions, as it has the full context on the codebase being set up and can presumably ask the human user questions. Implementing this outside of MCP is pretty trivial, but within MCP you pretty much need something like sampling or this is impossible.
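To make the wire traffic concrete: the method name and message shape below follow the spec's `sampling/createMessage`, while the wizard question itself is invented for this example:

```typescript
// A server -> client JSON-RPC request asking the host's model one wizard step.
const samplingRequest = {
  jsonrpc: "2.0",
  id: 42,
  method: "sampling/createMessage",
  params: {
    messages: [
      {
        role: "user",
        content: {
          type: "text",
          text: "Which license should the new repository use? Answer with one of: MIT, Apache-2.0, GPL-3.0.",
        },
      },
    ],
    maxTokens: 16,
  },
};
```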
@Mocuto this is actually the first example of sampling that I have ever seen written down! It's very difficult to reason about without one - and it would be great to see examples like this on a sequence diagram IMHO. I have even fed the entire spec to Claude 3.7 on several occasions and asked it to come up with a worked example - and it failed miserably to grasp how the servers would interact, always getting confused about the direction of messages and cause and effect. 🤷
@sean-roberts, I presume that if Option 3 is taken, a large rewrite would be needed to move to stateless. As for local computers, I don't see what the issue is with webhooks: assuming the MCP server is running locally, you'd simply need to pass the localhost port and webhook URL. Things become tricky with a remote server, but an ingress app isn't that high a bar to clear, and it would enable a lot of additional use cases as well.
I've run into similar problems in production. I think I've found a relatively good abstraction (along the lines of Option 1) that lets the MCP server keep state via a session_id while still scaling well the traditional way:
- Re-abstract a `Transport` based on https://nchan.io/. Unlike `SseServerTransport`, this `Transport` is an nginx module that sits in front of the MCP server, acting like a `Gateway`, and it supports both SSE and WebSocket.
- Forward the `ClientRequest` sent by the MCP client to the backend MCP server via `nchan_publisher_upstream_request`.
- Implement an HTTMCP that inherits from FastMCP (the built-in decorators can be used to define tools/resources...).
- Abandon the built-in server.run startup method and instead register all request_handlers on an APIRouter.
- Finally, mount one or more MCP servers on the FastAPI app and deploy them as ordinary web APIs.
System availability:
- nchan is purpose-built IM-communication software that supports fairly high concurrency.
- nchan acts as both the `Transport` and the `Gateway`; it is built on nginx, itself a high-performance web server. nginx's upstream is used to talk to the real MCP server over plain HTTP, so the backend can be developed with any traditional web server.
System scalability:
- nchan supports horizontal scaling with a Redis cluster as storage.
- Since nginx upstream handles communication with the MCP server, the backend MCP servers can also easily be scaled out by adding load balancing.
I drew a sequence diagram, which should illustrate the entire process well:
- the session_id (which is equivalent to nchan_channel_id) will remain from the start of the connection until the connection ends.
- If the task has a short execution time, it can return directly.
- If the task has a long execution time, it can be executed in an asynchronous queue, only requiring progress to be pushed in between and the final result to be pushed at the end.
```mermaid
sequenceDiagram
    MCP Client->>NCHAN: connect
    activate NCHAN
    MCP Client-->>NCHAN: jsonrpc request
    NCHAN-->>FastAPI: nchan_publisher_upstream_request
    FastAPI-->>MCP Server: call_tool(name, args)
    MCP Server-->>FastAPI: result
    FastAPI-->>NCHAN: jsonrpc response
    NCHAN-->>MCP Client: jsonrpc response
    MCP Client-->>NCHAN: jsonrpc request
    NCHAN-->>FastAPI: nchan_publisher_upstream_request
    FastAPI-->>MCP Server: call_tool(name, args) in backend
    MCP Server-->>NCHAN: push notification
    NCHAN-->>MCP Client: notification
    MCP Server-->>NCHAN: push jsonrpc response
    NCHAN-->>MCP Client: jsonrpc response
    NCHAN->>MCP Client: close
    deactivate NCHAN
```
I'm late to the conversation and just a random guy who started using and implementing MCPs recently.
Option 1 makes the most sense to me, and the con could be mitigated since I think the state management could be handled as a server-agnostic persistence wrapper that the serverless containers would use.
Basically with a session state structure like this:
```typescript
interface Subscription {
  subscribeRequest: SubscribeRequest;
  resourceHash: string;
}

interface MCPSessionState {
  initializeMessage: InitializeRequest;
  latestResponses: {
    toolList?: Tool[];
    promptList?: Prompt[];
    resourceList?: Resource[];
    resourceTemplateList?: ResourceTemplate[];
  };
  subscriptions: Subscription[];
}
```
Starting a new session
- Initialize a new session state object with the received `initialize` request
- Store it somewhere using the session ID (assuming the transport layer generates that somehow)
- Spin up the actual server as subprocess etc. and bridge all communication
- Remember which of the `listChanged` capabilities are available
During a running session (while the serverless container is alive)
- When responding to a `{feature}/list` request and the server has the respective `listChanged` capability:
  - Store the server response in the respective `latestResponses` property
- When receiving a `resources/subscribe` request from the client, remember the subscription by:
  - Storing the subscription message
  - Retrieving the resource from the server, calculating a hash, and storing that
- When receiving a `resources/unsubscribe` request:
  - Remove the matching subscription
- When the server sends a `notifications/resources/updated` notification:
  - Retrieve the resource yourself too and update the hash of the subscription
Continuing a running session (in a new serverless container etc.)
- Retrieve the session state object via the session ID
- Spin up the server
- Replay the `initialize` message
  - Ignore the response (assuming the MCP servers are deterministic with regard to their initialize message exchanges)
- For each stored latest response:
  - Retrieve the current list from the server
  - If it is different from the stored result, send a `notifications/{feature}/list_changed` notification to the client and update the stored value
- For each subscription:
  - Replay the subscription message
  - Handle resource changes that occurred between session interruption and continuation:
    - Retrieve the resource and compare its hash to the stored hash; if changed, send a `notifications/resources/updated` notification to the client
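As a sketch of that continuation phase, building on the `MCPSessionState` interface above; the `ServerHandle`, `loadState`/`saveState`, and `readResourceHash` helpers are hypothetical stand-ins, not SDK APIs:

```typescript
// Minimal handle over the respawned server; a real bridge would wrap an SDK client.
interface ServerHandle {
  request(method: string, params?: unknown): Promise<unknown>;
  notifyClient(method: string, params?: unknown): Promise<void>;
}

declare function loadState(sessionId: string): Promise<MCPSessionState>;
declare function saveState(sessionId: string, state: MCPSessionState): Promise<void>;
declare function readResourceHash(server: ServerHandle, uri: string): Promise<string>;

async function continueSession(sessionId: string, server: ServerHandle): Promise<void> {
  const state = await loadState(sessionId);

  // Replay initialize and ignore the response (assumes deterministic servers).
  await server.request("initialize", state.initializeMessage.params);

  // Diff each cached list against the live server; notify the client on change.
  const lists: Array<[string, unknown]> = [
    ["tools", state.latestResponses.toolList],
    ["prompts", state.latestResponses.promptList],
    ["resources", state.latestResponses.resourceList],
  ];
  for (const [feature, cached] of lists) {
    if (cached === undefined) continue; // server lacks this listChanged capability
    const current = await server.request(`${feature}/list`);
    if (JSON.stringify(current) !== JSON.stringify(cached)) {
      await server.notifyClient(`notifications/${feature}/list_changed`);
      // ...and update the stored value in state.latestResponses here.
    }
  }

  // Replay subscriptions, then catch up on changes missed while suspended.
  for (const sub of state.subscriptions) {
    const params = sub.subscribeRequest.params as { uri: string };
    await server.request("resources/subscribe", params);
    const hash = await readResourceHash(server, params.uri);
    if (hash !== sub.resourceHash) {
      sub.resourceHash = hash;
      await server.notifyClient("notifications/resources/updated", params);
    }
  }

  await saveState(sessionId, state);
}
```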
Of course, if the protocol evolved to add, for example, something like a "HEAD"-style request for lists and resources that returns only a `last_updated_at` timestamp, then some of the bookkeeping inside the wrapper could become much simpler, since only the timestamps would need to be compared.
But that might be out of scope for this discussion... This idea assumes the current stable protocol.
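For illustration only, such a request/response pair might look like this; neither the method name nor the field exists in the protocol today:

```typescript
// Entirely hypothetical "HEAD-style" exchange for cheap change detection.
const headRequest = {
  jsonrpc: "2.0",
  id: 7,
  method: "resources/list_head", // invented method name
};
const headResponse = {
  jsonrpc: "2.0",
  id: 7,
  result: { last_updated_at: "2025-03-01T12:00:00Z" }, // only a timestamp, no payload
};
```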
This seems to be a good option:
https://blog.blaxel.ai/how-we-optimized-mcp-servers-for-serverless
WebSockets: a game-changer for MCP
Facing these challenges, we made the decision to switch from SSE to WebSockets for all our MCP server communications. Despite the fact that MCP documentation doesn't extensively discuss WebSockets, they are officially supported - and as we discovered, they work significantly better in cloud environments.
Why WebSockets outperform SSE for MCP servers
WebSockets establish a persistent, full-duplex TCP connection between client and server, allowing for bidirectional communication. This architecture offers several advantages over SSE for MCP servers:
- Connection Stability: WebSockets maintain more stable connections, with built-in mechanisms for handling disconnections and reconnections.
- Bidirectional Communication: While MCP often doesn't require extensive client-to-server communication, having the capability for bidirectional data flow eliminates the need for separate HTTP requests for client-initiated actions.
- Binary Data Support: WebSockets can transmit both binary data and UTF-8 text, whereas SSE is limited to UTF-8. This provides more flexibility for different types of data exchange.
- Better Performance: WebSockets typically offer lower latency and overhead compared to SSE, especially for frequent communications.
- No Connection Limits: WebSockets don't suffer from the same browser connection limits as SSE, making them more suitable for applications where users might have multiple tabs open.
Forking Supergateway
To implement our WebSocket solution, we forked the Supergateway project and modified it to use WebSockets instead of SSE. The core changes involved:
- Protocol Adaptation: Modifying the communication layer to use WebSocket protocol instead of HTTP streaming.
- Connection Management: Implementing robust connection handling with automatic reconnection logic.
- Error Handling: Enhancing error detection and recovery mechanisms to ensure reliable operation in cloud environments.
- Scaling Optimizations: Adding features to better support horizontal scaling across multiple instances.
Our modified version of Supergateway is available on GitHub as Blaxel's Supergateway, and we welcome contributions and feedback from the community!
Technical implementation: WebSockets for MCP
For those interested in the technical details, here's how we implemented WebSockets for our MCP servers. Please note that the entire code can be found in open-source on our GitHub on Blaxel's Supergateway and Blaxel’s SDK.
Kudos on this solution, and the article was a great read, but it's unclear to me how this resolves the stateful/stateless debate. Wouldn't long-lived serverless functions be an anti-pattern?
Yeah, it kind of is an anti-pattern. I read this tonight: AWS Lambda now supports hosting Model Context Protocol (MCP) servers, which is a significant development for serverless AI applications. This support is enabled through Lambda's response payload streaming feature, which allows for progressive streaming of response payloads back to clients.
So MCP is now supported in a serverless environment. The article's author used Cloudflare Workers (similar to response payload streaming) in his solution for hosting MCP in a serverless environment. Lambda can also be used in conjunction with Amazon API Gateway to create WebSocket APIs. This integration allows for bidirectional communication between clients and servers, enabling real-time data exchange.
Implementation:
When a client initiates a WebSocket connection, API Gateway invokes a Lambda function associated with the $connect route. Lambda functions can be triggered for connection establishment, disconnection, and message handling. The server (Lambda) can push messages to connected clients using the API Gateway Management API.
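As a rough sketch of that wiring (the route keys and management API are real API Gateway/AWS SDK concepts; the JSON-RPC payload and persistence details are placeholders):

```typescript
import {
  ApiGatewayManagementApiClient,
  PostToConnectionCommand,
} from "@aws-sdk/client-apigatewaymanagementapi";

// One Lambda handling the $connect, $disconnect, and $default WebSocket routes.
export const handler = async (event: any) => {
  const { routeKey, connectionId, domainName, stage } = event.requestContext;

  if (routeKey === "$connect") return { statusCode: 200 };    // persist connectionId here
  if (routeKey === "$disconnect") return { statusCode: 200 }; // forget connectionId here

  // $default: push a (placeholder) JSON-RPC response back over the socket.
  const client = new ApiGatewayManagementApiClient({
    endpoint: `https://${domainName}/${stage}`,
  });
  await client.send(
    new PostToConnectionCommand({
      ConnectionId: connectionId,
      Data: Buffer.from(JSON.stringify({ jsonrpc: "2.0", id: 1, result: {} })),
    }),
  );
  return { statusCode: 200 };
};
```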
Key aspects of this support include:
- Streaming capability: Lambda functions can now stream response payloads as they become available, enabling the real-time communication necessary for MCP servers.
- Larger payloads: the new feature supports payloads up to a soft limit of 20 MB, which is crucial for handling the potentially large data transfers involved in MCP interactions.
- Long-running operations: MCP servers often need to maintain longer connections, which is now possible with Lambda's ability to perform long-running operations while reporting incremental progress.
- MCP2Lambda bridge: a tool called MCP2Lambda has been developed to act as a bridge between MCP clients and AWS Lambda functions, allowing generative AI models to access and run Lambda functions as tools.
- Security: this approach implements segregation of duties, allowing models to invoke Lambda functions without direct access to other AWS services, enhancing security in AI applications.
- Autodiscovery: the MCP server can autodiscover Lambda functions in your account, making it easier to expose multiple functions as tools to AI models.
Could the MCP team please add WebSockets? It personally feels like chasing the "smartest universal solution" has led to overly increased complexity, making it hard to adopt and build on top of the tech, and no one is happy...
SSE: a promising start with disappointing results
During our search for better solutions, we discovered Supergateway, a tool that wraps stdio-based MCP servers with Server-Sent Events (SSE). On paper, this looked like an elegant solution.
For those unfamiliar with SSE, it's a technology that establishes a one-way communication channel from server to client over HTTP. Unlike WebSockets, which provide full-duplex communication, SSE is designed specifically for server-to-client updates. This makes it seemingly ideal for scenarios where clients primarily need to receive updates from servers.
We implemented Supergateway with SSE, but quickly ran into significant issues:
What are the problems with SSE in serverless environments
- Connection Instability: In serverless environments, SSE connections dropped randomly and frequently. This is particularly problematic for AI agents that need reliable, persistent connections to function properly.
- Scaling Challenges: As we tried to scale our platform, the limitations of SSE became increasingly apparent. The protocol wasn't designed with cloud-native architectures in mind.
- Browser Connection Limits: SSE suffers from a limitation to the maximum number of open connections, which is set to a very low number (6) per browser and domain. This became problematic when users opened multiple tabs.
- Proxy and Firewall Issues: Some proxies and firewalls block SSE connections because they don't have a Content-Length header, creating deployment challenges in enterprise environments.
After extensive testing, we concluded that while SSE might work well for simpler use cases or controlled environments, it wasn't robust enough for our cloud-based AI agent platform.
(Edited to replace the screenshot with text in the original post. Point 2 above was highlighted. - @atesgoral)
WebSockets can surely be a more ergonomic transport for persistent connections, especially since they provide full-duplex message transfer out-of-the-box.
However some of the SSE criticism and WebSocket praising above is a bit misinformed.
SSE is not a protocol upgrade in the same sense WebSockets is. SSE is plain old HTTP, using "text/event-stream" as content type, with the server sending events as streaming body chunks. It's up to the SSE server and clients to reestablish the connection and make use of SSE event ids for reliable delivery. But most clients are implemented very naively and are not robust. This is not an inherent problem with SSE itself. SSE actually has some built-in reconnection capabilities in the spec (Last-Event-ID header and automatic reconnection in the EventSource API) that are often overlooked.
WebSocket is also not this magical transport with connection and delivery guarantees. It's just a persistent TCP socket. It's still up to the clients and servers to properly implement robustness measures like pings and reconnects. There are mature WebSocket client/server libraries out there that can create the impression that robustness is a built-in feature of WebSockets. You can experience this by implementing your own WebSocket client from scratch, and then reinventing all the robustness measures that libraries that have been around for a while have implemented.
Also, in the end, SSE is also just another persistent TCP socket, but with only one side doing the talking (if we ignore HTTP/3 QUIC).
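To illustrate those overlooked built-ins, here's a small sketch of resumable SSE delivery; `eventsSince` is a hypothetical replay buffer, and the rest uses only standard `EventSource`/Node behavior:

```typescript
import http from "node:http";

declare function eventsSince(id: number): Array<{ id: number; data: string }>;

// Server: resume the stream from whatever the client last saw.
http
  .createServer((req, res) => {
    const lastId = Number(req.headers["last-event-id"] ?? 0);
    res.writeHead(200, {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache",
    });
    for (const event of eventsSince(lastId)) {
      res.write(`id: ${event.id}\ndata: ${event.data}\n\n`);
    }
  })
  .listen(8080);

// Client (browser): EventSource reconnects automatically after a drop and
// sends the Last-Event-ID request header for us; no hand-rolled retry logic.
const es = new EventSource("http://localhost:8080/stream");
es.onmessage = (ev) => console.log(ev.lastEventId, ev.data);
```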
Thank you all for the discussion here! This has been a very thorny topic with a lot of tradeoffs, no matter which path we pursue.
Incorporating all of the input we've heard here and elsewhere, we are proposing a new Streamable HTTP transport, which roughly amounts to plain HTTP plus optional SSE—very similar to what @jerome3o-anthropic described above. Please take a look at that PR description and the associated spec language, and flag any questions or concerns!
Note that other options, like WebSocket, remain possible in future, but we believe strongly this is the right path to pursue for now. All of your comments are greatly appreciated! 🙇
Nice. Go simple live life!!!
When will the Python SDK be supported?
When will this be released?