| deploy_vm(vm) | ||
| ``` | ||
| **Advantages:** |
There was a problem hiding this comment.
Suggested change: replace "**Advantages:**" with "#### Advantages:"
| * **Minimal Complexity:** Protocol complexity is minimal, requiring no | ||
| additional changes. | ||
| **Disadvantages:** |
There was a problem hiding this comment.
Suggested change: replace "**Disadvantages:**" with "#### Disadvantages:"
There was a problem hiding this comment.
The main downside I see of this is that it adds a burden to the client's orchestration to be able to pass the correct handles back and forth. I'm not saying this is overriding but just to make sure we understand this, it would mean one of two things on the client side:
1. The model would need to be given the handles in its context when deciding to call a tool, in such a way that it correctly passes the handles back as parameters.
2. The client would need to maintain hooks that intercept / augment tool calls to ensure handles are passed correctly. Let's take the example of create_basket() and assume the client only wants to create one basket per session and is worried that (1) above would be flaky. For the duration of whatever the client defines as a session, every time create_basket() is called, the hook might return the original response to that tool call rather than calling the underlying MCP server.
The advantage of a session id is that it would remove the need for (1) and (2), and would do so in a general way. The counterargument is that, as we expect models and architectures to get better at supplying the right context, (1) above should become less and less of an issue.
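To make the hook-based approach above concrete, here is a minimal sketch of a client-side wrapper that replays the first create_basket() response for the rest of a client-defined session. The class name `OncePerSessionHook`, the `session_key` argument, the `basket_id` field, and the `fake_server` stand-in are all assumptions for illustration; nothing here is an MCP SDK API.

```python
from typing import Any, Callable, Dict, Set, Tuple

# Signature of whatever function actually sends a tool call to the MCP server
# (hypothetical; a real client would wrap its SDK's call here).
ToolCall = Callable[[str, Dict[str, Any]], Any]


class OncePerSessionHook:
    """Replays the first result of selected tools within one client-defined session."""

    def __init__(self, call_tool: ToolCall, once_tools: Set[str]) -> None:
        self._call_tool = call_tool
        self._once_tools = once_tools
        self._first_results: Dict[Tuple[str, str], Any] = {}

    def call(self, session_key: str, name: str, args: Dict[str, Any]) -> Any:
        # Tools that are not marked "once per session" pass straight through.
        if name not in self._once_tools:
            return self._call_tool(name, args)
        key = (session_key, name)
        # Only the first call in a session reaches the server; later calls
        # get the stored response, so the handle stays stable.
        if key not in self._first_results:
            self._first_results[key] = self._call_tool(name, args)
        return self._first_results[key]


# Stand-in for the real server, used only to show the behaviour.
server_calls = []


def fake_server(name: str, args: Dict[str, Any]) -> Dict[str, str]:
    server_calls.append(name)
    return {"basket_id": f"basket-{len(server_calls)}"}


hook = OncePerSessionHook(fake_server, {"create_basket"})
first = hook.call("chat-1", "create_basket", {})
second = hook.call("chat-1", "create_basket", {})  # replayed, no server call
other = hook.call("chat-2", "create_basket", {})   # new session, new basket
```

Note that the client still has to decide what `session_key` means, which is exactly the orchestration burden this comment describes.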
| As we weigh the advantages and disadvantages of both proposals, multi-agent | ||
| orchestration presents a significant unresolved challenge: |
There was a problem hiding this comment.
nit: should this be after both proposals?
There was a problem hiding this comment.
I moved it here because it seemed like the open questions only applied to Option A.
| #### Disadvantages: | ||
| * **Protocol Complexity:** Retains the concept of state within the protocol | ||
| definition, requiring servers to manage state lifecycles, TTLs (Time to Live), |
There was a problem hiding this comment.
nit: "manage state lifecycles, TTLs (Time to Live), and garbage collection" is a function of whether the server wants to maintain state, so it is the same in both proposals. In Option A it's linked to the session id, if the server chooses to support sessions. In Option B it's just linked to the handle.
The bigger issue with this option is the extra complexity of the session handshake and when it should happen.
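As a rough illustration of the point that lifecycle management exists either way, the sketch below shows server-side state keyed by an opaque handle with a TTL and a garbage-collection pass. The same structure would work keyed by a session id instead. `HandleStore` and its method names are hypothetical, and the injectable `clock` is only there so the expiry can be demonstrated without sleeping.

```python
import secrets
import time
from typing import Any, Callable, Dict, Tuple


class HandleStore:
    """Server-side state keyed by an opaque handle, with a fixed TTL."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[Any, float]] = {}

    def create(self, state: Any) -> str:
        # The handle is returned to the client, which passes it back
        # as an ordinary tool parameter on later calls.
        handle = secrets.token_urlsafe(16)
        self._entries[handle] = (state, self._clock() + self._ttl)
        return handle

    def get(self, handle: str) -> Any:
        state, expires_at = self._entries[handle]  # KeyError if unknown
        if self._clock() >= expires_at:
            raise KeyError(handle)  # expired handles look like unknown ones
        return state

    def collect_garbage(self) -> int:
        # Drop every expired entry and report how many were removed.
        now = self._clock()
        expired = [h for h, (_, exp) in self._entries.items() if now >= exp]
        for h in expired:
            del self._entries[h]
        return len(expired)


# Demonstration with a fake clock.
now = [0.0]
store = HandleStore(ttl_seconds=10.0, clock=lambda: now[0])
basket = store.create({"items": []})
state_at_start = store.get(basket)
now[0] = 11.0  # move past the TTL
removed = store.collect_garbage()
```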
There was a problem hiding this comment.
I was trying to capture the complexity that @markdroth mentioned here -- essentially protocol-level TTLs and changes between interactions of requests in a session -- but didn't quite nail it.
| tool/call: list_tools() -> Returns: [connect] | ||
| tool/call: execute_query("SELECT 1") -> ERROR: Tool not found | ||
| tool/call: connect($DATABASE_URI) -> Success | ||
| tool/call: list_tools() -> Returns: [execute_query, list_tables, ...] |
There was a problem hiding this comment.
Separate from this, but what would be the trigger for the client to call list_tools() again here?
There was a problem hiding this comment.
I was going to point this out as well. This is going to be a problem, because IIUC the SDKs handle caching the tool list and won't refresh until they have some indication that the tool list has changed. While it's true that we are going to retain the ability for the client to subscribe to tool-list-changed notifications as an optional optimization, there won't be a way to tie that notification to the client that called connect($DATABASE_URI) once we remove sessions. So clients will not actually see the new tool list until the TTL expires.
I think this would be a problem only if we go with option A below. And under that option, if we do decide that this particular use-case is important, there are other ways we could consider handling it. For example, we could put a bit in the tool call response that tells the client to invalidate its tool list cache.
There was a problem hiding this comment.
I think we've previously discussed a combination of things like TTLs and notification-style changes (e.g. returning an indicator on a tool result that info needs to change). I agree there's future work here, but I think it's solvable (and likely has some value outside of these specific use-cases).
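The idea raised in this thread, a flag on the tool call response that tells the client to drop its cached tool list, could look roughly like the sketch below. It is not an existing protocol feature: the `tools_changed` field, the `ToolListCache` class, and the in-memory stand-in server are all assumptions made for illustration.

```python
import time
from typing import Any, Callable, Dict, List, Optional


class ToolListCache:
    """Client-side tool list cache with a TTL and explicit invalidation."""

    def __init__(self, fetch_tools: Callable[[], List[str]], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch_tools = fetch_tools
        self._ttl = ttl_seconds
        self._clock = clock
        self._tools: Optional[List[str]] = None
        self._expires_at = 0.0

    def get(self) -> List[str]:
        now = self._clock()
        # Refetch when nothing is cached or the TTL has run out.
        if self._tools is None or now >= self._expires_at:
            self._tools = self._fetch_tools()
            self._expires_at = now + self._ttl
        return self._tools

    def observe_result(self, result: Dict[str, Any]) -> None:
        # "tools_changed" is a hypothetical field on the tool result.
        if result.get("tools_changed"):
            self._tools = None


# Stand-in server for the connect / execute_query example.
server = {"connected": False, "list_calls": 0}


def fetch_tools() -> List[str]:
    server["list_calls"] += 1
    return ["execute_query", "list_tables"] if server["connected"] else ["connect"]


def call_connect() -> Dict[str, Any]:
    server["connected"] = True
    return {"content": "Success", "tools_changed": True}


cache = ToolListCache(fetch_tools, ttl_seconds=300.0)
before = cache.get()
cached = cache.get()                   # served from cache, no refetch
cache.observe_result(call_connect())   # flag invalidates the cache
after = cache.get()                    # refetched without waiting for the TTL
```

Because the flag travels on the response to the caller's own request, it does not need a session to route a notification back to the right client.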
There was a problem hiding this comment.
It's an advantage of handles, because then you don't need to hide the execute_query tool. If only the connect tool is given, then the model may not believe that using it will help it achieve its goals without the other tools being present in the context to use that connection as well.
| As we weigh the advantages and disadvantages of both proposals, multi-agent | ||
| orchestration presents a significant unresolved challenge: | ||
| * **Sub-Agent Orchestration:** It is currently unclear how sessions or state |
There was a problem hiding this comment.
Isn't this just up to the client application to determine? It can either use the session from the parent or isolate depending on its use case.
In that way it is somewhat similar to option B where option B could initialize the subagent with some context including required handles.
There was a problem hiding this comment.
I think the answer to that is yes, but in this case the session data is handled by the application and not the agent. So it's likely going to need to be standardized in some fashion -- if half the servers assume that a subagent should use a new session and half assume it should use the existing session, servers will have a hard time matching both.
| **Disadvantages:** | ||
| * **Breaking Change:** If applications are using sessions today for application |
There was a problem hiding this comment.
Another thing to consider in this vein is that today, clients likely have the infrastructure to maintain session ids for the MCP servers they are interacting with in some form. If we remove sessions completely, it will be harder to add back later, because those clients likely will have removed that infrastructure.
markdroth
left a comment
There was a problem hiding this comment.
Thanks for writing this up, Kurtis!
| as a specific user context, a continuous conversation thread, or stateful tool | ||
| operations. | ||
| ### **Why Sessions Need to Change** |
There was a problem hiding this comment.
I suspect (but correct me if I'm wrong) that sessions were originally added as a way of essentially shoehorning the existing stdio transport into HTTP. If so, it may be worth pointing out as background information that the mechanism just flat-out doesn't actually work as originally intended, because the idea was that all requests on a session would go to the same server instance, and that can't actually be guaranteed with HTTP (except perhaps in certain special cases that are far from the norm).
There was a problem hiding this comment.
I'm not sure this is quite correct. It was a very intentional decision for MCP to be a stateful protocol and the session ID was supposed to be the key for that state. The deployment difficulty was accepted as a trade-off for making stateful interactions more natural. Whether or not that was the right trade-off (or necessary trade-off) is another story.
See modelcontextprotocol/modelcontextprotocol#102 for some background (note: Justin was one of the co-creators of MCP).
There was a problem hiding this comment.
I get that it was designed as a stateful protocol, but wasn't it initially designed for stdio, where that statefulness is a more natural fit? Or was the HTTP transport part of the initial design?
| #### Disadvantages: | ||
| * **Protocol Complexity:** Retains the concept of state within the protocol |
There was a problem hiding this comment.
I think another important aspect of this is that this option requires us to define how each protocol mechanism interacts with sessions, and that will increase complexity in both the client and the server. For example, in the "capability unlocking" example, we'd need to define the tool list as being a function of the session, and both clients and servers would need to understand that.
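To illustrate what "the tool list as a function of the session" would mean for a server, here is a small sketch that reproduces the connect / execute_query trace from the proposal. `SessionScopedServer`, `SessionState`, the session ids, and the `db://example` URI are hypothetical names, not part of any SDK.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SessionState:
    database_uri: Optional[str] = None


class SessionScopedServer:
    """Toy server whose tool list depends on per-session state."""

    def __init__(self) -> None:
        self._sessions: Dict[str, SessionState] = {}

    def _state(self, session_id: str) -> SessionState:
        return self._sessions.setdefault(session_id, SessionState())

    def list_tools(self, session_id: str) -> List[str]:
        # Before connect, only the connect tool is visible in this session.
        if self._state(session_id).database_uri is None:
            return ["connect"]
        return ["execute_query", "list_tables"]

    def call_tool(self, session_id: str, name: str, argument: str) -> str:
        if name not in self.list_tools(session_id):
            return "ERROR: Tool not found"
        if name == "connect":
            self._state(session_id).database_uri = argument
            return "Success"
        return f"ran {name}({argument!r})"


server = SessionScopedServer()
trace = [
    server.list_tools("s1"),
    server.call_tool("s1", "execute_query", "SELECT 1"),
    server.call_tool("s1", "connect", "db://example"),
    server.list_tools("s1"),
    server.list_tools("s2"),  # a different session still sees only connect
]
```

Every method takes a `session_id`, and a client caching the tool list would have to key that cache by session as well, which is the added complexity on both sides that this comment points at.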
| **Disadvantages:** | ||
| * **Breaking Change:** If applications are using sessions today for application |
There was a problem hiding this comment.
I think that because the semantics of sessions are changing, we'd probably need a breaking API change in the SDKs in either option, so I'm not sure this is really a disadvantage of option B specifically.
There was a problem hiding this comment.
If today there are use cases that use sessions, those tools would need to be re-written to use explicit state handles instead. I think that the use-cases are limited to use-cases where clients and servers are both under control of the developer, but I don't think that means they don't exist.
The feedback from the SDK owners has been "lots of people are asking about this" including things for resuming sessions and associating with the session_id.
There was a problem hiding this comment.
In option A, I think any existing tools using the current session mechanism would need to be changed to use the new mechanism anyway, because the semantics of the current session mechanism are not well defined, so we can't guarantee that things will continue to work right for existing code anyway. I believe you were making this same argument in 1442 in the context of removing initialize and adding sessions/create: the existing mechanism isn't well defined and therefore not useful, and we don't want to surprise people with behavior changes.
| responsibility of tracking state references entirely to the client via explicit | ||
| state handles. | ||
| ## **Use Cases for Application Sessions** |
There was a problem hiding this comment.
I think we still need to take a step back and start by coming to consensus on what set of use-cases we actually need to support. These four examples are interesting as hypotheticals, but if no one is actually doing any of this today, then the examples aren't useful.
I think that in order to support the complexity of sessions in the protocol, we should require evidence of concrete real-world use-cases that are common enough to justify the complexity we'd be taking on.
There was a problem hiding this comment.
I don't think that's a fair callout. Sessions today don't work. As described above, the client and server can't agree on them, which makes implementing something like this impossible.
There was a problem hiding this comment.
I agree that they don't work today in the general case, in the sense that you can't use any arbitrary client with any arbitrary server and expect it to work properly. But I think we've heard that there are specific cases where people control both the clients and servers and are leveraging sessions for some functionality where those particular clients and servers do agree on things like the scope of the session. It seems like it would be useful to get a list of those use-cases along with some idea of how many people are using them.
But even setting aside examples of what people are actually doing today, I think there's still an important element of what things people want to do. We currently have no way of knowing how many people will actually want to do anything like these hypothetical use-cases. If no one (or very few people) want to implement one of these use-cases, then it's not worth the complexity of supporting it, and we should stop considering it.
My overall point here is that before we commit to supporting a given use-case, we should first have confidence that enough people will actually use it to make it worth supporting it. Right now, I don't see a strong signal of that -- we've had a lot of hypothetical discussion, but very little real-world input on use-cases.
In the absence of concrete use-cases that we have evidence that enough people are interested in, I would lean heavily toward option B.
| * **Minimal Complexity:** Protocol complexity is minimal, requiring no | ||
| additional changes. | ||
| **Disadvantages:** |
There was a problem hiding this comment.
One disadvantage that we may want to call out here is that this approach would work for tools only, not (e.g.) resources. In theory, there could be cases where reading a particular resource mutates session state, and that wouldn't be covered by this approach.
That having been said, I'm not sure this is actually a real case. See my comment above about stepping back to agree on use-cases.
We discussed this decision today and were unable to align on a direction. The final vote was:
A summary of the sessions vs session-less decision that we aim to make a recommendation on in the 3/25 Transport Working Group meeting.