doc: add sessions-vs-sessionless-decision #27

Open: kurtisvg wants to merge 1 commit into main from kvg-sessions-decision


@kurtisvg (Collaborator) commented Mar 20, 2026

A summary of the sessions vs. sessionless decision, on which we aim to make a recommendation at the 3/25 Transport Working Group meeting.

deploy_vm(vm)
```

**Advantages:**

@pja-ant (Collaborator), Mar 20, 2026

Suggested change: `**Advantages:**` → `#### Advantages:`

* **Minimal Complexity:** Protocol complexity is minimal, requiring no
additional changes.

**Disadvantages:**

Collaborator

Suggested change: `**Disadvantages:**` → `#### Disadvantages:`

The main downside I see is that this adds a burden to the client's orchestration: it has to pass the correct handles back and forth. I'm not saying this is decisive, but to make sure we understand it, it would mean one of two things on the client side:

  1. The model would need to be given the handles in its context when deciding to call a tool, so that it correctly passes the handles back as parameters.
  2. The client would need to maintain hooks that intercept or augment tool calls to ensure handles are passed correctly. Take the example of create_basket(), and assume the client wants to create only one basket per session and is worried that (1) above would be flaky. Then, for whatever the client defines as the length of a session, every call to create_basket() after the first might return the original response to that tool call rather than calling the underlying MCP server.

The advantage of a session ID is that it would remove the need for both (1) and (2), and would do so in a general way. The counterargument is that, as models and architectures get better at supplying the right context, (1) should become less and less of an issue.
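To make approach (2) concrete, here is a minimal Python sketch of a client-side hook that replays the first create_basket() result for the rest of a client-defined session. `MemoizingToolHook`, `call_tool`, and `client_session_id` are hypothetical names, not part of any MCP SDK; the "session" here exists only inside the client, which is the orchestration burden described above.

```python
from typing import Any, Callable, Dict, Set, Tuple

# Hypothetical: "send tools/call to the MCP server and return the result".
ToolCaller = Callable[[str, Dict[str, Any]], Any]


class MemoizingToolHook:
    """Intercepts selected tool calls and replays the first result for the
    rest of a client-defined session (hypothetical client-side helper)."""

    def __init__(self, call_tool: ToolCaller, memoized_tools: Set[str]) -> None:
        self._call_tool = call_tool
        self._memoized = memoized_tools
        # (client_session_id, tool_name) -> first result seen in that session.
        # Arguments are deliberately ignored: "one basket per session".
        self._cache: Dict[Tuple[str, str], Any] = {}

    def call(self, client_session_id: str, name: str, args: Dict[str, Any]) -> Any:
        if name not in self._memoized:
            return self._call_tool(name, args)  # pass through untouched
        key = (client_session_id, name)
        if key not in self._cache:
            self._cache[key] = self._call_tool(name, args)  # first call only
        return self._cache[key]

    def end_session(self, client_session_id: str) -> None:
        """Forget replayed results once the client decides the session is over."""
        for key in [k for k in self._cache if k[0] == client_session_id]:
            del self._cache[key]


# Usage with a fake server that records every real create_basket() call.
server_calls = []

def fake_call_tool(name: str, args: Dict[str, Any]) -> Dict[str, str]:
    server_calls.append(name)
    return {"basket": f"basket-{len(server_calls)}"}

hook = MemoizingToolHook(fake_call_tool, {"create_basket"})
first = hook.call("conv-1", "create_basket", {})
second = hook.call("conv-1", "create_basket", {})
```

Every tool the client wants to treat this way has to be enumerated and scoped by the client author; that per-client work is what a protocol-level session ID would standardize.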

Comment on lines +196 to +197
As we weigh the advantages and disadvantages of both proposals, multi-agent
orchestration presents a significant unresolved challenge:

Collaborator

nit: should this be after both proposals?

Collaborator Author

I moved it here because the open questions seemed to apply only to Option A.

#### Disadvantages:

* **Protocol Complexity:** Retains the concept of state within the protocol
definition, requiring servers to manage state lifecycles, TTLs (Time to Live),

nit: "manage state lifecycles, TTLs (Time to Live), and garbage collection" is a function of whether the server wants to maintain state, so it is the same in both proposals. In Option A it is tied to the session ID, if the server chooses to support sessions; in Option B it is tied to the handle.

The bigger issue with this option is the extra complexity of the session handshake and when it should happen.
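To illustrate the nit, here is a minimal Python sketch of a server-side state store with a TTL and garbage collection. The store does not care whether its key is a session ID (Option A) or a state handle (Option B). `TtlStateStore` and its sliding-expiry behavior are assumptions for illustration, not something either proposal specifies.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TtlStateStore:
    """Server-side state with a TTL and garbage collection.

    The key is opaque: a session ID under Option A, a state handle under
    Option B. The lifecycle logic is the same either way.
    """

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, Any]] = {}  # key -> (expires_at, value)

    def put(self, key: str, value: Any) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)

    def get(self, key: str) -> Optional[Any]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        now = self._clock()
        if now >= expires_at:
            del self._entries[key]  # expired: collect lazily on access
            return None
        # Assumed sliding expiry: each successful access extends the lifetime.
        self._entries[key] = (now + self._ttl, value)
        return value

    def sweep(self) -> int:
        """Eagerly drop every expired entry; returns how many were removed."""
        now = self._clock()
        expired = [k for k, (exp, _) in self._entries.items() if now >= exp]
        for k in expired:
            del self._entries[k]
        return len(expired)


# Usage with a fake clock: one session-keyed entry, one handle-keyed entry.
fake_now = [0.0]
store = TtlStateStore(ttl_seconds=10.0, clock=lambda: fake_now[0])
store.put("session-abc", {"cart": []})
store.put("handle-xyz", {"cart": []})
fake_now[0] = 5.0
touched = store.get("session-abc")  # extends session-abc to t=15
fake_now[0] = 12.0
removed = store.sweep()  # handle-xyz expired at t=10
```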

Collaborator Author

I was trying to capture the complexity that @markdroth mentioned here (essentially protocol-level TTLs and how state changes across the requests within a session) but didn't quite nail it.

tool/call: list_tools() -> Returns: [connect]
tool/call: execute_query("SELECT 1") -> ERROR: Tool not found
tool/call: connect($DATABASE_URI) -> Success
tool/call: list_tools() -> Returns: [execute_query, list_tables, ...]

Separate from this, but what would be the trigger for the client to call list_tools() again here?

Contributor

I was going to point this out as well. This is going to be a problem, because IIUC the SDKs handle caching the tool list and won't refresh until they have some indication that the tool list has changed. While it's true that we are going to retain the ability for the client to subscribe to tool-list-changed notifications as an optional optimization, there won't be a way to tie that notification to the client that called connect($DATABASE_URI) once we remove sessions. So clients will not actually see the new tool list until the TTL expires.

I think this would be a problem only if we go with option A below. And under that option, if we do decide that this particular use-case is important, there are other ways we could consider handling it. For example, we could put a bit in the tool call response that tells the client to invalidate its tool list cache.

Collaborator Author

I think we've previously discussed a combination of mechanisms such as TTLs and notification-style changes (e.g., returning an indicator on a tool result that cached information needs to be refreshed). I agree there's future work here, but I think it's solvable (and likely has value beyond these specific use-cases).
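A minimal Python sketch of the client side of that idea, combining a TTL on the cached tool list with an invalidation bit on tool results. The `invalidateToolList` field is hypothetical (nothing like it exists in the MCP specification today), and `ToolListCache` and `fetch_tools` are illustrative names.

```python
import time
from typing import Any, Callable, Dict, List, Optional

# Hypothetical result field; nothing like it exists in the MCP spec today.
INVALIDATE_FLAG = "invalidateToolList"


class ToolListCache:
    """Client-side tools/list cache refreshed on TTL expiry or on request."""

    def __init__(self, fetch_tools: Callable[[], List[str]], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch_tools
        self._ttl = ttl_seconds
        self._clock = clock
        self._tools: Optional[List[str]] = None
        self._fetched_at = 0.0

    def tools(self) -> List[str]:
        now = self._clock()
        if self._tools is None or now - self._fetched_at >= self._ttl:
            self._tools = self._fetch()
            self._fetched_at = now
        return self._tools

    def on_tool_result(self, result: Dict[str, Any]) -> None:
        # Without sessions, the server can't target a list_changed notification
        # at this client, so it flags the change on the tool result instead.
        if result.get(INVALIDATE_FLAG):
            self._tools = None


# Usage: a fake server whose tool list grows after connect() succeeds.
unlocked = [False]

def fake_fetch() -> List[str]:
    return ["connect", "execute_query", "list_tables"] if unlocked[0] else ["connect"]

fake_now = [0.0]
cache = ToolListCache(fake_fetch, ttl_seconds=300, clock=lambda: fake_now[0])
before = cache.tools()
unlocked[0] = True  # connect($DATABASE_URI) succeeded on the server
still_cached = cache.tools()  # TTL not expired, so the stale list is returned
cache.on_tool_result({"content": [], INVALIDATE_FLAG: True})
after = cache.tools()
```

Without the bit, the client would keep seeing only `connect` until the TTL expired, which is the failure mode raised in this thread.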

This is an advantage of handles: you don't need to hide the execute_query tool. If only the connect tool is exposed, the model may not believe that calling it will help achieve its goals, because the other tools that would use that connection aren't present in its context.
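A minimal Python sketch of the handle-based alternative for the same database example: all three tools are always listed, and connect() returns an explicit handle that execute_query() takes as a parameter. `HandleBasedDbServer`, `connection_handle`, and the result shapes are assumptions for illustration.

```python
import secrets
from typing import Any, Dict, List


class HandleBasedDbServer:
    """Option B sketch: every tool is always listed, and state is reached
    through an explicit handle that the caller passes back."""

    def __init__(self) -> None:
        self._connections: Dict[str, str] = {}  # handle -> database URI

    def list_tools(self) -> List[str]:
        # Static list: nothing is hidden, so the model can see up front
        # that connect() is what makes execute_query() usable.
        return ["connect", "execute_query", "list_tables"]

    def connect(self, database_uri: str) -> Dict[str, Any]:
        handle = "conn_" + secrets.token_hex(8)  # unguessable opaque handle
        self._connections[handle] = database_uri
        return {"ok": True, "connection_handle": handle}

    def execute_query(self, connection_handle: str, sql: str) -> Dict[str, Any]:
        uri = self._connections.get(connection_handle)
        if uri is None:
            return {"ok": False, "error": "unknown or expired connection_handle"}
        # A real server would run `sql` against `uri`; this only echoes it.
        return {"ok": True, "ran": sql, "on": uri}


srv = HandleBasedDbServer()
missing = srv.execute_query("conn_bogus", "SELECT 1")
handle = srv.connect("postgres://db.example/app")["connection_handle"]
result = srv.execute_query(handle, "SELECT 1")
```

The model still has to carry `connection_handle` from one call to the next, which is the client-side burden raised earlier in the thread.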

As we weigh the advantages and disadvantages of both proposals, multi-agent
orchestration presents a significant unresolved challenge:

* **Sub-Agent Orchestration:** It is currently unclear how sessions or state

@gjz22, Mar 21, 2026

Isn't this just up to the client application to determine? It can either reuse the parent's session or use an isolated one, depending on its use case.

In that way it is somewhat similar to option B, where the subagent could be initialized with some context that includes the required handles.

Collaborator Author

I think the answer to that is yes, but in this case the session data is handled by the application and not the agent, so it's likely going to need to be standardized in some fashion. If half of implementations assume that a subagent should use a new session and the other half reuse the existing session, servers will have a hard time accommodating both.
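A minimal Python sketch of why this choice matters to servers: the same client code can spawn a sub-agent that either inherits its parent's session ID or gets a fresh one, and the server only ever sees the resulting ID. `SubagentSessionPolicy` and `spawn_subagent` are hypothetical; neither proposal defines them.

```python
import uuid
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SubagentSessionPolicy(Enum):
    INHERIT = "inherit"  # sub-agent reuses the parent's session ID
    ISOLATE = "isolate"  # sub-agent gets a fresh session ID


@dataclass
class AgentContext:
    name: str
    session_id: str
    parent: Optional["AgentContext"] = None


def spawn_subagent(parent: AgentContext, name: str,
                   policy: SubagentSessionPolicy) -> AgentContext:
    """Client-side choice; the server only ever sees the resulting session_id."""
    if policy is SubagentSessionPolicy.INHERIT:
        session_id = parent.session_id
    else:
        session_id = str(uuid.uuid4())
    return AgentContext(name=name, session_id=session_id, parent=parent)


root = AgentContext(name="planner", session_id="s-1")
shared = spawn_subagent(root, "researcher", SubagentSessionPolicy.INHERIT)
isolated = spawn_subagent(root, "researcher", SubagentSessionPolicy.ISOLATE)
```

A server that keeps, say, a basket under the session ID would see the `shared` sub-agent modify the parent's basket and the `isolated` one start empty, with no way to tell which behavior the client intended.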


**Disadvantages:**

* **Breaking Change:** If applications are using sessions today for application

Another thing to consider in this vein: today, clients likely have some form of infrastructure for maintaining session IDs for the MCP servers they interact with. If we remove sessions completely, they will be harder to add back later, because those clients will likely have removed that infrastructure.

@markdroth (Contributor) left a comment

Thanks for writing this up, Kurtis!

as a specific user context, a continuous conversation thread, or stateful tool
operations.

### **Why Sessions Need to Change**

Contributor

I suspect (but correct me if I'm wrong) that sessions were originally added essentially as a way of shoehorning the existing stdio transport into HTTP. If so, it may be worth pointing out, as background, that the mechanism simply doesn't work as originally intended: the idea was that all requests in a session would go to the same server instance, and that can't actually be guaranteed with HTTP (except perhaps in certain special cases that are far from the norm).
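A minimal Python sketch of the routing problem described here, assuming each server replica keeps session state only in local memory and sits behind a round-robin HTTP load balancer that ignores the session header. All class names are hypothetical, and the sketch illustrates only the failure mode, not the historical question debated in the replies.

```python
import itertools
import uuid
from typing import Dict, List


class ServerInstance:
    """One replica that keeps session state only in its own memory."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.sessions: Dict[str, dict] = {}

    def handle(self, session_id: str, method: str) -> str:
        if method == "initialize":
            self.sessions[session_id] = {}
            return f"{self.name}: session created"
        if session_id not in self.sessions:
            return f"{self.name}: unknown session {session_id}"
        return f"{self.name}: ok"


class RoundRobinBalancer:
    """Plain HTTP load balancer that ignores the session header."""

    def __init__(self, instances: List[ServerInstance]) -> None:
        self._next = itertools.cycle(instances)

    def route(self, session_id: str, method: str) -> str:
        return next(self._next).handle(session_id, method)


lb = RoundRobinBalancer([ServerInstance("replica-a"), ServerInstance("replica-b")])
sid = str(uuid.uuid4())
init_reply = lb.route(sid, "initialize")  # lands on replica-a
call_reply = lb.route(sid, "tools/call")  # lands on replica-b
```

Sticky routing on the session header or a shared session store avoids this, but both require deployment-specific setup.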

Collaborator

I'm not sure this is quite correct. It was a very intentional decision for MCP to be a stateful protocol, and the session ID was supposed to be the key for that state. The deployment difficulty was accepted as a trade-off for making stateful interactions more natural. Whether or not that was the right trade-off (or a necessary one) is another story.

See modelcontextprotocol/modelcontextprotocol#102 for some background (note: Justin was one of the co-creators of MCP).

Contributor

I get that it was designed as a stateful protocol, but wasn't it initially designed for stdio, where that statefulness is a more natural fit? Or was the HTTP transport part of the initial design?

#### Disadvantages:

* **Protocol Complexity:** Retains the concept of state within the protocol

Contributor

I think another important aspect of this is that this option requires us to define how each protocol mechanism interacts with sessions, and that will increase complexity in both the client and the server. For example, in the "capability unlocking" example, we'd need to define the tool list as being a function of the session, and both clients and servers would need to understand that.
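A minimal Python sketch of what "the tool list as a function of the session" means for the capability-unlocking example under this option. `SessionScopedDbServer` and its method signatures are hypothetical; the call sequence mirrors the `tool/call` example in the doc.

```python
from typing import List, Set


class SessionScopedDbServer:
    """Option A sketch: the result of tools/list depends on the session."""

    BASE_TOOLS = ["connect"]
    UNLOCKED_TOOLS = ["execute_query", "list_tables"]

    def __init__(self) -> None:
        self._connected: Set[str] = set()  # session IDs that called connect()

    def list_tools(self, session_id: str) -> List[str]:
        if session_id in self._connected:
            return self.BASE_TOOLS + self.UNLOCKED_TOOLS
        return list(self.BASE_TOOLS)

    def call_tool(self, session_id: str, name: str) -> str:
        if name not in self.list_tools(session_id):
            return "ERROR: Tool not found"
        if name == "connect":
            self._connected.add(session_id)  # unlock for this session only
        return "Success"


srv = SessionScopedDbServer()
replies = [
    srv.list_tools("s1"),
    srv.call_tool("s1", "execute_query"),
    srv.call_tool("s1", "connect"),
    srv.list_tools("s1"),
]
```

Any client-side tool-list cache would likewise have to be keyed by session, which is the added client and server complexity this comment points to.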


**Disadvantages:**

* **Breaking Change:** If applications are using sessions today for application

Contributor

I think that because the semantics of sessions are changing, we'd probably need a breaking API change in the SDKs in either option, so I'm not sure this is really a disadvantage of option B specifically.

Collaborator Author

If there are use cases that rely on sessions today, those tools would need to be rewritten to use explicit state handles instead. I think these use-cases are limited to cases where both the client and the server are under the developer's control, but that doesn't mean they don't exist.

The feedback from the SDK owners has been "lots of people are asking about this", including questions about resuming sessions and associating state with the session_id.

Contributor

In option A, I think any existing tools using the current session mechanism would need to be changed to use the new mechanism anyway: the semantics of the current session mechanism are not well defined, so we can't guarantee that existing code will continue to work correctly. I believe you made this same argument in 1442 in the context of removing initialize and adding sessions/create: the existing mechanism isn't well defined and therefore not useful, and we don't want to surprise people with behavior changes.

responsibility of tracking state references entirely to the client via explicit
state handles.

## **Use Cases for Application Sessions**

Contributor

I think we still need to take a step back and start by coming to consensus on what set of use-cases we actually need to support. These four examples are interesting as hypotheticals, but if no one is actually doing any of this today, then the examples aren't useful.

I think that in order to support the complexity of sessions in the protocol, we should require evidence of concrete real-world use-cases that are common enough to justify the complexity we'd be taking on.

Collaborator Author

I don't think that's a fair callout. Sessions today don't work. As described above, the client and server can't agree on them, which makes implementing something like this impossible.

Contributor

I agree that they don't work today in the general case, in the sense that you can't use any arbitrary client with any arbitrary server and expect it to work properly. But I think we've heard that there are specific cases where people control both the clients and servers and are leveraging sessions for some functionality where those particular clients and servers do agree on things like the scope of the session. It seems like it would be useful to get a list of those use-cases along with some idea of how many people are using them.

But even setting aside examples of what people are actually doing today, I think there's still an important element of what things people want to do. We currently have no way of knowing how many people will actually want to do anything like these hypothetical use-cases. If no one (or very few people) want to implement one of these use-cases, then it's not worth the complexity of supporting it, and we should stop considering it.

My overall point here is that before we commit to supporting a given use-case, we should first have confidence that enough people will actually use it to make it worth supporting it. Right now, I don't see a strong signal of that -- we've had a lot of hypothetical discussion, but very little real-world input on use-cases.

In the absence of concrete use-cases that we have evidence that enough people are interested in, I would lean heavily toward option B.

* **Minimal Complexity:** Protocol complexity is minimal, requiring no
additional changes.

**Disadvantages:**

Contributor

One disadvantage that we may want to call out here is that this approach would work for tools only, not (e.g.) resources. In theory, there could be cases where reading a particular resource mutates session state, and that wouldn't be covered by this approach.

That having been said, I'm not sure this is actually a real case. See my comment above about stepping back to agree on use-cases.

@kurtisvg (Collaborator Author)

We discussed this decision today and were unable to align on a direction.

The final vote was:

  • 4 votes: Application Sessions
  • 3 votes: No sessions!
  • 3 votes: Defer to core maintainers!
