User text appended during tool-in-flight permanently 400-bricks the session (orphan tool_use) #37

## Summary

If a user message arrives while a streaming inference is between an emitted `tool_use` and its not-yet-returned `tool_result`, the framework appends the user text directly into `messages` and continues. The orphan `tool_use` is never paired. Every subsequent inference flush rebuilds a request that violates Anthropic's structural validator and gets rejected with HTTP 400 (`` messages.N: `tool_use` ids were found without ... ``). The damage is persisted to the session's Chronicle `messages` log, so it survives host restarts and binary upgrades. **The only current mitigation is to start a new session.**

## Severity

Latent foot-gun. Any agent that runs tools while the operator is active can hit this. A "stay quiet" agent (e.g. a conductor that does occasional fleet pokes) is the worst case because the user almost always types into it mid-tool. Once tripped, the session is permanently bricked for that agent — no recovery path inside the framework.

## Concrete reproduction (from prod, post-mortem of a conhost VM)

Session `75da82b0` (conductor), inference 63 at 2026-05-22 10:22:24 returned 400 in 256 ms. Walking the request payload from the new `llm-calls.jsonl` logger:

```
msg[444] user        "Okay; pinging - is everything good?"
msg[445] assistant   tool_use  fleet--status   (toolu_01Dtd...)
msg[446] user        tool_result for status                     ← paired
msg[447] assistant   tool_use  fleet--peek    (toolu_01Tdd...)  ← orphan
msg[448] user        "I think they're all actually stopped now.
                      Let's see what happens when the clerk picks
                      up the loose threads and files new tickets."
```

Tool-call counts in that request: 189 `tool_use` vs. 188 `tool_result` — exactly one unpaired.
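A quick way to confirm this kind of damage in a captured payload is to check every assistant `tool_use` against the message that immediately follows it, alongside the raw counts. The helper below is hypothetical (`findOrphanToolUses` is not part of the framework), and it assumes the logged request uses Anthropic Messages API block shapes, since the `llm-calls.jsonl` schema is not shown in this issue.

```typescript
// Hypothetical diagnostic: count tool blocks and report tool_use ids that have
// no tool_result in the immediately following user message.
// Block shapes assume the Anthropic Messages API request format.
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input?: unknown };
type ToolResultBlock = { type: "tool_result"; tool_use_id: string; content?: unknown; is_error?: boolean };
type Block = { type: "text"; text: string } | ToolUseBlock | ToolResultBlock;

interface Message {
  role: "user" | "assistant";
  content: string | Block[]; // plain-string content carries no tool blocks
}

interface OrphanReport {
  toolUseCount: number;
  toolResultCount: number;
  orphans: { index: number; id: string; name: string }[];
}

const blocksOf = (m: Message | undefined): Block[] =>
  m === undefined || typeof m.content === "string" ? [] : m.content;

function findOrphanToolUses(messages: Message[]): OrphanReport {
  const report: OrphanReport = { toolUseCount: 0, toolResultCount: 0, orphans: [] };
  messages.forEach((m, index) => {
    for (const b of blocksOf(m)) {
      if (b.type === "tool_use") report.toolUseCount++;
      if (b.type === "tool_result") report.toolResultCount++;
    }
    if (m.role !== "assistant") return;
    // Results only count if they sit in the very next message, a user turn.
    const next = messages[index + 1];
    const answered = new Set<string>(
      next?.role === "user"
        ? blocksOf(next)
            .filter((b): b is ToolResultBlock => b.type === "tool_result")
            .map((b) => b.tool_use_id)
        : [],
    );
    for (const b of blocksOf(m)) {
      if (b.type === "tool_use" && !answered.has(b.id)) {
        report.orphans.push({ index, id: b.id, name: b.name });
      }
    }
  });
  return report;
}
```

Checking only the next message, rather than matching ids anywhere in the log, means a late `tool_result` that lands after interjected user text would still be flagged.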

Inference 64 (next user nudge) **had the same orphan** at a shifted index (an autobio compression pass had dropped 17 unrelated messages between attempts but preserved the orphan), and was rejected again. The user opened a new session shortly afterward.

The predecessor request at 10:21:37 (same store, 449 messages, 0 orphans) succeeded. The orphan was introduced by the user typing during the in-flight `fleet--peek`.

## Why it happens

Reading `src/framework.ts` and `src/agent.ts`: `driveStream()` runs as a fire-and-forget background promise. While it is between emitting a `tool_use` and `stream.provideToolResults()` (which depends on `dispatchToolCall().then() → ToolResultEvent → handleProcessEvent`), an `ExternalMessageEvent` can land in the queue. The handler appends the user text to `messages` and proceeds. There is no check that the most recent assistant message has all its `tool_use` ids resolved, and no synthesis of placeholder `tool_result`s for the unresolved ones.

Downstream, autobio compression preserves the orphan rather than repairing it, which is why inference 64 failed on the same id after its compression pass.
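The interleaving can be modeled in a few lines. This is a hypothetical, simplified model, not the framework's code: `driveStreamModel`, `fakeDispatch` and `onExternalMessage` stand in for `driveStream()`, `dispatchToolCall()` and the `ExternalMessageEvent` handler, and tool latency is faked with `setTimeout`.

```typescript
// Hypothetical model of the race; none of these functions are framework APIs.
type Block =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string; name: string }
  | { type: "tool_result"; tool_use_id: string; content: string };
interface Message { role: "user" | "assistant"; content: Block[] }

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Stand-in for dispatchToolCall(): the tool takes `ms` milliseconds to answer.
async function fakeDispatch(id: string, ms: number): Promise<Block> {
  await sleep(ms);
  return { type: "tool_result", tool_use_id: id, content: "ok" };
}

// Emits a tool_use, then waits on the tool. The result is returned rather than
// appended, because where a late result should go is the open question.
async function driveStreamModel(messages: Message[]): Promise<Block> {
  messages.push({
    role: "assistant",
    content: [{ type: "tool_use", id: "toolu_peek", name: "fleet--peek" }],
  });
  return fakeDispatch("toolu_peek", 50);
}

// Naive handler: appends user text with no check for unresolved tool_use ids.
function onExternalMessage(messages: Message[], text: string): void {
  messages.push({ role: "user", content: [{ type: "text", text }] });
}

async function simulate(): Promise<{ messages: Message[]; lateResult: Block }> {
  const messages: Message[] = [];
  const stream = driveStreamModel(messages); // fire-and-forget: not awaited yet
  await sleep(10); // the operator types while fleet--peek is still running
  onExternalMessage(messages, "I think they're all actually stopped now.");
  const lateResult = await stream; // the tool answers only now
  return { messages, lateResult };
}
```

By the time `lateResult` exists, the message after the `tool_use` is already user text, so there is no position left where a `tool_result` would satisfy the pairing rule.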

## Suggested fix (sketch)

Two layers, both small:

1. **Repair-on-write.** In the path that appends a user *text* message to `messages` (i.e. the `ExternalMessageEvent` handler or equivalent), scan back to the most recent assistant message. For every `tool_use` block in it whose id has no matching `tool_result` in a subsequent user message, synthesize a stub `tool_result` (`is_error: true`, content something like `"cancelled: user interjected before result returned"`) and append it as a user message *before* the new user text. The `driveStream()` call that was awaiting the real result must then either be cancelled or have its pending tool dispatch invalidated; choosing the cleaner of the two is the design call.

2. **Repair-on-read (defense in depth).** In the request-build path, refuse to serialize any assistant message that contains an orphan `tool_use`: either drop the trailing `tool_use` block or throw a clear engine-side error before the API does. This applies equally to any autobio rewrite path.
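A minimal sketch of both layers, assuming Anthropic Messages API block shapes and assuming that an in-flight call shows up as an assistant message at the tail of the log (as msg[447] did). `appendUserText` and `assertNoOrphans` are hypothetical names. How `driveStream()` drops the late result is left open, so `appendUserText` only returns the ids it cancelled.

```typescript
// Hypothetical sketch of repair-on-write and repair-on-read.
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input?: unknown };
type ToolResultBlock = { type: "tool_result"; tool_use_id: string; content: string; is_error?: boolean };
type Block = { type: "text"; text: string } | ToolUseBlock | ToolResultBlock;
interface Message { role: "user" | "assistant"; content: Block[] }

const STUB_TEXT = "cancelled: user interjected before result returned";

const toolUseIds = (m: Message): string[] =>
  m.content.filter((b): b is ToolUseBlock => b.type === "tool_use").map((b) => b.id);

function toolResultIds(m: Message | undefined): Set<string> {
  const blocks: Block[] = m?.role === "user" ? m.content : [];
  return new Set(
    blocks.filter((b): b is ToolResultBlock => b.type === "tool_result").map((b) => b.tool_use_id),
  );
}

// Layer 1 (repair-on-write). Assumes an unresolved call means the log ends in
// an assistant message; stub results go first, then the new user text.
function appendUserText(messages: Message[], text: string): string[] {
  const last = messages[messages.length - 1];
  const pending = last?.role === "assistant" ? toolUseIds(last) : [];
  const stubs = pending.map(
    (id): ToolResultBlock => ({ type: "tool_result", tool_use_id: id, content: STUB_TEXT, is_error: true }),
  );
  messages.push({ role: "user", content: [...stubs, { type: "text", text }] });
  return pending; // ids whose real (late) results must now be discarded
}

// Layer 2 (repair-on-read): fail in the engine before the API does.
function assertNoOrphans(messages: Message[]): void {
  messages.forEach((m, i) => {
    if (m.role !== "assistant") return;
    const answered = toolResultIds(messages[i + 1]);
    const orphans = toolUseIds(m).filter((id) => !answered.has(id));
    if (orphans.length > 0) {
      throw new Error(`messages.${i}: orphan tool_use ids ${orphans.join(", ")}`);
    }
  });
}
```

Here the stubs and the new text share one user message with the `tool_result` blocks first, which Anthropic's tool-use documentation asks for in a turn that carries results; emitting a separate stub message before the text, as item 1 describes, is the alternative.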

A one-shot repair script that walks existing damaged sessions and inserts the synthesized stubs would also help — there are already-bricked sessions in the wild.
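Such a script could apply the same idea across a whole stored log, including orphans in the middle like msg[447]. This sketch assumes the log has already been loaded into memory in Messages API shape; reading and rewriting the Chronicle `messages` log, and the `repairOrphans` name itself, are hypothetical.

```typescript
// Hypothetical one-shot repair over one session's in-memory message log.
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input?: unknown };
type ToolResultBlock = { type: "tool_result"; tool_use_id: string; content: string; is_error?: boolean };
type Block = { type: "text"; text: string } | ToolUseBlock | ToolResultBlock;
interface StoredMessage { role: "user" | "assistant"; content: string | Block[] }
interface Message { role: "user" | "assistant"; content: Block[] }

const STUB_TEXT = "cancelled: user interjected before result returned";

function repairOrphans(log: StoredMessage[]): { repaired: Message[]; stubbed: string[] } {
  // Copy the log, normalizing string content to a text block so stubs can be prepended.
  const repaired: Message[] = log.map(
    (m): Message => ({
      role: m.role,
      content: typeof m.content === "string" ? [{ type: "text", text: m.content }] : [...m.content],
    }),
  );
  const stubbed: string[] = [];
  for (let i = 0; i < repaired.length; i++) {
    const m = repaired[i];
    if (m.role !== "assistant") continue;
    const next: Message | undefined = repaired[i + 1];
    const answered = new Set<string>();
    if (next?.role === "user") {
      for (const b of next.content) if (b.type === "tool_result") answered.add(b.tool_use_id);
    }
    const missing = m.content
      .filter((b): b is ToolUseBlock => b.type === "tool_use" && !answered.has(b.id))
      .map((b) => b.id);
    if (missing.length === 0) continue;
    const stubs = missing.map(
      (id): ToolResultBlock => ({ type: "tool_result", tool_use_id: id, content: STUB_TEXT, is_error: true }),
    );
    if (next?.role === "user") {
      // tool_result blocks lead the user turn that follows the tool_use.
      next.content = [...stubs, ...next.content];
    } else {
      // No user turn follows (end of log, or another assistant turn): add one.
      repaired.splice(i + 1, 0, { role: "user", content: stubs });
      i++; // skip over the message just inserted
    }
    stubbed.push(...missing);
  }
  return { repaired, stubbed };
}
```

Returning the stubbed ids lets each repair be logged, and the function is idempotent: rerunning it over an already-repaired session inserts nothing.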

## Diagnostic data

- Failing request payloads: `llm-calls.jsonl` entries [17] and [19] in the post-mortem VM dump.
- Chronicle inferences: `framework/inference-log` entries 63, 64 in session `75da82b0`.
- Both errors: `` 400 invalid_request_error: messages.N: `tool_use` ids were found without ... ``
- Same orphan id across both: `toolu_01TddAQ7dJdWyxWskNwMsSca` (fleet--peek).
