refactor(chat): ChatRunner long-lived Query 化 (OAuth state を turn 跨ぎで保持) (PR-C、stacked on #21)#22
Conversation
|
Warning Rate limit exceeded
To keep reviews running without waiting, you can enable usage-based add-on for your organization. This allows additional reviews beyond the hourly cap. Account admins can enable it under billing. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughSDKのqueryが文字列または非同期イテラブルのプロンプトを受け取り、クエリの戻り値がイテレータハンドル(オプションでclose()を持つ)に変更されました。AsyncIterableInputが追加され、ChatRunnerは長期存続する単一のsdk.queryプロセスへリファクタリングされました。 Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant ChatRunner
participant AsyncIterableInput
participant SDK
Client->>ChatRunner: startConversation()
ChatRunner->>ChatRunner: ensureQuery()
ChatRunner->>AsyncIterableInput: create iterable
ChatRunner->>SDK: query({ prompt: iterable })
SDK-->>ChatRunner: returns SdkQueryHandle
Note over ChatRunner,SDK: Long-lived query subprocess
Client->>ChatRunner: runUserTurn(userMessage)
ChatRunner->>AsyncIterableInput: push(userMessage)
AsyncIterableInput-->>SDK: yield message into prompt iterable
SDK-->>ChatRunner: emit SdkMessageLike (streamed events)
ChatRunner->>ChatRunner: dispatchSdkMessage() → TurnState
ChatRunner-->>Client: respond when turn ends
Client->>ChatRunner: disconnect
ChatRunner->>AsyncIterableInput: close()
AsyncIterableInput-->>SDK: iterable ends
ChatRunner->>SDK: close SdkQueryHandle
ChatRunner-->>Client: connection closed
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Review rate limit: 0/1 reviews remaining, refill in 49 minutes and 46 seconds.Comment |
9e2679d to
6875149
Compare
PR-C の主機能: 1 ChatRunner = 1 sdk.query() = 1 subprocess に固定して、 MCP HTTP transport の OAuth 状態 (PKCE / token) を turn 跨ぎで保持する。 main 最新 (PR-A の MCP 統合 + PR-B の OAuth 2.1 / auth_request UX + 私の修正多数: XML escape / externalToolUseIds / ephemeral runOAuthCallback / 空 assistant プレースホルダ等) と統合した形で再実装。 ## 新規ファイル - packages/ai-engine/src/async-input.ts: SDK の streaming input mode 用 AsyncIterable<SdkUserMessage> 実装。push 可能 + close 即時 teardown (CR Major 反映: close 後 next() 即 done / FIFO waiter キュー) - packages/ai-engine/src/async-input.test.ts: 8 件 (push / waiter / close / FIFO / close 後 即 done) ## agent-runner.ts - SdkLike.query を `prompt: string | AsyncIterable<SdkUserMessageLike>` に拡張 - SdkUserMessageLike + SdkQueryHandle (任意 close()) 型追加 ## chat-runner.ts 新規フィールド: - query / input / outputLoopDone / outputLoopFailed / currentTurn / cachedExternalConfigById - ensureQueryInflight (Promise キャッシュ、再入ガード) 新規 interface: - TurnState: 1 user turn の間だけ生きる mutable state (assistantMsgId / queue / textBuffer / stashedAuthUses / externalConfigById / externalToolUseIds) 新規メソッド: - ensureQuery: 1 度だけ query 立ち上げ + 出力ループ起動。inflight Promise キャッシュで再入ガード (codex Major 3) - startQueryInternal: 内部の起動本体 - runOutputLoop: SDK message を currentTurn の queue に振り分け - dispatchSdkMessage: 1 メッセージ処理 (text/tool_use/tool_result/result) - tearDownQuery / close: リソース cleanup。close は outputLoopDone を退避 してから tearDownQuery を呼ぶ (codex 致命 Minor: close 順序) 挙動変更: - runUserTurn: 入口で currentTurn ガード (codex Major 1: turn 並走防止) - input null チェックを invariant assertion で表明 (codex Major 2) - 空 assistant message 永続化を ensureQuery 前に移動 (long-lived では bg loop が即起動するため、後置きだと race で空 message 残る) - ephemeral runOAuthCallback も long-lived query 経由に統合 (callback URL は input.push 経由で SDK に渡るが chatStore には 永続化されない) main 最新の機能を保持: - auth-detector / handleAuthToolResult / stashedAuthUses - externalToolUseIds (TurnState に統合) - escapeXmlText / escapeXmlAttr (buildChatPrompt) - 空 assistant プレースホルダ「(認証処理を完了しました)」 (dispatchSdkMessage の result 処理に統合) buildMcpServer は 1 引数化、handler 内で this.currentTurn から assistantMsgId / emit を動的解決 (1 度作った MCP サーバを turn 跨ぎ で使い回せる、turn 中は不変)。 ## server.ts - WS close 時に runner.close() を呼ぶ。close 内部の reject は .catch で warn (codex Major: void close() で unhandled rejection 化を回避) ## chat-runner.test.ts - startCapturePromptText helper を追加: prompt が AsyncIterable<...> に変わったので、最初に push された user message の content を 奪取する (string にも互換) - 既存 test の prompt キャプチャ 4 箇所を helper 経由に書き換え ## テスト pnpm typecheck: 4/4 PASS pnpm test: core 93 + storage 88 + ai-engine 213 + frontend 273 = 667 PASS pnpm lint: exit 0
2995b16 to
cdb2200
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
packages/ai-engine/src/chat-runner.ts (1)
90-140: 🏗️ Heavy lift
ChatRunnerの責務を分割したいですこの PR で query ライフサイクル、turn ルーティング、OAuth、MCP bridge、prompt helper まで 1 ファイルに集まり、
packages/ai-engine/src/chat-runner.tsが 500 行制約を大きく超えています。少なくとも query lifecycle / OAuth handling / prompt helpers は別モジュールへ切り出した方が、今回のような状態管理修正を追いやすくなります。As per coding guidelines, "Keep individual files under 500 lines; split larger files into multiple modules".
Also applies to: 162-333, 503-611
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/ai-engine/src/chat-runner.ts` around lines 90 - 140, ChatRunner is doing too many responsibilities in one file; extract query lifecycle/OAuth, turn routing/MCP bridge, and prompt helper logic into separate modules to keep file <500 lines. Move long-lived query management (ensureQuery, tearDownQuery, query, input, outputLoopDone, ensureQueryInflight, cachedExternalConfigById and related OAuth/token handling) into a new QueryLifecycle/OAuth module; move per-turn routing and state (TurnState, currentTurn, pendingApprovals, approveTool, createSdkMcpServer-related handlers) into a TurnRouting/MCPBridge module; and move prompt construction helpers into a PromptHelpers module; update ChatRunner to import and delegate to these modules (preserve public APIs and names: ChatRunner.runUserTurn, ChatRunner.approveTool, TurnState, pendingApprovals, ensureQuery) and adjust tests/consumers to use the refactored imports.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/ai-engine/src/chat-runner.ts`:
- Around line 354-361: The current EOF handling in output loop sets
this.outputLoopFailed = true and always enqueues { type: 'chat_turn_ended' } for
the active turn, which treats unexpected subprocess/iterator termination as a
normal shutdown; change the logic in the output loop exit path (where
outputLoopFailed and this.currentTurn are accessed) to distinguish an explicit
shutdown/close() signal from unexpected termination: if the termination reason
is an explicit shutdown/close(), push { type: 'chat_turn_ended' } and finish the
turn.queue as before; otherwise push an { type: 'agent_failed', error: <reason
or generic> } event into turn.queue and finish() it, and keep
this.outputLoopFailed behavior so ensureQuery can recreate the iterator. Ensure
you reference outputLoopFailed, currentTurn, turn.queue, and event types
'chat_turn_ended' and 'agent_failed' when making the change.
- Around line 209-246: After ensureQuery() fails we currently return leaving the
previously appended empty assistant placeholder (assistantMsgId) persisted;
change the catch block to roll back that placeholder before returning: set
this.currentTurn = null, then try to remove the placeholder via
chatStore.deleteMessage(threadId, assistantMsgId) (or if deleteMessage isn't
available, call chatStore.updateMessage/thread update API to replace the empty
assistant message's blocks with an error block containing err.message), then
yield the same error event and return. Ensure you reference assistantMsgId,
this.currentTurn, ensureQuery(), and chatStore in the fix so the removal/update
runs inside the catch path before returning.
- Around line 551-599: The code is pushing the raw callbackUrl into the
persistent conversation context (this.input) which lets sensitive code/state
persist across turns and relies on a prompt-only restriction of allowed tools;
instead, do not append the OAuth callback into the long-lived this.input or chat
history—either (A) invoke the mcp__${mcpServerId}__complete_authentication tool
server-side directly with callbackUrl (bypassing the model conversation) or (B)
create a single ephemeral turn that is executed with enforced allowedTools
limited to only mcp__${mcpServerId}__complete_authentication and that is not
merged into this.input/chatStore; update the code paths around ensureQuery, the
block that pushes into this.input, and the mechanism that runs the turn so the
callbackUrl is handled only in that ephemeral/restricted execution and never
persisted.
---
Nitpick comments:
In `@packages/ai-engine/src/chat-runner.ts`:
- Around line 90-140: ChatRunner is doing too many responsibilities in one file;
extract query lifecycle/OAuth, turn routing/MCP bridge, and prompt helper logic
into separate modules to keep file <500 lines. Move long-lived query management
(ensureQuery, tearDownQuery, query, input, outputLoopDone, ensureQueryInflight,
cachedExternalConfigById and related OAuth/token handling) into a new
QueryLifecycle/OAuth module; move per-turn routing and state (TurnState,
currentTurn, pendingApprovals, approveTool, createSdkMcpServer-related handlers)
into a TurnRouting/MCPBridge module; and move prompt construction helpers into a
PromptHelpers module; update ChatRunner to import and delegate to these modules
(preserve public APIs and names: ChatRunner.runUserTurn, ChatRunner.approveTool,
TurnState, pendingApprovals, ensureQuery) and adjust tests/consumers to use the
refactored imports.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 202ab44a-9844-420c-8e42-eb0979ec12a3
📒 Files selected for processing (6)
packages/ai-engine/src/agent-runner.tspackages/ai-engine/src/async-input.test.tspackages/ai-engine/src/async-input.tspackages/ai-engine/src/chat-runner.test.tspackages/ai-engine/src/chat-runner.tspackages/ai-engine/src/server.ts
CodeRabbit (PR #22) 1 周目への対応。 [Minor / chat-runner.ts:246] ensureQuery 失敗時の空 assistant ロールバック long-lived Query 化に伴い空 assistant message を ensureQuery 前に append するよう変更したが、ensureQuery 失敗時に空バブルが履歴に残る問題があった。 chatStore に message 単位の delete API が無いため、catch 内で replaceMessageBlocks でエラー内容を blocks に書き込む形でロールバックする。 [Major / chat-runner.ts:361] 正常 EOF と subprocess 終了の区別 runOutputLoop の正常 EOF 経路では従来 chat_turn_ended のみを emit していたが、 これだと「予期しない subprocess 終了」も「明示 shutdown」も区別できず、 途中で切れた turn が正常完了に見えてしまう問題があった。 ChatRunner.isClosing フラグを追加し、close() 内で立てる。runOutputLoop の EOF ブランチで isClosing が false なら agent_failed event を先に emit してから chat_turn_ended で turn を閉じる。 [Major / chat-runner.ts:599] OAuth callback の long-lived 経路統合 (見送り) callback URL を this.input に push することで会話 context に turn 跨ぎで残る、allowedTools 単一制約が prompt 依存になる、という懸念。 PR-C の主目的 (OAuth state を turn 跨ぎで保持) と SDK 制約のトレードオフ。 ephemeral 化すると subprocess 分離で state が引き継がれず再認証が必要になる。 本 PR では long-lived 維持を選択し、prompt 内で「他 tool は呼ぶな」と明示する 形でモデル依存の防御を入れる。SDK の MCP transport が状態共有 API を提供する までは追加対応見送り。CR thread に詳細を返信。
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (1)
packages/ai-engine/src/chat-runner.ts (1)
90-145: 🏗️ Heavy liftこの long-lived query 制御は別モジュールに分けたいです
この PR で
ChatRunnerに query lifecycle、OAuth callback、turn routing、MCP dispatch まで集まり、ファイル全体が 1,200 行超になっています。次の race 修正や仕様追加がさらに入りやすい箇所なので、少なくとも lifecycle / OAuth flow / queue 周りは分離した方が保守しやすいです。 As per coding guidelines:Keep individual files under 500 lines; split larger files into multiple modules.Also applies to: 167-637
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/ai-engine/src/chat-runner.ts` around lines 90 - 145, The ChatRunner class is doing too much—extract the long-lived SDK query lifecycle, OAuth callback handling, and turn routing/queue management into a new module/class (e.g., SdkQueryManager) so ChatRunner remains focused on per-turn logic; move logic that touches query, ensureQuery/ensureQueryInflight, input, outputLoopDone/outputLoopFailed, runOutputLoop, OAuth callback handling, cachedExternalConfigById and any MCP registration/dispatch code into that new class, expose a small API (startQuery(), stopQuery(), getCurrentExternalConfig(), registerTurnQueue(assistantMsgId, queue), and awaitApproval/ui-toolUseId plumbing) and replace direct field usage in ChatRunner with calls to these APIs; keep TurnState, pendingApprovals, currentTurn and per-turn persistence inside ChatRunner and ensure the new SdkQueryManager is injected via ChatRunnerDeps so tests can mock it.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/ai-engine/src/chat-runner.ts`:
- Around line 360-386: The catch and output-loop termination paths set
outputLoopFailed and close the turn queue but do not clear or reject the turn's
pending approvals, allowing later approveTool() calls to apply effects for a
failed turn; update both error paths (the catch(err) block and the following
output-loop-ended block) to find the currentTurn and synchronously reject/clear
its pending approvals (e.g., iterate currentTurn.pendingApprovals or the
ChatRunner.pendingApprovals map and call their reject/error handler or a helper
like currentTurn.rejectPendingApprovals('turn_failed')), then remove them from
any global pendingApprovals store so approveTool() cannot run side effects for
that turn, and keep emitting the existing error/chat_turn_ended/finish behavior.
- Around line 564-625: The currentTurn (set to turnState) can remain stuck if
chatStore.appendMessage or later steps throw; wrap the logic that runs after
setting this.currentTurn (everything from after "this.currentTurn = turnState"
through the end of the turn setup) in a try/finally and clear this.currentTurn
in the finally block to guarantee release. Specifically, surround the code that
calls ensureQuery (or at least the section after setting this.currentTurn), the
chatStore.appendMessage call, the yield of 'chat_assistant_message_started', and
the push to this.input with a try { ... } finally { this.currentTurn = null } so
any exception during appendMessage or subsequent prompt setup always clears
this.currentTurn (refer to assistantMsgId, turnState, this.currentTurn,
chatStore.appendMessage, and ensureQuery).
- Around line 309-349: The startQueryInternal flow can race with close(): after
async work like projectStore.getProjectMeta() completes we must recheck shutdown
state so an sdk.query/subprocess isn't created after a close; modify
startQueryInternal (around projectStore.getProjectMeta, before calling sdk.query
and before assigning this.query/this.outputLoopDone) to check this.isClosing (or
a similar shutdown flag) and abort/cleanup early if true, or alternatively have
close() join the inflight starter by awaiting this.ensureQueryInflight (the
ensureQueryInflight join points referenced in ensureQuery()/close()) so the
creation and assignment of sdk.query and outputLoopDone cannot run after
shutdown; update either startQueryInternal or close/ensureQuery to use these
guards consistently to prevent orphaned subprocesses.
- Around line 356-358: The current loop in chat-runner.ts logs every SDK message
(for await (const msg of query)) and can leak sensitive data (OAuth code/state
and MCP outputs); remove the unconditional console.log or demote it to a
debug-only log and ensure it never includes message.content or msg.result even
after redactMcpSecrets; instead log only safe metadata (e.g., msg.type, msg.id,
timestamps) or a truncated/redacted indicator, and ensure runOAuthCallback’s
OAuth URL pushed into this.input is never logged by dispatchSdkMessage or any
logger in this loop.
---
Nitpick comments:
In `@packages/ai-engine/src/chat-runner.ts`:
- Around line 90-145: The ChatRunner class is doing too much—extract the
long-lived SDK query lifecycle, OAuth callback handling, and turn routing/queue
management into a new module/class (e.g., SdkQueryManager) so ChatRunner remains
focused on per-turn logic; move logic that touches query,
ensureQuery/ensureQueryInflight, input, outputLoopDone/outputLoopFailed,
runOutputLoop, OAuth callback handling, cachedExternalConfigById and any MCP
registration/dispatch code into that new class, expose a small API
(startQuery(), stopQuery(), getCurrentExternalConfig(),
registerTurnQueue(assistantMsgId, queue), and awaitApproval/ui-toolUseId
plumbing) and replace direct field usage in ChatRunner with calls to these APIs;
keep TurnState, pendingApprovals, currentTurn and per-turn persistence inside
ChatRunner and ensure the new SdkQueryManager is injected via ChatRunnerDeps so
tests can mock it.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: bed1bfbd-5a31-4792-bb0f-8b9ae8f45b59
📒 Files selected for processing (1)
packages/ai-engine/src/chat-runner.ts
CodeRabbit (PR #22) 2 周目への対応。Heavy lift #3 (OAuth callback の ephemeral 化) は前 PR と同様 long-lived 維持で見送り、CR thread に返信。 [Major / chat-runner.ts:349] close と ensureQuery の競合 close() が startQueryInternal の途中 (await projectStore.getProjectMeta() 等) で走ると、shutdown 後に sdk.query() が作られて subprocess が孤立する race があった。close() で ensureQueryInflight を join してから tearDown する。 [Major / chat-runner.ts:358] SDK メッセージの常時ログによる機密漏洩 runOutputLoop の console.log が全 SDK メッセージを redactMcpSecrets でラップ してログしていたが、message.content や result の OAuth callback URL の code/state がサーバーログに残るリスクがあった。type だけ出す形に絞り、 本文は出さない。redactMcpSecrets の import も削除。 [Major / chat-runner.ts:386] 異常終了後の pendingApprovals 残存 runOutputLoop の catch / EOF ブランチで queue を閉じるだけだったので、 turn 失敗後に UI から approveTool() が来ると create_node / create_edge 等 の side effect が走る恐れがあった。rejectAllPendingApprovals helper を 新設し、両ブランチで pendingApprovals を一括 reject(false) する。 [Major / chat-runner.ts:625] runOAuthCallback の currentTurn 解放保証 this.currentTurn = turnState の直後から全体を try/finally で囲み、 appendMessage 等の中間ステップで throw しても currentTurn が解放される ことを保証 (turn_in_progress で次の turn を受け付けられなくなるのを防ぐ)。 runUserTurn 側も同じパターンで全体を try/finally に統一。
Summary
PR #18 を 4 PR に割り直したうちの ChatRunner を long-lived Query 化 する部分。base = #21 (PR-B) にスタック。
PR-B (auth_request UX) では per-turn `sdk.query()` のままで、SDK subprocess が turn ごとに再生成され、MCP HTTP transport の OAuth state (PKCE / token) が消えてしまう問題があった。1 ChatRunner = 1 `sdk.query()` = 1 subprocess に固定して、OAuth state を chat thread 内で保持する設計に書き換える。
変更内容
`2995b16` refactor: ChatRunner long-lived Query
新規:
型拡張:
chat-runner.ts:
server.ts:
chat-runner.test.ts:
テスト
```
pnpm typecheck # 4/4 PASS
pnpm test # 91 + 88 + 213 + 272 = 664 PASS
```
期待挙動
Stack
Test plan
Summary by CodeRabbit
リリースノート
New Features
Bug Fixes
Tests