feat(observability): persist stream-close events to error log + audit log #494
Open
icebear0828 wants to merge 4 commits into
… log

Premature stream close, stream-client-abort, stream-client-disconnect, and stream-error previously only wrote `console.warn` to the dev stdout tee log. Production has no tee, and the existing error-log foundation (PR #482) only captured uncaught exceptions, so these recurring failures never reached the Errors tab (PR #483) or `/admin/logs`. Each "fix attempt" had to start with grep over a dev log instead of a structured sample with rid + account + closeCode + eventCount.

New `src/logs/stream-close-event.ts` fans out one `appendErrorLog` + one `enqueueLogEntry` per event, wired into six call sites: two client-abort paths in proxy-handler, the UpstreamPrematureCloseError branch (carries eventCount/hadReasoning/responseId/variantHash), two response-processor catches (client-write-failed and upstream-error with written diagnostics and upstreamStatus), and the two streamPassthrough internal EOF paths in responses.ts.

Also hardens `error-log.ts:readAppVersion` against the "config not loaded" path that unit-test invocations hit.

Tests: tests/unit/logs/stream-close-event.test.ts covers all four kinds, the missing-rid fallback, and numeric upstreamStatus → audit status passthrough. Full suite stays green (1931 passing, 1 skipped).
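The fan-out described in this commit can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the `Sinks` interface, the `prune` helper, the field set on `StreamCloseEvent`, and every entry in `ERROR_NAMES` except `StreamUpstreamPrematureClose` are assumptions made for the example. The real `appendErrorLog` and `enqueueLogEntry` signatures are not shown in the PR text, so they are injected here as plain callbacks.

```typescript
// Hypothetical shapes: only the four kind names, the two sink names, the
// "stream-close" fallback rid, and "StreamUpstreamPrematureClose" come from
// the PR text. Everything else is assumed for illustration.
type StreamCloseKind =
  | "client-abort"
  | "client-write-failed"
  | "upstream-error"
  | "upstream-premature";

interface StreamCloseEvent {
  kind: StreamCloseKind;
  requestId?: string;
  accountEntryId?: string;
  eventCount?: number;
  upstreamStatus?: number;
}

interface Sinks {
  appendErrorLog: (entry: Record<string, unknown>) => void;
  enqueueLogEntry: (entry: Record<string, unknown>) => void;
}

// Assumed names; only the upstream-premature entry is taken from the PR.
const ERROR_NAMES: Record<StreamCloseKind, string> = {
  "client-abort": "StreamClientAbort",
  "client-write-failed": "StreamClientWriteFailed",
  "upstream-error": "StreamUpstreamError",
  "upstream-premature": "StreamUpstreamPrematureClose",
};

// Drop undefined fields so both sinks receive compact records.
function prune(fields: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key of Object.keys(fields)) {
    if (fields[key] !== undefined) out[key] = fields[key];
  }
  return out;
}

// One call produces exactly one error-log entry and one audit-log entry.
function recordStreamCloseEvent(event: StreamCloseEvent, sinks: Sinks): void {
  const base = prune({
    kind: event.kind,
    requestId: event.requestId ?? "stream-close", // synthetic fallback rid
    accountEntryId: event.accountEntryId,
    eventCount: event.eventCount,
    upstreamStatus: event.upstreamStatus,
  });
  sinks.appendErrorLog({ ...base, name: ERROR_NAMES[event.kind] });
  // A numeric upstreamStatus passes through as the audit entry's status.
  sinks.enqueueLogEntry(prune({ ...base, status: event.upstreamStatus }));
}
```

Injecting the sinks keeps the sketch testable without touching disk; the helper in the PR presumably imports its writers directly.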
…itest The previous commit's recordStreamCloseEvent helper is invoked from proxy-handler and response-processor, which integration tests exercise. Those tests don't all mock @src/paths.js, so `getDataDir()` resolved to the developer's real `data/` and `appendErrorLog` left 17 stray entries in `data/error-log.jsonl` on every `npm test` run. `appendErrorLog` now short-circuits when `process.env.VITEST` is set, unless `VITEST_FORCE_APPEND_ERROR_LOG=1` opts back in. The three test files that intentionally exercise the disk writer (`tests/unit/logs/error-log.test.ts`, `tests/unit/logs/stream-close-event.test.ts`, `tests/unit/routes/admin/error-logs.test.ts`) set the flag in `beforeEach` and unset in `afterEach`, so their behavior is unchanged. Integration tests that pass through `recordStreamCloseEvent` incidentally now no-op instead of polluting the data dir.
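The guard described above amounts to a small predicate over the environment. The sketch below is an assumption about its shape, not the code in `error-log.ts`: the function name `shouldAppendErrorLog` and the `Env` type are hypothetical, and treating "is set" as a truthiness check is a guess.

```typescript
// Hypothetical helper: the PR only states the two environment variable names
// and the intended behavior, not how the check is written.
type Env = Record<string, string | undefined>;

function shouldAppendErrorLog(env: Env): boolean {
  // Assumption: "VITEST is set" is interpreted as a non-empty value.
  const underVitest = Boolean(env["VITEST"]);
  // Explicit opt-in for test files that exercise the disk writer on purpose.
  const forced = env["VITEST_FORCE_APPEND_ERROR_LOG"] === "1";
  return !underVitest || forced;
}
```

In the writer, a check like `if (!shouldAppendErrorLog(process.env)) return;` at the top of `appendErrorLog` would give the no-op behavior the commit describes for incidental integration-test calls.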
…records

Codex review #1: streamPassthrough's two `recordStreamCloseEvent` calls in `responses.ts` fell back to the synthetic `requestId="stream-close"` because the generator had no access to the surrounding request's rid, account, or variantHash. Audit entries for `/v1/responses` premature closes were therefore impossible to correlate with the upstream POST, defeating the point of the Errors-tab plumbing.

Threading the context end-to-end without invasive surgery:
- `FormatAdapter.streamTranslator` gains an optional 9th `streamContext?: StreamCloseContextBase` parameter.
- `response-processor.ts:streamResponse` builds the context from its existing `diagnostics` (requestId/tag/accountEntryId/variantHash) plus `model`, and forwards it on every `adapter.streamTranslator(...)` call.
- `responses.ts:streamPassthrough` accepts a new 7th `streamContext?` argument and feeds it into both premature-close records (the `parseStream` throw branch and the no-terminal branch).
- The other three adapters (`messages.ts`, `chat.ts`, `gemini.ts`) accept the new param at the type level and ignore it; their inner translators surface upstream throws via response-processor's outer try/catch, which already carries the full diagnostics.

Tests: three new `streamPassthrough` cases assert that the supplied streamContext propagates into `recordStreamCloseEvent` on natural EOF, on mid-stream throw, and that omitting it still produces a usable fallback entry. Vitest mock for `@src/logs/stream-close-event.js` captures the invocations without touching disk.

CHANGELOG entry updated to reflect the corrected call-site count (7, not 6).
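The context threading can be pictured with a reduced sketch. The field names on `StreamCloseContextBase` follow the commit text, but the helper names `contextFromDiagnostics` and `prematureCloseRecord`, the `Diagnostics` type, and the shape of the returned record are hypothetical and exist only for this example.

```typescript
// Reduced, assumed shape of the context named in the commit message.
interface StreamCloseContextBase {
  requestId?: string;
  tag?: string;
  accountEntryId?: string;
  variantHash?: string;
  model?: string;
}

// Hypothetical stand-in for the diagnostics object in response-processor.
type Diagnostics = Omit<StreamCloseContextBase, "model">;

// Build the context once from existing diagnostics plus the model name.
function contextFromDiagnostics(
  diagnostics: Diagnostics,
  model: string,
): StreamCloseContextBase {
  return { ...diagnostics, model };
}

// The context is optional: callers that omit it still get a usable record,
// keyed by the synthetic "stream-close" request id.
function prematureCloseRecord(
  eventCount: number,
  streamContext?: StreamCloseContextBase,
) {
  return {
    kind: "upstream-premature" as const,
    requestId: streamContext?.requestId ?? "stream-close",
    accountEntryId: streamContext?.accountEntryId,
    variantHash: streamContext?.variantHash,
    model: streamContext?.model,
    eventCount,
  };
}
```

Making the parameter optional and trailing is what lets the other adapters accept it at the type level and ignore it, as the list above notes.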
Summary
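Premature stream close, stream-client-abort, stream-client-disconnect, and stream-error previously only went through `console.warn`, so locating them meant grepping the dev tee log, and production mode has no tee at all. The Errors tab + badge from PR #482 / #483 only hooks uncaught exceptions and cannot capture these expected-but-bad failures. As a result, the last "fix premature close" attempt was based on an incomplete picture of the incident, and after several rounds of fixes the problem still reproduces.

This PR adds a `recordStreamCloseEvent` helper that writes to both `data/error-log.jsonl` (Errors tab, grouped by signature) and `logStore` (the `/admin/logs` audit stream). It covers 6 call sites and keeps the diagnostic fields rid + account + responseId + closeCode + eventCount + variantHash in structured form. The next time a premature close reproduces, samples can be pulled directly from the Errors tab, grouped under `StreamUpstreamPrematureClose`.

Changes
- `src/logs/stream-close-event.ts`: new helper `recordStreamCloseEvent` with 4 kinds (`client-abort` / `client-write-failed` / `upstream-error` / `upstream-premature`); dispatches ERROR_NAMES + BASE_MESSAGES by kind and prunes empty fields.
- `src/routes/shared/proxy-handler.ts`: the two `s.onAbort` sites → `kind=client-abort`; the `UpstreamPrematureCloseError` branch → `kind=upstream-premature` (with eventCount/hadReasoning/responseId/variantHash); extends `StreamDiagnostics` with accountEntryId/variantHash and passes them through on the streaming path.
- `src/routes/shared/response-processor.ts`: `catch (writeErr)` → `kind=client-write-failed` (with writtenChunks/Bytes/lastSentEvent/sentTerminal); the catch for upstream throws → `kind=upstream-error` (with upstreamStatus).
- `src/routes/responses.ts`: the `catch (err)` and `if (!sawTerminal)` sites inside `streamPassthrough` → `kind=upstream-premature`.
- `src/logs/error-log.ts`: `readAppVersion` gains a try/catch and falls back to "unknown" when the config is not loaded (fixes the `Config not loaded` crash hit when unit tests call the helper directly).
- `tests/unit/logs/stream-close-event.test.ts`: 5 unit tests covering the 4 kinds + the missing-rid fallback + numeric upstreamStatus → audit status passthrough.

Test Plan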
Notes
Linked Issues
None.