fix(sentry): unpack better-sqlite3 native chain + skip recoverable groom/oversized-session telemetry#1142
Conversation
📝 WalkthroughWalkthroughThree targeted fixes: ChangesSentry noise reduction for expected errors
ASAR Unpack Expansion
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
package.json (1)
92-94: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winAdd a regression assertion for the new unpacked dependencies.
src/__tests__/main/stats/integration.test.ts:762-775currently only checks fornode_modules/better-sqlite3/**/*, so this fix can regress silently ifbindingsorfile-uri-to-pathget dropped later. Please extend that test to assert these two newasarUnpackentries as well.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@package.json` around lines 92 - 94, The unpacked dependency list was expanded, but the regression test still only asserts the better-sqlite3 entry. Update the integration test in stats integration to also check for the new asarUnpack entries for bindings and file-uri-to-path, using the existing assertion pattern around the unpack list so future drops are caught. Refer to the stats integration test that validates the unpacked dependency set and extend the expectations there.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@package.json`:
- Around line 92-94: The unpacked dependency list was expanded, but the
regression test still only asserts the better-sqlite3 entry. Update the
integration test in stats integration to also check for the new asarUnpack
entries for bindings and file-uri-to-path, using the existing assertion pattern
around the unpack list so future drops are caught. Refer to the stats
integration test that validates the unpacked dependency set and extend the
expectations there.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d2a88312-3d91-45fe-af59-9009a8b01230
📒 Files selected for processing (3)
package.jsonsrc/__tests__/shared/sentryFilters.test.tssrc/shared/sentryFilters.ts
Greptile SummaryThis PR fixes two production crash-noise paths. The main changes are:
Confidence Score: 5/5This looks safe to merge after a small observability cleanup.
src/shared/sentryFilters.ts Important Files Changed
Reviews (1): Last reviewed commit: "fix(sentry): unpack better-sqlite3 nativ..." | Re-trigger Greptile |
| // a bug we can fix from inside the app. Real grooming failures (timeouts, | ||
| // spawn errors, genuine agent crashes) carry different messages and still | ||
| // surface. | ||
| if (/Grooming error: Session not found/i.test(haystack)) return true; |
There was a problem hiding this comment.
Recoverability Is Inferred From Text
When a grooming agent reports any error that the parser normalizes to Session not found, this new filter drops the event solely from the wrapped message. The parser can produce that friendly message from broad session.*not found output, so a real session-id propagation bug during grooming can be hidden from Sentry instead of being reported.
Context Used: CLAUDE.md (source)
…oom/oversized-session telemetry Three stable-channel (main) field-crash fixes, validated against Sentry (smash-labs/maestro) on releases 0.17.1/0.17.2 (channel:stable). Fixes 2 and 3 mirror rc PR #1141 exactly so the rc->main merge converges with no conflict. 1. Native bindings packaging (MAESTRO-TE, MAESTRO-Q3, MAESTRO-JV, plus the MAESTRO-TC/TD "Could not locate the bindings file" cluster). better-sqlite3 depends on bindings, which depends on file-uri-to-path, but asarUnpack only unpacked better-sqlite3 and node-pty. When the unpacked bindings.js required file-uri-to-path (still inside app.asar), resolution could not cross back into the archive ("Cannot find module 'file-uri-to-path'"); the mixed packed/unpacked layout also produced "Could not locate the bindings file" when bindings computed an in-archive search root for the .node. Unpack the full native dependency closure (bindings + file-uri-to-path) so the whole chain resolves from app.asar.unpacked. 2. Recoverable grooming session loss (MAESTRO-JB, 65 events). Group-chat context summary spawns a batch agent; if the participant's provider session was deleted mid-summary the agent emits a recoverable "Session not found" error (error-patterns.ts session_not_found) and we already fall back to a fresh session. Skip captureException for that case in groupChat.ts; real summary failures still report. 3. Oversized session files (MAESTRO-M9, regressed onto stable). The getGlobalStats parse loops in agentSessions.ts read each session file into a single string; a file too large throws "RangeError: Invalid string length". The #1115 carve-out was lost in the statsCache refactor that reached main. Re-skip the expected RangeError in both the Claude and Codex loops, mirroring the storage-layer pattern.
424d7d9 to
32d432e
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/main/ipc/handlers/agentSessions.ts`:
- Around line 1002-1014: The oversized-session RangeError path in agentSessions
needs to cache a sentinel result instead of only warning and continuing, because
getGlobalStats will re-read the same file on every refresh. Update the parsing
flow around the session aggregation logic and the file-processing branches in
agentSessions so the expected “too large to parse” case stores a zeroed/sentinel
cache entry keyed by fileMtimeMs, and only retries when the file timestamp
changes; keep the existing non-RangeError failure handling separate from this
special-case cache behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: b59f37f9-69f5-4bad-a460-57a37d116fd2
📒 Files selected for processing (3)
package.jsonsrc/main/ipc/handlers/agentSessions.tssrc/main/ipc/handlers/groupChat.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- package.json
| // A session file too large to read into a single V8 string throws | ||
| // `RangeError: Invalid string length` (MAESTRO-M9). That's an expected | ||
| // boundary for huge sessions, not a bug - skip it and keep aggregating | ||
| // the rest. Mirrors the storage-layer carve-out in | ||
| // claude-/codex-session-storage.ts. | ||
| if (error instanceof RangeError) { | ||
| logger.warn(`Claude session file too large to parse: ${file.sessionKey}`, LOG_CONTEXT); | ||
| } else { | ||
| void captureException(error); | ||
| logger.warn(`Failed to parse Claude session: ${file.sessionKey}`, LOG_CONTEXT, { | ||
| error, | ||
| }); | ||
| } |
There was a problem hiding this comment.
🚀 Performance & Scalability | 🟠 Major | ⚡ Quick win
Cache oversized-session skips instead of retrying them on every stats refresh.
These branches only log and continue. Because Lines 961-968 reprocess any session without a cached entry, the same oversized file will be re-read, re-throw, and re-warn on every agentSessions:getGlobalStats call. Persist a sentinel/zeroed cache entry keyed by fileMtimeMs for the expected “too large” case so it is only retried after the file changes.
Also applies to: 1038-1048
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/main/ipc/handlers/agentSessions.ts` around lines 1002 - 1014, The
oversized-session RangeError path in agentSessions needs to cache a sentinel
result instead of only warning and continuing, because getGlobalStats will
re-read the same file on every refresh. Update the parsing flow around the
session aggregation logic and the file-processing branches in agentSessions so
the expected “too large to parse” case stores a zeroed/sentinel cache entry
keyed by fileMtimeMs, and only retries when the file timestamp changes; keep the
existing non-RangeError failure handling separate from this special-case cache
behavior.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 32d432e4af
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| // report that to Sentry (MAESTRO-JB); only surface unexpected | ||
| // grooming failures. | ||
| const message = error instanceof Error ? error.message : String(error); | ||
| if (!/Session not found/i.test(message)) { |
There was a problem hiding this comment.
Match Codex missing-session grooming errors
When a Codex group-chat participant's stored thread has been pruned, the session_not_found parser emits Previous Codex session could not be found. Starting fresh conversation. from src/main/parsers/error-patterns.ts:529-531, and groomContext wraps that text before this catch sees it. This regex only drops the literal phrase Session not found, so those recoverable reset failures still call captureException and keep producing the Sentry noise this change is meant to remove.
Useful? React with 👍 / 👎.
Summary
Three field-crash fixes for the stable (main) channel, sourced from Sentry (
smash-labs/maestro) and validated against the code + the affected releases (0.17.1 / 0.17.2,channel:stable). Fixes 2 and 3 are byte-identical mirrors of rc PR #1141 so the rc->main merge converges with no conflict; fix 1 is unique to main.1. Native bindings packaging (escalating on current main)
Sentry: MAESTRO-TE (
Cannot find module 'file-uri-to-path', escalating), MAESTRO-Q3 / MAESTRO-JV / MAESTRO-TC / MAESTRO-TD (Could not locate the bindings file). ~40+ events on 0.17.2.better-sqlite3->bindings->file-uri-to-path, butasarUnpackonly listedbetter-sqlite3andnode-pty:bindings.jsdoesrequire('file-uri-to-path')at load. Withfile-uri-to-pathstill insideapp.asar, resolution fromapp.asar.unpackedcannot cross back into the archive -> "Cannot find module 'file-uri-to-path'" (MAESTRO-TE). The require stack confirms it:app.asar.unpacked/.../bindings/bindings.jsfailing to load a module left in the asar.bindingscompute an in-archive search root for the compiled.node, producing "Could not locate the bindings file. Tried: .../app.asar/.../better_sqlite3.node" (MAESTRO-Q3/JV, viaStatsDB.initializeandinitCueDb).Fix: unpack the full native dependency closure so the whole chain resolves consistently from
app.asar.unpacked:node-ptyneeds nothing extra (its only dep,node-addon-api, is build-time headers; it loads its.nodedirectly).2. Recoverable grooming session loss (MAESTRO-JB, 65 events, stable)
Group-chat context summary spawns a batch agent. If the participant's provider session was deleted mid-summary, the agent emits a recoverable
Session not founderror (error-patterns.tssession_not_found,recoverable: true) and the code already falls back to a fresh session. The explicitcaptureExceptioningroupChat.tsreported it anyway. Skip it for that message; real summary failures still report. (Mirrors rc #1141.)3. Oversized session files (MAESTRO-M9, regressed onto stable)
getGlobalStatsreads each session file into a single string; a file too large throwsRangeError: Invalid string length. The #1115 carve-out was lost whengetGlobalStatswas refactored ontostatsCache(that refactor reached main, so M9 came back onchannel:stable, release 0.17.1). Re-skip the expectedRangeErrorin both the Claude and Codex parse loops inagentSessions.ts, mirroring the storage-layer pattern (claude-/codex-session-storage.ts). (Mirrors rc #1141.)Out of scope (intentionally not touched)
channel:rc(zero stable events). The code is latent on main but does not fire on stable; rc fix(sentry): silence expected-condition telemetry noise on rc #1141 fixes it and it reaches main on the next merge. Not a main field issue.agent-probe-failed):markFailedalready guards withisExpectedConnectionFailure; residual events are genuine signal.Failed to load required prompt), MAESTRO-T8/T9 (shellLogs is not iterable): root cause not obvious enough to fix confidently.spawn E2BIG/EINVAL, ENOSPC: already filtered or build/environment issues.Testing
vitest runforagentSessions.test.ts(20) +groupChat.test.ts(54) -> all pass.prettier --checkandeslintclean on changed files; type-check shows no errors in changed files.The added lines in fixes 2 and 3 are verified byte-identical to rc PR #1141.
Summary by CodeRabbit