[codex] fix forked session double-counting and optimize loading#910
jackcpku wants to merge 2 commits into ryoppippi:main
Conversation
📝 Walkthrough

Replaces bulk JSONL event loading with streaming, on-the-fly accumulation into summary maps, and separates aggregation (accumulators) from row construction across the daily, monthly, and session reports; commands now stream events via callbacks, and report builders consume summaries.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    rect rgba(200,220,255,0.5)
        participant CLI as CLI Command
    end
    rect rgba(200,255,200,0.5)
        participant Loader as Data Loader (stream)
    end
    rect rgba(255,235,200,0.5)
        participant Acc as Accumulator (summaries)
    end
    rect rgba(255,200,220,0.5)
        participant Report as Report Rows Builder
    end
    rect rgba(240,240,240,0.5)
        participant Pricing as PricingSource
    end
    CLI->>Loader: request events (since, until, timezone, onEvent)
    Loader->>Loader: stream parse JSONL lines
    Loader->>Acc: onEvent(event)
    Acc-->>Loader: acknowledge
    CLI->>Acc: after load -> check summaries.size
    CLI->>Report: build...Rows(summaries, options)
    Report->>Pricing: fetch pricing for unique models
    Pricing-->>Report: pricing data
    Report-->>CLI: rows (formatted report)
```
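The flow in the diagram can be illustrated with a minimal TypeScript sketch; the names (`makeAccumulator`, `loadEvents`, `buildDailyRows`) and the event shape are invented for illustration and are not the PR's actual API.

```typescript
// Hypothetical sketch of the streaming flow above: the command hands an
// onEvent callback to the loader, the accumulator folds events into summary
// maps during the load, and the row builder consumes the summaries afterward.
type UsageEvent = { date: string; model: string; inputTokens: number; outputTokens: number };
type Summary = { inputTokens: number; outputTokens: number };

// Accumulator: folds events into a per-date summary map on the fly.
function makeAccumulator() {
	const summaries = new Map<string, Summary>();
	return {
		summaries,
		onEvent(event: UsageEvent): void {
			const current = summaries.get(event.date) ?? { inputTokens: 0, outputTokens: 0 };
			current.inputTokens += event.inputTokens;
			current.outputTokens += event.outputTokens;
			summaries.set(event.date, current);
		},
	};
}

// Loader stand-in: streams events to the callback instead of returning arrays.
function loadEvents(events: Iterable<UsageEvent>, onEvent: (e: UsageEvent) => void): void {
	for (const event of events) {
		onEvent(event);
	}
}

// Row builder: consumes summaries only after the load completes.
function buildDailyRows(summaries: Map<string, Summary>): string[] {
	return [...summaries.entries()].map(
		([date, s]) => `${date}\t${s.inputTokens}\t${s.outputTokens}`,
	);
}
```

The point of this shape is that the loader never materializes a sorted event array; the accumulator's map is the only state that grows with input size.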
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 2
🧹 Nitpick comments (1)
apps/codex/src/data-loader.ts (1)
241-245: Use `Result.try()` for JSON parsing.

The file already imports `Result` from `@praha/byethrow`. Replace the try-catch with `Result.try()` to match the codebase convention for handling throw-prone operations:

```ts
const parsed = Result.try(() => JSON.parse(trimmed));
if (Result.isFailure(parsed)) {
	return undefined;
}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@apps/codex/src/data-loader.ts` around lines 241 - 245, Replace the manual try/catch around JSON.parse with the Result.try pattern: call Result.try(() => JSON.parse(trimmed)), check Result.isFailure on the returned result and return undefined on failure, and otherwise replace the previous parsed variable with the successful result's value (e.g., parsed = result.value) so downstream code still uses parsed as the parsed object; use the symbols Result.try, Result.isFailure, JSON.parse, trimmed, and parsed to locate and update the code.
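For context, the suggested pattern looks like the sketch below with a minimal stand-in `Result` type. The real `@praha/byethrow` API may differ in detail, so treat this as an illustration of the convention rather than the library's exact surface.

```typescript
// Minimal stand-in for a Result type mirroring the pattern the review
// suggests; the real @praha/byethrow API may differ in detail.
type Success<T> = { readonly ok: true; readonly value: T };
type Failure = { readonly ok: false; readonly error: unknown };
type Result<T> = Success<T> | Failure;

// Run a throw-prone function and capture the outcome as a value.
function tryCatch<T>(fn: () => T): Result<T> {
	try {
		return { ok: true, value: fn() };
	} catch (error) {
		return { ok: false, error };
	}
}

function isFailure<T>(r: Result<T>): r is Failure {
	return !r.ok;
}

// Parse one JSONL line; return undefined on malformed input instead of throwing.
function parseLine(trimmed: string): unknown | undefined {
	const parsed = tryCatch(() => JSON.parse(trimmed));
	if (isFailure(parsed)) {
		return undefined;
	}
	return parsed.value;
}
```

The benefit over a bare try/catch is that the failure path becomes an ordinary value check, which the codebase can enforce uniformly.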
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/codex/src/data-loader.ts`:
- Around line 280-295: The bug is that when a payload provides only
last_token_usage, state.previousTotals isn't updated, so later
falls-back-to-total_token_usage subtraction (subtractRawUsage) uses an outdated
baseline and double-counts; to fix, after computing raw from lastUsage (when
lastUsage != null) update state.previousTotals to reflect the same baseline used
(set state.previousTotals = totalUsage if totalUsage exists or to a value
derived from lastUsage/last_token_usage equivalent), ensuring functions
normalizeRawUsage, subtractRawUsage, and the variables lastUsage, totalUsage,
raw, and state.previousTotals remain consistent so subsequent total-token deltas
subtract from the correct baseline.
- Around line 444-469: The current try/catch around the for-await loop swallows
exceptions thrown by options.onEvent; fix this by separating streaming/parsing
errors from accumulator errors: inside the loop (function parseTokenUsageEvent,
isEventWithinRange, lines, events, sessionId, state), only parse and push events
while handling streaming I/O in the try/catch, and remove/avoid awaiting
options.onEvent there; after the loop finishes (or in a separate step), iterate
the collected events and call await options.onEvent(event) so any error from
options.onEvent bubbles up instead of being caught as a file-read error; ensure
lines.close() still runs in finally.
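The restructuring this comment proposes can be sketched with a pure function over an iterable of lines (the real code reads from a `readline` stream); `processLines` and the options shape here are hypothetical.

```typescript
// Sketch of the two-phase control flow the review suggests: parsing is the
// only thing guarded against failure, and onEvent runs outside that guard so
// accumulator errors propagate instead of being swallowed as file-read errors.
type LoadOptions = { onEvent: (event: unknown) => void };

function processLines(lines: Iterable<string>, options: LoadOptions): number {
	const events: unknown[] = [];
	// Phase 1: parse only; a malformed line is skipped rather than aborting
	// the whole file, and no accumulator code runs inside this guard.
	for (const line of lines) {
		const trimmed = line.trim();
		if (trimmed === '') continue;
		try {
			events.push(JSON.parse(trimmed));
		} catch {
			continue; // genuine parsing problem: skip this line
		}
	}
	// Phase 2: dispatch; an exception thrown by onEvent now reaches the
	// caller instead of being misreported as a streaming error.
	for (const event of events) {
		options.onEvent(event);
	}
	return events.length;
}
```

A usage note: buffering events before dispatch trades a little memory for the error-attribution guarantee; the review's point is only that the two failure domains must not share one try/catch.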
---
Nitpick comments:
In `@apps/codex/src/data-loader.ts`:
- Around line 241-245: Replace the manual try/catch around JSON.parse with the
Result.try pattern: call Result.try(() => JSON.parse(trimmed)), check
Result.isFailure on the returned result and return undefined on failure, and
otherwise replace the previous parsed variable with the successful result's
value (e.g., parsed = result.value) so downstream code still uses parsed as the
parsed object; use the symbols Result.try, Result.isFailure, JSON.parse,
trimmed, and parsed to locate and update the code.
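The baseline fix from the first inline comment above can be sketched as follows; the names mirror that comment (`lastUsage`, `totalUsage`, `previousTotals`), but the usage shape is a hypothetical simplification of the real event schema.

```typescript
// Illustrative shapes for the double-counting bug: if a payload carries only
// last_token_usage and previousTotals is not advanced, a later
// total_token_usage-only payload subtracts from a stale baseline.
type RawUsage = { input: number; output: number };
type UsageState = { previousTotals: RawUsage };

function addRawUsage(a: RawUsage, b: RawUsage): RawUsage {
	return { input: a.input + b.input, output: a.output + b.output };
}

function subtractRawUsage(total: RawUsage, baseline: RawUsage): RawUsage {
	return { input: total.input - baseline.input, output: total.output - baseline.output };
}

// Compute the incremental usage for one event. The fix: even when only
// lastUsage is present, advance previousTotals so a later totalUsage-only
// event subtracts from the correct baseline.
function consumeUsage(
	state: UsageState,
	lastUsage: RawUsage | null,
	totalUsage: RawUsage | null,
): RawUsage {
	if (lastUsage != null) {
		// Keep the baseline in sync: prefer the reported totals, otherwise
		// derive them by rolling lastUsage into the previous baseline.
		state.previousTotals = totalUsage ?? addRawUsage(state.previousTotals, lastUsage);
		return lastUsage;
	}
	if (totalUsage != null) {
		const raw = subtractRawUsage(totalUsage, state.previousTotals);
		state.previousTotals = totalUsage;
		return raw;
	}
	return { input: 0, output: 0 };
}
```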
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 7e167500-791c-47a8-8836-bea0ddf2ded9
📒 Files selected for processing (7)
- apps/codex/src/commands/daily.ts
- apps/codex/src/commands/monthly.ts
- apps/codex/src/commands/session.ts
- apps/codex/src/daily-report.ts
- apps/codex/src/data-loader.ts
- apps/codex/src/monthly-report.ts
- apps/codex/src/session-report.ts
Why this PR matters
Forked sessions replay their ancestor's `token_count` prefix, and the previous loader counted that replay again. That inflated usage totals for fork-heavy workflows such as subagents.

Headline impact
- `daily` could appear hung and use multi-GB RSS

Benchmark impact
- One-day subset (`CODEX_HOME=/Users/jack/tmp/codex-home-one-day node dist/index.js daily --offline --since 20260327 --until 20260327 >/dev/null`): 1.82s real, 1243414528 max RSS before; 3.04s real, 554549248 max RSS after (55% lower peak RSS, but wall time is slower on this tiny subset)
- `daily --offline --since 20260327 --until 20260327` on the real local history: 9.75s real, 695648256 max RSS
- 5.02s, 5.42 GB peak before; 1.15s, 152 MB peak after (4.4x faster, about 36x less memory)
- 46G before, 224 MB after (220x less data to read when filters are narrow)

Root cause
The previous implementation loaded `~/.codex/sessions/**/*.jsonl` with `readFile`, split the whole file into lines, validated/parsed each line in the hot path, pushed every event into arrays, and only then applied reporting filters and aggregation.

It also ignored `session_meta.payload.forked_from_id`, so when a child session replayed ancestor `token_count` history, the loader treated that replay as fresh usage and double-counted it.

What changed
- Rewrote `apps/codex/src/data-loader.ts` from `readFile` + split + per-line schema parsing to streaming line-by-line parsing
- Updated the `daily`, `monthly`, and `session` commands to build summaries directly during load instead of collecting and sorting every event first
- Skips replayed `token_count` prefixes in forked sessions by following `forked_from_id` ancestry during load

Validation
Checks run:
- `pnpm --filter @ccusage/codex test -- src/data-loader.ts`
- `pnpm --filter @ccusage/codex lint`
- `pnpm --filter @ccusage/codex typecheck`
- `pnpm --filter @ccusage/codex build`

CLI validation:
- Baseline: `cd /Users/jack/tmp/ccusage_repo-baseline/apps/codex && CODEX_HOME=/Users/jack/tmp/codex-home-one-day /usr/bin/time -l node dist/index.js daily --offline --since 20260327 --until 20260327 >/dev/null`
- `node dist/index.js daily --offline --since 20260327 --until 20260327` on the real local Codex history completed in 9.75s real with 695648256 max RSS
- `CODEX_HOME=/Users/jack/tmp/codex-home-one-day node dist/index.js daily --offline --since 20260327 --until 20260327` completed in 3.04s real with 554549248 max RSS
- `session` output on the real local history for 2026-03-27
- `monthly` output on a one-day Codex subset home
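As an illustration of the `forked_from_id` handling described under Root cause, here is a sketch of the prefix de-duplication idea. The session-index structures are hypothetical simplifications of the loader's real state, and the sketch assumes a child's file replays its ancestors' `token_count` events verbatim as a prefix.

```typescript
// Hypothetical session index: each session records its fork parent (if any)
// and its token_count events in file order.
type SessionEvents = {
	forkedFrom?: string;
	tokenCounts: number[];
};

// Count how many leading events are an ancestor replay by walking the
// forked_from_id chain and summing ancestor event counts.
function replayedPrefixLength(
	sessions: Map<string, SessionEvents>,
	sessionId: string,
): number {
	let length = 0;
	let current = sessions.get(sessionId)?.forkedFrom;
	while (current != null) {
		const ancestor = sessions.get(current);
		if (ancestor == null) break;
		length += ancestor.tokenCounts.length;
		current = ancestor.forkedFrom;
	}
	return length;
}

// Fresh usage for a session = events beyond the replayed prefix, so the
// ancestor's history is only counted once.
function freshTokenCounts(sessions: Map<string, SessionEvents>, sessionId: string): number[] {
	const session = sessions.get(sessionId);
	if (session == null) return [];
	return session.tokenCounts.slice(replayedPrefixLength(sessions, sessionId));
}
```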