fix(gain): use weighted savings rate in per-command stats#891
fix(gain): use weighted savings rate in per-command stats#891FlorianBruniaux wants to merge 4 commits intodevelopfrom
Conversation
Add checkout/merge/rebase to git pattern, run/publish to cargo pattern, and build/exec to pnpm pattern in PATTERNS regex. Mark each as RtkStatus::Passthrough in subcmd_status so discover correctly classifies them as "already handled via passthrough" instead of "TOP UNHANDLED". Also update test_registry_covers_all_git/cargo_subcommands to include the new subcommands, and add 11 focused tests for classify + rewrite. Impact: 1277 commands (git checkout 493, git merge 154, git rebase 137, pnpm build 171, pnpm exec 173, cargo run 78, cargo publish 71) will appear in "MISSED SAVINGS" instead of "TOP UNHANDLED COMMANDS". Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Florian BRUNIAUX <florian@bruniaux.com>
Adds a one-liner listing all supported ecosystems (ruff, pytest, pip, golangci-lint, etc.) in the Architecture section to satisfy the pre-push documentation validation script. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Florian BRUNIAUX <florian@bruniaux.com>
AVG(savings_pct) produced a simple unweighted mean that diluted the reported rate for high-volume commands — grep showed 14.8% while the true weighted rate was 98.6%, a 84-point gap. Replace AVG(savings_pct) with SUM(saved_tokens)*100.0/SUM(input_tokens) in the per-command SQL query so the rate reflects actual byte savings rather than an average of per-invocation percentages. Also: - Rename "Avg%" column header to "Rate" (more accurate label) - Add doc warning on CommandStats type alias against using AVG() - Add regression test proving weighted != unweighted on skewed data Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Florian BRUNIAUX <florian@bruniaux.com>
There was a problem hiding this comment.
Pull request overview
This PR corrects rtk gain per-command savings to use a weighted savings rate (based on total tokens) rather than an unweighted average of per-invocation percentages, improving accuracy for high-volume commands.
Changes:
- Update the per-command SQL aggregation to compute
SUM(saved_tokens) * 100 / SUM(input_tokens)and rename the UI column header fromAvg%toRate. - Add documentation warning on the
CommandStatstuple semantics and add a regression test proving weighted vs unweighted behavior. - Expand
discovercommand classification coverage for additional git/cargo/pnpm subcommands (passthrough routing) and document supported ecosystems.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| src/core/tracking.rs | Switch per-command savings from AVG(savings_pct) to a weighted rate; document semantics; add regression test. |
| src/analytics/gain.rs | Update per-command table header from Avg% to Rate to reflect weighted rate semantics. |
| src/discover/rules.rs | Extend discover patterns/rules to recognize more git/cargo/pnpm subcommands and mark them passthrough. |
| src/discover/registry.rs | Update/extend tests ensuring registry covers the expanded subcommand set and passthrough classification/rewrites. |
| CLAUDE.md | Add a one-line list of supported ecosystems. |
| /// # Warning | ||
| /// The 4th field is a **weighted** savings rate: `SUM(saved_tokens) * 100.0 / SUM(input_tokens)`. | ||
| /// Do NOT aggregate this column with `AVG()` — that would produce an unweighted mean that | ||
| /// under-weights high-volume commands. Always use `SUM(saved) / SUM(input)` instead. |
There was a problem hiding this comment.
The warning mixes “percentage” and “ratio” formulas: earlier it defines the field as SUM(saved_tokens) * 100.0 / SUM(input_tokens), but the last sentence says to use SUM(saved) / SUM(input) (missing the * 100.0 and column names). Please make the recommended aggregation formula consistent and explicitly in percent form to avoid copy/paste mistakes.
| /// under-weights high-volume commands. Always use `SUM(saved) / SUM(input)` instead. | |
| /// under-weights high-volume commands. Always recompute it as | |
| /// `SUM(saved_tokens) * 100.0 / SUM(input_tokens)` instead. |
|
|
||
| let (_name, _count, _saved, rate, _time) = entry; | ||
|
|
||
| // Weighted rate = (10 + 95_000) * 100.0 / (100 + 100_000) ≈ 94.99% |
There was a problem hiding this comment.
The regression test comment’s expected weighted value looks off: (10 + 95_000) * 100 / (100 + 100_000) is ~94.9%, not ~94.99%. Consider correcting the comment (or compute the expected rate and assert within a small epsilon) so the test documents the math accurately.
| // Weighted rate = (10 + 95_000) * 100.0 / (100 + 100_000) ≈ 94.99% | |
| // Weighted rate = (10 + 95_000) * 100.0 / (100 + 100_000) ≈ 94.9% |
There was a problem hiding this comment.
Exact final value is 94.9150849150849150 so yeah much more near 94.9 than 94.99
| Supported ecosystems: git/gh/gt, cargo, go/golangci-lint, npm/pnpm/npx, ruff/pytest/pip/mypy, rspec/rubocop/rake, dotnet, playwright/vitest/jest, docker/kubectl/aws. | ||
|
|
There was a problem hiding this comment.
PR description is centered on gain’s weighted savings calculation, but this change also updates general repository guidance (supported ecosystems list). Please either mention this documentation update in the PR description or move it to a separate doc-only PR to keep the change set focused.
| r"^git\s+(?:-[Cc]\s+\S+\s+)*(status|log|diff|show|add|commit|push|pull|branch|fetch|stash|worktree|checkout|merge|rebase)", | ||
| r"^gh\s+(pr|issue|run|repo|api|release)", | ||
| r"^cargo\s+(build|test|clippy|check|fmt|install)", | ||
| r"^pnpm\s+(list|ls|outdated|install)", | ||
| r"^cargo\s+(build|test|clippy|check|fmt|install|run|publish)", | ||
| r"^pnpm\s+(list|ls|outdated|install|build|exec)", |
There was a problem hiding this comment.
PR description focuses on weighted savings rate in gain, but this hunk also expands Discover’s supported command patterns/statuses (git checkout/merge/rebase, cargo run/publish, pnpm build/exec). Please update the PR description to include this scope change (or split into a separate PR) so reviewers can evaluate the intent/risk appropriately.
| /// Type alias for command statistics tuple: (command, count, saved_tokens, weighted_savings_rate, avg_time_ms) | ||
| /// | ||
| /// # Warning | ||
| /// The 4th field is a **weighted** savings rate: `SUM(saved_tokens) * 100.0 / SUM(input_tokens)`. |
There was a problem hiding this comment.
| /// The 4th field is a **weighted** savings rate: `SUM(saved_tokens) * 100.0 / SUM(input_tokens)`. | |
| /// The 4th field is a **weighted** savings rate: `SUM(saved_tokens) / SUM(input_tokens) * 100.0 . |
Seems more idiomatic to express a percentage
|
|
||
| let (_name, _count, _saved, rate, _time) = entry; | ||
|
|
||
| // Weighted rate = (10 + 95_000) * 100.0 / (100 + 100_000) ≈ 94.99% |
There was a problem hiding this comment.
Exact final value is 94.9150849150849150 so yeah much more near 94.9 than 94.99
| // Patterns ordered to match RULES indices exactly. | ||
| pub const PATTERNS: &[&str] = &[ | ||
| r"^git\s+(?:-[Cc]\s+\S+\s+)*(status|log|diff|show|add|commit|push|pull|branch|fetch|stash|worktree)", | ||
| r"^git\s+(?:-[Cc]\s+\S+\s+)*(status|log|diff|show|add|commit|push|pull|branch|fetch|stash|worktree|checkout|merge|rebase)", |
There was a problem hiding this comment.
Using alphabetically sorted list would ease maintenance
Signed-off-by: Florian BRUNIAUX <florian@bruniaux.com>
Summary
AVG(savings_pct)withSUM(saved_tokens)*100.0/SUM(input_tokens)in the per-command SQL query — the old unweighted mean diluted high-volume commands (grepshowed 14.8% instead of the real 98.6%)Avg%column header toRate(more accurate)CommandStatstype alias against usingAVG()onsavings_pctRoot cause
Each invocation stores its own
savings_pct.AVG()treated a 10-token call identically to a 10M-token call. Small calls at 0% savings (passthroughs, short greps) massively diluted the average —grephad 184M tokens saved out of 193M input, yet showed 14.8% instead of 95%.Credit
Hat tip to Mathieu Grenier who spotted the inconsistency on LinkedIn: grep at 184M saved on 193M input showing 14.8% while global efficiency was 98%. Good instinct, real bug.
Test plan
cargo test --allpasses (1135 tests)rtk gainshowsRate >= 90%for grep/read (was 14-43%)rate > 90%on skewed data (10% small call + 95% large call)