Skip to content

fix(gain): use weighted savings rate in per-command stats#891

Open
FlorianBruniaux wants to merge 4 commits intodevelopfrom
fix/gain-weighted-avg
Open

fix(gain): use weighted savings rate in per-command stats#891
FlorianBruniaux wants to merge 4 commits intodevelopfrom
fix/gain-weighted-avg

Conversation

@FlorianBruniaux
Copy link
Copy Markdown
Collaborator

Summary

  • Replace AVG(savings_pct) with SUM(saved_tokens)*100.0/SUM(input_tokens) in the per-command SQL query — the old unweighted mean diluted high-volume commands (grep showed 14.8% instead of the real 98.6%)
  • Rename Avg% column header to Rate (more accurate)
  • Add doc warning on CommandStats type alias against using AVG() on savings_pct
  • Add regression test with intentionally skewed data proving weighted != unweighted

Root cause

Each invocation stores its own savings_pct. AVG() treated a 10-token call identically to a 10M-token call. Small calls at 0% savings (passthroughs, short greps) massively diluted the average — grep had 184M tokens saved out of 193M input, yet showed 14.8% instead of 95%.

Credit

Hat tip to Mathieu Grenier who spotted the inconsistency on LinkedIn: grep at 184M saved on 193M input showing 14.8% while global efficiency was 98%. Good instinct, real bug.

Test plan

  • cargo test --all passes (1135 tests)
  • rtk gain shows Rate >= 90% for grep/read (was 14-43%)
  • Global efficiency % unchanged (already correct, uses a different calculation)
  • Regression test fix: improve command robustness and flag support #14 asserts rate > 90% on skewed data (10% small call + 95% large call)

FlorianBruniaux and others added 3 commits March 28, 2026 01:06
Add checkout/merge/rebase to git pattern, run/publish to cargo pattern,
and build/exec to pnpm pattern in PATTERNS regex. Mark each as
RtkStatus::Passthrough in subcmd_status so discover correctly classifies
them as "already handled via passthrough" instead of "TOP UNHANDLED".

Also update test_registry_covers_all_git/cargo_subcommands to include
the new subcommands, and add 11 focused tests for classify + rewrite.

Impact: 1277 commands (git checkout 493, git merge 154, git rebase 137,
pnpm build 171, pnpm exec 173, cargo run 78, cargo publish 71) will
appear in "MISSED SAVINGS" instead of "TOP UNHANDLED COMMANDS".

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Florian BRUNIAUX <florian@bruniaux.com>
Adds a one-liner listing all supported ecosystems (ruff, pytest, pip,
golangci-lint, etc.) in the Architecture section to satisfy the pre-push
documentation validation script.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Florian BRUNIAUX <florian@bruniaux.com>
AVG(savings_pct) produced a simple unweighted mean that diluted the
reported rate for high-volume commands — grep showed 14.8% while the
true weighted rate was 98.6%, a 84-point gap.

Replace AVG(savings_pct) with SUM(saved_tokens)*100.0/SUM(input_tokens)
in the per-command SQL query so the rate reflects actual byte savings
rather than an average of per-invocation percentages.

Also:
- Rename "Avg%" column header to "Rate" (more accurate label)
- Add doc warning on CommandStats type alias against using AVG()
- Add regression test proving weighted != unweighted on skewed data

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Florian BRUNIAUX <florian@bruniaux.com>
Copilot AI review requested due to automatic review settings March 28, 2026 00:36
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR corrects rtk gain per-command savings to use a weighted savings rate (based on total tokens) rather than an unweighted average of per-invocation percentages, improving accuracy for high-volume commands.

Changes:

  • Update the per-command SQL aggregation to compute SUM(saved_tokens) * 100 / SUM(input_tokens) and rename the UI column header from Avg% to Rate.
  • Add documentation warning on the CommandStats tuple semantics and add a regression test proving weighted vs unweighted behavior.
  • Expand discover command classification coverage for additional git/cargo/pnpm subcommands (passthrough routing) and document supported ecosystems.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/core/tracking.rs Switch per-command savings from AVG(savings_pct) to a weighted rate; document semantics; add regression test.
src/analytics/gain.rs Update per-command table header from Avg% to Rate to reflect weighted rate semantics.
src/discover/rules.rs Extend discover patterns/rules to recognize more git/cargo/pnpm subcommands and mark them passthrough.
src/discover/registry.rs Update/extend tests ensuring registry covers the expanded subcommand set and passthrough classification/rewrites.
CLAUDE.md Add a one-line list of supported ecosystems.

/// # Warning
/// The 4th field is a **weighted** savings rate: `SUM(saved_tokens) * 100.0 / SUM(input_tokens)`.
/// Do NOT aggregate this column with `AVG()` — that would produce an unweighted mean that
/// under-weights high-volume commands. Always use `SUM(saved) / SUM(input)` instead.
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The warning mixes “percentage” and “ratio” formulas: earlier it defines the field as SUM(saved_tokens) * 100.0 / SUM(input_tokens), but the last sentence says to use SUM(saved) / SUM(input) (missing the * 100.0 and column names). Please make the recommended aggregation formula consistent and explicitly in percent form to avoid copy/paste mistakes.

Suggested change
/// under-weights high-volume commands. Always use `SUM(saved) / SUM(input)` instead.
/// under-weights high-volume commands. Always recompute it as
/// `SUM(saved_tokens) * 100.0 / SUM(input_tokens)` instead.

Copilot uses AI. Check for mistakes.

let (_name, _count, _saved, rate, _time) = entry;

// Weighted rate = (10 + 95_000) * 100.0 / (100 + 100_000) ≈ 94.99%
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regression test comment’s expected weighted value looks off: (10 + 95_000) * 100 / (100 + 100_000) is ~94.9%, not ~94.99%. Consider correcting the comment (or compute the expected rate and assert within a small epsilon) so the test documents the math accurately.

Suggested change
// Weighted rate = (10 + 95_000) * 100.0 / (100 + 100_000) ≈ 94.99%
// Weighted rate = (10 + 95_000) * 100.0 / (100 + 100_000) ≈ 94.9%

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exact final value is 94.9150849150849150 so yeah much more near 94.9 than 94.99

Comment on lines +78 to +79
Supported ecosystems: git/gh/gt, cargo, go/golangci-lint, npm/pnpm/npx, ruff/pytest/pip/mypy, rspec/rubocop/rake, dotnet, playwright/vitest/jest, docker/kubectl/aws.

Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR description is centered on gain’s weighted savings calculation, but this change also updates general repository guidance (supported ecosystems list). Please either mention this documentation update in the PR description or move it to a separate doc-only PR to keep the change set focused.

Copilot uses AI. Check for mistakes.
Comment on lines +18 to +21
r"^git\s+(?:-[Cc]\s+\S+\s+)*(status|log|diff|show|add|commit|push|pull|branch|fetch|stash|worktree|checkout|merge|rebase)",
r"^gh\s+(pr|issue|run|repo|api|release)",
r"^cargo\s+(build|test|clippy|check|fmt|install)",
r"^pnpm\s+(list|ls|outdated|install)",
r"^cargo\s+(build|test|clippy|check|fmt|install|run|publish)",
r"^pnpm\s+(list|ls|outdated|install|build|exec)",
Copy link

Copilot AI Mar 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR description focuses on weighted savings rate in gain, but this hunk also expands Discover’s supported command patterns/statuses (git checkout/merge/rebase, cargo run/publish, pnpm build/exec). Please update the PR description to include this scope change (or split into a separate PR) so reviewers can evaluate the intent/risk appropriately.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown

@KuSh KuSh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

/// Type alias for command statistics tuple: (command, count, saved_tokens, weighted_savings_rate, avg_time_ms)
///
/// # Warning
/// The 4th field is a **weighted** savings rate: `SUM(saved_tokens) * 100.0 / SUM(input_tokens)`.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// The 4th field is a **weighted** savings rate: `SUM(saved_tokens) * 100.0 / SUM(input_tokens)`.
/// The 4th field is a **weighted** savings rate: `SUM(saved_tokens) / SUM(input_tokens) * 100.0 .

Seems more idiomatic to express a percentage


let (_name, _count, _saved, rate, _time) = entry;

// Weighted rate = (10 + 95_000) * 100.0 / (100 + 100_000) ≈ 94.99%
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exact final value is 94.9150849150849150 so yeah much more near 94.9 than 94.99

// Patterns ordered to match RULES indices exactly.
pub const PATTERNS: &[&str] = &[
r"^git\s+(?:-[Cc]\s+\S+\s+)*(status|log|diff|show|add|commit|push|pull|branch|fetch|stash|worktree)",
r"^git\s+(?:-[Cc]\s+\S+\s+)*(status|log|diff|show|add|commit|push|pull|branch|fetch|stash|worktree|checkout|merge|rebase)",
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using alphabetically sorted list would ease maintenance

Signed-off-by: Florian BRUNIAUX <florian@bruniaux.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants