feat(cli): add rule scorecard for per-rule keep/tune/retire verdicts #243
Merged
…dule

The `rule backtest` and `rule coverage` commands own the JSON report documents the rest of the toolkit consumes, but their report structs lived inside each command module. Lift them into a shared `commands::reports` module that the producers build and serialize, so a future consumer can deserialize the very same types and the two cannot drift.

The lifted structs are pure wire shapes. The runtime-only knobs (the backtest unexpected policy, the coverage fail-on-gaps flag) are no longer struct fields; they are threaded through the rendering and exit-code methods instead, so the shared types are exactly the JSON shape. Behavior-neutral and pinned by the existing backtest/coverage golden tests.
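As a rough illustration of the "pure wire shape" idea above, the sketch below keeps only serialized data on the struct and passes the runtime knob as a method argument. The struct name `CoverageReport`, its fields, and the exit value `1` are hypothetical stand-ins chosen for this example; the actual types in `commands::reports` may look different.

```rust
/// Hypothetical wire shape: every field here is assumed to appear in the
/// JSON document. Names are illustrative, not the real definitions.
#[derive(Debug, Clone, PartialEq)]
pub struct CoverageReport {
    /// ATT&CK technique IDs that no rule covers (illustrative field).
    pub gaps: Vec<String>,
    /// Number of rules inspected (illustrative field).
    pub rules_total: usize,
}

impl CoverageReport {
    /// The runtime-only knob is an argument rather than a struct field,
    /// so it never leaks into the serialized shape.
    /// Assumption: 1 signals "findings", 0 signals success.
    pub fn exit_code(&self, fail_on_gaps: bool) -> i32 {
        if fail_on_gaps && !self.gaps.is_empty() {
            1
        } else {
            0
        }
    }
}
```

Because the flag is not stored, a consumer that deserializes the same struct never has to invent a value for it.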
`rsigma rule scorecard` fuses the rule-side outputs the toolkit already emits into the per-rule keep/tune/retire verdict table a detection program reviews on a cadence. It reads JSON the toolkit already produces, so it adds no new collection or evaluation: an offline fusion-and-verdict layer.

- Joins the required backtest report (precision proxy, recall, corpus false-positive signal) and coverage report (ATT&CK mapping, per-technique rule count) into a per-rule record keyed by rule_id, optionally enriched by a Prometheus production-volume snapshot or endpoint (joined by rule_title, with colliding titles summed and flagged), a Prometheus query-API range window for last-fired, and a triage disposition feed for the live false-positive ratio and MTTD/MTTR. Each cell records its source; a missing optional input degrades the verdict rather than blocking it.
- Verdict bands default to the SOC quality-metrics thresholds and are configurable. A retire candidate that is the sole coverage for an ATT&CK technique is downgraded to tune with a coverage-risk note.
- Renders through the global output-format layer plus a markdown/HTML report, with --fail-on for CI and the house exit codes (0/1/2/3).
- The Prometheus exposition-snapshot parser is hand-rolled and std-only (the single untrusted-input surface, fuzzed by fuzz_scorecard_promtext); the query-API path reuses the existing ureq client. No new dependencies.

A scorecard config section follows the layered-config conventions: the verdict thresholds carry single-source defaults pinned to the clap flags, and every input (including the two required reports) plus the report path can be supplied from the config file. rule coverage likewise now accepts its rule paths from coverage.rules.
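The sole-coverage downgrade described above can be sketched roughly as follows. Everything in this block is an assumption made for illustration: the `Bands` thresholds, the `RuleRecord` fields, the use of precision alone as the banding signal, and the choice to treat a missing precision as `Tune`. It is not the command's actual logic.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Keep,
    Tune,
    Retire,
}

/// Hypothetical thresholds; the real defaults are not reproduced here.
pub struct Bands {
    pub keep_min_precision: f64,
    pub retire_max_precision: f64,
}

/// Hypothetical per-rule record with only the fields this sketch needs.
pub struct RuleRecord {
    pub rule_id: String,
    pub precision: Option<f64>,
    pub techniques: Vec<String>,
}

/// Band the rule, then downgrade a retire candidate to tune when it is the
/// only rule covering at least one of its ATT&CK techniques.
pub fn verdict(
    rule: &RuleRecord,
    bands: &Bands,
    rules_per_technique: &HashMap<String, usize>,
) -> (Verdict, Option<&'static str>) {
    let raw = match rule.precision {
        Some(p) if p >= bands.keep_min_precision => Verdict::Keep,
        Some(p) if p < bands.retire_max_precision => Verdict::Retire,
        // Assumption: a missing or middling signal lands in the tune band.
        _ => Verdict::Tune,
    };
    let sole_coverage = rule
        .techniques
        .iter()
        .any(|t| rules_per_technique.get(t).copied() == Some(1));
    if raw == Verdict::Retire && sole_coverage {
        (Verdict::Tune, Some("coverage-risk: sole rule for a technique"))
    } else {
        (raw, None)
    }
}
```

Returning the note alongside the verdict keeps the downgrade visible to the reviewer instead of silently changing the band.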
Add the rule scorecard CLI reference and a Detection Scorecard guide page covering the keep/tune/retire verdict model and the review cadence, wire both into the nav, document the scorecard config section and the new coverage.rules key, cross-link the backtest, coverage, and CI/CD pages, refresh both READMEs, and add the CHANGELOG entry.
The reports refactor renamed `Report` to `BacktestReport` but left the module doc comment linking to `Report::build`, which `cargo doc -D warnings -D rustdoc::broken-intra-doc-links` rejects. Point it at `BacktestReport::build`.
Summary
Adds `rsigma rule scorecard`, the fusion-and-verdict layer that turns the toolkit's existing rule-side outputs into the per-rule keep/tune/retire table a detection program reviews on a cadence. It reads JSON the toolkit already emits, so it adds no new collection or evaluation: an offline `rule`-group command with no engine or hot-path involvement.

- Joins the required backtest report (precision proxy, recall, corpus false-positive signal) and coverage report (ATT&CK mapping, per-technique rule count) into a per-rule record keyed by `rule_id`, optionally enriched by a Prometheus production-volume snapshot or endpoint (joined by `rule_title`, colliding titles summed and flagged), a Prometheus query-API range window for last-fired, and a triage disposition feed for the live false-positive ratio and MTTD/MTTR. Each cell records its source; a missing optional input degrades the verdict rather than blocking it.
- Renders through the global `--output-format` layer (table/json/ndjson/csv/tsv) plus a `--report` markdown or HTML artifact grouped by verdict, with `--fail-on <none|tune|retire>` for CI and the house exit codes (0/1/2/3).
- A `scorecard` config section follows the layered-config conventions; the verdict thresholds carry single-source defaults pinned to the clap flags by a drift-guard test, and every input (including the two required reports) plus the report path can come from the config file. `rule coverage` likewise now accepts its rule paths from `coverage.rules`.
- The Prometheus query-API path reuses the existing `ureq` client.

Three commits: a behavior-neutral refactor lifting the backtest/coverage report structs into a shared module (so the producers and the scorecard consumer share one definition), the command itself plus the configurable inputs, and the docs.
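To make the title-based join concrete, here is a minimal sketch of summing production-volume samples per rule title and flagging titles that appear more than once. The `Volume` struct, the `volume_by_title` function, and the `(title, value)` input shape are hypothetical and chosen for this example only; the real command reads these samples from a Prometheus snapshot or endpoint.

```rust
use std::collections::BTreeMap;

/// Hypothetical aggregate for one rule title.
#[derive(Debug, Clone, PartialEq)]
pub struct Volume {
    /// Sum of all samples that share this title.
    pub total: f64,
    /// True when more than one sample carried the same title, so the
    /// total cannot be attributed to a single rule.
    pub collided: bool,
}

/// Sum samples per title and flag any title seen more than once.
pub fn volume_by_title(samples: &[(&str, f64)]) -> BTreeMap<String, Volume> {
    let mut out: BTreeMap<String, Volume> = BTreeMap::new();
    for (title, value) in samples {
        out.entry((*title).to_string())
            // Seen before: add to the running total and mark the collision.
            .and_modify(|v| {
                v.total += *value;
                v.collided = true;
            })
            // First sighting: start a fresh, unflagged total.
            .or_insert(Volume { total: *value, collided: false });
    }
    out
}
```

Summing rather than dropping colliding titles keeps the volume signal usable, while the flag tells the reviewer that the figure is shared between rules.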
Test plan
- `cargo fmt --all -- --check`
- `cargo clippy --workspace --all-targets --all-features -- -D warnings`
- `cargo test -p rsigma` (unit + integration, including the JSON and markdown goldens and the config-layering paths)
- `cargo +nightly fuzz build fuzz_scorecard_promtext` and a short run of the exposition parser
- `mkdocs build --strict`
- `rule backtest` / `rule coverage` goldens still pass (the refactor is behavior-neutral)