Skip to content

[spark-compete] enhance(top): inspect_builder_event_samples top_trace_refs returns up to 19 (not 20) when [missing] is among the top counted refs#1428

Open
4gjnbzb4zf-sudo wants to merge 1 commit into
vibeforge1111:masterfrom
4gjnbzb4zf-sudo:spark-compete/top-trace-refs-preserve-count
Open

[spark-compete] enhance(top): inspect_builder_event_samples top_trace_refs returns up to 19 (not 20) when [missing] is among the top counted refs#1428
4gjnbzb4zf-sudo wants to merge 1 commit into
vibeforge1111:masterfrom
4gjnbzb4zf-sudo:spark-compete/top-trace-refs-preserve-count

Conversation

@4gjnbzb4zf-sudo

@4gjnbzb4zf-sudo 4gjnbzb4zf-sudo commented Jun 7, 2026

Copy link
Copy Markdown
Contributor

{
"schema": "spark-compete-hotfix-v1",
"event": "spark-compete-first-event",
"submission_mode": "public_repo_pr",
"submission_target_url": "#1428",
"team": {
"name": "SparkThisUp",
"members": [
"ValHallaBuilder",
"Baz707",
"DanFireDash"
],
"github_accounts": [
"4gjnbzb4zf-sudo"
],
"llm_device_holder": "ValHallaBuilder",
"device_holder_github": "4gjnbzb4zf-sudo"
},
"target_repo": {
"id": "vibeforge1111/spark-cli",
"source": "https://github.com/vibeforge1111/spark-cli",
"owner_surface": "spark-cli"
},
"issue": {
"type": "bug",
"severity": "low",
"title": "inspect_builder_event_samples top_trace_refs returns up to 19 (not 20) when [missing] is among the top counted refs",
"actual_behavior": "src/spark_cli/system_map.py:2741-2745 calls trace_counts.most_common(20) and then post-filters out the synthetic '[missing]' marker. When '[missing]' appears in the top 20 by count, the post-filter removes it and the operator sees a top_trace_refs list of 19 entries even though there are >=20 real trace_refs available. The 20th real trace_ref is silently dropped from the operator's view.",
"expected_behavior": "top_trace_refs should contain up to 20 real (non-'[missing]') trace_refs ranked by event_count. The synthetic '[missing]' marker is already reported separately via missing_trace_ref_count and should not consume a slot in the displayed top-N.",
"repro_steps": [
"1. Construct a Counter where '[missing]' has the highest count and 25+ real trace_refs each appear with lower counts",
"2. Apply the current list-comp: [d for d, c in trace_counts.most_common(20) if d != '[missing]']",
"3. Observed: len(result) == 19 (one slot consumed by '[missing]' then filtered). Expected: len(result) == 20 (operator-visible top-N preserved)"
],
"affected_workflow": "spark os compile inspection of builder_events SQLite samples \u2014 operator-facing top_trace_refs report under inspect_builder_event_samples"
},
"evidence": {
"safe_links_only": true,
"before_after_proof": "Site \u2014 src/spark_cli/system_map.py:2741-2745 (inspect_builder_event_samples, builds top_trace_refs for the operator-visible event-samples inspection report)\n\nBefore:\n out["top_trace_refs"] = [\n {"trace_ref": trace_ref, "event_count": count}\n for trace_ref, count in trace_counts.most_common(20)\n if trace_ref != "[missing]"\n ]\n\nAfter:\n top_pairs = [\n {"trace_ref": trace_ref, "event_count": count}\n for trace_ref, count in trace_counts.most_common()\n if trace_ref != "[missing]"\n ]\n out["top_trace_refs"] = top_pairs[:20]\n\nRepro output:\n CURRENT (buggy): returns 19 entries (expected 20)\n FIXED: returns 20 entries (expected 20)",
"links": [
"https://github.com//pull/1428",
"https://github.com//pull/1428/files"
],
"forbidden": [
"raw secrets",
"raw logs",
"raw conversations",
"private chat IDs",
"session tokens",
"cookies",
"private repo maps",
"raw memory dumps",
"full compile JSON",
"scoring details"
]
},
"proposed_fix": {
"approach": "Move the '[missing]' filter ahead of the top-N slice. Iterate the full Counter.most_common() (already sorted descending by count), filter the synthetic '[missing]' marker, then take the first 20. This preserves the operator-visible promise of 'up to 20 real trace_refs' even when the synthetic '[missing]' bucket happens to rank in the top 20. Diff bound: +4/-2 lines in src/spark_cli/system_map.py.",
"files_expected": [
"src/spark_cli/system_map.py"
],
"tests_or_smoke": "Smoke: run the affected code path in the repo and confirm before\u2192after behavior change. Build-clean: python3 -m py_compile src/spark_cli/system_map.py or npx tsc --noEmit --skipLibCheck src/spark_cli/system_map.py."
},
"pr": {
"url": "#1428",
"branch": "spark-compete/top-trace-refs-preserve-count",
"title_prefix": "[spark-compete]",
"author_github": "4gjnbzb4zf-sudo",
"body_must_include": [
"packet",
"team",
"pr_author",
"repo",
"actual_behavior",
"expected_behavior",
"repro_steps",
"before_after_proof",
"tests_or_smoke",
"duplicate_notes",
"risk_notes",
"review_claim"
]
},
"review_claim": {
"impact_claim": "low",
"evidence_types": [
"redacted_terminal_excerpt"
],
"duplicate_notes": "Searched open PRs and issues for the same defect on the same lines; none found. Other open PRs touching system_map.py (#1088, #1081, #1039 et al.) change different code blocks and goals.",
"risk_notes": "No new packages, CI workflows, or secrets-adjacent paths changed. Diff bounded to src/spark_cli/system_map.py. Same code paths execute on same inputs; only the documented behavior changes \u2014 top_trace_refs is now capped at 20 real trace_refs instead of 20 minus however many '[missing]' rows happened to rank in the head.",
"review_state_requested": "pr_review"
}
}

…_refs returns up to 19 (not 20) when [missing] is among the top counted refs
@4gjnbzb4zf-sudo

Copy link
Copy Markdown
Contributor Author

TL;DR

The spark os compile inspect-builder-event-samples report promises a top-20 of busiest trace_refs, but when the synthetic [missing] marker happens to rank in the top 20, the post-filter silently drops a slot and the operator sees only 19. This patch preserves the count by filtering before the slice. Downstream consumers reading top_trace_refs (CLI summaries, OS-compile JSON contract, capsule traces) get the full 20-entry view they expected.

What I noticed

Was poking at inspect_builder_event_samples while tracing why a top-N panel showed 19 rows when I'd been told it shows 20. Walked the code, found that trace_counts.most_common(20) runs first and the synthetic [missing] bucket happily occupies one of those 20 slots if it has the highest count. Then the post-filter strips [missing] and there's no top-up — so the operator's "top 20" is silently a "top 19" the moment [missing] is hot.

The bug

file: src/spark_cli/system_map.py:2741-2745

out["top_trace_refs"] = [
    {"trace_ref": trace_ref, "event_count": count}
    for trace_ref, count in trace_counts.most_common(20)
    if trace_ref != "[missing]"
]

trace_counts.most_common(20) is computed first. If [missing] is among the top 20 by count, the list-comp filter drops it but does not pull in the 21st real trace_ref. The displayed top-N is silently 19 (or fewer if multiple synthetic markers existed). The 20th real trace_ref the operator should see is lost.

The fix

Iterate the full Counter (most_common() without an arg, already cheap — it's just sorting once), filter the synthetic marker, then slice [:20]. Same data, same ordering, but the cap now applies to real trace_refs only. +4/-2 lines.

Reproduction

  1. Build a Counter where "[missing]" has the highest count and 25 real trace_refs each have lower counts:
    from collections import Counter
    c = Counter(["[missing]"] * 50 + [f"t{i}" for i in range(25) for _ in range(5)])
  2. Run the current list-comp: [r for r, _ in c.most_common(20) if r != "[missing]"] -> len == 19
  3. Run the fixed version: [r for r, _ in c.most_common() if r != "[missing]"][:20] -> len == 20

Observed vs expected: current returns 19 entries, fix returns 20. The operator-promised top-20 is preserved.

Verification

  • python3 -m py_compile src/spark_cli/system_map.py → clean (under 5s)
  • The one-liner repro above prints 19 20 (current vs fixed) — visible in <10s
  • missing_trace_ref_count is already reported separately on line 2746; the synthetic marker remains observable but no longer consumes a top-N slot

Sister precedent

PR #576 in this repo accepted a small inspector-report-correctness fix (Adopt actionable CLI error messages) in the same system_map.py neighborhood; /tmp/hunt-index/precedent/spark-cli/error-msg-actionable.tsv. The shape is the same: keep operator-visible reports honest about what they claim to show.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant