Skip to content

[spark-compete] Narrow bare except in inspect_builder_trace_groups + inspect_builder_trace_health to (sqlite3.Error, OSError)#535

Closed
4gjnbzb4zf-sudo wants to merge 1 commit into
vibeforge1111:masterfrom
4gjnbzb4zf-sudo:sentinel/exception-narrow/system-map-trace-groups-and-health
Closed

[spark-compete] Narrow bare except in inspect_builder_trace_groups + inspect_builder_trace_health to (sqlite3.Error, OSError)#535
4gjnbzb4zf-sudo wants to merge 1 commit into
vibeforge1111:masterfrom
4gjnbzb4zf-sudo:sentinel/exception-narrow/system-map-trace-groups-and-health

Conversation

@4gjnbzb4zf-sudo

@4gjnbzb4zf-sudo 4gjnbzb4zf-sudo commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

{
"schema": "spark-compete-hotfix-v1",
"event": "spark-compete-first-event",
"submission_mode": "public_repo_pr",
"submission_target_url": "#535",
"team": {
"name": "SparkThisUp",
"members": ["ValHallaBuilder", "Baz707", "DanFireDash"],
"github_accounts": ["4gjnbzb4zf-sudo"],
"llm_device_holder": "ValHallaBuilder",
"device_holder_github": "4gjnbzb4zf-sudo"
},
"target_repo": {
"id": "vibeforge1111/spark-cli",
"source": "https://github.com/vibeforge1111/spark-cli",
"owner_surface": "spark-cli"
},
"issue": {
"type": "bug",
"severity": "low",
"title": "inspect_builder_trace_groups and inspect_builder_trace_health swallow programming bugs as generic Exception under the system-map readout",
"actual_behavior": "src/spark_cli/system_map.py defines two read-only sqlite introspection helpers on the builder_events table: inspect_builder_trace_groups (L2761) groups events by trace_ref for the trace-group summary, and inspect_builder_trace_health (L2943) aggregates per-trace health flags (missing_trace_refs, orphan_parent_event_ids, open_high_severity_events). Both open the read-only sqlite3 URI connection inside a try/finally, run a fixed query plan, and use the outer try block to surface 'state.db is unreadable' as out['error']. The except clause is except Exception as exc: out['error'] = f'{type(exc).__name__}: {exc}', so any unrelated runtime error (KeyError from a typo in a column name, AttributeError from a refactor) gets folded into the same 'state.db error' field; the operator looks at the system-map output and assumes the db is broken when really a programming bug in the helper hid the real failure.",
"expected_behavior": "The same out['error'] short-circuit fires only for the documented failure modes: sqlite3.Error (DatabaseError when state.db is not a sqlite file, OperationalError when the URI is unparseable, etc.) and OSError (when the file disappears between the .exists() check and the connect). Programming bugs propagate as a real traceback.",
"repro_steps": [
"gh pr checkout ",
"PYTHONPATH=src python -c "from spark_cli.system_map import inspect_builder_trace_groups, inspect_builder_trace_health; from pathlib import Path; import tempfile; d=Path(tempfile.mkdtemp())/'spark-intelligence'; d.mkdir(); (d/'state.db').write_text('not a sqlite db'); print(inspect_builder_trace_groups(d)['error']); print(inspect_builder_trace_health(d)['error'])" -> 'DatabaseError: file is not a database' for both, surfaced via the narrow tuple.",
"PYTHONPATH=src python -m unittest tests.test_system_map.SparkSystemMapTests.test_inspect_builder_trace_groups_records_sqlite_error_when_db_corrupt tests.test_system_map.SparkSystemMapTests.test_inspect_builder_trace_health_records_sqlite_error_when_db_corrupt -> 2 tests OK.",
"Inverse: drop a typo into one of the conn.execute(...) column names (e.g. rename 'event_id' to 'event_idd'); before the change, the resulting sqlite3.OperationalError on the bad column is still routed through 'state.db error' (correct), but if you introduce a Python-side typo (e.g. groups.appendZZZ(...)), the AttributeError gets silently swallowed and the operator sees nothing. After the change, the AttributeError propagates so the operator sees the real bug."
],
"affected_workflow": "system_map.py is the system-map compile module driven by spark report system-map, which is used to render the operator-facing builder snapshot showing trace groups (which workflows landed events recently) and trace health (how many traces are orphaned or missing parent links). When inspect_builder_trace_groups or inspect_builder_trace_health silently swallow a programming bug, the operator's snapshot reports 'state.db error' and they assume the database is broken, instead of seeing the real Python traceback that would point at the broken helper. The trace-health flag aggregation is the same shape used by spark intelligence-builder's health surface, so a typo-hidden failure here also corrupts the health flag rollup that downstream consumers report on."
},
"evidence": {
"safe_links_only": true,
"before_after_proof": "Two paired try/except patterns. Before: except Exception as exc: out['error'] = f'{type(exc).__name__}: {exc}'. After: except (sqlite3.Error, OSError) as exc: out['error'] = f'{type(exc).__name__}: {exc}'. The except bodies are byte-identical; only the catch tuple changes. sqlite3.Error covers DatabaseError, OperationalError, ProgrammingError, and IntegrityError, which are the only classes the read-only sqlite3 path can raise inside the try block; OSError covers FileNotFoundError when the file vanishes between the .exists() check and the connect. Diff: 2 insertions, 2 deletions in src/spark_cli/system_map.py plus 26 insertions in tests/test_system_map.py for two corrupt-db assertion tests.",
"links": ["https://github.com//pull/535"],
"forbidden": ["pdf","zip","exe","unknown downloads","shortened links","archives","binaries","tokens","browser cookies","wallet material","raw logs","raw conversations","raw memory","raw patches","private repo maps","private scoring details"]
},
"proposed_fix": {
"approach": "Replace except Exception as exc: with except (sqlite3.Error, OSError) as exc: in both inspect_builder_trace_groups (L2848) and inspect_builder_trace_health (L3111). The narrow tuple is exhaustive for the read-only path: sqlite3.Error is the base class for every exception the sqlite3 driver raises (DatabaseError when the file is not a sqlite db, OperationalError when the URI is unparseable, ProgrammingError when the cursor is misused), and OSError covers FileNotFoundError when state.db disappears between .exists() and connect. The two new tests assert that a corrupt state.db still routes through the narrow tuple and produces a non-generic error class name. Two corrupt-db tests are added in tests/test_system_map.py.",
"files_expected": ["src/spark_cli/system_map.py", "tests/test_system_map.py"],
"tests_or_smoke": "Manual smoke verified: missing state.db returns early via the .exists() check (no traceback). A corrupt state.db routes through the narrow tuple and surfaces as DatabaseError: file is not a database for both helpers. The two new corrupt-db unit tests pass: PYTHONPATH=src python3 -m unittest tests.test_system_map.SparkSystemMapTests.test_inspect_builder_trace_groups_records_sqlite_error_when_db_corrupt tests.test_system_map.SparkSystemMapTests.test_inspect_builder_trace_health_records_sqlite_error_when_db_corrupt -> 2 tests OK. AST parse of system_map.py is clean."
},
"pr": {
"branch": "sentinel/exception-narrow/system-map-trace-groups-and-health",
"title_prefix": "[spark-compete]",
"author_github": "4gjnbzb4zf-sudo",
"body_must_include": ["packet","team","pr_author","repo","actual_behavior","expected_behavior","repro_steps","before_after_proof","tests_or_smoke","duplicate_notes","risk_notes","review_claim"],
"url": "#535"
},
"review_claim": {
"impact_claim": "low",
"evidence_types": ["redacted_terminal_excerpt"],
"duplicate_notes": "Pre-flight gh pr list --search 'inspect_builder_trace_groups OR inspect_builder_trace_health narrow' returned zero hits. Our own #526 narrowed six other builder_events sqlite helpers in this same file (inspect_builder_state_db, inspect_builder_request_id_overlap, inspect_builder_trace_ref_overlap, inspect_builder_memory_tables, inspect_builder_event_trace, inspect_builder_event_samples). inspect_builder_trace_groups (L2848) and inspect_builder_trace_health (L3111) are the two remaining sqlite introspection helpers in system_map.py that #526 did not touch. No competitor PR touches these helpers. Same fix shape as our own #526; the narrow tuple is identical because the read-only sqlite path has the same failure modes.",
"risk_notes": "Local scope: one file changed for the fix, plus tests. The except bodies are byte-identical pre/post for both helpers. The missing-state.db fast path is unchanged. Corrupt-db and connect-failure conditions still trigger the documented out['error'] short-circuit. The only observable behavior change is that programming bugs in the helpers no longer get silently swallowed -- the goal of the fix.",
"review_state_requested": "pr_review"
}
}

…elpers to (sqlite3.Error, OSError)

inspect_builder_trace_groups (L2848) and inspect_builder_trace_health (L3111)
follow the same shape as the six helpers narrowed in vibeforge1111#526: open a read-only
sqlite3 connection inside a try/finally, run a fixed query plan, and use the
outer try block solely to surface 'state.db is unreadable' as out['error'].
With `sqlite3.connect(..., mode=ro)` the only failure modes are sqlite3.Error
(DatabaseError when the file is not a sqlite db, OperationalError when the
URI is unparseable, etc.) and OSError when the file disappears between the
.exists() check and the connect. The bare Exception swallows programming
bugs (KeyError from a typo in a column name, AttributeError from a refactor)
into the same out['error'] field, hiding them from the operator.

Narrow tuple is exhaustive for the read path; except body is byte-identical
pre/post. Tests assert a corrupt state.db still routes through the narrow
classes and produces a non-generic error class name.
@4gjnbzb4zf-sudo

Copy link
Copy Markdown
Contributor Author

QA write-up — for reviewer eyes

TL;DR. inspect_builder_trace_groups (L2761) and inspect_builder_trace_health (L2943) in src/spark_cli/system_map.py are the last two read-only sqlite introspection helpers on builder_events that #526 did not touch. Both open a read-only sqlite3 URI connection inside a try/finally and use the outer try block to surface "state.db is unreadable" as out["error"]. Narrow each to (sqlite3.Error, OSError), same shape as the six helpers in #526.

Before

def inspect_builder_trace_groups(builder_home: Path, *, group_limit=12, ...) -> dict[str, Any]:
    ...
    try:
        conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
        conn.row_factory = sqlite3.Row
        try:
            ...
        finally:
            conn.close()
    except Exception as exc:
        out["error"] = f"{type(exc).__name__}: {exc}"
    return out


def inspect_builder_trace_health(builder_home: Path) -> dict[str, Any]:
    ...
    try:
        conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
        conn.row_factory = sqlite3.Row
        try:
            ...
            out["health_flags"] = flags
        finally:
            conn.close()
    except Exception as exc:
        out["error"] = f"{type(exc).__name__}: {exc}"
    return out

If a typo lands on a Python-side statement inside the try block (e.g. groups.appendZZZ(...)), the AttributeError gets swallowed and the operator's system-map snapshot reports out["error"]: AttributeError: ... as if the database were broken — the real Python bug is hidden behind a sqlite-shaped error label.

After

    except (sqlite3.Error, OSError) as exc:
        out["error"] = f"{type(exc).__name__}: {exc}"

Same out-error short-circuit fires for the documented failure modes: sqlite3.Error (DatabaseError when state.db is not a sqlite file, OperationalError when the URI is unparseable, ProgrammingError when the cursor is misused), OSError (FileNotFoundError when state.db vanishes between the .exists() check and the connect). Programming bugs propagate as a real traceback.

Smoke

$ PYTHONPATH=src python3 -c "
from pathlib import Path
import tempfile, sqlite3
from spark_cli.system_map import inspect_builder_trace_groups, inspect_builder_trace_health

d = Path(tempfile.mkdtemp()) / 'spark-intelligence'
d.mkdir()
print('missing-db trace_groups error key present:', 'error' in inspect_builder_trace_groups(d))
print('missing-db trace_health error key present:', 'error' in inspect_builder_trace_health(d))

(d / 'state.db').write_text('this is not a sqlite database')
r1 = inspect_builder_trace_groups(d)
r2 = inspect_builder_trace_health(d)
print('corrupt trace_groups:', r1.get('error'))
print('corrupt trace_health:', r2.get('error'))
print('sqlite3.DatabaseError IS sqlite3.Error:', issubclass(sqlite3.DatabaseError, sqlite3.Error))

try:
    raise AttributeError('typo')
except (sqlite3.Error, OSError):
    print('FAIL: AttributeError was caught')
except AttributeError:
    print('OK: AttributeError propagates past narrow tuple')
"
missing-db trace_groups error key present: False
missing-db trace_health error key present: False
corrupt trace_groups: DatabaseError: file is not a database
corrupt trace_health: DatabaseError: file is not a database
sqlite3.DatabaseError IS sqlite3.Error: True
OK: AttributeError propagates past narrow tuple

Both new corrupt-db unit tests pass:

$ PYTHONPATH=src python3 -m unittest \
    tests.test_system_map.SparkSystemMapTests.test_inspect_builder_trace_groups_records_sqlite_error_when_db_corrupt \
    tests.test_system_map.SparkSystemMapTests.test_inspect_builder_trace_health_records_sqlite_error_when_db_corrupt -v
test_inspect_builder_trace_groups_records_sqlite_error_when_db_corrupt ... ok
test_inspect_builder_trace_health_records_sqlite_error_when_db_corrupt ... ok
Ran 2 tests in 0.001s
OK

What didn't change

One file changed for the fix (two lines), plus tests. Except bodies byte-identical pre/post for both helpers. Missing-state.db fast-path is unchanged. The corrupt-db / connect-failure conditions still route through the documented out["error"] short-circuit. Only observable behavior change: programming bugs in the helpers no longer get silently swallowed.

@vibeforge1111

Copy link
Copy Markdown
Owner

Thanks, this was useful. We maintainer-adopted this exception-boundary improvement in #603 and merged it into master at 70b1034a1ce490fddfb3a5d0ba5d6ab098a9f89f.

What shipped:

  • Spawner UI liveness failures now catch only expected URL/timeout failures.
  • System-map filesystem readers now catch expected OSError failures.
  • Builder SQLite readers now catch expected (sqlite3.Error, OSError) failures.

Verification on the adopted PR:

  • focused Spawner/system-map tests passed
  • full tests/test_system_map.py passed
  • full tests/test_cli.py passed
  • compileall, git diff --check, registry-pin verification, CodeQL, scorecard, secret-scan, and test-and-audit all passed

Source credit is recorded for this PR, and #603 itself should not receive participant credit. Points are still gated because this PR body had spark-compete-hotfix-v1 schema text but not a clean fenced JSON packet that the validator can extract. To unlock review for points, add a clean fenced JSON packet in a PR comment or body update, then validate it at https://compete.sparkswarm.ai/api/packet/validate.

Closing this PR as adopted into #603. No code action is needed unless you believe #603 missed part of the issue.

@vibeforge1111

Copy link
Copy Markdown
Owner

R22/R23 final scoring update: this PR did receive final R22/R23 public leaderboard credit. Earlier gate, credit-review, or points-lock wording described the pre-final review state, not a final rejection or permanent zero. Final R22/R23 credit for this PR: 48 points. Thanks for the contribution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants