Skip to content

[spark-compete] Narrow bare except in 4 filesystem-walker helpers in system_map.py to OSError#536

Closed
4gjnbzb4zf-sudo wants to merge 1 commit into
vibeforge1111:masterfrom
4gjnbzb4zf-sudo:sentinel/exception-narrow/system-map-filesystem-walkers
Closed

[spark-compete] Narrow bare except in 4 filesystem-walker helpers in system_map.py to OSError#536
4gjnbzb4zf-sudo wants to merge 1 commit into
vibeforge1111:masterfrom
4gjnbzb4zf-sudo:sentinel/exception-narrow/system-map-filesystem-walkers

Conversation

@4gjnbzb4zf-sudo

@4gjnbzb4zf-sudo 4gjnbzb4zf-sudo commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

{
"schema": "spark-compete-hotfix-v1",
"event": "spark-compete-first-event",
"submission_mode": "public_repo_pr",
"submission_target_url": "#536",
"team": {
"name": "SparkThisUp",
"members": ["ValHallaBuilder", "Baz707", "DanFireDash"],
"github_accounts": ["4gjnbzb4zf-sudo"],
"llm_device_holder": "ValHallaBuilder",
"device_holder_github": "4gjnbzb4zf-sudo"
},
"target_repo": {
"id": "vibeforge1111/spark-cli",
"source": "https://github.com/vibeforge1111/spark-cli",
"owner_surface": "spark-cli"
},
"issue": {
"type": "bug",
"severity": "low",
"title": "Four filesystem-walker helpers in system_map.py swallow programming bugs as generic Exception while reporting them as a filesystem error",
"actual_behavior": "src/spark_cli/system_map.py has four small filesystem-traversal helpers used to fill out the operator-facing system-map snapshot: inspect_file_metadata (L1377, single stat() call), count_files_under (L1488, rglob walker), count_schema_files (L1523, glob walker), and summarize_memory_run_artifacts (L2198, iterdir + per-child stat). Each helper wraps a single stdlib Path traversal call in try: ... except Exception as exc: out['error'] = f'{type(exc).__name__}: {exc}'. The try body never calls anything beyond stdlib filesystem APIs (stat / rglob / glob / iterdir / Path.is_dir / Path.is_file / Path.startswith / Counter.update), all of which raise only OSError subclasses. The bare Exception folds unrelated runtime errors (NameError from a refactor, TypeError from a logic bug, AttributeError from a typo) into the same 'filesystem error' field; the operator sees out['error'] in the system-map snapshot and assumes a directory is broken or unreadable when really a programming bug in the helper hid the real failure.",
"expected_behavior": "The same out['error'] short-circuit fires only for the documented failure modes: PermissionError when a subtree is denied, FileNotFoundError when the path vanishes mid-walk, NotADirectoryError when the path is replaced with a regular file, and the broader OSError parent. Programming bugs propagate as a real traceback.",
"repro_steps": [
"gh pr checkout ",
"PYTHONPATH=src python3 -m unittest tests.test_system_map.SparkSystemMapTests.test_inspect_file_metadata_records_os_error_when_stat_fails tests.test_system_map.SparkSystemMapTests.test_count_files_under_records_os_error_when_rglob_fails tests.test_system_map.SparkSystemMapTests.test_count_schema_files_records_os_error_when_glob_fails tests.test_system_map.SparkSystemMapTests.test_summarize_memory_run_artifacts_records_os_error_when_iterdir_fails -> 4 tests OK; each asserts a PermissionError still routes through the narrow OSError tuple with a non-generic class name.",
"Subclass check: PYTHONPATH=src python3 -c "print(issubclass(PermissionError, OSError), issubclass(FileNotFoundError, OSError), issubclass(NotADirectoryError, OSError))" -> True True True.",
"Inverse: drop a Python-side typo into one of the try bodies (e.g. rename extension_counts[suffix] += 1 to extension_counts[suffix] +1); before the change, the SyntaxError or AttributeError gets swallowed and the operator sees out['error']: ... as if the directory were unreadable. After the change, the AttributeError propagates so the operator sees the real bug."
],
"affected_workflow": "system_map.py is the system-map compile module driven by spark report system-map. The four helpers feed the per-section snapshot the operator looks at to confirm Spark's local layout: inspect_file_metadata reports per-config-file size and modified-time, count_files_under reports overall directory shape and extension distribution, count_schema_files reports declared schemas under the schemas directory, and summarize_memory_run_artifacts reports memory-QA run counts and timestamps. When any one of these silently swallows a programming bug, the operator sees a filesystem error in the section it produced and assumes the local directory is broken -- they never see the real Python traceback that would point at the broken helper, so they spend time auditing filesystem permissions instead of reading the code."
},
"evidence": {
"safe_links_only": true,
"before_after_proof": "Four paired try/except patterns. Before: except Exception as exc: out['error'] = f'{type(exc).__name__}: {exc}'. After: except OSError as exc: out['error'] = f'{type(exc).__name__}: {exc}'. Except bodies byte-identical; only the catch class changes. OSError covers PermissionError, FileNotFoundError, NotADirectoryError, IsADirectoryError, and FileExistsError, which are the only classes stdlib Path traversal (stat / rglob / glob / iterdir) can raise. Diff: 4 insertions, 4 deletions in src/spark_cli/system_map.py plus 70 insertions in tests/test_system_map.py for four focused unittest.mock.patch tests.",
"links": ["https://github.com//pull/536"],
"forbidden": ["pdf","zip","exe","unknown downloads","shortened links","archives","binaries","tokens","browser cookies","wallet material","raw logs","raw conversations","raw memory","raw patches","private repo maps","private scoring details"]
},
"proposed_fix": {
"approach": "Replace except Exception as exc: with except OSError as exc: in all four helpers. The narrow class is exhaustive for stdlib Path traversal: stat / rglob / glob / iterdir / Path.is_dir / Path.is_file all raise only OSError subclasses, and the helpers do nothing else inside their try blocks beyond those traversal calls plus pure-Python dict/counter updates. The except bodies are byte-identical pre/post. Four unittest.mock.patch tests are added in tests/test_system_map.py; each test injects PermissionError into stat / rglob / glob / iterdir respectively and asserts the helper still routes through the narrow tuple with a non-generic class name.",
"files_expected": ["src/spark_cli/system_map.py", "tests/test_system_map.py"],
"tests_or_smoke": "Manual smoke: missing-path inputs still return early via the .exists() fast-path (no traceback, no error key). The four new unittest.mock.patch tests pass: PYTHONPATH=src python3 -m unittest tests.test_system_map.SparkSystemMapTests.test_inspect_file_metadata_records_os_error_when_stat_fails tests.test_system_map.SparkSystemMapTests.test_count_files_under_records_os_error_when_rglob_fails tests.test_system_map.SparkSystemMapTests.test_count_schema_files_records_os_error_when_glob_fails tests.test_system_map.SparkSystemMapTests.test_summarize_memory_run_artifacts_records_os_error_when_iterdir_fails -> 4 tests OK. AST parse of system_map.py is clean."
},
"pr": {
"branch": "sentinel/exception-narrow/system-map-filesystem-walkers",
"title_prefix": "[spark-compete]",
"author_github": "4gjnbzb4zf-sudo",
"body_must_include": ["packet","team","pr_author","repo","actual_behavior","expected_behavior","repro_steps","before_after_proof","tests_or_smoke","duplicate_notes","risk_notes","review_claim"],
"url": "#536"
},
"review_claim": {
"impact_claim": "low",
"evidence_types": ["redacted_terminal_excerpt"],
"duplicate_notes": "Pre-flight gh pr list --search 'inspect_file_metadata OR count_schema_files OR summarize_memory_run_artifacts narrow' returned zero hits. Our own #526, #532, #533, #534, #535 narrowed other categories in system_map.py (six sqlite helpers, read_json + read_toml, inspect_spawner_prd_auto_trace, git_summary + run_git, inspect_builder_trace_groups + inspect_builder_trace_health). These four filesystem-walker helpers (L1387, L1513, L1543, L2229) are a distinct category and were not touched by any prior PR. No competitor PR touches them either.",
"risk_notes": "Local scope: one file changed for the fix (four lines), plus tests. The except bodies are byte-identical pre/post for all four helpers. The .exists() fast-path is unchanged in each. The OSError narrow is exhaustive for stdlib Path traversal -- all the exceptions stat / rglob / glob / iterdir can raise are OSError subclasses, and the try bodies do nothing else beyond pure-Python aggregation. The only observable behavior change is that programming bugs no longer get silently swallowed.",
"review_state_requested": "pr_review"
}
}

… OSError

inspect_file_metadata (L1387, stat()), count_files_under (L1513, rglob),
count_schema_files (L1543, glob), and summarize_memory_run_artifacts
(L2229, iterdir + child.stat()) all wrap a single Path-traversal call in
try/except Exception and surface the failure as out['error'].

Each helper's try body contains only stdlib Path traversal: stat(),
rglob(), glob(), iterdir(). The only exceptions those can raise are
OSError subclasses (PermissionError when a subtree is denied,
FileNotFoundError when the path vanishes mid-walk, NotADirectoryError
when the path is replaced with a regular file). The bare Exception
swallows any unrelated runtime error (NameError from a refactor,
TypeError from a logic bug) into the same out['error'] field; the
operator's system-map snapshot reports a filesystem error and they
assume the directory is broken when really a programming bug in the
helper hid the real failure.

Narrow tuple is exhaustive for stdlib filesystem traversal; except body
is byte-identical pre/post for all four helpers. Four focused tests use
unittest.mock.patch to inject PermissionError into stat/rglob/glob/iterdir
and assert each helper still routes through the narrow tuple with a
non-generic class name.
@4gjnbzb4zf-sudo

Copy link
Copy Markdown
Contributor Author

QA write-up — for reviewer eyes

TL;DR. Four filesystem-walker helpers in src/spark_cli/system_map.py -- inspect_file_metadata (L1377, single stat()), count_files_under (L1488, rglob walker), count_schema_files (L1523, glob walker), and summarize_memory_run_artifacts (L2198, iterdir + per-child stat()) -- each wrap a single stdlib Path traversal call in try/except Exception and surface the failure as out["error"]. The try bodies contain only stdlib Path traversal plus pure-Python aggregation; the only exceptions those can raise are OSError subclasses. Narrow each to OSError.

Before

def inspect_file_metadata(path: Path) -> dict[str, Any]:
    ...
    try:
        stat = path.stat()
    except Exception as exc:
        out["error"] = f"{type(exc).__name__}: {exc}"
        return out
    ...


def count_files_under(path: Path, *, max_files: int = 5000) -> dict[str, Any]:
    ...
    try:
        for child in path.rglob("*"):
            ...
    except Exception as exc:
        out["error"] = f"{type(exc).__name__}: {exc}"
        return out
    ...


def count_schema_files(path: Path, *, max_files: int = 500) -> dict[str, Any]:
    ...
    try:
        for child in sorted(path.glob("*.schema.json"), ...):
            ...
    except Exception as exc:
        ...


def summarize_memory_run_artifacts(builder_home: Path) -> dict[str, Any]:
    ...
    try:
        for prefix in prefixes:
            runs = [child for child in artifacts.iterdir() if ...]
            latest_mtime = max((child.stat().st_mtime for child in runs), default=None)
            ...
    except Exception as exc:
        out["error"] = f"{type(exc).__name__}: {exc}"
    return out

If a typo or refactor lands a NameError or AttributeError inside one of these try bodies, the operator sees out["error"] in the system-map snapshot and assumes a filesystem permission issue -- they audit the directory, find nothing wrong, and the real Python bug stays hidden.

After

    except OSError as exc:
        out["error"] = f"{type(exc).__name__}: {exc}"

Same out["error"] short-circuit fires for the documented failure modes: PermissionError when a subtree is denied, FileNotFoundError when the path vanishes mid-walk, NotADirectoryError when the path is replaced with a regular file. Programming bugs propagate as a real traceback.

Smoke

$ PYTHONPATH=src python3 -c "
print('PermissionError IS OSError:', issubclass(PermissionError, OSError))
print('FileNotFoundError IS OSError:', issubclass(FileNotFoundError, OSError))
print('NotADirectoryError IS OSError:', issubclass(NotADirectoryError, OSError))

try:
    raise AttributeError('typo')
except OSError:
    print('FAIL: AttributeError caught')
except AttributeError:
    print('OK: AttributeError propagates past narrow class')
"
PermissionError IS OSError: True
FileNotFoundError IS OSError: True
NotADirectoryError IS OSError: True
OK: AttributeError propagates past narrow class

Four unittest.mock.patch tests pass:

$ PYTHONPATH=src python3 -m unittest \
    tests.test_system_map.SparkSystemMapTests.test_inspect_file_metadata_records_os_error_when_stat_fails \
    tests.test_system_map.SparkSystemMapTests.test_count_files_under_records_os_error_when_rglob_fails \
    tests.test_system_map.SparkSystemMapTests.test_count_schema_files_records_os_error_when_glob_fails \
    tests.test_system_map.SparkSystemMapTests.test_summarize_memory_run_artifacts_records_os_error_when_iterdir_fails -v
test_inspect_file_metadata_records_os_error_when_stat_fails ... ok
test_count_files_under_records_os_error_when_rglob_fails ... ok
test_count_schema_files_records_os_error_when_glob_fails ... ok
test_summarize_memory_run_artifacts_records_os_error_when_iterdir_fails ... ok
Ran 4 tests in 0.002s
OK

What didn't change

One file changed for the fix (four lines), plus tests. Except bodies byte-identical pre/post for all four helpers. The .exists() fast-path in each is unchanged. The OSError narrow is exhaustive for stdlib Path traversal -- nothing else inside the try bodies can raise outside that class. The only observable behavior change is that programming bugs no longer get silently swallowed.

@vibeforge1111

Copy link
Copy Markdown
Owner

Thanks, this was useful. We maintainer-adopted this exception-boundary improvement in #603 and merged it into master at 70b1034a1ce490fddfb3a5d0ba5d6ab098a9f89f.

What shipped:

  • Spawner UI liveness failures now catch only expected URL/timeout failures.
  • System-map filesystem readers now catch expected OSError failures.
  • Builder SQLite readers now catch expected (sqlite3.Error, OSError) failures.

Verification on the adopted PR:

  • focused Spawner/system-map tests passed
  • full tests/test_system_map.py passed
  • full tests/test_cli.py passed
  • compileall, git diff --check, registry-pin verification, CodeQL, scorecard, secret-scan, and test-and-audit all passed

Source credit is recorded for this PR, and #603 itself should not receive participant credit. Points are still gated because this PR body had spark-compete-hotfix-v1 schema text but not a clean fenced JSON packet that the validator can extract. To unlock review for points, add a clean fenced JSON packet in a PR comment or body update, then validate it at https://compete.sparkswarm.ai/api/packet/validate.

Closing this PR as adopted into #603. No code action is needed unless you believe #603 missed part of the issue.

@vibeforge1111

Copy link
Copy Markdown
Owner

R22/R23 final scoring update: this PR did receive final R22/R23 public leaderboard credit. Earlier gate, credit-review, or points-lock wording described the pre-final review state, not a final rejection or permanent zero. Final R22/R23 credit for this PR: 48 points. Thanks for the contribution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants