[spark-compete] Narrow bare except in spawner-ui liveness urlopen in cli.py to (urllib.error.URLError, TimeoutError)#537
Conversation
…b.error.URLError, TimeoutError)
evaluate_module_health (cli.py L5063) issues a GET against the spawner-ui
liveness endpoint and wraps the urllib.request.urlopen call in
except Exception: healthy=False; detail=f'Spawner UI live health failed: {exc}'.
The try body contains only urllib.request.Request construction and
urllib.request.urlopen; the only exceptions those calls can raise are
urllib.error.URLError (parent of HTTPError, covering connection-refused,
DNS failure, and socket-level failures) and TimeoutError (when the
socket-level timeout fires).
The bare Exception swallows unrelated runtime errors (NameError from
a typo in ready_check_headers, AttributeError if module.manifest gets
mis-typed) into the same 'live health failed' detail, hiding programming
bugs behind a fake 'spawner is down' status. Operators looking at
spark status see 'Spawner UI live health failed' and start debugging
the spawner-ui process when really the bug is in this helper.
Narrow tuple is exhaustive for the urlopen path; except body is
byte-identical pre/post. Two unittest.mock.patch tests assert URLError
and TimeoutError both still route through the narrow tuple and produce
the documented 'live health failed' detail.
QA write-up — for reviewer eyesTL;DR. Beforetry:
request = urllib.request.Request(health_url, headers=ready_check_headers(health_url))
with urllib.request.urlopen(request, timeout=ready_timeout_seconds(module)) as response:
healthy = 200 <= int(response.status) < 300
detail = f"Spawner UI live health {'OK' if healthy else 'failed'}: HTTP {response.status}"
except Exception as exc:
healthy = False
detail = f"Spawner UI live health failed: {exc}"If a typo or refactor lands a NameError or AttributeError inside the try block (e.g. a rename misses Afterexcept (urllib.error.URLError, TimeoutError) as exc:
healthy = False
detail = f"Spawner UI live health failed: {exc}"Same "live health failed" short-circuit fires for the documented failure modes: SmokeTwo unittest.mock.patch tests pass: Sibling What didn't changeOne file changed for the fix (one line), plus tests. Except body byte-identical pre/post. The |
|
Thanks, this was useful. We maintainer-adopted this exception-boundary improvement in #603 and merged it into What shipped:
Verification on the adopted PR:
Source credit is recorded for this PR, and #603 itself should not receive participant credit. Points are still gated because this PR body had Closing this PR as adopted into #603. No code action is needed unless you believe #603 missed part of the issue. |
|
R22/R23 final scoring update: this PR did receive final R22/R23 public leaderboard credit. Earlier gate, credit-review, or points-lock wording described the pre-final review state, not a final rejection or permanent zero. Final R22/R23 credit for this PR: 48 points. Thanks for the contribution. |
{
"schema": "spark-compete-hotfix-v1",
"event": "spark-compete-first-event",
"submission_mode": "public_repo_pr",
"submission_target_url": "#537",
"team": {
"name": "SparkThisUp",
"members": ["ValHallaBuilder", "Baz707", "DanFireDash"],
"github_accounts": ["4gjnbzb4zf-sudo"],
"llm_device_holder": "ValHallaBuilder",
"device_holder_github": "4gjnbzb4zf-sudo"
},
"target_repo": {
"id": "vibeforge1111/spark-cli",
"source": "https://github.com/vibeforge1111/spark-cli",
"owner_surface": "spark-cli"
},
"issue": {
"type": "bug",
"severity": "low",
"title": "evaluate_module_health swallows programming bugs as a fake 'Spawner UI live health failed' status",
"actual_behavior": "src/spark_cli/cli.py L5039 evaluate_module_health issues a GET against the spawner-ui liveness endpoint and wraps the urllib.request.urlopen call in
try: ... except Exception as exc: healthy=False; detail=f'Spawner UI live health failed: {exc}'. The try body contains only urllib.request.Request construction and urllib.request.urlopen plus an int() cast on response.status; the only exceptions those calls can raise are urllib.error.URLError (the parent of HTTPError, covering connection-refused, DNS failure, and socket-level failures) and TimeoutError (when the socket-level timeout fires). The bare Exception folds unrelated runtime errors (NameError from a typo in the helper, AttributeError if module.manifest gets mis-typed by a refactor) into the same 'Spawner UI live health failed' detail string; operators looking atspark statussee 'live health failed' and start debugging the spawner-ui process when really the bug is in this branch of cli.py.","expected_behavior": "The 'live health failed' detail string fires only for the documented failure modes: urllib.error.URLError when the GET cannot complete (connection refused, DNS, HTTP error) and TimeoutError when the socket-level timeout fires. Programming bugs propagate as a real traceback.",
"repro_steps": [
"gh pr checkout ",
"PYTHONPATH=src python3 -m unittest tests.test_cli.SparkCliTests.test_spawner_health_records_url_error_via_narrow_tuple tests.test_cli.SparkCliTests.test_spawner_health_records_timeout_via_narrow_tuple -> 2 tests OK; URLError and TimeoutError both still route through the narrow tuple and produce the documented detail.",
"PYTHONPATH=src python3 -c "import urllib.error; print(issubclass(urllib.error.HTTPError, urllib.error.URLError)); print(issubclass(urllib.error.URLError, OSError)); print(issubclass(TimeoutError, OSError))" -> True True True. HTTPError is a URLError; URLError and TimeoutError are both OSError subclasses, so the narrow tuple is the right shape for stdlib urlopen.",
"Inverse: drop a Python-side typo into the try body (e.g. rename
urllib.request.urlopentourllib.request.urlopenZZZ); before the change, the AttributeError gets swallowed andspark statusreports 'Spawner UI live health failed: module ...has no attribute urlopenZZZ' as if the spawner were down. After the change, the AttributeError propagates so the operator sees the real bug."],
"affected_workflow": "evaluate_module_health is called by
spark statusto surface per-module health to the operator. The spawner-ui liveness check is the gating signal that tells the operator whether the spawner is reachable and serving the Spark Live container's mission-control endpoints. When this branch silently swallows a Python bug,spark statusreports 'Spawner UI live health failed' and the operator starts debugging the spawner-ui process, port mapping, or the SPARK_LIVE_CONTAINER setup; they never see the real Python traceback that would point at the cli.py helper. Downstream, that wrong status feeds into the mission-control posture and the spawner-ui ready-watch loop -- they assume the spawner is unreachable and skip starting missions that depend on it."},
"evidence": {
"safe_links_only": true,
"before_after_proof": "Single try/except pattern at cli.py L5058. Before:
except Exception as exc: healthy=False; detail=f'Spawner UI live health failed: {exc}'. After:except (urllib.error.URLError, TimeoutError) as exc: healthy=False; detail=f'Spawner UI live health failed: {exc}'. The except body is byte-identical; only the catch tuple changes. urllib.error.URLError is the parent class for HTTPError (4xx/5xx and protocol errors) and wraps the connection-refused / DNS / socket failures urlopen produces; TimeoutError covers the socket-level timeout from urlopen(timeout=N). Both are OSError subclasses. Diff: 1 insertion, 1 deletion in src/spark_cli/cli.py plus 51 insertions in tests/test_cli.py for two focused unittest.mock.patch tests.","links": ["https://github.com//pull/537"],
"forbidden": ["pdf","zip","exe","unknown downloads","shortened links","archives","binaries","tokens","browser cookies","wallet material","raw logs","raw conversations","raw memory","raw patches","private repo maps","private scoring details"]
},
"proposed_fix": {
"approach": "Replace
except Exception as exc:withexcept (urllib.error.URLError, TimeoutError) as exc:in evaluate_module_health (cli.py L5063). The narrow tuple is exhaustive for the urlopen path: urllib.error.URLError is the parent of HTTPError and covers every connection-level and HTTP-protocol failure urlopen produces, and TimeoutError covers the socket-timeout case the urlopen(timeout=N) keyword can surface directly. urllib.error is already imported at the top of cli.py (line 25). Two unittest.mock.patch tests assert URLError and TimeoutError both still route through the narrow tuple and produce the documented 'Spawner UI live health failed' detail.","files_expected": ["src/spark_cli/cli.py", "tests/test_cli.py"],
"tests_or_smoke": "Manual smoke verified: subclass tree -- HTTPError IS URLError True, URLError IS OSError True, TimeoutError IS OSError True. AST parse of cli.py is clean. The two new unittest.mock.patch tests pass: PYTHONPATH=src python3 -m unittest tests.test_cli.SparkCliTests.test_spawner_health_records_url_error_via_narrow_tuple tests.test_cli.SparkCliTests.test_spawner_health_records_timeout_via_narrow_tuple -> 2 tests OK. Sibling test_spawner_health_uses_liveness_endpoint_in_hosted_mode still passes."
},
"pr": {
"branch": "sentinel/exception-narrow/cli-spawner-runtime-health-readycheck",
"title_prefix": "[spark-compete]",
"author_github": "4gjnbzb4zf-sudo",
"body_must_include": ["packet","team","pr_author","repo","actual_behavior","expected_behavior","repro_steps","before_after_proof","tests_or_smoke","duplicate_notes","risk_notes","review_claim"],
"url": "#537"
},
"review_claim": {
"impact_claim": "low",
"evidence_types": ["redacted_terminal_excerpt"],
"duplicate_notes": "Pre-flight
gh pr list --search 'spawner_runtime_health OR evaluate_module_health narrow'returned zero hits. Our own #527 narrowed urllib+json paths elsewhere in cli.py, #528 narrowed save_json, and #532-#536 covered system_map.py. This single-site fix in evaluate_module_health is a distinct branch (L5063, spawner-ui liveness urlopen) that no prior or competitor PR touches.","risk_notes": "Local scope: one file changed for the fix (one line), plus tests. The except body is byte-identical pre/post. The narrow tuple is exhaustive for the urlopen path -- HTTPError flows through URLError, socket-level connection failures flow through URLError (which itself is an OSError subclass), and socket-timeout flows through TimeoutError. Two unittest.mock.patch tests assert both classes still route to the documented detail string. The only observable behavior change is that programming bugs no longer surface as a fake 'Spawner UI live health failed' status.",
"review_state_requested": "pr_review"
}
}