Parser fix #31
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit d0fed4a.
    # If the suite timed out or crashed (no summary), the run is incomplete —
    # mark everything FAILED so new tests added by the golden patch show as F2P.
    if "tests summary: ok:" not in text_output:
        results = {k: "FAILED" for k in results}
Override marks passing tests as FAILED, inflating F2P
Medium Severity
When individual test lines are found but the output lacks `"tests summary: ok:"`, the override `results = {k: "FAILED" for k in results}` marks tests that actually passed as `"FAILED"`. The downstream F2P calculation in `evaluator.py` then counts these tests as F2P (fail-to-pass) instead of P2P (pass-to-pass), which inflates the F2P metric and deflates the P2P metric. The comment's stated rationale ("new tests added by the golden patch show as F2P") does not require this override: new tests either do not appear in the pre-patch output (and are caught by the `new_passing` logic) or already fail without the golden patch.
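The effect described above can be reproduced with a small sketch. The `classify` function is a hypothetical stand-in for the F2P/P2P split in `evaluator.py`, which is not shown in this thread; the `"PASSED"` status string, the test names, and the sample output text are likewise assumptions made for illustration. Only the override expression itself is taken from the diff.

```python
def apply_override(results: dict[str, str], text_output: str) -> dict[str, str]:
    """The override under review: no summary line means every test is marked FAILED."""
    if "tests summary: ok:" not in text_output:
        return {k: "FAILED" for k in results}
    return results


def classify(before: dict[str, str], after: dict[str, str]) -> dict[str, list[str]]:
    """Hypothetical stand-in for the F2P/P2P split (not the real evaluator.py)."""
    f2p, p2p = [], []
    for name, status in after.items():
        if status != "PASSED":
            continue
        # A test missing from the pre-patch run is treated as failing before
        # the patch, which is how new tests end up in F2P without any override.
        if before.get(name, "FAILED") == "PASSED":
            p2p.append(name)
        else:
            f2p.append(name)
    return {"F2P": sorted(f2p), "P2P": sorted(p2p)}


# Assumed scenario: the pre-patch run crashed before printing its summary,
# but test_a had already been reported as passing.
parsed_before = {"test_a": "PASSED", "test_b": "FAILED"}
after = {"test_a": "PASSED", "test_b": "PASSED", "test_new": "PASSED"}

with_override = classify(apply_override(parsed_before, "partial output"), after)
without_override = classify(parsed_before, after)

print(with_override)     # test_a is counted as F2P
print(without_override)  # test_a stays in P2P; test_new is still F2P
```

In this sketch, `test_new` lands in F2P in both cases, so dropping the override does not lose the behavior the code comment asks for. Whether the real evaluator treats missing tests the same way would need to be confirmed against `evaluator.py`.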


No description provided.