Parser fix by aakash-barthwal · Pull Request #31 · Mercor-Intelligence/apex-swe

aakash-barthwal · 2026-04-30T22:45:54Z

No description provided.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit d0fed4a.

cursor · 2026-04-30T22:53:56Z

+        # If the suite timed out or crashed (no summary), the run is incomplete —
+        # mark everything FAILED so new tests added by the golden patch show as F2P.
+        if "tests summary: ok:" not in text_output:
+            results = {k: "FAILED" for k in results}


Override marks passing tests as FAILED, inflating F2P

Medium Severity

When individual test lines are parsed but the output lacks "tests summary: ok:", the override results = {k: "FAILED" for k in results} marks tests that actually passed as "FAILED". The downstream F2P calculation in evaluator.py then counts these tests as F2P (fail-to-pass) instead of P2P (pass-to-pass), inflating the F2P metric and deflating P2P. The comment's stated rationale ("new tests added by the golden patch show as F2P") does not require this override: new tests either do not appear in the pre-patch output (and are caught by the new_passing logic) or already fail without the golden patch.
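The effect can be reproduced with a small sketch. It assumes a simplified F2P/P2P split in which a test that passes after the patch counts as P2P if it also passed before, and as F2P otherwise (including when it is absent from the pre-patch run). The helpers `parse` and `classify`, the test names, and the sample output are hypothetical and are not taken from the parser or from evaluator.py.

```python
def parse(text_output, results, apply_override):
    """Hypothetical stand-in for the parser's final step (not the real code)."""
    if apply_override and "tests summary: ok:" not in text_output:
        # The override under review: every parsed test becomes FAILED.
        return {k: "FAILED" for k in results}
    return dict(results)


def classify(before, after):
    """Assumed F2P/P2P split; the real evaluator.py logic may differ."""
    f2p, p2p = [], []
    for name, status in after.items():
        if status != "PASSED":
            continue
        # A test missing from the pre-patch run is treated as newly passing.
        if before.get(name, "FAILED") == "PASSED":
            p2p.append(name)
        else:
            f2p.append(name)
    return sorted(f2p), sorted(p2p)


# Pre-patch run: two test lines were parsed, but no summary line was printed.
pre_output = "test_a ok\ntest_b FAILED\n"
pre_parsed = {"test_a": "PASSED", "test_b": "FAILED"}

# Post-patch run: everything passes, including a test added by the patch.
post = {"test_a": "PASSED", "test_b": "PASSED", "test_new": "PASSED"}

with_override = classify(parse(pre_output, pre_parsed, True), post)
without_override = classify(parse(pre_output, pre_parsed, False), post)

print("with override:   ", with_override)
print("without override:", without_override)
```

Under these assumptions, test_a moves from P2P to F2P once the override is applied, while test_new is classified as F2P in both cases, which matches the reviewer's point that the override is not needed for new tests.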

sig

d0fed4a

aakash-barthwal changed the title ~~sig~~ Parser fix Apr 30, 2026

cursor Bot reviewed Apr 30, 2026

aakash-barthwal wants to merge 1 commit into main from new_fix
