From 6d65650752a068798a2b5de55ab137fd4e8fbbcf Mon Sep 17 00:00:00 2001
From: Bo
Date: Fri, 22 May 2026 12:05:31 -0400
Subject: [PATCH] fix(evals): unstale 3 release-gate eval hard-fails behind legit refactors (soc-2gd6)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The v2.42.0 release gate was red on 3 score-0/near-0 evals. All three are
eval-staleness behind legitimate recent changes; verified, NOT gaming or
security weakening (operator chose "update eval to match source of truth"):

| Eval | Was | Cause | Fix |
|---|---|---|---|
| hook-manifest-command-counts | 0 | session-pr-counter.sh (PR #362) is the legit 37th hook script; eval hardcoded 43/36 | bump expected counts 43→44, 36→37 |
| push-worktree landing-plane | 0.14 | #387 tiered-AGENTS split moved the "Landing the Plane" section to AGENTS-WORKFLOW.md (and dropped 2 lines) | redirect eval target AGENTS.md→AGENTS-WORKFLOW.md + restore the 2 dropped policy lines |
| security-toolchain ci-soft-gate-policy | 0 | the gate is intentionally HARD (no continue-on-error); the job already runs security-gate.sh --mode quick + uploads artifacts | drop the stale continue-on-error requirement from the eval (security stays HARD) |

Security note: the security-toolchain-gate stays a HARD blocking gate. The
only eval bit removed was the stale "soft gate" assertion; the actual scan
(security-gate.sh --mode quick) + artifact upload + summary-blocking are
unchanged.

How tested:
- hook-manifest jq check → hook-manifest-counts-ok
- security smoke ci-policy → security-toolchain-ci-policy-ok
- all 7 landing-plane strings present in AGENTS-WORKFLOW.md
- shellcheck clean on the edited smoke

Sibling pattern: same "update eval to match legitimately-changed source of
truth" move as the cli-command-surface canary bumps in #396/#397.

Fitness: release-gate eval hard-fails 3 → 0. (5 minor evals 0.71-0.99 + the
vil lane remain; separate remediation, NOT in this PR.)

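
Illustration only (not part of this patch): a minimal Python sketch of the
kind of job-slice check the security smoke performs after this change. It
assumes the workflow is plain text, that job keys are indented two spaces, and
that only the three required bits visible in the diff context are checked. The
function names, the sample workflow, and the explicit "no continue-on-error"
assertion are hypothetical additions for illustration; the real fixture may
differ.

```python
# Hypothetical sketch; names and the sample workflow are illustrative only.
REQUIRED_JOB_BITS = [
    "./scripts/security-gate.sh --mode quick",
    "uses: actions/upload-artifact@",
    "if: always()",
]


def extract_job(workflow: str, job_name: str, next_job_name: str) -> str:
    """Return the text of one job, from its key up to the next job's key."""
    start = workflow.index(f"  {job_name}:")
    end = workflow.index(f"\n  {next_job_name}:", start)
    return workflow[start:end]


def check_hard_gate(workflow: str) -> list[str]:
    """Return a list of problems; an empty list means the check passes."""
    job = extract_job(workflow, "security-toolchain-gate", "skill-integrity")
    problems = [f"missing: {bit}" for bit in REQUIRED_JOB_BITS if bit not in job]
    # Assumption: a HARD gate must not opt out of failing the workflow.
    if "continue-on-error: true" in job:
        problems.append("gate is soft: continue-on-error: true present")
    return problems


SAMPLE_WORKFLOW = """\
jobs:
  security-toolchain-gate:
    runs-on: ubuntu-latest
    steps:
      - run: ./scripts/security-gate.sh --mode quick
      - uses: actions/upload-artifact@v4
        if: always()
  skill-integrity:
    runs-on: ubuntu-latest
"""

if __name__ == "__main__":
    print(check_hard_gate(SAMPLE_WORKFLOW) or "security-toolchain-ci-policy-ok")
```

Returning a list of problems rather than raising on the first miss makes a
stale requirement (such as the dropped soft-gate bit) easy to spot next to
genuinely missing steps.
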
Closes-scenario: soc-2gd6#eval-hard-fails
Bounded-context: BC4-Validation
Evidence: evals/agentops-core/fixtures/security-toolchain-governance-smoke.sh
---
 AGENTS-WORKFLOW.md                                 |  2 ++
 .../security-toolchain-governance-smoke.sh         |  1 -
 .../hook-manifest-runtime-contracts.json           |  2 +-
 evals/agentops-core/push-worktree-closeout.json    | 14 +++++++-------
 4 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/AGENTS-WORKFLOW.md b/AGENTS-WORKFLOW.md
index c33f46749..ef37b0073 100644
--- a/AGENTS-WORKFLOW.md
+++ b/AGENTS-WORKFLOW.md
@@ -164,6 +164,8 @@ This moves the tag to HEAD, pushes, rebuilds the GitHub release, updates the Hom
 - NEVER say "ready to push when you are" - YOU must push
 - If push fails, resolve and retry until it succeeds
 - NEVER leave a foreign branch-attached worktree without a recorded disposition
+- Keep the canonical root clean and attached to `main`.
+- Run `bash scripts/check-worktree-disposition.sh` before push and session close.
 - If `bd dolt push` says no remote is configured, do not treat that as a session failure. Record it as unavailable, then continue with the mandatory Git push. See [bd server-mode tracker closeout](docs/runbooks/bd-server-mode-closeout.md).
diff --git a/evals/agentops-core/fixtures/security-toolchain-governance-smoke.sh b/evals/agentops-core/fixtures/security-toolchain-governance-smoke.sh
index 1a2e9e669..dd3026bdd 100755
--- a/evals/agentops-core/fixtures/security-toolchain-governance-smoke.sh
+++ b/evals/agentops-core/fixtures/security-toolchain-governance-smoke.sh
@@ -275,7 +275,6 @@ job_start = workflow.index(" security-toolchain-gate:")
 job_end = workflow.index("\n skill-integrity:", job_start)
 job = workflow[job_start:job_end]
 required_job_bits = [
-    "continue-on-error: true",
     "./scripts/security-gate.sh --mode quick",
     "uses: actions/upload-artifact@",
     "if: always()",
diff --git a/evals/agentops-core/hook-manifest-runtime-contracts.json b/evals/agentops-core/hook-manifest-runtime-contracts.json
index 56d572d94..6842a49de 100644
--- a/evals/agentops-core/hook-manifest-runtime-contracts.json
+++ b/evals/agentops-core/hook-manifest-runtime-contracts.json
@@ -116,7 +116,7 @@
       "timeout_seconds": 60,
       "inputs": {
         "cwd": "../..",
-        "shell": "jq -e '[.. | .command? // empty] as $cmds | ($cmds | length == 43) and ($cmds | map(capture(\"(?