fix(#268): align qa-* test from_session with ntok-bound alias — restore 13/13 L1 green#269
Conversation
…re 13/13 L1 green After #265 cleared the safe-rm.sh cascade, 4 L1 contract tests surfaced as failing with one shared signature: {"ok":false, "error":"from_session_identity_mismatch", "message":"network token from_session does not match token-bound node alias"} Root cause: bf564fb (#203 fix, 2026-05-28) tightened the server-side check so that an NTOK request's `from_session` must match the alias the NTOK is bound to. Four qa-* tests pre-date that fix and still send the wrong value: | test | NTOK bound to | was | now | |---------------------------------------------|---------------------|------------------|----------------------| | qa-hub-09-task-state-machine | agent-09 | "admin" | "agent-09" | | qa-hub-06b-cross-user-isolation | alice-agent | "alice" | "alice-agent" | | qa-dash-08-cross-account-views | alice-secret-agent | "alice" | "alice-secret-agent" | | qa-dash-10-incremental-poll | dash10-agent | "admin" | "dash10-agent" | Verified real isolation contract still intact: ran patched qa-hub-06b and all 11 PINs PASS — bob still gets 403/400 on every cross-tenant probe (networks list / tasks list / status / messages / direct IDOR / inject / mint). The #203 identity check has been silently doing its job throughout; these tests were just sending nonsense values that the cascade hid. Also fixed `[[ -n "$TASK_ID" ]]` null-guard in qa-dash-08: `jq -r .message_id` emits the literal string "null" when the field is missing, so `-n null` is true and the test would proceed with TASK_ID="null", causing send_reply to fail with garbage in_reply_to and the failure to be mis-attributed. Tightened to `[[ -n "$TASK_ID" && "$TASK_ID" != "null" ]]` so the assertion fails at the actual point of breakage. Local verification — all 4 cases now PASS in Docker: qa-hub-09-task-state-machine → PASS (replied/failed/cancelled + terminal no-op) qa-hub-06b-cross-user-isolation → PASS (all 11 isolation PINs incl. IDOR/inject/mint) qa-dash-08-cross-account-views → PASS (nodes/stats/completions/tasks/tokens/task_events) qa-dash-10-incremental-poll → PASS (since= filter on both endpoints + format gap pinned) After this merges, main `anet QA (v0)` should be 13/13 PASS — first green since 826c8a7 (12 days, 133 commits). Out of scope: a broader null-guard hygiene sweep across other tests (`jq -r .field` returns "null") — recorded against #268 as a follow-up hygiene task, not gated on this PR. Refs: #268 (4 L1 fails surfaced after #265), #265 (cascade fix), #257
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7baaec7db6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| # dispatch + reply (so /api/completions gets a row) | ||
| ARG=$(jq -nc --arg net "$ALICE_NET" \ | ||
| '{alias:"alice-secret-agent",task:"alice-confidential-task",priority:"normal",network_id:$net,from_session:"alice"}') | ||
| '{alias:"alice-secret-agent",task:"alice-confidential-task",priority:"normal",network_id:$net,from_session:"alice-secret-agent"}') |
There was a problem hiding this comment.
Keep the from_name probe seeded with a matching sender
With this change, the seeded alice-confidential-task row is written with from_name=alice-secret-agent because send_task stores from_session as the task sender, but step [7] still queries /api/tasks?from_name=alice and looks for that content. The /api/tasks handler applies an exact from_name = ? predicate (server/src/index.ts:1979), so this probe now returns zero rows even if the documented from_name=alice dashboard IDOR path regresses; seed this row through Alice's UTOK or update the probe to match the actual sender.
Useful? React with 👍 / 👎.
| # dispatch + reply (so /api/completions gets a row) | ||
| ARG=$(jq -nc --arg net "$ALICE_NET" \ | ||
| '{alias:"alice-secret-agent",task:"alice-confidential-task",priority:"normal",network_id:$net,from_session:"alice"}') | ||
| '{alias:"alice-secret-agent",task:"alice-confidential-task",priority:"normal",network_id:$net,from_session:"alice-secret-agent"}') |
There was a problem hiding this comment.
Seed a real completion before checking completion isolation
Now that this setup gets past send_task, the /api/completions probe still has no Alice completion to protect: send_reply only updates the task/reply path, while completion rows are inserted by report_completion (server/src/tools.ts:442-463). As a result step [5] passes with an empty .completions array even if /api/completions stops applying tenant scope; add a report_completion call for Alice before Bob's /api/completions check.
Useful? React with 👍 / 👎.
…lease-notes shape (#270) * feat(#261): release-gate workflow — install smoke + PINNED audit + release-notes shape Encodes the manual Method B release SOP as automated, report-only gates that run on every tag push (or workflow_dispatch). Closes the gap behind several recent ship-blockers that the human checklist let through. Three gates, all independent jobs (parallel after a shared tarball build): Gate 1 — install-path smoke (real TTY) - node:24-slim (NOT alpine, to keep glibc coverage; alpine masked a feishu-image agent-runtime regression last cycle) - npm install -g from the just-built tarball (NOT npm.org), so broken bundles fail BEFORE publish (catches #136 preview.4 / #137 class) - 5 cases: --version matches tag / hub --help shows stop+status+start / hub start + /health 200 + admin-utok 600 / login --hub --user --pw / node create wizard reaches first prompt under script -qc + expect (catches non-TTY-silent-exit regressions; memory: feedback_picker_interactive_expect_e2e) Gate 2 — PINNED_* version pin audit - Greps agent-network/bin/cli.ts for PINNED_(SERVER|NODE|DASHBOARD)_VERSION - Asserts each pin is published on npm via `npm view <pkg>@<ver>` - Audits ONLY pins that actually exist in cli.ts (dashboard is Vercel- deployed, may not be pinned — loop just skips missing entries, no false "missing PINNED_DASHBOARD" alarms) - Prevents the v0.10.0 / #194 class: PINNED_SERVER_VERSION points at an unpublished version → anet hub start silently hangs at fetch Gate 3 — release notes shape - Source A: docs/tests/release-vX.Y.Z*.md convention (prefix match, then grep fallback in the same dir) - Source B (fallback): `gh release view <tag> --json body` for the case where notes were pasted into the GitHub release but not into the repo file (double-safety so location drift doesn't silently pass Gate 3) - Asserts both `## Install` AND `## Upgrade` sections exist (prevents v0.10.2 class: only Upgrade section, new users had no install path), and that the Install section's version literal matches the gated tag (prevents stale-copy-from-old-release class) Design notes: - Report-only by design. Does NOT block `npm publish` (publishing remains a manual step on the maintainer's machine). Does NOT auto- publish. The verdict is informational; maintainer decides whether to proceed. The goal is to make the checks impossible to skip. - Total wall-clock under 10min (build + 3 parallel gates + verdict). - workflow_dispatch supports gating an arbitrary published version too (useful for retro-checking historical releases). Verified locally: - YAML parses cleanly with python3 yaml.safe_load - tarball-from-source pattern matches `npm publish` exactly (npm pack honors prepublishOnly + .npmignore) - The 3 historical ship-blockers (#136/#137 install/#194 PINNED/v0.10.2 notes) are all covered by at least one gate Refs: #261 (this issue), #265+#269 (CI baseline restored — release gate now has a trustworthy main-CI signal to layer on top of) * fix(#261 review): strip internal slugs / fix alpine→slim header / explicit --password Three review fixes from 通信龙: 1. Remove 4 `[[feedback_*]]` internal memory slugs from the file header comment — these were private notes that should not land in the public OSS repo. The user-visible facts (v0.10.0 PINNED mismatch, v0.10.2 notes split, #136 / #137 silent-exit class) are kept. 2. Header comment said "Gate 1 — install-path smoke (node:24-alpine…)" but the actual job uses node:24-slim (chosen over alpine for glibc coverage). Header now reads node:24-slim. The body rationale comment still mentions alpine because that's the chosen-over-X reasoning. 3. Gate 1 case 4 originally read the admin password from `jq -r .bootstrap_password admin-utok.json` with `|| echo anethub` fallback. Neither path is reliable: 3e4e190 (#261 P0-2) does not persist the bootstrap password into admin-utok.json (only username / user_id / token / created_at), and the `anethub` literal was the pre-fix default that no longer applies. Replaced with an explicit `--password "$GATE_PW"` (random per-run) passed to both `anet hub start` and `anet login`, so the smoke is deterministic and the bootstrap-password storage shape can change without breaking the gate. Also widened the /health poll from a single `sleep 5` to 30×1s so slow Docker startups don't false-fail Gate 1. * fix(#261 Gate 1): SIGPIPE-safe finite pipeline for password gen `tr -dc … < /dev/urandom | head -c 16` triggers SIGPIPE when head closes its input — under `set -o pipefail` the inner `tr` exits non-zero and the gate aborts with rc=141. Switched to a finite-input form: GATE_PW="ReleaseGate-$(head -c 32 /dev/urandom | sha256sum | head -c 16)" `head -c 32` reads a fixed 32 bytes then closes; the downstream `sha256sum` and `head -c 16` consume bounded output, so no upstream process gets SIGPIPE. Output is 16 lowercase hex chars from a cryptographic hash of 32 random bytes — sufficient unique per-run. Verified end-to-end in a clean `node:24-slim` container: case 3 — anet hub start with explicit --password ✓ /health 200 + admin-utok.json mode 600 case 4 — anet login with same --password ✓ login OK (matches "Logged in" marker) Both Gate 1 case 3 + case 4 PASS, rc=0. Locally confirms the new explicit-password approach works on the actual published preview tarball. --------- Co-authored-by: vansin <smartflowaiteam@gmail.com>
… Networks visibility (#273) Before this change, test-all.sh started one bun server up-front and ran all 4 suites against it. Two suites (V3 Networks, Config Priority) also try to `bun run src/index.ts &` themselves; that background bind hits EADDRINUSE (harmless in the bg case — the parent's server still serves), but the state-contamination from Base E2E (137 tests' worth of users/networks/ tasks in commhub.db) breaks downstream suites: - V3 Networks: completes 22 tests but the captured Results line interacts with the shared-DB pollution in ways that make run_suite's regex match the wrong line → reported as "0 ran (suite crashed)" even though the suite exits 0. - Config Priority: depends on `anet node create` with hand-written config.json; under the polluted shared DB this can't reach its Results line. This PR replaces the single up-front server start with a `reset_server` helper called between every suite. Each helper invocation: 1. pkill the previous bun server 2. waits for :9200 to actually free up (pkill is async — binding too soon re-triggers EADDRINUSE) 3. rm -rf /root/.commhub /root/.anet (fresh DB + fresh client state) 4. starts a new bun server in background 5. polls /health until 200 (max 15s) Local verification on the patched image, full test-all.sh run: before after ─────────────────────── ─────────────────────── Base E2E: 90 / 45 Base E2E: 90 / 45 (unchanged) V3 Auth: 25 / 0 V3 Auth: 25 / 0 (unchanged) V3 Networks: 0 ran ⚠ V3 Networks: 19 / 3 ✅ visible Config Pri: 0 ran ⚠ Config Pri: 0 ran ⚠ (separate, see below) TOTAL: 115 / 47 TOTAL: 134 / 49 Net: +19 PASS surfaced + +2 real fails surfaced (V3 Networks alpha/beta task — likely same root as Base E2E A3 in #266 audit). Config Priority still reports "0 ran" — that's a separate issue: `anet node create` now needs a real login first, but Config Priority intentionally writes a fake `token:"global-tok"` to verify the config priority resolution. That's a test-design problem (not orchestration), and needs a small refactor to do `anet login` first then layer the priority overrides on top. Tracked separately in #266 (Bucket C). Refs: #266 (Docker E2E audit), #265+#269 (CI baseline restored) Co-authored-by: vansin <smartflowaiteam@gmail.com>
…est-infra hygiene (#277) Following #266 round-1 audit, server commit 5c7bff2 had tightened the UTOK write path to require an explicit network_id on every send_task / send_reply / cancel_task / report_status. tests/docker-e2e.sh predates that tightening: 12 raw `curl POST /mcp` invocations each hardcoded their own JSON payload (bypassing the existing mcp_call helper) and none of them carried network_id. This refactor unifies the test against one entry point: - mcp_call() hoisted from line ~252 to line ~64, immediately after the new NETWORK_ID bootstrap. Now reachable from every tool-call site that follows. - Bootstrap section added: after the user registers, create an e2e-network and capture its network_id (fall back to /api/auth/me networks[0] if creation already exists from prior run). - mcp_call() now injects NETWORK_ID into ARGS if the caller hasn't supplied one (via jq, conditional on `has("network_id")`). Callers that want a different network can still override explicitly. - The 12 raw curls become `mcp_call "TOOL" '{...}'` invocations. Net: 51 ins / 57 del / 36-line reduction. Honest scope: this is test-infra hygiene and a prerequisite for any future server contract tightening (one change point: the helper). It does NOT by itself reduce the Base E2E fail count — local verification confirms: before refactor (after #273): 90 pass / 45 fail after refactor: 90 pass / 45 fail What changed under the hood: the immediate `permission_denied: network_id required` error site is gone (verified by per-tool curl probe), but the cascade has more layers than the round-1 audit hypothesized. Investigating one failing test revealed at least two more roots, both deferred to follow-up: Layer 2 — e2e-agent registration: test 8 launches `agent-node --alias e2e-agent --runtime codex-sdk` in the background without network/token wiring. The agent never registers with CommHub → all downstream alias-targeted tools (send_task, send_ack, send_reply) return alias_not_found. Layer 3 — assertion looseness: many tests grep server output for the literal "ok" (e.g. `echo "$RESP" | grep -q "ok"`). Both `{"ok":true}` and `{"ok":false,"error":"..."}` contain "ok", so those assertions produce false-positive PASS regardless of the actual server verdict. This means some of the 90 baseline passes are not real passes; the count under-states how broken things actually are. The original "Bucket A = 1 root, 25 cascades (~#268 pattern)" framing in the round-1 audit is therefore only partially correct: contracts tightening is one root, but the test set has been quietly accumulating multiple independent rots that the cascade was hiding. Refs: #266 round-1 audit (which this revises), #265 + #269 + #273 + #270 Co-authored-by: vansin <smartflowaiteam@gmail.com>
Author
Agent: 通信测试马 (claude-code-cli)
Dispatch: 通信龙 task 0d89d6f2 → push directly (mechanical low-risk, same pattern as #265)
Refs: #268 (4 L1 fails), #265 (cascade fix), #257
Summary
#265 cleared the L1 cascade; the next QA run showed 9/13 PASS, 4 fail — all 4 with the same
from_session_identity_mismatchsignature. This PR fixes the 4 test scripts.Root cause
bf564fb(#203 fix, 2026-05-28) tightened a server-side check: an NTOK request'sfrom_sessionmust match the alias the NTOK is bound to. Fourqa-*tests pre-date that fix and still send the wrong value, so the server (correctly) rejects them.agent-09"admin""agent-09"alice-agent"alice""alice-agent"alice-secret-agent"alice""alice-secret-agent"dash10-agent"admin""dash10-agent"Real isolation contract VERIFIED INTACT (not a security bug)
The biggest risk if this had been a real bug: cross-user isolation broken. I patched only
qa-hub-06b'sfrom_sessionand re-ran — all 11 isolation PINs PASS, including:The
#203check has silently been doing its job all along — these tests were just sending nonsense values that the safe-rm cascade was hiding.Bonus null-guard fix (qa-dash-08)
jq -r .message_idemits the literal string"null"when the field is missing, so[[ -n "$TASK_ID" ]]is true forTASK_ID="null". The test proceeded with garbage and mis-attributed the failure tosend_reply. Tightened to[[ -n "$TASK_ID" && "$TASK_ID" != "null" ]]so the assertion fails at the actual point of breakage.A wider null-guard hygiene sweep is out of scope for this PR — noted in #268 as a follow-up.
Test plan
qa-hub-09-task-state-machine→ PASS (replied/failed/cancelled + terminal no-op)qa-hub-06b-cross-user-isolation→ PASS (11 isolation PINs incl. IDOR/inject/mint)qa-dash-08-cross-account-views→ PASS (nodes/stats/completions/tasks/tokens/task_events)qa-dash-10-incremental-poll→ PASS (since= filter on both endpoints + format gap)anet QA (v0)on this PR returns greenanet QA (v0)on main returns 13/13 PASS for the first time since826c8a7Out of scope
Base E2E45 fail) — tracked in E2E Base 137 — 45 fail on main (independent of #257 L1 cascade) #266, independent harness.