fix(#268): align qa-* test from_session with ntok-bound alias — restore 13/13 L1 green by vansin · Pull Request #269 · sleep2agi/agent-network

vansin · 2026-06-28T00:40:38Z

Author

Agent: 通信测试马 (claude-code-cli)
Dispatch: 通信龙 task 0d89d6f2 → push directly (mechanical low-risk, same pattern as #265)
Refs: #268 (4 L1 fails), #265 (cascade fix), #257

Summary

#265 cleared the L1 cascade; the next QA run showed 9/13 PASS, 4 fail — all 4 with the same from_session_identity_mismatch signature. This PR fixes the 4 test scripts.

Root cause

bf564fb (#203 fix, 2026-05-28) tightened a server-side check: an NTOK request's from_session must match the alias the NTOK is bound to. Four qa-* tests pre-date that fix and still send the wrong value, so the server (correctly) rejects them.

test	NTOK bound to	was	now
qa-hub-09-task-state-machine	`agent-09`	`"admin"`	`"agent-09"`
qa-hub-06b-cross-user-isolation	`alice-agent`	`"alice"`	`"alice-agent"`
qa-dash-08-cross-account-views	`alice-secret-agent`	`"alice"`	`"alice-secret-agent"`
qa-dash-10-incremental-poll	`dash10-agent`	`"admin"`	`"dash10-agent"`

Real isolation contract VERIFIED INTACT (not a security bug)

The biggest risk if this had been a real bug: cross-user isolation broken. I patched only qa-hub-06b's from_session and re-ran — all 11 isolation PINs PASS, including:

[4] PIN: bob's /api/networks does NOT include alice-private    ✓
[5] PIN: bob's /api/tasks does NOT include alice's task        ✓
[6] PIN: bob's /api/status does NOT include alice-agent        ✓
[7] PIN: bob's /api/messages does NOT contain alice's secret   ✓
[8] PIN: bob DIRECT IDOR — query alice's network_id            ✓ (HTTP 403)
[9] PIN: bob CANNOT POST /api/task to alice-agent              ✓ (HTTP 403)
[10] PIN: bob CANNOT mint ntok for alice's network             ✓ (HTTP 400)

The #203 check has silently been doing its job all along — these tests were just sending nonsense values that the safe-rm cascade was hiding.

Bonus null-guard fix (qa-dash-08)

jq -r .message_id emits the literal string "null" when the field is missing, so [[ -n "$TASK_ID" ]] is true for TASK_ID="null". The test proceeded with garbage and mis-attributed the failure to send_reply. Tightened to [[ -n "$TASK_ID" && "$TASK_ID" != "null" ]] so the assertion fails at the actual point of breakage.

A wider null-guard hygiene sweep is out of scope for this PR — noted in #268 as a follow-up.

Test plan

Local Docker build + run all 4 cases → PASS
- qa-hub-09-task-state-machine → PASS (replied/failed/cancelled + terminal no-op)
- qa-hub-06b-cross-user-isolation → PASS (11 isolation PINs incl. IDOR/inject/mint)
- qa-dash-08-cross-account-views → PASS (nodes/stats/completions/tasks/tokens/task_events)
- qa-dash-10-incremental-poll → PASS (since= filter on both endpoints + format gap)
anet QA (v0) on this PR returns green
After merge: anet QA (v0) on main returns 13/13 PASS for the first time since 826c8a7

Out of scope

Docker E2E (Base E2E 45 fail) — tracked in E2E Base 137 — 45 fail on main (independent of #257 L1 cascade) #266, independent harness.
Broader null-guard sweep — tracked against L1 contract tests — 4 real bugs surfaced after #265 cascade cleared (alice send_reply / send_task / messages-since / from_session_identity_mismatch) #268 as hygiene follow-up.

…re 13/13 L1 green After #265 cleared the safe-rm.sh cascade, 4 L1 contract tests surfaced as failing with one shared signature: {"ok":false, "error":"from_session_identity_mismatch", "message":"network token from_session does not match token-bound node alias"} Root cause: bf564fb (#203 fix, 2026-05-28) tightened the server-side check so that an NTOK request's `from_session` must match the alias the NTOK is bound to. Four qa-* tests pre-date that fix and still send the wrong value: | test | NTOK bound to | was | now | |---------------------------------------------|---------------------|------------------|----------------------| | qa-hub-09-task-state-machine | agent-09 | "admin" | "agent-09" | | qa-hub-06b-cross-user-isolation | alice-agent | "alice" | "alice-agent" | | qa-dash-08-cross-account-views | alice-secret-agent | "alice" | "alice-secret-agent" | | qa-dash-10-incremental-poll | dash10-agent | "admin" | "dash10-agent" | Verified real isolation contract still intact: ran patched qa-hub-06b and all 11 PINs PASS — bob still gets 403/400 on every cross-tenant probe (networks list / tasks list / status / messages / direct IDOR / inject / mint). The #203 identity check has been silently doing its job throughout; these tests were just sending nonsense values that the cascade hid. Also fixed `[[ -n "$TASK_ID" ]]` null-guard in qa-dash-08: `jq -r .message_id` emits the literal string "null" when the field is missing, so `-n null` is true and the test would proceed with TASK_ID="null", causing send_reply to fail with garbage in_reply_to and the failure to be mis-attributed. Tightened to `[[ -n "$TASK_ID" && "$TASK_ID" != "null" ]]` so the assertion fails at the actual point of breakage. Local verification — all 4 cases now PASS in Docker: qa-hub-09-task-state-machine → PASS (replied/failed/cancelled + terminal no-op) qa-hub-06b-cross-user-isolation → PASS (all 11 isolation PINs incl. IDOR/inject/mint) qa-dash-08-cross-account-views → PASS (nodes/stats/completions/tasks/tokens/task_events) qa-dash-10-incremental-poll → PASS (since= filter on both endpoints + format gap pinned) After this merges, main `anet QA (v0)` should be 13/13 PASS — first green since 826c8a7 (12 days, 133 commits). Out of scope: a broader null-guard hygiene sweep across other tests (`jq -r .field` returns "null") — recorded against #268 as a follow-up hygiene task, not gated on this PR. Refs: #268 (4 L1 fails surfaced after #265), #265 (cascade fix), #257

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7baaec7db6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-28T00:45:47Z

 # dispatch + reply (so /api/completions gets a row)
 ARG=$(jq -nc --arg net "$ALICE_NET" \
-  '{alias:"alice-secret-agent",task:"alice-confidential-task",priority:"normal",network_id:$net,from_session:"alice"}')
+  '{alias:"alice-secret-agent",task:"alice-confidential-task",priority:"normal",network_id:$net,from_session:"alice-secret-agent"}')


Keep the from_name probe seeded with a matching sender

With this change, the seeded alice-confidential-task row is written with from_name=alice-secret-agent because send_task stores from_session as the task sender, but step [7] still queries /api/tasks?from_name=alice and looks for that content. The /api/tasks handler applies an exact from_name = ? predicate (server/src/index.ts:1979), so this probe now returns zero rows even if the documented from_name=alice dashboard IDOR path regresses; seed this row through Alice's UTOK or update the probe to match the actual sender.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-28T00:45:48Z

 # dispatch + reply (so /api/completions gets a row)
 ARG=$(jq -nc --arg net "$ALICE_NET" \
-  '{alias:"alice-secret-agent",task:"alice-confidential-task",priority:"normal",network_id:$net,from_session:"alice"}')
+  '{alias:"alice-secret-agent",task:"alice-confidential-task",priority:"normal",network_id:$net,from_session:"alice-secret-agent"}')


Seed a real completion before checking completion isolation

Now that this setup gets past send_task, the /api/completions probe still has no Alice completion to protect: send_reply only updates the task/reply path, while completion rows are inserted by report_completion (server/src/tools.ts:442-463). As a result step [5] passes with an empty .completions array even if /api/completions stops applying tenant scope; add a report_completion call for Alice before Bob's /api/completions check.

Useful? React with 👍 / 👎.

…lease-notes shape (#270) * feat(#261): release-gate workflow — install smoke + PINNED audit + release-notes shape Encodes the manual Method B release SOP as automated, report-only gates that run on every tag push (or workflow_dispatch). Closes the gap behind several recent ship-blockers that the human checklist let through. Three gates, all independent jobs (parallel after a shared tarball build): Gate 1 — install-path smoke (real TTY) - node:24-slim (NOT alpine, to keep glibc coverage; alpine masked a feishu-image agent-runtime regression last cycle) - npm install -g from the just-built tarball (NOT npm.org), so broken bundles fail BEFORE publish (catches #136 preview.4 / #137 class) - 5 cases: --version matches tag / hub --help shows stop+status+start / hub start + /health 200 + admin-utok 600 / login --hub --user --pw / node create wizard reaches first prompt under script -qc + expect (catches non-TTY-silent-exit regressions; memory: feedback_picker_interactive_expect_e2e) Gate 2 — PINNED_* version pin audit - Greps agent-network/bin/cli.ts for PINNED_(SERVER|NODE|DASHBOARD)_VERSION - Asserts each pin is published on npm via `npm view <pkg>@<ver>` - Audits ONLY pins that actually exist in cli.ts (dashboard is Vercel- deployed, may not be pinned — loop just skips missing entries, no false "missing PINNED_DASHBOARD" alarms) - Prevents the v0.10.0 / #194 class: PINNED_SERVER_VERSION points at an unpublished version → anet hub start silently hangs at fetch Gate 3 — release notes shape - Source A: docs/tests/release-vX.Y.Z*.md convention (prefix match, then grep fallback in the same dir) - Source B (fallback): `gh release view <tag> --json body` for the case where notes were pasted into the GitHub release but not into the repo file (double-safety so location drift doesn't silently pass Gate 3) - Asserts both `## Install` AND `## Upgrade` sections exist (prevents v0.10.2 class: only Upgrade section, new users had no install path), and that the Install section's version literal matches the gated tag (prevents stale-copy-from-old-release class) Design notes: - Report-only by design. Does NOT block `npm publish` (publishing remains a manual step on the maintainer's machine). Does NOT auto- publish. The verdict is informational; maintainer decides whether to proceed. The goal is to make the checks impossible to skip. - Total wall-clock under 10min (build + 3 parallel gates + verdict). - workflow_dispatch supports gating an arbitrary published version too (useful for retro-checking historical releases). Verified locally: - YAML parses cleanly with python3 yaml.safe_load - tarball-from-source pattern matches `npm publish` exactly (npm pack honors prepublishOnly + .npmignore) - The 3 historical ship-blockers (#136/#137 install/#194 PINNED/v0.10.2 notes) are all covered by at least one gate Refs: #261 (this issue), #265+#269 (CI baseline restored — release gate now has a trustworthy main-CI signal to layer on top of) * fix(#261 review): strip internal slugs / fix alpine→slim header / explicit --password Three review fixes from 通信龙: 1. Remove 4 `[[feedback_*]]` internal memory slugs from the file header comment — these were private notes that should not land in the public OSS repo. The user-visible facts (v0.10.0 PINNED mismatch, v0.10.2 notes split, #136 / #137 silent-exit class) are kept. 2. Header comment said "Gate 1 — install-path smoke (node:24-alpine…)" but the actual job uses node:24-slim (chosen over alpine for glibc coverage). Header now reads node:24-slim. The body rationale comment still mentions alpine because that's the chosen-over-X reasoning. 3. Gate 1 case 4 originally read the admin password from `jq -r .bootstrap_password admin-utok.json` with `|| echo anethub` fallback. Neither path is reliable: 3e4e190 (#261 P0-2) does not persist the bootstrap password into admin-utok.json (only username / user_id / token / created_at), and the `anethub` literal was the pre-fix default that no longer applies. Replaced with an explicit `--password "$GATE_PW"` (random per-run) passed to both `anet hub start` and `anet login`, so the smoke is deterministic and the bootstrap-password storage shape can change without breaking the gate. Also widened the /health poll from a single `sleep 5` to 30×1s so slow Docker startups don't false-fail Gate 1. * fix(#261 Gate 1): SIGPIPE-safe finite pipeline for password gen `tr -dc … < /dev/urandom | head -c 16` triggers SIGPIPE when head closes its input — under `set -o pipefail` the inner `tr` exits non-zero and the gate aborts with rc=141. Switched to a finite-input form: GATE_PW="ReleaseGate-$(head -c 32 /dev/urandom | sha256sum | head -c 16)" `head -c 32` reads a fixed 32 bytes then closes; the downstream `sha256sum` and `head -c 16` consume bounded output, so no upstream process gets SIGPIPE. Output is 16 lowercase hex chars from a cryptographic hash of 32 random bytes — sufficient unique per-run. Verified end-to-end in a clean `node:24-slim` container: case 3 — anet hub start with explicit --password ✓ /health 200 + admin-utok.json mode 600 case 4 — anet login with same --password ✓ login OK (matches "Logged in" marker) Both Gate 1 case 3 + case 4 PASS, rc=0. Locally confirms the new explicit-password approach works on the actual published preview tarball. --------- Co-authored-by: vansin <smartflowaiteam@gmail.com>

… Networks visibility (#273) Before this change, test-all.sh started one bun server up-front and ran all 4 suites against it. Two suites (V3 Networks, Config Priority) also try to `bun run src/index.ts &` themselves; that background bind hits EADDRINUSE (harmless in the bg case — the parent's server still serves), but the state-contamination from Base E2E (137 tests' worth of users/networks/ tasks in commhub.db) breaks downstream suites: - V3 Networks: completes 22 tests but the captured Results line interacts with the shared-DB pollution in ways that make run_suite's regex match the wrong line → reported as "0 ran (suite crashed)" even though the suite exits 0. - Config Priority: depends on `anet node create` with hand-written config.json; under the polluted shared DB this can't reach its Results line. This PR replaces the single up-front server start with a `reset_server` helper called between every suite. Each helper invocation: 1. pkill the previous bun server 2. waits for :9200 to actually free up (pkill is async — binding too soon re-triggers EADDRINUSE) 3. rm -rf /root/.commhub /root/.anet (fresh DB + fresh client state) 4. starts a new bun server in background 5. polls /health until 200 (max 15s) Local verification on the patched image, full test-all.sh run: before after ─────────────────────── ─────────────────────── Base E2E: 90 / 45 Base E2E: 90 / 45 (unchanged) V3 Auth: 25 / 0 V3 Auth: 25 / 0 (unchanged) V3 Networks: 0 ran ⚠ V3 Networks: 19 / 3 ✅ visible Config Pri: 0 ran ⚠ Config Pri: 0 ran ⚠ (separate, see below) TOTAL: 115 / 47 TOTAL: 134 / 49 Net: +19 PASS surfaced + +2 real fails surfaced (V3 Networks alpha/beta task — likely same root as Base E2E A3 in #266 audit). Config Priority still reports "0 ran" — that's a separate issue: `anet node create` now needs a real login first, but Config Priority intentionally writes a fake `token:"global-tok"` to verify the config priority resolution. That's a test-design problem (not orchestration), and needs a small refactor to do `anet login` first then layer the priority overrides on top. Tracked separately in #266 (Bucket C). Refs: #266 (Docker E2E audit), #265+#269 (CI baseline restored) Co-authored-by: vansin <smartflowaiteam@gmail.com>

…est-infra hygiene (#277) Following #266 round-1 audit, server commit 5c7bff2 had tightened the UTOK write path to require an explicit network_id on every send_task / send_reply / cancel_task / report_status. tests/docker-e2e.sh predates that tightening: 12 raw `curl POST /mcp` invocations each hardcoded their own JSON payload (bypassing the existing mcp_call helper) and none of them carried network_id. This refactor unifies the test against one entry point: - mcp_call() hoisted from line ~252 to line ~64, immediately after the new NETWORK_ID bootstrap. Now reachable from every tool-call site that follows. - Bootstrap section added: after the user registers, create an e2e-network and capture its network_id (fall back to /api/auth/me networks[0] if creation already exists from prior run). - mcp_call() now injects NETWORK_ID into ARGS if the caller hasn't supplied one (via jq, conditional on `has("network_id")`). Callers that want a different network can still override explicitly. - The 12 raw curls become `mcp_call "TOOL" '{...}'` invocations. Net: 51 ins / 57 del / 36-line reduction. Honest scope: this is test-infra hygiene and a prerequisite for any future server contract tightening (one change point: the helper). It does NOT by itself reduce the Base E2E fail count — local verification confirms: before refactor (after #273): 90 pass / 45 fail after refactor: 90 pass / 45 fail What changed under the hood: the immediate `permission_denied: network_id required` error site is gone (verified by per-tool curl probe), but the cascade has more layers than the round-1 audit hypothesized. Investigating one failing test revealed at least two more roots, both deferred to follow-up: Layer 2 — e2e-agent registration: test 8 launches `agent-node --alias e2e-agent --runtime codex-sdk` in the background without network/token wiring. The agent never registers with CommHub → all downstream alias-targeted tools (send_task, send_ack, send_reply) return alias_not_found. Layer 3 — assertion looseness: many tests grep server output for the literal "ok" (e.g. `echo "$RESP" | grep -q "ok"`). Both `{"ok":true}` and `{"ok":false,"error":"..."}` contain "ok", so those assertions produce false-positive PASS regardless of the actual server verdict. This means some of the 90 baseline passes are not real passes; the count under-states how broken things actually are. The original "Bucket A = 1 root, 25 cascades (~#268 pattern)" framing in the round-1 audit is therefore only partially correct: contracts tightening is one root, but the test set has been quietly accumulating multiple independent rots that the cascade was hiding. Refs: #266 round-1 audit (which this revises), #265 + #269 + #273 + #270 Co-authored-by: vansin <smartflowaiteam@gmail.com>

s2agi merged commit 18a52d5 into main Jun 28, 2026
2 of 4 checks passed

s2agi deleted the fix/268-test-from-session branch June 28, 2026 00:41

vansin mentioned this pull request Jun 28, 2026

feat(#261): release-gate workflow — install smoke + PINNED audit + release-notes shape #270

Merged

3 tasks

s2agi mentioned this pull request Jun 28, 2026

📦 [roadmap] 下一个大版本规划 — IM 正式化 + Dashboard 军团管理 + 生产级可靠/安全/性能 #262

Open

23 tasks

chatgpt-codex-connector Bot reviewed Jun 28, 2026

View reviewed changes

This was referenced Jun 28, 2026

E2E Base 137 — 45 fail on main (independent of #257 L1 cascade) #266

Open

fix(#266 Bucket B): reset server+DB between e2e suites — recover V3 Networks visibility (+19 tests) #273

Merged

vansin mentioned this pull request Jun 28, 2026

refactor(#266 A1): hoist mcp_call helper + auto-inject NETWORK_ID — test-infra hygiene (does NOT reduce fail count) #277

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(#268): align qa-* test from_session with ntok-bound alias — restore 13/13 L1 green#269

fix(#268): align qa-* test from_session with ntok-bound alias — restore 13/13 L1 green#269
s2agi merged 1 commit into
mainfrom
fix/268-test-from-session

vansin commented Jun 28, 2026

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 28, 2026

Uh oh!

chatgpt-codex-connector Bot Jun 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

vansin commented Jun 28, 2026

Author

Summary

Root cause

Real isolation contract VERIFIED INTACT (not a security bug)

Bonus null-guard fix (qa-dash-08)

Test plan

Out of scope

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 28, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants