fix(#266 Bucket B): reset server+DB between e2e suites — recover V3 Networks visibility (+19 tests)#273
Conversation
… Networks visibility
Before this change, test-all.sh started one bun server up-front and ran all
4 suites against it. Two suites (V3 Networks, Config Priority) also try to
`bun run src/index.ts &` themselves; that background bind hits EADDRINUSE
(harmless in the bg case — the parent's server still serves), but the
state-contamination from Base E2E (137 tests' worth of users/networks/
tasks in commhub.db) breaks downstream suites:
- V3 Networks: completes 22 tests but the captured Results line interacts
with the shared-DB pollution in ways that make run_suite's regex
match the wrong line → reported as "0 ran (suite crashed)" even
though the suite exits 0.
- Config Priority: depends on `anet node create` with hand-written
config.json; under the polluted shared DB this can't reach its
Results line.
This PR replaces the single up-front server start with a `reset_server`
helper called between every suite. Each helper invocation:
1. pkill the previous bun server
2. waits for :9200 to actually free up (pkill is async — binding too
soon re-triggers EADDRINUSE)
3. rm -rf /root/.commhub /root/.anet (fresh DB + fresh client state)
4. starts a new bun server in background
5. polls /health until 200 (max 15s)
Local verification on the patched image, full test-all.sh run:
before after
─────────────────────── ───────────────────────
Base E2E: 90 / 45 Base E2E: 90 / 45 (unchanged)
V3 Auth: 25 / 0 V3 Auth: 25 / 0 (unchanged)
V3 Networks: 0 ran ⚠ V3 Networks: 19 / 3 ✅ visible
Config Pri: 0 ran ⚠ Config Pri: 0 ran ⚠ (separate, see below)
TOTAL: 115 / 47 TOTAL: 134 / 49
Net: +19 PASS surfaced + +2 real fails surfaced (V3 Networks alpha/beta
task — likely same root as Base E2E A3 in #266 audit).
Config Priority still reports "0 ran" — that's a separate issue:
`anet node create` now needs a real login first, but Config Priority
intentionally writes a fake `token:"global-tok"` to verify the config
priority resolution. That's a test-design problem (not orchestration),
and needs a small refactor to do `anet login` first then layer the
priority overrides on top. Tracked separately in #266 (Bucket C).
Refs: #266 (Docker E2E audit), #265+#269 (CI baseline restored)
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e7a95a7381
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| for _ in $(seq 1 30); do | ||
| if ! (exec 3<>/dev/tcp/127.0.0.1/9200) 2>/dev/null; then break; fi | ||
| exec 3>&- 2>/dev/null || true | ||
| sleep 0.25 | ||
| done |
There was a problem hiding this comment.
Abort when the old server never releases the port
If the previous server takes longer than this loop to exit, or if another non-matching process owns :9200, the loop simply falls through and the function proceeds to delete the DB and start a new Bun process anyway. That new process will hit EADDRINUSE, while the later /health probe can still succeed against the stale server, so the next suite runs with the old in-memory/open DB state even though reset_server reports success. Track whether the port actually became free and return failure before deleting state or starting the replacement server.
Useful? React with 👍 / 👎.
| return 1 | ||
| } | ||
|
|
||
| reset_server |
There was a problem hiding this comment.
When reset_server returns non-zero (for example, Bun exits or /health never becomes ready), the runner still invokes the suite because the return value is ignored. Several suite scripts start their own bun run src/index.ts & while run_suite captures stdout; if no parent server is actually up, that background server can stay alive and keep the command-substitution pipe open, turning a setup failure into a CI timeout instead of a clear failed reset. Gate each suite on a successful reset or count the reset failure immediately.
Useful? React with 👍 / 👎.
| reset_server | ||
| run_suite "V3 Networks (22)" "/app/test-networks.sh 2>&1" | ||
| reset_server | ||
| run_suite "Config Priority (16)" "/app/test-config.sh 2>&1" |
There was a problem hiding this comment.
Do not pre-bind 9200 for self-starting suites
For the V3 Networks and Config Priority suites, this reset starts a parent server on :9200 immediately before invoking scripts that also run cd /app/server && bun run src/index.ts & (tests/docker-e2e-networks.sh:13 and tests/docker-config-priority.sh:17). Those suite-local server starts now always hit EADDRINUSE, so the suites silently exercise the parent server instead of their own startup path or any per-suite server environment. Use a stop-and-wipe reset for self-starting suites, or remove the suite-local server starts so the ownership is unambiguous.
Useful? React with 👍 / 👎.
…est-infra hygiene (#277) Following #266 round-1 audit, server commit 5c7bff2 had tightened the UTOK write path to require an explicit network_id on every send_task / send_reply / cancel_task / report_status. tests/docker-e2e.sh predates that tightening: 12 raw `curl POST /mcp` invocations each hardcoded their own JSON payload (bypassing the existing mcp_call helper) and none of them carried network_id. This refactor unifies the test against one entry point: - mcp_call() hoisted from line ~252 to line ~64, immediately after the new NETWORK_ID bootstrap. Now reachable from every tool-call site that follows. - Bootstrap section added: after the user registers, create an e2e-network and capture its network_id (fall back to /api/auth/me networks[0] if creation already exists from prior run). - mcp_call() now injects NETWORK_ID into ARGS if the caller hasn't supplied one (via jq, conditional on `has("network_id")`). Callers that want a different network can still override explicitly. - The 12 raw curls become `mcp_call "TOOL" '{...}'` invocations. Net: 51 ins / 57 del / 36-line reduction. Honest scope: this is test-infra hygiene and a prerequisite for any future server contract tightening (one change point: the helper). It does NOT by itself reduce the Base E2E fail count — local verification confirms: before refactor (after #273): 90 pass / 45 fail after refactor: 90 pass / 45 fail What changed under the hood: the immediate `permission_denied: network_id required` error site is gone (verified by per-tool curl probe), but the cascade has more layers than the round-1 audit hypothesized. Investigating one failing test revealed at least two more roots, both deferred to follow-up: Layer 2 — e2e-agent registration: test 8 launches `agent-node --alias e2e-agent --runtime codex-sdk` in the background without network/token wiring. The agent never registers with CommHub → all downstream alias-targeted tools (send_task, send_ack, send_reply) return alias_not_found. Layer 3 — assertion looseness: many tests grep server output for the literal "ok" (e.g. `echo "$RESP" | grep -q "ok"`). Both `{"ok":true}` and `{"ok":false,"error":"..."}` contain "ok", so those assertions produce false-positive PASS regardless of the actual server verdict. This means some of the 90 baseline passes are not real passes; the count under-states how broken things actually are. The original "Bucket A = 1 root, 25 cascades (~#268 pattern)" framing in the round-1 audit is therefore only partially correct: contracts tightening is one root, but the test set has been quietly accumulating multiple independent rots that the cascade was hiding. Refs: #266 round-1 audit (which this revises), #265 + #269 + #273 + #270 Co-authored-by: vansin <smartflowaiteam@gmail.com>
Author
Agent: 通信测试马 (claude-code-cli)
Dispatch: 通信龙 task d3b16d33 (Bucket B is my test-infra lane, mechanical)
Refs: #266 (Docker E2E audit), #265+#269 (CI baseline restored)
Summary
Replaces the single up-front server start in
tests/test-all.shwith areset_serverhelper called between every suite. Each call kills the previous bun server, waits for :9200 to free, wipes~/.commhub+~/.anet, starts a fresh server, and waits for/health200.This restores V3 Networks visibility — the suite was completing its 22 tests but
run_suite's regex was matching the wrong line under shared-state pollution, so CI reported0 ran (suite crashed).Before vs after (local Docker)
Net: +19 PASS surfaced, +2 real fails surfaced (V3 Networks
alpha task missing/beta task missing— likely same root as Base E2E Bucket A3 per #266 audit).Why this works
pkill -f 'bun.*src/index.ts'is async — binding :9200 too soon re-triggers EADDRINUSE. The wait loop (/dev/tcp/127.0.0.1/9200open probe) blocks until the previous bun has actually released the port.rm -rf /root/.commhub /root/.anetgives each suite a fresh SQLite DB + clean client state, so V3 Networks doesn't inherit Base E2E's 137 tests' worth of users/networks/tasks.Why Config Priority is still red (not this PR's scope)
anet node createnow needs a realanet loginfirst, buttest-config.shintentionally writes a faketoken:"global-tok"to verify the config-priority resolution chain. That's a test-design problem requiring a small refactor (login first, then layer the priority overrides on top), not orchestration. Tracked as Bucket C in #266 — needs its own PR.Test plan
test-all.shrun → before/after numbers as aboveResults: 20 passed, 2 failedline (standalone post-reset_server probe confirms)Docker E2E Testsworkflow shows V3 Networks reporting real numbers (no longer "0 ran")Docker E2E Testsshows +19 visible tests (TOTAL 134/49, was 115/47)What's NOT in scope
Refs: #266 (Docker E2E audit comment 4823611314 has the full bucketization)