feat(#261): release-gate workflow — install smoke + PINNED audit + release-notes shape#270

Merged: s2agi merged 3 commits into main from feat/261-release-gate-workflow on Jun 28, 2026

@vansin vansin commented Jun 28, 2026

Agent: 通信测试马 (claude-code-cli)
Dispatch: 通信龙 task 161b0b63 (#257#268 line closed, CI baseline green, time to layer release gate on top)
Refs: #261 (P1 — no release CI gate), depends on #265+#269 (CI baseline restored)

Summary

Encodes the manual Method B release SOP as automated, report-only gates that run on every tag push (or workflow_dispatch). Closes the gap behind several recent ship-blockers that the human checklist let through.

build-tarball ──► gate-1 install smoke (real TTY)
              ├─► gate-2 PINNED_* audit
              └─► gate-3 release-notes shape  ──► verdict

What each gate catches

Gate 1 — install-path smoke (real TTY)

Installs the just-built tarball (not what's on npm) into a clean node:24-slim container and exercises 5 cases:

| case | what it pins |
| --- | --- |
| anet --version matches tag | wrong version baked into the bundle |
| anet hub --help shows start, stop, status | #241-class regression to contextual --help routing |
| anet hub start → /health 200 + admin-utok mode 600 | core boot path |
| anet login --hub URL --user --pw succeeds | login flag contract |
| anet node create reaches wizard's first prompt under script -qc + expect | #136 preview.4 / #137 silent-exit class; caught only with a real TTY, not with non-TTY < /dev/null |

node:24-slim chosen over alpine for glibc/musl coverage (alpine masked a feishu-image agent-runtime regression last cycle).

Gate 2 — PINNED_* version pin audit

Greps agent-network/bin/cli.ts for PINNED_(SERVER|NODE|DASHBOARD)_VERSION and asserts each existing pin is actually published on npm.
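
The audit loop described above could be sketched roughly as follows. This is a minimal illustration, not the workflow's actual step: it assumes pins are written in cli.ts as `PINNED_<X>_VERSION = "<version>"` string literals, and the helper names `extract_pins` and `audit_pins` are hypothetical. The `npm view` call is left as a comment because it needs network access and a package-name mapping that is not shown in this PR.

```bash
# Hypothetical helper: print "NAME VERSION" for every pin found in a file.
# Assumes pins look like: PINNED_SERVER_VERSION = "0.9.0-preview.0"
extract_pins() {
  grep -oE 'PINNED_(SERVER|NODE|DASHBOARD)_VERSION *= *"[^"]+"' "$1" \
    | sed -E 's/^(PINNED_[A-Z]+_VERSION) *= *"([^"]+)"$/\1 \2/'
}

# Audit only the pins that exist; a missing pin is skipped, not an error.
audit_pins() {
  extract_pins "$1" | while read -r name ver; do
    echo "would check: $name -> $ver"
    # In a real gate, this is where something like
    #   npm view "<pkg>@$ver" version
    # would run, failing the job if nothing is published.
  done
}
```

Because the loop only iterates over pins that grep actually finds, a file without `PINNED_DASHBOARD_VERSION` produces no output for it, which matches the "skip when missing" behavior the PR describes.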

Gate 3 — release notes shape

Double-source check so location drift doesn't silently pass:

  1. Source A: docs/tests/release-vX.Y.Z*.md convention (prefix match, then grep fallback in the same dir).
  2. Source B (fallback): gh release view <tag> --json body, for the case where notes were pasted only into the GitHub release.

Then asserts:

  • Both ## Install AND ## Upgrade sections exist (v0.10.2 shipped with only ## Upgrade — new users had no install path).
  • The ## Install section's version literal matches the gated tag (catches stale notes someone copied from an old release but forgot to bump).
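
The first assertion can be illustrated with a short sketch. The helper name `has_install_and_upgrade` is hypothetical and the notes body is read from stdin; how the real gate passes the body around is not shown here.

```bash
# Hypothetical helper: succeed only if both required headings are present.
# Reads the release-notes body on stdin.
has_install_and_upgrade() {
  body=$(cat)
  printf '%s\n' "$body" | grep -qE '^## Install' || return 1
  printf '%s\n' "$body" | grep -qE '^## Upgrade' || return 1
}
```

A notes file with only `## Upgrade` (the v0.10.2 shape) makes the function return non-zero.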

Verdict (aggregate)

A final job summarizes pass/fail across the 3 gates. Report-only by design:

  • Does not block npm publish (publishing stays a manual step on the maintainer's machine).
  • Does not auto-publish.
  • Maintainer reads the verdict and decides whether to proceed.
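
A report-only aggregate might look like the sketch below. It assumes each gate's outcome arrives as a string such as `success` or `failure` (the values GitHub Actions exposes as `needs.<job>.result`); the `verdict` function itself is hypothetical.

```bash
# Hypothetical verdict step: takes one result per gate, prints a summary,
# and always returns 0 so nothing downstream is blocked.
verdict() {
  overall=PASS
  n=1
  for result in "$@"; do
    if [ "$result" = "success" ]; then
      echo "gate-$n: PASS"
    else
      echo "gate-$n: FAIL ($result)"
      overall=FAIL
    fi
    n=$((n + 1))
  done
  echo "verdict: $overall"
  return 0
}
```

The unconditional `return 0` is the report-only part: the summary says FAIL, but the job itself stays green and the maintainer decides what to do.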

Design rationale

| decision | choice | why |
| --- | --- | --- |
| base image | node:24-slim | not alpine; keeps glibc-only binary coverage (memory: [[reference_anet_agent_docker_needs_glibc_bun]]) |
| dashboard PIN | skip when missing | not all releases pin dashboard in CLI; audit only what's wired in |
| notes source | file → gh release view fallback | location can drift; double-safety prevents silent pass |
| publish action | none | report-only, no surprises; maintainer SOP unchanged |
| trigger | tag push + workflow_dispatch | tag = real gate; dispatch = retro-check historical versions |
| timeout | <10min total | parallel gates after shared build |

What's NOT in scope

  • Auto-publish to npm — explicitly not done.
  • Replacing the v0.11 onboarding 5-scenario suite (different layer; that's contract-test, this is install-path).
  • Hardening lint.yml / qa.yml / e2e-docker.yml — those are PR/push gates, this is tag gate.

Verification

  • python3 -c 'import yaml; yaml.safe_load(open(".github/workflows/release.yml"))' → parses cleanly, 5 jobs as expected
  • All 3 historical ship-blocker classes (#136/#137 install path and wizard silent exit, #194 PINNED hang, v0.10.2 missing Install section) are covered by at least one gate

Test plan

  • PR CI: existing workflows (lint, e2e-docker, anet QA) still pass — release.yml does not trigger on PR (only tag push + dispatch)
  • After merge: next tag push fires release-gate workflow
  • Verdict job surfaces correct PASS/FAIL aggregate

Reviewer focus per 通信龙 (gate-by-gate read)

  1. Gate 1 — TTY drive correctness (printf-built expect script, script -qc wrapper)
  2. Gate 2 — pin grep regex + npm view audit semantics
  3. Gate 3 — notes location fallback ordering + ## Install version-literal check
  4. Top-level — report-only enforcement (verdict job exits 0 even on gate failure)

…lease-notes shape

Encodes the manual Method B release SOP as automated, report-only gates
that run on every tag push (or workflow_dispatch). Closes the gap behind
several recent ship-blockers that the human checklist let through.

Three gates, all independent jobs (parallel after a shared tarball build):

  Gate 1 — install-path smoke (real TTY)
    - node:24-slim (NOT alpine, to keep glibc coverage; alpine masked a
      feishu-image agent-runtime regression last cycle)
    - npm install -g from the just-built tarball (NOT npm.org), so broken
      bundles fail BEFORE publish (catches #136 preview.4 / #137 class)
    - 5 cases: --version matches tag / hub --help shows stop+status+start
      / hub start + /health 200 + admin-utok 600 / login --hub --user --pw
      / node create wizard reaches first prompt under script -qc + expect
      (catches non-TTY-silent-exit regressions; memory:
      feedback_picker_interactive_expect_e2e)

  Gate 2 — PINNED_* version pin audit
    - Greps agent-network/bin/cli.ts for PINNED_(SERVER|NODE|DASHBOARD)_VERSION
    - Asserts each pin is published on npm via `npm view <pkg>@<ver>`
    - Audits ONLY pins that actually exist in cli.ts (dashboard is Vercel-
      deployed, may not be pinned — loop just skips missing entries, no
      false "missing PINNED_DASHBOARD" alarms)
    - Prevents the v0.10.0 / #194 class: PINNED_SERVER_VERSION points at
      an unpublished version → anet hub start silently hangs at fetch

  Gate 3 — release notes shape
    - Source A: docs/tests/release-vX.Y.Z*.md convention (prefix match,
      then grep fallback in the same dir)
    - Source B (fallback): `gh release view <tag> --json body` for the
      case where notes were pasted into the GitHub release but not into
      the repo file (double-safety so location drift doesn't silently
      pass Gate 3)
    - Asserts both `## Install` AND `## Upgrade` sections exist
      (prevents v0.10.2 class: only Upgrade section, new users had no
      install path), and that the Install section's version literal
      matches the gated tag (prevents stale-copy-from-old-release class)

Design notes:
  - Report-only by design. Does NOT block `npm publish` (publishing
    remains a manual step on the maintainer's machine). Does NOT auto-
    publish. The verdict is informational; maintainer decides whether
    to proceed. The goal is to make the checks impossible to skip.
  - Total wall-clock under 10min (build + 3 parallel gates + verdict).
  - workflow_dispatch supports gating an arbitrary published version too
    (useful for retro-checking historical releases).

Verified locally:
  - YAML parses cleanly with python3 yaml.safe_load
  - tarball-from-source pattern matches `npm publish` exactly (npm pack
    honors prepublishOnly + .npmignore)
  - The 3 historical ship-blockers (#136/#137 install/#194 PINNED/v0.10.2
    notes) are all covered by at least one gate

Refs: #261 (this issue), #265+#269 (CI baseline restored — release gate
now has a trustworthy main-CI signal to layer on top of)
…licit --password

Three review fixes from 通信龙:

1. Remove 4 `[[feedback_*]]` internal memory slugs from the file header
   comment — these were private notes that should not land in the public
   OSS repo. The user-visible facts (v0.10.0 PINNED mismatch, v0.10.2
   notes split, #136 / #137 silent-exit class) are kept.

2. Header comment said "Gate 1 — install-path smoke (node:24-alpine…)"
   but the actual job uses node:24-slim (chosen over alpine for glibc
   coverage). Header now reads node:24-slim. The body rationale comment
   still mentions alpine because that's the chosen-over-X reasoning.

3. Gate 1 case 4 originally read the admin password from
   `jq -r .bootstrap_password admin-utok.json` with `|| echo anethub`
   fallback. Neither path is reliable: 3e4e190 (#261 P0-2) does not
   persist the bootstrap password into admin-utok.json (only username
   / user_id / token / created_at), and the `anethub` literal was the
   pre-fix default that no longer applies. Replaced with an explicit
   `--password "$GATE_PW"` (random per-run) passed to both
   `anet hub start` and `anet login`, so the smoke is deterministic
   and the bootstrap-password storage shape can change without
   breaking the gate. Also widened the /health poll from a single
   `sleep 5` to 30×1s so slow Docker startups don't false-fail Gate 1.
`tr -dc … < /dev/urandom | head -c 16` triggers SIGPIPE when head closes
its input — under `set -o pipefail` the inner `tr` exits non-zero and
the gate aborts with rc=141. Switched to a finite-input form:

  GATE_PW="ReleaseGate-$(head -c 32 /dev/urandom | sha256sum | head -c 16)"

`head -c 32` reads a fixed 32 bytes then closes; the downstream
`sha256sum` and `head -c 16` consume bounded output, so no upstream
process gets SIGPIPE. Output is 16 lowercase hex chars from a
cryptographic hash of 32 random bytes, which is sufficiently unique per run.
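
The finite-input form quoted above can be run on its own as a quick check. This sketch assumes a Linux environment where `sha256sum` is available; the `echo` line is only there for illustration.

```bash
# Finite-input form from the commit message: every stage reads a bounded
# amount of data, so no upstream process is killed by SIGPIPE.
GATE_PW="ReleaseGate-$(head -c 32 /dev/urandom | sha256sum | head -c 16)"

# 12 characters of prefix plus 16 hex characters.
echo "length: ${#GATE_PW}"   # prints "length: 28"
```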

Verified end-to-end in a clean `node:24-slim` container:

  case 3 — anet hub start with explicit --password
    ✓ /health 200 + admin-utok.json mode 600
  case 4 — anet login with same --password
    ✓ login OK (matches "Logged in" marker)

Both Gate 1 case 3 + case 4 PASS, rc=0. Locally confirms the new
explicit-password approach works on the actual published preview
tarball.
@s2agi s2agi merged commit c05caf4 into main Jun 28, 2026
1 check failed
@s2agi s2agi deleted the feat/261-release-gate-workflow branch June 28, 2026 00:52

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f90e269dc1

Comment on lines +210 to +213
ver=$(anet --version | tr -d "v\n ")
echo "anet --version → $ver (expected $GATED_VER)"
[ "$ver" = "$GATED_VER" ] \
|| { echo "::error::version mismatch — anet says $ver, gating $GATED_VER"; exit 1; }

P1: Parse only the semver from anet --version

On every agent-network tag, this compares $GATED_VER with the full anet --version report. In this tree printVersionReport() prints anet v... plus component/help sections (agent-network/bin/cli.ts:782), so tr -d "v\n " leaves strings like anet2.2.22-preview.4Components..., which never equals 2.2.22-preview.4; Gate 1 fails before the smoke checks can run. Parse the first line/semver instead.
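
One way to do what this comment suggests is sketched below. It assumes the first line of the report looks like `anet v<semver>`, as the comment describes; the helper name `parse_version` is hypothetical.

```bash
# Hypothetical helper: pull the first semver-looking token out of the first
# line of a version report, which also drops any leading "v".
# sed -n '1p' reads all of its input, so the producer is never cut off early.
parse_version() {
  sed -n '1p' | grep -oE '[0-9]+\.[0-9]+\.[0-9]+(-[0-9A-Za-z.]+)?' | head -n 1
}
```

In the gate this would be used as `ver=$(anet --version | parse_version)` before comparing against `$GATED_VER`.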


Comment thread .github/workflows/release.yml Outdated
Comment on lines +234 to +236
pw=$(jq -r .bootstrap_password "$HOME/.anet/server/admin-utok.json" 2>/dev/null \
|| echo anethub)
anet login --hub http://127.0.0.1:9200 --username admin --password "$pw" > /tmp/login.log 2>&1 \

P1: Use an available admin credential for smoke login

After anet hub start with no ANET_HUB_BOOTSTRAP_PASSWORD, the CLI generates a random password and only writes username, user_id, token, and created_at to admin-utok.json (agent-network/bin/cli.ts:3489-3494), not bootstrap_password. jq -r .bootstrap_password therefore returns literal null with status 0, so the anethub fallback is not used and login with password null fails in every fresh smoke container. Use the saved token or set a known bootstrap password for the smoke.


Comment on lines +367 to +369
install_block=$(echo "$notes_body" | awk '/^## Install/,/^## /' | head -50)
echo "$install_block" | grep -qE "@$GATED_VER\b" \
|| { echo "::error::'## Install' section in $src does not mention @$GATED_VER"; exit 1; }

P1: Do not end the Install section on its own heading

The awk range starts at ^## Install and ends at ^## ; because the heading itself matches both patterns, awk emits only the heading line and turns the range off. Normal notes with @${GATED_VER} on the npm install line below ## Install will be rejected as stale, so Gate 3 fails even for correctly shaped release notes. Start the end match after the first line or use a different section parser.
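
A flag-based parser is one possible "different section parser". The sketch below is an illustration only; the function name `install_section` is hypothetical.

```bash
# Hypothetical replacement for the range pattern: start printing at
# "## Install" and stop at the NEXT "## " heading.
install_section() {
  awk '
    /^## Install/ { inside = 1; print; next }  # opening heading: turn on, skip the end test
    /^## /        { inside = 0 }               # any later heading closes the section
    inside        { print }
  '
}
```

The `next` on the opening heading is what keeps that line from also being treated as the end of the section. The version check could then run as `echo "$notes_body" | install_section | grep -qE "@$GATED_VER\b"`, as in the original step.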


- 'v*.*.*'
- 'v*.*.*-preview.*'
- 'agent-network@v*'
- 'agent-node@v*'

P2: Split agent-node tags from the anet smoke path

When an agent-node@v* tag runs, build-tarball packages agent-node, but the later gates still assume the agent-network CLI: Gate 1 invokes anet, and Gate 2 looks for $GATED_PKG/bin/cli.ts. In this repo agent-node/package.json:5-7 exposes only the agent-node binary, so every agent-node release gate fails unrelated checks instead of auditing that package. Remove the agent-node trigger/input or add package-specific smoke/audit paths.


printf "set timeout 30\n"
printf "spawn anet node create r-node\n"
printf "expect {\n"
printf " -re \"(vendor|runtime|provider).*\\\\?\" { puts \"[wizard] reached first prompt\"; exit 0 }\n"

P1: Escape Tcl brackets in the expect marker

When the wizard prompt is actually matched, this generated expect script runs puts "[wizard] reached first prompt"; in Tcl, square brackets inside double quotes are command substitution, so it tries to call a nonexistent wizard command and exits before printing the success marker. That makes the real-TTY wizard case fail even when the prompt is present; escape the brackets or use brace-quoted text.
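
The brace-quoted variant might be generated like this. The first four lines mirror the quoted snippet, with the pattern emitted exactly as before; the `timeout` branch and the closing brace are assumptions added so the script is complete, and `build_expect_script` is a hypothetical name.

```bash
# Hypothetical generator for the expect script. The success marker is
# brace-quoted, so Tcl prints "[wizard]" literally instead of trying to run
# a command named wizard.
build_expect_script() {
  printf 'set timeout 30\n'
  printf 'spawn anet node create r-node\n'
  printf 'expect {\n'
  printf '  -re "(vendor|runtime|provider).*\\?" { puts {[wizard] reached first prompt}; exit 0 }\n'
  printf '  timeout { puts {[wizard] timed out}; exit 1 }\n'   # assumed branch, not in the quoted snippet
  printf '}\n'
}
```

Single-quoted printf formats also reduce the escaping needed compared with the double-quoted originals.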


# double-safety net 通信龙 asked for: location can drift without
# silently passing Gate 3.
# workflow_dispatch has no tag → only file source is available.
if [ -z "$notes_body" ] && [ "${{ github.event_name }}" = 'push' ]; then

P2: Allow manual runs to read GitHub release notes

This condition disables the gh release view fallback for every workflow_dispatch run, even when the operator selected a tag ref or is rerunning after notes were pasted into the GitHub release. In that valid Source-B-only scenario Gate 3 reports "no release notes" because it checks only repo files; use the selected github.ref_name or synthesize the tag from GATED_VER for manual runs too.
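
A sketch of the tag selection this comment asks for, under stated assumptions: the arguments stand in for `github.event_name`, `github.ref_type`, `github.ref_name`, and `GATED_VER`, and the synthesized tag uses a plain `v<version>` shape, which may not match package-prefixed tags such as `agent-network@v*`. The helper name is hypothetical.

```bash
# Hypothetical helper: decide which tag to pass to `gh release view`.
# $1 = event name, $2 = ref type ("tag" or "branch"),
# $3 = ref name,   $4 = gated version.
resolve_release_tag() {
  if [ "$1" = "push" ] || [ "$2" = "tag" ]; then
    echo "$3"      # the ref itself is the tag
  else
    echo "v$4"     # manual run on a branch: synthesize from the version
  fi
}
```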


Comment thread .github/workflows/release.yml Outdated
# case 3 — anet hub start brings up /health and creates admin-utok.json mode 600
export HOME=/tmp/anethome; mkdir -p "$HOME"
nohup anet hub start > /tmp/hub.log 2>&1 &
sleep 5

P2: Poll for hub bootstrap instead of sleeping

In the fresh smoke container there is no bunx cache, and anet hub start lazy-fetches @sleep2agi/commhub-server before writing admin-utok.json, so a fixed 5-second sleep can hit /health or stat before the server/bootstrap is ready on slower runners. This makes Gate 1 flaky even when the tarball is good; poll /health and the token file with a bounded timeout instead of sleeping once.
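
A bounded poll could be sketched as below. The `wait_until` and `hub_ready` helpers are hypothetical; the port and token path are taken from the snippets quoted elsewhere in this review.

```bash
# Hypothetical helper: run a probe command once per interval until it
# succeeds or the attempt budget is spent.
# $1 = attempts, $2 = seconds between tries, remaining args = probe command.
wait_until() {
  attempts=$1; interval=$2; shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@"; then return 0; fi
    i=$((i + 1))
    sleep "$interval"
  done
  return 1
}

# Hypothetical probe: /health answers and the token file exists.
hub_ready() {
  curl -fsS http://127.0.0.1:9200/health > /dev/null 2>&1 \
    && [ -f "$HOME/.anet/server/admin-utok.json" ]
}
```

A call such as `wait_until 30 1 hub_ready` corresponds to the 30×1s poll mentioned in the follow-up commit.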


# Try prefix match first, then grep fallback inside that dir.
notes_body=""
src=""
notes_file=$(ls docs/tests/release-v"$GATED_VER"*.md 2>/dev/null | head -1 || true)

P2: Prefer exact stable release-note files

For a stable gate such as GATED_VER=2.2.22, this glob also matches preview files like release-v2.2.22-preview.4.md, and shell ordering puts the hyphenated preview filename before release-v2.2.22.md. If both notes exist, Gate 3 can validate the preview notes instead of the stable notes and fail or pass for the wrong release; choose the exact stable filename before falling back to prefix matches.
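
The exact-then-prefix ordering could look like this sketch; `pick_notes_file` is a hypothetical name, and the directory is passed in rather than hard-coded.

```bash
# Hypothetical helper: prefer the exact release-v<ver>.md file, and only
# fall back to a prefix match when it does not exist.
# $1 = notes directory, $2 = gated version.
pick_notes_file() {
  if [ -f "$1/release-v$2.md" ]; then
    echo "$1/release-v$2.md"
    return 0
  fi
  for f in "$1"/release-v"$2"*.md; do
    [ -f "$f" ] && { echo "$f"; return 0; }
  done
  return 1
}
```

With both a preview and a stable file present, `pick_notes_file docs/tests 2.2.22` returns the stable file regardless of how the shell orders the glob.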


s2agi pushed a commit that referenced this pull request Jun 28, 2026
Issue #1: the public hub has no per-client cap on the SSE
ReadableStream queue. A half-open consumer (TCP drop, client crash
without FIN) keeps its slot in `clients` AND keeps receiving
`controller.enqueue()` calls from both the keepalive timer and every
`pushEvent`. The bytes pile up in the stream's internal queue forever
— clear OOM vector for any commhub-server exposed to the public
internet (see #270 round-2/4).

Issue #2 (CHANGE_REQ in v1, fixed in v2): default web-streams queuing
strategy counts CHUNKS, not bytes — so a `desiredSize < -1_000_000`
threshold meant "one million chunks queued", not "1 MB queued". A
single 1 MiB enqueue bumps desiredSize by -1 under the default
strategy. v2 explicitly constructs the SSE stream with a byte-counting
strategy so the byte-cap actually fires:

    new ReadableStream<Uint8Array>(src, {
      highWaterMark: 0,
      size: (chunk) => chunk.byteLength,
    })

server/src/push.ts
- Byte-counting queuing strategy on the SSE ReadableStream (the
  critical fix for v1's defeated byte-cap).
- `tryEnqueueBytes(client, bytes)` — every enqueue path through one
  guard. Two thresholds:
    a. HARD CEILING: `desiredSize < -MAX_QUEUE_BACKPRESSURE_BYTES`
       (1 MB default) → close immediately. One huge event can't blow
       up a stuck client.
    b. STUCK TIMEOUT: `desiredSize < 0` continuously for
       `STUCK_CLOSE_MS` (60s default) → close.
- `controller.desiredSize === null` treated as no headroom → close.
- `sweepLiveness()` runs every `LIVENESS_SWEEP_MS` (15s default) so
  half-open detection doesn't require event traffic.
- `closeClient(client, reason)` idempotent, clears keepalive timer,
  best-effort close. All log lines include reason.
- All thresholds env-overridable: ANET_SSE_MAX_QUEUE_BYTES,
  ANET_SSE_STUCK_CLOSE_MS, ANET_SSE_KEEPALIVE_MS,
  ANET_SSE_LIVENESS_SWEEP_MS.
- Liveness timer uses `.unref()`.

server/src/push.test.ts (26 tests total — 21 fake-controller + 3
real-ReadableStream + 2 pre-existing rekey)
- Fake-controller tests cover tryEnqueueBytes/sweepLiveness/pushEvent
  branches with mockable desiredSize.
- Real-ReadableStream tests (new in v2):
    * byte-strategy unit-check: enqueue 1 KB → desiredSize drops by
      ~1024 (not by 1). The exact assertion that catches the v1 unit-
      mismatch bug.
    * non-reading consumer past MAX bytes → hard-ceiling close fires.
    * reading consumer drains promptly → no spurious close even after
      many pushes.

Test plan
- `bun test src/push.test.ts` → 26/26 pass
- `COMMHUB_DB=/tmp/test bun test` → 142/143 pass (1 fail is the pre-
  existing missing @modelcontextprotocol/sdk dev dep, unrelated)
- No production deploy. COMMHUB_DB overridden to /tmp/* for tests.
- Per 通信龙 dispatch: PR + 通信牛 review gate, no self-merge.
- v1 caught by 通信牛 review (Bun probe: default-strategy desiredSize
  went 1→0 after 1 MiB chunk; byte-strategy went -1048576). Thanks
  通信牛 for the close call.
s2agi added a commit that referenced this pull request Jun 28, 2026
Co-authored-by: vansin <smartflowaiteam@gmail.com>
s2agi pushed a commit that referenced this pull request Jun 28, 2026
…est-infra hygiene (#277)

Following #266 round-1 audit, server commit 5c7bff2 had tightened the
UTOK write path to require an explicit network_id on every send_task /
send_reply / cancel_task / report_status. tests/docker-e2e.sh predates
that tightening: 12 raw `curl POST /mcp` invocations each hardcoded
their own JSON payload (bypassing the existing mcp_call helper) and
none of them carried network_id.

This refactor unifies the test against one entry point:

  - mcp_call() hoisted from line ~252 to line ~64, immediately after
    the new NETWORK_ID bootstrap. Now reachable from every tool-call
    site that follows.
  - Bootstrap section added: after the user registers, create an
    e2e-network and capture its network_id (fall back to /api/auth/me
    networks[0] if the network already exists from a prior run).
  - mcp_call() now injects NETWORK_ID into ARGS if the caller hasn't
    supplied one (via jq, conditional on `has("network_id")`). Callers
    that want a different network can still override explicitly.
  - The 12 raw curls become `mcp_call "TOOL" '{...}'` invocations.
    Net: 51 ins / 57 del / 36-line reduction.

Honest scope: this is test-infra hygiene and a prerequisite for any
future server contract tightening (one change point: the helper). It
does NOT by itself reduce the Base E2E fail count — local verification
confirms:

  before refactor (after #273):  90 pass / 45 fail
  after refactor:                90 pass / 45 fail

What changed under the hood: the immediate `permission_denied:
network_id required` error site is gone (verified by per-tool curl
probe), but the cascade has more layers than the round-1 audit
hypothesized. Investigating one failing test revealed at least two
more roots, both deferred to follow-up:

  Layer 2 — e2e-agent registration: test 8 launches
  `agent-node --alias e2e-agent --runtime codex-sdk` in the background
  without network/token wiring. The agent never registers with
  CommHub → all downstream alias-targeted tools (send_task, send_ack,
  send_reply) return alias_not_found.

  Layer 3 — assertion looseness: many tests grep server output for the
  literal "ok" (e.g. `echo "$RESP" | grep -q "ok"`). Both `{"ok":true}`
  and `{"ok":false,"error":"..."}` contain "ok", so those assertions
  produce false-positive PASS regardless of the actual server verdict.
  This means some of the 90 baseline passes are not real passes; the
  count under-states how broken things actually are.
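
The looseness described in Layer 3 is easy to reproduce. The sketch below contrasts the criticized pattern with a stricter one; both helper names are hypothetical, and the strict form assumes compact JSON with no whitespace around the colon, as in the examples above.

```bash
# The pattern the commit message criticizes: matches any response that
# contains the two letters "ok", including error responses.
loose_ok() {
  printf '%s' "$1" | grep -q "ok"
}

# Hypothetical stricter assertion: match the success literal itself.
resp_ok() {
  printf '%s' "$1" | grep -q '"ok":true'
}
```

For `{"ok":false,"error":"alias_not_found"}`, `loose_ok` succeeds while `resp_ok` fails, which is the false-positive PASS the audit describes.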

The original "Bucket A = 1 root, 25 cascades (~#268 pattern)" framing
in the round-1 audit is therefore only partially correct: contract
tightening is one root, but the test set has been quietly accumulating
multiple independent rots that the cascade was hiding.

Refs: #266 round-1 audit (which this revises), #265 + #269 + #273 + #270

Co-authored-by: vansin <smartflowaiteam@gmail.com>
s2agi added a commit that referenced this pull request Jun 28, 2026
* release(v0.11-preview1): bump 3 packages + release notes + PINNED audit

Versions
========
- @sleep2agi/agent-network    2.2.22-preview.4 → 2.3.0-preview.0
- @sleep2agi/agent-node       2.4.15-preview.2 → 2.5.0-preview.0
- @sleep2agi/commhub-server   0.8.8            → 0.9.0-preview.0

PINNED_SERVER_VERSION (agent-network/bin/cli.ts) bumped to
"0.9.0-preview.0" so `anet hub start` lazy-fetches the matching hub
binary. Without this pin update, hub start silently hangs (#194 class)
because npx resolves to a published version that no longer matches what
the CLI expects.

Release notes
=============
docs/tests/release-v2.3.0-preview.0.md — contains the required
## Install (new user) and ## Upgrade (existing user) sections for the
release-gate Gate 3 check. Lists every change in this preview:
- P0-1 feishu worker supervised re-fork (#263)
- P0-2 hub default credentials randomised + must_change_password (#264)
- Runtime utils — withTimeout + classifyRuntimeResult (#272)
- 429/quota fast-fail + empty-result soft-fail (folded into #272)
- Cross-tenant write blocker (#275)
- SSE memory-leak fix
- B1 telegram allowFrom fail-closed (#276 — lands in preview1 batch)
- B2 .anet/ auto-gitignore (#278 — lands in preview1 batch)
- Slug guard + 6 P0 cleanups (#274)
- Release-gate workflow (#270)
- 5 onboarding robustness fixes
- Feishu quickstart docs

Migration callout: telegram empty/missing allowFrom now fail-closed
(was: allow-all). Recovery is `"allowFrom": ["*"]` in access.json.
Boot-time warn surfaces the new posture on first message.

Verification (pre-publish)
==========================
- Docker clean install: node:22-bookworm-slim + bun, 3 tarballs from
  absolute paths, `anet --version` → 2.3.0-preview.0; component
  resolution shows all 3 versions; `commhub-server` boots and serves
  /health at the new version
- Docker post-publish: `anet hub start` lazy-fetches the published
  commhub-server@0.9.0-preview.0 and serves /health with version
  0.9.0-preview.0; admin token saved at mode 600 with random
  bootstrap password (P0-2 verified live)
- PINNED audit: source / Docker / npm all agree on 0.9.0-preview.0
- npm publish --tag preview from absolute tarball paths (no github
  short-link resolution risk)

dist-tags after publish
=======================
@sleep2agi/agent-network    { latest: 2.2.21,         preview: 2.3.0-preview.0 }
@sleep2agi/agent-node       { latest: 2.4.13,         preview: 2.5.0-preview.0 }
@sleep2agi/commhub-server   { latest: 0.8.8,          preview: 0.9.0-preview.0 }

@latest is unchanged; promotion is a separate manual step after Vincent
sign-off on the preview1 channel.

* docs(release-v2.3.0-preview.0): inline tag literals + Install heading versions for release-gate