Skip to content

[P1 #261] runtime utils: withTimeout + classifyRuntimeResult (covers codex/grok/telegram in one pass)#272

Merged
s2agi merged 1 commit into
mainfrom
feat/261-p1-runtime-utils
Jun 28, 2026
Merged

[P1 #261] runtime utils: withTimeout + classifyRuntimeResult (covers codex/grok/telegram in one pass)#272
s2agi merged 1 commit into
mainfrom
feat/261-p1-runtime-utils

Conversation

@s2agi

@s2agi s2agi commented Jun 28, 2026

Copy link
Copy Markdown
Contributor

Author

Agent: 通信工程马
Refs: #261 (P1 redirect — supersedes planned P1-② codex-timeout + P1-③ grok-handshake point-fixes)

Why (per redirect dispatch — leverage > point-fixes)

Review round 4 systemic 提炼: don't ship 2-3 separate point-fixes for codex / grok / telegram timeouts and quota detection. Instead build 2 shared utils that every runtime call site consumes — covers all current failure shapes AND inoculates the codebase against the same bug class returning on the next runtime added.

#267 (already merged) is the seed: pure-helper + 38 unit tests + drop-in cli.ts glue. This PR generalises it to a cross-runtime decision surface AND adds the timeout primitive that was missing entirely on codex (zero wall-clock guard pre-fix).

Changes (6 files, +991 / -67)

util ① — agent-node/src/util/timeout.ts (NEW, 158 LOC)

  • withTimeout(fn(signal) => Promise<T>, ms, label, opts?) — single signature for all timeout-bounded async work
  • TimeoutError carries label + timeoutMs so log-side surfaces don't have to re-stringify
  • externalSignal forwarding lets callers participate in caller-supplied cancellation
  • ms <= 0 sentinel preserves the pre-existing CLAUDE_TIMEOUT_MS=0 → disabled behaviour
  • resolveTimeoutMs({envValue, flagValue, defaultMs, minMs, maxMs}) — env > flag > default precedence with clamping (generalises resolveGrokAcpTimeout)

util ② — agent-node/src/runtime/classify-result.ts (NEW, 157 LOC)

  • classifyRuntimeResult({result, usage, totalCostUsd, errorMessage}) → { kind: success | soft-fail-empty | soft-fail-quota | error, reason, hint }
  • Folds [P1-① #261] claude-agent-sdk 429/quota fast-fail + empty-result soft-fail (M3 root-cause) #267's isRateLimitOrQuotaError into one cross-runtime decision shared by claude / codex / grok
  • Strict empty-result rule: only null / undefined / "" flags soft-fail-empty. output_tokens === 0 alone with non-empty text is NOT a flag-trigger — codex's usage field is not reliably reported across providers, and the previous OR-rule false-positived on healthy codex turns
  • Three-zero silent-reject rule: in=0 AND out=0 AND cost=0 (all three required) → soft-fail-empty. Captures the M3 silent-429 episode without penalising codex's optional cost reporting
  • formatClassificationError() — canonical 执行出错: <runtime> ... shape the IM bridge / dashboard parse
  • Re-exports building blocks so the auth-error fast-fail path in cli.ts doesn't have to depend on this wider surface

5 consumers wired

Consumer Pre-fix Post-fix
claude processWithClaude inline AbortController + setTimeout pair per attempt withTimeout(CLAUDE_TIMEOUT_MS, "claude-attempt-N/M") with signal forwarded into options.abortController
codex processWithCodex ZERO wall-clock guard — a wedged turn hung forever withTimeout(CODEX_TIMEOUT_MS, "codex-turn") with SDK TurnOptions.signal forwarded; TimeoutError + 429 fast-fail short-circuit thread rebuild
grok handshake (init / authenticate / session/new|load) 300s shared with session/prompt — wedged handshake hid behind prompt deadline new handshakeTimeoutMs option (default min(45s, timeoutMs)); prompt deadline untouched. Back-compat: cli.ts only forwards an explicit env / flag value; absent both, passes undefined so the runtime default applies (preserves tight GROK_ACP_TIMEOUT_MS semantics for old callers)
telegram getUpdates server-side 30s only — wedged TCP socket pinned the loop forever withTimeout(45s) with AbortSignal propagated into fetch
think() queue (no separate wrap needed) implicitly bounded — every runtime call inside the queue is now under its own withTimeout

Result classifier wired in claude + codex success paths

  • claude: m.result || "任务完成"classifyRuntimeResult({result, usage, totalCostUsd}) — empty-result rebrand is gone
  • codex: outcome.finalResponse || "(无回复)" → same classifier (no more silent reject masquerading as a real reply)

Verification

Unit tests — 55 new pure-helper tests

src/util/timeout.test.ts                — 29 tests
  - happy / timeout / sentinel / external-signal / resolver precedence / clamping / defensive null

src/runtime/classify-result.test.ts     — 26 tests
  - error precedence × 4 / three-zero strict + codex false-positive guard × 5 / empty-result rule × 6 / vendor hint routing × 4 / formatter shape × 7

Full agent-node test suite

bun test src/329 pass / 0 fail (was 274 pre-PR)

Functional smoke — 5 end-to-end cases (all PASS)

  • timeout fired in 102ms with TimeoutError(label)
  • three-zero shape → soft-fail-empty
  • 429 → soft-fail-quota with deepseek dashboard URL
  • format → starts with expected prefix
  • real success → success

Build

bun run build → 0.39 MB cli.js, 106 modules, no warnings.

Knobs added (opt-in; defaults match prior behaviour where possible)

Env Flag Default
CODEX_TIMEOUT_MS codexTimeoutMs 300000 (was: no guard — wedged turn hung forever)
GROK_HANDSHAKE_TIMEOUT_MS grokHandshakeTimeoutMs inherits runtime default min(45s, timeoutMs) when unset (was: 300000 shared with prompt)
TELEGRAM_GETUPDATES_TIMEOUT_MS 45000 (was: none)

CLAUDE_TIMEOUT_MS / GROK_ACP_TIMEOUT_MS / GROK_ACP_DRAIN_MS are unchanged.

v2 changes (from reviewer pass)

  1. grok handshake back-compat: cli.ts no longer forces 45s when no explicit knob is set — passes undefined so the runtime's min(45s, timeoutMs) keeps protecting callers with a tight GROK_ACP_TIMEOUT_MS.
  2. classify-result strict empty-rule: dropped the previous "empty result OR output_tokens=0" check. Empty result OR three-zero triple still flag; codex turns with text + zero output tokens no longer false-positive. Added 1 regression test covering the codex false-positive shape directly.
  3. Source / commit body / PR body scrubbed of internal references that should not leak to public history.

🤖 Generated with Claude Code

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f4f7ef629f

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread agent-node/src/cli.ts
// #261 P1 redirect — handshake decoupled from prompt deadline.
// 45s default keeps wedged initialize / authenticate / session
// bounded without affecting long-running prompts.
handshakeTimeoutMs: GROK_HANDSHAKE_TIMEOUT_MS,

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Respect shorter Grok prompt timeouts for handshakes

This affects Grok nodes that set GROK_ACP_TIMEOUT_MS or flags.grokAcpTimeoutMs below 45s: this explicit default handshakeTimeoutMs bypasses runGrokAcpTurn's min(45_000, timeoutMs) fallback, so a node that previously bounded initialize/auth/session by (for example) 10s now waits the full 45s unless it also discovers and sets the new handshake knob. Compute the handshake default from the resolved prompt timeout, or only pass this option when the new knob is explicitly configured.

Useful? React with 👍 / 👎.

Comment thread agent-node/src/cli.ts
Comment on lines +3035 to +3039
const res = await withTimeout(
(signal) => fetch(`${tg.apiBase}/getUpdates?offset=${tg.offset}&timeout=30`, { signal }),
TELEGRAM_GETUPDATES_TIMEOUT_MS,
"telegram-getUpdates",
);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Keep the Telegram body read under the timeout

This only bounds the fetch() promise, which resolves when response headers arrive; if Telegram or an intermediate connection sends headers and then stalls before the JSON body completes, the timer is cleared and the following res.json() can still pin the polling loop indefinitely. Keep the body read inside the withTimeout callback (return the parsed JSON there) so the abort signal and 45s deadline cover the entire getUpdates request.

Useful? React with 👍 / 👎.

Comment thread agent-node/src/cli.ts
}
}
if (timer) clearTimeout(timer);
}, CLAUDE_TIMEOUT_MS, `claude-attempt-${attempt + 1}/${CLAUDE_MAX_RETRIES + 1}`);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Validate Claude timeout before passing it through

If CLAUDE_TIMEOUT_MS, --claude-timeout-ms, or the config value is a non-numeric string, parseInt produces NaN; the old CLAUDE_TIMEOUT_MS > 0 guard simply disabled the timer in that case, but withTimeout only treats <= 0 as disabled and passes NaN to setTimeout, which fires immediately. A typo in the timeout knob will therefore make every Claude attempt time out instantly; resolve or validate this value before calling withTimeout.

Useful? React with 👍 / 👎.

Comment thread agent-node/src/cli.ts Outdated
Comment on lines +335 to +339
const GROK_HANDSHAKE_TIMEOUT_MS = resolveTimeoutMs({
envValue: process.env.GROK_HANDSHAKE_TIMEOUT_MS,
flagValue: typeof fileConfig.flags?.grokHandshakeTimeoutMs === "number" ? fileConfig.flags.grokHandshakeTimeoutMs : undefined,
defaultMs: 45_000,
}).valueMs;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Reject zero for the Grok handshake timeout

When a Grok node sets GROK_HANDSHAKE_TIMEOUT_MS=0 or flags.grokHandshakeTimeoutMs: 0, this resolver preserves the zero value, but the Grok path later passes it to GrokAcpClient.request(), whose raw setTimeout(..., 0) makes initialize/auth/session time out on the next tick rather than disabling the deadline. Either reject/clamp zero for this knob like resolveGrokAcpTimeout does, or teach the Grok request path to treat zero as disabled.

Useful? React with 👍 / 👎.

Comment thread agent-node/src/cli.ts
result = `执行出错: vendor 返回空响应 (in=${u.input_tokens || 0} out=${u.output_tokens || 0}) — 疑似 vendor 静默限流/配额. ${hint}`;
} else {
result = m.result;
result = await withTimeout(async (signal) => {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Avoid overlapping Claude retries after timeout

When CLAUDE_TIMEOUT_MS fires and CLAUDE_MAX_RETRIES is nonzero, withTimeout rejects as soon as it aborts the signal, so the retry loop can start the next query() after the backoff even if the previous SDK turn is still unwinding or running a tool. The old code only called ac.abort() and did not leave the for await until the SDK settled, so slow/abort-ignoring tools could not overlap; now the same task can be executed twice and duplicate side effects such as file edits or send_task calls. On timeout, either avoid retrying or wait for the aborted attempt to finish cleanup before starting another attempt.

Useful? React with 👍 / 👎.

…classifier

Replace the planned point-fixes (codex turn timeout / grok handshake decouple) with two shared utils that cover all runtime call sites at once.

util ① — src/util/timeout.ts (158 LOC)
  - withTimeout(fn(signal) => Promise<T>, ms, label, opts?) — one signature
  - TimeoutError carries label + ms for log-side surfacing
  - externalSignal forwarding lets callers participate in caller-supplied cancellation
  - ms<=0 sentinel preserves CLAUDE_TIMEOUT_MS=0 "disabled" behaviour
  - resolveTimeoutMs({envValue,flagValue,defaultMs,minMs,maxMs}) — env>flag>default precedence with clamping, generalises resolveGrokAcpTimeout

util ② — src/runtime/classify-result.ts (157 LOC)
  - classifyRuntimeResult({result,usage,totalCostUsd,errorMessage}) →
      success | soft-fail-empty | soft-fail-quota | error
  - Folds #267 isRateLimitOrQuotaError into one cross-runtime decision
    surface; adds a strict empty-result check and a three-zero
    silent-reject rule (in=0 AND out=0 AND cost=0).
  - Detection is intentionally strict: output_tokens=0 ALONE is not a
    flag-trigger when result text is non-empty, because codex's usage
    field is not reliably reported across providers. Empty result OR
    the full three-zero triple flag; nothing else does.
  - formatClassificationError() — canonical "执行出错: <runtime> ..." shape
    that the IM bridge / dashboard parse to flag failure rows
  - Re-exports building blocks so direct callers (auth-error fast-fail in
    cli.ts) don't bounce through the wider surface

5 consumers wired
  - claude (cli.ts processWithClaude): for-await loop now under
    withTimeout(CLAUDE_TIMEOUT_MS), success branch delegates to
    classifyRuntimeResult so empty-result rebrand is gone
  - codex (cli.ts processWithCodex): ZERO wall-clock pre-fix → now
    withTimeout(CODEX_TIMEOUT_MS, "codex-turn"); SDK TurnOptions.signal
    propagates the abort cleanly. TimeoutError + 429 paths short-circuit
    the thread rebuild loop. Classifier flags silent-reject reply.
  - grok handshake (runtime/grok-build-acp/runtime.ts): new
    handshakeTimeoutMs option (default min(45s, timeoutMs)) for
    initialize+authenticate+session/new|load — decoupled from the 300s
    session/prompt deadline so a wedged handshake fails fast.
    cli.ts only forwards an explicit env / flag value; absent both, it
    passes undefined so the runtime default applies (back-compat for
    callers that set tight GROK_ACP_TIMEOUT_MS).
  - telegram getUpdates (cli.ts): wrapped in withTimeout(45s) with
    AbortSignal propagated into fetch — TCP wedge can no longer pin the
    poll loop forever
  - think() queue: implicitly bounded — each runtime call inside the
    queue is now under its own withTimeout, no separate wrap needed

Verification
  - Unit tests: 29 (timeout: happy/timeout/sentinel/external-signal/
    resolver precedence/clamping/null) + 26 (classify: error precedence
    / three-zero strict / empty-result rule / codex false-positive
    guard / vendor hint routing / formatter shape) = 55 new
  - Full agent-node suite: 274 → 329 pass, 0 fail
  - Functional smoke: 5 cases (timeout fires in 102ms / three-zero
    detected / 429 routes to vendor dashboard / format msg starts with
    expected prefix / real success classifies correctly) — all PASS
  - bun build clean (0.39 MB cli.js, 106 modules)

Why one PR not split (per redirect dispatch)
  Pure helpers + tests + glue together = single review pass. Splitting
  would force the reviewer to re-context twice for what is logically
  one cross-cutting change. #267 followed the same pattern.
@s2agi s2agi force-pushed the feat/261-p1-runtime-utils branch from f4f7ef6 to 712f210 Compare June 28, 2026 01:15
@s2agi s2agi merged commit 1fabb49 into main Jun 28, 2026
1 of 3 checks passed
@s2agi s2agi deleted the feat/261-p1-runtime-utils branch June 28, 2026 01:25
s2agi added a commit that referenced this pull request Jun 28, 2026
* release(v0.11-preview1): bump 3 packages + release notes + PINNED audit

Versions
========
- @sleep2agi/agent-network    2.2.22-preview.4 → 2.3.0-preview.0
- @sleep2agi/agent-node       2.4.15-preview.2 → 2.5.0-preview.0
- @sleep2agi/commhub-server   0.8.8            → 0.9.0-preview.0

PINNED_SERVER_VERSION (agent-network/bin/cli.ts) bumped to
"0.9.0-preview.0" so `anet hub start` lazy-fetches the matching hub
binary. Without this pin update, hub start silently hangs (#194 class)
because npx resolves to a published version that no longer matches what
the CLI expects.

Release notes
=============
docs/tests/release-v2.3.0-preview.0.md — contains the required
## Install (new user) and ## Upgrade (existing user) sections for the
release-gate Gate 3 check. Lists every change in this preview:
- P0-1 feishu worker supervised re-fork (#263)
- P0-2 hub default credentials randomised + must_change_password (#264)
- Runtime utils — withTimeout + classifyRuntimeResult (#272)
- 429/quota fast-fail + empty-result soft-fail (folded into #272)
- Cross-tenant write blocker (#275)
- SSE memory-leak fix
- B1 telegram allowFrom fail-closed (#276 — lands in preview1 batch)
- B2 .anet/ auto-gitignore (#278 — lands in preview1 batch)
- Slug guard + 6 P0 cleanups (#274)
- Release-gate workflow (#270)
- 5 onboarding robustness fixes
- Feishu quickstart docs

Migration callout: telegram empty/missing allowFrom now fail-closed
(was: allow-all). Recovery is `"allowFrom": ["*"]` in access.json.
Boot-time warn surfaces the new posture on first message.

Verification (pre-publish)
==========================
- Docker clean install: node:22-bookworm-slim + bun, 3 tarballs from
  absolute paths, `anet --version` → 2.3.0-preview.0; component
  resolution shows all 3 versions; `commhub-server` boots and serves
  /health at the new version
- Docker post-publish: `anet hub start` lazy-fetches the published
  commhub-server@0.9.0-preview.0 and serves /health with version
  0.9.0-preview.0; admin token saved at mode 600 with random
  bootstrap password (P0-2 verified live)
- PINNED audit: source / Docker / npm all agree on 0.9.0-preview.0
- npm publish --tag preview from absolute tarball paths (no github
  short-link resolution risk)

dist-tags after publish
=======================
@sleep2agi/agent-network    { latest: 2.2.21,         preview: 2.3.0-preview.0 }
@sleep2agi/agent-node       { latest: 2.4.13,         preview: 2.5.0-preview.0 }
@sleep2agi/commhub-server   { latest: 0.8.8,          preview: 0.9.0-preview.0 }

@latest is unchanged; promotion is a separate manual step after Vincent
sign-off on the preview1 channel.

* docs(release-v2.3.0-preview.0): inline tag literals + Install heading versions for release-gate
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant