sendBrokerShutdown has no timeout — SessionEnd hook can hang indefinitely #288

@SmidaOps

Summary

sendBrokerShutdown in plugins/codex/scripts/lib/broker-lifecycle.mjs awaits a Promise that resolves only on socket data, error, or close events. If the broker accepts the connection but stops responding mid-RPC (no data, no error, no close), the Promise never resolves and the SessionEnd hook hangs indefinitely.

Affected code

plugins/codex/scripts/lib/broker-lifecycle.mjs:43-57 (v1.0.3)

export async function sendBrokerShutdown(endpoint) {
  await new Promise((resolve) => {
    const socket = connectToEndpoint(endpoint);
    socket.setEncoding("utf8");
    socket.on("connect", () => {
      socket.write(`${JSON.stringify({ id: 1, method: "broker/shutdown", params: {} })}\n`);
    });
    socket.on("data", () => {
      socket.end();
      resolve();
    });
    socket.on("error", resolve);
    socket.on("close", resolve);
  });
}
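
The hang can be reproduced without the plugin. The following is a minimal sketch, assuming a TCP loopback server as a stand-in for a wedged broker (the real endpoint type behind connectToEndpoint may differ). The name reproduceSilentBroker and the probe timer are hypothetical and exist only for this demonstration. The client uses the same settle conditions as the code above.

```javascript
// Hypothetical repro; none of these names come from the plugin.
// Assumption: a TCP loopback server stands in for the wedged broker.
async function reproduceSilentBroker(probeMs = 300) {
  // Dynamic import keeps the sketch runnable as an ES module or a CommonJS script.
  const net = await import("node:net");
  const accepted = [];

  // A "wedged" broker: accepts the connection and reads the request,
  // but never replies and never closes.
  const server = net.createServer((conn) => {
    accepted.push(conn);
    conn.on("data", () => {});
    conn.on("error", () => {});
  });
  await new Promise((resolve) => server.listen(0, "127.0.0.1", resolve));
  const { port } = server.address();

  const socket = net.connect(port, "127.0.0.1");
  socket.setEncoding("utf8");

  // Same settle conditions as the current sendBrokerShutdown: data, error, close.
  const shutdown = new Promise((resolve) => {
    socket.on("connect", () => {
      socket.write(`${JSON.stringify({ id: 1, method: "broker/shutdown", params: {} })}\n`);
    });
    socket.on("data", () => resolve("settled"));
    socket.on("error", () => resolve("settled"));
    socket.on("close", () => resolve("settled"));
  });

  // The probe only observes whether the shutdown promise is still pending.
  let probeTimer;
  const probe = new Promise((resolve) => {
    probeTimer = setTimeout(() => resolve("still pending"), probeMs);
  });
  const outcome = await Promise.race([shutdown, probe]);
  clearTimeout(probeTimer);

  // Tear everything down so the demo process can exit.
  socket.destroy();
  for (const conn of accepted) conn.destroy();
  await new Promise((resolve) => server.close(resolve));
  return outcome;
}
```

Against a server that stays silent, the shutdown promise should still be pending when the probe fires, which is the state the SessionEnd hook is stuck in.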

Asymmetry with startup path

waitForBrokerEndpoint in the same file (line 24) takes an explicit timeoutMs = 2000 and bounds its wait loop. The startup path is timed; the shutdown path is not. Since both live in the same module, this looks like an oversight rather than intent.

Observed symptoms

Running Claude Code in unattended (cron/launchd) invocations on macOS:

  • Multiple sessions hung at SessionEnd after content rendered cleanly. One held its lock 19.5 hours before something external (a new Claude Code session creating a fresh broker socket) unwedged it.
  • In a separate session, the harness's hook timeout fired with Hook cancelled and exit 1. Content had already been delivered, but the misleading exit code surfaced as a failure in launchd telemetry.
  • Correlation suggests macOS DarkWake during long-running SSE streams wedges the broker. We mitigated with caffeinate -ims on the wrapper, but the shutdown await still hangs when a wedge does happen.

Proposed fix

Mirror the timeout pattern used in waitForBrokerEndpoint:

export async function sendBrokerShutdown(endpoint, timeoutMs = 5000) {
  await new Promise((resolve) => {
    const socket = connectToEndpoint(endpoint);
    let settled = false;
    // Settle exactly once, whichever of timer/data/error/close fires first.
    const finish = () => {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      // destroy() emits "close", which re-enters finish(); the settled guard makes that a no-op.
      try { socket.destroy(); } catch {}
      resolve();
    };
    // Upper bound on the wait when the broker never answers.
    const timer = setTimeout(finish, timeoutMs);
    socket.setEncoding("utf8");
    socket.on("connect", () => {
      socket.write(`${JSON.stringify({ id: 1, method: "broker/shutdown", params: {} })}\n`);
    });
    socket.on("data", finish);
    socket.on("error", finish);
    socket.on("close", finish);
  });
}

5s is generous for a local socket RPC and can be tuned. The caller in session-lifecycle-hook.mjs:99 requires no change, because timeoutMs has a default value.
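
One way to check the timeout behavior in isolation is sketched below. It is a hypothetical harness, not plugin code: boundedShutdown follows the same settle-once pattern as the proposed fix but takes a connect callback (because connectToEndpoint is internal to the plugin) and resolves with a reason string, which is an addition made only for this check. The silent TCP loopback server is again an assumed stand-in for a wedged broker.

```javascript
// Hypothetical harness. boundedShutdown follows the proposed fix's pattern,
// but takes a connect() callback and reports why it settled (both assumptions).
function boundedShutdown(connect, timeoutMs = 5000) {
  return new Promise((resolve) => {
    const socket = connect();
    let settled = false;
    let timer;
    const finish = (reason) => {
      if (settled) return; // later events after the first are ignored
      settled = true;
      clearTimeout(timer);
      socket.destroy();
      resolve(reason);
    };
    timer = setTimeout(() => finish("timeout"), timeoutMs);
    socket.setEncoding("utf8");
    socket.on("connect", () => {
      socket.write(`${JSON.stringify({ id: 1, method: "broker/shutdown", params: {} })}\n`);
    });
    socket.on("data", () => finish("data"));
    socket.on("error", () => finish("error"));
    socket.on("close", () => finish("close"));
  });
}

async function checkAgainstSilentBroker(timeoutMs = 200) {
  // Dynamic import keeps the sketch runnable as an ES module or a CommonJS script.
  const net = await import("node:net");
  const accepted = [];

  // Stand-in for a wedged broker: accepts and reads, never replies or closes.
  const server = net.createServer((conn) => {
    accepted.push(conn);
    conn.on("data", () => {});
    conn.on("error", () => {});
  });
  await new Promise((resolve) => server.listen(0, "127.0.0.1", resolve));
  const { port } = server.address();

  const startedAt = Date.now();
  const reason = await boundedShutdown(() => net.connect(port, "127.0.0.1"), timeoutMs);
  const elapsedMs = Date.now() - startedAt;

  // Tear down the server so the process can exit.
  for (const conn of accepted) conn.destroy();
  await new Promise((resolve) => server.close(resolve));
  return { reason, elapsedMs };
}
```

Against the silent server, the promise should settle through the timer path shortly after timeoutMs rather than waiting on the broker. If the real function also surfaced the reason, callers could log timeouts separately from clean shutdowns, though that goes beyond the fix proposed here.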

Environment

  • Plugin: @openai/codex-plugin-cc v1.0.3
  • Platform: macOS Darwin 25.4 (arm64)
  • Node: >= 18.18 (per engines)
  • Trigger: launchd-scheduled claude --agent ... -p ... --permission-mode bypassPermissions heartbeat invocations
