Skip to content

feat(observability): add 12 structured Pino event families with Zod-strict schemas#251

Open
chrisleekr wants to merge 3 commits into
mainfrom
feat/observability-event-families
Open

feat(observability): add 12 structured Pino event families with Zod-strict schemas#251
chrisleekr wants to merge 3 commits into
mainfrom
feat/observability-event-families

Conversation

@chrisleekr

@chrisleekr chrisleekr commented Jun 25, 2026

Copy link
Copy Markdown
Owner

Summary

Bundles 12 area: observability issues (closes #216 #217 #218 #223 #228 #233 #234 #235 #236 #237 #243 #247) into one change. Each family follows the repo's established pattern from PRs #242/#244: a new *-log-fields.ts module holds a Zod-strict discriminated-union schema pinning every field name and type, co-located with a drift-prevention test that validates a representative payload against the schema. Emitters log plain log.info/log.warn objects; the schema is the contract, not a runtime validator on the hot path.

Beyond logging, three behavioral changes ship in this PR: (1) #236 routes six installation-token mint sites through a new mintInstallationToken helper (src/orchestrator/installation-token.ts) that probes the @octokit/auth-app cache via a scoped hook.before to report an exact cache_hit boolean and per-call duration_ms; (2) #218 removes the unused onConnected/onDisconnected callbacks from WsClientOptions, replacing them with structured daemon.connection.* log lines emitted inside the ws-client itself; (3) #223 replaces the separate hook.after + hook.error pair on octokit with a single hook.wrap timing closure so duration_ms is threaded onto every line and a github.api.slow warn fires when the call exceeds GITHUB_API_SLOW_REQUEST_MS (default 3000 ms, new env var).

Security hardening included (surfaced by the pre-merge review panel): errSerializer in src/utils/log-redaction.ts now strips the event, payload, and signature carriers that @octokit/webhooks attaches to its errors, preventing the raw webhook body and HMAC signature from appearing in err: log lines. A regression test is co-located in test/utils/log-redaction.test.ts.

Diagram

flowchart TD
  classDef old fill:#7f1d1d,color:#ffffff
  classDef new fill:#14532d,color:#ffffff

  subgraph BEFORE
    B_octokit["octokit hook.after<br/>plus hook.error<br/>no duration_ms<br/>no slow alert"]:::old
    B_ws["DaemonWsClient<br/>onConnected / onDisconnected<br/>callbacks, no structured events"]:::old
    B_mint["6 inline mint sites<br/>no cache_hit, no duration_ms"]:::old
    B_err["errSerializer<br/>no octokit event/payload<br/>/signature strip"]:::old
  end

  subgraph AFTER
    A_octokit["octokit hook.wrap<br/>duration_ms on every line<br/>github.api.slow warn"]:::new
    A_ws["DaemonWsClient<br/>daemon.connection.* events<br/>attempt plus downtime_ms"]:::new
    A_mint["mintInstallationToken<br/>cache_hit probe via hook.before<br/>github.app.token.mint.*"]:::new
    A_err["errSerializer<br/>strips event/payload/signature<br/>regression-tested"]:::new
    A_families["10 new *-log-fields.ts modules<br/>structured_output / circuit<br/>digest / http / scheduler<br/>workflow_run / workspace<br/>agent.tool / k8s.spawn<br/>daemon.connection"]:::new
  end

  B_octokit --> A_octokit
  B_ws --> A_ws
  B_mint --> A_mint
  B_err --> A_err
Loading

Changes

New log-fields modules (each with co-located Zod-strict schema + drift-prevention test)

Behavioral / interface changes

  • src/orchestrator/installation-token.ts (new): mintInstallationToken centralises the 6 inline mint sites; scoped app.octokit.hook.before("request", probe) detects a cache miss exactly, with a finally guard that always removes the hook
  • src/daemon/ws-client.ts: removed onConnected/onDisconnected from WsClientOptions; lifecycle observability is now self-contained via daemon.connection.* events
  • src/utils/octokit-observability.ts: replaced hook.after + hook.error with a single hook.wrap closure; new threshold-injected slowRequestFields() pure helper; new GITHUB_API_SLOW_REQUEST_MS config key (default 3000 ms) in src/config.ts
  • src/utils/log-redaction.ts: errSerializer now calls stripSecretCarriers(out) to drop event/payload/signature (and recurse aggregateErrors) before the error is logged

Docs

  • docs/operate/observability.md: 12 new event-family sections with field tables + alerts
  • docs/operate/configuration.md: GITHUB_API_SLOW_REQUEST_MS env var
  • docs/operate/runbooks/daemon-fleet.md: k8s.spawn.* diagnostics section

Test plan

  • Each new *-log-fields.ts module has a co-located drift-prevention test (pins event strings, accepts valid records, rejects drift)
  • log-redaction.test.ts: errSerializer regression test for the octokit carrier strip (bare + AggregateError)
  • octokit-observability.test.ts: slowRequestFields + hook.wrap duration; ws-client.test.ts: callback removal + daemon.connection.*; executor.test.ts: agent.tool.* pairing (no input/output bodies logged)
  • typecheck, lint (0 errors), check:test-globs, check:no-destructive, check:no-em-dashes pass; DB-gated suites run in CI; check:docs-versions, check:docs-citations, mkdocs build --strict pass

Closes #216, closes #217, closes #218, closes #223, closes #228, closes #233, closes #234, closes #235, closes #236, closes #237, closes #243, closes #247

🤖 Generated with Claude Code

…trict schemas

Bundles 12 area:observability issues into one change. Each family ships a
Zod-strict *-log-fields.ts module with a co-located drift-prevention test,
following the #242/#244 pattern: structured_output, circuit, digest,
github.api.slow, github.app.token.mint, http, scheduler.scan, workflow.run,
workspace, agent.tool, daemon.connection, k8s.spawn.

Behavioral changes beyond logging: #236 routes 6 mint sites through a new
mintInstallationToken helper (exact cache_hit via scoped hook.before); #218
removes unused onConnected/onDisconnected WsClientOptions callbacks; #223
replaces octokit hook.after/hook.error with one hook.wrap timing closure
(new GITHUB_API_SLOW_REQUEST_MS env, default 3000).

Security: errSerializer now strips octokit event/payload/signature carriers so
the raw webhook body + HMAC signature cannot leak through err: log lines
(regression-tested).

Docs: 12 observability.md sections + alerts, configuration.md env var,
daemon-fleet.md k8s.spawn diagnostics.

Closes #216 #217 #218 #223 #228 #233 #234 #235 #236 #237 #243 #247

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Wjb51MuzTGGiJ2ggGX2DhH
Copilot AI review requested due to automatic review settings June 25, 2026 11:34
@coderabbitai

coderabbitai Bot commented Jun 25, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

@chrisleekr, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 43 minutes and 35 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: f71858e6-aeba-44d5-b8a9-2a1829384aa2

📥 Commits

Reviewing files that changed from the base of the PR and between 776793e and 25cef51.

📒 Files selected for processing (61)
  • docs/operate/configuration.md
  • docs/operate/observability.md
  • docs/operate/runbooks/daemon-fleet.md
  • docs/use/workflows/index.md
  • src/ai/structured-output-log-fields.ts
  • src/ai/structured-output.ts
  • src/app-log-fields.ts
  • src/app.ts
  • src/config.ts
  • src/core/checkout.ts
  • src/core/executor.ts
  • src/core/log-fields.ts
  • src/core/workspace-events.test.ts
  • src/core/workspace-events.ts
  • src/core/workspace-sweep.ts
  • src/daemon/job-executor.ts
  • src/daemon/log-fields.ts
  • src/daemon/main.ts
  • src/daemon/workflow-executor.ts
  • src/daemon/ws-client.ts
  • src/k8s/ephemeral-daemon-spawner.ts
  • src/orchestrator/connection-handler.ts
  • src/orchestrator/ephemeral-daemon-scaler.ts
  • src/orchestrator/installation-token.ts
  • src/orchestrator/k8s-spawn-log-fields.ts
  • src/orchestrator/log-fields.ts
  • src/orchestrator/triage.ts
  • src/scheduler/log-fields.test.ts
  • src/scheduler/log-fields.ts
  • src/scheduler/scheduler.ts
  • src/utils/circuit-breaker-log-fields.ts
  • src/utils/circuit-breaker.ts
  • src/utils/github-output-guard.ts
  • src/utils/llm-output-scanner.ts
  • src/utils/log-redaction.ts
  • src/utils/octokit-observability.ts
  • src/webhook/router.ts
  • src/workflows/digest-log-fields.ts
  • src/workflows/discussion-digest.ts
  • src/workflows/dispatcher.ts
  • src/workflows/handlers/triage.ts
  • src/workflows/intent-classifier.ts
  • src/workflows/log-fields.ts
  • src/workflows/orchestrator.ts
  • src/workflows/ship/iteration.ts
  • src/workflows/ship/nl-classifier.ts
  • src/workflows/ship/scoped/chat-thread.ts
  • test/ai/structured-output-log-fields.test.ts
  • test/ai/structured-output.test.ts
  • test/app-log-fields.test.ts
  • test/core/executor.test.ts
  • test/core/log-fields.test.ts
  • test/daemon/log-fields.test.ts
  • test/daemon/ws-client.test.ts
  • test/orchestrator/k8s-spawn-log-fields.test.ts
  • test/orchestrator/log-fields.test.ts
  • test/utils/circuit-breaker-log-fields.test.ts
  • test/utils/log-redaction.test.ts
  • test/utils/octokit-observability.test.ts
  • test/workflows/digest-log-fields.test.ts
  • test/workflows/log-fields.test.ts

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR expands the codebase’s structured observability surface by introducing multiple new event: "<family>.<action>" log families with strict Zod schemas and drift-prevention tests, and it threads those events through key runtime paths (scheduler scans, daemon WS lifecycle, circuit breaker, octokit requests, workspace lifecycle, token minting, etc.).

Changes:

  • Add new structured log-field modules (Zod-strict schemas + tests) for multiple event families (e.g., digest.*, circuit.*, scheduler.scan.*, k8s.spawn.*, daemon.connection.*, structured_output.*, http.*, agent.tool.*).
  • Introduce/extend behavioral helpers for observability: centralized installation-token minting (mintInstallationToken), octokit request timing via hook.wrap (duration_ms + github.api.slow), scheduler scan bracketing + counters, per-tool-call agent events, and workspace lifecycle events.
  • Harden log redaction by stripping @octokit/webhooks error “carrier” fields (event/payload/signature) from serialized errors.

Reviewed changes

Copilot reviewed 60 out of 60 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/ai/structured-output.ts Adds optional per-call emission of structured_output.* events from the parse chokepoint.
src/ai/structured-output-log-fields.ts Defines strict schema + event constants for structured_output.*.
src/app-log-fields.ts Defines strict schema + event constants for inbound http.* boundary events.
src/config.ts Adds githubApiSlowRequestMs / GITHUB_API_SLOW_REQUEST_MS config.
src/core/checkout.ts Emits workspace.* clone/base-branch events with redacted errors.
src/core/executor.ts Adds per-tool-call agent.tool.* emission via tool_use/tool_result pairing.
src/core/log-fields.ts Adds strict schemas/constants for agent.tool.* and pending-tool metadata type.
src/core/workspace-events.ts Defines strict schema + event constants for workspace.* lifecycle.
src/core/workspace-events.test.ts Drift-prevention tests for workspace.* schema.
src/core/workspace-sweep.ts Adds optional structured logging for cleanup success/failure inside removeWorkspaceTripleSync.
src/daemon/job-executor.ts Emits workspace cleanup events for exit/cancel; adds daemon.job.cancelled line.
src/daemon/log-fields.ts Defines strict schema + event constants for daemon.connection.*.
src/daemon/main.ts Removes ws-client connect/disconnect callbacks; relies on daemon.connection.*.
src/daemon/ws-client.ts Emits daemon.connection.* events with attempt/backoff/duration fields; redacts error text.
src/daemon/workflow-executor.ts Adds workflow.run.* logging helpers at lifecycle transitions (running/succeeded/failed/etc.).
src/k8s/ephemeral-daemon-spawner.ts Emits k8s.spawn.* attempted/succeeded/failed events with api_call_ms + kind.
src/orchestrator/connection-handler.ts Routes multiple token-mint sites through mintInstallationToken.
src/orchestrator/ephemeral-daemon-scaler.ts Emits k8s.spawn.decision_skipped with decision-time signals when provided an obs handle.
src/orchestrator/installation-token.ts New helper to mint tokens with exact cache_hit probing + duration_ms.
src/orchestrator/k8s-spawn-log-fields.ts Defines strict schema + event constants for k8s.spawn.*.
src/orchestrator/log-fields.ts Extends orchestrator log-field schemas with github.app.token.mint.* schema/keys.
src/orchestrator/triage.ts Emits new circuit.* events via breaker onEvent; threads latency_tripped.
src/scheduler/log-fields.ts Defines strict schema + event constants and counters for scheduler.scan.*.
src/scheduler/log-fields.test.ts Drift-prevention tests for scheduler.scan.* schema/counters.
src/scheduler/scheduler.ts Adds scheduler.scan.* bracketing + counters and uses mintInstallationToken in runAction.
src/utils/circuit-breaker.ts Adds onEvent and richer execution result (latencyTripped), plus skip/failure events.
src/utils/circuit-breaker-log-fields.ts Defines strict schema + event constants for circuit.*.
src/utils/github-output-guard.ts Threads optional logger into LLM output scanner options.
src/utils/llm-output-scanner.ts Allows passing logger into parseStructuredResponse for structured_output.* coverage.
src/utils/log-redaction.ts Adds stripping of octokit webhook error carriers (event/payload/signature) in errSerializer.
src/utils/octokit-observability.ts Switches to hook.wrap to add duration_ms to github.api.* and emit github.api.slow.
src/webhook/router.ts Threads obs into decideEphemeralSpawn and emits structured k8s.spawn.failed on config-incomplete path.
src/workflows/digest-log-fields.ts Defines strict schema + event constants for digest.*.
src/workflows/dispatcher.ts Adds workflow.run.* logging at queue/refusal/enqueue-failure points.
src/workflows/handlers/triage.ts Routes verdict parsing through parseStructuredResponse(..., {site, log}).
src/workflows/intent-classifier.ts Routes parsing through parseStructuredResponse(..., {site, log}).
src/workflows/orchestrator.ts Emits workflow.run.enqueue_failed via helper on post-commit enqueue compensation path.
src/workflows/ship/iteration.ts Logs workflow.run.queued after inserting queued ship runs.
src/workflows/ship/nl-classifier.ts Routes parsing through parseStructuredResponse(..., {site, log}).
src/workflows/ship/scoped/chat-thread.ts Routes parsing through parseStructuredResponse(..., {site, log}) and removes bespoke tolerant logs.
test/ai/structured-output.test.ts Adds tests asserting structured_output.* emission behavior and schema validity.
test/ai/structured-output-log-fields.test.ts Drift-prevention tests for structured_output.* schema.
test/app-log-fields.test.ts Drift-prevention tests for http.* schema/constants.
test/core/executor.test.ts Adds tests for per-tool-call agent.tool.* pairing and timeout draining.
test/core/log-fields.test.ts Adds tests for agent.tool.* schemas/constants.
test/daemon/log-fields.test.ts Drift-prevention tests for daemon.connection.* schema/constants.
test/daemon/ws-client.test.ts Updates ws-client tests for removal of callbacks.
test/orchestrator/k8s-spawn-log-fields.test.ts Drift-prevention tests for k8s.spawn.* schema/constants.
test/orchestrator/log-fields.test.ts Adds tests for github.app.token.mint.* schema/constants.
test/utils/circuit-breaker-log-fields.test.ts Drift-prevention tests for circuit.* schema/constants.
test/utils/log-redaction.test.ts Adds regression tests for stripping webhook error carriers via errSerializer.
test/utils/octokit-observability.test.ts Extends tests for duration_ms threading and github.api.slow.
test/workflows/digest-log-fields.test.ts Drift-prevention tests for digest.* schema/constants.
docs/operate/configuration.md Documents GITHUB_API_SLOW_REQUEST_MS.
docs/operate/runbooks/daemon-fleet.md Adds operator guidance for diagnosing k8s.spawn.* failures by kind.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/daemon/job-executor.ts
Comment thread src/daemon/ws-client.ts
Comment thread src/daemon/ws-client.ts Outdated
Comment thread src/daemon/job-executor.ts Outdated
chrisleekr and others added 2 commits June 25, 2026 21:41
…lity

Satisfies the FR-019 docs-sync guard for the src/workflows/ logging changes
in this PR and points readers to the new event-family contracts.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Wjb51MuzTGGiJ2ggGX2DhH
- ws-client: omit `message` from daemon.connection.error when redaction yields
  an empty string (schema pins message: z.string().min(1).optional()).
- job-executor: derive workspace.cleanup.exit `jobIds` from the same
  `workspaceJobs` filter as `count`, so length and ids agree (scoped jobs with
  no workDir are excluded).
- pin the daemon.job.cancelled event in DaemonJobCancelledLogSchema with a
  co-located drift test, matching the schema-per-event convention of this PR.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01Wjb51MuzTGGiJ2ggGX2DhH
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment