feat: structured JSON logging + K8s probe docs (COW-994)#87
Conversation
Structured logging:
- Add src/application/helpers/cowLogger.ts — thin JSON logger that emits
one JSON line per call with { time, level, msg, ...fields }
- Migrate all console.log/warn calls in blockHandler.ts and settlement.ts
to cowLog(); each line now carries chainId and block as proper JSON fields
so log aggregators can filter by chain without regex parsing
- Add --log-format json to pnpm start so Ponder's own log lines are also
JSON in production; pnpm dev keeps pretty format for local readability
K8s probe docs (docs/deployment.md):
- Document /healthz (liveness) vs /ready (readiness) distinction
- Add sample K8s manifest snippet with correct probe mapping
- Explain why /ready must NOT be used as a liveness probe
- Add structured logging section explaining --log-format json behavior
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…COW-994) The decodeAbiParameters call (orderUid, sellToken, buyToken, sellAmount, buyAmount) was only used for logging. COW-991 removes this block separately; removing it here too prevents a conflict when both PRs land — otherwise the cowLog fields reference variables that no longer exist, crashing the indexer. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
The new /**
* Structured JSON logger for handler code. Outputs one JSON line per call so
* log aggregators (Datadog, CloudWatch, etc.) can filter by chainId, handler,
* block number, or any other field without regex parsing.
*
* Ponder's own log lines are controlled by --log-format (pretty|json) on the
* CLI. These handler lines are always JSON so they remain parseable regardless
* of Ponder's format setting.
*/This should be one line, e.g.: /** Structured JSON logger — emits one ndjson line per call for log aggregators. */The deployment doc already explains the JSON logging setup; the inline comment doesn't need to repeat it. |
Documentation / CI review
K8s probe documentation in One gap: Structured logging section — accurate. The
Overall: Two minor items — add |
|
Review: Structured logging + K8s probe docs (COW-994)
The file opens with an 8-line
The function always calls const log = level === "info" ? console.log : level === "warn" ? console.warn : console.error;
log(JSON.stringify({ time: Date.now(), level, msg, ...fields }));K8s probe documentation accuracy
Correct placement. Migration completeness All 18 One actionable item: consider routing |
… K8s probes (COW-994) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
yvesfracari
left a comment
There was a problem hiding this comment.
Anxo mentioned only the blockHandler and settlement logs but there are others that are other files that emit logs and should be using the new logger.
Would actually save this on an ADR.
|
|
||
| | Endpoint | Semantic | Returns 200 when | | ||
| |----------|----------|-----------------| | ||
| | `/healthz` | **Liveness** — is the process alive? | Always, once the server starts | |
There was a problem hiding this comment.
Good point — updated to /health (Ponder's built-in). Simpler and more idiomatic than our /healthz wrapper for a liveness probe.
There was a problem hiding this comment.
Updated to /health — Ponder's built-in liveness endpoint, always 200 once the server starts. Simpler and more idiomatic than our /healthz wrapper for a liveness probe.
| port: 3000 | ||
| initialDelaySeconds: 30 | ||
| periodSeconds: 10 | ||
| failureThreshold: 18 # 3-minute window before marking unready |
There was a problem hiding this comment.
time to deploy would be way bigger, right?
There was a problem hiding this comment.
You're right, the comment was misleading. A readiness probe failure does NOT kill the pod — it just removes it from load-balancer rotation. On a cold start the pod will be NotReady for hours; that's expected. Updated the comment and added a paragraph clarifying this so operators aren't surprised.
There was a problem hiding this comment.
You're right — a cold-start pod can be NotReady for hours. Clarified that readiness failure doesn't kill the pod (just removes it from rotation) and that hours-long NotReady is expected on a cold start. Also removed the specific failureThreshold/periodSeconds values per your follow-up comment — those depend on the cluster's SLOs.
…tion Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…reep - Replace manual DiscreteStatus union with (typeof discreteOrderStatusEnum.enumValues)[number] so the type stays in sync with the schema automatically - Delete cowLogger.ts — structured logging belongs in COW-994 (PR #87), not in this fix PR; replace the one cowLog call with a plain console.log Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…readiness semantics - Rename cowLogger.ts to logger.ts and cowLog() to log() — the module name already provides context; the cow prefix was misleading - Rename `for (const log of receipt.logs)` loop variable to txLog to avoid shadowing the newly imported log function - K8s liveness probe: use /health (Ponder built-in) instead of /healthz - readinessProbe: fix misleading failureThreshold comment; add paragraph clarifying that NotReady does not kill the pod — cold-start sync takes hours Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ostgreSQL section Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
yvesfracari
left a comment
There was a problem hiding this comment.
Please check my latest review comment about not using the new logger on other parts of the code. Also, this log should also live on a ADR so it is used later as well. Maybe it even worth to create a lint/rule that no console.log is used or something like this.
| livenessProbe: | ||
| httpGet: | ||
| path: /health | ||
| port: 3000 | ||
| initialDelaySeconds: 30 | ||
| periodSeconds: 30 | ||
| failureThreshold: 3 | ||
| readinessProbe: | ||
| httpGet: | ||
| path: /ready | ||
| port: 3000 | ||
| initialDelaySeconds: 30 | ||
| periodSeconds: 30 | ||
| failureThreshold: 3 # marks pod unready (not killed) — cold-start sync takes hours |
There was a problem hiding this comment.
I know that this is just an example but I am not sure if all values here make sense for the app context. I would prefer to remove it since the k8s operator should know how to set it properly, we just have to give him the app information so he can decide it.
There was a problem hiding this comment.
Removed all the specific timing values (periodSeconds, failureThreshold, initialDelaySeconds) — the YAML now just shows the path and port, and a note explains that the operator picks the timing based on their cluster SLOs.
…W-994) Extends the log() migration from blockHandler.ts and settlement.ts to all remaining call sites in composableCow.ts, setup.ts, orderbookClient.ts, and uidPrecompute.ts. No console.log/warn/error remain in src/application/ outside logger.ts itself. Documents the scope and usage convention in deployment.md. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
Migrated all remaining
No Also expanded the Structured Logging section in |
…-994) Keep only path and port — the operator decides periodSeconds, failureThreshold, and initialDelaySeconds based on their cluster SLOs. Add a note explaining this. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
| **Convention:** all code under `src/application/` uses `log()` from `src/application/helpers/logger.ts` instead of `console.log/warn/error` directly. The `src/api/` layer (Hono routes) is exempt — Hono handles its own logging. Example: | ||
|
|
||
| ```ts | ||
| import { log } from "../helpers/logger"; | ||
|
|
||
| log("info", "c2:confirmed", { chainId, orderUid, block: String(event.block.number) }); | ||
| log("warn", "c2:timeout", { chainId, block: String(event.block.number) }); | ||
| ``` | ||
|
|
||
| `warn` and `error` level messages go to `stderr`; `info` goes to `stdout`. The `level` field in the JSON payload is what log aggregators use to route and alert. |
There was a problem hiding this comment.
I don't think this part is necessary actually
There was a problem hiding this comment.
A lint or pre-commit routine would be more effective
Grant Review Finding
[F19] Document K8s liveness/readiness probes and add JSON structured logging with
chainId— MAJOR (production observability / handoff) (COW-994)Summary
Structured JSON logging:
src/application/helpers/cowLogger.ts— thincowLog(level, msg, fields)utility that emits one newline-delimited JSON line per callconsole.log/console.warncalls inblockHandler.ts(18 call sites across C1–C5) andsettlement.tsmigrated tocowLog(); each line now carrieschainIdandblockas proper JSON fields, enabling log aggregator filtering by chain without regexpnpm startupdated with--log-format jsonso Ponder's own internal log lines are also JSON in production;pnpm devkeeps pretty format for local readabilityK8s probe documentation (
docs/deployment.md):/healthz(liveness — always 200 once up) vs/ready(readiness — 200 only when synced) distinctionlivenessProbe/readinessProbemapping/readymust NOT be used as liveness probe (would kill indexing pod before it finishes cold-start sync)--log-format jsonbehaviorTest plan
pnpm typecheck— passespnpm test— 19/19 passingconsole.log/console.warncalls remain inblockHandler.tsorsettlement.tscowLogcall includeschainIdandblockfieldspnpm startscript includes--log-format json🤖 Generated with Claude Code