
release: To Prod #1252

Merged

joelorzet merged 46 commits into prod from staging on May 14, 2026

Conversation


@suisuss suisuss commented May 14, 2026

No description provided.

suisuss and others added 28 commits May 13, 2026 15:21
…outing regressions

Replace ENCODING_ERROR_RE with DISPATCH_FAILURE_RE, which also rejects the
"missing revert data" / data="0x" pattern -- the KEEP-456 failure mode
where the previous loose guard silently tolerated calls into a
non-existent SuperToken proxy.

- create-pool: convert to expect(msg).toBe("") (verified simulates on Sepolia)
- connect-pool/wrap/unwrap/create-flow/update-flow/delete-flow/
  distribute/distribute-flow/update-member-units: tighten to
  DISPATCH_FAILURE_RE (their inputs cannot satisfy positive simulation
  without on-chain state)
- Add 5 always-on regex regression tests so CI catches future
  "simplifications" of DISPATCH_FAILURE_RE that would silently restore
  the KEEP-456 hole

Note: the ticket's suggested CALL_EXCEPTION.*data="0x" pattern is
order-dependent and does not match real ethers v6 error strings (data=
comes before code=CALL_EXCEPTION). Matching data="0x" alone is still
precise, because real reverts have hex content between the quotes.
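The alternation described above can be sketched in a few lines; the actual DISPATCH_FAILURE_RE in the test suite may differ, so treat the regex body and sample strings below as illustrative:

```typescript
// Illustrative sketch of the tightened guard; the real DISPATCH_FAILURE_RE
// may include more alternatives. It must catch both decode failures and
// the KEEP-456 empty-revert-data shape.
const DISPATCH_FAILURE_RE =
  /could not decode result data|missing revert data|data="0x"/;

// A misroute into a non-existent SuperToken proxy (KEEP-456 shape):
const misroute =
  'Error: missing revert data (action="estimateGas", data="0x", code=CALL_EXCEPTION, version=6.16.0)';

// A genuine business revert carries hex content between the quotes,
// so the data="0x" branch must NOT fire on it:
const businessRevert =
  'Error: execution reverted (data="0x08c379a0", code=CALL_EXCEPTION, version=6.16.0)';

console.log(DISPATCH_FAILURE_RE.test(misroute));       // → true
console.log(DISPATCH_FAILURE_RE.test(businessRevert)); // → false
```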
…esses against canonical metadata

- Append BNB Smart Chain (56) and Avalanche C-Chain (43114) to SUPERFLUID_CHAIN_IDS.
  Both use the canonical CFAv1/GDAv1 forwarder addresses and have existing chain
  DB rows in seed-chains.ts.
- Rewrite the SUPERFLUID_CHAIN_IDS docblock: drop the inaccurate "every chain
  Superfluid supports" claim, document the Avalanche Fuji (43113) CFA deviation,
  and point at the new cross-check test as the regression gate.
- Add a vitest cross-check that, for every chain in SUPERFLUID_CHAIN_IDS, asserts
  the pinned CFA/GDA address equals @superfluid-finance/metadata's
  contractsV1.cfaV1Forwarder / gdaV1Forwarder. Fuji-style deviant chains now fail
  at PR time instead of silently mis-routing.
- Add @superfluid-finance/metadata as a devDependency.
- Rename two test descriptions hardcoded to "six chains".
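The cross-check has roughly the following shape; the fixture below stands in for the per-chain contractsV1 lookup in @superfluid-finance/metadata, and the chain-id list and addresses are placeholders, not the real pinned values:

```typescript
// Hypothetical sketch of the cross-check; the real vitest reads
// @superfluid-finance/metadata and the real pinned forwarder constants.
type Forwarders = { cfaV1Forwarder: string; gdaV1Forwarder: string };

const CFA_FORWARDER_ADDRESS = "0xCFA_PINNED"; // placeholder
const GDA_FORWARDER_ADDRESS = "0xGDA_PINNED"; // placeholder
const SUPERFLUID_CHAIN_IDS = [1, 10, 56, 8453, 43114]; // illustrative subset

// Stand-in for the per-chain contractsV1 metadata lookup:
const canonical = new Map<number, Forwarders>(
  SUPERFLUID_CHAIN_IDS.map(
    (id): [number, Forwarders] => [
      id,
      { cfaV1Forwarder: "0xCFA_PINNED", gdaV1Forwarder: "0xGDA_PINNED" },
    ],
  ),
);

// The regression gate: every chain we claim to support must pin the
// canonical forwarder pair; a Fuji-style deviant chain fails at PR time.
for (const id of SUPERFLUID_CHAIN_IDS) {
  const meta = canonical.get(id)!;
  if (meta.cfaV1Forwarder !== CFA_FORWARDER_ADDRESS)
    throw new Error(`chain ${id}: CFA forwarder deviates from pinned address`);
  if (meta.gdaV1Forwarder !== GDA_FORWARDER_ADDRESS)
    throw new Error(`chain ${id}: GDA forwarder deviates from pinned address`);
}
console.log("all chains canonical");
```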
…nt-subscription failure

When the upstream WSS enters a half-open state — TCP/ping-pong alive,
getBlockNumber works, eth_subscribe accepted, but no newHeads delivered —
the existing connect-level fallback never fires because primary's
getBlockNumber keeps succeeding. KEEP-544's noBlockTimer correctly
reconnects every 300s, but every reconnect goes back to the same broken
URL.

Track consecutive BLOCK_ADVANCE_TIMEOUT_MS firings on the current URL as
silentReconnects. Reset on a real height advance in processBlockRange,
on start, and on stop. In reconnectWithBackoff, before the next connect
attempt, call maybeFlipUrlPreference() to swap currentUrlIndex when
silentReconnects >= SILENT_FAILOVER_THRESHOLD (default 2, env-tunable).
connect() now honours currentUrlIndex by reordering its candidate list
so the preferred URL is tried first while keeping primary/fallback
labels stable in logs. The existing primaryProbeTimer already covers
swapping back once primary recovers.

Reset the counter on flip so the new URL gets a full threshold of its
own before flipping back; if both URLs are silent, the monitor alternates
between them, which surfaces the real failure (both upstreams down) to
operators via the existing log signal.
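A minimal sketch of that bookkeeping, with the ChainMonitor timers, reconnect loop, and env parsing omitted; names follow the description above but the real implementation may differ:

```typescript
// Sketch only: the real code lives inside ChainMonitor and reads the
// threshold from an env var.
const SILENT_FAILOVER_THRESHOLD = 2; // default, env-tunable in the real code

class FailoverState {
  currentUrlIndex = 0; // 0 = primary, 1 = fallback
  silentReconnects = 0;

  // Called each time BLOCK_ADVANCE_TIMEOUT_MS fires with no height advance.
  onSilentReconnect(): void {
    this.silentReconnects += 1;
  }

  // Called from processBlockRange on a real height advance (and on start/stop).
  onBlockAdvance(): void {
    this.silentReconnects = 0;
  }

  // Called in reconnectWithBackoff before the next connect attempt.
  maybeFlipUrlPreference(): boolean {
    if (this.silentReconnects < SILENT_FAILOVER_THRESHOLD) return false;
    this.currentUrlIndex = this.currentUrlIndex === 0 ? 1 : 0;
    this.silentReconnects = 0; // new URL gets a full threshold of its own
    return true;
  }
}

const s = new FailoverState();
s.onSilentReconnect();          // 1st silent timeout: below threshold
s.maybeFlipUrlPreference();     // no flip yet
s.onSilentReconnect();          // 2nd: threshold reached
s.maybeFlipUrlPreference();     // flips and resets the counter
console.log(s.currentUrlIndex); // → 1 (fallback preferred)
```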
…ent metadata

The on-IPFS agent card and the .well-known/agent-card.json A2A endpoint
both shipped with a generic description ("Web3 workflow automation
platform...") and no inline skills, no keywords, no structured payment
fields. Result: keyword search against 8004scan for "keeperhub"
returned zero results; downstream indexers (agentcash search, x402scan)
have no searchable surface to match against.

Changes:

- Description rewritten to match the D-066 wedge wording: execution
  layer, x402 on Base, MPP on Tempo, Managed DeFi, onchain audit trail.
- Inline skills array (8 entries) on both surfaces with matching IDs:
  workflow_discovery, workflow_invocation, ai_workflow_generation,
  protocol_actions, onchain_execution, templates, execution_monitoring,
  reputation_feedback (new — completes the ERC-8004 feedback
  read/write symmetry).
- Per-skill descriptions and tags include the searchable terms agents
  query for (x402, mpp, usdc, base, tempo, aave, safe, defi, swap,
  bridge, stake, contract, etc.).
- keywords array (30 tags) added to data/agent-registry.json.
- Structured payment block (x402 on Base + MPP on Tempo with payTo
  addresses) added to data/agent-registry.json.
- New service entry pointing at /api/openapi.json (forthcoming) and
  https://docs.keeperhub.com.

After merge, run scripts/pin-agent-card.ts to pin the new content to
Pinata and scripts/update-agent-uri.ts to update the onchain tokenURI.

Pairs with techops-services/infrastructure PR #194 (Cloudflare bypass
for AI-vendor UAs on .well-known paths) so the live A2A endpoint is
reachable from Claude/ChatGPT/Perplexity browse-on-behalf-of-user.

The previous payment.payTo addresses implied 100% of per-call USDC
flows to the platform wallet. In reality KeeperHub is a multi-creator
marketplace: each listed workflow advertises its own payTo in the
402 response and at /api/mcp/workflows/<slug>, and settlement splits
70% creator / 30% platform per lib/earnings/queries.ts.

Replaces the misleading payment.payTo with:

  - "marketplace" block describing the multi-creator model, the
    platformFeePercent (30), creatorSharePercent (70), and the
    platformFeeRecipient address per chain.
  - "payment" block keeping the protocol/network/asset declarations
    (x402 on Base USDC, MPP on Tempo USDC.e) but dropping the
    overclaiming payTo — that's per-workflow runtime data, not
    agent-level identity data.

platformFeePercent is hardcoded today because the agent card is
static IPFS content; if the platform-fee env var changes, re-pin.
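For illustration, the 70/30 settlement split can be sketched as below; the function name and rounding rule (platform absorbs the dust) are assumptions, not lifted from lib/earnings/queries.ts:

```typescript
// Hypothetical sketch; lib/earnings/queries.ts is the source of truth.
// Amounts are USDC base units (6 decimals).
const PLATFORM_FEE_PERCENT = 30;
const CREATOR_SHARE_PERCENT = 70;
if (PLATFORM_FEE_PERCENT + CREATOR_SHARE_PERCENT !== 100)
  throw new Error("shares must sum to 100");

function splitSettlement(amountBaseUnits: number): { creator: number; platform: number } {
  const creator = Math.floor((amountBaseUnits * CREATOR_SHARE_PERCENT) / 100);
  // Assumption: the platform side absorbs any rounding dust.
  return { creator, platform: amountBaseUnits - creator };
}

const split = splitSettlement(1_000_000); // a 1 USDC per-call payment
console.log(split); // → { creator: 700000, platform: 300000 }
```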
…overability

feat: KEEP-554 enrich ERC-8004 agent card with skills, keywords, payment metadata
fix: update Aave V3 data provider addresses
…n in the metadata cross-check

Three assertions inside the canonical metadata cross-check block:

- Fuji (43113) is intentionally absent from SUPERFLUID_CHAIN_IDS.
- Fuji's CFAv1Forwarder address in @superfluid-finance/metadata is
  NOT equal to CFA_FORWARDER_ADDRESS -- this captures the documented
  deviation as a fact under test.
- Fuji's GDAv1Forwarder IS canonical -- documents the asymmetric
  shape of the deviation (CFA-only) and catches the rare case of
  GDA also drifting on Fuji.

Together these pin both the trap (CFA is non-canonical upstream) and
our response to it (Fuji is excluded from the chain list). If
Superfluid ever redeploys Fuji's CFA at the canonical address, the
"is not canonical" assertion fails -- and that failure is positive:
the trap is gone and Fuji can safely join SUPERFLUID_CHAIN_IDS.
…n-coverage

feat(superfluid): KEEP-463 expand chain coverage and cross-check addresses against canonical metadata

The previous wording suggested the blocker was finding a Superfluid pool
address. The actual blocker is that TEST_ADDRESS has no deployed code at
all -- the GDA dispatches into the pool address as a contract call, and
any deployed contract implementing the pool interface would suffice.
… tests

The previous regex shape tests pinned hardcoded ethers v6.16.0 sample
strings. A future ethers upgrade that changed error formatting would
have silently invalidated DISPATCH_FAILURE_RE without failing these
tests -- the on-chain block would lose its KEEP-456 guard and the only
signal would have been a real routing bug slipping into prod.

Replace each sample with a call to `ethers.makeError(...)` that
constructs the same shape we need to match. The assertion target is now
String(error), which mirrors what estimateGasError actually returns, so
the test is self-updating across ethers upgrades: if ethers changes
wording, these tests fail at upgrade time and reviewers update the
regex alongside the dependency bump.
…ntext

Tighten the bare `data="0x"` alternation to `,\s*data="0x"`, which
requires the field-separator comma that precedes top-level
CALL_EXCEPTION params. Nested JSON uses `"data": value` (with colon-
space), so the new anchor distinguishes top-level from nested without
false-positiving if a future ethers version (or quirky calldata)
produced a transaction object whose data was literally "0x" while the
top-level revert data was populated.

Add a regression test that exercises exactly that hypothetical: a
revert with populated top-level data but empty nested transaction.data
must NOT match -- it's a business revert, not a routing bug.
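The anchored branch and the hypothetical regression case can be sketched as follows (the other alternatives of the real DISPATCH_FAILURE_RE are elided here):

```typescript
// Only the anchored empty-revert-data branch, in isolation:
const EMPTY_TOPLEVEL_DATA_RE = /,\s*data="0x"/;

// Top-level empty revert data (KEEP-456 misroute): a comma-separated
// param in the CALL_EXCEPTION string.
const misroute =
  'Error: missing revert data (action="estimateGas", data="0x", code=CALL_EXCEPTION)';

// Hypothetical future shape: populated top-level data, but a nested
// transaction object whose data is literally "0x" -- nested JSON uses
// colon-space ("data": "0x"), so the comma anchor does not fire.
const businessRevert =
  'Error: execution reverted (data="0x08c379a0", transaction={ "to": "0xabc", "data": "0x" }, code=CALL_EXCEPTION)';

console.log(EMPTY_TOPLEVEL_DATA_RE.test(misroute));       // → true
console.log(EMPTY_TOPLEVEL_DATA_RE.test(businessRevert)); // → false
```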
Directly satisfies acceptance criterion #2 -- "no test passes when its
action is silently re-routed to a non-existent contract method" --
without relying on the regex-shape proxy.

Takes a currently-passing action (grant-flow-operator, which routes
correctly to the CFAv1Forwarder) and overrides its destination to
SEPOLIA_FUSDC: a real ERC20 contract that exists on chain but has no
Superfluid methods. Verified empirically: this dispatch produces
`Error: missing revert data ... code=CALL_EXCEPTION` -- exactly the
KEEP-456 shape. The new test asserts DISPATCH_FAILURE_RE matches it.

If anyone weakens or removes DISPATCH_FAILURE_RE in a way that lets
KEEP-456 through, this test fails against the live RPC. The regex-shape
tests (run in CI) guard the regex against silent strengthening; this
test (RPC-gated) guards against silent weakening by exercising a real
misroute end-to-end.
…nsaction

CI typecheck (TS2741) caught three transaction objects in the regex
shape tests that were missing the required `data` field on
CallExceptionTransaction. Local tsc with skipLibCheck did not surface
this. No behavioral change -- the regex tests assert on the top-level
error string, not the nested transaction shape.
…-tests

test: KEEP-459 harden on-chain dispatch assertions against routing regressions

WebSocketProvider.ready in ethers v6 is a synchronous boolean getter, not
a Promise; awaiting it resolves immediately and lets openProvider proceed
before the ws upgrade actually completes. Replace with getBlockNumber(),
which internally calls _waitUntilReady(). Since _waitUntilReady() never
rejects on socket failure (only resolves on open), wrap the call in a
Promise.race against an explicit ws-error listener and a 10s connect
timeout, matching the pattern PR #988 used in the block dispatcher.

Without the race, an unreachable host (DNS NXDOMAIN, ECONNREFUSED) hangs
the connect attempt indefinitely instead of walking to the fallback URL.

Update the two MockProvider stubs in unit tests to expose getBlockNumber
in place of the unused ready Promise, and baseline the no-subscriber
heartbeat assertion after connect so the initial connect probe is not
counted as a heartbeat ping.
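The race pattern looks roughly like this, sketched against a minimal provider interface rather than the real ethers WebSocketProvider; the interface, names, and stub are illustrative:

```typescript
// Minimal stand-in for the relevant WebSocketProvider surface.
interface WsLike {
  getBlockNumber(): Promise<number>; // resolves only once the ws is open
  onSocketError(handler: (err: Error) => void): void;
}

const CONNECT_TIMEOUT_MS = 10_000;

function openProvider(provider: WsLike, timeoutMs = CONNECT_TIMEOUT_MS): Promise<number> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error("ws connect timeout")), timeoutMs);
  });
  const socketError = new Promise<never>((_, reject) => {
    provider.onSocketError(reject); // DNS NXDOMAIN, ECONNREFUSED, ...
  });
  // Without the race, an unreachable host would hang getBlockNumber()
  // indefinitely instead of letting the caller walk to the fallback URL.
  return Promise.race([provider.getBlockNumber(), socketError, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}

// Usage with a stub whose socket never opens:
const dead: WsLike = {
  getBlockNumber: () => new Promise<number>(() => {}),
  onSocketError: (h) => { setTimeout(() => h(new Error("ECONNREFUSED")), 50); },
};
openProvider(dead).catch((e) => console.log(String(e))); // → Error: ECONNREFUSED
```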
…ready-await

fix: KEEP-349 race connect against ws-error + timeout in openProvider
joelorzet and others added 5 commits May 14, 2026 11:50
…itor liveness, reconnect, and SQS enqueue paths

Expose a per-chain Prometheus surface at :3000/metrics so operators can alert
on silent subscriptions without waiting for users to report broken workflows.
The block-dispatcher previously emitted only console logs and a /health
boolean; today's prod incident only surfaced because a user noticed their
workflow stopped firing several hours later.

Metrics added (all keeperhub_block_dispatcher_*):

  Gauges (per chain, except chains_monitored)
    seconds_since_last_block        scrape-time, primary alert signal
    socket_age_seconds              scrape-time, debug
    is_alive                        mirrors ChainMonitor.isAlive()
    is_reconnecting                 reconnect-with-backoff in progress
    has_active_subscription         eth_subscribe completed
    current_url_index               0=primary, 1=fallback (KEEP-557 flips)
    silent_reconnects_current       consecutive block-advance timeouts
    last_processed_block            highest block processed
    workflows_tracked               block-trigger workflows per chain
    chains_monitored                pod-level

  Counters
    blocks_received_total           rate gives delivery rate per chain
    blocks_matched_total            workflow trigger fires (no workflow_id label)
    ws_closes_total                 reason: upstream_close|pong_timeout|
                                    block_advance_timeout|socket_age_recycle|
                                    silent_failover|ping_send_failure|
                                    primary_probe_recovered
    reconnects_total                outcome: success|exhausted
    url_flips_total                 direction: to_fallback|to_primary
    sqs_enqueue_total               outcome: success|error
    unhandled_rejections_total      ethers v6 eth_unsubscribe etc.

  Histograms
    reconnect_duration_ms           handleDisconnect to subscription_active
    block_lag_seconds               wall_clock - block.timestamp on receive

Wiring is a single new lib/metrics.ts module with one process-wide prom-client
Registry. ChainMonitor calls thin record*/set* helpers at each lifecycle
point; no business logic depends on prom-client. seconds_since_last_block and
socket_age_seconds use prom-client collect() callbacks so they always reflect
the latest value at scrape time without per-block emission.

Deploy: prometheus.io/scrape annotations added to staging and prod
block-dispatcher-values.yaml. The /metrics endpoint is wired to the same
express server that already serves /health, no new container port.

Tests: 14 new cases across metrics.test.ts (counters, gauges, scrape-time
callbacks, histograms) and chain-monitor.test.ts (integration: emit block ->
counters and gauges advance; ws close -> ws_close counter labeled correctly).
80/80 vitest pass.

Docs: lib/metrics/METRICS_REFERENCE.md gets a new section 6 (BLOCK DISPATCHER)
documenting every metric with description, labels, and alert thresholds.

Out of scope (separate PR in techops-infrastructure repo): Grafana dashboard
JSON, Terraform alert rules, and ALERTS_REFERENCE.md entries that consume
these metrics.

prom-client gauges retain the last value of every label combination they
have ever seen. Without an explicit .remove({chain}) call when a
ChainMonitor stops, the seconds_since_last_block (and every other
per-chain gauge) would emit the chain's last value forever — including
chains whose workflows were deleted or chains the reconciler tore down
because they were zombie. That would cause the new 'Block Dispatcher
Chain Silent' alert to fire indefinitely on chains we no longer monitor.

forgetChain now removes every per-chain gauge labelset so the chain
disappears entirely from /metrics output. ChainMonitor.stop() no longer
needs to flip individual gauges to 0 first — forgetChain wipes the
labels regardless. Counters and histograms keep their cumulative
history; rate()/increase() handle counter resets correctly anyway.

Test updated to cover all nine per-chain gauges and asserts an
unaffected chain is not impacted by the cleanup. 80/80 pass.
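The cleanup semantics can be sketched with a minimal stand-in for prom-client's labeled Gauge (the real module calls prom-client's set/remove directly; the stand-in only reproduces the behavior that matters here, labelsets persisting until removed):

```typescript
// Stand-in for a prom-client Gauge with a single "chain" label.
class LabeledGauge {
  private values = new Map<string, number>();
  set(labels: { chain: string }, value: number): void {
    this.values.set(labels.chain, value);
  }
  remove(labels: { chain: string }): void {
    this.values.delete(labels.chain);
  }
  // What /metrics would expose for this gauge family:
  expose(): string[] {
    return [...this.values].map(([chain, v]) => `gauge{chain="${chain}"} ${v}`);
  }
}

const secondsSinceLastBlock = new LabeledGauge(); // one of the nine per-chain gauges
secondsSinceLastBlock.set({ chain: "base" }, 4);
secondsSinceLastBlock.set({ chain: "polygon" }, 7);

// forgetChain: wipe every per-chain labelset so a torn-down chain
// disappears from /metrics instead of emitting its last value forever.
function forgetChain(chain: string, gauges: LabeledGauge[]): void {
  for (const g of gauges) g.remove({ chain });
}

forgetChain("polygon", [secondsSinceLastBlock]);
console.log(secondsSinceLastBlock.expose()); // → [ 'gauge{chain="base"} 4' ]
```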
…-silent-wss-failover

fix(block-dispatcher): KEEP-557 auto-failover to fallback WSS on silent-subscription failure
…-metrics

feat(block-dispatcher): KEEP-557 Prometheus metrics for chain monitor liveness, reconnects, and SQS enqueue
@joelorzet joelorzet merged commit aaaea67 into prod May 14, 2026
42 checks passed
