Architectural refactor + L2BEAT integration + observability #39
Merged
Generated by `graphify update .` (graphifyy on PyPI). Excluded so the ~7MB graph.json/graph.html aren't committed; rerun locally to refresh.
Carves the 1884-line dataService.js god module and the 549-line index.js
route grab-bag into per-domain modules surfaced by graphify's community
analysis. Old paths (index.js, dataService.js) remain as thin facades
re-exporting from the new layout, so existing tests and downstream
imports keep working — verified by the 499 pre-existing tests still
passing, with the same 3 known failures.
Layout:
src/
transport/ fetch (network IO)
sources/ slip44 parser
domain/ relations, keywords (pure logic over indexed cache)
store/ cache singleton, snapshot, indexer, queries
services/ loader, rpcHealth, validation (orchestrators)
http/
app.js buildApp (Fastify wiring + plugin order)
routes/ one file per route domain
util/ parseIntParam, sendError
dataService.js: 1884 -> 29 lines (re-export facade)
index.js: 549 -> 23 lines (CLI bootstrap + buildApp re-export)
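The facade trick above can be sketched as follows. This is a hypothetical shape, not the repo's actual 29 lines — every path and export name below is an assumption for illustration:

```javascript
// dataService.js — thin re-export facade: old import sites keep resolving
// while the implementation lives under src/. Paths/names are illustrative.
export { loadData, initializeDataOnStartup } from './src/services/loader.js';
export { validateChainData } from './src/services/validation.js';
export { getAllChains, getChainById } from './src/store/queries.js';
export { getRelations } from './src/domain/relations.js';
export { startRpcHealthCheck } from './src/services/rpcHealth.js';
```

Callers that do `import { loadData } from './dataService.js'` get the same function identity as a direct `src/services/loader.js` import, which is what lets the existing mocks and tests keep working.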
Adds 23 new unit tests covering the genuinely new helpers
(parseIntParam, sendError) and direct-import contracts for the extracted
modules (slip44, store/cache, domain/relations).
Test totals: 522 pass / 3 pre-existing fail (was 499/3). No regressions.
Fixes the three pre-existing test failures:

- dataService.test.js > loadData > should handle all sources failing: loadData() intentionally throws when all 4 sources fail (this protects the cache from being silently wiped to empty and surfaces a clear error to POST /reload, which already wraps it in try/catch). Updated the test to assert the throw rather than expecting the older, forgiving return.
- mcp-tools.test.js > handleToolCall > get_stats (x2): the vi.mock() factory was missing countChainsByTag, so handleGetStats was calling undefined and the handler wrapped it as result.isError. Added countChainsByTag to the mock with the same pure aggregation as the real implementation.

Test suite: 525 passed / 0 failed / 4 skipped (was 522/3/4).
Treats L2BEAT scaling data as a 5th source alongside theGraph, chainlist,
chains, and slip44. Data flow: live API first, falling back to a
checked-in static JSON when l2beat.com is unreachable (their site is
Cloudflare-gated and may 403 from some hosts/regions). The fallback
keeps /scaling responsive even when the live fetch fails.
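A minimal sketch of that live-first / static-fallback flow, using stand-in names (loadL2Beat, FALLBACK) rather than the repo's actual fetchL2Beat() and fallback JSON:

```javascript
// Checked-in last-known-good snapshot, standing in for data/l2beat-fallback.json.
const FALLBACK = {
  projects: [{ slug: 'base', chainId: 8453, stage: 'Stage 1' }],
};

// Try the live API first; on any failure (Cloudflare 403, timeout, empty
// payload) serve the static snapshot so /scaling stays responsive.
async function loadL2Beat(fetchLive) {
  try {
    const projects = await fetchLive();
    if (!Array.isArray(projects) || projects.length === 0) {
      throw new Error('empty payload');
    }
    return { projects, dataFreshness: 'live' };
  } catch {
    return { ...FALLBACK, dataFreshness: 'fallback' };
  }
}
```

The dataFreshness tag is what lets consumers tell the two paths apart downstream.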
New module layout:
src/sources/l2beat.js fetchL2Beat() + normalizeL2BeatResponse()
data/l2beat-fallback.json hand-curated last-known-good for top ~28 L2s
src/http/routes/scaling.js GET /scaling, GET /scaling/:id
Indexer changes:
- new indexL2BeatSource() merges L2BEAT fields onto chains by chainId
- auto-tags chains: L2 (always), ZK (ZK Rollup), Validium, Optimium
- adds 'l2beat' to chain.sources
- chain.l2Beat exposes: stage, category, stack, daLayer, hostChainId,
purposes, tvs, tvsBreakdown, activity, links, riskView, milestones
- per-chain dataFreshness flag ('live' | 'fallback' | 'unavailable')
tells consumers whether values came from the API or the snapshot
Cache + snapshot updated to persist l2beat raw response across restarts.
transformChain() now surfaces the l2Beat field in /chains/:id and
/scaling/:id responses.
Defensive normalizer:
- Tolerates 4 different payload shapes (projects array, data.projects,
bare array, etc.) since L2BEAT's site contract is undocumented.
- Drops projects without slug+chainId rather than emitting bad rows.
- Optional-chains every nested field (stage, daLayer, tvs, tvsBreakdown).
Tests: +18 new (sources/l2beat.test.js, store/indexer-l2beat.test.js).
Suite: 543 passing / 0 failing / 4 skipped (was 525/0/4).
Phase 2 (rolling refresher with L2BEAT live as a job type) tracked
separately.
Adds a background refresher that re-fetches L2BEAT data on an interval
(default 5 min via L2BEAT_REFRESH_INTERVAL_MS) and re-merges into the
indexed cache. Replaces "loaded once at startup" with continuous freshness.

Design:
- runL2BeatRefresh() — fetch + race-guarded merge. Skips writing if
  cachedData.lastUpdated changed mid-flight (i.e. a concurrent loadData()
  ran), preventing stale-data overwrites.
- startL2BeatRefresh() — kicks off immediately, then setInterval.
  Idempotent (second call is a no-op). Uses .unref() so the timer never
  blocks process exit. Self-coalescing: if a refresh is in flight when the
  timer fires, the next tick is queued instead of running in parallel.
- stopL2BeatRefresh() — for tests / clean shutdown.
- getL2BeatRefreshStatus() — exposes isRefreshing, lastRefreshAt,
  lastRefreshSource, lastRefreshError, lastRefreshProjectCount, intervalMs.

Wiring:
- buildApp starts the refresher right after initializeDataOnStartup
  (alongside startRpcHealthCheck), and on every successful background
  refresh from the stale-first startup path.
- indexer.js exports indexL2BeatSource so the refresher can re-merge
  without rebuilding the entire index.
- New endpoint GET /scaling/status surfaces refresher state.
- /scaling response now includes a `refresher` object with the same status
  block, so consumers can tell whether the data is fresh.

Tests: +6 (services/l2beatRefresher.test.js) — covers the no-data skip,
successful merge, race-guard skip, fetch-error path, status accessor, and
start-twice idempotency. index.test.js stubs the refresher module so
buildApp doesn't kick off a real network fetch in unit tests.

Suite: 549 passing / 0 failing / 4 skipped (was 543/0/4).
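The race guard is the interesting part. A minimal sketch, assuming the cache carries a lastUpdated stamp bumped by every full reload (names — cache, fetchProjects, mergeProjects — are illustrative):

```javascript
// Race-guarded merge: capture the cache generation before the async fetch,
// and refuse to write if a concurrent full reload replaced it mid-flight.
async function runRefresh(cache, fetchProjects, mergeProjects) {
  const stampAtStart = cache.lastUpdated;
  const projects = await fetchProjects();
  if (cache.lastUpdated !== stampAtStart) {
    // A concurrent loadData() rebuilt the cache while we were fetching;
    // merging now would overlay fresh core data with a stale snapshot.
    return { merged: false, reason: 'cache-replaced' };
  }
  mergeProjects(cache, projects);
  return { merged: true };
}
```

The skip is cheap because the next interval tick retries against the new generation.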
Now that L2BEAT data flows in, /validate cross-checks it against the
other sources. Five new rules surface real data quality issues:
rule 7 l2beat_missing_classification
L2BEAT classifies the chain but no l2Of/testnetOf relation
from theGraph or chains confirms it — upstream registries
may be stale or this is a new L2 they haven't picked up yet.
rule 8 l2beat_hostchain_no_relation
L2BEAT says hostChainId=N but the chain has no l2Of/testnetOf
relation pointing to N — settlement-chain disagreement.
rule 9 l2beat_category_name_mismatch
L2BEAT category contradicts the chain name (e.g. "ZK Rollup"
for something named "Optimistic ...").
rule 10 l2beat_unknown_chains (global, not per-chain)
L2BEAT lists a chainId we don't have in our registry —
discoverability gap, new L2 worth adding upstream.
rule 11 l2beat_stage_zero_high_tvs
Stage 0 chain with TVS > $1B — risk signal worth surfacing
(informational, not strictly a data error).
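To make the shape of these checks concrete, here is a hedged sketch of a rule-8-style check. The field shapes (chain.l2Beat, chain.relations) are assumptions for illustration, not the repo's exact internals:

```javascript
// Rule-8 style: flag a chain whose L2BEAT hostChainId has no confirming
// l2Of/testnetOf relation from the other sources.
function checkHostChainRelation(chain) {
  const host = chain.l2Beat?.hostChainId;
  if (host == null) return null; // no L2BEAT claim, nothing to cross-check
  const confirmed = (chain.relations || []).some(
    r => (r.type === 'l2Of' || r.type === 'testnetOf') && r.targetChainId === host
  );
  return confirmed
    ? null
    : { rule: 'rule8_l2beat_hostchain_no_relation', chainId: chain.chainId, hostChainId: host };
}
```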
Each rule appears in the existing /validate response shape under
errorsByRule.ruleN_* and summary.ruleN. No breaking changes to the
existing 6 rules. Rules 7-9 + 11 are per-chain (inside validateChain);
rule 10 is global (iterates cachedData.l2beat.projects to catch chains
not in indexed.byChainId).
Tests: +14 covering each rule's positive case, the no-flag negative
cases, and aggregate summary structure. validation.test.js moves the
validation suite under tests/unit/services/ matching the new src/ layout
(pre-existing dataService.test.js coverage of validateChainData untouched).
Suite: 563 passing / 0 failing / 4 skipped (was 549/0/4).
Rounds out /validate with five more cross-source checks. Complements the
L2BEAT rules (7-11) from the previous commit by reaching into the other
sources (theGraph vs chains.json vs chainlist vs slip44 vs rpcHealth).
rule 12 rpc_block_height_drift
Two or more working RPC endpoints for the same chain report
block heights >100 apart. Surfaces stuck/forked nodes that
respond to web3_clientVersion + eth_blockNumber but lag the
consensus tip. Threshold is hard-coded; lift to config later
if tuning is needed.
rule 13 name_disagreement
chains.json name and theGraph fullName don't match after
normalization (lowercase, strip "mainnet", strip non-alnum,
skip substring relationships). Catches typos and outdated
names while tolerating "Arbitrum One" vs "Arbitrum"-style
variants.
rule 14 native_currency_mismatch
chains.json nativeCurrency.symbol disagrees with theGraph's
nativeToken (case-insensitive).
rule 15 slip44_native_symbol_mismatch
Chain has slip44Info but its symbol doesn't match the
nativeCurrency.symbol — usually means the chain is using a
slip44 coinType that doesn't actually belong to it.
rule 16 rpc_url_in_one_source_only
A healthy RPC URL appears in chainlist but not chains.json
(or vice versa). Strong hint that one source is stale and
needs updating upstream. Only fires when the chain is in
both sources AND the URL is currently passing health checks
in rpcHealth.
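Rule 13's normalization can be sketched like this — a hypothetical reconstruction of the steps described above (lowercase, strip "mainnet", strip non-alphanumerics, tolerate substring relationships), not the repo's exact code:

```javascript
// Normalize a chain name for cross-source comparison.
function normalizeName(name) {
  return String(name)
    .toLowerCase()
    .replace(/mainnet/g, '')
    .replace(/[^a-z0-9]/g, '');
}

// True only when the two names genuinely disagree after normalization.
function namesDisagree(a, b) {
  const na = normalizeName(a);
  const nb = normalizeName(b);
  if (!na || !nb) return false;                          // nothing comparable
  if (na === nb) return false;                           // match after normalization
  if (na.includes(nb) || nb.includes(na)) return false;  // "Arbitrum One" vs "Arbitrum"
  return true;
}
```

The substring tolerance is what keeps the rule advisory rather than noisy.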
Slots into the existing /validate response shape. errorsByRule gets 5
new keys (rule12..rule16); summary gets 5 new counts. No breaking
changes to rules 1-11.
Tests: +13 covering each rule's positive case + negative cases. Updated
the seedCache helper to accept rawChains/rawChainlist/rpcHealth so rules
12 and 16 can be exercised in isolation.
Suite: 576 passing / 0 failing / 4 skipped (was 563/0/4).
/validate now has 16 cross-source rules covering theGraph, chainlist,
chains, slip44, l2beat, and rpcHealth.
Seven independent fixes for issues surfaced in PR review. Each is verified
by an existing or new test.

#2 indexer doesn't retain chain.slip44 (bug)
indexChainsSource and mergeChainlistEntry now keep the slip44 field from
raw sources, so attachSlip44Info can actually populate chain.slip44Info.
New regression test verifies the round-trip.

#3,#9 SLIP-44 fetch failure indistinguishable from empty success (bug)
loader.js now only calls parseSLIP44 when slip44Text isn't null,
preserving null end-to-end through applyDataToCache and the snapshot
round-trip. /sources will now correctly report slip44: 'not loaded' when
the fetch failed (was always 'loaded' because {} is truthy).

#8,#11 Starknet fallback chainId exceeds Number.MAX_SAFE_INTEGER (bug)
Removed the Starknet entry (chainId 23448594291968334) from
data/l2beat-fallback.json. The codebase treats chainId as a JS Number for
object keys, parseIntParam, and comparisons, so a chainId beyond 2^53-1
would round and cause silent mis-lookups. Documented in the file's note
field; the live API can still surface Starknet once the indexer learns to
handle BigInt IDs.

#12 L2BEAT stale data persists across refreshes (bug)
indexL2BeatSource now clears chain.l2Beat and removes 'l2beat' from
chain.sources for any chain whose chainId isn't in the fresh project list.
Defensive: only clears when the new list is non-empty, so a transient
unavailable fetch doesn't wipe known data. New regression test exercises a
refresh with a dropped project.

#1 /export had no rate limit (CodeQL: missing-rate-limiting)
Wrapped /export with the same RELOAD_RATE_LIMIT_MAX config as /reload
(both are I/O-heavy and previously unguarded).

#5,#10 fetchData returned undefined for unknown format (contract bug)
Added an explicit `return null` in the format-fallthrough branch with a
logged error, matching the documented "returns null on failure" contract.
New tests cover both the success and unknown-format paths.

#6 "All data sources failed" error message misleading
Renamed to "All core data sources failed" since L2BEAT is intentionally
excluded from loadedSourceCount (it has its own static fallback and isn't
useful without the core sources). Both existing test assertions updated.

#7 scaling.js comment mentioned non-existent 'unavailable' freshness
Updated to reflect actual runtime: only 'live' and 'fallback' ever appear
in chain.l2Beat.dataFreshness. Chains the merge couldn't reach simply have
no l2Beat field.

Skipped (per triage): #4 duplicate RPC sweep on warm start — minor
wastefulness, not user-visible; deferred.

Suite: 582 passing / 0 failing / 4 skipped (was 576/0/4). +6 new
regression tests (indexer-slip44, indexer-l2beat clearing, transport fetch
contract).
Today /health is a binary "dataLoaded: true/false" — useless for
observability. This makes /health a real readiness probe:
GET /health now returns:
status: 'ok' | 'degraded' | 'down'
- down when any core source (theGraph/chainlist/chains) is missing
- degraded when supplementary sources (slip44 null, l2beat empty) are
missing, or a refresher hasn't run within 2x its expected interval
- ok otherwise
sources:
theGraph/chainlist/chains/slip44/l2beat each report
{ loaded: bool, ageSeconds: number|null, source?: 'live'|'fallback'|null }
refreshers:
rpc: { isRunning, lastRunAt }
l2beat: full status block from getL2BeatRefreshStatus()
plus the existing dataLoaded, lastUpdated, totalChains.
GET /sources extended:
- Now reports slip44: 'not loaded' when slip44 is null (was always
'loaded' because {} is truthy — same fix as the SLIP-44 failure
preservation work).
- Adds l2beat: 'loaded'|'not loaded' based on whether projects[] is
non-empty.
Implementation notes:
- Single sourceFreshness(cache) helper computes per-source loaded/age.
- deriveOverallStatus() folds sources + refreshers into the verdict.
- ageSeconds() returns null for null/invalid timestamps (no NaN values
leak into the response).
- All null/undefined checks use loose equality (!= null) to be tolerant
of mocks that omit the l2beat field.
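A minimal sketch of the verdict fold, under the assumption that the helper receives pre-computed loaded flags plus a refresher-staleness bit (the input shape and name are illustrative, not the repo's exact deriveOverallStatus signature):

```javascript
// Fold per-source health into the single 'ok' | 'degraded' | 'down' verdict:
// any missing core source is 'down'; missing supplementary data or a stale
// refresher is 'degraded'; everything else is 'ok'.
function deriveOverallStatus({ core, supplementary, refresherStale }) {
  if (Object.values(core).some(loaded => !loaded)) return 'down';
  const supplementaryMissing = Object.values(supplementary).some(loaded => !loaded);
  if (supplementaryMissing || refresherStale) return 'degraded';
  return 'ok';
}
```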
Tests: +6 in http/admin.test.js covering ok/degraded/down verdicts and
the SLIP-44 null preservation. Integration test mock updated to include
the new l2beat field so the existing /health "should return ok"
assertion still holds. Suite: 588 passing / 0 failing.
Routes now declare their query/param shapes via Fastify's schema option,
so Ajv catches type errors, enum violations, max-length overruns, and —
critically — unknown query parameters that previously silently 200'd.
Coverage:
/chains ?tag enum (Testnet/L2/Beacon/ZK/Validium/Optimium)
additionalProperties:false (typo catcher)
/chains/:id :id must match ^-?\\d+$
/search ?q required, 1..MAX_SEARCH_QUERY_LENGTH chars
additionalProperties:false
/relations/:id :id pattern
/relations/:id/graph ?depth integer 1..5, additionalProperties:false
/endpoints/:id :id pattern
/slip44/:coinType :coinType pattern
/rpc-monitor/:id :id pattern
/scaling/:id :id pattern
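An illustrative route schema in the style described above — the enum values match the tag list, but the exact object the repo passes to Fastify is an assumption:

```javascript
// Schema for GET /chains: enum-constrained ?tag, and
// additionalProperties:false so a typo'd query key gets a 400 instead of
// silently returning all chains.
const chainsQuerySchema = {
  querystring: {
    type: 'object',
    additionalProperties: false, // rejects ?tags=L2 (typo of ?tag=)
    properties: {
      tag: {
        type: 'string',
        enum: ['Testnet', 'L2', 'Beacon', 'ZK', 'Validium', 'Optimium'],
      },
    },
  },
};

// Wired into the route roughly as:
// fastify.get('/chains', { schema: chainsQuerySchema }, handler);
```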
Implementation:
- buildApp wires a `schemaErrorFormatter` that translates Ajv errors into
the project's `{ error: "..." }` envelope, preserving the existing
user-friendly wording ("Invalid chain ID", "Invalid tag. Allowed: ...",
"Query too long. Max length: ...", "Invalid depth. Must be between 1
and 5"). Field-name → noun mapping handles cases like `:id` → "chain ID"
and `:coinType` → "coin type".
- `setErrorHandler` ensures every 4xx/5xx response uses the same envelope
(not just validation errors).
- Ajv configured with `removeAdditional: false` so additionalProperties
actually rejects unknown params instead of silently stripping them.
This is the typo-catcher win: `GET /chains?tags=L2` (typo of `?tag=`)
now returns 400 instead of returning all chains.
Tests: +1 integration test for the unknown-query-parameter rejection.
All 12 existing 400-response tests still pass — the schema-driven
messages are byte-compatible with the previous handler-driven ones.
Suite: 589 passing / 0 failing / 4 skipped (was 588/0/4).
Two observability improvements that work together:
1. Structured logging via pino
- All 25 console.{log,warn,error} calls in src/ replaced with
pino-backed logger.{info,warn,error}.
- New src/util/logger.js exports a shared logger with structured
fields (component='chains-api'). Level controlled via LOG_LEVEL env.
- Fastify already uses pino internally; this brings background
jobs (sources, services, store) onto the same logger so a single
log pipeline captures everything.
- JSON output by default; pretty in TTY when running locally.
2. GET /metrics endpoint (Prometheus exposition format)
- New src/util/metrics.js: tiny zero-dep counter + gauge implementation
in <90 lines. Counters tracked in-memory and incremented from
transport/services. Gauges computed on scrape from live cache.
- Metrics:
chains_api_source_fetch_total{url, outcome} counter
chains_api_refresh_total{refresher, outcome} counter
chains_api_rpc_check_total{outcome} counter
chains_api_chains_total gauge
chains_api_source_loaded{source} gauge
chains_api_data_age_seconds gauge
chains_api_l2beat_refresh_age_seconds gauge
chains_api_rpc_check_age_seconds gauge
chains_api_validation_errors{rule} gauge
- Content-Type: text/plain; version=0.0.4 (Prometheus default).
- Validation summary computed on scrape (O(N chains) — fine for
scrape intervals >= 15s; can be cached if it becomes a problem).
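A zero-dependency registry in the spirit described above. This is a guess at the shape, not src/util/metrics.js itself; metric names follow the list:

```javascript
// In-memory counters keyed by name + serialized labels; gauges are passed
// in at scrape time (they're computed from the live cache on each scrape).
const counters = new Map();

function incCounter(name, labels = {}) {
  const labelStr = Object.entries(labels).map(([k, v]) => `${k}="${v}"`).join(',');
  const key = labelStr ? `${name}{${labelStr}}` : name;
  counters.set(key, (counters.get(key) || 0) + 1);
}

// Render Prometheus exposition text (one "name{labels} value" line each).
function renderMetrics(gauges = {}) {
  const lines = [];
  for (const [key, value] of counters) lines.push(`${key} ${value}`);
  for (const [name, value] of Object.entries(gauges)) lines.push(`${name} ${value}`);
  return lines.join('\n') + '\n';
}
```

The route handler would call renderMetrics with the computed gauges and reply with Content-Type: text/plain; version=0.0.4.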
Tests:
+6 in http/metrics.test.js covering content-type, gauges (loaded
sources, total chains), counters (with labels), and validation error
labels.
Updated 5 RPC-health tests in dataService.test.js to spy on the new
pino logger instead of console.{log,warn}. Pattern: spy on
logger.warn / logger.info after the module loads. For the
vi.resetModules() test we re-import logger.js after reset so the
fresh module is the one being spied on.
Suite: 595 passing / 0 failing / 4 skipped (was 589/0/4).
Note: index.js still has a few console.error calls in the CLI bootstrap
path — those run before pino is configured, so they stay as-is.
Replaces the two parallel setInterval schedulers in services/rpcHealth.js
and services/l2beatRefresher.js with a single queue-based loop in
src/services/chainRefresher.js. The thundering-herd RPC sweep becomes
a rolling per-chain check spread across the sweep window.
Design:
queue = [
{ type: 'l2beat_batch' }, // 1 job
{ type: 'chain_rpc', chainId: N }, ... one per chain
]
every CHAIN_REFRESHER_TICK_MS (default 1s, env-tunable):
if queue empty: rebuild from current indexed chains, increment sweep#
pop next job, dispatch to processor
Job processors live in chainRefresher.js:
- processL2BeatBatch() fetch + race-guarded indexL2BeatSource merge
- processChainRpc(id) check every RPC for one chain, race-guarded
writes to cachedData.rpcHealth[id] AND stamps
chain.lastTested for per-chain freshness
Why rolling:
- Old all-at-once sweep: ~3000 RPC URLs hit in parallel (capped at 8
concurrent). Worst case spikes outbound load.
- New rolling: ~10 RPC URLs/sec average (one chain/tick × ~10 URLs/chain
typical). Smooth, predictable, gentle on upstream RPCs.
- L2BEAT batch lands as job #0 of every sweep, so its cadence matches
the RPC sweep (~5 min for 300 chains at 1s/tick) without a separate
setInterval.
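The tick loop above can be sketched with the timer factored out so a tick can be driven directly (makeRefresher and the processor map are illustrative names):

```javascript
// One job per tick: when the queue drains, rebuild it from the current
// chain set — L2BEAT batch first, then one RPC job per chain — and bump
// the sweep counter.
function makeRefresher(getChainIds, processors) {
  let queue = [];
  let sweepNumber = 0;
  return {
    async tick() {
      if (queue.length === 0) {
        sweepNumber += 1;
        queue = [
          { type: 'l2beat_batch' },
          ...getChainIds().map(chainId => ({ type: 'chain_rpc', chainId })),
        ];
      }
      const job = queue.shift();
      await processors[job.type](job);
      return { job, sweepNumber, queueDepth: queue.length };
    },
  };
}
```

In production the tick would be driven by a setInterval(..., tickMs).unref(); in tests it can be called directly.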
Backwards compatibility:
services/rpcHealth.js becomes a thin shim:
- runRpcHealthCheck() drains every chain immediately (legacy blocking
contract; used by /reload and tests). Detects
"no endpoints" and emits the same warn line.
- startRpcHealthCheck() alias for startChainRefresher()
- getRpcMonitoringStatus() reads from chainRefresher
services/l2beatRefresher.js similarly:
- runL2BeatRefresh() delegates to processL2BeatBatch
- startL2BeatRefresh() alias for startChainRefresher
- getL2BeatRefreshStatus() reads from chainRefresher
buildApp wiring unchanged: it still calls startRpcHealthCheck() and
startL2BeatRefresh(). Both now start the same loop; the second call is a
no-op (idempotent).
New endpoint:
GET /refresher exposes the unified status:
tickIntervalMs, isTickInFlight, lastTickAt, lastTickJobType,
queueDepth, sweep { jobIndex, totalJobs, sweepNumber, sweepStartedAt },
l2beat { lastRefreshAt, lastRefreshSource, lastRefreshProjectCount,
lastRefreshError, intervalMs },
rpc { isMonitoring, lastSweepCompletedAt, endpointsCheckedThisSweep }
Tests:
+11 new in services/chainRefresher.test.js covering the two job
processors, tick scheduling, sweep queue rebuild, overlap guard, and
status accessor.
Updated services/l2beatRefresher.test.js config mock to include the
RPC-related env vars that chainRefresher transitively requires.
Tunables:
CHAIN_REFRESHER_TICK_MS ms between ticks (default 1000)
Suite: 606 passing / 0 failing / 4 skipped (was 595/0/4).
Eight follow-ups from the PR review. Each is independent of the others and
could land separately; bundled here for review economy.

1. Logging migration completion
   fetchUtil.js (3 calls) and mcp-server-http.js (10 calls) now use the
   shared pino logger. index.js CLI bootstrap stays on console.error
   because it runs before pino is configured.

2. Dead config marked @deprecated
   - RPC_CHECK_CONCURRENCY no longer drives anything since the unified
     rolling refresher (one chain per tick).
   - L2BEAT_REFRESH_INTERVAL_MS no longer drives scheduling; only kept as
     a hint in /scaling/status.
   JSDoc warns callers in both cases.

3. dataService.js facade comment
   Explicitly documents that new code should import directly from src/
   and not add new exports here.

4. 5xx error sanitization
   setErrorHandler now returns "Internal Server Error" for 5xx instead of
   leaking error.message. Server-side log still records the full error via
   fastify.log.error.

5. /metrics validation cache (30s)
   validateChainData is O(N chains × M rules); cache the summary result
   for VALIDATION_CACHE_MS so frequent Prometheus scrapes don't re-run
   validation on every request.

6. getAllChains() memoization
   Caches the transformChain'd array, invalidated by cachedData.lastUpdated
   AND by cachedData.l2beat.fetchedAt (so a rolling L2BEAT refresh
   correctly invalidates without a full data reload). Reduces hot-path
   work on /chains, /scaling, /stats, and the validation pass.

7. Inter-job race guard in chainRefresher
   Each sweep captures cachedData.lastUpdated at start. If a concurrent
   loadData() bumps it mid-sweep, the rest of the queue is dropped to
   prevent writing a frankensweep of mixed data versions. Logs a warn with
   the dropped job count.

8. Documentation
   - validation.js rule 13 marked as heuristic + carries severity: 'info'
     in error objects (advisory, not authoritative).
   - scaling.js route header documents the Starknet chainId precision gap
     (CAIP-2 numeric ID 0x534e5f4d41494e exceeds Number.MAX_SAFE_INTEGER).
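The memoization in item 6 can be sketched as below, assuming the cache key is the pair (lastUpdated, l2beat.fetchedAt) so either a full reload or a rolling L2BEAT refresh invalidates. transformChain and the cache shape are illustrative:

```javascript
// Memoize the transformed chain list; recompute only when either
// generation stamp moves.
function makeGetAllChains(cache, transformChain) {
  let key = null;
  let memo = null;
  return function getAllChains() {
    const freshKey = `${cache.lastUpdated}|${cache.l2beat?.fetchedAt ?? ''}`;
    if (memo === null || key !== freshKey) {
      key = freshKey;
      memo = Object.values(cache.byChainId).map(transformChain);
    }
    return memo;
  };
}
```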
New tests (+4):
- /health exposes per-source freshness and per-refresher status
- /refresher returns the unified refresher status block
- /metrics returns Prometheus exposition with text/plain content type
- /metrics includes the source-loaded gauge for each of the 5 sources

New workflow: .github/workflows/refresh-l2beat-fallback.yml — weekly cron
+ manual dispatch that fetches the live L2BEAT API, normalizes via the
same module as runtime, filters non-safe-integer chainIds, and opens a PR
if data/l2beat-fallback.json differs. Gracefully skips when the live API
is unreachable.

PR title updated from the misleading "Ignore graphify-out/ knowledge graph
artifacts" to "Architectural refactor + L2BEAT integration + observability"
with an accurate body via the GitHub MCP API.

Suite: 610 passing / 0 failing / 4 skipped (was 606/0/4).
1. Replace FIELD_NOUNS map with ajv-errors per-schema errorMessage
The hard-coded `id → "chain ID"` / `coinType → "coin type"` mapping in
formatSchemaValidationError() was brittle: adding a new route with a
different param name would silently fall through to a generic message.
Now each schema declares its own user-facing wording via the
`errorMessage` keyword (ajv-errors). The route author controls the
message at the schema; no central registry to maintain.
- Added `ajv-errors` dependency and registered it as an Ajv plugin in
buildApp() with `allErrors: true` so the plugin can inspect every
violation.
- schemaErrorFormatter prefers `errorMessage`-authored strings; for
`additionalProperties` it still interpolates the property name
server-side (the schema can't do `${...}` interpolation).
- Migrated every schema in src/http/routes/*.js to carry its own
errorMessage. The 12 existing 400-response tests pass unchanged.
2. ESLint with no-restricted-imports gating dataService.js
First ESLint config in the repo. Minimal flat config (eslint v10).
The rule applies to `src/{store,domain,sources,services,transport,util}/`
and prevents those layers from importing the legacy `dataService.js`
facade — they must depend on peer modules under src/ directly, which
keeps the layered architecture acyclic.
src/http/ is intentionally exempt: routes are the HTTP boundary and
the integration tests mock `dataService.js` as a single seam. Moving
those mocks to per-module paths is a separate test refactor.
- `npm run lint` runs `eslint src/`.
- .github/workflows/docker-build.yml now runs lint before tests.
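An illustrative eslint.config.js fragment for the layering rule — the glob, pattern, and message wording are assumptions, not the repo's exact config:

```javascript
// Flat config: forbid the non-HTTP layers from importing the legacy
// dataService.js facade, keeping the layered architecture acyclic.
const config = [
  {
    files: ['src/{store,domain,sources,services,transport,util}/**/*.js'],
    rules: {
      'no-restricted-imports': ['error', {
        patterns: [{
          group: ['**/dataService.js'],
          message: 'Import the src/ module directly; dataService.js is a legacy facade.',
        }],
      }],
    },
  },
];

// In eslint.config.js this array would be the default export.
```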
Suite: 610 passing / 0 failing / 4 skipped (unchanged). Lint clean.
Completes the layered-architecture migration started in the original
refactor. Routes now import from their actual peer modules under src/
instead of going through the legacy dataService.js facade, and the
ESLint rule covers the entire src/ tree (was previously narrowed to
src/{store,domain,sources,services,transport,util}/ to dodge the test
refactor).
Test refactor:
tests/integration/api.test.js — replaces the single
vi.mock('../../dataService.js', ...) with vi.hoisted() shared mock
fns wired into 7 vi.mock factories (one per src/ module the routes
use). Test bodies that reference `dataService.X` keep working because
dataService.js's re-exports resolve to the same hoisted fn identities
via the mocked src/ modules.
tests/unit/http/admin.test.js — now mocks each src/ path individually.
tests/unit/http/metrics.test.js — same.
tests/unit/index.test.js — same; the onBackgroundRefreshSuccess
capture pattern moved from `dataService.startRpcHealthCheck` to
a direct hoisted mock reference.
Route migrations:
src/http/app.js
initializeDataOnStartup → ../services/loader.js
startRpcHealthCheck → ../services/rpcHealth.js
src/http/routes/chains.js → ../../store/queries.js
src/http/routes/relations.js → ../../domain/relations.js
src/http/routes/endpoints.js → ../../store/queries.js
src/http/routes/slip44.js → ../../store/cache.js
src/http/routes/scaling.js → ../../store/queries.js
src/http/routes/rpcMonitor.js → ../../store/queries.js + services/rpcHealth.js
src/http/routes/metrics.js → store/cache + services/{rpcHealth,validation,l2beatRefresher}
src/http/routes/admin.js → store/{cache,queries} + domain/keywords + services/{loader,rpcHealth,validation,l2beatRefresher}
ESLint scope:
Rule applies to all of src/**/*.js (was narrowed to non-http
subtrees). dataService.js is now reserved exclusively for legacy
external callers; new code under src/ depends on peer modules
directly.
Suite: 610 passing / 0 failing / 4 skipped. Lint clean.
Pull request overview
Copilot reviewed 58 out of 60 changed files in this pull request and generated 7 comments.
Comments suppressed due to low confidence (1)
src/transport/fetch.js:23
- This branch increments the success counter before `response.text()` resolves. If reading the body fails, the catch block will also increment the error counter for the same request. Increment success only after the text body has been read successfully.
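The ordering fix this comment asks for can be sketched as follows. This is a hypothetical helper, not the repo's `src/transport/fetch.js`: `fetchImpl` is injected here so the ordering is observable, and the counter name follows snippets elsewhere in this PR.

```javascript
// Hedged sketch: success is counted only after the body read completes, so a
// body-read failure increments exactly one counter (error), never both.
async function fetchText(url, fetchImpl, incCounter) {
  try {
    const response = await fetchImpl(url);
    const text = await response.text(); // may reject mid-body
    incCounter('chains_api_source_fetch_total', { url, outcome: 'success' });
    return text;
  } catch (err) {
    incCounter('chains_api_source_fetch_total', { url, outcome: 'error' });
    throw err;
  }
}
```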
The HTTP layer added /scaling, /scaling/:id, and /refresher in earlier
commits; MCP clients had no parallel surface and were forced to read the
new chain.l2Beat field out of get_chain_by_id responses. Three new tools
fix that:
get_scaling_chains
Returns chains classified by L2BEAT as scaling solutions, plus the
refresher status block so callers can tell whether the data is live
or from the static fallback snapshot. Parallels GET /scaling.
get_l2beat_by_id { chainId }
Single chain's L2BEAT view. Returns errorResponse when the chain
isn't in the registry, when it has no L2BEAT classification, or
when chainId is invalid. Parallels GET /scaling/:id.
get_refresher_status
Exposes the unified rolling refresher's freshness state. Parallels
GET /refresher (lighter surface — just the L2BEAT-relevant fields).
mcp-tools.js now imports getL2BeatRefreshStatus directly from
src/services/l2beatRefresher.js since the facade doesn't re-export it.
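The error-path contract of get_l2beat_by_id can be sketched like this. A hedged sketch only: the real handler in mcp-tools.js uses the project's errorResponse/textResponse helpers and reads from the indexed cache, while this version takes a plain `byChainId` map for illustration.

```javascript
// Hypothetical shape of the get_l2beat_by_id handler: validates chainId, then
// distinguishes "unknown chain" from "known chain without L2BEAT data".
function handleGetL2BeatById({ chainId }, byChainId) {
  const id = Number(chainId);
  if (!Number.isInteger(id) || id <= 0) {
    return { isError: true, message: 'chainId must be a positive integer' };
  }
  const chain = byChainId.get(id);
  if (!chain) {
    return { isError: true, message: `chain ${id} not found in registry` };
  }
  if (!chain.l2Beat) {
    return { isError: true, message: `chain ${id} has no L2BEAT classification` };
  }
  return { isError: false, l2Beat: chain.l2Beat };
}
```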
Tests: +7 covering both happy and error paths for each tool, plus a
sanity check that getToolDefinitions() exposes the new names. Tool
count assertion bumped 13 → 16.
Suite: 618 passing / 0 failing / 4 skipped (was 610/0/4).
Deploy & dependency fixes
- package.json: declare pino directly (previously only transitive via fastify).
- Dockerfile: COPY src/, data/, public/ so the refactored layout actually
ships in the container.
Correctness fixes
- src/services/loader.js: exclude SLIP-0044 from the "at least one source
loaded" guard. SLIP-0044 contributes coin-type metadata but no chain
entries, so the API was previously coming up with an empty index when
every chain registry failed but SLIP-44 succeeded. Error message
reworded accordingly.
- src/services/rpcHealth.js: in runRpcHealthCheck, check the data version
before each chain iteration so a concurrent loadData() aborts the sweep
immediately instead of leaking partial old-version results into the
fresh cache.
- src/services/chainRefresher.js: apply MAX_ENDPOINTS_PER_CHAIN in
processChainRpc so large chain entries can't generate per-tick request
bursts that ignore the configured cap.
- src/transport/fetch.js: validate format before issuing any I/O, so callers passing an unsupported format fail deterministically without a wasted outbound request; move the success counter increment to after the body parse so a single body-parse failure isn't double-counted as both a success and an error.
- src/store/indexer.js: indexL2BeatSource no longer early-returns on an
empty project list — it runs the stale-data cleanup pass even when
the fresh L2BEAT payload is empty, so /scaling stops serving projects
that have disappeared. Tag cleanup widened to also remove L2BEAT-derived
tags (L2, ZK, Validium, Optimium). Project chainIds are normalized to
numbers up front so byChainId lookups and freshChainIds membership
checks use one type.
- src/sources/l2beat.js: extractChainId coerces strings ("8453") and
CAIP-2 strings ("eip155:8453") to safe integers; unsafe values
(>= 2^53 or non-numeric) become null so downstream indexing stays
consistent.
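The version-check abort described above can be sketched as follows. This is a hedged simplification, not the repo's runRpcHealthCheck: the data-version accessor and job shape are assumed for illustration.

```javascript
// Hypothetical sketch: capture the data version at sweep start and re-check it
// before each chain, so a concurrent loadData() aborts the sweep immediately
// instead of leaking old-version results into the fresh cache.
async function runSweep(chains, getDataVersion, processChainRpc) {
  const startedAt = getDataVersion();
  for (const chain of chains) {
    if (getDataVersion() !== startedAt) return { aborted: true }; // cache was reloaded
    await processChainRpc(chain.chainId);
  }
  return { aborted: false };
}
```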
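The extractChainId coercion can be sketched like this. A hedged sketch: the real implementation lives in src/sources/l2beat.js and may differ in detail, but the described behavior (coerce plain and CAIP-2 strings, reject unsafe or non-numeric values) is shown.

```javascript
// Hypothetical sketch of extractChainId: accepts numbers, plain id strings
// ("8453"), and CAIP-2 strings ("eip155:8453"); anything non-numeric or
// outside the safe-integer range becomes null so downstream indexing stays
// one consistent type.
function extractChainId(value) {
  if (typeof value === 'number') {
    return Number.isSafeInteger(value) && value > 0 ? value : null;
  }
  if (typeof value === 'string') {
    const raw = value.includes(':') ? value.split(':').pop() : value; // CAIP-2 tail
    const n = Number(raw);
    return raw !== '' && Number.isSafeInteger(n) && n > 0 ? n : null;
  }
  return null;
}
console.log(extractChainId('8453'));        // 8453
console.log(extractChainId('eip155:8453')); // 8453
console.log(extractChainId('not-a-chain')); // null
```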
Security
- src/util/metrics.js: Prometheus label values now escape backslashes
(and newlines) in addition to double quotes. Same escape applied to
the dynamic rule label in chains_api_validation_errors.
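The escaping change can be sketched as follows. A hedged sketch of the helper's shape, the real code is in src/util/metrics.js, following the Prometheus text exposition format, which requires backslash, double quote, and newline to be escaped inside label values.

```javascript
// Hypothetical sketch: escape backslashes first so the later escapes' own
// backslashes aren't doubled, then quotes and newlines.
function escapeLabelValue(value) {
  return String(value)
    .replace(/\\/g, '\\\\')
    .replace(/"/g, '\\"')
    .replace(/\n/g, '\\n');
}
console.log(escapeLabelValue('rule "a\\b"\nend'));
```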
Test fixes
- tests/unit/services/chainRefresher.test.js: add MAX_ENDPOINTS_PER_CHAIN
to the config mock so the new cap import resolves.
- tests/unit/dataService.test.js: add MAX_ENDPOINTS_PER_CHAIN to the
config mock; loadData tests now use non-null mock payloads since
SLIP-44 no longer satisfies the chain-source guard on its own; update
the error-message assertions to match the reworded "All chain registry
sources failed" text.
All 618 tests pass, 0 failures, 4 skipped (was 0 failed pre-fixes).
Comment on lines +78 to +89:

    export const DATA_SOURCE_L2BEAT_API = parseStringEnv(
      'DATA_SOURCE_L2BEAT_API',
      'https://l2beat.com/api/scaling-summary'
    );
    export const L2BEAT_FETCH_TIMEOUT_MS = parseIntEnv('L2BEAT_FETCH_TIMEOUT_MS', 10000);
    /**
     * @deprecated Cadence is now driven by the unified rolling refresher
     * (CHAIN_REFRESHER_TICK_MS × queue length). Kept so /scaling/status can keep
     * exposing the value as a hint to consumers, but no longer used for
     * scheduling. Safe to remove in v2 once consumers migrate to /refresher.
     */
    export const L2BEAT_REFRESH_INTERVAL_MS = parseIntEnv('L2BEAT_REFRESH_INTERVAL_MS', 300000);
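The unified rolling refresher's tick loop can be sketched like this. A hedged simplification: the real src/services/chainRefresher.js differs, and the job names and status fields here are assumed from this PR's description.

```javascript
// Hypothetical sketch: one queue holds l2beat_batch and chain_rpc jobs; each
// tick pops a single job, so RPC fan-out is spread across the sweep window
// instead of firing as a thundering herd.
function createRefresher(jobs, handlers) {
  let cursor = 0;
  return {
    tick() {
      if (jobs.length === 0) return null;
      const job = jobs[cursor % jobs.length];
      cursor += 1;
      handlers[job.type](job);
      return job.type;
    },
    status() { // roughly what GET /refresher would expose
      return { queueDepth: jobs.length, sweepCursor: cursor % Math.max(jobs.length, 1) };
    }
  };
}
const ran = [];
const refresher = createRefresher(
  [{ type: 'l2beat_batch' }, { type: 'chain_rpc', chainId: 1 }],
  { l2beat_batch: () => ran.push('l2beat'), chain_rpc: j => ran.push(`rpc:${j.chainId}`) }
);
refresher.tick();
refresher.tick();
console.log(ran, refresher.status());
```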
Brings in PR #40 (native-token USD prices + execution-client fingerprinting) and PR #41 (dependabot fastify 5.8.5 bump). Conflict resolution favored PR #39's architectural structure: the PR #40 features are ported into the new src/ layout instead of into the old monolithic index.js / mcp-tools.js.
- src/http/routes/chains.js: /chains and /chains/:id now enrich responses with price via priceService.
- src/http/routes/clients.js (new): /clients and /clients/:id expose getClientsByChain output.
- src/http/routes/rpcMonitor.js: /rpc-monitor/:id gains a clients array via summarizeChainClients.
- src/http/app.js: registers clientsRoutes, kicks off prefetchAllPrices on startup (with pino-logged failure).
- index.js: kept the PR #39 shim form (imports buildApp from src/http/app.js).
- mcp-tools.js: keeps PR #39's L2BEAT/refresher tools (get_scaling_chains, get_l2beat_by_id, get_refresher_status) AND adds PR #40's get_clients; get_chains / get_chain_by_id handlers enriched with price.
- package.json: union of deps — pino ^10.3.0 (PR #39) + fastify ^5.8.5 (PR #41).
- tests/integration/api.test.js: kept the PR #39 hoisted-mocks pattern, added priceService and clientsView mock factories so the new routes resolve under mocked seams.
- tests/unit/mcp-tools.test.js: kept l2beatRefresher mock, added clientsView + priceService mocks; tool count is now 17 (13 base + 3 L2BEAT/refresher + 1 clients).
- package-lock.json: regenerated by npm install.
Test result: 670 passing / 0 failing / 4 skipped (PR #39 was 618/0/4; PR #40 added priceService + clientsView + clientParser test modules; the merged total adds 52 net new passing tests).
What this PR contains
A multi-phase refactor + observability + integration sprint on
chains-api. The original one-line `.gitignore` change grew into a major redesign — preserved as a single PR for cohesion, but each commit is independent and reviewable.

1. Architectural refactor (commits bb85f00, 6fa42ff)
Splits the 1,884-line `dataService.js` god module and the 549-line `index.js` route grab-bag into a layered src/ structure: transport → sources → store → domain → services → http. Old paths remain as thin re-export facades so existing imports and tests don't break.

2. L2BEAT data source integration (commits e728231, 168e21e, 98d15a1)
Treats L2BEAT as a 5th data source: live API (l2beat.com/api/scaling-summary) with graceful fallback to the checked-in data/l2beat-fallback.json when the endpoint is gated. New /scaling, /scaling/:id, and /scaling/status routes. A per-chain l2Beat field surfaces stage, category, stack, DA layer, host chain, TVS, and activity. Rolling refresher pattern via services/l2beatRefresher.js.

3. Cross-source validation (commits 6cefec2, f9ffdda)
Adds 10 new rules to /validate (now 16 total): 5 L2BEAT cross-checks (missing classification, hostChain disagreement, category vs name, unknown chains, Stage 0 high TVS) + 5 non-L2BEAT (RPC block height drift, name disagreement, native currency mismatch, slip44 vs symbol, RPC URL in one source only).

4. Observability + ergonomics (commits 8c37385, 4cfb29b, 826d009)
/health deepened with per-source freshness, per-refresher status, and an overall ok|degraded|down verdict. `?tags=L2` (vs `?tag=`) now 400s instead of returning all chains. `console.*` replaced with pino (structured JSON logs). GET /metrics Prometheus endpoint (counters: source fetches, refreshes, RPC checks; gauges: chain count, source loaded flags, refresh ages, validation error counts per rule).

5. Unified rolling refresher (commit 36614f7)
Replaces the two parallel setInterval schedulers (RPC health + L2BEAT) with one queue-based loop in services/chainRefresher.js. Each tick (default 1s, env-tunable via CHAIN_REFRESHER_TICK_MS) pops either an l2beat_batch or a chain_rpc job, spreading RPC fan-out evenly across the sweep window instead of thundering-herd. Old service modules become thin shims. New GET /refresher exposes sweep cursor + queue depth.

Test summary
New unit tests live under tests/unit/{store,domain,sources,services,http,transport}/ (6fa42ff).

New env vars
LOG_LEVEL (default info)
CHAIN_REFRESHER_TICK_MS (default 1000)
DATA_SOURCE_L2BEAT_API (default https://l2beat.com/api/scaling-summary)
L2BEAT_FETCH_TIMEOUT_MS (default 10000)
L2BEAT_REFRESH_INTERVAL_MS (default 300000)

New endpoints
GET /scaling, GET /scaling/:id, GET /scaling/status
GET /metrics (Prometheus exposition)
GET /refresher

Breaking changes
None at the HTTP API level. Internal module structure changed, but `dataService.js` and `index.js` still export the same names.