feat(backend): add data retention polling with pdp subgraph integration #286
silent-cipher wants to merge 16 commits into main from
Conversation
PR is open for review. However, it shouldn’t be merged until subgraph data is available (FilOzone/pdp-explorer#86).
Pull request overview
Adds backend support for polling PDP (Proof of Data Possession) subgraph data on a schedule to derive per-provider data-retention statistics and emit them as Prometheus metrics, integrated into the existing pg-boss job system and configuration/docs.
Changes:
- Introduces `PDPSubgraphService` (+ module, query, response validation) and associated tests.
- Adds `DataRetentionService` (+ module, tests) and wires a new `data.retention.poll` pg-boss job/schedule into `JobsService`.
- Extends configuration and docs with `PDP_SUBGRAPH_ENDPOINT` and `DATA_RETENTION_POLL_INTERVAL_SECONDS`, plus a new Prometheus counter registration.
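For reference, the two new environment variables might be set like this (the endpoint value is purely illustrative; 3600 matches the documented hourly default):

```
# Illustrative values only — the real endpoint depends on your deployment
PDP_SUBGRAPH_ENDPOINT=https://example.com/subgraphs/name/pdp-explorer
DATA_RETENTION_POLL_INTERVAL_SECONDS=3600
```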
Reviewed changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| docs/environment-variables.md | Documents new env vars and scheduling option for data retention polling. |
| apps/backend/src/wallet-sdk/wallet-sdk.service.ts | Adds getBlockNumber() helper used by data retention polling. |
| apps/backend/src/wallet-sdk/wallet-sdk.service.spec.ts | Updates test config to include new blockchain config field. |
| apps/backend/src/pdp-subgraph/types.ts | Adds Joi-based validation/transforms for subgraph response types. |
| apps/backend/src/pdp-subgraph/types.spec.ts | Adds unit tests for subgraph response validation. |
| apps/backend/src/pdp-subgraph/queries.ts | Adds GraphQL query for providers and proof sets. |
| apps/backend/src/pdp-subgraph/pdp-subgraph.service.ts | Implements subgraph fetch with batching, rate limiting, retries, validation. |
| apps/backend/src/pdp-subgraph/pdp-subgraph.service.spec.ts | Adds tests for subgraph service fetch/retry/validation behavior. |
| apps/backend/src/pdp-subgraph/pdp-subgraph.module.ts | Exposes PDPSubgraphService via Nest module. |
| apps/backend/src/metrics-prometheus/metrics-prometheus.module.ts | Registers a new counter for data retention / dataset challenge status. |
| apps/backend/src/jobs/repositories/job-schedule.repository.ts | Adds queue-name mapping for data_retention_poll in pg-boss job state queries. |
| apps/backend/src/jobs/jobs.service.ts | Wires new data.retention.poll worker + schedule row + queue mapping + metrics tracking. |
| apps/backend/src/jobs/jobs.service.spec.ts | Updates job service tests for new dependency and new worker/schedule expectations. |
| apps/backend/src/jobs/jobs.module.ts | Imports DataRetentionModule so the jobs worker can execute the poller. |
| apps/backend/src/jobs/job-queues.ts | Adds DATA_RETENTION_POLL_QUEUE constant. |
| apps/backend/src/database/entities/job-schedule-state.entity.ts | Extends JobType union with data_retention_poll. |
| apps/backend/src/data-retention/data-retention.service.ts | Implements polling logic, delta computation, and Prometheus counter increments. |
| apps/backend/src/data-retention/data-retention.service.spec.ts | Adds tests for polling behavior, batching, deltas, and edge cases. |
| apps/backend/src/data-retention/data-retention.module.ts | Exposes DataRetentionService via Nest module. |
| apps/backend/src/config/app.config.ts | Adds env validation + config fields for PDP subgraph and poll interval. |
| apps/backend/README.md | Documents PDP_SUBGRAPH_ENDPOINT and DATA_RETENTION_POLL_INTERVAL_SECONDS. |
| apps/backend/.env.example | Adds example values for new env vars. |
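Since the table above only names the subgraph service's batching and retry behavior, here is a rough standalone sketch of that pattern (illustrative only — `execute`, the batch size, and the retry count are assumptions, not the service's real API):

```typescript
// Illustrative sketch of a paginated fetch with bounded retries.
// `execute` stands in for the actual GraphQL client call.
async function fetchAllProviders<T>(
  execute: (skip: number, first: number) => Promise<T[]>,
  batchSize = 100,
  maxRetries = 3,
): Promise<T[]> {
  const all: T[] = [];
  for (let skip = 0; ; skip += batchSize) {
    let batch: T[] | undefined;
    for (let attempt = 1; attempt <= maxRetries; attempt++) {
      try {
        batch = await execute(skip, batchSize);
        break;
      } catch (err) {
        if (attempt === maxRetries) throw err; // retries exhausted
      }
    }
    all.push(...batch!);
    if (batch!.length < batchSize) return all; // last (short) page
  }
}
```

A short page (fewer rows than `batchSize`) signals the end of pagination, so no separate count query is needed.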
SgtPooki left a comment
A few comments.
We should add a few tests too:
- cover how we handle subgraph lag vs RPC block height
- test large deltas and assert we are handling them safely
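On the large-delta point, one property worth pinning down in tests is that cumulative period counts stay in BigInt end to end, since plain `number` arithmetic silently loses precision past `Number.MAX_SAFE_INTEGER`:

```typescript
// BigInt keeps period counts exact where double-precision numbers cannot.
const big = 9007199254740993n;          // Number.MAX_SAFE_INTEGER + 2n (odd, not representable as a double)
const asNumber = Number(big);           // rounds to the nearest representable double
const delta = big - 9007199254740991n;  // exact BigInt subtraction
```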
```typescript
    `Negative delta detected for provider ${address} (faulted: ${faultedDelta}, success: ${successDelta}); skipping counter update`,
  );
  return;
}
```
When is this possible? Are there re-orgs or subgraph corrections? Do we need to reset the baseline so that metrics aren't stalled when numbers dip below it?
Yes, there can be subgraph corrections. The baseline is reset to the current values in 710d49e.
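A minimal sketch of what delta tracking with that baseline reset could look like (the names here are hypothetical, not the PR's actual code; the real service feeds the deltas into Prometheus counters):

```typescript
// Hypothetical per-provider delta tracking that re-baselines on every poll,
// so a subgraph correction (lower totals) only skips one counter update.
type Totals = { faultedPeriods: bigint; successPeriods: bigint };

const baselines = new Map<string, Totals>();

function computeDeltas(address: string, current: Totals): Totals | null {
  const prev = baselines.get(address);
  baselines.set(address, current); // always re-baseline to the latest totals
  if (prev === undefined) return null; // first observation: nothing to emit
  const faulted = current.faultedPeriods - prev.faultedPeriods;
  const success = current.successPeriods - prev.successPeriods;
  if (faulted < 0n || success < 0n) {
    // Subgraph correction/re-org: skip this update; the baseline above is
    // already reset, so the next poll resumes normally.
    return null;
  }
  return { faultedPeriods: faulted, successPeriods: success };
}
```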
```typescript
/**
 * Get the current block number from the RPC provider
 */
async getBlockNumber(): Promise<number> {
  return await this.rpcProvider.getBlockNumber();
}
```
This block number will differ from the subgraph's indexed block number.
Replaced it with the subgraph's indexed block height in e761858.
```typescript
const estimatedOverduePeriods = proofSets.reduce((acc, proofSet) => {
  if (proofSet.maxProvingPeriod === 0n) {
    return acc;
  }
  return acc + (blockNumberBigInt - (proofSet.nextDeadline + 1n)) / proofSet.maxProvingPeriod;
}, 0n);
```
The subgraph could be X blocks behind the RPC chain head. We should compute the overdue periods against the subgraph's own indexed block height.
Now I'm using the subgraph's indexed block height instead of the RPC chain head; see e761858.
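The estimate from the quoted reduce can be sketched as a standalone helper evaluated against the subgraph's indexed block height (the `ProofSet` shape and the function name here are illustrative):

```typescript
// Sketch of the overdue-period estimate, evaluated against the subgraph's
// own indexed block height rather than the RPC chain head.
interface ProofSet {
  nextDeadline: bigint;     // block number of the next proving deadline
  maxProvingPeriod: bigint; // proving-period length in blocks
}

function estimateOverduePeriods(proofSets: ProofSet[], indexedBlock: bigint): bigint {
  return proofSets.reduce((acc, ps) => {
    if (ps.maxProvingPeriod === 0n) return acc; // guard against division by zero
    // BigInt division truncates, so only fully elapsed periods are counted.
    return acc + (indexedBlock - (ps.nextDeadline + 1n)) / ps.maxProvingPeriod;
  }, 0n);
}
```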
Pull request overview
Copilot reviewed 22 out of 22 changed files in this pull request and generated 4 comments.
```typescript
private readonly providerCumulativeTotals: Map<
  string,
  {
    faultedPeriods: bigint;
    successPeriods: bigint;
  }
>;

constructor(
  private readonly configService: ConfigService<IConfig, true>,
  private readonly walletSdkService: WalletSdkService,
  private readonly pdpSubgraphService: PDPSubgraphService,
  @InjectMetric("dataSetChallengeStatus")
  private readonly dataSetChallengeStatusCounter: Counter,
) {
  this.providerCumulativeTotals = new Map();
}
```
The `providerCumulativeTotals` Map grows indefinitely as provider addresses are added but never removed. If providers are dynamically added to or removed from the testing provider list (e.g., providers being approved/unapproved, or configuration changes), stale entries will accumulate in memory over time.
Consider implementing cleanup logic to periodically remove entries for providers that are no longer in the active testing provider list. For example, at the start of `pollDataRetention()`, you could:
- Get the current set of provider addresses
- Remove any entries from `providerCumulativeTotals` whose addresses are not in the current set

This would prevent unbounded memory growth while maintaining correct baseline tracking for active providers.
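That cleanup can be sketched as a small helper (purely illustrative — `pruneStaleProviders` and the lowercase address normalization are assumptions, not the PR's actual code):

```typescript
// Drop baseline entries for providers no longer in the active testing set.
function pruneStaleProviders(
  totals: Map<string, { faultedPeriods: bigint; successPeriods: bigint }>,
  activeAddresses: string[],
): void {
  const active = new Set(activeAddresses.map((a) => a.toLowerCase()));
  for (const address of totals.keys()) {
    if (!active.has(address.toLowerCase())) {
      totals.delete(address); // JS Map iterators tolerate deletion mid-loop
    }
  }
}
```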
Added stale-provider cleanup in aa0eff3.
Summary
This PR adds data retention monitoring capabilities to the dealbot backend by integrating with the PDP (Proof of Data Possession) subgraph. It introduces a new job that polls provider data retention statistics every hour (default) and exposes them as Prometheus metrics.
Changes
- `PDPSubgraphService` to query provider proof-set data from the subgraph