feat(backend): add data retention polling with pdp subgraph integration #286

Open
silent-cipher wants to merge 16 commits into main from feat/backend/data-retention-metrics

Conversation


@silent-cipher silent-cipher commented Feb 17, 2026

Summary

This PR adds data retention monitoring capabilities to the dealbot backend by integrating with the PDP (Proof of Data Possession) subgraph. It introduces a new job that polls provider data retention statistics every hour (default) and exposes them as Prometheus metrics.

Changes

  • PDP Subgraph Integration: new `PDPSubgraphService` to query provider proof-set data from the subgraph
  • Data Retention Service: `DataRetentionService` that calculates estimated faulted and successful proving periods per provider
  • Prometheus Metrics: new `dataSetChallengeStatus` counter metric with labels `checkType`/`providerId`/`providerStatus`/`value`, where `value` is `success` or `fault`
  • Scheduled Job: new `data.retention.poll` job queue integrated with the pg-boss scheduler

@FilOzzy FilOzzy added this to FOC Feb 17, 2026
@github-project-automation github-project-automation bot moved this to 📌 Triage in FOC Feb 17, 2026
@silent-cipher silent-cipher self-assigned this Feb 17, 2026
@rjan90 rjan90 moved this from 📌 Triage to ⌨️ In Progress in FOC Feb 18, 2026
@rjan90 rjan90 added this to the M4.1: mainnet ready milestone Feb 18, 2026
@silent-cipher silent-cipher marked this pull request as ready for review February 18, 2026 17:43
Copilot AI review requested due to automatic review settings February 18, 2026 17:43
@silent-cipher (Collaborator, Author)

PR is open for review. However, it shouldn’t be merged until subgraph data is available (FilOzone/pdp-explorer#86).

Copilot AI (Contributor) left a comment


Pull request overview

Adds backend support for polling the PDP (Proof of Data Possession) subgraph on a schedule to derive per-provider data-retention statistics and emit them as Prometheus metrics, integrated into the existing pg-boss job system, configuration, and docs.

Changes:

  • Introduces `PDPSubgraphService` (+ module, query, response validation) and associated tests.
  • Adds `DataRetentionService` (+ module, tests) and wires a new `data.retention.poll` pg-boss job/schedule into `JobsService`.
  • Extends configuration and docs with `PDP_SUBGRAPH_ENDPOINT` and `DATA_RETENTION_POLL_INTERVAL_SECONDS`, plus a new Prometheus counter registration.

Reviewed changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| docs/environment-variables.md | Documents new env vars and scheduling option for data retention polling. |
| apps/backend/src/wallet-sdk/wallet-sdk.service.ts | Adds getBlockNumber() helper used by data retention polling. |
| apps/backend/src/wallet-sdk/wallet-sdk.service.spec.ts | Updates test config to include new blockchain config field. |
| apps/backend/src/pdp-subgraph/types.ts | Adds Joi-based validation/transforms for subgraph response types. |
| apps/backend/src/pdp-subgraph/types.spec.ts | Adds unit tests for subgraph response validation. |
| apps/backend/src/pdp-subgraph/queries.ts | Adds GraphQL query for providers and proof sets. |
| apps/backend/src/pdp-subgraph/pdp-subgraph.service.ts | Implements subgraph fetch with batching, rate limiting, retries, validation. |
| apps/backend/src/pdp-subgraph/pdp-subgraph.service.spec.ts | Adds tests for subgraph service fetch/retry/validation behavior. |
| apps/backend/src/pdp-subgraph/pdp-subgraph.module.ts | Exposes PDPSubgraphService via Nest module. |
| apps/backend/src/metrics-prometheus/metrics-prometheus.module.ts | Registers a new counter for data retention / dataset challenge status. |
| apps/backend/src/jobs/repositories/job-schedule.repository.ts | Adds queue-name mapping for data_retention_poll in pg-boss job state queries. |
| apps/backend/src/jobs/jobs.service.ts | Wires new data.retention.poll worker + schedule row + queue mapping + metrics tracking. |
| apps/backend/src/jobs/jobs.service.spec.ts | Updates job service tests for new dependency and new worker/schedule expectations. |
| apps/backend/src/jobs/jobs.module.ts | Imports DataRetentionModule so the jobs worker can execute the poller. |
| apps/backend/src/jobs/job-queues.ts | Adds DATA_RETENTION_POLL_QUEUE constant. |
| apps/backend/src/database/entities/job-schedule-state.entity.ts | Extends JobType union with data_retention_poll. |
| apps/backend/src/data-retention/data-retention.service.ts | Implements polling logic, delta computation, and Prometheus counter increments. |
| apps/backend/src/data-retention/data-retention.service.spec.ts | Adds tests for polling behavior, batching, deltas, and edge cases. |
| apps/backend/src/data-retention/data-retention.module.ts | Exposes DataRetentionService via Nest module. |
| apps/backend/src/config/app.config.ts | Adds env validation + config fields for PDP subgraph and poll interval. |
| apps/backend/README.md | Documents PDP_SUBGRAPH_ENDPOINT and DATA_RETENTION_POLL_INTERVAL_SECONDS. |
| apps/backend/.env.example | Adds example values for new env vars. |
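Per the table above, the new configuration surface is two environment variables. A hypothetical `.env` fragment (the endpoint URL and values here are illustrative, not taken from the PR's `.env.example`):

```shell
# Endpoint of the PDP subgraph queried by PDPSubgraphService (illustrative URL)
PDP_SUBGRAPH_ENDPOINT=https://example.com/subgraphs/name/pdp

# How often the data.retention.poll job runs; the PR's default is hourly
DATA_RETENTION_POLL_INTERVAL_SECONDS=3600
```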


@SgtPooki (Collaborator) left a comment


A few comments.

We should add a few tests too:

  1. Cover how we handle subgraph lag vs. RPC block height.
  2. Test large deltas and assert we are handling them safely.

```typescript
  `Negative delta detected for provider ${address} (faulted: ${faultedDelta}, success: ${successDelta}); skipping counter update`,
);
return;
}
```
Collaborator


When is this possible? Are there re-orgs or subgraph corrections? Do we need to reset the baseline so that metrics aren't stalled when the numbers dip below it?

Collaborator Author


Yes, there can be subgraph corrections. The baseline is reset to the current values in 710d49e.

```typescript
/**
 * Get the current block number from the RPC provider
 */
async getBlockNumber(): Promise<number> {
  return await this.rpcProvider.getBlockNumber();
}
```
Collaborator


This block number will be different from the subgraph's indexed block number.

Collaborator Author


Replaced it with the subgraph's indexed block height in e761858.

Comment on lines +123 to +128
```typescript
const estimatedOverduePeriods = proofSets.reduce((acc, proofSet) => {
  if (proofSet.maxProvingPeriod === 0n) {
    return acc;
  }
  return acc + (blockNumberBigInt - (proofSet.nextDeadline + 1n)) / proofSet.maxProvingPeriod;
}, 0n);
```
Collaborator


The subgraph could be X blocks behind the RPC chain head; we should compute overdue periods based on the subgraph's own indexed block height.

Collaborator Author


Now I'm using the subgraph's indexed block height instead of the RPC chain head, in e761858.
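A standalone sketch of the estimate from the quoted snippet, parameterized on the subgraph's indexed block height (with The Graph this is typically available via a `_meta { block { number } }` query) rather than the RPC chain head. Field names follow the snippet; the function wrapper is hypothetical:

```typescript
// Hypothetical standalone version of the overdue-periods estimate,
// taking the subgraph's own indexed block height as input so the
// result is consistent with the data the subgraph has actually seen.

interface ProofSet {
  nextDeadline: bigint;     // block by which the next proof is due
  maxProvingPeriod: bigint; // length of one proving period, in blocks
}

function estimateOverduePeriods(proofSets: ProofSet[], indexedBlockHeight: bigint): bigint {
  return proofSets.reduce((acc, proofSet) => {
    if (proofSet.maxProvingPeriod === 0n) {
      return acc; // avoid division by zero for unconfigured proof sets
    }
    // Whole proving periods elapsed since the deadline passed; BigInt
    // division truncates, so a partially elapsed period doesn't count.
    return acc + (indexedBlockHeight - (proofSet.nextDeadline + 1n)) / proofSet.maxProvingPeriod;
  }, 0n);
}
```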

Copilot AI (Contributor) left a comment


Pull request overview

Copilot reviewed 22 out of 22 changed files in this pull request and generated 4 comments.



Comment on lines 22 to 38
```typescript
private readonly providerCumulativeTotals: Map<
  string,
  {
    faultedPeriods: bigint;
    successPeriods: bigint;
  }
>;

constructor(
  private readonly configService: ConfigService<IConfig, true>,
  private readonly walletSdkService: WalletSdkService,
  private readonly pdpSubgraphService: PDPSubgraphService,
  @InjectMetric("dataSetChallengeStatus")
  private readonly dataSetChallengeStatusCounter: Counter,
) {
  this.providerCumulativeTotals = new Map();
}
```
Copilot AI commented Feb 23, 2026


The providerCumulativeTotals Map grows indefinitely as provider addresses are added but never removed. If providers are dynamically added or removed from the testing provider list (e.g., providers being approved or unapproved, or configuration changes), stale entries will accumulate in memory over time.

Consider implementing cleanup logic to periodically remove entries for providers that are no longer in the active testing provider list. For example, at the start of pollDataRetention(), you could:

  1. Get the current set of provider addresses.
  2. Remove any entries from providerCumulativeTotals whose addresses are not in the current set.

This would prevent unbounded memory growth while maintaining correct baseline tracking for active providers.

Collaborator Author


Added stale-provider cleanup in aa0eff3.
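The aa0eff3 change itself isn't shown in this thread; a minimal sketch of the cleanup Copilot suggested, assuming a baseline map keyed by provider address and a hypothetical `pruneStaleProviders` helper:

```typescript
// Sketch: drop baseline entries for providers no longer in the active
// testing list, so the map cannot grow without bound. (Illustrative;
// the actual change landed in aa0eff3.)

function pruneStaleProviders<V>(totals: Map<string, V>, activeAddresses: Iterable<string>): number {
  const active = new Set(activeAddresses);
  let removed = 0;
  for (const address of totals.keys()) {
    if (!active.has(address)) {
      totals.delete(address); // Map iteration tolerates deleting the current key
      removed++;
    }
  }
  return removed;
}
```

Running this at the start of each poll keeps baseline tracking intact for active providers while reclaiming memory for removed ones.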


Labels

None yet

Projects

Status: ⌨️ In Progress


3 participants