Skip to content

Comments

fix: align retrieval checks with docs#299

Open
SgtPooki wants to merge 9 commits intomainfrom
fix/emit-all-retrieval-metrics
Open

fix: align retrieval checks with docs#299
SgtPooki wants to merge 9 commits intomainfrom
fix/emit-all-retrieval-metrics

Conversation

@SgtPooki
Copy link
Collaborator

Summary

Align retrievals with IPNI-based verification and block-fetch DAG validation, and remove deprecated direct piece retrieval.

Fixes #159

Changes

  • Remove direct piece retrieval strategy and references.
  • Use IPFS block fetch validation with ?format=raw for root + child blocks.
  • Add shared IPNI verification service for retrievals and data-storage checks.
  • Enforce retrieval precheck failures to fail the job before marking retrieval pending.
  • Randomize pg-boss retrieval selection across DEAL_CREATED deals.
  • Update retrieval docs and metrics examples to match current behavior.

Notes

  • Retrieval timeout now records a retrieval failure when prechecks passed.
  • This PR’s retrieval changes are scoped to the pg‑boss flow; legacy scheduling remains unchanged for now.

Copilot AI review requested due to automatic review settings February 20, 2026 03:13
@SgtPooki SgtPooki self-assigned this Feb 20, 2026
@FilOzzy FilOzzy added this to FOC Feb 20, 2026
@github-project-automation github-project-automation bot moved this to 📌 Triage in FOC Feb 20, 2026
@SgtPooki SgtPooki moved this from 📌 Triage to 🔎 Awaiting review in FOC Feb 20, 2026
@SgtPooki SgtPooki changed the title fix: align IPNI + block fetch validation with docs fix: align retrieval checks with docs Feb 20, 2026
Remove direct piece retrieval, use IPFS block fetch with DAG traversal and ?format=raw.
Share IPNI verification for retrievals and data-storage, enforce precheck failures,
randomize pg-boss retrieval selection for DEAL_CREATED, and update docs/metrics.

Fixes #159
@SgtPooki SgtPooki force-pushed the fix/emit-all-retrieval-metrics branch from 6d3496e to 7e6da7e Compare February 20, 2026 03:15
@SgtPooki
Copy link
Collaborator Author

FYI I pushed this to pre-release branch so it will deploy to dealbot.bt.sgtpooki.com so I can see how it runs overnight.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request aligns the retrieval check implementation with its documentation by removing deprecated direct piece retrieval, implementing IPFS block-fetch DAG validation, and adding a shared IPNI verification service. The changes ensure that retrieval checks now exclusively use IPFS gateway-based retrieval with full DAG traversal and CID verification, matching the documented behavior.

Changes:

  • Removed direct piece retrieval strategy (/piece endpoint) and all references
  • Implemented DAG-based block fetching with ?format=raw query parameter and hash verification for each block
  • Created shared IpniVerificationService for reuse across data-storage and retrieval checks
  • Added IPNI pre-verification to retrieval flow (pgboss mode only) that fails the job before marking retrieval pending
  • Enhanced retrieval deal selection with IPNI metadata filters and randomized RANDOM_PIECE_SIZES support
  • Updated documentation to reflect current implementation including IPNI verification details and removed TBD items

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
docs/checks/retrievals.md Updated retrieval documentation to describe IPFS block-fetch DAG validation, IPNI verification details, removed direct piece retrieval references, and marked implemented features as complete
apps/backend/src/retrieval/retrieval.service.ts Added IPNI pre-verification for pgboss mode, integrated IpniVerificationService, enhanced deal selection with IPNI metadata filters, improved timeout handling and metrics recording
apps/backend/src/retrieval/retrieval.service.spec.ts Updated tests to mock new dependencies (IpniVerificationService, DiscoverabilityCheckMetrics, ConfigService)
apps/backend/src/retrieval/retrieval.module.ts Added IpniModule import to provide IpniVerificationService
apps/backend/src/retrieval-addons/strategies/direct.strategy.ts Removed entire file - deprecated direct piece retrieval strategy
apps/backend/src/retrieval-addons/strategies/ipfs-block.strategy.ts Updated to use ?format=raw query parameter, added HTTP status code tracking for validation results
apps/backend/src/retrieval-addons/retrieval-addons.service.ts Removed DirectRetrievalStrategy registration, improved status code handling for validation errors
apps/backend/src/retrieval-addons/retrieval-addons.service.spec.ts Updated test to reflect removal of DirectRetrievalStrategy
apps/backend/src/retrieval-addons/retrieval-addons.module.ts Removed DirectRetrievalStrategy from providers
apps/backend/src/retrieval-addons/types.ts Added httpStatusCode field to ValidationResult type
apps/backend/src/metrics/dto/failed-retrievals.dto.ts Updated example URL from /piece/ to /ipfs/ endpoint
apps/backend/src/ipni/ipni.module.ts New module exporting IpniVerificationService
apps/backend/src/ipni/ipni-verification.service.ts New shared service for IPNI verification used by both deal-addons and retrieval checks
apps/backend/src/deal-addons/strategies/ipni.strategy.ts Refactored to use shared IpniVerificationService, removed buildExpectedProviderInfo helper function
apps/backend/src/deal-addons/strategies/ipni.strategy.spec.ts Updated tests to mock IpniVerificationService.verify instead of waitForIpniProviderResults
apps/backend/src/deal-addons/deal-addons.module.ts Added IpniModule import

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@SgtPooki
Copy link
Collaborator Author

retrievals stopped running on dealbot.bt.sgtpooki.com

image

This is because the AND file_size IN (:...randomDatasetSizes) filter does not match any successful deals, since the successful deals currently save the size of the CAR FILE uploaded (which differs from the content size we generate), rather than the actual content size.

> kubectl exec -n dealbot dealbot-postgres-0 -i -- psql -U dealbot -d filecoin_dealbot <<'EOSQL'
WITH base AS (
  SELECT *
  FROM deals
  WHERE sp_address = '0xCb9e86945cA31E6C3120725BF0385CBAD684040c'
)
SELECT
  count(*) AS total,
  count(*) FILTER (WHERE status = 'deal_created') AS deal_created,
  count(*) FILTER (
    WHERE status = 'deal_created'
      AND metadata->'ipfs_pin'->>'enabled' = 'true'
  ) AS ipni_enabled,
  count(*) FILTER (
    WHERE status = 'deal_created'
      AND metadata->'ipfs_pin'->>'enabled' = 'true'
      AND metadata->'ipfs_pin'->>'rootCID' IS NOT NULL
  ) AS with_root
FROM base;
EOSQL
Defaulted container "postgres" out of: postgres, init-pgdata (init)
 total | deal_created | ipni_enabled | with_root 
-------+--------------+--------------+-----------
   670 |          621 |          621 |       621
   

> kubectl exec -n dealbot deploy/dealbot-worker -- printenv RANDOM_DATASET_SIZES
Defaulted container "dealbot-worker" out of: dealbot-worker, wait-for-dealbot-backend (init)
10485760


> kubectl exec -n dealbot dealbot-postgres-0 -i -- psql -U dealbot -d filecoin_dealbot <<'EOSQL'
-- Check distinct sizes for that SP (what the config must include)
SELECT DISTINCT file_size
FROM deals
WHERE sp_address = '0xCb9e86945cA31E6C3120725BF0385CBAD684040c'
  AND status = 'deal_created'
  AND metadata->'ipfs_pin'->>'enabled' = 'true'
  AND metadata->'ipfs_pin'->>'rootCID' IS NOT NULL;

EOSQL
Defaulted container "postgres" out of: postgres, init-pgdata (init)
 file_size 
-----------
  10486897
(1 row)


> kubectl exec -n dealbot dealbot-postgres-0 -i -- psql -U dealbot -d filecoin_dealbot <<'EOSQL'
SELECT count(*) AS matches_with_size_filter
FROM deals
WHERE sp_address = '0xCb9e86945cA31E6C3120725BF0385CBAD684040c'
  AND status = 'deal_created'
  AND metadata->'ipfs_pin'->>'enabled' = 'true'
  AND metadata->'ipfs_pin'->>'rootCID' IS NOT NULL
  AND file_size IN (10485760); -- replace with your configured sizes
EOSQL
Defaulted container "postgres" out of: postgres, init-pgdata (init)
 matches_with_size_filter 
--------------------------
                        0
(1 row)

This was fixed with fb72510 (this PR) and pushed to pre-release

@SgtPooki
Copy link
Collaborator Author

retrievals are running again:

image

Copy link
Collaborator

@silent-cipher silent-cipher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went through the retrieval docs, which mention removing testing via the /piece path.
However, we also support configurations where dealbot is explicitly set not to make deals with IPNI enabled. In those cases, there wouldn’t be a way to test retrievals at all and even deal status would be FAILED. This leaves us with a decision to make:

  • either we always require IPNI-enabled deals, or
  • we keep the direct retrieval method for testing non-IPNI deals and only use the IPNI block-fetch strategy when IPNI is enabled.

Curious what the intended direction is here.

Comment on lines 226 to 248
);
finalStatusEmitted = true;

return retrievals;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m a bit confused why this logic sits outside the try block.
If we end up with partial records, should we log a warning and then throw within the try so the catch block can handle recording the retrieval status?
Also, I would expect the check-duration observation to live in a finally block so it runs regardless of success or failure.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wary to add check duration Ms for failed retrievals because that changes the meaning of the metric.. but i guess it is operationally accurate. we can split success/failure timings later if we need to

@SgtPooki
Copy link
Collaborator Author

SgtPooki commented Feb 23, 2026

However, we also support configurations where dealbot is explicitly set not to make deals with IPNI enabled. In those cases, there wouldn’t be a way to test retrievals at all and even deal status would be FAILED

@silent-cipher we've talked for a few months now about narrowing down the retrieval check functionality to just be for IPFS since that's what we want to assert for GA. See #228 and #164 (which solve #158, #159, #160) for more info.

Ultimately, IPNI verification should always be enabled. #157 was a step in this direction, but there's still a follow-up to remove that env var, since we should always be testing it. A "retrieval check" is defined in the docs now (linked above) as verifying that content can be retrieved from the SP via /ipfs url, and that an IPNI indexer has that SP listed as a provider.

There are other "retrieval" checks dealbot will be validating in the future. e.g. the PDP subgraph checks you're working on, which is part of GA, and some things that are not, i.e. #300

@SgtPooki SgtPooki mentioned this pull request Feb 23, 2026
3 tasks
@silent-cipher
Copy link
Collaborator

Got it. But ipni indexing isn’t guaranteed for every storage provider, right? I’m asking because serviceProviderRegistry exposes ipniIpfs ipniPiece and ipniPeerId flag for SPs, which suggests it’s optional.

@BigLep
Copy link
Contributor

BigLep commented Feb 23, 2026

Got it. But ipni indexing isn’t guaranteed for every storage provider, right? I’m asking because serviceProviderRegistry exposes ipniIpfs ipniPiece and ipniPeerId flag for SPs, which suggests it’s optional.

@silent-cipher : you're right it isn't guaranteed, but any SP that is going to be "approved" needs to support IPNI indexing and "standard IPFS retrieval". Dealbot's primary function right now is to assess which SPs should be approved, which is why it's fine for dealbot to check for IPNI indexing, IPFS retrieval, etc.

@timfong888
Copy link

Wanted to check in on random versus sorted by retrieval dates:

  • Linear doc: "queue sorted by date of last retrieval attempt — longest since last retrieval at top" (priority queue)
  • GitHub retrievals.md: "randomly selects" pieces

@timfong888
Copy link

Should these represent the same concept of a fixed size?

  • Production doc: RANDOM_PIECE_SIZES = 10485760 (10MB, single value)
  • data-storage.md default: 10240,10485760,104857600 (10 KiB, 10 MB, 100 MB)

Are these meant to be the same metric?

  • Production doc uses NUM_RETRIEVAL_CHECKS_PER_SP_PER_HOUR
  • retrievals.md uses MAX_RETRIEVAL_CHECKS_PER_SP_PER_CYCLE

SP Maintenance Window section
Do we have a fixed start and end time so SP's can plan?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: 🔎 Awaiting review

Development

Successfully merging this pull request may close these issues.

GA "Retrieval" check implementation and documentation

4 participants