Conversation
Remove direct piece retrieval, use IPFS block fetch with DAG traversal and ?format=raw. Share IPNI verification for retrievals and data-storage, enforce precheck failures, randomize pg-boss retrieval selection for DEAL_CREATED, and update docs/metrics. Fixes #159
6d3496e to
7e6da7e
Compare
|
FYI I pushed this to |
There was a problem hiding this comment.
Pull request overview
This pull request aligns the retrieval check implementation with its documentation by removing deprecated direct piece retrieval, implementing IPFS block-fetch DAG validation, and adding a shared IPNI verification service. The changes ensure that retrieval checks now exclusively use IPFS gateway-based retrieval with full DAG traversal and CID verification, matching the documented behavior.
Changes:
- Removed direct piece retrieval strategy (
/pieceendpoint) and all references - Implemented DAG-based block fetching with
?format=rawquery parameter and hash verification for each block - Created shared
IpniVerificationServicefor reuse across data-storage and retrieval checks - Added IPNI pre-verification to retrieval flow (pgboss mode only) that fails the job before marking retrieval pending
- Enhanced retrieval deal selection with IPNI metadata filters and randomized RANDOM_PIECE_SIZES support
- Updated documentation to reflect current implementation including IPNI verification details and removed TBD items
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| docs/checks/retrievals.md | Updated retrieval documentation to describe IPFS block-fetch DAG validation, IPNI verification details, removed direct piece retrieval references, and marked implemented features as complete |
| apps/backend/src/retrieval/retrieval.service.ts | Added IPNI pre-verification for pgboss mode, integrated IpniVerificationService, enhanced deal selection with IPNI metadata filters, improved timeout handling and metrics recording |
| apps/backend/src/retrieval/retrieval.service.spec.ts | Updated tests to mock new dependencies (IpniVerificationService, DiscoverabilityCheckMetrics, ConfigService) |
| apps/backend/src/retrieval/retrieval.module.ts | Added IpniModule import to provide IpniVerificationService |
| apps/backend/src/retrieval-addons/strategies/direct.strategy.ts | Removed entire file - deprecated direct piece retrieval strategy |
| apps/backend/src/retrieval-addons/strategies/ipfs-block.strategy.ts | Updated to use ?format=raw query parameter, added HTTP status code tracking for validation results |
| apps/backend/src/retrieval-addons/retrieval-addons.service.ts | Removed DirectRetrievalStrategy registration, improved status code handling for validation errors |
| apps/backend/src/retrieval-addons/retrieval-addons.service.spec.ts | Updated test to reflect removal of DirectRetrievalStrategy |
| apps/backend/src/retrieval-addons/retrieval-addons.module.ts | Removed DirectRetrievalStrategy from providers |
| apps/backend/src/retrieval-addons/types.ts | Added httpStatusCode field to ValidationResult type |
| apps/backend/src/metrics/dto/failed-retrievals.dto.ts | Updated example URL from /piece/ to /ipfs/ endpoint |
| apps/backend/src/ipni/ipni.module.ts | New module exporting IpniVerificationService |
| apps/backend/src/ipni/ipni-verification.service.ts | New shared service for IPNI verification used by both deal-addons and retrieval checks |
| apps/backend/src/deal-addons/strategies/ipni.strategy.ts | Refactored to use shared IpniVerificationService, removed buildExpectedProviderInfo helper function |
| apps/backend/src/deal-addons/strategies/ipni.strategy.spec.ts | Updated tests to mock IpniVerificationService.verify instead of waitForIpniProviderResults |
| apps/backend/src/deal-addons/deal-addons.module.ts | Added IpniModule import |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
retrievals stopped running on dealbot.bt.sgtpooki.com
This is because the > kubectl exec -n dealbot dealbot-postgres-0 -i -- psql -U dealbot -d filecoin_dealbot <<'EOSQL'
WITH base AS (
SELECT *
FROM deals
WHERE sp_address = '0xCb9e86945cA31E6C3120725BF0385CBAD684040c'
)
SELECT
count(*) AS total,
count(*) FILTER (WHERE status = 'deal_created') AS deal_created,
count(*) FILTER (
WHERE status = 'deal_created'
AND metadata->'ipfs_pin'->>'enabled' = 'true'
) AS ipni_enabled,
count(*) FILTER (
WHERE status = 'deal_created'
AND metadata->'ipfs_pin'->>'enabled' = 'true'
AND metadata->'ipfs_pin'->>'rootCID' IS NOT NULL
) AS with_root
FROM base;
EOSQL
Defaulted container "postgres" out of: postgres, init-pgdata (init)
total | deal_created | ipni_enabled | with_root
-------+--------------+--------------+-----------
670 | 621 | 621 | 621
> kubectl exec -n dealbot deploy/dealbot-worker -- printenv RANDOM_DATASET_SIZES
Defaulted container "dealbot-worker" out of: dealbot-worker, wait-for-dealbot-backend (init)
10485760
> kubectl exec -n dealbot dealbot-postgres-0 -i -- psql -U dealbot -d filecoin_dealbot <<'EOSQL'
-- Check distinct sizes for that SP (what the config must include)
SELECT DISTINCT file_size
FROM deals
WHERE sp_address = '0xCb9e86945cA31E6C3120725BF0385CBAD684040c'
AND status = 'deal_created'
AND metadata->'ipfs_pin'->>'enabled' = 'true'
AND metadata->'ipfs_pin'->>'rootCID' IS NOT NULL;
EOSQL
Defaulted container "postgres" out of: postgres, init-pgdata (init)
file_size
-----------
10486897
(1 row)
> kubectl exec -n dealbot dealbot-postgres-0 -i -- psql -U dealbot -d filecoin_dealbot <<'EOSQL'
SELECT count(*) AS matches_with_size_filter
FROM deals
WHERE sp_address = '0xCb9e86945cA31E6C3120725BF0385CBAD684040c'
AND status = 'deal_created'
AND metadata->'ipfs_pin'->>'enabled' = 'true'
AND metadata->'ipfs_pin'->>'rootCID' IS NOT NULL
AND file_size IN (10485760); -- replace with your configured sizes
EOSQL
Defaulted container "postgres" out of: postgres, init-pgdata (init)
matches_with_size_filter
--------------------------
0
(1 row)This was fixed with |
silent-cipher
left a comment
There was a problem hiding this comment.
I went through the retrieval docs, which mention removing testing via the /piece path.
However, we also support configurations where dealbot is explicitly set not to make deals with IPNI enabled. In those cases, there wouldn’t be a way to test retrievals at all and even deal status would be FAILED. This leaves us with a decision to make:
- either we always require IPNI-enabled deals, or
- we keep the direct retrieval method for testing non-IPNI deals and only use the IPNI block-fetch strategy when IPNI is enabled.
Curious what the intended direction is here.
| ); | ||
| finalStatusEmitted = true; | ||
|
|
||
| return retrievals; |
There was a problem hiding this comment.
I’m a bit confused why this logic sits outside the try block.
If we end up with partial records, should we log a warning and then throw within the try so the catch block can handle recording the retrieval status?
Also, I would expect the check-duration observation to live in a finally block so it runs regardless of success or failure.
There was a problem hiding this comment.
I was wary to add check duration Ms for failed retrievals because that changes the meaning of the metric.. but i guess it is operationally accurate. we can split success/failure timings later if we need to
@silent-cipher we've talked for a few months now about narrowing down the retrieval check functionality to just be for IPFS since that's what we want to assert for GA. See #228 and #164 (which solve #158, #159, #160) for more info. Ultimately, IPNI verification should always be enabled. #157 was a step in this direction, but there's still a follow-up to remove that env var, since we should always be testing it. A "retrieval check" is defined in the docs now (linked above) as verifying that content can be retrieved from the SP via /ipfs url, and that an IPNI indexer has that SP listed as a provider. There are other "retrieval" checks dealbot will be validating in the future. e.g. the PDP subgraph checks you're working on, which is part of GA, and some things that are not, i.e. #300 |
|
Got it. But ipni indexing isn’t guaranteed for every storage provider, right? I’m asking because |
@silent-cipher : you're right it isn't guaranteed, but any SP that is going to be "approved" needs to support IPNI indexing and "standard IPFS retrieval". Dealbot's primary function right now is to assess which SPs should be approved, which is why it's fine for dealbot to check for IPNI indexing, IPFS retrieval, etc. |
|
Wanted to check in on random versus sorted by retrieval dates:
|
|
Should these represent the same concept of a fixed size?
Are these meant to be the same metric?
SP Maintenance Window section |


Summary
Align retrievals with IPNI-based verification and block-fetch DAG validation, and remove deprecated direct piece retrieval.
Fixes #159
Changes
?format=rawfor root + child blocks.DEAL_CREATEDdeals.Notes