Documenting "unconfirmed block transfer" history #123

@rvagg

This is now our next most frequent transfer error after #120, and it's a new one. It was discovered last week, validated here with #121, and turned from a silent false success into an explicit failure coming out of Filclient with application-research/filclient#101. So now we get `data transfer failed: unconfirmed block transfer` in the events database in `event_details>error`.

Summary of problem: a data transfer completes with no indication of errors or problems, but the blockstore doesn't have anything new. Graphsync is given the blockstore both to store new blocks in and to check whether blocks for a given transfer already exist, so it doesn't need to fetch them twice. This error is triggered when we finish a data transfer, check the blockstore for the originally requested payload CID, and find it isn't there. So Graphsync never received the block, and it had no reason to skip it as already present either.

Debugging such a transfer, we can see that Graphsync gets two responses to deal with:

  1. GraphSyncResponse<id=df88263a-6e62-49db-8983-82ebd1f5d848, status=15, exts=fil/data-transfer/incoming-request/1.1|fil/data-transfer/1.1|>
  2. GraphSyncResponse<id=df88263a-6e62-49db-8983-82ebd1f5d848, status=35, exts=>

That is, RequestPaused (status 15) then RequestCancelled (status 35), and no blocks get mentioned or transferred at all in either. There's no partial, completed, or failed status, or anything else.
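To make the two status codes in the log lines easier to read, here is a small sketch that maps them to names. The `responseStatus` type and `describe` helper are hypothetical and written for this example; only the two code-to-name pairs (15 and 35) come from the debugging output above, and every other code is deliberately reported as unknown rather than guessed.

```go
package main

import "fmt"

// responseStatus is a hypothetical local type for the numeric status
// field seen in the GraphSyncResponse log lines.
type responseStatus int

const (
	requestPaused    responseStatus = 15 // seen in the first response
	requestCancelled responseStatus = 35 // seen in the second response
)

// String names the two statuses from this issue and labels the rest unknown.
func (s responseStatus) String() string {
	switch s {
	case requestPaused:
		return "RequestPaused"
	case requestCancelled:
		return "RequestCancelled"
	default:
		return fmt.Sprintf("Unknown(%d)", int(s))
	}
}

// describe renders a sequence of statuses in the order they were received.
func describe(statuses []responseStatus) []string {
	names := make([]string, 0, len(statuses))
	for _, s := range statuses {
		names = append(names, s.String())
	}
	return names
}

func main() {
	// The sequence observed for request df88263a-6e62-49db-8983-82ebd1f5d848.
	fmt.Println(describe([]responseStatus{15, 35}))
}
```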

@hannahhoward's initial take is:

> I bet good money on the root of this problem: unseal errors. When retrieval markets code encounters an error, it cancels the data transfer. Based on where this error occurs in the sequence, it's almost surely an unseal error.
>
> There's nothing wrong with data transfer considering the transfer successful on cancel per se ... But retrieval code should send a new voucher result explaining it was a failure first so the other side can pick up what happened. Unfortunately v1 retrieval code has no way to directly send an arbitrary voucher result. ...

... some relevant lines of code to help explain:
https://github.com/filecoin-project/go-fil-markets/blob/ad3d7d3af4351e11544db840665a6a2f34433884/retrievalmarket/impl/providerstates/provider_fsm.go#L41
https://github.com/filecoin-project/go-fil-markets/blob/ad3d7d3af4351e11544db840665a6a2f34433884/retrievalmarket/impl/providerstates/provider_fsm.go#L131
https://github.com/filecoin-project/go-fil-markets/blob/9859795f76ad654d46d460acdc829731f3d42803/retrievalmarket/impl/providerstates/provider_states.go#L69


For now we've just turned it into an explicit failure on the client side, but we need to understand the causes and figure out if we can decrease the number of these we're encountering.
