chunk id does not match its hash 0000000000000000000000000000000000000000000000000000000000000000 #334

@rachtsingh

I'm getting the following error from desync untar when extracting from an R2-backed S3 store:

Error: s3+https://<LOCATION>.r2.cloudflarestorage.com/<BUCKET>/chunks?lookup=path:
chunk id <CHUNK ID> does not match its hash 0000000000000000000000000000000000000000000000000000000000000000: unexpected EOF

After the failure, I fetched the same object directly from R2 with aws s3api get-object, ran zstd -t on it, and hashed the decompressed bytes with SHA512/256: every step succeeds and the hash matches the chunk ID. So the object in R2 does not appear to be corrupt.
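
For reference, here is roughly the same check as a small Go program. It's a sketch under a couple of assumptions: that the .cacnk object is a single zstd frame whose decompressed bytes hash (SHA512/256) to the chunk ID, and it uses github.com/klauspost/compress for zstd, which may not be the exact library desync links against.

package main

import (
    "crypto/sha512"
    "encoding/hex"
    "fmt"
    "os"

    "github.com/klauspost/compress/zstd"
)

func main() {
    // Path to the object previously fetched with `aws s3api get-object`.
    compressed, err := os.ReadFile(os.Args[1])
    if err != nil {
        panic(err)
    }
    dec, err := zstd.NewReader(nil)
    if err != nil {
        panic(err)
    }
    defer dec.Close()
    // Decompress the whole frame; a truncated frame fails loudly here,
    // the same class of error that `zstd -t` reports.
    plain, err := dec.DecodeAll(compressed, nil)
    if err != nil {
        panic(err)
    }
    // Print the SHA512/256 of the uncompressed data to compare against
    // the chunk ID in the error message.
    sum := sha512.Sum512_256(plain)
    fmt.Println(hex.EncodeToString(sum[:]))
}

Run it as go run main.go <downloaded .cacnk file> and compare the printed digest to the chunk ID from the error.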

After some poking around, I made a debug fork (https://github.com/rachtsingh/desync/tree/debug-r2-chunk-validation) that adds more detail to the error and retries validation/decompression failures in S3Store.GetChunk. That seems to fix the issue:

desync-debug/desync untar -n 32 -i --no-same-owner --no-same-permissions \
  -s 's3+https://<LOCATION>.r2.cloudflarestorage.com/<BUCKET>/chunks?lookup=path' \
  's3+https://<LOCATION>.r2.cloudflarestorage.com/<BUCKET>/indexes/<EXAMPLE>.caidx?lookup=path' \
  <DEST DIR>

desync debug: invalid chunk <CHUNK ID> from s3 object chunks/9046/<CHUNK ID>.cacnk after reading 720896 bytes attempt 1/4:
chunk id <CHUNK ID> does not match its hash 0000000000000000000000000000000000000000000000000000000000000000: unexpected EOF

desync debug: recovered chunk <CHUNK ID> from s3 object chunks/9046/<CHUNK ID>.cacnk after 2 attempts, final read 1620247 bytes

The same object key produced an unexpected EOF after reading 720896 bytes, then succeeded on retry after reading 1620247 bytes. I'm not sure whether the short read comes from minio-go, Go's HTTP transport, Cloudflare R2, or something else, but desync currently reports it as a hash mismatch against the all-zero chunk ID, which hides the underlying decompression/read error. I ran this a few times and hit the same failure on 4 different chunks, so it's probably not a single bad chunk.

Would it make sense for S3Store.GetChunk to treat NewChunkFromStorage validation/decompression failures as retryable under the existing --error-retry policy? Ideally it would also preserve the underlying error (unexpected EOF) in the final message if all retries fail.
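
To sketch what I mean (standalone toy code, not desync's internals: fetch stands in for the S3 object read, validate for NewChunkFromStorage, and the retry count and linear backoff are arbitrary):

package main

import (
    "errors"
    "fmt"
    "io"
    "time"
)

// getWithRetry retries both the read and the validation/decompression step,
// and wraps the last error with %w so the underlying cause stays visible.
func getWithRetry(retries int, fetch func() ([]byte, error), validate func([]byte) error) error {
    var lastErr error
    for attempt := 1; attempt <= retries; attempt++ {
        b, err := fetch()
        if err == nil {
            // Treat a validation/decompression failure as retryable too,
            // not just a failed GET.
            if err = validate(b); err == nil {
                return nil
            }
        }
        lastErr = err
        time.Sleep(time.Duration(attempt) * 500 * time.Millisecond)
    }
    return fmt.Errorf("failed after %d attempts: %w", retries, lastErr)
}

func main() {
    calls := 0
    fetch := func() ([]byte, error) {
        calls++
        if calls < 3 {
            return nil, io.ErrUnexpectedEOF // simulate a truncated body
        }
        return []byte("chunk data"), nil
    }
    validate := func([]byte) error { return nil }
    if err := getWithRetry(4, fetch, validate); err != nil {
        fmt.Println(err, "is unexpected EOF:", errors.Is(err, io.ErrUnexpectedEOF))
    } else {
        fmt.Printf("recovered after %d attempts\n", calls)
    }
}

Because the final error wraps the last underlying failure, errors.Is(err, io.ErrUnexpectedEOF) still works after all retries are exhausted, instead of everything surfacing as a hash mismatch against the all-zero ID.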
