
Retry S3 chunk validation failures and preserve the underlying error #335

Draft

folbricht wants to merge 1 commit into master from fix-334-s3-chunk-validation-retry

Conversation

@folbricht
Owner

Fixes #334.

Problem

desync untar against an S3-compatible store (Cloudflare R2 in the report) intermittently failed with:

chunk id <ID> does not match its hash 0000000000000000000000000000000000000000000000000000000000000000: unexpected EOF

The object in the store is fine. The real cause is a transient short read of the chunk body somewhere between R2 and minio-go/the HTTP transport. When that happens, io.ReadAll in S3Store.GetChunk returns truncated bytes and, inside NewChunkFromStorage, Chunk.Data() fails to zstd-decode the partial stream. Chunk.ID() discards that error and returns the zero ChunkID{}, so the failure is reported as a hash mismatch against 0000…, hiding the real error. On top of that, GetChunk's --error-retry loop only covered GetObject and io.ReadAll, not NewChunkFromStorage, so the transient failure was fatal.
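For illustration only, a minimal Go sketch of that error-swallowing shape; the type names and the SHA-512/256 digest are simplified stand-ins, not the actual chunk.go code:

```go
package main

import (
	"crypto/sha512"
	"errors"
	"fmt"
)

// Simplified stand-ins for the desync types; names and digest choice are
// assumptions for illustration, not the real source.
type ChunkID [32]byte

type Chunk struct{}

// Data would normally zstd-decode the stored bytes; here it fails the way a
// truncated body does.
func (c *Chunk) Data() ([]byte, error) {
	return nil, errors.New("unexpected EOF")
}

// ID swallows the decode error, so callers see an all-zero hash instead of
// the real cause: the behavior this PR removes.
func (c *Chunk) ID() ChunkID {
	b, err := c.Data()
	if err != nil {
		return ChunkID{} // source of the bogus 0000... "hash"
	}
	return ChunkID(sha512.Sum512_256(b))
}

func main() {
	var c Chunk
	fmt.Printf("%x\n", c.ID()) // 64 zeros; the EOF error is gone
}
```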

Changes

  • errors.go — ChunkInvalid gets an optional Err (underlying cause) and an Unwrap(); Error() reports the cause when present instead of the bogus zero-hash message (see the first sketch after this list).
  • chunk.go — NewChunkFromStorage / NewChunkWithID call c.Data() explicitly and return ChunkInvalid{ID: id, Err: err} on a decode failure, rather than letting ID() swallow it. Still ChunkInvalid, so RepairableCache and verify --repair keep self-healing such chunks.
  • s3.go — S3Store.GetChunk retries NewChunkFromStorage validation/decompression failures under the existing --error-retry policy (same backoff as the other retry blocks), logging each retry, and closes the object reader before looping so deferred closes don't stack (see the retry sketch after this list).
  • Tests — new s3ErrCorruptBody mode in the fake S3 server (a well-formed HTTP response with a truncated body, the issue's scenario) plus corrupt_body_fail / corrupt_body_recover subtests, and a new chunk_test.go covering the ChunkInvalid.Err paths.
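
A sketch of the errors.go change described above, assuming ChunkInvalid keeps ID/Sum fields; the exact shape and message format in the real errors.go may differ:

```go
package main

import (
	"errors"
	"fmt"
)

// Placeholder ID type; the real one is desync's ChunkID.
type ChunkID [32]byte

// ChunkInvalid with an optional underlying cause, as described above. Field
// names follow the PR text but are not copied from the repository.
type ChunkInvalid struct {
	ID  ChunkID
	Sum ChunkID
	Err error // optional underlying cause, e.g. a zstd decode error
}

func (e ChunkInvalid) Error() string {
	if e.Err != nil {
		return fmt.Sprintf("chunk %x is invalid: %v", e.ID, e.Err)
	}
	return fmt.Sprintf("chunk id %x does not match its hash %x", e.ID, e.Sum)
}

// Unwrap lets errors.Is / errors.As reach the underlying cause.
func (e ChunkInvalid) Unwrap() error { return e.Err }

func main() {
	cause := errors.New("unexpected EOF")
	var err error = ChunkInvalid{ID: ChunkID{0xde, 0xad}, Err: cause}
	fmt.Println(err)                   // reports the cause, not a zero-hash mismatch
	fmt.Println(errors.Is(err, cause)) // true, thanks to Unwrap
}
```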
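And a sketch of the retry shape in s3.go: validation failures are retried under the same loop as read errors, and the reader is closed before each new attempt. The helper names (getObject, newChunkFromStorage), the fields on S3Store, and the fixed-interval backoff are placeholders standing in for the real minio calls and --error-retry policy:

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"time"
)

// Placeholder types; the real ones live in chunk.go / s3.go.
type ChunkID [32]byte
type Chunk struct{ id ChunkID }

type S3Store struct {
	retries   int
	retryWait time.Duration
}

// getObject stands in for the minio GetObject call; it fails the first
// attempt with a short body and then succeeds, to exercise the retry path.
var calls int

func (s *S3Store) getObject(id ChunkID) (io.ReadCloser, error) {
	calls++
	if calls == 1 {
		return io.NopCloser(bytes.NewReader([]byte("trunc"))), nil
	}
	return io.NopCloser(bytes.NewReader([]byte("full-chunk-body"))), nil
}

// newChunkFromStorage stands in for desync.NewChunkFromStorage: it rejects
// the truncated body the way a failed zstd decode would.
func newChunkFromStorage(id ChunkID, b []byte) (*Chunk, error) {
	if len(b) < 10 {
		return nil, errors.New("unexpected EOF")
	}
	return &Chunk{id: id}, nil
}

func (s *S3Store) GetChunk(id ChunkID) (*Chunk, error) {
	var lastErr error
	for attempt := 0; attempt <= s.retries; attempt++ {
		if attempt > 0 {
			time.Sleep(s.retryWait) // same backoff for read and validation errors
		}
		obj, err := s.getObject(id)
		if err != nil {
			lastErr = err
			continue
		}
		b, err := io.ReadAll(obj)
		obj.Close() // close before looping so deferred closes don't stack
		if err != nil {
			lastErr = err
			continue
		}
		c, err := newChunkFromStorage(id, b)
		if err != nil {
			lastErr = err // validation failure is now retried, not fatal
			continue
		}
		return c, nil
	}
	return nil, lastErr
}

func main() {
	s := &S3Store{retries: 2, retryWait: 10 * time.Millisecond}
	c, err := s.GetChunk(ChunkID{})
	fmt.Println(c != nil, err) // true <nil>: the second attempt succeeded
}
```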

go build ./..., go vet ./..., gofmt -l ., and go test ./... all pass.

Out of scope

gcs.go and sftp.go GetChunk have no --error-retry loop at all, and remotehttp.go has one but doesn't cover NewChunkFromStorage. They all benefit from the improved error message here, but adding retry loops there is a separate change — happy to include it if wanted.

A transient short read of a chunk body from an S3-compatible store (e.g.
Cloudflare R2) leaves S3Store.GetChunk with truncated data that fails to
decompress. Chunk.ID() discarded that error and returned the zero ChunkID,
so the failure surfaced as the misleading "chunk id <id> does not match its
hash 0000...0000", and GetChunk's --error-retry loop didn't cover it, making
the transient failure fatal.

- ChunkInvalid now carries an optional underlying cause (Err) and implements
  Unwrap(); its message reports the cause when present.
- NewChunkFromStorage / NewChunkWithID capture the decode error instead of
  producing a bogus zero-hash mismatch, still returning ChunkInvalid so
  RepairableCache and 'verify --repair' keep handling such chunks.
- S3Store.GetChunk retries NewChunkFromStorage failures under the existing
  --error-retry policy and closes the object reader before retrying.
- Tests: new s3ErrCorruptBody fake-server mode plus corrupt_body_fail /
  corrupt_body_recover subtests, and a chunk_test.go covering ChunkInvalid.Err.

Fixes #334