fix(test): resolve rate-limit failures when downloading datasets#9658
shiva-istari wants to merge 3 commits into main
Conversation
Force-pushed from adc1378 to 12d07b5
mlwelles left a comment
Additional findings beyond the code duplication and cache key concerns already noted:
Note: `t/t.go:1133` still has `var suffix = "?raw=true"`, which gets appended to `media.githubusercontent.com` URLs via `baseUrl + name + suffix`. The `?raw=true` parameter is a GitHub blob-URL convention and is unnecessary for direct CDN URLs. Not harmful, but misleading — it should be removed for consistency with the other URL updates in this PR.
## Summary
- Bumps Trivy from `0.65.0` to `0.69.3` in `.trunk/trunk.yaml`
- Trivy `v0.65.0` was never published — the version scheme jumped from `v0.26.0` to `v0.69.2`, causing a 404 when trunk tries to download the hermetic tool binary
- This fixes the "Trunk Code Quality / Check" CI failure seen on #9658 and all other PRs
Force-pushed from 12d07b5 to 8175ecc
Force-pushed from 1515821 to 8e86446
Force-pushed from 8e86446 to ae0d4f7
Force-pushed from ae0d4f7 to ebcd77c
Force-pushed from ebcd77c to 4ee619c
mlwelles left a comment
Re-review against 4ee619c. All eight threads from prior rounds are resolved cleanly — the new benchdata package matches what the last comment described, errors now propagate via errgroup, the --keep-data flag is wired through, the cache key is bound to benchmark-data-version, and LFS detection is dynamic via the Contents API. Nice work.
A few new findings on the introduced code; nothing blocking, but worth addressing before merge:
- 🟡 No tests on `benchdata` despite hand-rolled LFS pointer parsing, retry/backoff, gzip-magic validation, and ref resolution priority — these are exactly the helpers that silently break.
- 🟡 `EnsureFiles` ignores the context returned by `errgroup.WithContext` — first failure won't cancel sibling downloads.
- 🟡 No per-attempt timeout on `http.DefaultClient` — a stalled connection hangs CI indefinitely.
- 🔵 A few smaller consistency / correctness issues inline.
Inline comments follow.
| "time" | ||
|
|
||
| "golang.org/x/sync/errgroup" | ||
| ) |
risk: no _test.go in this package. The hand-rolled LFS pointer parser, retry/backoff, gzip-magic check, and DataRef precedence are pure functions ideal for table tests. Without them, any regression to (e.g.) parseLFSPointer for a pointer with CRLF line endings, or a DataRef precedence change, will fail silently in CI under cache-hit and surface only on the next miss — which by then nobody is watching for.
At minimum: parseLFSPointer, validateFile, verifyChecksum, DataRef, and an httptest-backed resolveAndDownload round-trip.
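A minimal sketch of what one such table test could look like, assuming a `parseLFSPointer(data []byte) (*lfsPointer, error)` shape inferred from this review; the PR's actual signature and return types may differ:

```go
package benchdata

import "testing"

// Hypothetical table test for the pure pointer-parsing helper. The expected
// outcomes below are illustrative; the real value of such a table is pinning
// down edge-case behavior (e.g. CRLF line endings) so regressions fail loudly.
func TestParseLFSPointer(t *testing.T) {
	cases := []struct {
		name    string
		input   string
		wantPtr bool
	}{
		{"valid pointer", "version https://git-lfs.github.com/spec/v1\noid sha256:abc\nsize 42\n", true},
		{"CRLF endings", "version https://git-lfs.github.com/spec/v1\r\noid sha256:abc\r\nsize 42\r\n", true},
		{"plain JSON", `{"schema": []}`, false},
		{"empty input", "", false},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			ptr, err := parseLFSPointer([]byte(tc.input))
			if tc.wantPtr && (err != nil || ptr == nil) {
				t.Fatalf("want pointer, got ptr=%v err=%v", ptr, err)
			}
			if !tc.wantPtr && ptr != nil {
				t.Fatalf("want nil for non-pointer input, got %v", ptr)
			}
		})
	}
}
```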
```go
		paths[i] = filepath.Join(destDir, string(f))
	}

	g, _ := errgroup.WithContext(context.Background())
```
risk: ctx is discarded with _. With errgroup.WithContext, that context is cancelled when any goroutine returns a non-nil error — but downloadToFile calls http.NewRequest, not http.NewRequestWithContext, so a sibling failure won't abort in-flight downloads of large LFS blobs. On a 21M.rdf.gz failure you'll still wait for the other four to finish (or time out).
Fix: capture the ctx, plumb it into downloadToFile via http.NewRequestWithContext(ctx, ...), and into fetchRawBlob similarly. resolveAndDownload should accept ctx as its first arg.
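A minimal sketch of that plumbing; only the stdlib and errgroup calls are verbatim, and the helper shapes are assumptions standing in for the PR's `EnsureFiles`/`downloadToFile`:

```go
package benchdata

import (
	"context"
	"io"
	"net/http"
	"os"

	"golang.org/x/sync/errgroup"
)

// downloadAll is a hypothetical stand-in for EnsureFiles' download loop:
// capture the errgroup context and thread it into each request so the first
// failure cancels sibling downloads.
func downloadAll(urls, dests []string) error {
	g, ctx := errgroup.WithContext(context.Background())
	for i := range urls {
		url, dest := urls[i], dests[i]
		g.Go(func() error { return downloadToFile(ctx, url, dest) })
	}
	return g.Wait()
}

func downloadToFile(ctx context.Context, url, dest string) error {
	// NewRequestWithContext aborts the in-flight body read on cancellation.
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
	if err != nil {
		return err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	out, err := os.Create(dest)
	if err != nil {
		return err
	}
	defer out.Close()
	_, err = io.Copy(out, resp.Body)
	return err
}
```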
```go
		req.Header.Set("Authorization", "Bearer "+token)
	}

	resp, err := http.DefaultClient.Do(req)
```
risk: http.DefaultClient has no timeout. A stalled TCP connection (no FIN, no RST) can wedge a download forever — the CI job hangs until the workflow-level timeout, far past where a useful error would appear.
Also — http.DefaultClient is shared global state; if anything else in the binary tweaks it, this download silently inherits that. Use a package-private client:
```go
var httpClient = &http.Client{Timeout: 10 * time.Minute}
```
Or better, derive the per-attempt timeout from a context with a deadline.
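A sketch of the context-deadline variant; the helper name, the ten-minute figure, and reading the whole body into memory are all illustrative:

```go
package benchdata

import (
	"context"
	"io"
	"net/http"
	"time"
)

// getWithDeadline gives each attempt its own deadline derived from the
// caller's context, so a stalled connection fails that attempt quickly
// instead of wedging the whole download.
func getWithDeadline(ctx context.Context, url string, perAttempt time.Duration) ([]byte, error) {
	attemptCtx, cancel := context.WithTimeout(ctx, perAttempt)
	// Safe to cancel on return because the body is fully read below; a
	// streaming variant would need to keep the context alive until the
	// copy to disk finishes.
	defer cancel()
	req, err := http.NewRequestWithContext(attemptCtx, http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	return io.ReadAll(resp.Body)
}
```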
```go
		return nil, err
	}
	// Read exactly 1KB. If there's more data, it's not an LFS pointer.
	return buf[:n], nil
```
nit: the comment on L259 says "if there's more data, it's not an LFS pointer" but the code returns buf[:n] (1KB) regardless — parseLFSPointer will then check the prefix, find no LFS magic in the first 1KB of (e.g.) a small JSON schema, and correctly return nil. That works, but it's coincidental, not enforced. If a non-LFS file's first 1KB ever happens to start with the LFS magic prefix (extremely unlikely but not impossible for a corrupted commit), we'd misidentify it.
Consider: peek with bufio.Reader.Peek(maxPointerSize) and explicitly require the response body to be exactly the pointer length when LFS-detected. Or check Content-Length header, which the Contents API sets correctly.
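A sketch of the Peek-based approach; the constant and prefix values are assumptions standing in for whatever the PR's benchdata package defines:

```go
package benchdata

import (
	"bufio"
	"bytes"
	"io"
)

// Assumed values; the real definitions live in the PR's benchdata package.
const maxPointerSize = 1024

var lfsPointerPrefix = []byte("version https://git-lfs.github.com/spec/v1")

// looksLikeLFSPointer inspects the first bytes of the body without consuming
// them, then hands back a reader still positioned at the start either way.
func looksLikeLFSPointer(r io.Reader) (*bufio.Reader, bool) {
	br := bufio.NewReaderSize(r, maxPointerSize)
	// Peek returns whatever is buffered on a short read, which is enough
	// for a prefix check.
	head, _ := br.Peek(maxPointerSize)
	return br, bytes.HasPrefix(head, lfsPointerPrefix)
}
```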
```go
	lfsInfo, err := detectLFS(path, ref)
	if err != nil {
		// GitHub API unavailable (rate limit, network). Try media URL first
```
risk: when detectLFS fails (rate limit / network), the code falls back to trying both CDN URLs sequentially. That doubles the request count on the very condition (rate limiting) the PR is trying to fix. If the GitHub Contents API is rate-limited, media.githubusercontent.com may also be — you'll burn 2 fallback URLs × 3 retry attempts × N files before failing.
Safer: distinguish 403/429 (rate limit, abort) from 5xx/network (transient, fall back). A 403 from the API means you don't have an authoritative answer about LFS-tracking; falling back is wrong on first principles.
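A sketch of that triage, with an illustrative helper name:

```go
package benchdata

import "net/http"

// shouldFallBack decides whether a failed Contents API call justifies
// falling back to the CDN URLs: rate limiting is treated as authoritative
// (abort), while server/network failures are treated as transient.
func shouldFallBack(status int, err error) bool {
	if err != nil {
		return true // network-level error: transient, try the media URL
	}
	switch {
	case status == http.StatusForbidden, status == http.StatusTooManyRequests:
		return false // rate limited: the CDN is likely throttled too; abort
	case status >= 500:
		return true // server error: transient, safe to fall back
	default:
		return false
	}
}
```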
```go
	if bytes.HasPrefix(header, lfsPointerPrefix) {
		return fmt.Errorf("file is a Git LFS pointer, not actual content")
	}
	if strings.HasSuffix(fpath, ".gz") && (n < 2 || !bytes.HasPrefix(header, gzipMagic)) {
```
nit: gzip-magic check is strings.HasSuffix(fpath, ".gz"). The TestDataFile constants encode this perfectly — a check on the filename string is one indirection too many. Consider exposing a (f TestDataFile) IsGzip() bool method or registering content-type metadata next to repoFilePath. This also lets you validate .rdf files (they should at least be UTF-8) and .schema files separately, instead of "everything not-gz is fine".
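A sketch of the method form; TestDataFile is assumed to be string-backed, as the call sites in this PR suggest:

```go
package benchdata

import "strings"

// IsGzip reports whether this data file is expected to be gzip-compressed.
// Centralizing the check on the type keeps validateFile from re-parsing the
// destination path.
func (f TestDataFile) IsGzip() bool {
	return strings.HasSuffix(string(f), ".gz")
}
```

validateFile can then branch on `f.IsGzip()` instead of re-deriving the answer from `fpath`.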
```go
			return ref
		}
	}
	return "main"
```
nit: silent fallback to "main" defeats the reproducibility goal of having a benchmark-data-version file. If the file exists but is empty/whitespace, falling back to main will silently use whatever HEAD points to. Prefer logging a warning when the file exists but yields no usable ref, so cache invalidation surprises (cache key from the file, but downloads from main) are discoverable.
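A sketch of that warning; the function shape and the hypothetical versionFilePath parameter stand in for however the PR locates the file:

```go
package benchdata

import (
	"log"
	"os"
	"strings"
)

// resolveRef is a hypothetical stand-in for the PR's ref resolution.
func resolveRef(versionFilePath string) string {
	if data, err := os.ReadFile(versionFilePath); err == nil {
		if ref := strings.TrimSpace(string(data)); ref != "" {
			return ref
		}
		// The file exists but is empty/whitespace: say so loudly, instead of
		// silently tracking main while the cache key says otherwise.
		log.Printf("warning: %s contains no usable ref; falling back to main", versionFilePath)
	}
	return "main"
}
```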
```go
	return nil
}

func pkgDir() string {
```
nit: pkgDir() uses runtime.Caller to find benchmark-data-version next to the source file. This breaks when the binary is built with -trimpath or shipped without source — thisFile becomes "benchdata/benchdata.go" (relative), and os.ReadFile will resolve it against CWD, which is whatever shell happened to invoke ./t.
Safer: //go:embed benchmark-data-version and read from the embedded FS. That also pins the version into the binary, which is exactly what you want for reproducibility.
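A sketch of the embed-based alternative; the file name matches the PR, but the variable and helper names are illustrative:

```go
package benchdata

import (
	_ "embed"
	"strings"
)

// Embedding pins the version file into the binary, so -trimpath builds and
// binaries shipped without source stop mattering.
//
//go:embed benchmark-data-version
var benchmarkDataVersion string

func pinnedRef() string {
	return strings.TrimSpace(benchmarkDataVersion)
}
```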
```diff
 	err := run()
-	_ = os.RemoveAll(*tmp)
+	if !*keepData && *tmp != "" {
```
nit: *tmp != "" guard is correct but the contract is murky — if run() populated *tmp then errored out before downloading anything, we still skip cleanup with --keep-data but otherwise remove it. That's the right behavior. However, if pflag.Parse() panics (malformed flag) the cleanup never runs at all. Consider defer after parse:
```go
pflag.Parse()
defer func() {
	if !*keepData && *tmp != "" {
		_ = os.RemoveAll(*tmp)
	}
}()
```
Minor since the panic path is rare, but the defer form is also more idiomatic.
```diff
-	if _, err := cmd.CombinedOutput(); err != nil {
-		return fmt.Errorf("error downloading file %s: %w", fname, err)
+	paths, err := benchdata.EnsureFiles(datasetFilesPath, benchdata.DataRef(""), benchdata.TestDataFile(filename))
```
nit: passing benchdata.DataRef("") here means dgraphtest callers can never override the ref, and silently get whichever ref the env var or version file resolves to. That's fine for now — but worth a comment noting the ref source isn't pluggable from this entry point, so future readers don't assume it is.
CI workflows on forked PRs fail with HTTP 429 (Too Many Requests) when downloading benchmark datasets from GitHub, because shared-IP runners hit rate limits on unauthenticated requests. This fix replaces GitHub web URLs with direct CDN URLs, adds retry logic with exponential backoff, limits concurrent downloads to 5, and cleans up partial files on failure. It also adds GitHub Actions caching to avoid re-downloading data files on repeat runs. Together these changes make benchmark data downloads reliable for both forked and internal CI workflows.
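For readers landing here first, a minimal sketch of the retry shape described above; the attempt count and base delay are illustrative, not the PR's actual constants:

```go
package benchdata

import (
	"context"
	"fmt"
	"time"
)

// withRetry runs fn up to `attempts` times with exponential backoff between
// failures, respecting context cancellation during the wait.
func withRetry(ctx context.Context, attempts int, base time.Duration, fn func(context.Context) error) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(ctx); err == nil {
			return nil
		}
		delay := base << uint(i) // 1x, 2x, 4x, ...
		select {
		case <-time.After(delay):
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return fmt.Errorf("all %d attempts failed: %w", attempts, err)
}
```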