download_model.sh: prefer huggingface CLI when available#248

Open
siraustin wants to merge 1 commit into
antirez:mainfrom
siraustin:fix-xet-bridge-download

@siraustin

@siraustin siraustin commented May 25, 2026

Closes #249.

Problem

./download_model.sh pro-imatrix fails when fetching the new ~430 GB PRO bundle:

Downloading DeepSeek-V4-Pro-IQ2XXS-w2Q2K-AProjQ8-SExpQ8-OutQ8-Instruct-imatrix.gguf
...
curl: (22) The requested URL returned error: 400

Hugging Face's xet bridge rejects range-less GETs above a certain size threshold and returns a generic <h1>400</h1> HTML page. Per #249's table, the threshold sits between q4-imatrix (164 GB, fine) and pro-imatrix (464 GB, 400). Empirically on the PRO file:

  • curl --range 0-67108863 (64 MiB chunk) → 206 Partial Content, downloads fine
  • curl with no Range header → 400 from the xet bridge after two redirects

The redirect chain is HF resolve → xethub bridge → S3 presigned URL with X-Xet-Cas-Uid; the bridge itself is what 400s, not S3. curl -C - doesn't help on a fresh download because there's no local partial to resume from. The smaller Flash files don't trip the threshold; PRO surfaces it.
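
For anyone who wants to reproduce the comparison above, here is a minimal sketch. `probe_status` is a hypothetical helper (it is not part of download_model.sh), and the URL you pass it is whatever resolve URL the script already uses for the file. The `--max-time` cap is my own addition so that a range-less request that does succeed on a smaller file does not pull the whole thing.

```shell
# Hypothetical helper: print the final HTTP status for a GET, with or without
# a Range header, after following redirects. The body is discarded.
probe_status() (
    # $1 = URL, $2 = optional byte range such as 0-67108863
    url=$1
    range=${2:-}
    if [ -n "$range" ]; then
        curl --silent --location --max-time 30 --output /dev/null \
             --write-out '%{http_code}\n' --range "$range" "$url"
    else
        curl --silent --location --max-time 30 --output /dev/null \
             --write-out '%{http_code}\n' "$url"
    fi
)

# Usage (URL is a placeholder you must supply):
#   probe_status "$URL" 0-67108863   # ranged request
#   probe_status "$URL"              # range-less request
```

On the PRO file I would expect the first call to print 206 and the second 400, matching the two bullets above, but I have only checked this against that one file.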

Fix

Prefer the official hf CLI (or huggingface-cli if it's the older binary on the user's PATH) — it is xet-bridge native and handles chunked / ranged retrieval correctly. Fall through to the existing curl flow unchanged when no HF CLI is installed, so the existing Flash-download experience is preserved.

The token is passed via the HF_TOKEN env rather than --token so it doesn't appear in ps output of the child process.
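
Roughly, the selection logic looks like the sketch below. This is an illustration rather than the literal diff: `pick_hf_cli` and `fetch_model` are hypothetical names (the actual patch nests its branch inside download_one), and the curl flags in the fallback are my assumption of what the existing flow does.

```shell
# Print the first Hugging Face CLI found on PATH; return 1 if neither exists.
pick_hf_cli() {
    for cli in hf huggingface-cli; do
        if command -v "$cli" >/dev/null 2>&1; then
            printf '%s\n' "$cli"
            return 0
        fi
    done
    return 1
}

# Hypothetical wrapper: use the HF CLI when present, otherwise plain curl.
fetch_model() (
    # $1 = repo id, $2 = file name, $3 = destination directory, $4 = fallback URL
    repo=$1
    file=$2
    dest=$3
    url=$4
    if cli=$(pick_hf_cli); then
        # No --token flag: if the caller exported HF_TOKEN, the CLI reads it
        # from the environment, so it never shows up in the child's argv.
        "$cli" download "$repo" "$file" --local-dir "$dest"
    else
        # Assumed shape of the existing curl path.
        curl --fail --location --continue-at - --output "$dest/$file" "$url"
    fi
)
```

Checking `hf` before `huggingface-cli` means the newer binary wins when both are installed.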

Test

  • Patched script on mtp (small, cheap to interrupt) → hf selected, .incomplete staging file created, download begins; killed after a few seconds without side effects.
  • Patched script on pro-imatrix → in-flight on a 512 GB M3 Ultra at time of writing, against this exact code path.
  • Falls back to curl cleanly when neither hf nor huggingface-cli is on PATH.

Notes

  • Minimal patch: 16 additions, 0 deletions, no behavior change to the existing curl path.
  • No new hard dependency; the HF CLI is optional and detected at runtime via command -v.
  • If a pure-curl path is preferred (no HF dep at all), I can replace this with a chunked curl --range loop that issues fixed-size chunks and concatenates — happy to take that direction if you'd rather not introduce the HF CLI as an option. The current patch is the smallest fix that works.

The plain curl path 400s on the ~430 GB PRO bundle because Hugging
Face's xet bridge rejects range-less GETs above a certain size. The
official 'hf' (or 'huggingface-cli') client is xet-bridge native and
already handles the chunked / ranged retrieval the bridge requires.

Pick it up when on PATH; otherwise fall through to the existing curl
flow unchanged. Token is passed via HF_TOKEN env rather than --token
so it does not appear in 'ps' output of the child process.
@ivanfioravanti
Contributor

This will surely fix #249; I did the same locally. But I don't know if @antirez wants to keep a pure curl version.

@siraustin
Author

Thanks @ivanfioravanti — your table in #249 was the clearest evidence of the size-threshold shape; I've linked it from the PR body so this closes it on merge.

On the pure-curl concern: this patch keeps the existing curl flow bit-for-bit unchanged. The HF CLI is opt-in (detected at runtime, not required), and the patch is +16/-0 with the new branch nested inside download_one. So users without hf get exactly the prior behavior. The PR doesn't introduce a hard dependency.

That said, if @antirez would rather keep this script truly pure-curl (no HF detection at all), the alternative is a chunked curl --range loop — issue fixed-size GETs against the same URL and concatenate. It's ~30 lines instead of 16, has its own edge cases (mid-chunk network drops, atomic finalize), but keeps the script dep-free. Happy to swap to that variant if preferred.
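
To make the alternative concrete, here is a rough sketch of the chunked variant. It is not the code in this PR. `chunk_ranges` and `download_chunked` are hypothetical names, the total size is assumed to be known by the caller, and the `.part` / `.chunk` suffixes are illustrative.

```shell
# Print one inclusive "start-end" byte range per line.
chunk_ranges() (
    # $1 = total size in bytes, $2 = chunk size in bytes
    total=$1
    chunk=$2
    start=0
    while [ "$start" -lt "$total" ]; do
        end=$((start + chunk - 1))
        if [ "$end" -ge "$total" ]; then
            end=$((total - 1))
        fi
        printf '%s-%s\n' "$start" "$end"
        start=$((end + 1))
    done
)

# Fetch the file in fixed-size ranged GETs and concatenate the pieces.
download_chunked() (
    # $1 = URL, $2 = output path, $3 = total size in bytes, $4 = chunk size
    url=$1
    out=$2
    total=$3
    chunk=$4
    part="$out.part"     # staging file, renamed only after the last chunk
    piece="$out.chunk"   # holds the chunk currently in flight
    : > "$part"
    chunk_ranges "$total" "$chunk" | while IFS= read -r range; do
        code=$(curl --silent --show-error --location \
                    --range "$range" --output "$piece" \
                    --write-out '%{http_code}' "$url") || exit 1
        # Anything other than 206 means the range was not honored.
        [ "$code" = "206" ] || exit 1
        # Append only complete chunks, so a mid-chunk drop cannot corrupt $part.
        cat "$piece" >> "$part"
    done || exit 1
    rm -f "$piece"
    mv "$part" "$out"    # atomic finalize on the same filesystem
)
```

For example, `chunk_ranges 10 4` prints 0-3, 4-7 and 8-9. This sketch restarts from zero on failure; resuming from an existing `.part` file would add a few more lines, which is part of why I estimated the real variant at around 30.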

@martinoturrina

Thanks @siraustin for the PR. I think both should actually be implemented: HF is used when available, and when it is not, curl can still download big models. What do you think?

Development

Successfully merging this pull request may close these issues.

download_model.sh pro-imatrix --> error 400

3 participants