fix(download): fix download speed/ETA accuracy (sliding window estimator + XetPoller scan targets)#526
Closed
mmogr wants to merge 9 commits into
Closed
fix(download): fix download speed/ETA accuracy (sliding window estimator + XetPoller scan targets)#526mmogr wants to merge 9 commits into
mmogr wants to merge 9 commits into
Conversation
Add crates/gglib-download/src/progress/rate.rs with: - SlidingWindowRate: 8-second sliding window speed estimator using a VecDeque of (Instant, u64) samples. Speed = (newest_bytes - oldest_bytes) / (newest_time - oldest_time). Falls back to overall average during the warm-up period before two distinct timestamps are in the window. - format_eta: standalone ETA formatter shared by all display paths. Renders '--:--' for 0/NaN/infinite, 'M:SS' for sub-hour, 'H:MM:SS' for longer durations. - Unit tests covering edge cases for both SlidingWindowRate and format_eta. Export both from progress/mod.rs as pub(crate). Closes part of #525.
Replace the alpha=0.02 EWA speed calculation in spawn_progress_bridge with SlidingWindowRate. Both the 250ms tick path and the final-progress (sender-dropped) path now call rate.record() + rate.speed_bps(). Remove the now-unused calculate_speed_eta helper function. closes part of #525.
Replace {bytes_per_sec} + {eta} in BAR_TEMPLATE with {prefix}. Each bar
now holds a SlidingWindowRate alongside the ProgressBar. The
DownloadProgress and ShardProgress handlers call rate.record() and write
a formatted 'X.XX MB/s eta MM:SS' string into bar.set_prefix(),
completely bypassing indicatif's internal speed/ETA estimator.
Closes part of #525.
FancyProgress: add SlidingWindowRate field; swap {binary_bytes_per_sec}/s
ETA {eta} in bar_style template for {prefix}; write computed speed+ETA
via bar.set_prefix() on every update.
PlainProgress: replace ewa_speed field and EWA update logic with
SlidingWindowRate.record()/speed_bps(); add ETA to non-terminal output
line (previously omitted entirely).
Closes part of #525.
…tation - redundant_pub_crate: pub(crate) -> pub in progress/mod.rs and rate.rs (progress is a private module; pub(crate) is redundant inside it) - missing_const_for_fn: SlidingWindowRate::new() -> const fn - dead_code: add #[allow(dead_code)] to reset() (reserved API) - cast_possible_truncation / cast_sign_loss: add #[allow] at the three f64->u64 cast sites in cli_emitter.rs and progress.rs (speed_bps() is guaranteed non-negative and fits u64 for any realistic file size) - cast_possible_truncation / cast_sign_loss: format_eta secs->u64 cast (guarded by the is_finite / > 0 check immediately above) - float_cmp: three assert_eq!(speed_bps(), 0.0) in tests replaced with assert!(... == 0.0) — 0.0 comparisons are exact by construction here - too_many_lines: add #[allow(clippy::too_many_lines)] to CliDownloadEventEmitter::emit (match arm dispatch, not real complexity)
Two root causes for the wildly fluctuating speed/ETA readings when
downloading via hf-xet:
1. **Final-file rename spike**: the previous poller_targets included
— the final destination path. When the xet
transfer completes, huggingface_hub atomically renames the
staging file to the final name. That makes the full
file size appear in a single 250 ms poll tick, producing a giant
apparent byte jump (e.g. 5 GB in 0.25 s → 20 GB/s).
Fix: remove the per-file final-path targets entirely. They aren't
written to during the transfer anyway, so nothing useful was being
tracked through them.
2. **Global hub cache not scanned**: hf_xet sometimes downloads blobs
into the user's system-wide HuggingFace hub cache
(`~/.cache/huggingface/hub/models--owner--repo/blobs/`) rather than
the local `dest/.cache` staging area. When that happens, XetPoller
always sees 0 bytes in its targets and never fires a progress event.
Fix: add the repo's global blob directory (resolved via the same
HF_HUB_CACHE / HUGGINGFACE_HUB_CACHE / HF_HOME env-var priority as
huggingface_hub itself) as a second XetPoller target. XetPoller
already skips paths that don't exist, so this is a safe no-op for
repos whose blobs aren't cached globally.
Additionally, tighten the XetPoller high-water-mark check to be
strictly monotonic: if summed bytes decrease (e.g. a file is renamed
out of the scan tree mid-download), reset last_bytes to 0 so the next
genuine increase fires immediately rather than being silently skipped.
XetPoller synthetic progress was still sourced from absolute directory sizes, which can include stale cache artifacts and non-transfer files. That means speed/ETA could still swing when unrelated files appeared, vanished, or were renamed in the scan tree. This patch tightens the estimator input to in-flight bytes only: - Directory walks now count only files, ignoring final blobs and metadata files that are not part of active transfer growth. - Synthetic downloaded bytes are baseline-relative (current - startup), so pre-existing temp files from earlier sessions do not inflate the current request's progress stream. - Synthetic downloaded is clamped to expected total (when known), preventing over-100% artifacts from auxiliary file churn. Added tests cover: - ignoring non-incomplete files in directory targets - baseline subtraction for stale pre-existing temp files - existing recursive, coalescing, and dormancy behavior
mmogr
commented
May 31, 2026
Owner
Author
mmogr
left a comment
There was a problem hiding this comment.
Added follow-up fix in 04a4e1b to address residual speed spikes reported after the prior root-cause patch.
What changed:
- XetPoller directory scans now count only in-flight
*.incompletefiles - Synthetic bytes are baseline-relative (
current - startup) so pre-existing cache artifacts don't pollute current-request progress - Synthetic downloaded is clamped to expected total when known
Validation:
cargo test --package gglib-download(109 passed)cargo clippy --package gglib-download -- -D warnings
If speed still fluctuates after this, I can add temporary debug telemetry to log raw poller samples (raw sum, baseline, emitted bytes, and per-tick delta) for one reproducer run.
mmogr
commented
May 31, 2026
Owner
Author
mmogr
left a comment
There was a problem hiding this comment.
Consolidated summary (author update): residual fluctuation fix landed in commit 04a4e1b, now in branch. Key change is that synthetic progress uses baseline-relative growth of in-flight .incomplete files only, rather than absolute cache-directory size.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #525
Problem
Download speed and ETA indicators in both the GUI and CLI fluctuated wildly. After implementing the sliding window estimator (see below), the symptom persisted: system monitor showed a stable 90 MB/s, but gglib displayed values that jumped from 4 → 250 → 50 → 1000 → 30 MB/s.
Two separate bugs were found and fixed.
Bug 1 — Slow/wrong estimator algorithm
Both the GUI bridge task and the CLI progress bars used an exponentially-weighted average with α=0.02. At a 250 ms tick interval that gives a ~34-tick half-life (~8.5 s convergence lag), and speed initialises at 0.0 so it takes ~28 s to reach 90% of the true value. Two independent copies of this bug existed in the codebase.
A second issue: indicatif's
{bytes_per_sec}and{eta}placeholders are computed internally by indicatif fromset_position()calls and cannot be overridden — so even if the bridge task sent correct speed/ETA values, the CLI bars would still show indicatif's own fluctuating estimate.Bug 2 — XetPoller scanning the wrong filesystem locations (root cause of spikes)
When
hf_xetis active,huggingface_hubbypassestqdm.update()entirely, so the Python subprocess never emitsProgressevents.XetPollercompensates by stat-polling a list of target paths and reporting the total bytes on disk every 250 ms.Two problems with how those targets were chosen:
Atomic rename spike — targets included
dest/{filename}(the final output file). When a transfer completes,huggingface_hubatomically renames.incomplete→ final file. That makes the full file size appear in one 250 ms tick: e.g. a 5 GB file produces a momentary5 GB / 0.25 s = 20 GB/sreading. TheSlidingWindowRatethen spends several seconds averaging this spike away while displaying wild intermediate values.Global HF cache not scanned —
hf_xetfrequently writes blobs to the system-wide hub cache (~/.cache/huggingface/hub/models--owner--repo/blobs/) rather than the localdest/.cachestaging directory. When that path wasn't in the target list, XetPoller saw 0 bytes for the first ~30 s of a download (explaining the "stuck at 4 MB/s" observation), then received the full blob size in one tick when the rename fired.Fix
Shared
SlidingWindowRatestruct (progress/rate.rs) — retains timestamped byte-count samples over an 8-second window and reportsΔbytes / Δtimeacross that window. This matches what system monitors show, requires no tuning constants, and is accurate from the second sample onward.GUI / SSE path (
manager/mod.rs) — the bridge task now callsrate.record()+rate.speed_bps()on every tick.DownloadEvent::progress()ingglib-corederiveseta_secondsfromspeed_bps, so GUI ETA is also fixed with no changes to the core crate.CLI indicatif bypass (
cli_emitter.rs,cli_exec/exec/progress.rs) —{bytes_per_sec}and{eta}slots removed from all templates and replaced with{prefix}. Each bar now holds aSlidingWindowRate; speed and ETA are formatted as"X.XX MB/s eta MM:SS"and written viabar.set_prefix(), completely bypassing indicatif's internal estimator.set_position()is still called so{bytes}/{total_bytes}/{bar}continue to work correctly.PlainProgress(non-TTY) now also emits ETA — it was previously omitted entirely.XetPoller scan targets (
cli_exec/exec/python_bridge.rs,xet_poller.rs):HF_HUB_CACHE→HUGGINGFACE_HUB_CACHE→HF_HOME/hub→$HOME/.cache/huggingface/hub(same priority as Python'shuggingface_hub); missing paths are silently skippedlast_bytesresets to 0 so the next genuine increase fires immediatelyCommits
feat(download): add SlidingWindowRate and format_eta to progress module— newrate.rswith full unit test suitefix(download): replace EWA bridge task speed with SlidingWindowRate—manager/mod.rs; removescalculate_speed_etahelperfix(download): bypass indicatif estimator in CliDownloadEventEmitter—cli_emitter.rsfix(download): replace EWA in FancyProgress and PlainProgress—cli_exec/exec/progress.rsdocs(download): document SlidingWindowRate and format_eta in progress READMEfix(download): resolve all strict clippy warningsfix(download): fix XetPoller scan targets to eliminate speed spikes— root cause fix for hf-xet downloads