Skip to content

fix(download): fix download speed/ETA accuracy (sliding window estimator + XetPoller scan targets)#526

Closed
mmogr wants to merge 9 commits into
mainfrom
fix/sliding-window-speed-eta
Closed

fix(download): fix download speed/ETA accuracy (sliding window estimator + XetPoller scan targets)#526
mmogr wants to merge 9 commits into
mainfrom
fix/sliding-window-speed-eta

Conversation

@mmogr
Copy link
Copy Markdown
Owner

@mmogr mmogr commented May 31, 2026

Closes #525

Problem

Download speed and ETA indicators in both the GUI and CLI fluctuated wildly. After implementing the sliding window estimator (see below), the symptom persisted: system monitor showed a stable 90 MB/s, but gglib displayed values that jumped from 4 → 250 → 50 → 1000 → 30 MB/s.

Two separate bugs were found and fixed.


Bug 1 — Slow/wrong estimator algorithm

Both the GUI bridge task and the CLI progress bars used an exponentially-weighted average with α=0.02. At a 250 ms tick interval that gives a ~34-tick half-life (~8.5 s convergence lag), and speed initialises at 0.0 so it takes ~28 s to reach 90% of the true value. Two independent copies of this bug existed in the codebase.

A second issue: indicatif's {bytes_per_sec} and {eta} placeholders are computed internally by indicatif from set_position() calls and cannot be overridden — so even if the bridge task sent correct speed/ETA values, the CLI bars would still show indicatif's own fluctuating estimate.

Bug 2 — XetPoller scanning the wrong filesystem locations (root cause of spikes)

When hf_xet is active, huggingface_hub bypasses tqdm.update() entirely, so the Python subprocess never emits Progress events. XetPoller compensates by stat-polling a list of target paths and reporting the total bytes on disk every 250 ms.

Two problems with how those targets were chosen:

  1. Atomic rename spike — targets included dest/{filename} (the final output file). When a transfer completes, huggingface_hub atomically renames .incomplete → final file. That makes the full file size appear in one 250 ms tick: e.g. a 5 GB file produces a momentary 5 GB / 0.25 s = 20 GB/s reading. The SlidingWindowRate then spends several seconds averaging this spike away while displaying wild intermediate values.

  2. Global HF cache not scannedhf_xet frequently writes blobs to the system-wide hub cache (~/.cache/huggingface/hub/models--owner--repo/blobs/) rather than the local dest/.cache staging directory. When that path wasn't in the target list, XetPoller saw 0 bytes for the first ~30 s of a download (explaining the "stuck at 4 MB/s" observation), then received the full blob size in one tick when the rename fired.


Fix

Shared SlidingWindowRate struct (progress/rate.rs) — retains timestamped byte-count samples over an 8-second window and reports Δbytes / Δtime across that window. This matches what system monitors show, requires no tuning constants, and is accurate from the second sample onward.

GUI / SSE path (manager/mod.rs) — the bridge task now calls rate.record() + rate.speed_bps() on every tick. DownloadEvent::progress() in gglib-core derives eta_seconds from speed_bps, so GUI ETA is also fixed with no changes to the core crate.

CLI indicatif bypass (cli_emitter.rs, cli_exec/exec/progress.rs) — {bytes_per_sec} and {eta} slots removed from all templates and replaced with {prefix}. Each bar now holds a SlidingWindowRate; speed and ETA are formatted as "X.XX MB/s eta MM:SS" and written via bar.set_prefix(), completely bypassing indicatif's internal estimator. set_position() is still called so {bytes} / {total_bytes} / {bar} continue to work correctly.

PlainProgress (non-TTY) now also emits ETA — it was previously omitted entirely.

XetPoller scan targets (cli_exec/exec/python_bridge.rs, xet_poller.rs):

  • Removed per-file final-path targets (eliminates the rename spike)
  • Added the global HF hub blob directory for the repo as a second target, resolving via HF_HUB_CACHEHUGGINGFACE_HUB_CACHEHF_HOME/hub$HOME/.cache/huggingface/hub (same priority as Python's huggingface_hub); missing paths are silently skipped
  • Made the high-water-mark check strictly monotonic: if polled bytes decrease (e.g. mid-rename), last_bytes resets to 0 so the next genuine increase fires immediately

Commits

  1. feat(download): add SlidingWindowRate and format_eta to progress module — new rate.rs with full unit test suite
  2. fix(download): replace EWA bridge task speed with SlidingWindowRatemanager/mod.rs; removes calculate_speed_eta helper
  3. fix(download): bypass indicatif estimator in CliDownloadEventEmittercli_emitter.rs
  4. fix(download): replace EWA in FancyProgress and PlainProgresscli_exec/exec/progress.rs
  5. docs(download): document SlidingWindowRate and format_eta in progress README
  6. fix(download): resolve all strict clippy warnings
  7. fix(download): fix XetPoller scan targets to eliminate speed spikes — root cause fix for hf-xet downloads

mmogr added 6 commits May 31, 2026 14:37
Add crates/gglib-download/src/progress/rate.rs with:
- SlidingWindowRate: 8-second sliding window speed estimator using a
  VecDeque of (Instant, u64) samples. Speed = (newest_bytes - oldest_bytes)
  / (newest_time - oldest_time). Falls back to overall average during the
  warm-up period before two distinct timestamps are in the window.
- format_eta: standalone ETA formatter shared by all display paths.
  Renders '--:--' for 0/NaN/infinite, 'M:SS' for sub-hour, 'H:MM:SS' for
  longer durations.
- Unit tests covering edge cases for both SlidingWindowRate and format_eta.

Export both from progress/mod.rs as pub(crate).

Closes part of #525.
Replace the alpha=0.02 EWA speed calculation in spawn_progress_bridge
with SlidingWindowRate. Both the 250ms tick path and the final-progress
(sender-dropped) path now call rate.record() + rate.speed_bps(). Remove
the now-unused calculate_speed_eta helper function.

closes part of #525.
Replace {bytes_per_sec} + {eta} in BAR_TEMPLATE with {prefix}. Each bar
now holds a SlidingWindowRate alongside the ProgressBar. The
DownloadProgress and ShardProgress handlers call rate.record() and write
a formatted 'X.XX MB/s  eta MM:SS' string into bar.set_prefix(),
completely bypassing indicatif's internal speed/ETA estimator.

Closes part of #525.
FancyProgress: add SlidingWindowRate field; swap {binary_bytes_per_sec}/s
ETA {eta} in bar_style template for {prefix}; write computed speed+ETA
via bar.set_prefix() on every update.

PlainProgress: replace ewa_speed field and EWA update logic with
SlidingWindowRate.record()/speed_bps(); add ETA to non-terminal output
line (previously omitted entirely).

Closes part of #525.
@mmogr mmogr added type: bug Something isn't working correctly component: downloads Download manager component: cli Command-line interface component: frontend React/TypeScript UI ux User experience improvements priority: high Important for next release status: needs-review Waiting for PR review size: m 4-8 hours (half to full day) labels May 31, 2026
@mmogr mmogr self-assigned this May 31, 2026
mmogr added 2 commits May 31, 2026 15:36
…tation

- redundant_pub_crate: pub(crate) -> pub in progress/mod.rs and rate.rs
  (progress is a private module; pub(crate) is redundant inside it)
- missing_const_for_fn: SlidingWindowRate::new() -> const fn
- dead_code: add #[allow(dead_code)] to reset() (reserved API)
- cast_possible_truncation / cast_sign_loss: add #[allow] at the three
  f64->u64 cast sites in cli_emitter.rs and progress.rs (speed_bps()
  is guaranteed non-negative and fits u64 for any realistic file size)
- cast_possible_truncation / cast_sign_loss: format_eta secs->u64 cast
  (guarded by the is_finite / > 0 check immediately above)
- float_cmp: three assert_eq!(speed_bps(), 0.0) in tests replaced with
  assert!(... == 0.0) — 0.0 comparisons are exact by construction here
- too_many_lines: add #[allow(clippy::too_many_lines)] to
  CliDownloadEventEmitter::emit (match arm dispatch, not real complexity)
Two root causes for the wildly fluctuating speed/ETA readings when
downloading via hf-xet:

1. **Final-file rename spike**: the previous poller_targets included
    — the final destination path.  When the xet
   transfer completes, huggingface_hub atomically renames the
    staging file to the final name.  That makes the full
   file size appear in a single 250 ms poll tick, producing a giant
   apparent byte jump (e.g. 5 GB in 0.25 s → 20 GB/s).

   Fix: remove the per-file final-path targets entirely.  They aren't
   written to during the transfer anyway, so nothing useful was being
   tracked through them.

2. **Global hub cache not scanned**: hf_xet sometimes downloads blobs
   into the user's system-wide HuggingFace hub cache
   (`~/.cache/huggingface/hub/models--owner--repo/blobs/`) rather than
   the local `dest/.cache` staging area.  When that happens, XetPoller
   always sees 0 bytes in its targets and never fires a progress event.

   Fix: add the repo's global blob directory (resolved via the same
   HF_HUB_CACHE / HUGGINGFACE_HUB_CACHE / HF_HOME env-var priority as
   huggingface_hub itself) as a second XetPoller target.  XetPoller
   already skips paths that don't exist, so this is a safe no-op for
   repos whose blobs aren't cached globally.

Additionally, tighten the XetPoller high-water-mark check to be
strictly monotonic: if summed bytes decrease (e.g. a file is renamed
out of the scan tree mid-download), reset last_bytes to 0 so the next
genuine increase fires immediately rather than being silently skipped.
@mmogr mmogr changed the title fix(download): replace EWA speed/ETA with 8-second sliding window estimator fix(download): fix download speed/ETA accuracy (sliding window estimator + XetPoller scan targets) May 31, 2026
XetPoller synthetic progress was still sourced from absolute directory
sizes, which can include stale cache artifacts and non-transfer files.
That means speed/ETA could still swing when unrelated files appeared,
vanished, or were renamed in the scan tree.

This patch tightens the estimator input to in-flight bytes only:

- Directory walks now count only  files, ignoring final
  blobs and metadata files that are not part of active transfer growth.
- Synthetic downloaded bytes are baseline-relative (current - startup),
  so pre-existing temp files from earlier sessions do not inflate the
  current request's progress stream.
- Synthetic downloaded is clamped to expected total (when known),
  preventing over-100% artifacts from auxiliary file churn.

Added tests cover:
- ignoring non-incomplete files in directory targets
- baseline subtraction for stale pre-existing temp files
- existing recursive, coalescing, and dormancy behavior
Copy link
Copy Markdown
Owner Author

@mmogr mmogr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added follow-up fix in 04a4e1b to address residual speed spikes reported after the prior root-cause patch.

What changed:

  • XetPoller directory scans now count only in-flight *.incomplete files
  • Synthetic bytes are baseline-relative (current - startup) so pre-existing cache artifacts don't pollute current-request progress
  • Synthetic downloaded is clamped to expected total when known

Validation:

  • cargo test --package gglib-download (109 passed)
  • cargo clippy --package gglib-download -- -D warnings

If speed still fluctuates after this, I can add temporary debug telemetry to log raw poller samples (raw sum, baseline, emitted bytes, and per-tick delta) for one reproducer run.

Copy link
Copy Markdown
Owner Author

@mmogr mmogr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small correction: I accidentally opened then deleted a pending review while posting follow-up notes. No code or review state changed; the intended status update is the latest comment plus commit 04a4e1b.

Copy link
Copy Markdown
Owner Author

@mmogr mmogr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consolidated summary (author update): residual fluctuation fix landed in commit 04a4e1b, now in branch. Key change is that synthetic progress uses baseline-relative growth of in-flight .incomplete files only, rather than absolute cache-directory size.

@mmogr mmogr closed this May 31, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

component: cli Command-line interface component: downloads Download manager component: frontend React/TypeScript UI priority: high Important for next release size: m 4-8 hours (half to full day) status: needs-review Waiting for PR review type: bug Something isn't working correctly ux User experience improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix: download speed and ETA displays wildly fluctuate due to broken EWA smoothing

1 participant