Await in-flight cache downloads + reconcile stranded download flags#469
Await in-flight cache downloads + reconcile stranded download flags#469jubishop wants to merge 27 commits into
Conversation
Add CacheManager.cachedURL(downloadingIfNeeded:) so callers can suspend until an episode's audio is cached instead of polling, backed by a per-episode AsyncLatch that CacheBackgroundDelegate opens on every terminal download callback (finish or failure). Harden background downloads against force-quit: - reconcileStaleDownloads() clears downloading flags that have no live task on launch, via the new Repo.downloadingEpisodeIDs() - cap timeoutIntervalForResource so a download that never progresses is abandoned in bounded time Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This comment has been minimized.
This comment has been minimized.
- Refcount per-episode download latches so a cancelled or early-returning awaiter can't remove the shared latch and strand a concurrent awaiter. - Await reconcileStaleDownloads before starting the current-episode/queue observers, so a stranded downloading flag is cleared before an observer skips the re-download as "already caching". - Pass the already-fetched episode into ensureDownloadIsActive to drop a redundant DB read on the await path. - Signal download completion when the non-async didFinishDownloadingTo bails on a temp-move failure, so an awaiter isn't left hanging. Regression tests cover the latch-stranding and reconcile-ordering fixes, both proven failing on the prior code. Review ledger added. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Replace the shared per-episode AsyncLatch (monotonic + refcounted) with one latch per download attempt: downloadToCache registers it before resume(), and the delegate opens and removes it on the terminal callback. A finished attempt's latch can no longer resolve a later awaiter early, matching DownloadManager's per-DownloadTask ownership; drops the refcount bookkeeping with no re-check loop. Prune finished/failed tasks from FakeDataFetchable so it mirrors the real URLSession dropping completed tasks, and cover recovery of a fresh download after an earlier attempt for the same episode failed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The non-async didFinishDownloadingTo move-failure catch signaled the completion latch but returned without clearing the downloading flag or progress; the subsequent didCompleteWithError(nil) early-returns on success, so the episode stayed .caching with no live task until the next launch's reconcile. Clear progress and (in a Task) the downloading flag before signaling, mirroring the normal completion paths. Also correct the downloadLatches comment: the latch is episode-keyed, so it can't guarantee "a finished download's signal can never resolve a later one" — state the single-in-flight-attempt-per-episode assumption instead. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
cachedURL(downloadingIfNeeded:) now adopts or registers the per-episode latch and re-checks live-task state before awaiting, so a background download reattached after a relaunch (live task, in-memory latch gone) is awaited rather than returning a premature nil. Extract clearDownloadState(for:) and call it from every give-up path in CacheBackgroundDelegate (safe-temp move failure, URL mismatch, didCompleteWithError). The URL-mismatch path previously returned without clearing the downloading flag, stranding the episode .caching with no live task. Regression tests: awaitsReattachedDownloadWithoutLatch; mismatchedURLRefused now sets downloading and asserts .uncached. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
cachedURL(downloadingIfNeeded:) get-or-creates the per-episode latch when it suspends, so downloadToCache no longer needs to pre-register one. Doing so only allocated a latch for every observer-driven download that nobody awaits; cachedURL is now the sole latch registrar and awaiter. Update the field, await, and signal comments (and the reattach test comment) to reflect lazy, awaiter-owned latch creation. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
cachedURL registered its completion latch after ensureDownloadIsActive and gated the wait on task liveness. A success signal from didFinishDownloadingTo fires into an empty slot while the task still lingers in allTasks (until the later didCompleteWithError), so the liveness re-check saw a live task and awaited a fresh latch that never reopened — a permanent hang for a future awaiting consumer. Clear the downloading flag last on success (after the cached filename is written) so it is a terminal marker written before the signal, and gate cachedURL on that flag instead of task liveness: still downloading means the signal is still coming and will open the latch just registered; anything else is already resolved. Also drops the skip-wait latch leak. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Gates loadEpisodeAsset so the success delegate parks mid-completion — after the file move, before the cached filename is written and the flag cleared — and asserts a concurrent downloadToCache no-ops. Red on the old ordering (which cleared downloading before the asset load, briefly exposing .uncached and inviting a duplicate download), green after. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Simplify and tighten comments throughout the cache manager and delegate to improve clarity and reduce verbosity. Remove redundant explanations and consolidate key points into more concise language. - Shorten comments in CacheBackgroundDelegate about error handling, state clearing, and completion signaling - Streamline CacheManager comments on latch registration, stale download reconciliation, and timeout configuration - Tighten test comments to remove unnecessary detail while preserving intent - Clarify the relationship between downloading flag, file caching, and completion callbacks in simplified language These changes maintain the technical accuracy of the comments while making them easier to scan and understand at a glance.
…ngs) Independent code-only round at 278a2d9 confirms the latch/reconcile mechanism is correct and all 11 prior findings remain settled; logs the intake state and a dropped sub-threshold latch-leak candidate. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ead its episode The async didFinishDownloadingTo/didCompleteWithError callbacks signal any awaiter on every exit, but the episode-fetch-throws catch returned without clearing the downloading flag, stranding the row as .caching with no live task until next launch (didFinishDownloadingTo also orphaned the temp file). Clear state via the already-parsed episode id in both catches and drop the unattributable temp file. Regression tests use a new one-shot FakeRepo.episodeFetchError. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… findings) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… additional session notes Add comprehensive documentation for the F14 finding regarding background URLSession completion timing issues. Include the complete triage session from 2026-06-15T15:07:39 detailing why F14 was escalated to a follow-up issue (#482) rather than fixed in-PR, with evidence that the issue is pre-existing in main and not introduced by #469. Add the intermediate code-only review session (2026-06-15T14:51:01) that identified the pending background URLSession completion concern. Update the findings summary count to reflect 10 Warnings, 12 fixed items, and 1 follow-up issue. Document the F14 finding with full severity, architecture, location, and recommendation details, including the decision rationale and reframed fix approach around missing beginBackgroundTask assertion and deferred completion handling.
F15 (non-atomic same-episode download claim) decided as a follow-up rather than fixed in-PR: downloadToCache's claim is pre-existing/byte-identical to main, and the premature-nil impact rides on the unshipped await API. Filed #483; also folds in the prior session's F15-pending intake. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…eRabbit Fresh code-only pass at HEAD found no new findings; all 16 prior findings remain settled (13 fixed, 1 skipped, 2 decided to follow-ups #482/#483). Dismissed a new non-minimized CodeRabbit review claiming the requestMidCompletionDoesNotRestartDownload test leaks its asset-loader handler: .container (FactoryKit ContainerTrait) gives each test a fresh Container + cloned singleton scope, and fakeEpisodeAssetLoader is .scope(.cached), so injected state is per-test isolated. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…findings) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Real-world instance of the strand this PR's Sentry user feedback podhaven:7566596723 (TestFlight
Analysis of the attached NDJSON logs: episode
Build 537 was cut from Triage notes: |
Fixes #503
Generic
CacheManagerAPI split out of #461 so it can be reviewed and merged on its own. No transcription code here.What
Await a download instead of polling.
CacheManager.cachedURL(downloadingIfNeeded:)ensures an episode's audio is cached — starting a download if needed — and suspends until it finishes, returning the cached URL (ornilif the episode vanished or the download failed). It's event-driven via a per-episodeAsyncLatch:CacheBackgroundDelegateopens the latch on every terminal callback (didFinishDownloadingToanddidCompleteWithError), keyed off the task'staskDescriptionso a deleted row still unblocks the awaiter.Force-quit recovery. A background task cancelled by a force-quit never delivers its completion callback, stranding
downloading == truewith no live task.reconcileStaleDownloads()runs on launch and clears any downloading row absent from the recreated session's live-task list, via the newRepo.downloadingEpisodeIDs().timeoutIntervalForResourceis capped well below the framework's multi-day default so a download that never progresses is abandoned in bounded time.Why now
cachedURL(downloadingIfNeeded:)'s only consumer is the manual-transcription work in #461 (it needs the audio on disk before transcribing). The capability is generic and self-contained, so it lands here with its own tests; #461 will rebase ontomainonce this merges.Tests
CacheManagerAwaitTests(new): immediate return when already cached, start-and-await-completion, and resolve-nil-on-failure.CacheManagerReconcileTests(new): a stranded flag is cleared onstart()while a live download is left untouched.All four
CacheManager*suites pass on My Mac (36 tests), zero compiler warnings.🤖 Generated with Claude Code
Summary by CodeRabbit