Add manual on-device episode transcription #461
Open
jubishop wants to merge 16 commits into
User-initiated, on-device transcription of episodes via iOS 26 SpeechAnalyzer, surfaced from every episode action and rendered read-only in EpisodeDetailView.

- Migration v59: nullable episode.transcript column storing timed-segment JSON
- Transcriber actor: SpeechTranscriber + .audioTimeRange over the cached AVAudioFile, AssetInventory model install (en-US), behind a Transcribing protocol
- Persisted one-at-a-time TranscriptionQueue (UserDefaults, survives termination) + TranscriptionProcessor: foreground background-priority drain plus a discretionary BGProcessingTask, mirroring EmbeddingProcessor
- Transcribe action on the detail toolbar + section, list context menu, swipe action (configurable in Settings), and multi-select; per-episode queued/transcribing/transcribed/failed status across surfaces
- Docs: docs/initiatives/manual-transcripts.md initiative; transcripts.md reframed as research (FluidAudio + RSS-format findings)

Interactive playback (tap-to-seek/highlight/auto-scroll) deferred to v2 (#459); the per-segment timestamps are stored, so it is a pure UI layer later.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
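As a rough illustration of what "timed-segment JSON" in a nullable column could look like, here is a minimal sketch. The PR names `Transcript` and `TranscriptSegment` but does not show their fields, so every property and method name below is an assumption rather than the branch's actual model.

```swift
import Foundation

// Hypothetical field names: the PR does not show the real segment shape.
struct TranscriptSegment: Codable, Equatable {
  var startSeconds: Double
  var endSeconds: Double
  var text: String
}

struct Transcript: Codable, Equatable {
  var segments: [TranscriptSegment]

  // Encode to the JSON string that would be written to the nullable column.
  func encodedJSON() throws -> String {
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.sortedKeys]
    let data = try encoder.encode(self)
    return String(decoding: data, as: UTF8.self)
  }

  // Decode a stored column value; a nil column means "no transcript yet".
  static func decode(fromColumn json: String?) throws -> Transcript? {
    guard let json else { return nil }
    return try JSONDecoder().decode(Transcript.self, from: Data(json.utf8))
  }
}
```

Keeping the start and end times on each segment is what would let a later UI layer seek or highlight without re-transcribing.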
jubishop
commented
Jun 13, 2026
…evel test seam

Address PR review feedback:

- Replace the download poll/sleep loop with a per-episode AsyncLatch completion signal in the cache layer (CacheManager + CacheBackgroundDelegate), awaited by the processor; drop the poll constants and sleeper dependency.
- Event-drive the foreground drain via the queue stream (drain → drainUntilEmpty), removing waitForWork.
- Push the transcription test seam to the OS boundary: SpeechAnalyzer / SpeechTranscriber (and the opaque SpeechTranscriber.Result) behind protocols the real types conform to, mirroring AVPlayable/AVPlayableItem, with fakes. Transcriber is now concrete and unit-tested. Document the rule in AGENTS.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
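The latch idea in the first bullet can be sketched as a one-shot signal that waiters suspend on instead of polling. The PR names `AsyncLatch` but does not show it, so the actor below, including its `wait()` and `open()` methods, is an assumed shape for illustration only; it also ignores task cancellation, which a production version would likely need to handle.

```swift
// Hypothetical one-shot latch: the real AsyncLatch in the branch may differ.
actor AsyncLatch {
  private var isOpen = false
  private var waiters: [CheckedContinuation<Void, Never>] = []

  // Suspends until open() is called; returns immediately if already open.
  func wait() async {
    if isOpen { return }
    await withCheckedContinuation { continuation in
      waiters.append(continuation)
    }
  }

  // Opens the latch once and resumes every pending waiter.
  func open() {
    guard !isOpen else { return }
    isOpen = true
    let pending = waiters
    waiters.removeAll()
    for continuation in pending {
      continuation.resume()
    }
  }
}
```

Under this shape, a download-completion callback would call `open()` and the processor would `await latch.wait()`, so no poll interval or sleeper dependency is involved.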
Remove overly specific examples from the test seam guidance section to keep documentation concise and maintainable. The existing pattern examples (AVPlayer/AVPlayerItem → AVPlayable/AVPlayableItem, URLSession → DataFetchable, SpeechAnalyzer/SpeechTranscriber → SpeechAnalyzing/SpeechTranscribing) added unnecessary detail that can become outdated. Retain the core principle that test seams should be placed at the OS-integration boundary with app-owned protocols wrapping system-framework types, allowing developers to apply this pattern flexibly to their own integration points.
Only offer Transcribe when an episode can actually be transcribed (status .none/.failed), hiding it once transcribed/queued/transcribing, consistent with markFinished/cache gating across the swipe, context-menu, and multi-select toolbar surfaces.

- Lift hasTranscript onto EpisodeFoundational; expose a shared Episode.hasTranscriptSelectable derived column so slim list projections carry it without fetching the transcript blob.
- Add ManagingEpisodes.canTranscribe and SelectableEpisodeList.anySelectedCanTranscribe; gate the three action surfaces on them.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Render finalized phrases on their own lines instead of one space-joined paragraph, so the transcript reads like paragraphs and scans better. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ripts

# Conflicts:
#	PodHaven/Database/Migrations/Migration_v59.swift
#	PodHavenTests/MigrationTests/v59Tests.swift
Review fixes on top of the manual-transcription branch:

- cache: reconcile stranded `downloading` flags at launch via the session's live-task list and cap the background resource timeout at 24h, so a download that never signals completion can't permanently stall the transcription queue (or leave a stuck "downloading" indicator)
- transcriptions: treat no-decodable-audio as a retryable failure, while audio that contains no recognizable speech finalizes to a terminal transcript rendered as "No speech detected"
- episode detail: memoize the transcript decode instead of re-decoding the full JSON on every `viewModel.episode` access
- docs: correct migration references v59 → v60

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ion engine Implement transcription progress tracking by reading audio file duration upfront and reporting monotonic progress as each transcribed segment's end time is processed. Add `duration(ofAudioFileAt:)` to `SpeechAnalyzing` protocol and implement it in `SpeechAnalyzer` using `AVAudioFile` length and sample rate. Track progress in `Transcriber.collectSegments()` by comparing each result's `endSeconds` to the total duration, clamped and monotonic so the progress bar never jumps backward. Update `SpeechTranscriptionResult` protocol to expose `endSeconds` (latest audio end time across runs) alongside existing `startSeconds`. Wire progress callback through `Transcriber.transcribe()` → `TranscriptionProcessor` → `TranscriptionQueue.setProgress()`, feeding live updates to the UI. Enhance `EpisodeDetailView` to render a determinate `ProgressView` with percentage label when transcribing with progress > 0, falling back to indeterminate spinner until first segment finalizes. Add comprehensive test coverage via `TranscriberTests.reportsProgress()` validating monotonic progress [0.25, 0.5, 1] across three segments in a 100-second file. Update all test fakes (`FakeSpeechAnalyzer`, `FakeSpeechTranscriptionResult`, `TranscriptionHelpers`) to support duration configuration. Update initiative docs to reflect shipped progress reporting and defer tap-to-seek/highlighting to v2. Mark audio retention and failure surfacing as decided; note background-grant timing as open follow-up.
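The clamped, monotonic progress rule described in this commit can be sketched as a small value type. `ProgressTracker` and `advance(toEndSeconds:)` are hypothetical names for illustration; the commit only states that each result's `endSeconds` is compared to the total duration, clamped, and never allowed to move backward, and that a non-positive duration disables progress.

```swift
// Hypothetical helper: names are illustrative, not taken from the branch.
struct ProgressTracker {
  let durationSeconds: Double
  private(set) var fraction: Double

  init(durationSeconds: Double) {
    self.durationSeconds = durationSeconds
    self.fraction = 0
  }

  // Returns the new fraction when progress advances, or nil when it is
  // unchanged, would move backward, or the duration is not positive.
  mutating func advance(toEndSeconds endSeconds: Double) -> Double? {
    guard durationSeconds > 0 else { return nil }
    let clamped = min(max(endSeconds / durationSeconds, 0), 1)
    guard clamped > fraction else { return nil }
    fraction = clamped
    return clamped
  }
}
```

With a 100-second duration and segment end times of 25, 50, and 100 seconds, this sketch reports 0.25, 0.5, and 1, matching the sequence the commit says `TranscriberTests.reportsProgress()` validates; an out-of-order or overshooting end time produces no report.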
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/initiatives/manual-transcripts.md`:
- Around line 62-64: The fenced code block containing the TranscriptionStatus
enum declaration is missing a language identifier. Add the language specifier
swift immediately after the opening triple backticks (```) to enable proper
syntax highlighting and tooling support for the code block in the markdown file.
In `@PodHaven/Cache/CacheManager.swift`:
- Around line 84-85: The reconcileStaleDownloads() call is fire-and-forget,
creating a race condition where cachedURL(downloadingIfNeeded:) can see the
stale .caching flag before reconciliation completes, causing
downloadToCache(for:) to no-op and leaving callers waiting on a latch that never
opens, which indefinitely blocks the TranscriptionProcessor. Change the
reconcileStaleDownloads() call at lines 84-85 and any other affected locations
(lines 98-116) to await the result so that startup completes only after stale
downloads are fully reconciled. Additionally, add a regression test that
verifies the processor correctly re-downloads an episode when it starts with
downloading == true but no live background task and a queued transcription,
proving this test fails before the fix and passes after.
In `@PodHaven/Database/DisplayModels/EpisodeDetailContent.swift`:
- Line 46: In the init(initial:) method of EpisodeDetailContent, the
hasTranscript property is hard-coded to false, which discards the transcript
state already known from the initial list model and causes incorrect UI to
display before hydration. Replace the hard-coded false value with
listed.hasTranscript to propagate the actual transcript state from the initial
model.
In `@PodHaven/Views/Episodes/Models/EpisodeDetailViewModel.swift`:
- Around line 90-99: The decodedTranscript computed property caches transcripts
using only the episodeID as a cache key, which means if the transcript JSON for
the same episode ID changes, the UI will render stale cached content. Modify the
cache validation logic in the decodedTranscript property to not only check if
the episodeID matches but also verify that the transcript content itself hasn't
changed. When accessing the transcriptCache, compare it against the newly loaded
transcript from content.loaded?.decodedTranscript, and invalidate the cache (by
updating transcriptCache) if the content differs from what is currently loaded,
even when the episodeID remains the same.
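One way to read this suggestion is a memo keyed on both the episode ID and the stored transcript text, so a changed transcript for the same episode forces a fresh decode. The sketch below is an assumption about how that could look; `TranscriptMemo` and `resolve` are hypothetical names, and the real view model may key its cache differently.

```swift
// Hypothetical memo: caches one decoded value per (episodeID, source) pair.
struct TranscriptMemo<Value> {
  private var episodeID: Int?
  private var source: String?
  private var cached: Value?

  init() {}

  // Reuses the cached value only when both the episode and the raw
  // transcript text match; otherwise decodes again and replaces the cache.
  mutating func resolve(
    episodeID: Int,
    source: String,
    decode: (String) -> Value
  ) -> Value {
    if let cached, self.episodeID == episodeID, self.source == source {
      return cached
    }
    let decoded = decode(source)
    self.episodeID = episodeID
    self.source = source
    self.cached = decoded
    return decoded
  }
}
```

Comparing the raw strings is linear in the transcript size, which is typically much cheaper than decoding the JSON again, though whether that trade is worthwhile here is a judgment for the author.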
In `@PodHaven/Views/Episodes/Protocols/SelectableEpisodeList.swift`:
- Around line 404-417: The transcribeSelectedEpisodes method enqueues all
selected episodes regardless of their transcribability status, but the action is
only enabled when any selected episode is transcribable (creating a mismatch
that can re-enqueue already transcribed/queued episodes). Filter the episodeIDs
retrieved from selectedPodcastEpisodeIDs to only include transcribable episodes
before calling transcriptionQueue.enqueue, ensuring only episodes that can be
transcribed are actually enqueued. This filtering should align with the
transcribability check used to enable the action.
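The mismatch described here can be illustrated with a small sketch in which the same predicate drives both enabling the action and choosing what to enqueue. The status cases come from the PR text (`.none`, `.queued`, `.transcribing`, `.transcribed`, `.failed`); `SelectedEpisode` and the two free functions are hypothetical stand-ins for the real selection model.

```swift
enum TranscriptionStatus {
  case none, queued, transcribing, transcribed, failed

  // Per the PR text, only .none and .failed episodes can be transcribed.
  var canTranscribe: Bool {
    switch self {
    case .none, .failed: return true
    case .queued, .transcribing, .transcribed: return false
    }
  }
}

// Hypothetical slim projection of a selected list row.
struct SelectedEpisode {
  let id: Int
  let status: TranscriptionStatus
}

// Enables the multi-select action when at least one row qualifies.
func anySelectedCanTranscribe(_ selection: [SelectedEpisode]) -> Bool {
  selection.contains { $0.status.canTranscribe }
}

// Narrows the selection to the rows that should actually be enqueued.
func transcribableIDs(in selection: [SelectedEpisode]) -> [Int] {
  selection.filter { $0.status.canTranscribe }.map(\.id)
}
```

Because both functions share `canTranscribe`, a mixed selection enables the action but enqueues only the eligible episodes.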
In `@PodHavenTests/Fakes/FakeSpeechAnalyzer.swift`:
- Around line 12-13: The comment on the durationSeconds property references the
specific literal value 0, which creates maintenance debt if the default value or
behavior changes in the future. Rewrite the comment to describe the behavior of
the durationSeconds property itself, explaining what it controls or how it
affects the system, without embedding specific constant values like 0 in the
comment text.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: c668f144-f096-45a0-80f4-6d55719e5f68
⛔ Files ignored due to path filters (1)
memory/pr_reviews/461.md is excluded by !memory/pr_reviews/**
📒 Files selected for processing (58)
- AGENTS.md
- PodHaven/AppLauncher.swift
- PodHaven/Cache/CacheBackgroundDelegate.swift
- PodHaven/Cache/CacheManager.swift
- PodHaven/Database/DisplayModels/DisplayedEpisode.swift
- PodHaven/Database/DisplayModels/EpisodeDetailContent.swift
- PodHaven/Database/DisplayModels/ListedEpisode.swift
- PodHaven/Database/Migrations/Migration_v60.swift
- PodHaven/Database/Models/Episode.swift
- PodHaven/Database/Models/ListableEpisode.swift
- PodHaven/Database/Models/ListablePodcastEpisode.swift
- PodHaven/Database/Models/OnDeck.swift
- PodHaven/Database/Models/PodcastEpisode.swift
- PodHaven/Database/Models/UnsavedPodcastEpisode.swift
- PodHaven/Database/Protocols/Databasing.swift
- PodHaven/Database/Protocols/EpisodeFoundational.swift
- PodHaven/Database/Repo.swift
- PodHaven/Database/Schema.swift
- PodHaven/Environment/AppIcon.swift
- PodHaven/Environment/SystemImageName.swift
- PodHaven/Environment/UserSettings.swift
- PodHaven/Info.plist
- PodHaven/Logging/LogSubsystem.swift
- PodHaven/PodHavenApp.swift
- PodHaven/Transcriptions/Extensions/SpeechAnalyzer.swift
- PodHaven/Transcriptions/Extensions/SpeechTranscriber.swift
- PodHaven/Transcriptions/Protocols/SpeechAnalyzing.swift
- PodHaven/Transcriptions/Protocols/SpeechModelManaging.swift
- PodHaven/Transcriptions/Protocols/SpeechTranscribing.swift
- PodHaven/Transcriptions/Protocols/SpeechTranscriptionResult.swift
- PodHaven/Transcriptions/Transcriber.swift
- PodHaven/Transcriptions/Transcript.swift
- PodHaven/Transcriptions/TranscriptionProcessor.swift
- PodHaven/Transcriptions/TranscriptionQueue.swift
- PodHaven/Views/Episodes/Components/EpisodesToolbarItems.swift
- PodHaven/Views/Episodes/EpisodeDetailView.swift
- PodHaven/Views/Episodes/Models/EpisodeDetailViewModel.swift
- PodHaven/Views/Episodes/Protocols/ManagingEpisodes.swift
- PodHaven/Views/Episodes/Protocols/SelectableEpisodeList.swift
- PodHaven/Views/Episodes/ViewModifiers/EpisodeContextMenuViewModifier.swift
- PodHaven/Views/Episodes/ViewModifiers/EpisodeSwipeViewModifier.swift
- PodHaven/Views/Settings/SwipeActions/EpisodeSwipeSettingsView.swift
- PodHavenTests/CacheManagerReconcileTests.swift
- PodHavenTests/Fakes/FakeRepo.swift
- PodHavenTests/Fakes/FakeSpeechAnalyzer.swift
- PodHavenTests/Fakes/FakeSpeechModelManager.swift
- PodHavenTests/Fakes/FakeSpeechTranscriber.swift
- PodHavenTests/MigrationTests/v60Tests.swift
- PodHavenTests/TranscriptionTests/TranscriberTests.swift
- PodHavenTests/TranscriptionTests/TranscriptionBackgroundTaskTests.swift
- PodHavenTests/TranscriptionTests/TranscriptionProcessorTests.swift
- PodHavenTests/TranscriptionTests/TranscriptionQueueTests.swift
- PodHavenTests/Utility/TranscriptionHelpers.swift
- PodHavenTests/ViewModelTests/EpisodeDetailViewModelTests/EpisodeDetailTranscriptTests.swift
- PodHavenTests/ViewModelTests/EpisodesListViewModelTests/EpisodesListTranscribeTests.swift
- docs/README.md
- docs/initiatives/manual-transcripts.md
- docs/initiatives/transcripts.md
…ripts

# Conflicts:
#	PodHaven/Database/Migrations/Migration_v60.swift
#	PodHaven/Database/Schema.swift
#	PodHavenTests/MigrationTests/v60Tests.swift
Describe the non-positive-duration progress-disable behavior instead of baking the literal 0 into the comment, per the AGENTS.md guardrail (CodeRabbit thread on PR #461). Update the review ledger. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Record the EpisodeDetailViewModel transcript-memo invalidation seam (and the skip-redundant-overwrite requirement) that any v2 path replacing a transcript in place — RSS import or model-revision re-transcribe — must handle. Tighten the F10 review-ledger resolution with the same reasoning. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A transcribed saved episode opened from a list starts in the .initial detail state (loaded == nil) with status .transcribed but a nil decoded transcript, so EpisodeDetailView rendered "No speech detected" until hydration completed. Add EpisodeTranscriptDisplay (loading/empty/text) on the view model: a nil decode with no loaded episode is .loading, reserving the empty notice for a decoded zero-segment transcript. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
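The three display states in this commit can be sketched as a pure function of whether the detail content has hydrated and what the decode produced. The case names follow the commit text (loading/empty/text); the function name, its parameters, and the choice to return `nil` for a hydrated episode with no decoded transcript are assumptions made for illustration, since the commit does not describe that combination.

```swift
enum EpisodeTranscriptDisplay: Equatable {
  case loading
  case empty
  case text(String)
}

// Hypothetical derivation. `segmentTexts` is nil when no transcript has
// been decoded yet, and an empty array for a decoded zero-segment transcript.
func transcriptDisplay(
  isLoaded: Bool,
  segmentTexts: [String]?
) -> EpisodeTranscriptDisplay? {
  guard let segmentTexts else {
    // Not hydrated yet: show a loading state, not the empty notice.
    // Assumption: a hydrated episode with no decode shows nothing.
    return isLoaded ? nil : .loading
  }
  if segmentTexts.isEmpty {
    return .empty
  }
  // Each finalized phrase is rendered on its own line.
  return .text(segmentTexts.joined(separator: "\n"))
}
```

Separating "not decoded yet" from "decoded but empty" is what keeps the "No speech detected" notice from flashing before hydration completes.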
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
# Conflicts:
#	PodHaven/Database/Migrations/Migration_v62.swift
#	PodHavenTests/MigrationTests/v62Tests.swift
What

User-initiated, on-device episode transcription (iOS 26 `SpeechAnalyzer`). Tap Transcribe from any episode surface → it enters a persisted, one-at-a-time queue → the processor transcribes on-device → the transcript renders read-only below the episode description. Timed segments are stored, so a v2 (#459) can add tap-to-seek/highlight without re-transcribing.

How it's built

- `episode.transcript` column (timed-segment JSON); `Transcript` / `TranscriptSegment`
- `Transcriber` service: `SpeechTranscriber` w/ `.audioTimeRange` over the cached `AVAudioFile`, speech model install (en-US), behind speech protocol fakes
- `TranscriptionQueue` (`@PersistedBroadcast` in UserDefaults; survives termination, not SQLite) + `TranscriptionProcessor` (foreground background-priority `Task` + discretionary `BGProcessingTask`, mirroring `EmbeddingProcessor`)
- `queued`/`transcribing`/`transcribed`/`failed` status mirroring the cache indicators

Key decisions

- … `transcript_segment` table if cross-episode search / diarization land.
- `BGProcessingTask`, not `BGContinuedProcessingTask`: no odd Live Activity, mature, and unit-testable; trade is opportunistic (not immediate) background completion.

Full design + rationale: `docs/initiatives/manual-transcripts.md`.

Testing

- … `BGProcessingTask` handler), status derivation, swipe-settings.
- … `Transcriber` against Apple's on-device speech model, and actual `BGProcessingTask` grant behaviour.

Deferred to v2 (#459)

Interactive transcript (tap-to-seek / highlight / auto-scroll), FluidAudio speaker diarization, RSS `<podcast:transcript>` import, FTS5 cross-episode search.

🤖 Generated with Claude Code
Summary by CodeRabbit
Release Notes
New Features
Documentation