Add manual on-device episode transcription by jubishop · Pull Request #461 · jubishop/podhaven

jubishop · 2026-06-13T22:02:36Z

What

User-initiated, on-device episode transcription (iOS 26 SpeechAnalyzer). Tap Transcribe from any episode surface → it enters a persisted, one-at-a-time queue → the processor transcribes on-device → the transcript renders read-only below the episode description. Timed segments are stored, so a v2 (#459) can add tap-to-seek/highlight without re-transcribing.

How it's built

| Layer | What |
| --- | --- |
| Schema | Migration v62: nullable `episode.transcript` column (timed-segment JSON); `Transcript` / `TranscriptSegment` |
| Engine | `Transcriber` service: `SpeechTranscriber` with `.audioTimeRange` over the cached `AVAudioFile`, speech model install (en-US), behind speech protocol fakes |
| Queue / processor | `TranscriptionQueue` (`@PersistedBroadcast` in UserDefaults, not SQLite; survives termination) + `TranscriptionProcessor` (foreground background-priority `Task` + discretionary `BGProcessingTask`, mirroring `EmbeddingProcessor`) |
| UI | Transcribe action on detail toolbar + section, list context menu, swipe action (Settings-configurable), multi-select; live `queued/transcribing/transcribed/failed` status mirroring the cache indicators |
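
To make the "timed-segment JSON in one column" idea concrete, here is a minimal sketch of how such a value could round-trip through a nullable text column. The type names `Transcript` and `TranscriptSegment` come from this PR, but the field names (`text`, `startSeconds`, `endSeconds`) and the helper methods are assumptions for illustration, not the PR's actual definitions.

```swift
import Foundation

// Hypothetical shapes: only the type names appear in the PR;
// every field and method name below is an assumption.
struct TranscriptSegment: Codable, Equatable {
  let text: String
  let startSeconds: Double
  let endSeconds: Double
}

struct Transcript: Codable, Equatable {
  let segments: [TranscriptSegment]

  // Encode to the JSON string that would be stored in the nullable column.
  func encodedJSON() throws -> String {
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.sortedKeys]
    let data = try encoder.encode(self)
    return String(decoding: data, as: UTF8.self)
  }

  // Decode a stored column value; a nil column means "never transcribed".
  static func decode(fromColumn json: String?) throws -> Transcript? {
    guard let json else { return nil }
    return try JSONDecoder().decode(Transcript.self, from: Data(json.utf8))
  }

  // Read-only display text: one finalized phrase per line.
  var displayText: String {
    segments.map(\.text).joined(separator: "\n")
  }
}
```

Because each segment keeps its own time range, a later interactive layer could map a tapped line back to a playback position without re-running transcription.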

Key decisions

- Timed JSON in one column, not a segment table: a single migration now, with a lossless upgrade path to a `transcript_segment` table if cross-episode search or diarization lands.
- Discretionary `BGProcessingTask`, not `BGContinuedProcessingTask`: no odd Live Activity, a mature API, and unit-testable; the trade-off is that background completion is opportunistic rather than immediate.

Full design + rationale: docs/initiatives/manual-transcripts.md.

Testing

Unit-tested: migration v62, queue behaviour, processor loop (foreground + the BGProcessingTask handler), status derivation, swipe-settings.
Integration/device-only (behind protocols + fakes, not in CI): the real Transcriber against Apple's on-device speech model, and actual BGProcessingTask grant behaviour.

Deferred to v2 (#459)

Interactive transcript (tap-to-seek / highlight / auto-scroll), FluidAudio speaker diarization, RSS <podcast:transcript> import, FTS5 cross-episode search.

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

New Features
- Added on-device episode transcription with timed transcript segments
- Transcribe episodes via toolbar, context menu, swipe actions, or multi-select
- View transcripts with timestamp-linked segments in episode details
- Track transcription status (queued, in-progress, completed, or failed)
- Persisted transcription queue processes episodes sequentially in foreground and background

Documentation
- Added comprehensive transcription feature documentation and architecture guide

User-initiated, on-device transcription of episodes via iOS 26 SpeechAnalyzer, surfaced from every episode action and rendered read-only in EpisodeDetailView.

- Migration v59: nullable episode.transcript column storing timed-segment JSON
- Transcriber actor: SpeechTranscriber + .audioTimeRange over the cached AVAudioFile, AssetInventory model install (en-US), behind a Transcribing protocol
- Persisted one-at-a-time TranscriptionQueue (UserDefaults, survives termination) + TranscriptionProcessor: foreground background-priority drain plus a discretionary BGProcessingTask, mirroring EmbeddingProcessor
- Transcribe action on the detail toolbar + section, list context menu, swipe action (configurable in Settings), and multi-select; per-episode queued/transcribing/transcribed/failed status across surfaces
- Docs: docs/initiatives/manual-transcripts.md initiative; transcripts.md reframed as research (FluidAudio + RSS-format findings)

Interactive playback (tap-to-seek/highlight/auto-scroll) deferred to v2 (#459); the per-segment timestamps are stored, so it is a pure UI layer later.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…evel test seam

Address PR review feedback:

- Replace the download poll/sleep loop with a per-episode AsyncLatch completion signal in the cache layer (CacheManager + CacheBackgroundDelegate), awaited by the processor; drop the poll constants and sleeper dependency.
- Event-drive the foreground drain via the queue stream (drain → drainUntilEmpty), removing waitForWork.
- Push the transcription test seam to the OS boundary: SpeechAnalyzer / SpeechTranscriber (and the opaque SpeechTranscriber.Result) behind protocols the real types conform to, mirroring AVPlayable/AVPlayableItem, with fakes. Transcriber is now concrete and unit-tested. Document the rule in AGENTS.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
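
For readers unfamiliar with the pattern, a one-shot completion signal that replaces a poll/sleep loop could look roughly like the sketch below. This is not the PR's `AsyncLatch`: the `wait()` and `open()` method names are assumptions, and cancellation handling is deliberately left out to keep the example short.

```swift
// Hypothetical one-shot latch: waiters suspend until `open()` is called once.
// Method names are assumptions; the PR's AsyncLatch may differ.
actor AsyncLatch {
  private var isOpen = false
  private var waiters: [CheckedContinuation<Void, Never>] = []

  // Suspends until the latch opens; returns immediately if already open.
  func wait() async {
    if isOpen { return }
    await withCheckedContinuation { continuation in
      // Runs synchronously on the actor, so there is no gap between
      // the `isOpen` check above and registering the waiter.
      waiters.append(continuation)
    }
  }

  // Opens the latch and resumes every suspended waiter exactly once.
  func open() {
    isOpen = true
    let pending = waiters
    waiters.removeAll()
    for continuation in pending {
      continuation.resume()
    }
  }
}
```

A processor awaiting `wait()` is resumed by the download's completion callback calling `open()`, so no timer or sleeper dependency is needed.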

Remove overly specific examples from the test seam guidance section to keep documentation concise and maintainable. The existing pattern examples (AVPlayer/AVPlayerItem → AVPlayable/AVPlayableItem, URLSession → DataFetchable, SpeechAnalyzer/SpeechTranscriber → SpeechAnalyzing/SpeechTranscribing) added unnecessary detail that can become outdated. Retain the core principle that test seams should be placed at the OS-integration boundary with app-owned protocols wrapping system-framework types, allowing developers to apply this pattern flexibly to their own integration points.

Only offer Transcribe when an episode can actually be transcribed (status .none/.failed), hiding it once transcribed/queued/transcribing, consistent with markFinished/cache gating across the swipe, context-menu, and multi-select toolbar surfaces.

- Lift hasTranscript onto EpisodeFoundational; expose a shared Episode.hasTranscriptSelectable derived column so slim list projections carry it without fetching the transcript blob.
- Add ManagingEpisodes.canTranscribe and SelectableEpisodeList .anySelectedCanTranscribe; gate the three action surfaces on them.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
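
The gating rule above can be sketched as a small status check plus a selection filter. The status cases mirror the states named in this PR, but `SelectedEpisode`, `idsToEnqueue(from:)`, and the free-function form of `anySelectedCanTranscribe` are hypothetical stand-ins for illustration.

```swift
// Status cases follow the states named in the PR description.
enum TranscriptionStatus {
  case none, queued, transcribing, transcribed, failed

  // Only never-transcribed or failed episodes may be (re)transcribed.
  var canTranscribe: Bool {
    switch self {
    case .none, .failed: return true
    case .queued, .transcribing, .transcribed: return false
    }
  }
}

// Hypothetical slim projection of a selected list row.
struct SelectedEpisode {
  let id: Int
  let status: TranscriptionStatus
}

// Enables the multi-select action when at least one row qualifies.
func anySelectedCanTranscribe(_ selection: [SelectedEpisode]) -> Bool {
  selection.contains { $0.status.canTranscribe }
}

// Applies the same predicate when enqueueing, so already queued or
// transcribed rows in a mixed selection are skipped.
func idsToEnqueue(from selection: [SelectedEpisode]) -> [Int] {
  selection.filter { $0.status.canTranscribe }.map(\.id)
}
```

Using one predicate for both enabling the action and choosing what to enqueue keeps the two from drifting apart.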

Render finalized phrases on their own lines instead of one space-joined paragraph, so the transcript reads like paragraphs and scans better. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ripts

# Conflicts:
#	PodHaven/Database/Migrations/Migration_v59.swift
#	PodHavenTests/MigrationTests/v59Tests.swift

Review fixes on top of the manual-transcription branch:

- cache: reconcile stranded `downloading` flags at launch via the session's live-task list and cap the background resource timeout at 24h, so a download that never signals completion can't permanently stall the transcription queue (or leave a stuck "downloading" indicator)
- transcriptions: treat no-decodable-audio as a retryable failure, while audio that contains no recognizable speech finalizes to a terminal transcript rendered as "No speech detected"
- episode detail: memoize the transcript decode instead of re-decoding the full JSON on every `viewModel.episode` access
- docs: correct migration references v59 → v60

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ion engine

Implement transcription progress tracking by reading audio file duration upfront and reporting monotonic progress as each transcribed segment's end time is processed.

- Add `duration(ofAudioFileAt:)` to the `SpeechAnalyzing` protocol and implement it in `SpeechAnalyzer` using `AVAudioFile` length and sample rate.
- Track progress in `Transcriber.collectSegments()` by comparing each result's `endSeconds` to the total duration, clamped and monotonic so the progress bar never jumps backward.
- Update the `SpeechTranscriptionResult` protocol to expose `endSeconds` (latest audio end time across runs) alongside the existing `startSeconds`.
- Wire the progress callback through `Transcriber.transcribe()` → `TranscriptionProcessor` → `TranscriptionQueue.setProgress()`, feeding live updates to the UI.
- Enhance `EpisodeDetailView` to render a determinate `ProgressView` with a percentage label when transcribing with progress > 0, falling back to an indeterminate spinner until the first segment finalizes.
- Add test coverage via `TranscriberTests.reportsProgress()`, validating monotonic progress [0.25, 0.5, 1] across three segments in a 100-second file.
- Update all test fakes (`FakeSpeechAnalyzer`, `FakeSpeechTranscriptionResult`, `TranscriptionHelpers`) to support duration configuration.
- Update initiative docs to reflect shipped progress reporting and defer tap-to-seek/highlighting to v2. Mark audio retention and failure surfacing as decided; note background-grant timing as an open follow-up.
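
The "clamped and monotonic" progress rule can be illustrated with a small value type. This is a sketch under assumptions: `ProgressTracker` and `record(endSeconds:)` are hypothetical names, whereas the PR performs the equivalent bookkeeping inside `Transcriber.collectSegments()`.

```swift
// Hypothetical helper illustrating clamped, monotonic progress reporting.
struct ProgressTracker {
  let durationSeconds: Double
  private(set) var progress: Double = 0

  init(durationSeconds: Double) {
    self.durationSeconds = durationSeconds
  }

  // Returns the new progress when it advances, or nil when there is
  // nothing to report (no usable duration, or no forward movement).
  mutating func record(endSeconds: Double) -> Double? {
    // A non-positive duration disables progress reporting entirely.
    guard durationSeconds > 0 else { return nil }
    // Clamp into 0...1 so an end time past the file length cannot overshoot.
    let fraction = min(max(endSeconds / durationSeconds, 0), 1)
    // Monotonic: ignore results that would move the bar backward.
    guard fraction > progress else { return nil }
    progress = fraction
    return fraction
  }
}
```

Fed segment end times of 25, 50, and 100 seconds for a 100-second file, the tracker reports 0.25, 0.5, and 1, matching the sequence the commit's test validates.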

coderabbitai

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/initiatives/manual-transcripts.md`:
- Around line 62-64: The fenced code block containing the TranscriptionStatus
enum declaration is missing a language identifier. Add the language specifier
swift immediately after the opening triple backticks (```) to enable proper
syntax highlighting and tooling support for the code block in the markdown file.

In `@PodHaven/Cache/CacheManager.swift`:
- Around line 84-85: The reconcileStaleDownloads() call is fire-and-forget,
creating a race condition where cachedURL(downloadingIfNeeded:) can see the
stale .caching flag before reconciliation completes, causing
downloadToCache(for:) to no-op and leaving callers waiting on a latch that never
opens, which indefinitely blocks the TranscriptionProcessor. Change the
reconcileStaleDownloads() call at lines 84-85 and any other affected locations
(lines 98-116) to await the result so that startup completes only after stale
downloads are fully reconciled. Additionally, add a regression test that
verifies the processor correctly re-downloads an episode when it starts with
downloading == true but no live background task and a queued transcription,
proving this test fails before the fix and passes after.

In `@PodHaven/Database/DisplayModels/EpisodeDetailContent.swift`:
- Line 46: In the init(initial:) method of EpisodeDetailContent, the
hasTranscript property is hard-coded to false, which discards the transcript
state already known from the initial list model and causes incorrect UI to
display before hydration. Replace the hard-coded false value with
listed.hasTranscript to propagate the actual transcript state from the initial
model.

In `@PodHaven/Views/Episodes/Models/EpisodeDetailViewModel.swift`:
- Around line 90-99: The decodedTranscript computed property caches transcripts
using only the episodeID as a cache key, which means if the transcript JSON for
the same episode ID changes, the UI will render stale cached content. Modify the
cache validation logic in the decodedTranscript property to not only check if
the episodeID matches but also verify that the transcript content itself hasn't
changed. When accessing the transcriptCache, compare it against the newly loaded
transcript from content.loaded?.decodedTranscript, and invalidate the cache (by
updating transcriptCache) if the content differs from what is currently loaded,
even when the episodeID remains the same.

In `@PodHaven/Views/Episodes/Protocols/SelectableEpisodeList.swift`:
- Around line 404-417: The transcribeSelectedEpisodes method enqueues all
selected episodes regardless of their transcribability status, but the action is
only enabled when any selected episode is transcribable (creating a mismatch
that can re-enqueue already transcribed/queued episodes). Filter the episodeIDs
retrieved from selectedPodcastEpisodeIDs to only include transcribable episodes
before calling transcriptionQueue.enqueue, ensuring only episodes that can be
transcribed are actually enqueued. This filtering should align with the
transcribability check used to enable the action.

In `@PodHavenTests/Fakes/FakeSpeechAnalyzer.swift`:
- Around line 12-13: The comment on the durationSeconds property references the
specific literal value 0, which creates maintenance debt if the default value or
behavior changes in the future. Rewrite the comment to describe the behavior of
the durationSeconds property itself, explaining what it controls or how it
affects the system, without embedding specific constant values like 0 in the
comment text.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: c668f144-f096-45a0-80f4-6d55719e5f68

📥 Commits

Reviewing files that changed from the base of the PR and between aa8526a and d52d761.

⛔ Files ignored due to path filters (1)

memory/pr_reviews/461.md is excluded by !memory/pr_reviews/**

📒 Files selected for processing (58)

AGENTS.md
PodHaven/AppLauncher.swift
PodHaven/Cache/CacheBackgroundDelegate.swift
PodHaven/Cache/CacheManager.swift
PodHaven/Database/DisplayModels/DisplayedEpisode.swift
PodHaven/Database/DisplayModels/EpisodeDetailContent.swift
PodHaven/Database/DisplayModels/ListedEpisode.swift
PodHaven/Database/Migrations/Migration_v60.swift
PodHaven/Database/Models/Episode.swift
PodHaven/Database/Models/ListableEpisode.swift
PodHaven/Database/Models/ListablePodcastEpisode.swift
PodHaven/Database/Models/OnDeck.swift
PodHaven/Database/Models/PodcastEpisode.swift
PodHaven/Database/Models/UnsavedPodcastEpisode.swift
PodHaven/Database/Protocols/Databasing.swift
PodHaven/Database/Protocols/EpisodeFoundational.swift
PodHaven/Database/Repo.swift
PodHaven/Database/Schema.swift
PodHaven/Environment/AppIcon.swift
PodHaven/Environment/SystemImageName.swift
PodHaven/Environment/UserSettings.swift
PodHaven/Info.plist
PodHaven/Logging/LogSubsystem.swift
PodHaven/PodHavenApp.swift
PodHaven/Transcriptions/Extensions/SpeechAnalyzer.swift
PodHaven/Transcriptions/Extensions/SpeechTranscriber.swift
PodHaven/Transcriptions/Protocols/SpeechAnalyzing.swift
PodHaven/Transcriptions/Protocols/SpeechModelManaging.swift
PodHaven/Transcriptions/Protocols/SpeechTranscribing.swift
PodHaven/Transcriptions/Protocols/SpeechTranscriptionResult.swift
PodHaven/Transcriptions/Transcriber.swift
PodHaven/Transcriptions/Transcript.swift
PodHaven/Transcriptions/TranscriptionProcessor.swift
PodHaven/Transcriptions/TranscriptionQueue.swift
PodHaven/Views/Episodes/Components/EpisodesToolbarItems.swift
PodHaven/Views/Episodes/EpisodeDetailView.swift
PodHaven/Views/Episodes/Models/EpisodeDetailViewModel.swift
PodHaven/Views/Episodes/Protocols/ManagingEpisodes.swift
PodHaven/Views/Episodes/Protocols/SelectableEpisodeList.swift
PodHaven/Views/Episodes/ViewModifiers/EpisodeContextMenuViewModifier.swift
PodHaven/Views/Episodes/ViewModifiers/EpisodeSwipeViewModifier.swift
PodHaven/Views/Settings/SwipeActions/EpisodeSwipeSettingsView.swift
PodHavenTests/CacheManagerReconcileTests.swift
PodHavenTests/Fakes/FakeRepo.swift
PodHavenTests/Fakes/FakeSpeechAnalyzer.swift
PodHavenTests/Fakes/FakeSpeechModelManager.swift
PodHavenTests/Fakes/FakeSpeechTranscriber.swift
PodHavenTests/MigrationTests/v60Tests.swift
PodHavenTests/TranscriptionTests/TranscriberTests.swift
PodHavenTests/TranscriptionTests/TranscriptionBackgroundTaskTests.swift
PodHavenTests/TranscriptionTests/TranscriptionProcessorTests.swift
PodHavenTests/TranscriptionTests/TranscriptionQueueTests.swift
PodHavenTests/Utility/TranscriptionHelpers.swift
PodHavenTests/ViewModelTests/EpisodeDetailViewModelTests/EpisodeDetailTranscriptTests.swift
PodHavenTests/ViewModelTests/EpisodesListViewModelTests/EpisodesListTranscribeTests.swift
docs/README.md
docs/initiatives/manual-transcripts.md
docs/initiatives/transcripts.md

…ripts

# Conflicts:
#	PodHaven/Database/Migrations/Migration_v60.swift
#	PodHaven/Database/Schema.swift
#	PodHavenTests/MigrationTests/v60Tests.swift

Describe the non-positive-duration progress-disable behavior instead of baking the literal 0 into the comment, per the AGENTS.md guardrail (CodeRabbit thread on PR #461). Update the review ledger. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Record the EpisodeDetailViewModel transcript-memo invalidation seam (and the skip-redundant-overwrite requirement) that any v2 path replacing a transcript in place — RSS import or model-revision re-transcribe — must handle. Tighten the F10 review-ledger resolution with the same reasoning. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
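
One possible shape for a memo that survives an in-place transcript replacement is to key it on the raw JSON as well as the episode ID. The sketch below is illustrative only and is not what this PR ships: `TranscriptMemo`, its `decodeCount` probe, and the simplified `[String]` payload are all assumptions.

```swift
import Foundation

// Hypothetical memo keyed on both episode ID and raw transcript JSON,
// so replacing a transcript in place invalidates the cached decode.
struct TranscriptMemo {
  private var key: (episodeID: Int, json: String)?
  private var value: [String] = []
  // Test probe: how many times the JSON was actually decoded.
  private(set) var decodeCount = 0

  // Returns decoded segment texts, re-decoding only when the key changes.
  // The payload is simplified to an array of strings for this sketch.
  mutating func segments(episodeID: Int, json: String) -> [String] {
    if let cached = key, cached.episodeID == episodeID, cached.json == json {
      return value
    }
    decodeCount += 1
    let data = Data(json.utf8)
    value = (try? JSONDecoder().decode([String].self, from: data)) ?? []
    key = (episodeID: episodeID, json: json)
    return value
  }
}
```

Comparing the stored string on every access costs a linear scan, which is typically cheaper than a full decode; a content hash or revision counter would be an alternative key.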

A transcribed saved episode opened from a list starts in the .initial detail state (loaded == nil) with status .transcribed but a nil decoded transcript, so EpisodeDetailView rendered "No speech detected" until hydration completed. Add EpisodeTranscriptDisplay (loading/empty/text) on the view model: a nil decode with no loaded episode is .loading, reserving the empty notice for a decoded zero-segment transcript. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
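
The three display states described above can be derived with a small pure function. `EpisodeTranscriptDisplay` and its cases come from the commit message; the function name, its parameters, and the choice to return nil for a loaded episode with no decoded transcript are assumptions made for this sketch.

```swift
// Cases follow the commit message: loading / empty / text.
enum EpisodeTranscriptDisplay: Equatable {
  case loading
  case empty
  case text(String)
}

// Hypothetical derivation for an episode whose status is `.transcribed`.
// `segments` is nil when no transcript has been decoded yet.
func transcriptDisplay(
  segments: [String]?,
  episodeIsLoaded: Bool
) -> EpisodeTranscriptDisplay? {
  guard let segments else {
    // No decode yet: show a loading state until hydration completes.
    // Returning nil once loaded is an assumption, not the PR's behavior.
    return episodeIsLoaded ? nil : .loading
  }
  // Reserve the empty notice for a decoded transcript with zero segments.
  return segments.isEmpty ? .empty : .text(segments.joined(separator: "\n"))
}
```

Separating "not decoded yet" from "decoded but empty" is what keeps the "No speech detected" notice from flashing before hydration.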

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

# Conflicts:
#	PodHaven/Database/Migrations/Migration_v62.swift
#	PodHavenTests/MigrationTests/v62Tests.swift

jubishop commented Jun 13, 2026

Comment thread PodHaven/Transcriptions/Transcriber.swift Outdated

jubishop and others added 7 commits June 13, 2026 16:30

🎨 fix(transcriptions): join transcript segments with newlines

78ffc7d

Merge remote-tracking branch 'origin/main' into worktree-manualTransc…

33d68df

coderabbitai Bot reviewed Jun 14, 2026

jubishop and others added 5 commits June 14, 2026 15:48

Merge remote-tracking branch 'origin/main' into worktree-manualTransc…

7f6ca72

Merge branch 'main' into worktree-manualTranscripts

9bf7c74

Fix manual transcription review issues

dd43463

jubishop mentioned this pull request Jun 15, 2026

Implement Episode Transcripts v2 (interactivity, diarization, RSS import, search) #459

Open

9 tasks

jubishop and others added 2 commits June 14, 2026 18:41

📝 review(transcripts): record F10 thread deferral + closure

cbe1767

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

jubishop mentioned this pull request Jun 15, 2026

Await in-flight cache downloads + reconcile stranded download flags #469

Open

Merge branch 'main' into worktree-manualTranscripts

30d6451

jubishop mentioned this pull request Jun 15, 2026

Atomically claim episode download starts to prevent duplicate concurrent downloads (PR #469 F15 follow-up) #483

Open

jubishop wants to merge 16 commits into main from worktree-manualTranscripts
