Add manual on-device episode transcription #461

Open
jubishop wants to merge 16 commits into main from worktree-manualTranscripts

@jubishop jubishop commented Jun 13, 2026

Owner

What

User-initiated, on-device episode transcription (iOS 26 SpeechAnalyzer). Tap Transcribe from any episode surface → it enters a persisted, one-at-a-time queue → the processor transcribes on-device → the transcript renders read-only below the episode description. Timed segments are stored, so a v2 (#459) can add tap-to-seek/highlight without re-transcribing.

How it's built

| Layer | What |
| --- | --- |
| Schema | Migration v62: nullable episode.transcript column (timed-segment JSON); Transcript / TranscriptSegment |
| Engine | Transcriber service: SpeechTranscriber with .audioTimeRange over the cached AVAudioFile, speech model install (en-US), behind speech protocol fakes |
| Queue / processor | TranscriptionQueue (@PersistedBroadcast in UserDefaults, not SQLite; survives termination) + TranscriptionProcessor (foreground background-priority Task + discretionary BGProcessingTask, mirroring EmbeddingProcessor) |
| UI | Transcribe action on detail toolbar + section, list context menu, swipe action (Settings-configurable), multi-select; live queued/transcribing/transcribed/failed status mirroring the cache indicators |
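
A minimal sketch of what a persisted, one-at-a-time queue in UserDefaults could look like. It is not the PR's `TranscriptionQueue` (which uses `@PersistedBroadcast`, not shown on this page); the type name, key name, and `Int64` ID type are all assumptions made for illustration.

```swift
import Foundation

// Hypothetical sketch: a FIFO list of episode IDs stored as JSON in UserDefaults.
// The key name and the Int64 ID type are assumptions, not the PR's actual code.
struct PersistedQueueSketch {
  private let defaults: UserDefaults
  private let key: String

  init(defaults: UserDefaults, key: String = "transcriptionQueueSketch") {
    self.defaults = defaults
    self.key = key
  }

  /// Episode IDs in the order they were enqueued; empty if nothing is stored.
  var pending: [Int64] {
    guard let data = defaults.data(forKey: key) else { return [] }
    return (try? JSONDecoder().decode([Int64].self, from: data)) ?? []
  }

  /// The single episode a processor should work on next, if any.
  var head: Int64? { pending.first }

  /// Appends an ID unless it is already queued, then persists the new order.
  func enqueue(_ id: Int64) {
    var ids = pending
    guard !ids.contains(id) else { return }
    ids.append(id)
    save(ids)
  }

  /// Removes a finished or failed ID so the next one becomes the head.
  func remove(_ id: Int64) {
    save(pending.filter { $0 != id })
  }

  private func save(_ ids: [Int64]) {
    if let data = try? JSONEncoder().encode(ids) {
      defaults.set(data, forKey: key)
    }
  }
}
```

Because every mutation is written through to UserDefaults, a fresh `PersistedQueueSketch` created after relaunch sees the same pending IDs, which is the "survives termination" property described above.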

Key decisions

  • Timed JSON in one column, not a segment table — one migration; lossless upgrade to a transcript_segment table if cross-episode search / diarization land.
  • Discretionary BGProcessingTask, not BGContinuedProcessingTask: no odd Live Activity, a mature API, and unit-testable; the trade-off is that background completion is opportunistic rather than immediate.

Full design + rationale: docs/initiatives/manual-transcripts.md.
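
For readers who want a concrete picture of the "timed JSON in one column" decision, here is a rough sketch. The PR names `Transcript` and `TranscriptSegment`, but their fields are not shown on this page, so the property names below (`startSeconds`, `endSeconds`, `text`) and the helper methods are assumptions.

```swift
import Foundation

// Hypothetical shapes: the real Transcript / TranscriptSegment fields are not
// shown in this PR page, so these properties are illustrative assumptions.
struct TranscriptSegmentSketch: Codable, Equatable {
  var startSeconds: Double
  var endSeconds: Double
  var text: String
}

struct TranscriptSketch: Codable, Equatable {
  var segments: [TranscriptSegmentSketch]

  /// Serializes to the JSON string that would be stored in the nullable column.
  func encodedJSON() throws -> String {
    let encoder = JSONEncoder()
    encoder.outputFormatting = [.sortedKeys]
    let data = try encoder.encode(self)
    return String(decoding: data, as: UTF8.self)
  }

  /// Decodes a stored column value; a nil column means "not transcribed yet".
  static func decode(column: String?) throws -> TranscriptSketch? {
    guard let column else { return nil }
    return try JSONDecoder().decode(TranscriptSketch.self, from: Data(column.utf8))
  }
}
```

Since each segment keeps its own time range, a later move to a dedicated segment table would only need to decode the column and insert one row per segment.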

Testing

  • Unit-tested: migration v62, queue behaviour, processor loop (foreground + the BGProcessingTask handler), status derivation, swipe-settings.
  • Integration/device-only (behind protocols + fakes, not in CI): the real Transcriber against Apple's on-device speech model, and actual BGProcessingTask grant behaviour.

Deferred to v2 (#459)

Interactive transcript (tap-to-seek / highlight / auto-scroll), FluidAudio speaker diarization, RSS <podcast:transcript> import, FTS5 cross-episode search.

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • New Features

    • Added on-device episode transcription with timed transcript segments
    • Transcribe episodes via toolbar, context menu, swipe actions, or multi-select
    • View transcripts with timestamp-linked segments in episode details
    • Track transcription status (queued, in-progress, completed, or failed)
    • Persisted transcription queue processes episodes sequentially in foreground and background
  • Documentation

    • Added comprehensive transcription feature documentation and architecture guide

User-initiated, on-device transcription of episodes via iOS 26 SpeechAnalyzer,
surfaced from every episode action and rendered read-only in EpisodeDetailView.

- Migration v59: nullable episode.transcript column storing timed-segment JSON
- Transcriber actor: SpeechTranscriber + .audioTimeRange over the cached
  AVAudioFile, AssetInventory model install (en-US), behind a Transcribing protocol
- Persisted one-at-a-time TranscriptionQueue (UserDefaults, survives termination)
  + TranscriptionProcessor: foreground background-priority drain plus a
  discretionary BGProcessingTask, mirroring EmbeddingProcessor
- Transcribe action on the detail toolbar + section, list context menu, swipe
  action (configurable in Settings), and multi-select; per-episode
  queued/transcribing/transcribed/failed status across surfaces
- Docs: docs/initiatives/manual-transcripts.md initiative; transcripts.md
  reframed as research (FluidAudio + RSS-format findings)

Interactive playback (tap-to-seek/highlight/auto-scroll) deferred to v2 (#459);
the per-segment timestamps are stored, so it is a pure UI layer later.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Comment thread PodHaven/Transcriptions/TranscriptionProcessor.swift Outdated
Comment thread PodHaven/Transcriptions/TranscriptionProcessor.swift Outdated
Comment thread PodHaven/Transcriptions/TranscriptionProcessor.swift Outdated
Comment thread PodHaven/Transcriptions/TranscriptionProcessor.swift Outdated
Comment thread PodHaven/Transcriptions/Transcriber.swift Outdated
jubishop and others added 7 commits June 13, 2026 16:30
…evel test seam

Address PR review feedback:
- Replace the download poll/sleep loop with a per-episode AsyncLatch completion
  signal in the cache layer (CacheManager + CacheBackgroundDelegate), awaited by
  the processor; drop the poll constants and sleeper dependency.
- Event-drive the foreground drain via the queue stream
  (drain → drainUntilEmpty), removing waitForWork.
- Push the transcription test seam to the OS boundary: SpeechAnalyzer /
  SpeechTranscriber (and the opaque SpeechTranscriber.Result) behind protocols
  the real types conform to, mirroring AVPlayable/AVPlayableItem, with fakes.
  Transcriber is now concrete and unit-tested. Document the rule in AGENTS.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
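
The "seam at the OS boundary" rule can be illustrated with a system type that is easy to run anywhere. This sketch uses `UserDefaults` rather than the Speech framework types the commit is about; `KeyValueStoring`, `FakeKeyValueStore`, and `LastLanguageStore` are hypothetical names, not identifiers from this repository.

```swift
import Foundation

// Hypothetical app-owned protocol; the name and members are illustrative only.
protocol KeyValueStoring {
  func string(forKey defaultName: String) -> String?
  func set(_ value: Any?, forKey defaultName: String)
}

// The real system type already has matching methods, so the conformance is empty.
extension UserDefaults: KeyValueStoring {}

// In-memory fake for unit tests; no system framework state is touched.
final class FakeKeyValueStore: KeyValueStoring {
  private var storage: [String: Any] = [:]

  func string(forKey defaultName: String) -> String? {
    storage[defaultName] as? String
  }

  func set(_ value: Any?, forKey defaultName: String) {
    storage[defaultName] = value
  }
}

// Code under test depends only on the protocol, never on UserDefaults itself.
struct LastLanguageStore {
  let store: any KeyValueStoring

  /// Falls back to en-US when nothing has been remembered yet.
  var language: String { store.string(forKey: "lang") ?? "en-US" }

  func remember(_ language: String) { store.set(language, forKey: "lang") }
}
```

The same shape applies when the wrapped type is a speech or AV framework class: production code passes the real object, tests pass the fake, and the logic in between stays concrete and unit-testable.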
Remove overly specific examples from the test seam guidance section to keep documentation concise and maintainable. The existing pattern examples (AVPlayer/AVPlayerItem → AVPlayable/AVPlayableItem, URLSession → DataFetchable, SpeechAnalyzer/SpeechTranscriber → SpeechAnalyzing/SpeechTranscribing) added unnecessary detail that can become outdated. Retain the core principle that test seams should be placed at the OS-integration boundary with app-owned protocols wrapping system-framework types, allowing developers to apply this pattern flexibly to their own integration points.
Only offer Transcribe when an episode can actually be transcribed (status
.none/.failed), hiding it once transcribed/queued/transcribing — consistent
with markFinished/cache gating across the swipe, context-menu, and multi-select
toolbar surfaces.

- Lift hasTranscript onto EpisodeFoundational; expose a shared
  Episode.hasTranscriptSelectable derived column so slim list projections carry
  it without fetching the transcript blob.
- Add ManagingEpisodes.canTranscribe and SelectableEpisodeList
  .anySelectedCanTranscribe; gate the three action surfaces on them.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
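
A small sketch of the gating rule described in this commit, assuming a five-case status enum. The case names come from the PR description; the type names, the `Int` ID, and the `transcribableIDs` helper are assumptions for illustration.

```swift
// Status cases follow the PR description; the enum shape itself is an assumption.
enum TranscriptionStatusSketch {
  case none, queued, transcribing, transcribed, failed

  /// Only untranscribed or previously failed episodes offer the action.
  var canTranscribe: Bool {
    switch self {
    case .none, .failed: return true
    case .queued, .transcribing, .transcribed: return false
    }
  }
}

// Hypothetical slim projection of a selected episode.
struct SelectedEpisodeSketch {
  let id: Int
  let status: TranscriptionStatusSketch
}

/// Whether a multi-select toolbar should offer Transcribe at all.
func anySelectedCanTranscribe(_ selection: [SelectedEpisodeSketch]) -> Bool {
  selection.contains { $0.status.canTranscribe }
}

/// IDs to enqueue, filtered with the same predicate that gates the action.
func transcribableIDs(_ selection: [SelectedEpisodeSketch]) -> [Int] {
  selection.filter { $0.status.canTranscribe }.map(\.id)
}
```

Using one predicate for both visibility and enqueueing keeps the action from re-enqueueing episodes that are already queued or transcribed.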
Render finalized phrases on their own lines instead of one space-joined
paragraph, so the transcript reads like paragraphs and scans better.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ripts

# Conflicts:
#	PodHaven/Database/Migrations/Migration_v59.swift
#	PodHavenTests/MigrationTests/v59Tests.swift
Review fixes on top of the manual-transcription branch:

- cache: reconcile stranded `downloading` flags at launch via the session's
  live-task list and cap the background resource timeout at 24h, so a
  download that never signals completion can't permanently stall the
  transcription queue (or leave a stuck "downloading" indicator)
- transcriptions: treat no-decodable-audio as a retryable failure, while
  audio that contains no recognizable speech finalizes to a terminal
  transcript rendered as "No speech detected"
- episode detail: memoize the transcript decode instead of re-decoding the
  full JSON on every `viewModel.episode` access
- docs: correct migration references v59 → v60

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ion engine

Implement transcription progress tracking by reading audio file duration upfront and reporting monotonic progress as each transcribed segment's end time is processed. Add `duration(ofAudioFileAt:)` to `SpeechAnalyzing` protocol and implement it in `SpeechAnalyzer` using `AVAudioFile` length and sample rate. Track progress in `Transcriber.collectSegments()` by comparing each result's `endSeconds` to the total duration, clamped and monotonic so the progress bar never jumps backward.

Update `SpeechTranscriptionResult` protocol to expose `endSeconds` (latest audio end time across runs) alongside existing `startSeconds`. Wire progress callback through `Transcriber.transcribe()` → `TranscriptionProcessor` → `TranscriptionQueue.setProgress()`, feeding live updates to the UI. Enhance `EpisodeDetailView` to render a determinate `ProgressView` with percentage label when transcribing with progress > 0, falling back to indeterminate spinner until first segment finalizes.

Add comprehensive test coverage via `TranscriberTests.reportsProgress()` validating monotonic progress [0.25, 0.5, 1] across three segments in a 100-second file. Update all test fakes (`FakeSpeechAnalyzer`, `FakeSpeechTranscriptionResult`, `TranscriptionHelpers`) to support duration configuration.

Update initiative docs to reflect shipped progress reporting and defer tap-to-seek/highlighting to v2. Mark audio retention and failure surfacing as decided; note background-grant timing as open follow-up.
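
The clamped, monotonic progress rule above can be sketched as a small value type. This is an illustration only: the PR does this inside `Transcriber.collectSegments()`, and `ProgressTrackerSketch` and `advance(toEndSeconds:)` are hypothetical names.

```swift
/// Tracks clamped, monotonic progress from segment end times.
/// Hypothetical helper; not the PR's actual implementation.
struct ProgressTrackerSketch {
  let durationSeconds: Double
  private(set) var progress: Double = 0

  init(durationSeconds: Double) {
    self.durationSeconds = durationSeconds
  }

  /// Returns the new progress if it advanced, or nil if nothing should be reported.
  mutating func advance(toEndSeconds endSeconds: Double) -> Double? {
    // A non-positive duration means the total is unknown: stay indeterminate.
    guard durationSeconds > 0 else { return nil }
    // Clamp to 0...1 so an end time past the file length cannot overshoot.
    let fraction = min(max(endSeconds / durationSeconds, 0), 1)
    // Never move backward; repeated or earlier end times are ignored.
    guard fraction > progress else { return nil }
    progress = fraction
    return fraction
  }
}
```

With a 100-second file and segment end times of 25, 50, and 100 seconds, the tracker reports 0.25, 0.5, and 1, matching the sequence the `reportsProgress()` test is described as validating.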

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/initiatives/manual-transcripts.md`:
- Around line 62-64: The fenced code block containing the TranscriptionStatus
enum declaration is missing a language identifier. Add the language specifier
swift immediately after the opening triple backticks (```) to enable proper
syntax highlighting and tooling support for the code block in the markdown file.

In `@PodHaven/Cache/CacheManager.swift`:
- Around line 84-85: The reconcileStaleDownloads() call is fire-and-forget,
creating a race condition where cachedURL(downloadingIfNeeded:) can see the
stale .caching flag before reconciliation completes, causing
downloadToCache(for:) to no-op and leaving callers waiting on a latch that never
opens, which indefinitely blocks the TranscriptionProcessor. Change the
reconcileStaleDownloads() call at lines 84-85 and any other affected locations
(lines 98-116) to await the result so that startup completes only after stale
downloads are fully reconciled. Additionally, add a regression test that
verifies the processor correctly re-downloads an episode when it starts with
downloading == true but no live background task and a queued transcription,
proving this test fails before the fix and passes after.

In `@PodHaven/Database/DisplayModels/EpisodeDetailContent.swift`:
- Line 46: In the init(initial:) method of EpisodeDetailContent, the
hasTranscript property is hard-coded to false, which discards the transcript
state already known from the initial list model and causes incorrect UI to
display before hydration. Replace the hard-coded false value with
listed.hasTranscript to propagate the actual transcript state from the initial
model.

In `@PodHaven/Views/Episodes/Models/EpisodeDetailViewModel.swift`:
- Around line 90-99: The decodedTranscript computed property caches transcripts
using only the episodeID as a cache key, which means if the transcript JSON for
the same episode ID changes, the UI will render stale cached content. Modify the
cache validation logic in the decodedTranscript property to not only check if
the episodeID matches but also verify that the transcript content itself hasn't
changed. When accessing the transcriptCache, compare it against the newly loaded
transcript from content.loaded?.decodedTranscript, and invalidate the cache (by
updating transcriptCache) if the content differs from what is currently loaded,
even when the episodeID remains the same.

In `@PodHaven/Views/Episodes/Protocols/SelectableEpisodeList.swift`:
- Around line 404-417: The transcribeSelectedEpisodes method enqueues all
selected episodes regardless of their transcribability status, but the action is
only enabled when any selected episode is transcribable (creating a mismatch
that can re-enqueue already transcribed/queued episodes). Filter the episodeIDs
retrieved from selectedPodcastEpisodeIDs to only include transcribable episodes
before calling transcriptionQueue.enqueue, ensuring only episodes that can be
transcribed are actually enqueued. This filtering should align with the
transcribability check used to enable the action.

In `@PodHavenTests/Fakes/FakeSpeechAnalyzer.swift`:
- Around line 12-13: The comment on the durationSeconds property references the
specific literal value 0, which creates maintenance debt if the default value or
behavior changes in the future. Rewrite the comment to describe the behavior of
the durationSeconds property itself, explaining what it controls or how it
affects the system, without embedding specific constant values like 0 in the
comment text.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: c668f144-f096-45a0-80f4-6d55719e5f68

📥 Commits

Reviewing files that changed from the base of the PR and between aa8526a and d52d761.

⛔ Files ignored due to path filters (1)
  • memory/pr_reviews/461.md is excluded by !memory/pr_reviews/**
📒 Files selected for processing (58)
  • AGENTS.md
  • PodHaven/AppLauncher.swift
  • PodHaven/Cache/CacheBackgroundDelegate.swift
  • PodHaven/Cache/CacheManager.swift
  • PodHaven/Database/DisplayModels/DisplayedEpisode.swift
  • PodHaven/Database/DisplayModels/EpisodeDetailContent.swift
  • PodHaven/Database/DisplayModels/ListedEpisode.swift
  • PodHaven/Database/Migrations/Migration_v60.swift
  • PodHaven/Database/Models/Episode.swift
  • PodHaven/Database/Models/ListableEpisode.swift
  • PodHaven/Database/Models/ListablePodcastEpisode.swift
  • PodHaven/Database/Models/OnDeck.swift
  • PodHaven/Database/Models/PodcastEpisode.swift
  • PodHaven/Database/Models/UnsavedPodcastEpisode.swift
  • PodHaven/Database/Protocols/Databasing.swift
  • PodHaven/Database/Protocols/EpisodeFoundational.swift
  • PodHaven/Database/Repo.swift
  • PodHaven/Database/Schema.swift
  • PodHaven/Environment/AppIcon.swift
  • PodHaven/Environment/SystemImageName.swift
  • PodHaven/Environment/UserSettings.swift
  • PodHaven/Info.plist
  • PodHaven/Logging/LogSubsystem.swift
  • PodHaven/PodHavenApp.swift
  • PodHaven/Transcriptions/Extensions/SpeechAnalyzer.swift
  • PodHaven/Transcriptions/Extensions/SpeechTranscriber.swift
  • PodHaven/Transcriptions/Protocols/SpeechAnalyzing.swift
  • PodHaven/Transcriptions/Protocols/SpeechModelManaging.swift
  • PodHaven/Transcriptions/Protocols/SpeechTranscribing.swift
  • PodHaven/Transcriptions/Protocols/SpeechTranscriptionResult.swift
  • PodHaven/Transcriptions/Transcriber.swift
  • PodHaven/Transcriptions/Transcript.swift
  • PodHaven/Transcriptions/TranscriptionProcessor.swift
  • PodHaven/Transcriptions/TranscriptionQueue.swift
  • PodHaven/Views/Episodes/Components/EpisodesToolbarItems.swift
  • PodHaven/Views/Episodes/EpisodeDetailView.swift
  • PodHaven/Views/Episodes/Models/EpisodeDetailViewModel.swift
  • PodHaven/Views/Episodes/Protocols/ManagingEpisodes.swift
  • PodHaven/Views/Episodes/Protocols/SelectableEpisodeList.swift
  • PodHaven/Views/Episodes/ViewModifiers/EpisodeContextMenuViewModifier.swift
  • PodHaven/Views/Episodes/ViewModifiers/EpisodeSwipeViewModifier.swift
  • PodHaven/Views/Settings/SwipeActions/EpisodeSwipeSettingsView.swift
  • PodHavenTests/CacheManagerReconcileTests.swift
  • PodHavenTests/Fakes/FakeRepo.swift
  • PodHavenTests/Fakes/FakeSpeechAnalyzer.swift
  • PodHavenTests/Fakes/FakeSpeechModelManager.swift
  • PodHavenTests/Fakes/FakeSpeechTranscriber.swift
  • PodHavenTests/MigrationTests/v60Tests.swift
  • PodHavenTests/TranscriptionTests/TranscriberTests.swift
  • PodHavenTests/TranscriptionTests/TranscriptionBackgroundTaskTests.swift
  • PodHavenTests/TranscriptionTests/TranscriptionProcessorTests.swift
  • PodHavenTests/TranscriptionTests/TranscriptionQueueTests.swift
  • PodHavenTests/Utility/TranscriptionHelpers.swift
  • PodHavenTests/ViewModelTests/EpisodeDetailViewModelTests/EpisodeDetailTranscriptTests.swift
  • PodHavenTests/ViewModelTests/EpisodesListViewModelTests/EpisodesListTranscribeTests.swift
  • docs/README.md
  • docs/initiatives/manual-transcripts.md
  • docs/initiatives/transcripts.md

Comment thread docs/initiatives/manual-transcripts.md Outdated
Comment thread PodHaven/Cache/CacheManager.swift
Comment thread PodHaven/Database/DisplayModels/EpisodeDetailContent.swift Outdated
Comment thread PodHaven/Views/Episodes/Models/EpisodeDetailViewModel.swift
Comment thread PodHaven/Views/Episodes/Protocols/SelectableEpisodeList.swift Outdated
Comment thread PodHavenTests/Fakes/FakeSpeechAnalyzer.swift Outdated
jubishop and others added 5 commits June 14, 2026 15:48
…ripts

# Conflicts:
#	PodHaven/Database/Migrations/Migration_v60.swift
#	PodHaven/Database/Schema.swift
#	PodHavenTests/MigrationTests/v60Tests.swift
Describe the non-positive-duration progress-disable behavior instead of
baking the literal 0 into the comment, per the AGENTS.md guardrail
(CodeRabbit thread on PR #461). Update the review ledger.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Record the EpisodeDetailViewModel transcript-memo invalidation seam (and
the skip-redundant-overwrite requirement) that any v2 path replacing a
transcript in place — RSS import or model-revision re-transcribe — must
handle. Tighten the F10 review-ledger resolution with the same reasoning.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
jubishop and others added 2 commits June 14, 2026 18:41
A transcribed saved episode opened from a list starts in the .initial
detail state (loaded == nil) with status .transcribed but a nil decoded
transcript, so EpisodeDetailView rendered "No speech detected" until
hydration completed. Add EpisodeTranscriptDisplay (loading/empty/text)
on the view model: a nil decode with no loaded episode is .loading,
reserving the empty notice for a decoded zero-segment transcript.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
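
A sketch of the three-state derivation this commit describes. The case names (loading/empty/text) come from the commit message; the payload, the function signature, and the choice to return nil for a hydrated episode with no decoded transcript are assumptions.

```swift
// Case names follow the commit message; the String payload is an assumption.
enum EpisodeTranscriptDisplaySketch: Equatable {
  case loading
  case empty
  case text(String)
}

/// Derives what a detail view should show for a transcribed episode.
/// - Parameters:
///   - segments: decoded segment texts, or nil when nothing has been decoded yet.
///   - isLoaded: whether the full episode has been hydrated from the database.
/// - Returns: nil when there is nothing to show (hypothetical choice for a
///   hydrated episode with no decoded transcript).
func transcriptDisplay(
  segments: [String]?,
  isLoaded: Bool
) -> EpisodeTranscriptDisplaySketch? {
  guard let segments else {
    // No decode yet: show a loading state only while hydration is pending.
    return isLoaded ? nil : .loading
  }
  // The empty notice is reserved for a decoded transcript with zero segments.
  if segments.isEmpty { return .empty }
  // Each finalized phrase goes on its own line.
  return .text(segments.joined(separator: "\n"))
}
```

Keeping `.loading` distinct from `.empty` is what stops the "No speech detected" notice from flashing before hydration completes.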
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
# Conflicts:
#	PodHaven/Database/Migrations/Migration_v62.swift
#	PodHavenTests/MigrationTests/v62Tests.swift

2 participants