open_file: return immediately with best-available recording (fix 18s timeout)#353
Merged
Fixes the root cause of the "I couldn't find the recording at the standard
location" failure during phone calls, which the owner diagnosed earlier today.
The 18-second polling loop that waited for the subtitled version exceeded
Gemini Live's tool-call timeout, so the tool was cancelled mid-retry and the
model reported a false negative even when the narrated file was already on
disk.
### What changes
- Remove the 10-iteration retry loop. `findRecording()` already
returns the best-available version in priority order (subtitled >
narrated > raw), so one synchronous call is enough.
- Compute a `subtitled_pending` flag when the returned version is not
already subtitled AND it's a sutando recording (i.e. the background
subtitle burn might still be running).
- When `subtitled_pending` is true, the returned `instruction` string
tells the model to proactively inform the user: "I opened the
narrated version. Subtitles are still being generated — want me to
switch to the subtitled version when it's ready?" If the user says
yes, the model can wait ~30 seconds and call `open_file` again, which
will pick up the subtitled version once the burn-in finishes.
- Return payload gains a `version` field (`subtitled | narrated | raw`)
so the model knows exactly what it's telling the user.
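The changes above can be sketched roughly as follows. This is a minimal, hypothetical TypeScript model, not the actual code in `src/recording-tools.ts`: the `Available` map (standing in for the disk lookup), the `isSutandoRecording` flag, the result interface, and the exact instruction wording are assumptions for illustration, and actually opening the file is omitted. It shows the three ideas in the list: one priority-ordered lookup with no retry loop, the `subtitled_pending` condition, and the `version` field in the payload.

```typescript
// Hypothetical types: the real shapes in src/recording-tools.ts are not shown in this PR.
type Version = "subtitled" | "narrated" | "raw";

// Assumption: which versions exist on disk, keyed by version, valued by file path.
type Available = Partial<Record<Version, string>>;

interface OpenFileResult {
  path: string;
  version: Version; // tells the model exactly which version it opened
  subtitled_pending: boolean; // true while a background subtitle burn may still be running
  instruction: string; // what the model should say to the user
}

// Priority order stated in the PR: subtitled > narrated > raw.
const PRIORITY: readonly Version[] = ["subtitled", "narrated", "raw"];

// Stand-in for findRecording(): one pass over the priority list, no waiting.
function findRecording(available: Available): { path: string; version: Version } | null {
  for (const version of PRIORITY) {
    const path = available[version];
    if (path !== undefined) return { path, version };
  }
  return null;
}

// Non-blocking handler sketch: a single synchronous lookup replaces the retry loop.
// `isSutandoRecording` is a hypothetical flag meaning "a subtitle burn-in was started for this file".
function openFile(available: Available, isSutandoRecording: boolean): OpenFileResult | null {
  const found = findRecording(available);
  if (found === null) return null;

  // Pending only when we did not get the subtitled cut AND a burn-in could still finish.
  const subtitled_pending = found.version !== "subtitled" && isSutandoRecording;

  const instruction = subtitled_pending
    ? `Tell the user: "I opened the ${found.version} version. Subtitles are still being generated. ` +
      `Want me to switch to the subtitled version when it's ready?" If they say yes, wait about ` +
      `30 seconds and call open_file again.`
    : `Tell the user you opened the ${found.version} version.`;

  return { ...found, subtitled_pending, instruction };
}

// Usage: the burn-in is still running, so only narrated and raw exist yet (paths are made up).
const first = openFile({ narrated: "/tmp/demo-narrated.mp4", raw: "/tmp/demo.mp4" }, true);
console.log(first?.version, first?.subtitled_pending); // narrated true

// About 30 s later, after the user says yes, the subtitled file is on disk.
const second = openFile(
  { subtitled: "/tmp/demo-subtitled.mp4", narrated: "/tmp/demo-narrated.mp4", raw: "/tmp/demo.mp4" },
  true,
);
console.log(second?.version, second?.subtitled_pending); // subtitled false
```

The retry lives in the model's instruction rather than in the tool, so each `open_file` call costs one lookup and always returns well before any tool-call deadline.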
### What this doesn't do yet
**Async notification**: the owner also asked "can we make the wait async?",
i.e., can the voice agent proactively tell the user when the subtitled
version is ready without the user having to ask? That's a bigger feature:
it needs a signal channel to voice-agent mid-call (post-flood, we're
cautious about the `results/` iteration path), plus background polling
with a dedicated state file. I'm pausing that work here and asking the
owner for a design discussion first.
This PR gives the owner the non-blocking behavior plus a "user-driven
retry" pattern, which eliminates the timeout failure and ships a
workable UX immediately.
### Test plan
- [x] `npx tsc --noEmit --skipLibCheck` clean
- [ ] Manual: restart voice-agent, start a recording, and ask "open it".
The tool returns within ~100ms instead of 18s, and the model speaks the
`subtitled_pending` message if applicable.
- [ ] Manual: call `open_file` again 30s later, with the subtitled version
now on disk. The tool returns the subtitled version.
### References
- Phone call diagnosis: `notes/meetings/task-summary-1776289358782.md`,
`notes/meetings/task-summary-1776292611357.md`
- Reply with diagnosis: in result text at 22:42 local
- Owner design directive: "let users know the subtitled file is not
ready and ask them whether they want to wait; if they do, can we
make the wait async?" (via Discord 22:02 local)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
**sonichi** (Owner, Author) commented on Apr 15, 2026:
MacBook review: LGTM. Clean fix — removes the 18s blocking poll, returns immediately with best-available version, flags subtitled_pending for model-driven retry. The retry is now in the prompt instruction, not a blocking loop. Well within Gemini's tool-call timeout. No regressions to normal open_file path.
liususan091219 added a commit that referenced this pull request on Apr 16, 2026:

…#354

Each script reproduces the bug (before the fix) and verifies it's resolved (after the fix). All POCs pass on current main.

- `poc-pr353-open-file.sh` (11/11): 18s polling timeout in `open_file`
- `poc-pr355-subtitled-pending.sh` (9/9): false positive `subtitled_pending`
- `poc-pr332-team-tier-revert.sh` (9/9): team-tier `-C /tmp` broke codex
- `poc-pr325-bodhi-dep.sh` (7/7): bodhi dep pointed at deleted repo
- `poc-pr354-retention-sweep.sh`: retention sweep for stale results

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
### Summary
Fixes the root cause of "I couldn't find the recording at the standard location" on phone calls. The 18s polling loop waiting for the subtitled burn-in exceeded Gemini Live's tool-call timeout — the tool got cancelled mid-retry and the model reported a false negative even while the narrated file was sitting on disk.
- `findRecording()` already returns the best-available version in priority order (subtitled > narrated > raw), so one synchronous call is enough.
- Adds a `subtitled_pending` flag and a `version` field to the return payload.
- When `subtitled_pending` is true, `instruction` tells the model to proactively say: "I opened the narrated version. Subtitles are still being generated — want me to switch to the subtitled version when it's ready?" If the user says yes, the model waits ~30 seconds and calls `open_file` again, which picks up the subtitled version once the burn-in finishes.

### What this does NOT do (yet)
The async-notification half of the owner's ask ("can we make the wait async?"), i.e., having the voice agent proactively tell the user when the subtitled version is ready without the user asking. That's a bigger feature: it needs a mid-call signal channel to voice-agent, plus background polling with dedicated state. Post-flood, I'm cautious about any `results/` iteration path, so I'm pausing that work and flagging it for a design conversation before shipping.

This PR delivers the user-driven retry pattern: it eliminates the timeout failure and restores a workable UX immediately.
### Test plan
- [x] `npx tsc --noEmit --skipLibCheck` clean
- [ ] Manual: call `open_file` again 30s later once subtitled is on disk; the tool returns the subtitled version

### Diff
`src/recording-tools.ts` +33/-13

### References
- `notes/meetings/task-summary-1776292611357.md`

🤖 Generated with Claude Code
Fixes #356