open_file: return immediately with best-available recording (fix 18s timeout) #353

Merged
sonichi merged 1 commit into main from fix/open-file-nonblocking on Apr 15, 2026
@sonichi
Owner

@sonichi sonichi commented Apr 15, 2026

Summary

Fixes the root cause of "I couldn't find the recording at the standard location" on phone calls. The 18s polling loop waiting for the subtitled burn-in exceeded Gemini Live's tool-call timeout — the tool got cancelled mid-retry and the model reported a false negative even while the narrated file was sitting on disk.

  • Drops the 10-iteration retry loop.
  • findRecording() already returns best-available in priority order (subtitled > narrated > raw), so one synchronous call is enough.
  • Adds a subtitled_pending flag + a version field to the return payload.
  • When subtitled is pending, the returned instruction tells the model to proactively say: "I opened the narrated version. Subtitles are still being generated — want me to switch to the subtitled version when it's ready?" If the user says yes, the model waits ~30 seconds and calls open_file again, which picks up the subtitled version once the burn-in finishes.
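
The priority rule in the bullets above can be sketched as follows. This is a minimal illustration, not the code in src/recording-tools.ts: the `Version` type, the `PRIORITY` array, and the `pickBestAvailable` helper are hypothetical names, and the set of versions on disk is passed in directly rather than read from the filesystem.

```typescript
// Hypothetical model of the lookup; names and shapes are illustrative only.
type Version = "subtitled" | "narrated" | "raw";

// Priority order described in the PR: subtitled > narrated > raw.
const PRIORITY: readonly Version[] = ["subtitled", "narrated", "raw"];

// Return the highest-priority version that exists, or null if none does.
// A single pass over PRIORITY is enough; nothing here waits or retries.
function pickBestAvailable(onDisk: ReadonlySet<Version>): Version | null {
  for (const candidate of PRIORITY) {
    if (onDisk.has(candidate)) {
      return candidate;
    }
  }
  return null;
}

// While the burn-in is still running, only narrated and raw exist.
console.log(pickBestAvailable(new Set<Version>(["raw", "narrated"]))); // narrated
```

Because the lookup already falls back to the narrated file, a caller that previously looped until the subtitled file appeared can return after this one call.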

What this does NOT do (yet)

This PR does not implement the async-notification half of the owner's ask ("can we make the wait async?"), i.e., the voice agent proactively telling the user when the subtitled version is ready without the user asking. That is a bigger feature: it needs a mid-call signal channel to voice-agent, plus background polling with dedicated state. Post-flood I'm cautious about any results/ iteration path, so I'm pausing that work and flagging it for a design conversation before shipping.

This PR provides the user-driven retry pattern, which eliminates the timeout failure and restores a workable UX immediately.

Test plan

  • npx tsc --noEmit --skipLibCheck clean
  • Manual: restart voice-agent, start a recording, ask "open it" — tool returns within ~100ms instead of 18s, model speaks the pending message
  • Manual: call open_file again 30s later once subtitled is on disk — tool returns subtitled version

Diff

  • src/recording-tools.ts +33/-13

References

  • Phone-call diagnosis (with 18s timeout root cause): Discord thread 22:42 local
  • Owner directive: "let users know the subtitled file is not ready and ask them whether they want to wait" (Discord 22:02 local)
  • Prior meeting notes that captured this as a latent bug: notes/meetings/task-summary-1776292611357.md

🤖 Generated with Claude Code

Fixes #356

Fixes the root cause of "I couldn't find the recording at the standard
location" during phone calls that owner diagnosed earlier today. The
18-second polling loop waiting for the subtitled version exceeded
Gemini Live's tool-call timeout, so the tool got cancelled mid-retry
and the model reported a false negative even when the narrated file
was on disk.
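
The timing argument above can be checked with a small worked example. This is a sketch under stated assumptions: the 10 iterations and the 18 s total come from this PR, but the 1800 ms per-iteration interval is inferred from those two numbers, and `assumedToolTimeoutMs` is a placeholder because the PR does not state Gemini Live's actual timeout. Both helper functions are hypothetical.

```typescript
// Worst-case time a polling loop can block before it gives up.
function worstCaseBlockingMs(iterations: number, intervalMs: number): number {
  return iterations * intervalMs;
}

// True when the call could still be running when the caller's timeout fires.
function canExceedTimeout(blockingMs: number, timeoutMs: number): boolean {
  return blockingMs > timeoutMs;
}

// 10 iterations is from the PR; 1800 ms each is an ASSUMPTION chosen so the
// total matches the reported 18 s.
const oldLoopMs = worstCaseBlockingMs(10, 1_800);

// ASSUMED placeholder value, not a documented Gemini Live setting.
const assumedToolTimeoutMs = 10_000;

console.log(oldLoopMs); // 18000
console.log(canExceedTimeout(oldLoopMs, assumedToolTimeoutMs)); // true
// The test plan reports roughly 100 ms for the single synchronous call.
console.log(canExceedTimeout(100, assumedToolTimeoutMs)); // false
```

Any timeout shorter than the loop's worst case reproduces the failure: the tool is cancelled before it can return the narrated file it would otherwise have found.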

### What changes

- Remove the 10-iteration retry loop. `findRecording()` already
  returns the best-available version in priority order (subtitled >
  narrated > raw), so one synchronous call is enough.
- Compute a `subtitled_pending` flag when the returned version is not
  already subtitled AND it's a sutando recording (i.e. the background
  subtitle burn might still be running).
- When `subtitled_pending` is true, the returned `instruction` string
  tells the model to proactively inform the user: "I opened the
  narrated version. Subtitles are still being generated — want me to
  switch to the subtitled version when it's ready?" If the user says
  yes, the model can wait ~30 seconds and call open_file again, which
  will pick up the subtitled version once the burn-in finishes.
- Return payload gains a `version` field (`subtitled | narrated | raw`)
  so the model knows exactly what it's telling the user.
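
A minimal sketch of the payload described in the list above, assuming the flag is computed exactly as worded here. `buildResult`, `OpenFileResult`, and the `isSutandoRecording` parameter are hypothetical names, and the instruction wording is a paraphrase for illustration, not the string shipped in `src/recording-tools.ts`.

```typescript
type Version = "subtitled" | "narrated" | "raw";

// Hypothetical shape of the tool's return payload.
interface OpenFileResult {
  version: Version;
  subtitled_pending: boolean;
  instruction: string;
}

function buildResult(version: Version, isSutandoRecording: boolean): OpenFileResult {
  // Pending only when the returned version is not already subtitled AND the
  // file is a recording whose subtitle burn-in may still be running.
  const subtitled_pending = version !== "subtitled" && isSutandoRecording;

  // Illustrative wording; the retry lives in this instruction, not in a loop.
  const instruction = subtitled_pending
    ? `Tell the user you opened the ${version} version, that subtitles are still being generated, and ask whether to switch to the subtitled version when it is ready.`
    : `Tell the user you opened the ${version} version.`;

  return { version, subtitled_pending, instruction };
}

console.log(buildResult("narrated", true).subtitled_pending); // true
console.log(buildResult("subtitled", true).subtitled_pending); // false
console.log(buildResult("raw", false).subtitled_pending); // false
```

Keeping the retry in the instruction text means each tool call finishes in one pass, and a later `open_file` call returns the subtitled version once it exists.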

### What this doesn't do yet

**Async notification**: owner also asked "can we make the wait async?"
i.e., can the voice agent proactively tell the user when subtitled is
ready without the user having to ask. That's a bigger feature — it
needs a signal channel to voice-agent mid-call (post-flood we're
cautious about the results/ iteration path), plus background polling
with a dedicated state file. Pausing that work here and asking owner
for a design discussion first.

The current PR gives owner the non-blocking behavior + "user-driven
retry" pattern, which eliminates the timeout failure and ships a
workable UX immediately.

### Test plan

- [x] `npx tsc --noEmit --skipLibCheck` clean
- [ ] Manual: restart voice-agent, start a recording, ask "open it" —
      tool returns within ~100ms instead of 18s, model speaks the
      subtitled_pending message if applicable
- [ ] Manual: call open_file again 30s later with subtitled now on
      disk — tool returns subtitled version

### References

- Phone call diagnosis: `notes/meetings/task-summary-1776289358782.md`,
  `notes/meetings/task-summary-1776292611357.md`
- Reply with diagnosis: in result text at 22:42 local
- Owner design directive: "let users know the subtitled file is not
  ready and ask them whether they want to wait; if they do, can we
  make the wait async?" (via Discord 22:02 local)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Owner Author

@sonichi sonichi left a comment

MacBook review: LGTM. Clean fix — removes the 18s blocking poll, returns immediately with best-available version, flags subtitled_pending for model-driven retry. The retry is now in the prompt instruction, not a blocking loop. Well within Gemini's tool-call timeout. No regressions to normal open_file path.

@sonichi sonichi merged commit 2be13be into main Apr 15, 2026
1 check passed
@sonichi sonichi deleted the fix/open-file-nonblocking branch April 15, 2026 23:16
liususan091219 added a commit that referenced this pull request Apr 16, 2026
…#354

Each script reproduces the bug (before the fix) and verifies it's resolved
(after the fix). All POCs pass on current main.

- poc-pr353-open-file.sh (11/11) — 18s polling timeout in open_file
- poc-pr355-subtitled-pending.sh (9/9) — false positive subtitled_pending
- poc-pr332-team-tier-revert.sh (9/9) — team-tier -C /tmp broke codex
- poc-pr325-bodhi-dep.sh (7/7) — bodhi dep pointed at deleted repo
- poc-pr354-retention-sweep.sh — retention sweep for stale results

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@liususan091219
Collaborator

POC: bash scripts/poc-pr353-open-file.sh (11/11 pass). Script in PR #358. Issue: #356

Development

Successfully merging this pull request may close these issues.

open_file 18s polling timeout causes false 'recording not found' on phone calls

2 participants