feat: video transcript prototype — generate tests from screen recordings#81
… recordings Adds apps/video-transcript — a standalone CLI that extracts structured interaction transcripts from screen recordings using Gemini 2.5 Flash via AI SDK. Designed to compound with git diff context so the test agent gets both "what changed" and "how the feature works." Pipeline: video → ffmpeg idle-time cutting → Gemini transcript extraction Updates the video-transcript spec to use AI SDK (@ai-sdk/google) instead of @google/genai, matching the existing codebase dependency surface.
- Replace @ai-sdk/google with @ai-sdk/gateway for provider-agnostic model routing via AI_GATEWAY_API_KEY - Add 24 tests covering activity-analyzer (frame diff, segment classification, timeline formatting), transcript-prompt (base prompt, timeline appending), and extract-transcript (gateway model, file parts, timeline inclusion, response handling)
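The segment classification and timeline formatting covered by these tests could look roughly like the sketch below. It is illustrative only: the one-score-per-second sampling, the `threshold` parameter, and the `m:ss-m:ss type` line format are assumptions, not the PR's actual implementation.

```typescript
// Hypothetical shapes, named after the types that appear in this PR's snippets.
type ActivitySegment = { type: "active" | "idle"; startSeconds: number; endSeconds: number };
type ActivityTimeline = ActivitySegment[];

// Assumption: one frame-diff score per sampled second; a score above
// `threshold` marks that second as active.
const classifySegments = (diffScores: number[], threshold: number): ActivityTimeline => {
  const segments: ActivitySegment[] = [];
  diffScores.forEach((score, second) => {
    const type: ActivitySegment["type"] = score > threshold ? "active" : "idle";
    const last = segments[segments.length - 1];
    if (last !== undefined && last.type === type) {
      last.endSeconds = second + 1; // extend the current run
    } else {
      segments.push({ type, startSeconds: second, endSeconds: second + 1 });
    }
  });
  return segments;
};

// Whole-second m:ss formatting; flooring first avoids fractional seconds.
const formatTime = (seconds: number): string => {
  const whole = Math.floor(seconds);
  const secs = whole % 60;
  return `${Math.floor(whole / 60)}:${secs < 10 ? "0" : ""}${secs}`;
};

// Hypothetical line format: one segment per line.
const formatTimeline = (timeline: ActivityTimeline): string =>
  timeline
    .map((segment) => `${formatTime(segment.startSeconds)}-${formatTime(segment.endSeconds)} ${segment.type}`)
    .join("\n");
```

Keeping both functions pure makes them straightforward to unit test without ffmpeg or a real video.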
1 issue found across 10 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="apps/video-transcript/src/index.ts">
<violation number="1" location="apps/video-transcript/src/index.ts:115">
P1: `--timeline-only` is ignored when ffmpeg is unavailable, so the CLI still attempts transcript extraction instead of exiting after timeline handling.</violation>
</file>
apps/video-transcript/src/index.ts
}
}

console.error(pc.cyan("Extracting transcript via Gemini 2.5 Flash..."));
P1: --timeline-only is ignored when ffmpeg is unavailable, so the CLI still attempts transcript extraction instead of exiting after timeline handling.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/video-transcript/src/index.ts, line 115:
<comment>`--timeline-only` is ignored when ffmpeg is unavailable, so the CLI still attempts transcript extraction instead of exiting after timeline handling.</comment>
<file context>
@@ -0,0 +1,128 @@
+ }
+ }
+
+ console.error(pc.cyan("Extracting transcript via Gemini 2.5 Flash..."));
+
+ const transcript = await extractTranscript(processedVideoPath, timeline);
</file context>
Cursor Bugbot has reviewed your changes and found 4 potential issues.
Reviewed by Cursor Bugbot for commit ce7912b. Configure here.
} else {
  console.log(output);
}
return;
--timeline-only ignored without ffmpeg or with --no-preprocess
High Severity
The --timeline-only early return is nested inside if (hasFfmpeg) within if (options.preprocess), so it's only reachable when both conditions are true. When ffmpeg is unavailable or --no-preprocess is passed, the flag is silently ignored and the code falls through to extractTranscript, which requires AI_GATEWAY_API_KEY — a key the earlier check at line 54 explicitly skipped validating for timelineOnly mode. This causes an unhandled API error for users who only want the timeline.
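One way to avoid this class of bug is to decide the run plan up front, so the `--timeline-only` check is not nested inside the ffmpeg or preprocess branches. The sketch below is a hypothetical restructuring, not the code in `index.ts`: the `CliOptions` shape, the `Step` names, and the choice to fail with a clear error when timeline mode is requested without ffmpeg are all assumptions.

```typescript
// Hypothetical option and step names, used only for illustration.
type CliOptions = { preprocess: boolean; timelineOnly: boolean };
type Step =
  | "analyze-activity"
  | "print-timeline"
  | "trim-video"
  | "extract-transcript"
  | "fail-missing-ffmpeg";

const planSteps = (options: CliOptions, hasFfmpeg: boolean): Step[] => {
  if (options.timelineOnly) {
    // Timeline mode is checked first, so it can never fall through to
    // transcript extraction (which needs AI_GATEWAY_API_KEY).
    return hasFfmpeg ? ["analyze-activity", "print-timeline"] : ["fail-missing-ffmpeg"];
  }
  if (options.preprocess && hasFfmpeg) {
    return ["analyze-activity", "trim-video", "extract-transcript"];
  }
  // No preprocessing possible or requested: upload the raw video.
  return ["extract-transcript"];
};
```

With this shape, `planSteps({ preprocess: false, timelineOnly: true }, false)` yields only the failure step, so a user who asked for a timeline never triggers an API call.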
const frameCount = await extractFrames(videoPath, framesDir);
if (frameCount < 2) return [{ type: "active", startSeconds: 0, endSeconds: frameCount }];

const frameSize = 320 * 180;
Unused variable frameSize declared but never read
Low Severity
const frameSize = 320 * 180 is assigned but never referenced anywhere. It appears to be a leftover from a planned validation step (e.g., verifying raw frame buffer sizes match the expected dimensions). This is dead code that adds confusion about whether frame size checking was intentionally omitted.
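If the constant is kept rather than deleted, it could back the validation step the comment speculates about. The following is a minimal sketch under stated assumptions: frames are raw 8-bit grayscale buffers of 320×180 pixels, and `frameDifference` is a hypothetical helper, not a function from this PR.

```typescript
const FRAME_WIDTH = 320;
const FRAME_HEIGHT = 180;
// Assumption: one byte per pixel (grayscale), so a frame is width * height bytes.
const FRAME_SIZE = FRAME_WIDTH * FRAME_HEIGHT;

// Mean absolute per-pixel difference between two frames, after checking that
// both buffers have the expected size.
const frameDifference = (previous: Uint8Array, current: Uint8Array): number => {
  if (previous.length !== FRAME_SIZE || current.length !== FRAME_SIZE) {
    throw new Error(
      `Expected ${FRAME_SIZE} bytes per frame, got ${previous.length} and ${current.length}`,
    );
  }
  let total = 0;
  for (let index = 0; index < FRAME_SIZE; index++) {
    total += Math.abs((previous[index] ?? 0) - (current[index] ?? 0));
  }
  return total / FRAME_SIZE;
};
```

Failing loudly on a size mismatch surfaces truncated or mis-scaled frames instead of silently producing a misleading diff score.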
"-vf",
`select='${selectFilter}',setpts=N/FRAME_RATE/TB`,
"-af",
`aselect='${selectFilter}',asetpts=N/SR/TB`,
ffmpeg -af crashes on videos without audio
Medium Severity
buildTrimmedVideo unconditionally passes -af aselect='...',asetpts=N/SR/TB to ffmpeg. When the input video has no audio stream — common for screen recordings (macOS screenshot tool, many Linux tools, etc.) — ffmpeg fails with a "matches no streams" error. This crashes the CLI during the trim step, preventing transcript extraction for audio-less recordings even though only the video track is needed.
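A possible fix is to add the audio filter only when the input actually has an audio stream. The sketch below assumes a `hasAudio` flag determined beforehand (for example by probing the file with `ffprobe`); `buildTrimArgs` and its parameters are hypothetical names, and only the `-vf`/`-af` filter strings are taken from the snippet above.

```typescript
// Builds the ffmpeg argument list for trimming. The audio filter is added
// only when the input has an audio stream; otherwise audio output is
// disabled with -an.
const buildTrimArgs = (
  inputPath: string,
  outputPath: string,
  selectFilter: string,
  hasAudio: boolean,
): string[] => {
  const args = [
    "-y", // overwrite the output file if it exists
    "-i",
    inputPath,
    "-vf",
    `select='${selectFilter}',setpts=N/FRAME_RATE/TB`,
  ];
  if (hasAudio) {
    args.push("-af", `aselect='${selectFilter}',asetpts=N/SR/TB`);
  } else {
    args.push("-an");
  }
  args.push(outputPath);
  return args;
};
```

Returning a plain array keeps the decision testable without spawning ffmpeg; the caller can pass the result straight to its process-spawning helper.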
Test Results
❌ Website Test: failed
11 passed, 5 failed out of 16 steps (582s)
Session Recording: https://github.com/millionco/expect/releases/download/ci-pr-81/d5849a2e310483443dd0b5f534f80f7b.webm
export const formatTimeline = (timeline: ActivityTimeline): string => {
  const formatTime = (seconds: number): string => {
    const minutes = Math.floor(seconds / 60);
    const secs = seconds % 60;

}
}

const segments: ActivitySegment[] = [];

let processedVideoPath = videoPath;
let timeline: Awaited<ReturnType<typeof analyzeActivity>> | undefined;

"Error: AI_GATEWAY_API_KEY environment variable is required for transcript extraction.",
),
);
process.exit(1);

}
};

export const buildTrimmedVideo = async (

export const extractTranscript = async (
  videoPath: string,
  timeline: ActivityTimeline | undefined,
): Promise<string> => {
The bundler inlines source from @expect/agent but leaves dynamic require.resolve() calls intact. Consumers with strict node_modules (pnpm) cannot resolve these at runtime unless they are declared as dependencies in the published package.json. Adds runtime-deps tests to both packages that scan the built dist for require.resolve() targets and fail if any are undeclared.
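The dist-scanning idea can be sketched as two pure functions: one that collects package names from `require.resolve()` calls in bundled source, and one that reports which of them are missing from the declared dependencies. This is an illustration under assumptions, not the test added in this commit: the function names are hypothetical, only string-literal specifiers are matched, and bare Node built-ins without the `node:` prefix are not filtered out.

```typescript
// Collects package names from require.resolve("...") calls in bundled source.
const findRequireResolveTargets = (bundleSource: string): string[] => {
  const pattern = /require\.resolve\(\s*["']([^"']+)["']/g;
  const targets = new Set<string>();
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(bundleSource)) !== null) {
    const specifier = match[1];
    if (specifier === undefined) continue;
    // Relative paths and node: built-ins are not package dependencies.
    if (specifier.startsWith(".") || specifier.startsWith("node:")) continue;
    // Reduce "pkg/sub/path" to "pkg" and "@scope/pkg/sub" to "@scope/pkg".
    const parts = specifier.split("/");
    const packageName = specifier.startsWith("@") ? parts.slice(0, 2).join("/") : parts[0];
    if (packageName) targets.add(packageName);
  }
  return Array.from(targets);
};

// Returns the targets that are not declared in the given dependency map.
const findUndeclared = (targets: string[], declared: Record<string, string>): string[] =>
  targets.filter((name) => !(name in declared));
```

A test would read each file under the built dist, pass its contents to `findRequireResolveTargets`, and assert that `findUndeclared` returns an empty list for the published `package.json` dependencies.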


Summary
- Adds apps/video-transcript/: a standalone CLI prototype that extracts structured interaction transcripts from screen recordings using Gemini 2.5 Flash via AI SDK (@ai-sdk/google)
- Updates .specs/video-transcript.md to use AI SDK instead of @google/genai

Usage

Test plan
- Run with --timeline-only to verify ffmpeg frame analysis detects active/idle segments
- Run with GOOGLE_GENERATIVE_AI_API_KEY set to verify Gemini transcript extraction
- Run with --no-preprocess to verify raw video upload fallback

Note
Medium Risk
Adds a new CLI that shells out to ffmpeg and sends video data to an AI model via Vercel AI Gateway; failures or platform differences (ffmpeg availability, large files, env vars) could affect reliability. Also expands published dependency surfaces for the main CLI/SDK and adds tests to prevent missing runtime deps in bundled outputs.

Overview
Adds a new apps/video-transcript standalone CLI that validates video inputs, optionally preprocesses recordings via ffmpeg frame-diff analysis to trim idle/keep scene changes, and then calls generateText through @ai-sdk/gateway (Gemini 2.5 Flash) to produce a structured interaction transcript.

Updates the spec to use mediaType for AI SDK file parts, adds unit tests covering activity analysis/prompting/transcript extraction, and introduces "runtime dependency safety" tests plus new dependencies in apps/cli and packages/typescript-sdk to ensure bundled dist/ runtime-resolved packages are declared.

Reviewed by Cursor Bugbot for commit be73bdc. Bugbot is set up for automated code reviews on this repo. Configure here.

Summary by cubic
Prototype CLI to turn screen recordings into structured interaction transcripts for test generation. Uses ffmpeg to trim idle time and routes requests via Vercel AI Gateway (requires AI_GATEWAY_API_KEY, replacing GOOGLE_GENERATIVE_AI_API_KEY).

New Features
- apps/video-transcript/ CLI: video → ffmpeg activity analysis/trim → transcript via @ai-sdk/gateway + ai.
- --timeline-only, --no-preprocess, -o/--output, --verbose; supports .mp4, .webm, .mov, .avi, .mkv; clear errors for missing ffmpeg or API key.
- Updates .specs/video-transcript.md to use mediaType.

Bug Fixes
- Declares missing runtime dependencies in apps/cli and packages/typescript-sdk and adds dist-scanning tests to enforce declarations (e.g., @github/copilot, @google/gemini-cli, accessibility-checker-engine).

Written for commit be73bdc. Summary will update on new commits.
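
The input validation mentioned in the summary could be as small as an extension check against the supported formats. This is a hedged sketch: `isSupportedVideo` is a hypothetical name, and treating extensions case-insensitively is an assumption rather than documented behavior.

```typescript
// Extensions listed as supported in the summary above.
const SUPPORTED_EXTENSIONS = [".mp4", ".webm", ".mov", ".avi", ".mkv"];

// Returns true when the path ends in one of the supported extensions.
// Assumption: matching is case-insensitive, so "demo.MOV" is accepted.
const isSupportedVideo = (filePath: string): boolean => {
  const dotIndex = filePath.lastIndexOf(".");
  if (dotIndex === -1) return false;
  const extension = filePath.slice(dotIndex).toLowerCase();
  return SUPPORTED_EXTENSIONS.includes(extension);
};
```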