feat: video transcript prototype — generate tests from screen recordings#81

Open
aidenybai wants to merge 3 commits into main from feat/video-transcript-prototype


@aidenybai aidenybai commented Apr 5, 2026

Summary

  • Adds apps/video-transcript/ — a standalone CLI prototype that extracts structured interaction transcripts from screen recordings using Gemini 2.5 Flash via AI SDK (@ai-sdk/google)
  • Pipeline: video file → ffmpeg idle-time cutting (frame diff heuristics) → Gemini transcript extraction → structured output
  • Designed to compound with the existing git diff context so the test agent gets both "what changed in code" and "how the feature works" from a video demo
  • Updates .specs/video-transcript.md to use AI SDK instead of @google/genai
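
The idle-time step in the pipeline could be sketched roughly as follows. This is an illustration only, not the code in apps/video-transcript: it assumes one grayscale frame per second, and the names `meanAbsDiff` and `classifySegments` as well as the threshold value are hypothetical. Only the segment shape (`type`, `startSeconds`, `endSeconds`) mirrors what the diff excerpts in this PR show.

```typescript
// Hypothetical sketch: classify a recording into active/idle segments
// from per-second grayscale frames. Names and threshold are assumptions.
type ActivitySegment = {
  type: "active" | "idle";
  startSeconds: number;
  endSeconds: number;
};

// Mean absolute per-pixel difference between two equally sized frames.
const meanAbsDiff = (a: Uint8Array, b: Uint8Array): number => {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("frames must be non-empty and the same size");
  }
  let total = 0;
  for (let i = 0; i < a.length; i++) total += Math.abs(a[i] - b[i]);
  return total / a.length;
};

// Frame i covers second i..i+1. A second counts as "active" when its frame
// differs from the previous one by more than the threshold; the first
// second is always treated as active.
const classifySegments = (frames: Uint8Array[], threshold = 2): ActivitySegment[] => {
  const segments: ActivitySegment[] = [];
  for (let i = 0; i < frames.length; i++) {
    const changed = i === 0 || meanAbsDiff(frames[i - 1], frames[i]) > threshold;
    const type = changed ? "active" : "idle";
    const last = segments[segments.length - 1];
    if (last && last.type === type) {
      last.endSeconds = i + 1; // extend the current run
    } else {
      segments.push({ type, startSeconds: i, endSeconds: i + 1 });
    }
  }
  return segments;
};
```

Merging consecutive seconds of the same type keeps the timeline short, which matters if it is later appended to the model prompt.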

Usage

# Full pipeline
GOOGLE_GENERATIVE_AI_API_KEY=... node apps/video-transcript/dist/index.mjs ./demo.mp4

# Output to file
GOOGLE_GENERATIVE_AI_API_KEY=... node apps/video-transcript/dist/index.mjs ./demo.mp4 -o transcript.md

# Just the activity timeline (no Gemini call)
node apps/video-transcript/dist/index.mjs ./demo.mp4 --timeline-only

# Skip ffmpeg preprocessing
GOOGLE_GENERATIVE_AI_API_KEY=... node apps/video-transcript/dist/index.mjs ./demo.mp4 --no-preprocess

Test plan

  • Record a short screen recording of interacting with a web app
  • Run with --timeline-only to verify ffmpeg frame analysis detects active/idle segments
  • Run full pipeline with GOOGLE_GENERATIVE_AI_API_KEY set to verify Gemini transcript extraction
  • Run with --no-preprocess to verify raw video upload fallback
  • Verify graceful error when ffmpeg is not installed
  • Verify clear error message when API key is missing

Note

Medium Risk
Adds a new CLI that shells out to ffmpeg and sends video data to an AI model via Vercel AI Gateway; failures or platform differences (ffmpeg availability, large files, env vars) could affect reliability. Also expands published dependency surfaces for the main CLI/SDK and adds tests to prevent missing runtime deps in bundled outputs.

Overview
Adds a new apps/video-transcript standalone CLI that validates video inputs, optionally preprocesses recordings via ffmpeg frame-diff analysis to trim idle/keep scene changes, and then calls generateText through @ai-sdk/gateway (Gemini 2.5 Flash) to produce a structured interaction transcript.

Updates the spec to use mediaType for AI SDK file parts, adds unit tests covering activity analysis/prompting/transcript extraction, and introduces "runtime dependency safety" tests plus new dependencies in apps/cli and packages/typescript-sdk to ensure bundled dist/ runtime-resolved packages are declared.
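
To illustrate the `mediaType` change, here is a minimal sketch of building a user message with a file part. It assumes the AI SDK message shape in which a file part carries `type: "file"`, `data`, and `mediaType`; the helper names `mediaTypeFor` and `buildVideoMessage` are hypothetical and do not come from the PR. The extension list matches the formats named in the cubic summary.

```typescript
// Hypothetical helpers; only the `mediaType` field name comes from the PR.
const MEDIA_TYPES: Record<string, string> = {
  ".mp4": "video/mp4",
  ".webm": "video/webm",
  ".mov": "video/quicktime",
  ".avi": "video/x-msvideo",
  ".mkv": "video/x-matroska",
};

// Maps a video path to its media type based on the file extension.
const mediaTypeFor = (videoPath: string): string => {
  const fileName = videoPath.split(/[\\/]/).pop() ?? "";
  const dot = fileName.lastIndexOf(".");
  const extension = dot === -1 ? "" : fileName.slice(dot).toLowerCase();
  const mediaType = MEDIA_TYPES[extension];
  if (!mediaType) {
    throw new Error(`Unsupported video format: ${extension || "(none)"}`);
  }
  return mediaType;
};

// Builds a single user message: the prompt text followed by the video bytes.
const buildVideoMessage = (videoPath: string, data: Uint8Array, prompt: string) => ({
  role: "user" as const,
  content: [
    { type: "text" as const, text: prompt },
    { type: "file" as const, data, mediaType: mediaTypeFor(videoPath) },
  ],
});
```

The resulting object would go into the `messages` array passed to `generateText`. The model identifier and gateway setup are left out here because the diff excerpts do not show them.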

Reviewed by Cursor Bugbot for commit be73bdc. Bugbot is set up for automated code reviews on this repo.


Summary by cubic

Prototype CLI to turn screen recordings into structured interaction transcripts for test generation. Uses ffmpeg to trim idle time and routes requests via Vercel AI Gateway (requires AI_GATEWAY_API_KEY, replacing GOOGLE_GENERATIVE_AI_API_KEY).

  • New Features

    • Adds apps/video-transcript/ CLI: video → ffmpeg activity analysis/trim → transcript via @ai-sdk/gateway + ai.
    • Flags: --timeline-only, --no-preprocess, -o/--output, --verbose; supports .mp4, .webm, .mov, .avi, .mkv; clear errors for missing ffmpeg or API key.
    • Adds tests for frame diff/segment classification/timeline formatting, prompt building, and extraction; updates .specs/video-transcript.md to use mediaType.
  • Bug Fixes

    • Prevent pnpm runtime resolution failures by declaring runtime-resolved deps in apps/cli and packages/typescript-sdk and adding dist-scanning tests to enforce declarations (e.g., @github/copilot, @google/gemini-cli, accessibility-checker-engine).

Written for commit be73bdc. Summary will update on new commits.

… recordings

Adds apps/video-transcript — a standalone CLI that extracts structured
interaction transcripts from screen recordings using Gemini 2.5 Flash
via AI SDK. Designed to compound with git diff context so the test
agent gets both "what changed" and "how the feature works."

Pipeline: video → ffmpeg idle-time cutting → Gemini transcript extraction

Updates the video-transcript spec to use AI SDK (@ai-sdk/google) instead
of @google/genai, matching the existing codebase dependency surface.

vercel bot commented Apr 5, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| expect | Ready | Preview, Comment | Apr 5, 2026 6:09am |


pkg-pr-new bot commented Apr 5, 2026

npm i https://pkg.pr.new/expect-cli@81

commit: be73bdc

- Replace @ai-sdk/google with @ai-sdk/gateway for provider-agnostic
  model routing via AI_GATEWAY_API_KEY
- Add 24 tests covering activity-analyzer (frame diff, segment
  classification, timeline formatting), transcript-prompt (base prompt,
  timeline appending), and extract-transcript (gateway model, file parts,
  timeline inclusion, response handling)

@cubic-dev-ai cubic-dev-ai bot left a comment

1 issue found across 10 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="apps/video-transcript/src/index.ts">

<violation number="1" location="apps/video-transcript/src/index.ts:115">
P1: `--timeline-only` is ignored when ffmpeg is unavailable, so the CLI still attempts transcript extraction instead of exiting after timeline handling.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.

}
}

console.error(pc.cyan("Extracting transcript via Gemini 2.5 Flash..."));

@cubic-dev-ai cubic-dev-ai bot Apr 5, 2026

P1: --timeline-only is ignored when ffmpeg is unavailable, so the CLI still attempts transcript extraction instead of exiting after timeline handling.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At apps/video-transcript/src/index.ts, line 115:

<comment>`--timeline-only` is ignored when ffmpeg is unavailable, so the CLI still attempts transcript extraction instead of exiting after timeline handling.</comment>

<file context>
@@ -0,0 +1,128 @@
+      }
+    }
+
+    console.error(pc.cyan("Extracting transcript via Gemini 2.5 Flash..."));
+
+    const transcript = await extractTranscript(processedVideoPath, timeline);
</file context>

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 4 potential issues.

Reviewed by Cursor Bugbot for commit ce7912b.

} else {
console.log(output);
}
return;

--timeline-only ignored without ffmpeg or with --no-preprocess

High Severity

The --timeline-only early return is nested inside if (hasFfmpeg) within if (options.preprocess), so it's only reachable when both conditions are true. When ffmpeg is unavailable or --no-preprocess is passed, the flag is silently ignored and the code falls through to extractTranscript, which requires AI_GATEWAY_API_KEY — a key the earlier check at line 54 explicitly skipped validating for timelineOnly mode. This causes an unhandled API error for users who only want the timeline.
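
One way to avoid the fall-through is to decide the whole run up front instead of nesting the early return. The sketch below is hypothetical and is not the code in index.ts: `planRun`, `CliOptions`, and `Plan` are illustrative names, and treating `--timeline-only` combined with `--no-preprocess` as an error is an assumed design choice.

```typescript
// Hypothetical sketch of the control flow; not the code in index.ts.
type CliOptions = { timelineOnly: boolean; preprocess: boolean };

type Plan =
  | { kind: "error"; message: string }
  | { kind: "timeline" }
  | { kind: "transcript"; preprocess: boolean };

const planRun = (options: CliOptions, hasFfmpeg: boolean, hasApiKey: boolean): Plan => {
  if (options.timelineOnly) {
    // Resolve --timeline-only first so it can never fall through to extraction.
    if (!options.preprocess) {
      return { kind: "error", message: "--timeline-only cannot be combined with --no-preprocess." };
    }
    if (!hasFfmpeg) {
      return { kind: "error", message: "--timeline-only requires ffmpeg." };
    }
    return { kind: "timeline" };
  }
  if (!hasApiKey) {
    return {
      kind: "error",
      message: "AI_GATEWAY_API_KEY environment variable is required for transcript extraction.",
    };
  }
  // Without ffmpeg, fall back to uploading the raw video.
  return { kind: "transcript", preprocess: options.preprocess && hasFfmpeg };
};
```

With this shape, the API key is only checked on the path that actually calls the model, and every `--timeline-only` run ends in either a timeline or an explicit error.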

const frameCount = await extractFrames(videoPath, framesDir);
if (frameCount < 2) return [{ type: "active", startSeconds: 0, endSeconds: frameCount }];

const frameSize = 320 * 180;

Unused variable frameSize declared but never read

Low Severity

const frameSize = 320 * 180 is assigned but never referenced anywhere. It appears to be a leftover from a planned validation step (e.g., verifying raw frame buffer sizes match the expected dimensions). This is dead code that adds confusion about whether frame size checking was intentionally omitted.

"-vf",
`select='${selectFilter}',setpts=N/FRAME_RATE/TB`,
"-af",
`aselect='${selectFilter}',asetpts=N/SR/TB`,

ffmpeg -af crashes on videos without audio

Medium Severity

buildTrimmedVideo unconditionally passes -af aselect='...',asetpts=N/SR/TB to ffmpeg. When the input video has no audio stream — common for screen recordings (macOS screenshot tool, many Linux tools, etc.) — ffmpeg fails with a "matches no streams" error. This crashes the CLI during the trim step, preventing transcript extraction for audio-less recordings even though only the video track is needed.
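
A possible fix is to add the audio filter only when the input has an audio stream. The sketch below is an assumption about how the argument list could be built: `buildTrimArgs` and its parameters are hypothetical, since the real `buildTrimmedVideo` signature is not shown in the diff. Only the two filter strings are taken from the excerpt above.

```typescript
// Hypothetical helper: builds ffmpeg arguments for the trim step.
const buildTrimArgs = (
  inputPath: string,
  outputPath: string,
  selectFilter: string,
  hasAudio: boolean,
): string[] => {
  const args = [
    "-i",
    inputPath,
    "-vf",
    `select='${selectFilter}',setpts=N/FRAME_RATE/TB`,
  ];
  if (hasAudio) {
    // Keep audio in sync with the selected video frames.
    args.push("-af", `aselect='${selectFilter}',asetpts=N/SR/TB`);
  } else {
    // No audio stream in the input: disable audio output instead of filtering.
    args.push("-an");
  }
  args.push(outputPath);
  return args;
};
```

The `hasAudio` flag could come from a probe such as `ffprobe -v error -select_streams a -show_entries stream=index -of csv=p=0 <input>`, which prints nothing when the file has no audio stream.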


github-actions bot commented Apr 5, 2026

Test Results

❌ Website Test: failed

11 passed, 5 failed out of 16 steps — 582s

| Step | Status | Duration |
| --- | --- | --- |
| Homepage loads — hero section and install commands visible | ✅ passed | 26s |
| View demo — navigates to /replay?demo=true and replay player loads | ❌ failed | 78s |
| Replay controls — play/pause, speed selector, and step list | ❌ failed | 116s |
| Copy button — clipboard contains expected install command | ✅ passed | 33s |
| Theme toggle — dark mode changes background, light mode restores it | ✅ passed | 21s |
| Footer links — GitHub and X with correct URLs and target="_blank" | ✅ passed | 24s |
| Legal page /terms loads with text content | ✅ passed | 6s |
| Legal page /privacy loads with text content | ✅ passed | 4s |
| Legal page /security loads with text content | ✅ passed | 18s |
| Mobile viewport 375×812 — no horizontal scroll, key content visible | ✅ passed | 39s |
| Accessibility audit (WCAG) | ❌ failed | 44s |
| Performance metrics | ✅ passed | 44s |
| Tablet viewport (768×1024) — no overflow, layout intact | ✅ passed | 10s |
| WebKit cross-browser — homepage + View demo + copy button | ✅ passed | 42s |
| Project healthcheck — pnpm check | ❌ failed | 13s |
| Replay time display — verify current time does not exceed total duration | ❌ failed | 58s |

Session Recording

https://github.com/millionco/expect/releases/download/ci-pr-81/d5849a2e310483443dd0b5f534f80f7b.webm


Workflow run #307 | 📎 Download all recordings

const frameCount = await extractFrames(videoPath, framesDir);
if (frameCount < 2) return [{ type: "active", startSeconds: 0, endSeconds: frameCount }];

const frameSize = 320 * 180;

@vercel vercel bot Apr 5, 2026

Unused variable frameSize is declared but never referenced

export const formatTimeline = (timeline: ActivityTimeline): string => {
const formatTime = (seconds: number): string => {
const minutes = Math.floor(seconds / 60);
const secs = seconds % 60;

@vercel vercel bot Apr 5, 2026

formatTime function produces malformed time strings when receiving fractional seconds (e.g., "00:5.5" instead of "00:05")
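
A possible replacement floors the input to whole seconds before splitting it, then zero-pads both fields. This is a sketch that assumes the intended format is MM:SS, as the "00:05" example in the comment suggests; clamping negative input to zero is an added assumption.

```typescript
// Formats a duration in seconds as MM:SS, dropping any fractional part.
const formatTime = (seconds: number): string => {
  const total = Math.max(0, Math.floor(seconds)); // whole, non-negative seconds
  const minutes = Math.floor(total / 60);
  const secs = total % 60;
  return `${String(minutes).padStart(2, "0")}:${String(secs).padStart(2, "0")}`;
};

console.log(formatTime(5.5)); // prints "00:05"
console.log(formatTime(65.9)); // prints "01:05"
```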

}
}

const segments: ActivitySegment[] = [];

@vercel vercel bot Apr 5, 2026

Loop condition accesses undefined array element causing off-by-one error in segment boundary calculation


let processedVideoPath = videoPath;
let timeline: Awaited<ReturnType<typeof analyzeActivity>> | undefined;


@vercel vercel bot Apr 5, 2026

Missing error handling in async action callback allows unhandled promise rejections when analyzeActivity, buildTrimmedVideo, or extractTranscript fail
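
One way to address this is to route the action body through a wrapper that converts any rejection into a reported message and an exit code. The sketch below is hypothetical: `runSafely` is not part of the PR, and the `Error: ` prefix simply follows the style of the existing API key message.

```typescript
// Hypothetical wrapper: runs an async action and maps failure to an exit code.
const runSafely = async (
  action: () => Promise<void>,
  report: (message: string) => void = console.error,
): Promise<number> => {
  try {
    await action();
    return 0;
  } catch (error) {
    // Non-Error throwables (strings, objects) are stringified for the report.
    const message = error instanceof Error ? error.message : String(error);
    report(`Error: ${message}`);
    return 1;
  }
};
```

The CLI's action callback could then assign the returned code to `process.exitCode`, so failures in `analyzeActivity`, `buildTrimmedVideo`, or `extractTranscript` end the run with a readable message rather than an unhandled rejection.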

"Error: AI_GATEWAY_API_KEY environment variable is required for transcript extraction.",
),
);
process.exit(1);

@vercel vercel bot Apr 5, 2026

The --timeline-only flag is ignored when ffmpeg is unavailable or --no-preprocess is set, causing transcript extraction to proceed instead of exiting after timeline generation

}
};

export const buildTrimmedVideo = async (

@vercel vercel bot Apr 5, 2026

Resource leak: buildTrimmedVideo creates a temporary directory with mkdtempSync but never cleans it up, causing indefinite disk space accumulation
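
A common pattern for this kind of leak is to scope the temporary directory to a callback and remove it in a `finally` block. The sketch below is an assumption about how that could look here; `withTempDir` is a hypothetical helper, not something the PR defines.

```typescript
import { mkdtempSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: creates a temp directory, runs `work` inside it,
// and removes the directory whether `work` resolves or rejects.
const withTempDir = async <T>(
  prefix: string,
  work: (dir: string) => Promise<T>,
): Promise<T> => {
  const dir = mkdtempSync(join(tmpdir(), prefix));
  try {
    return await work(dir);
  } finally {
    // `force` keeps cleanup from throwing if the directory is already gone.
    rmSync(dir, { recursive: true, force: true });
  }
};
```

Because the trimmed video has to outlive `buildTrimmedVideo` until the transcript is extracted, the wrapper would most likely sit around the caller (trim plus extraction) rather than inside the trim function itself.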

export const extractTranscript = async (
videoPath: string,
timeline: ActivityTimeline | undefined,
): Promise<string> => {

@vercel vercel bot Apr 5, 2026

readFileSync loads entire video file into memory, causing memory pressure and potential OOM errors for large files with Vercel AI Gateway

"-vf",
`select='${selectFilter}',setpts=N/FRAME_RATE/TB`,
"-af",
`aselect='${selectFilter}',asetpts=N/SR/TB`,

@vercel vercel bot Apr 5, 2026

ffmpeg command uses -af audio filter unconditionally, causing failure on videos without audio streams

The bundler inlines source from @expect/agent but leaves dynamic
require.resolve() calls intact. Consumers with strict node_modules
(pnpm) cannot resolve these at runtime unless they are declared as
dependencies in the published package.json.

Adds runtime-deps tests to both packages that scan the built dist
for require.resolve() targets and fail if any are undeclared.
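
The dist scan described above could look roughly like this. It is a sketch under assumptions, not the tests added in this PR: `findUndeclaredRuntimeDeps` and `packageNameOf` are hypothetical names, and the regular expression only catches `require.resolve()` calls with a literal string argument.

```typescript
// "@scope/name/sub/path" -> "@scope/name"; "name/sub" -> "name".
const packageNameOf = (specifier: string): string => {
  const parts = specifier.split("/");
  return specifier.startsWith("@") ? parts.slice(0, 2).join("/") : parts[0];
};

// Returns the sorted package names that the bundle resolves at runtime
// but that are missing from the declared dependency list.
const findUndeclaredRuntimeDeps = (bundleSource: string, declared: string[]): string[] => {
  // Created per call so the global regex's lastIndex never leaks between scans.
  const pattern = /require\.resolve\(\s*["'`]([^"'`]+)["'`]/g;
  const missing: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(bundleSource)) !== null) {
    const specifier = match[1];
    // Relative or absolute paths and node: built-ins need no dependency entry.
    if (specifier.startsWith(".") || specifier.startsWith("/") || specifier.startsWith("node:")) {
      continue;
    }
    const name = packageNameOf(specifier);
    if (!declared.includes(name) && !missing.includes(name)) missing.push(name);
  }
  return missing.sort();
};
```

A test would read each file under dist/, pass its contents together with the keys of `dependencies` from package.json, and fail when the returned list is non-empty. Built-ins referenced without the `node:` prefix would be reported as missing by this sketch and would need an allowlist.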