[feat] VLM as judge for WM #1429
Code Review
This pull request introduces the judge.third_person_separation pairwise VLM metric using Gemini, along with an evaluation script, documentation, and dependency configurations. Feedback focuses on improving API robustness by falling back to a tie on failure, removing self.k from the cache path hash to enable proper cache reuse, using a deterministic hash for reproducible counterbalancing, and gracefully handling missing scores in the evaluation script output.
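Two of the review points, the tie fallback and the deterministic counterbalancing, can be illustrated with a minimal sketch. This is not the PR's code: `swap_order`, `judge_pair`, and the `call_judge` callable are hypothetical names, and the judge is assumed to answer `"A"`, `"B"`, or `"tie"`. The sketch uses `hashlib.sha256` rather than Python's built-in `hash()`, because `hash()` on strings is salted per process and so would not give the same A/B order across runs.

```python
import hashlib
from typing import Any, Callable


def swap_order(sample_id: str) -> bool:
    """Decide the A/B presentation order from a stable hash of the sample id.

    hashlib is deterministic across processes, unlike the built-in hash().
    """
    digest = hashlib.sha256(sample_id.encode("utf-8")).digest()
    return digest[0] % 2 == 1


def judge_pair(
    sample_id: str,
    candidate: Any,
    reference: Any,
    call_judge: Callable[[Any, Any], str],  # hypothetical wrapper around the VLM call
) -> str:
    """Return "candidate", "reference", or "tie" for one pairwise comparison."""
    swapped = swap_order(sample_id)
    first, second = (reference, candidate) if swapped else (candidate, reference)

    try:
        verdict = call_judge(first, second)  # assumed to return "A", "B", or "tie"
    except Exception:
        # API or parsing failure: fall back to a tie instead of crashing the run.
        return "tie"

    if verdict not in ("A", "B"):
        # Explicit ties and unrecognized answers are both treated as ties.
        return "tie"

    # Map the slot the judge picked back to candidate/reference.
    candidate_slot = "B" if swapped else "A"
    return "candidate" if verdict == candidate_slot else "reference"
```

Because the presentation order depends only on `sample_id`, rerunning the evaluation shows each pair to the judge in the same order, which keeps position-bias counterbalancing reproducible.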
Purpose
Adds a judge.third_person_separation metric to fastvideo.eval: a pairwise Gemini judge that, given two rollouts from the same first frame + control signal, picks the one that better separates the third-person character (foreground) from the background, and reports the result as a candidate-vs-reference win-rate.
Checklist
pre-commit run --all-files and fixed all issues
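As a rough illustration of the candidate-vs-reference win-rate named under Purpose, the sketch below aggregates per-pair outcomes into a single number. The function name `win_rate`, the outcome labels, and the choice to count a tie as half a win are all assumptions made for this example; the PR may aggregate differently. Missing outcomes (`None`) are skipped, and `None` is returned when nothing was scored, so a caller can print a placeholder instead of failing.

```python
from typing import Iterable, Optional


def win_rate(outcomes: Iterable[Optional[str]]) -> Optional[float]:
    """Candidate win-rate over pairwise outcomes.

    Each outcome is "candidate", "reference", "tie", or None (missing).
    Assumption for this sketch: a tie counts as half a win.
    """
    wins = 0
    ties = 0
    total = 0
    for outcome in outcomes:
        if outcome not in ("candidate", "reference", "tie"):
            continue  # skip missing or unrecognized entries
        total += 1
        if outcome == "candidate":
            wins += 1
        elif outcome == "tie":
            ties += 1

    if total == 0:
        return None  # nothing was scored
    return (wins + 0.5 * ties) / total


score = win_rate(["candidate", "tie", "reference", None])
print("n/a" if score is None else f"{score:.2f}")
```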