[feat] VLM as judge for WM #1429

Open
H1yori233 wants to merge 2 commits into hao-ai-lab:main from H1yori233:vlm-eval

Conversation

@H1yori233
Collaborator

Purpose

Adds a judge.third_person_separation metric to fastvideo.eval. It is a pairwise Gemini judge: given two rollouts generated from the same first frame and control signal, it picks the one that better separates the third-person character (foreground) from the background. Results are reported as a candidate-vs-reference win rate.
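
For readers unfamiliar with pairwise judging, here is a minimal sketch of how per-pair verdicts could be aggregated into a candidate-vs-reference win rate. It is not taken from the PR: the function name `win_rate`, the verdict labels, and the choice to count a tie as half a win are all assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, Optional


def win_rate(verdicts: Iterable[str], tie_weight: float = 0.5) -> Optional[float]:
    """Aggregate pairwise verdicts into a candidate-vs-reference win rate.

    Hypothetical helper: each verdict is assumed to be one of
    "candidate", "reference", or "tie". Ties are assumed to count as
    `tie_weight` of a win; the PR may treat them differently.
    """
    counts = Counter(verdicts)
    total = counts["candidate"] + counts["reference"] + counts["tie"]
    if total == 0:
        # No usable verdicts: report "no score" rather than 0.0.
        return None
    return (counts["candidate"] + tie_weight * counts["tie"]) / total


if __name__ == "__main__":
    print(win_rate(["candidate", "reference", "tie", "candidate"]))
```

Returning `None` for an empty input keeps "no data" distinguishable from "the candidate never won".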

Checklist

  • I ran pre-commit run --all-files and fixed all issues
  • I added or updated tests for my changes
  • I updated documentation if needed
  • I considered GPU memory impact of my changes

H1yori233 requested a review from mignonjia on June 3, 2026 06:51
mergify Bot added the labels "type: feat" (New feature or capability) and "scope: inference" (Inference pipeline, serving, CLI) on Jun 3, 2026
@mergify
Contributor

mergify Bot commented Jun 3, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🔴 PR merge requirements

Waiting for

  • #approved-reviews-by>=1
  • check-success=fastcheck-passed
  • check-success=full-suite-passed
This rule is failing.
  • #approved-reviews-by>=1
  • check-success=fastcheck-passed
  • check-success=full-suite-passed
  • check-success~=pre-commit
  • title~=(?i)^\[(feat|feature|bugfix|fix|refactor|perf|ci|doc|docs|misc|chore|kernel|new.?model|skill|skills|infra)\]

@mergify

This comment was marked as resolved.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces the judge.third_person_separation pairwise VLM metric using Gemini, along with an evaluation script, documentation, and dependency configurations. Feedback focuses on improving API robustness by falling back to a tie on failure, removing self.k from the cache path hash to enable proper cache reuse, using a deterministic hash for reproducible counterbalancing, and gracefully handling missing scores in the evaluation script output.
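
Two of the review points above (deterministic counterbalancing and falling back to a tie on failure) can be illustrated with a short sketch. This is not the PR's code: `candidate_goes_first`, `judge_pair`, the `ask_judge` callable, and the "first"/"second"/"tie" labels are hypothetical names chosen for the example. The sketch uses `hashlib.sha256` because Python's built-in `hash()` for strings is salted per process by default, so it would not give the same presentation order across runs.

```python
import hashlib
from typing import Any, Callable


def candidate_goes_first(pair_id: str, seed: int = 0) -> bool:
    """Decide presentation order from a stable hash of the pair id.

    The same (seed, pair_id) always yields the same order, in any process.
    """
    digest = hashlib.sha256(f"{seed}:{pair_id}".encode("utf-8")).digest()
    return digest[0] % 2 == 0


def judge_pair(
    pair_id: str,
    candidate: Any,
    reference: Any,
    ask_judge: Callable[[Any, Any], str],
    seed: int = 0,
) -> str:
    """Return "candidate", "reference", or "tie" for one pair.

    `ask_judge(first, second)` is a hypothetical callable standing in for
    the VLM request; it is assumed to return "first", "second", or "tie".
    """
    cand_first = candidate_goes_first(pair_id, seed)
    first, second = (candidate, reference) if cand_first else (reference, candidate)
    try:
        raw = ask_judge(first, second)
    except Exception:
        # Assumed policy: an API failure counts as a tie instead of
        # aborting the whole evaluation run.
        return "tie"
    if raw == "first":
        return "candidate" if cand_first else "reference"
    if raw == "second":
        return "reference" if cand_first else "candidate"
    # Unparseable or explicit tie answers are treated as a tie.
    return "tie"
```

Mapping the positional answer back to candidate or reference after the call means the judge never sees which rollout is which, while the reported verdict stays independent of presentation order.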

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread fastvideo/eval/metrics/judge/third_person_separation/metric.py
Comment thread fastvideo/eval/metrics/judge/third_person_separation/metric.py
Comment thread fastvideo/eval/metrics/judge/third_person_separation/metric.py
Comment thread examples/inference/eval/eval_third_person_separation.py Outdated
