# RefTrace: Referee Decision Tracing via Agentic Graph-of-Thought Reasoning over Sports Broadcast Video
This repository contains the reference implementation for the RefTrace research paper. It provides the complete agentic reasoning architecture, training pipeline, evaluation framework, and data preparation tools described in the paper.
RefTrace is the second paper in the sports adjudication trilogy:
- RuleGround — Perception to predicates: grounds raw video into structured game-state predicates
- RefTrace (this repo) — Evidence to traces: generates verifiable reasoning traces over a rule knowledge graph
- StateTrace — Traces to state transitions: adds explicit state bottleneck for per-step verification
RefTrace is a sport-agnostic agentic framework for automated referee decision analysis. The system combines a Game-State Trace Hypergraph (GSTH) -- a structured knowledge representation encoding sport rules, game state, and temporal play context -- with an Agentic Graph-of-Thought (AGoT) reasoning process. A vision-language policy (Qwen3-VL + heterogeneous GAT) navigates the GSTH through a typed action space, selecting relevant subgraphs and applying rule logic to reach justified penalty/no-penalty decisions. Training uses a two-stage pipeline: supervised fine-tuning (SFT) on expert traces, followed by Grounded Reinforcement Policy Optimization (GRPO) with a 7-component Normalized Trace Reward (NTR). We validate on NFL broadcast video using the NFL-MH benchmark.
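To make the typed action space concrete, here is a minimal sketch of the action and decision vocabularies named in the architecture diagram. The enum layout itself is an illustrative assumption; the repository's actual definitions may differ.

```python
from enum import Enum, auto

# Typed actions the policy can take over the GSTH, as named in the
# architecture diagram. The enum structure here is illustrative, not
# the repository's actual definition.
class GSTHAction(Enum):
    GET_RULE = auto()         # retrieve a rule node relevant to the query
    GET_STATE = auto()        # read current game-state predicates
    GET_EVENTS = auto()       # fetch temporally adjacent play events
    GET_VIDEO = auto()        # request additional video evidence
    VERIFY_TEMPORAL = auto()  # check ordering constraints between events
    STOP = auto()             # terminate and emit a final decision

# Terminal decisions emitted when the policy selects STOP.
class Decision(Enum):
    CALL_CORRECT = auto()
    CALL_INCORRECT = auto()
    NO_FOUL = auto()
```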
| Configuration | VTA | DA | RC-F1 |
|---|---|---|---|
| RefTrace-7B | 72.1 | 89.2 | -- |
| w/o NTR (ablation) | 61.7 | -- | -- |
| GRPO-only | 52.3 | -- | -- |
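The NTR ablation row above removes the 7-component Normalized Trace Reward. As a hedged illustration of the combination pattern only, a normalized multi-component trace reward can be written as a weight-normalized sum; the component names and weights below are hypothetical placeholders, not the paper's actual seven components.

```python
def normalized_trace_reward(components, weights):
    """Combine per-trace reward components (each already in [0, 1]) into a
    single scalar in [0, 1] via a weight-normalized sum. Component names
    here are hypothetical; the paper defines its own seven components."""
    total_w = sum(weights.values())
    return sum(weights[k] * components[k] for k in components) / total_w

# Hypothetical component scores for one reasoning trace.
components = {
    "decision_correct": 1.0,
    "rule_grounding": 0.8,
    "temporal_consistency": 0.9,
    "trace_brevity": 0.6,
    "evidence_coverage": 0.7,
    "format_validity": 1.0,
    "hallucination_penalty": 0.5,
}
weights = {k: 1.0 for k in components}  # uniform weights for illustration
r = normalized_trace_reward(components, weights)
```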
```mermaid
---
config:
  layout: elk
  look: neo
  theme: neo
---
flowchart TB
    Q["<b>Query</b><br>(play description)"] --> QE
    Video["<b>Video Frames</b><br>[B, T, C, H, W]"] --> VLM
    subgraph Policy ["RefTracePolicy"]
        VLM["<b>QwenVLBackbone</b><br>Qwen3-VL-8B"] --> GE["<b>GraphEncoder</b><br>Heterogeneous GAT"]
        QE["<b>QueryEncoder</b><br>Sentence-T5"]
        HE["<b>HistoryEncoder</b><br>Transformer"]
        GE & QE & HE --> F["<b>FusionLayer</b>"]
        F --> AH["<b>ActionHead</b>"]
    end
    subgraph Act ["Action Space"]
        direction LR
        A1["GET_RULE"] & A2["GET_STATE"] & A3["GET_EVENTS"]
        A4["GET_VIDEO"] & A5["VERIFY_TEMPORAL"] & A6["STOP"]
    end
    AH --> Act
    Act -->|"execute"| GSTH["<b>GSTH</b><br>Game-State Trace Hypergraph<br>4 node types · 6 edge types"]
    GSTH -->|"result"| HE
    A6 -->|"decision"| D["<b>CALL_CORRECT · CALL_INCORRECT · NO_FOUL</b>"]
```
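The control loop in the diagram (the policy proposes a typed action, the GSTH executes it, and the result feeds back into the history encoder until STOP) can be sketched with a stub policy. The scripted policy, `execute_on_gsth` placeholder, and hard-coded decision below are all assumptions for illustration; the real `RefTracePolicy` is a learned model.

```python
# Hedged sketch of the agentic loop: policy picks an action, the GSTH
# executes it, and the (action, result) pair is appended to the history
# until STOP yields a final decision. Everything here is a stub.
ACTIONS = ["GET_RULE", "GET_STATE", "GET_EVENTS",
           "GET_VIDEO", "VERIFY_TEMPORAL", "STOP"]

class ScriptedPolicy:
    """Replays a fixed action script instead of a learned policy."""
    def __init__(self, script):
        self.script = iter(script)
    def select_action(self, query, history):
        return next(self.script)

def execute_on_gsth(action):
    # Placeholder for subgraph retrieval / rule application on the GSTH.
    return f"result({action})"

def run_episode(policy, query, max_steps=10):
    history = []
    for _ in range(max_steps):
        action = policy.select_action(query, history)
        if action == "STOP":
            return history, "CALL_CORRECT"  # stubbed decision head output
        history.append((action, execute_on_gsth(action)))
    return history, "NO_FOUL"  # fallback if the step budget is exhausted

policy = ScriptedPolicy(["GET_STATE", "GET_RULE", "VERIFY_TEMPORAL", "STOP"])
trace, decision = run_episode(policy, "Was the DPI call correct?")
```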
```bash
# Clone and install
git clone https://github.com/sreevadde/reftrace.git && cd reftrace
pip install -e ".[dev]"

# Build the Game-State Trace Hypergraph
reftrace build-gsth --config configs/base.yaml

# Train Stage 1: Supervised Fine-Tuning
reftrace train --config configs/training/sft.yaml

# Train Stage 2: GRPO with NTR
reftrace train --config configs/training/grpo.yaml

# Evaluate on NFL-MH-Core
reftrace eval --config configs/nfl/core.yaml \
    --checkpoint outputs/grpo/grpo_checkpoint_final.pt
```

Python API:

```python
from omegaconf import OmegaConf

from reftrace.models import RefTracePolicy
from reftrace.graph import GSTH

# Load a pre-built GSTH and convert it to a PyTorch Geometric data object
gsth = GSTH.load("data/gsth/gsth.pkl")
pyg_data = gsth.to_pyg()

# Build the policy from config
cfg = OmegaConf.load("configs/base.yaml")
policy = RefTracePolicy.from_config(cfg)

# Run a single reasoning step; video_tensor and history_tensor are
# prepared by your data pipeline
dist = policy(
    query="Was the defensive pass interference call correct?",
    gsth_data=pyg_data,
    video_frames=video_tensor,     # (B, T, C, H, W)
    history_pairs=history_tensor,  # (B, T_hist, 2*D)
)

# Sample an action
action = policy.select_action(
    query="Was the defensive pass interference call correct?",
    gsth_data=pyg_data,
    video_frames=video_tensor,
)
```

RefTrace uses OmegaConf for hierarchical YAML configuration with CLI overrides.
```text
configs/
├── base.yaml              # Default hyperparameters for all components
├── model/
│   ├── base.yaml          # Qwen3-VL-8B (default) + LoRA rank 64
│   ├── large.yaml         # Qwen3-VL-32B
│   ├── small.yaml         # Qwen3-VL-2B
│   ├── qwen25.yaml        # Qwen2.5-VL-7B (paper baseline, for reproducibility)
│   └── qwen35.yaml        # Qwen3.5-9B (latest, unified VL)
├── training/
│   ├── sft.yaml           # Stage 1: supervised fine-tuning
│   └── grpo.yaml          # Stage 2: GRPO with NTR
└── nfl/
    ├── core.yaml          # NFL-MH-Core (frame-accurate, expert-labeled)
    ├── auto.yaml          # NFL-MH-Auto (broadcast-scale, auto-extracted)
    └── combined.yaml      # Combined Core + Auto
```
Override any parameter from the command line:

```bash
reftrace train --config configs/training/sft.yaml \
    --set training.lr=1e-5 \
    --set training.batch_size=8
```

Hardware requirements:

- Training: 4x NVIDIA A100 80GB (SFT ~6 hours, GRPO ~10 hours)
- Inference: 1x A100 40GB (or 2x A6000)
- GSTH construction: CPU-only, ~15 minutes
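The `--set` overrides shown above follow dotted-key (OmegaConf dot-list) semantics: `training.lr=1e-5` updates the nested `training.lr` entry. Here is a minimal pure-stdlib sketch of that merge behavior; the actual CLI uses OmegaConf itself, and the coercion rules below are a simplification.

```python
# Illustrative dotted-key override, OmegaConf dot-list style:
# "training.lr=1e-5" updates cfg["training"]["lr"]. Pure-stdlib sketch;
# the reftrace CLI uses OmegaConf for the real implementation.
def apply_override(cfg, dotted):
    key, _, raw = dotted.partition("=")
    parts = key.split(".")
    node = cfg
    for p in parts[:-1]:
        node = node.setdefault(p, {})
    # Best-effort scalar coercion: int first, then float, else string.
    for cast in (int, float):
        try:
            node[parts[-1]] = cast(raw)
            return cfg
        except ValueError:
            pass
    node[parts[-1]] = raw
    return cfg

cfg = {"training": {"lr": 2e-5, "batch_size": 4}}
apply_override(cfg, "training.lr=1e-5")
apply_override(cfg, "training.batch_size=8")
```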
```bash
# Build GSTH from play-by-play data
reftrace build-gsth --config configs/base.yaml

# Stage 1: SFT on expert reasoning traces
reftrace train --config configs/training/sft.yaml

# Stage 2: GRPO with Normalized Trace Reward
reftrace train --config configs/training/grpo.yaml

# Evaluate on NFL-MH-Core test set
reftrace eval --config configs/nfl/core.yaml \
    --checkpoint outputs/grpo/grpo_checkpoint_final.pt

# Evaluate on NFL-MH-Auto test set
reftrace eval --config configs/nfl/auto.yaml \
    --checkpoint outputs/grpo/grpo_checkpoint_final.pt
```

Results are deterministic given fixed seeds (`training.seed=42`). Use `--set training.seed={43,44}` for the additional seeds reported in the paper.
```bibtex
@article{vadde2026reftrace,
  title   = {RefTrace: Referee Decision Tracing via Agentic Graph-of-Thought
             Reasoning over Sports Broadcast Video},
  author  = {Vadde, Sree Krishna},
  journal = {arXiv preprint},
  year    = {2026}
}
```

License: MIT