# RefTrace: Referee Decision Tracing via Agentic Graph-of-Thought Reasoning over Sports Broadcast Video
This repository contains the reference implementation for the RefTrace research paper. It provides the complete agentic reasoning architecture, training pipeline, evaluation framework, and data preparation tools described in the paper.
RefTrace is the second paper in the sports adjudication trilogy:
- RuleGround — Perception to predicates: grounds raw video into structured game-state predicates
- RefTrace (this repo) — Evidence to traces: generates verifiable reasoning traces over a rule knowledge graph
- StateTrace — Traces to state transitions: adds explicit state bottleneck for per-step verification
RefTrace is a sport-agnostic agentic framework for automated referee decision analysis. The system combines a Game-State Trace Hypergraph (GSTH) -- a structured knowledge representation encoding sport rules, game state, and temporal play context -- with an Agentic Graph-of-Thought (AGoT) reasoning process. A vision-language policy (Qwen3-VL + heterogeneous GAT) navigates the GSTH through a typed action space, selecting relevant subgraphs and applying rule logic to reach justified penalty/no-penalty decisions. Training uses a two-stage pipeline: supervised fine-tuning (SFT) on expert traces, followed by Grounded Reinforcement Policy Optimization (GRPO) with a 7-component Normalized Trace Reward (NTR). We validate on NFL broadcast video using the NFL-MH benchmark.
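To make the typed action space concrete, here is a minimal sketch of the action and decision vocabularies named in the architecture diagram. The enum layout itself is an illustrative assumption; the repository's actual definitions may differ.

```python
from enum import Enum, auto

# Typed actions the policy can take over the GSTH, as named in the
# architecture diagram. The enum structure here is illustrative, not
# the repository's actual definition.
class GSTHAction(Enum):
    GET_RULE = auto()         # retrieve a rule node relevant to the query
    GET_STATE = auto()        # read current game-state predicates
    GET_EVENTS = auto()       # fetch temporally adjacent play events
    GET_VIDEO = auto()        # request additional video evidence
    VERIFY_TEMPORAL = auto()  # check ordering constraints between events
    STOP = auto()             # terminate and emit a final decision

# Terminal decisions emitted when the policy selects STOP.
class Decision(Enum):
    CALL_CORRECT = auto()
    CALL_INCORRECT = auto()
    NO_FOUL = auto()
```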
| Configuration | VTA | DA | RC-F1 |
|---|---|---|---|
| RefTrace-7B | 72.1 | 89.2 | -- |
| w/o NTR (ablation) | 61.7 | -- | -- |
| GRPO-only | 52.3 | -- | -- |
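The NTR ablation row above removes the 7-component Normalized Trace Reward. As a hedged illustration of the combination pattern only, a normalized multi-component trace reward can be written as a weight-normalized sum; the component names and weights below are hypothetical placeholders, not the paper's actual seven components.

```python
def normalized_trace_reward(components, weights):
    """Combine per-trace reward components (each already in [0, 1]) into a
    single scalar in [0, 1] via a weight-normalized sum. Component names
    here are hypothetical; the paper defines its own seven components."""
    total_w = sum(weights.values())
    return sum(weights[k] * components[k] for k in components) / total_w

# Hypothetical component scores for one reasoning trace.
components = {
    "decision_correct": 1.0,
    "rule_grounding": 0.8,
    "temporal_consistency": 0.9,
    "trace_brevity": 0.6,
    "evidence_coverage": 0.7,
    "format_validity": 1.0,
    "hallucination_penalty": 0.5,
}
weights = {k: 1.0 for k in components}  # uniform weights for illustration
r = normalized_trace_reward(components, weights)
```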
```mermaid
---
config:
  layout: elk
  look: neo
  theme: neo
---
flowchart TB
    Q["<b>Query</b><br>(play description)"] --> QE
    Video["<b>Video Frames</b><br>[B, T, C, H, W]"] --> VLM
    subgraph Policy ["RefTracePolicy"]
        VLM["<b>QwenVLBackbone</b><br>Qwen3-VL-8B"] --> GE["<b>GraphEncoder</b><br>Heterogeneous GAT"]
        QE["<b>QueryEncoder</b><br>Sentence-T5"]
        HE["<b>HistoryEncoder</b><br>Transformer"]
        GE & QE & HE --> F["<b>FusionLayer</b>"]
        F --> AH["<b>ActionHead</b>"]
    end
    subgraph Act ["Action Space"]
        direction LR
        A1["GET_RULE"] & A2["GET_STATE"] & A3["GET_EVENTS"]
        A4["GET_VIDEO"] & A5["VERIFY_TEMPORAL"] & A6["STOP"]
    end
    AH --> Act
    Act -->|"execute"| GSTH["<b>GSTH</b><br>Game-State Trace Hypergraph<br>4 node types · 6 edge types"]
    GSTH -->|"result"| HE
    A6 -->|"decision"| D["<b>CALL_CORRECT · CALL_INCORRECT · NO_FOUL</b>"]
```
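The control loop in the diagram (the policy proposes a typed action, the GSTH executes it, and the result feeds back into the history encoder until STOP) can be sketched with a stub policy. The scripted policy, `execute_on_gsth` placeholder, and hard-coded decision below are all assumptions for illustration; the real `RefTracePolicy` is a learned model.

```python
# Hedged sketch of the agentic loop: policy picks an action, the GSTH
# executes it, and the (action, result) pair is appended to the history
# until STOP yields a final decision. Everything here is a stub.
ACTIONS = ["GET_RULE", "GET_STATE", "GET_EVENTS",
           "GET_VIDEO", "VERIFY_TEMPORAL", "STOP"]

class ScriptedPolicy:
    """Replays a fixed action script instead of a learned policy."""
    def __init__(self, script):
        self.script = iter(script)
    def select_action(self, query, history):
        return next(self.script)

def execute_on_gsth(action):
    # Placeholder for subgraph retrieval / rule application on the GSTH.
    return f"result({action})"

def run_episode(policy, query, max_steps=10):
    history = []
    for _ in range(max_steps):
        action = policy.select_action(query, history)
        if action == "STOP":
            return history, "CALL_CORRECT"  # stubbed decision head output
        history.append((action, execute_on_gsth(action)))
    return history, "NO_FOUL"  # fallback if the step budget is exhausted

policy = ScriptedPolicy(["GET_STATE", "GET_RULE", "VERIFY_TEMPORAL", "STOP"])
trace, decision = run_episode(policy, "Was the DPI call correct?")
```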
```bash
# Clone and install
git clone https://github.com/sreevadde/reftrace.git && cd reftrace
pip install -e ".[dev]"

# Build the Game-State Trace Hypergraph
reftrace build-gsth --config configs/base.yaml

# Train Stage 1: Supervised Fine-Tuning
reftrace train --config configs/training/sft.yaml

# Train Stage 2: GRPO with NTR
reftrace train --config configs/training/grpo.yaml

# Evaluate on NFL-MH-Core
reftrace eval --config configs/nfl/core.yaml \
    --checkpoint outputs/grpo/grpo_checkpoint_final.pt
```

Python API:

```python
from omegaconf import OmegaConf

from reftrace.models import RefTracePolicy
from reftrace.graph import GSTH

# Load a pre-built GSTH and convert it to a PyTorch Geometric data object
gsth = GSTH.load("data/gsth/gsth.pkl")
pyg_data = gsth.to_pyg()

# Build the policy from config
cfg = OmegaConf.load("configs/base.yaml")
policy = RefTracePolicy.from_config(cfg)

# Run a single reasoning step; video_tensor and history_tensor are
# prepared by your data pipeline
dist = policy(
    query="Was the defensive pass interference call correct?",
    gsth_data=pyg_data,
    video_frames=video_tensor,     # (B, T, C, H, W)
    history_pairs=history_tensor,  # (B, T_hist, 2*D)
)

# Sample an action
action = policy.select_action(
    query="Was the defensive pass interference call correct?",
    gsth_data=pyg_data,
    video_frames=video_tensor,
)
```

RefTrace uses OmegaConf for hierarchical YAML configuration with CLI overrides.
```text
configs/
├── base.yaml              # Default hyperparameters for all components
├── model/
│   ├── base.yaml          # Qwen3-VL-8B (default) + LoRA rank 64
│   ├── large.yaml         # Qwen3-VL-32B
│   ├── small.yaml         # Qwen3-VL-2B
│   ├── qwen25.yaml        # Qwen2.5-VL-7B (paper baseline, for reproducibility)
│   └── qwen35.yaml        # Qwen3.5-9B (latest, unified VL)
├── training/
│   ├── sft.yaml           # Stage 1: supervised fine-tuning
│   └── grpo.yaml          # Stage 2: GRPO with NTR
└── nfl/
    ├── core.yaml          # NFL-MH-Core (frame-accurate, expert-labeled)
    ├── auto.yaml          # NFL-MH-Auto (broadcast-scale, auto-extracted)
    └── combined.yaml      # Combined Core + Auto
```
Override any parameter from the command line:

```bash
reftrace train --config configs/training/sft.yaml \
    --set training.lr=1e-5 \
    --set training.batch_size=8
```

Hardware requirements:

- Training: 4x NVIDIA A100 80GB (SFT ~6 hours, GRPO ~10 hours)
- Inference: 1x A100 40GB (or 2x A6000)
- GSTH construction: CPU-only, ~15 minutes
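The `--set` overrides shown above follow dotted-key (OmegaConf dot-list) semantics: `training.lr=1e-5` updates the nested `training.lr` entry. Here is a minimal pure-stdlib sketch of that merge behavior; the actual CLI uses OmegaConf itself, and the coercion rules below are a simplification.

```python
# Illustrative dotted-key override, OmegaConf dot-list style:
# "training.lr=1e-5" updates cfg["training"]["lr"]. Pure-stdlib sketch;
# the reftrace CLI uses OmegaConf for the real implementation.
def apply_override(cfg, dotted):
    key, _, raw = dotted.partition("=")
    parts = key.split(".")
    node = cfg
    for p in parts[:-1]:
        node = node.setdefault(p, {})
    # Best-effort scalar coercion: int first, then float, else string.
    for cast in (int, float):
        try:
            node[parts[-1]] = cast(raw)
            return cfg
        except ValueError:
            pass
    node[parts[-1]] = raw
    return cfg

cfg = {"training": {"lr": 2e-5, "batch_size": 4}}
apply_override(cfg, "training.lr=1e-5")
apply_override(cfg, "training.batch_size=8")
```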
```bash
# Build GSTH from play-by-play data
reftrace build-gsth --config configs/base.yaml

# Stage 1: SFT on expert reasoning traces
reftrace train --config configs/training/sft.yaml

# Stage 2: GRPO with Normalized Trace Reward
reftrace train --config configs/training/grpo.yaml

# Evaluate on NFL-MH-Core test set
reftrace eval --config configs/nfl/core.yaml \
    --checkpoint outputs/grpo/grpo_checkpoint_final.pt

# Evaluate on NFL-MH-Auto test set
reftrace eval --config configs/nfl/auto.yaml \
    --checkpoint outputs/grpo/grpo_checkpoint_final.pt
```

Results are deterministic given fixed seeds (`training.seed=42`). Use `--set training.seed={43,44}` for the additional seeds reported in the paper.
```bibtex
@article{vadde2026reftrace,
  title   = {RefTrace: Referee Decision Tracing via Agentic Graph-of-Thought
             Reasoning over Sports Broadcast Video},
  author  = {Vadde, Sree Krishna},
  journal = {arXiv preprint},
  year    = {2026}
}
```

License: MIT