PkVision (GitHub: AirKyzzZ/pkvision)

See Every Move. Score Every Trick.
Open-source AI for parkour trick detection and scoring using 3D biomechanics.

What is Parkour?

Origins

Parkour was born in Lisses, a suburb south of Paris, France, in the late 1980s. Developed by David Belle -- inspired by the military obstacle course training (parcours du combattant) of his father Raymond Belle -- parkour is a discipline of movement. The goal: get from point A to point B as efficiently and fluidly as possible, using only the human body to overcome obstacles.

Over time, the discipline branched into distinct but overlapping practices. Parkour emphasizes efficiency and flow. Freerunning, popularized by Sébastien Foucan, adds acrobatic expression -- flips, twists, and aerial creativity. Tricking focuses purely on acrobatic combinations on flat ground. PkVision covers the competitive and acrobatic side of these disciplines, where trick identification and scoring matter.

The Trick System

Parkour and freerunning tricks follow a compositional naming convention that directly encodes their physics. The name tells you exactly what the body does:

"Back Flip" = backward direction + 1 flip + 0 twists
"Back Double Full" = backward + 1 flip + 2 twists
"Double Cork" = 2 flips + twists + off-axis rotation + one-foot takeoff

Tricks belong to families -- flips (rotation around the lateral axis), twists (rotation around the longitudinal axis), vaults (obstacle-based movements), and transitions (connecting movements). Within each family, tricks build on each other in progression trees:

Back Flip --> Back Full --> Back Double Full --> Back Triple Full
    |             |
  Gainer        Cork --> Double Cork --> Triple Cork
    |
  Webster

A gainer is a backflip with a one-foot running takeoff. A cork adds off-axis rotation. Each variation changes one physical property -- and changes the trick's name. This compositional structure is what makes automated identification possible.

From the Streets to the Olympics

Parkour has grown from a street practice into an internationally recognized sport. The FIG (Fédération Internationale de Gymnastique) has taken responsibility for standardizing competition formats, developing a Code of Points, and organizing international events. Parkour has been discussed as a candidate discipline for future Olympic Games.

As competition scales to the international level, automated and reproducible trick notation becomes essential. Human judges need to identify tricks precisely, consistently, and fairly across nations. In acrobatic disciplines like gymnastics, this is already supported by systems like Fujitsu's Judging Support System. But parkour has over 1,800 known tricks -- far more than any gymnastics discipline -- and no open-source system exists to identify them from video.

PkVision fills this gap.

What is PkVision?

PkVision is an open-source system that identifies and scores parkour tricks from video using 3D body mesh reconstruction and zero-shot biomechanical matching. It understands how the human body moves through 3D space -- rotation axes, flip counts, twist counts, body shape, takeoff type -- and matches these measurements against 1,800+ known parkour tricks without needing training videos for each one.

The core idea: A backflip is a 360 degree backward rotation around the lateral axis in a tucked position. If the system can measure those properties in 3D from video, it can name the trick -- even tricks it has never seen before.

Left: Input frame. Center: 2D joint locations (ViTPose). Right: 3D biomechanical analysis overlay.

Why 3D?

Previous approaches (including PkVision v1) used 2D pose estimation, which fundamentally cannot distinguish many tricks:

A cork and a back full look identical from certain camera angles in 2D
Twists are invisible when filmed from the side
Rotation counting is imprecise from 2D projections

By reconstructing a full 3D body mesh from video, these view-dependent ambiguities are largely removed. The rotation axis is measured as a 3D vector instead of being inferred from a 2D projection.

How It Works

Video (smartphone, GoPro, competition camera)
  |
  +- Stage 1: 2D Pose Estimation (ViTPose -- 17 COCO keypoints)
  |  Detects the athlete and tracks 17 body joints per frame.
  |
  +- Stage 2: 3D Mesh Recovery (GVHMR -- SMPL body model)
  |  Reconstructs a full 3D body mesh in world coordinates,
  |  aligned with gravity. Outputs per-frame:
  |    - global_orient (3D) -- body orientation as axis-angle
  |    - body_pose (63D) -- 21 joint rotations
  |    - transl (3D) -- world position in meters
  |
  +- Stage 3: Biomechanical Feature Extraction
  |  From the SMPL parameters, computes:
  |    - Swing-twist decomposition --> flip degrees + twist degrees
  |    - Joint angles --> body shape (tuck / pike / layout)
  |    - Ankle analysis --> takeoff type (one-foot / two-foot)
  |    - COM trajectory --> jump height, aerial phase detection
  |
  +- Stage 4: Trick Segmentation
  |  Detects where each trick starts and ends in a full run
  |  using 3D angular velocity peaks + aerial phase detection.
  |
  +- Stage 5: Zero-Shot Trick Matching
  |  Compares the 3D biomechanical signature against 1,800+
  |  trick definitions. No training videos needed per trick.
  |
  +- Stage 6: Scoring (Top 3 by difficulty x confidence)
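Stage 4 in the pipeline above can be sketched as follows. This is a minimal illustration, not the code in core/recognition/segmentation.py: the function name segment_tricks, the inputs (a per-frame angular speed in deg/s and a per-frame airborne flag), and both thresholds are assumptions made for the example.

```python
import numpy as np


def segment_tricks(angular_speed, airborne, min_peak=300.0, min_frames=5):
    """Return (start, end) frame pairs (end exclusive) for candidate tricks.

    A candidate is a contiguous airborne run that lasts at least
    `min_frames` frames and whose peak angular speed reaches `min_peak`.
    Both thresholds are hypothetical values chosen for illustration.
    """
    angular_speed = np.asarray(angular_speed, dtype=float)
    airborne = np.asarray(airborne, dtype=bool)

    # Pad with False so every airborne run has a rising and a falling edge.
    padded = np.concatenate(([False], airborne, [False]))
    edges = np.diff(padded.astype(int))
    starts = np.flatnonzero(edges == 1)    # first airborne frame of each run
    ends = np.flatnonzero(edges == -1)     # first grounded frame after each run

    segments = []
    for start, end in zip(starts, ends):
        long_enough = end - start >= min_frames
        fast_enough = angular_speed[start:end].max() >= min_peak
        if long_enough and fast_enough:
            segments.append((int(start), int(end)))
    return segments


# Synthetic run: one real trick, one 3-frame glitch, one plain jump.
airborne = [0] * 10 + [1] * 20 + [0] * 10 + [1] * 3 + [0] * 7 + [1] * 15 + [0] * 5
speed = np.full(70, 50.0)
speed[10:30] = 600.0   # fast rotation during the first airborne run
speed[40:43] = 600.0   # fast but too short to count
speed[50:65] = 100.0   # airborne but barely rotating
segments = segment_tricks(speed, airborne)
```

Requiring both conditions is a design choice of this sketch: a plain jump is airborne without rotating much, and a tracking glitch can rotate quickly without a sustained aerial phase.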

Key Results

Clip	Measured Flip	Measured Twist	Body Shape	Direction	Takeoff
Backflip	349 deg (1.0 flip)	20 deg (0 twist)	Tuck	Backward	Two-foot
Frontflip	372 deg (1.0 flip)	9 deg (0 twist)	Tuck	Forward	Two-foot
Back Double Full	379 deg (1.1 flip)	798 deg (2.2 twists)	Layout	Backward	Two-foot
Double Cork	703 deg (2.0 flips)	543 deg (1.5 twists)	Layout	Backward	One-foot
Gainer	382 deg (1.1 flip)	25 deg (0 twist)	Tuck	Backward	One-foot

Full pipeline analysis of a backflip: COM trajectory, angular velocity, cumulative rotation, joint angles, 3D trajectory, and biomechanical signature.

The Science

Swing-Twist Decomposition

Every frame, the body's 3D rotation is decomposed into two components:

Swing (flip): Rotation around the body's lateral axis (left-right). A backflip accumulates ~360 deg of swing.
Twist: Rotation around the body's longitudinal axis (head-to-toe). A full twist accumulates ~360 deg.

This separation is the key insight. A back full = 360 deg swing + 360 deg twist. A double cork = 720 deg swing + 540 deg twist + off-axis entry. The system counts degrees, not patterns.

from scipy.spatial.transform import Rotation

# For each consecutive frame pair:
R_delta = R_curr * R_prev.inv()           # Incremental rotation
body_y = R_prev.apply([0, 1, 0])          # Body's longitudinal axis in world
swing, twist = swing_twist_decompose(R_delta, body_y)
cumulative_flip += swing                   # Count flip degrees
cumulative_twist += twist                  # Count twist degrees
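The snippet above calls swing_twist_decompose without showing it. One possible version is sketched here using the standard quaternion construction: project the quaternion's vector part onto the twist axis, rebuild the twist from that projection, and take what remains as the swing. The body of the function, the accumulate helper, and the synthetic test motion are assumptions for illustration and may differ from core/pose/biomechanics.py.

```python
import numpy as np
from scipy.spatial.transform import Rotation


def swing_twist_decompose(r_delta, axis):
    """Split r_delta into a twist about `axis` and the remaining swing.

    Returns (swing_deg, twist_deg). The twist is signed; the swing is the
    unsigned angle of the leftover rotation.
    """
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    x, y, z, w = r_delta.as_quat()            # SciPy order is (x, y, z, w)
    proj = float(np.dot([x, y, z], axis))     # vector part along the twist axis
    twist_rad = 2.0 * np.arctan2(proj, w)
    twist_rad = (twist_rad + np.pi) % (2.0 * np.pi) - np.pi   # wrap to [-pi, pi)
    twist = Rotation.from_rotvec(twist_rad * axis)
    swing = r_delta * twist.inv()             # so that r_delta == swing * twist
    return float(np.degrees(swing.magnitude())), float(np.degrees(twist_rad))


def accumulate(orientations):
    """Sum per-frame swing and twist over a list of Rotation objects."""
    flip_deg = 0.0
    twist_deg = 0.0
    for r_prev, r_curr in zip(orientations[:-1], orientations[1:]):
        r_delta = r_curr * r_prev.inv()           # incremental rotation, world frame
        body_y = r_prev.apply([0.0, 1.0, 0.0])    # longitudinal axis in world frame
        swing, twist = swing_twist_decompose(r_delta, body_y)
        flip_deg += swing
        twist_deg += twist
    return flip_deg, twist_deg


# Synthetic "back full": 360 deg about world x combined with 360 deg about body y.
angles = np.linspace(0.0, 2.0 * np.pi, 61)
frames = [Rotation.from_rotvec([a, 0.0, 0.0]) * Rotation.from_rotvec([0.0, a, 0.0])
          for a in angles]
flip_deg, twist_deg = accumulate(frames)
```

Because the swing is accumulated as an unsigned magnitude in this sketch, flip direction (forward or backward) has to come from a separate measurement.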

Body Shape Classification

Joint angles from the SMPL body model directly reveal body shape during the aerial phase:

Shape	Knee Angle	Hip Angle	Description
Tuck	> 60 deg from rest	> 40 deg from rest	Knees to chest, tight ball
Pike	< 30 deg from rest	> 50 deg from rest	Straight legs, folded at hips
Layout	< 35 deg from rest	< 30 deg from rest	Fully extended body
Open	Layout + arms spread (shoulder > 100 deg)	--	Extended with arms out
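The table can be turned into a small rule-based classifier. The sketch below is illustrative only: the function name and the "unknown" fallback are assumptions, and the pike rule is read according to the row's description (nearly straight knees, strongly flexed hips), which is also an assumption about the intended thresholds.

```python
def classify_body_shape(knee_deg, hip_deg, shoulder_deg=0.0):
    """Classify aerial body shape from joint flexion angles in degrees from rest.

    Thresholds follow the table; angle pairs that fall between the rows
    are reported as "unknown" instead of being forced into a class.
    """
    if knee_deg > 60 and hip_deg > 40:
        return "tuck"        # knees to chest, tight ball
    if knee_deg < 30 and hip_deg > 50:
        return "pike"        # straight legs, folded at hips (assumed reading)
    if knee_deg < 35 and hip_deg < 30:
        # A layout with the arms spread wide is reported as "open".
        return "open" if shoulder_deg > 100 else "layout"
    return "unknown"


shapes = [
    classify_body_shape(90, 80),
    classify_body_shape(10, 90),
    classify_body_shape(10, 10),
    classify_body_shape(10, 10, shoulder_deg=120),
    classify_body_shape(45, 35),
]
```

In practice the angles would probably be averaged or taken at their extreme over the aerial phase before classification; that aggregation step is not shown here.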

Takeoff Detection

The system analyzes pre-takeoff frames to determine entry type:

Two-foot: Symmetric knee/hip angles at takeoff (backflip, front flip)
One-foot: Asymmetric angles -- one leg kicks while the other plants (gainer, cork, webster)
Running: Significant horizontal COM velocity at takeoff
Wall: Horizontal velocity reversal near takeoff (wall push-off)
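The four rules above are not mutually exclusive (a gainer is both one-foot and running), so a sketch can report them as separate flags. Everything in this example is an assumption made for illustration: the function name, the use of knee angles alone for asymmetry, and the 25 degree and 2.0 m/s thresholds.

```python
import numpy as np


def classify_takeoff(left_knee_deg, right_knee_deg, vel_before, vel_after,
                     asym_thresh_deg=25.0, run_speed=2.0):
    """Label a takeoff from knee angles (degrees) and horizontal COM velocity (m/s).

    vel_before and vel_after are 2D horizontal velocity vectors sampled just
    before and just after the last contact with the ground or a wall.
    """
    vel_before = np.asarray(vel_before, dtype=float)
    vel_after = np.asarray(vel_after, dtype=float)
    asymmetry = abs(left_knee_deg - right_knee_deg)
    return {
        # Asymmetric legs: one leg kicks while the other plants.
        "feet": "one_foot" if asymmetry > asym_thresh_deg else "two_foot",
        # Significant horizontal speed going into the takeoff.
        "running": bool(np.linalg.norm(vel_before) > run_speed),
        # Horizontal velocity reverses direction across the contact.
        "wall": bool(np.dot(vel_before, vel_after) < 0.0),
    }


standing = classify_takeoff(40, 42, (0.1, 0.0), (0.0, 0.0))
gainer = classify_takeoff(20, 70, (3.0, 0.0), (2.5, 0.0))
wall_flip = classify_takeoff(30, 65, (2.5, 0.0), (-1.5, 0.0))
```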

Zero-Shot Matching

Each trick in the catalog is defined by its physics:

"back_flip":   {"flips": 1.0, "twists": 0.0, "direction": "backward", "takeoff": "two_foot", "shape": "tuck"}
"double_cork": {"flips": 2.0, "twists": 2.0, "direction": "backward", "takeoff": "one_foot", "shape": "layout", "axis": "off_axis"}
"gainer":      {"flips": 1.0, "twists": 0.0, "direction": "backward", "takeoff": "one_foot", "shape": "tuck"}

The matcher computes a weighted distance between the measured 3D signature and each definition:

Property	Weight	Why
Flip count	30%	Strongest discriminator (single vs double vs triple)
Twist count	20%	Distinguishes full from non-twist variants
Direction	15%	Forward vs backward
Takeoff type	15%	Backflip vs gainer, double full vs cork
Axis type	15%	On-axis (flip) vs off-axis (cork)
Body shape	5%	Tuck vs layout vs pike

No training videos needed. Adding a new trick = adding one line to the definition table. The 1,837 known tricks map to 193 unique property combinations. The matching table grows linearly -- adding trick #1,838 is adding one row.
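As a rough sketch of how such a weighted distance could be computed, the example below uses the three definitions and the weights listed above. The mismatch functions are assumptions: numeric properties use the absolute difference capped at 1, categorical properties use 0 or 1, a missing "axis" field is treated as "on_axis", and confidence is taken as 1 minus the distance. The measured values come from the Gainer row of Key Results.

```python
WEIGHTS = {"flips": 0.30, "twists": 0.20, "direction": 0.15,
           "takeoff": 0.15, "axis": 0.15, "shape": 0.05}

CATALOG = {
    "back_flip": {"flips": 1.0, "twists": 0.0, "direction": "backward",
                  "takeoff": "two_foot", "shape": "tuck"},
    "double_cork": {"flips": 2.0, "twists": 2.0, "direction": "backward",
                    "takeoff": "one_foot", "shape": "layout", "axis": "off_axis"},
    "gainer": {"flips": 1.0, "twists": 0.0, "direction": "backward",
               "takeoff": "one_foot", "shape": "tuck"},
}


def _get(signature, prop):
    # Definitions without an explicit axis are assumed to be on-axis.
    return signature.get(prop, "on_axis") if prop == "axis" else signature[prop]


def distance(measured, definition):
    """Weighted mismatch in [0, 1]; 0 means every property agrees."""
    total = 0.0
    for prop, weight in WEIGHTS.items():
        a, b = _get(measured, prop), _get(definition, prop)
        if prop in ("flips", "twists"):
            mismatch = min(1.0, abs(a - b))     # capped numeric difference
        else:
            mismatch = 0.0 if a == b else 1.0   # categorical: equal or not
        total += weight * mismatch
    return total


def match(measured, catalog=CATALOG):
    """Return (name, confidence) pairs, best match first."""
    ranked = sorted((distance(measured, d), name) for name, d in catalog.items())
    return [(name, 1.0 - dist) for dist, name in ranked]


# Gainer clip from Key Results: 382 deg flip, 25 deg twist, tuck, backward, one-foot.
measured = {"flips": 382 / 360, "twists": 25 / 360, "direction": "backward",
            "takeoff": "one_foot", "shape": "tuck"}
ranking = match(measured)
```

With these inputs the takeoff term is what separates the gainer from the back flip, since the two definitions agree on every other property.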

For Athletes and Coaches

Your clips are what make this system work. The more diverse the data, the more accurate PkVision becomes for everyone.

How You Can Help

Submit training clips -- Open a Clip Submission issue on GitHub with a link to your video.
Propose new tricks -- If a trick is missing from the catalog, open a Trick Proposal issue.
Review detections -- Try PkVision on your own clips and report inaccuracies.

Filming Guidelines

For best detection results:

Resolution: 720p or higher
Frame rate: 30fps or higher
Camera angle: Side or diagonal preferred
Framing: Full body visible throughout the trick
Timing: 1-2 seconds of buffer before and after the trick
Clothing: Contrasting colors against the background
Stability: Tripod or stable surface preferred
Content: One trick per clip when possible

See docs/CLIP_GUIDELINES.md for the complete guide.

For Judges and Federations

PkVision is built with competition integrity in mind.

Scoring

Top 3 by difficulty -- The system selects the 3 most difficult tricks detected, aligned with FIG competition scoring structures.
Weighted scores -- Each trick's score is difficulty x confidence, rewarding both ambition and clean execution.
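The two rules above can be sketched in a few lines. This is an illustration, not the code in core/scoring/engine.py: the function name, the dictionary layout of a detection, the difficulty scale, and summing the three scores into a run total are all assumptions.

```python
def score_run(detections, top_n=3):
    """Keep the `top_n` most difficult detections and weight each by confidence.

    Each detection is a dict with "trick", "difficulty" and "confidence" keys.
    Returns the scored detections and their summed score.
    """
    top = sorted(detections, key=lambda d: d["difficulty"], reverse=True)[:top_n]
    scored = [{**d, "score": d["difficulty"] * d["confidence"]} for d in top]
    return scored, sum(d["score"] for d in scored)


# Hypothetical detections; the difficulty values are made up for the example.
run = [
    {"trick": "back_flip", "difficulty": 2.0, "confidence": 0.9},
    {"trick": "double_cork", "difficulty": 5.0, "confidence": 0.8},
    {"trick": "back_full", "difficulty": 3.5, "confidence": 1.0},
    {"trick": "back_double_full", "difficulty": 4.0, "confidence": 0.5},
]
scored, total = score_run(run)
```

Note that selection here uses difficulty alone and confidence only enters the score afterwards, which follows the wording of the two bullets.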

Transparency

Full audit trail -- Every detection includes reasoning: which properties matched, the confidence level, and per-property breakdowns.
Human override -- Judges always have the final say. Overrides create new audit entries; the original AI decisions are never deleted or modified.
Immutable history -- The audit log is append-only. All entries (AI detections, scoring decisions, judge overrides) are preserved for review.

Neutrality

Nation-neutral -- No geographic bias in detection or scoring.
Multi-language -- Trick names and output are available in English and French, with a straightforward path to add more languages.

Multi-Camera Competition Mode

For competition-grade precision, PkVision supports multiple synchronized cameras:

Camera 1 (side)  --+
Camera 2 (front) --+-- Pose2Sim (3D triangulation) -- SMPL fitting -- Biomechanics -- Matching
Camera 3 (top)   --+

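With 3-4 synchronized cameras, rotation axes, twist counts, and body shape are measured from several viewpoints at once, which removes most of the occlusion and depth ambiguity of a single view. Single-camera mode already achieves good accuracy; multi-camera mode is for the highest precision.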

Tools: Pose2Sim for multi-view triangulation, compatible with any camera (GoPro, smartphone, webcam).

References

Note: This project references FIG and Olympic standards for context but is not officially affiliated with the FIG, the IOC, or any national federation.

Trick Knowledge Base

PkVision draws from multiple sources to cover 1,800+ parkour tricks:

Source	Tricks	Data
Parkour Theory	1,837	Names, types, descriptions, prerequisites, subsequents
Loopkicks Tricktionary	943	Names, descriptions, categories (forward/backward/vertical)
Tricking Bible	45	Difficulty classes (A-F), type, origin, prerequisites
Name parsing	1,837 --> 193	Auto-parsed into physics parameters from trick names

All 1,837 trick names have been parsed into physics parameters (rotation axis, direction, count, twist, body shape, entry type) using the compositional naming convention of parkour. The name "Back Double Full" directly encodes: backward + 1 flip + 2 twists.
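To make the idea concrete, here is a toy parser for a small subset of names. It is not the project's parser: the vocabulary, the function name, and the rule that a multiplier applies to the next "flip" or "full" token are assumptions, and names outside this subset (corks, gainers, vaults) are deliberately rejected.

```python
DIRECTIONS = {"back": "backward", "front": "forward"}
MULTIPLIERS = {"double": 2, "triple": 3}


def parse_trick_name(name):
    """Parse names such as "Back Double Full" into physics parameters."""
    params = {"direction": None, "flips": 0.0, "twists": 0.0}
    multiplier = 1
    for token in name.lower().split():
        if token in DIRECTIONS:
            params["direction"] = DIRECTIONS[token]
        elif token in MULTIPLIERS:
            multiplier = MULTIPLIERS[token]      # applies to the next move token
        elif token == "flip":
            params["flips"] = float(multiplier)
            multiplier = 1
        elif token == "full":
            # A "full" is one flip carrying `multiplier` full twists.
            params["flips"] = 1.0
            params["twists"] = float(multiplier)
            multiplier = 1
        else:
            raise ValueError(f"unknown token: {token!r}")
    return params


back_flip = parse_trick_name("Back Flip")
back_double_full = parse_trick_name("Back Double Full")
```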

Trick Progression Trees

Prerequisite/subsequent data gives us the progression graph, used for difficulty estimation and suggesting which tricks an athlete should learn next:

Back Flip --> Back Full --> Back Double Full --> Back Triple Full
    |             |
  Gainer        Cork --> Double Cork --> Triple Cork
    |
  Webster
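One way to use such a graph is sketched below. The edge list is one reading of the diagram above (Gainer and Back Full follow Back Flip, Webster follows Gainer, Cork follows Back Full) and is an assumption, as are the function names and the use of depth from the root as a difficulty proxy.

```python
from collections import deque

PROGRESSIONS = {
    "Back Flip": ["Back Full", "Gainer"],
    "Back Full": ["Back Double Full", "Cork"],
    "Back Double Full": ["Back Triple Full"],
    "Gainer": ["Webster"],
    "Cork": ["Double Cork"],
    "Double Cork": ["Triple Cork"],
}


def depths(graph, root):
    """Breadth-first depth of every trick reachable from `root`."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return depth


def next_tricks(graph, learned):
    """Tricks one step beyond what the athlete already knows."""
    learned = set(learned)
    return sorted({nxt for trick in learned for nxt in graph.get(trick, [])
                   if nxt not in learned})


depth = depths(PROGRESSIONS, "Back Flip")
suggestions = next_tricks(PROGRESSIONS, ["Back Flip", "Back Full"])
```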

Getting Started

Requirements

Python 3.11+
NVIDIA GPU with CUDA (for GVHMR inference)
~6GB VRAM minimum (RTX 2060 or better)

Installation

git clone https://github.com/AirKyzzZ/pkvision.git
cd pkvision
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt

# Install GVHMR (see docs/INSTALL_GVHMR.md for full guide)
git clone https://github.com/zju3dv/GVHMR.git
cd GVHMR && pip install -e . && cd ..

# Download model checkpoints (SMPL, GVHMR, ViTPose, HMR2)
# See docs/INSTALL_GVHMR.md for download links

Analyze a Video

# Single trick clip
python scripts/analyze.py --input video.mp4

# Full competition run (auto-segments tricks)
python scripts/analyze.py --input full_run.mp4 --segment

Run Tests

pytest tests/unit/ -q
# 250+ tests covering biomechanics, matching, segmentation, scoring

Adding a New Trick

Adding a trick requires no training, no video, no code changes. Edit data/parsed_tricks.json:

{
  "name": "My New Trick",
  "rotation_axis": "off_axis",
  "direction": "backward",
  "rotation_count": 2.0,
  "twist_count": 1.5,
  "body_shape": "layout",
  "entry": "one_leg",
  "family": "flip"
}

The matcher will immediately recognize this trick from any video that matches these properties.
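Before adding an entry, it can help to check that it has the fields shown above. The validator below is a hypothetical helper, not part of the repository; it only checks field presence and basic types, and it makes no assumption about how entries are arranged inside data/parsed_tricks.json.

```python
import json

# Field names and types taken from the example entry above.
REQUIRED = {
    "name": str, "rotation_axis": str, "direction": str,
    "rotation_count": (int, float), "twist_count": (int, float),
    "body_shape": str, "entry": str, "family": str,
}


def validate_trick(entry):
    """Return a list of problems; an empty list means the entry looks well-formed."""
    problems = []
    for key, expected in REQUIRED.items():
        if key not in entry:
            problems.append(f"missing field: {key}")
        elif not isinstance(entry[key], expected):
            problems.append(f"wrong type for {key}")
    for key in ("rotation_count", "twist_count"):
        value = entry.get(key)
        if isinstance(value, (int, float)) and value < 0:
            problems.append(f"{key} must be non-negative")
    return problems


new_trick = json.loads("""
{
  "name": "My New Trick",
  "rotation_axis": "off_axis",
  "direction": "backward",
  "rotation_count": 2.0,
  "twist_count": 1.5,
  "body_shape": "layout",
  "entry": "one_leg",
  "family": "flip"
}
""")
problems = validate_trick(new_trick)
```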

Technology Stack

3D Body Mesh Recovery: GVHMR

GVHMR (SIGGRAPH Asia 2024) reconstructs a world-grounded 3D body mesh from monocular video. It outputs SMPL body model parameters aligned with gravity.

Why GVHMR over alternatives:

Gravity-aligned -- knows which direction is "up" regardless of camera angle
World coordinates -- positions in meters, not pixels
Best accuracy -- 19% better than WHAM on world-grounded trajectory metrics
Fast -- 1.9 seconds for 8.6 seconds of video on RTX 2060

2D Pose Estimation: ViTPose

ViTPose provides 17 COCO keypoints as input to GVHMR. YOLO handles person detection and tracking.

Biomechanical Analysis: scipy + custom

The swing-twist decomposition and joint angle extraction use scipy.spatial.transform.Rotation with custom code in core/pose/biomechanics.py.

Scoring & Segmentation: Custom

The scoring engine (core/scoring/engine.py) and run segmenter (core/recognition/segmentation.py) are custom implementations.

Research Context

PkVision draws on recent advances in computer vision and sports biomechanics:

Technology	Paper / Source	Role in PkVision
GVHMR	SIGGRAPH Asia 2024	3D body mesh recovery
ViTPose	TPAMI 2023	2D pose estimation
SMPL	SIGGRAPH Asia 2015	Parametric body model
Fujitsu JSS	Production system	Inspiration (gymnastics judging)
AthletePose3D	CVPR 2025 Workshop	Extreme pose fine-tuning data
Pose2Sim	Open source	Multi-camera 3D reconstruction
Swing-twist decomposition	Classical mechanics	Rotation analysis

Roadmap

Completed

3D body mesh recovery from monocular video (GVHMR)
Swing-twist decomposition for flip/twist counting
Body shape classification (tuck/pike/layout)
Takeoff detection (one-foot vs two-foot)
Zero-shot trick matching against 1,800+ definitions
Run segmentation for full competition runs
1,837 trick names parsed into physics parameters
Pipeline visualization figures for research paper

In Progress

Direction convention calibration (SMPL axis alignment)
Cork vs double full differentiation (off-axis detection refinement)
Multi-camera competition mode (Pose2Sim integration)

Planned

Landing quality scoring (joint angles at ground contact)
Real-time processing pipeline
Web interface for live competition judging
Obstacle detection (wall, bar, rail) for vault/bar tricks
AthletePose3D fine-tuning for better extreme pose accuracy

Contributing

We welcome contributions from developers, athletes, coaches, judges, and anyone interested in parkour and computer vision.

Code -- Bug fixes, features, pipeline improvements
Trick definitions -- Add physics parameters for new tricks
Training clips -- Submit videos of known tricks for validation
Biomechanics expertise -- Help refine body shape / takeoff / landing detection
Translations -- Add language codes to trick names

See docs/CONTRIBUTING.md for the full guide.

License

MIT License -- free for everyone, forever.

See LICENSE for the full text.


