AI tennis coach powered by on-device vision inference and multi-device orchestration.
Record your swing on a Galaxy S25, get instant spoken coaching feedback analyzed by a vision language model running locally on a Copilot+ PC -- no cloud required.
Built for the Qualcomm Snapdragon Multiverse Hackathon at Columbia University, February 2026.
Track 2: Conversational AI Companion -- voice-driven wellness coaching with real-time, contextually aware responses.
| Name | Email |
|---|---|
| James Zhang | tz2642@columbia.edu |
| Chenyang Yu | cy2758@columbia.edu |
| Chihao Yu | chy007@ucsd.edu |
| Yang Zhou | nbzy1995@gmail.com |
```
Galaxy S25                            Copilot+ PC
(Snapdragon 8 Elite)                  (Snapdragon X Elite)
========================              ========================
1. Record swing     ---- WiFi --->    2. Receive video
                                      3. Extract frames (ffmpeg)
                                      4. Vision LLM analyzes swing
                                         (Qwen2.5-VL via LM Studio)
6. Android TTS speaks <--- WiFi ----  5. Return coaching JSON
   coaching feedback
7. Listen for follow-up
   questions by voice
```
Round-trip latency: ~2 seconds from recording stop to spoken feedback.
```
swingcoach/
├── android/                  # Mobile app (Kotlin, Jetpack Compose)
│   ├── app/src/main/
│   │   ├── analysis/         # EdgeServerAnalyzer, GeminiVideoAnalyzer, OpenRouterAnalyzer
│   │   ├── camera/           # CameraX video-only recording (mic stays free)
│   │   ├── voice/            # On-device SpeechRecognizer + command parsing
│   │   ├── tts/              # Android TextToSpeech wrapper
│   │   ├── settings/         # LLM provider config (Edge/Gemini/OpenRouter)
│   │   └── ui/               # SwingViewModel, CoachScreen, SettingsScreen
│   └── app/src/test/         # 77 unit tests
│
├── backend/                  # Edge server (Python, FastAPI)
│   ├── app.py                # REST API -- /api/analyze-swing
│   ├── lmstudio_service.py   # Local VLM inference via LM Studio
│   ├── vision_provider.py    # Routes to local (LM Studio) or cloud (Gemini)
│   ├── gemini_service.py     # Gemini Vision + TTS (cloud fallback)
│   ├── pose.py               # MediaPipe pose estimation (optional)
│   ├── analysis.py           # Tennis swing scoring engine
│   ├── network_info.py       # LAN IP discovery for phone pairing
│   └── tests/                # 28 pytest tests (unit + E2E)
│
└── frontend/                 # Web dashboard (React, TypeScript) -- optional
```
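To orient, the backend's main endpoint has roughly this shape. This is a hypothetical skeleton, not the real `backend/app.py` -- field names follow the curl example later in this README, and the pipeline steps follow the architecture diagram above:

```python
# Hypothetical skeleton of the main endpoint; see backend/app.py for the real code.
from fastapi import FastAPI, File, Form, UploadFile

app = FastAPI()

@app.post("/api/analyze-swing")
async def analyze_swing(file: UploadFile = File(...), rotation: int = Form(0)):
    video_bytes = await file.read()
    # 1. ffmpeg: video -> NUM_FRAMES JPEG frames
    # 2. LM Studio (Qwen2.5-VL): frames -> coaching feedback
    # 3. Optional: MediaPipe pose metrics
    return {"gemini_analysis": {"feedback_text": "..."}}
```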
The app uses a phone + PC split architecture designed for the Snapdragon ecosystem:
| Component | Device | Processor | Runs On-Device? |
|---|---|---|---|
| Video capture | Galaxy S25 | Snapdragon 8 Elite | Yes |
| Speech-to-text | Galaxy S25 | Snapdragon 8 Elite | Yes |
| Text-to-speech | Galaxy S25 | Snapdragon 8 Elite | Yes |
| Vision LLM (Qwen2.5-VL) | Copilot+ PC | Snapdragon X Elite | Yes |
| Frame extraction (ffmpeg) | Copilot+ PC | Snapdragon X Elite | Yes |
| Pose estimation (MediaPipe) | Copilot+ PC | Snapdragon X Elite | Yes (optional) |
All AI inference runs on-device across the two Snapdragon processors. Zero cloud dependency.
| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Health check with provider and pose status |
| `/api/analyze-swing` | POST | Combined: vision + pose + TTS (main endpoint) |
| `/api/analyze-video` | POST | Pose estimation only |
| `/api/analyze-frame` | POST | Single-frame joint angles |
| `/api/annotate-video` | POST | Video with skeleton overlay |
| `/api/network-info` | GET | LAN IP and connection URLs for phone pairing |
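Any HTTP client can drive the main endpoint. Here's a minimal Python sketch; the multipart field names follow the curl example later in this README, and the response key matches the expected output noted there:

```python
# Minimal client sketch for /api/analyze-swing. EDGE_URL is whatever
# "Phone connect URL" the backend prints at startup.
import requests

EDGE_URL = "http://10.206.81.71:8001"

with open("swing.mp4", "rb") as f:
    resp = requests.post(
        f"{EDGE_URL}/api/analyze-swing",
        files={"file": ("swing.mp4", f, "video/mp4")},
        data={"rotation": "0"},
        timeout=60,
    )
resp.raise_for_status()
print(resp.json()["gemini_analysis"]["feedback_text"])
```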
| Tool | Version | Install |
|---|---|---|
| Python | 3.10+ | Pre-installed on Copilot+ PC |
| LM Studio | Latest | lmstudio.ai |
| ffmpeg | Latest | `winget install Gyan.FFmpeg` |
| Android Studio | Latest | developer.android.com |
| JDK | 17+ | Bundled with Android Studio |
```
# Clone the repo
git clone https://github.com/JoeyYu23/swingcoach.git
cd swingcoach

# Create Python virtual environment
cd backend
python -m venv .venv
.venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Configure environment
copy .env.example .env
# Edit .env -- set VISION_PROVIDER=local (default)
```

- Open LM Studio
- Search for and download Qwen2.5-VL-3B-Instruct (or 7B for better quality)
- Load the model in LM Studio
- Verify it's running:
```
curl http://localhost:1234/v1/models
```
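The backend talks to this same OpenAI-compatible endpoint. As a hedged sketch of what a request looks like (the exact prompt and parameters used by `lmstudio_service.py` may differ), using the `openai` Python client pointed at LM Studio:

```python
# Sketch: call LM Studio's OpenAI-compatible API with extracted video frames.
# Model name and frame count mirror the .env defaults shown later; the
# backend's actual request construction may differ.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")  # key is ignored locally

def data_url(jpeg_path: str) -> str:
    with open(jpeg_path, "rb") as f:
        return "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()

frames = [data_url(f"frame_{i:02d}.jpg") for i in range(6)]
response = client.chat.completions.create(
    model="qwen2.5-vl-3b-instruct",
    messages=[{
        "role": "user",
        "content": [{"type": "text", "text": "You are a tennis coach. Critique this swing."}]
                   + [{"type": "image_url", "image_url": {"url": u}} for u in frames],
    }],
)
print(response.choices[0].message.content)
```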
```
cd backend
.venv\Scripts\activate
python -m uvicorn app:app --host 0.0.0.0 --port 8001
```

The server prints connection info on startup:

```
SwingCoach API ready!
Vision provider: local
Pose estimation: unavailable
LAN IP: 10.206.81.71
Phone connect URL: http://10.206.81.71:8001
```
Note the Phone connect URL -- you'll enter this in the Android app.
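The LAN IP comes from `network_info.py`; its actual implementation may differ, but a common approach is the UDP-socket trick sketched below, which asks the OS which interface it would use for outbound traffic:

```python
# Sketch of LAN IP discovery (the real network_info.py may differ).
# Connecting a UDP socket sends no packets, but makes the OS bind the
# socket to the outbound interface, whose address we then read back.
import socket

def get_lan_ip() -> str:
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.connect(("8.8.8.8", 80))  # no traffic is actually sent
        return s.getsockname()[0]
    finally:
        s.close()

print(f"Phone connect URL: http://{get_lan_ip()}:8001")
```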
Option A: Install pre-built APK

```
adb install swingcoach-debug.apk
```

Option B: Build from source

```
cd android

# Set JAVA_HOME (Windows)
set JAVA_HOME=C:\Program Files\Android\Android Studio\jbr

# Build and install
gradlew.bat installDebug
```

- Ensure both devices are on the same WiFi network
- Open the SwingCoach app on the Galaxy S25
- Go to Settings (gear icon)
- Set Video Analysis provider to Edge Server
- Enter the PC's URL: `http://<PC_IP>:8001`
- Tap Save
- Return to the Coach tab
- Say "start recording" and swing your racket
- Say "stop" -- the video uploads to the PC
- Within ~2 seconds, the phone speaks coaching feedback
- Ask follow-up questions by voice: "Tell me more about my grip"
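The "start recording" / "stop" phrases above are matched by the on-device command parser (Kotlin, under `android/.../voice`). A language-agnostic sketch of the idea, in Python for brevity; the actual matching rules live in `VoiceCommandParser`:

```python
# Illustrative only: keyword-based, case-insensitive command matching.
def parse_command(utterance: str) -> str | None:
    text = utterance.strip().lower()
    if "start recording" in text:
        return "START_RECORDING"
    if "stop" in text:
        return "STOP_RECORDING"
    return None  # anything else becomes a follow-up question for the coach

assert parse_command("Start Recording") == "START_RECORDING"   # case-insensitive
assert parse_command("Tell me more about my grip") is None      # routed to the LLM
```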
```
# Vision provider: "local" (LM Studio) or "gemini" (Google Cloud)
VISION_PROVIDER=local

# LM Studio settings (used when VISION_PROVIDER=local)
LMSTUDIO_URL=http://localhost:1234
LMSTUDIO_MODEL=qwen2.5-vl-3b-instruct
NUM_FRAMES=6
MAX_FRAME_DIM=512

# Gemini API key (used when VISION_PROVIDER=gemini)
GEMINI_API_KEY=your_key_here
```

The app supports three LLM providers, configurable per slot (video analysis vs. conversation):
| Provider | Use Case | Requires |
|---|---|---|
| Edge Server | Video analysis via PC backend | PC on same network |
| Gemini | Direct cloud video analysis | API key |
| OpenRouter | Alternative cloud models | API key + balance |
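On the backend side, an illustrative sketch of how `vision_provider.py` might select a provider from `VISION_PROVIDER`; the function names are hypothetical, and it also shows the lazy-import pattern described later under design decisions:

```python
# Illustrative only: route vision inference by VISION_PROVIDER.
# analyze_frames is a hypothetical name; see vision_provider.py for the real code.
import os

def get_analyzer():
    provider = os.getenv("VISION_PROVIDER", "local")
    if provider == "local":
        # Lazy import: the Gemini SDK never loads when running fully on-device
        from lmstudio_service import analyze_frames
    elif provider == "gemini":
        from gemini_service import analyze_frames
    else:
        raise ValueError(f"Unknown VISION_PROVIDER: {provider!r}")
    return analyze_frames
```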
```
cd backend
.venv\Scripts\activate

# Unit tests (no external dependencies)
python -m pytest tests/ -v --ignore=tests/test_gemini_service.py -k "not e2e"

# E2E tests (requires LM Studio running)
python -m pytest tests/test_e2e_local_llm.py -v

# All tests
python -m pytest tests/ -v --ignore=tests/test_gemini_service.py
```

Android unit tests:

```
cd android
set JAVA_HOME=C:\Program Files\Android\Android Studio\jbr
gradlew.bat testDebugUnitTest
```

| Test Suite | Count | Coverage |
|---|---|---|
| SwingViewModelTest | 24 | State machine, dual-analyzer, conversation, voice |
| VoiceCommandParserTest | 12 | Command parsing, case insensitivity |
| OpenRouterAnalyzerTest | 12 | MockWebServer: auth, parsing, errors |
| LlmSettingsTest | 11 | SharedPreferences round-trip |
| GeminiVideoAnalyzerTest | 10 | MockWebServer: analyze, retry, conversation |
| AnalyzerFactoryTest | 5 | Provider routing |
| FakeSpeechPlayerTest | 3 | TTS fake behavior |
Instrumented tests run on a connected device:

```
cd android
gradlew.bat connectedDebugAndroidTest
```

| Test Suite | Count | What It Tests |
|---|---|---|
| DeviceReadinessTest | 10 | Camera, network, TTS, assets, API keys |
| EdgeServerRealTest | 2 | Full phone-to-PC-to-phone pipeline |
| TtsRealWorkflowTest | 5 | Android TTS speaks and callbacks work |
| GeminiApiRealTest | 4 | Direct Gemini vision analysis |
| OpenRouterApiRealTest | 5 | OpenRouter API (skipped without key) |
| ExampleInstrumentedTest | 1 | Basic app context |
```
# 1. Start LM Studio with Qwen2.5-VL loaded

# 2. Start the backend
cd backend && python -m uvicorn app:app --host 0.0.0.0 --port 8001

# 3. Test from the command line
curl -X POST http://localhost:8001/api/analyze-swing ^
  -F "file=@tests/fixtures/sample_swing.mp4" ^
  -F "rotation=0"

# Expected: JSON with gemini_analysis.feedback_text containing coaching feedback
```

| Layer | Technology | Purpose |
|---|---|---|
| Mobile app | Kotlin 2.0, Jetpack Compose, CameraX | Video capture, voice UI |
| Speech-to-text | Android SpeechRecognizer | On-device voice commands |
| Text-to-speech | Android TextToSpeech | On-device spoken feedback |
| Edge server | Python 3.12, FastAPI, uvicorn | Video processing API |
| Vision LLM | Qwen2.5-VL via LM Studio | On-device swing analysis |
| Frame extraction | ffmpeg + Pillow | Video to JPEG frames |
| Pose estimation | MediaPipe BlazePose (optional) | 33 keypoints, joint angles |
| HTTP client | OkHttp 4.11 | Multipart video upload |
| Build | Gradle 9.1, AGP 9.0 | Android build system |
- **ffmpeg over OpenCV:** OpenCV has no ARM64 Windows wheels. An ffmpeg subprocess + Pillow works natively on Snapdragon X Elite without WSL (see the sketch after this list).
- **Lazy imports:** `cv2`, `numpy`, and `google.genai` are imported at function level, not module level. This allows the backend to start on ARM64 Windows even when optional dependencies are unavailable.
- **Video-only recording:** The camera records WITHOUT audio, keeping the microphone free for the SpeechRecognizer to listen for "stop" commands during recording.
- **Dual-analyzer architecture:** The Android app has separate analyzers for video (multimodal VLM) and conversation (text LLM), independently configurable via Settings.
- **LM Studio over direct ONNX:** LM Studio provides an OpenAI-compatible API with automatic GPU/NPU acceleration on Snapdragon, avoiding manual model-loading code.
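As referenced above, a minimal sketch of the ffmpeg + Pillow extraction path, assuming evenly spaced sampling driven by the `NUM_FRAMES` and `MAX_FRAME_DIM` settings; the backend's actual pipeline and filter arguments may differ:

```python
# Sketch only: evenly sample num_frames stills from a swing video with ffmpeg,
# then load them with Pillow. Sampling strategy here is an assumption.
import json
import subprocess
from PIL import Image

def video_duration(path: str) -> float:
    """Read the container duration via ffprobe (ships with ffmpeg)."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "json", path],
        check=True, capture_output=True, text=True,
    ).stdout
    return float(json.loads(out)["format"]["duration"])

def extract_frames(path: str, num_frames: int = 6, max_dim: int = 512) -> list[Image.Image]:
    duration = video_duration(path)
    frames = []
    for i in range(num_frames):
        ts = duration * (i + 0.5) / num_frames   # midpoints of equal time slices
        out_path = f"frame_{i:02d}.jpg"
        subprocess.run(
            ["ffmpeg", "-y", "-ss", f"{ts:.3f}", "-i", path,
             "-frames:v", "1", "-vf", f"scale={max_dim}:-2", out_path],
            check=True, capture_output=True,
        )
        frames.append(Image.open(out_path))
    return frames
```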
| Problem | Solution |
|---|---|
| Backend won't start | Check `python --version` (3.10+) and `pip install -r requirements.txt` |
| LM Studio 404 | Ensure a model is loaded in LM Studio, not just downloaded |
| Phone can't reach PC | Both devices must be on the same WiFi. Check firewall for port 8001 |
| App crashes on HTTP | Verify `usesCleartextTraffic="true"` in `AndroidManifest.xml` |
| Pose estimation unavailable | Expected on ARM64 Windows (`mediapipe` has no ARM64 wheels). Vision-only mode still works. |
| TTS not speaking | Install Google TTS from the Play Store if missing |
| ffmpeg not found | Run `winget install Gyan.FFmpeg` and restart the terminal |
- LM Studio -- local LLM inference
- Qwen2.5-VL -- vision language model
- MediaPipe -- pose estimation
- Qualcomm AI Hub -- Snapdragon model optimization
- FastAPI -- Python async web framework
MIT License -- see LICENSE for details.