A real-time multi-class beverage container detection system powered by YOLOv8 with a modern React frontend and FastAPI backend. Detect bottles, wine glasses, and cups in live webcam feeds or uploaded videos.
- Overview
- Features
- Architecture
- Detected Classes
- Quick Start
- API Documentation
- Benchmarks
- Model Information
- Training with Google Colab
- Configuration
- Project Structure
- Contributing
- License
This project provides an end-to-end solution for detecting beverage containers in images and video streams. It leverages the YOLOv8 object detection model, optimized for identifying common beverage-holding objects such as plastic bottles, wine glasses, and cups.
Use Cases:
- Recycling automation systems
- Inventory management for bars/restaurants
- Smart retail shelf monitoring
- Environmental monitoring for litter detection
- Educational demonstrations of computer vision
- Real-time WebSocket Detection - Stream webcam frames for instant object detection
- Video Upload Processing - Upload videos and receive frame-by-frame detection results via Server-Sent Events (SSE)
- Adjustable Confidence Threshold - Dynamically tune detection sensitivity via UI slider
- Multi-class Detection - Identify bottles, wine glasses, and cups simultaneously
- Rolling Benchmark Statistics - Track inference latency with avg, p95, and p99 metrics
- Responsive React UI - Modern interface with live bounding box overlays
- Docker Ready - One-command deployment with Docker Compose
- CORS Enabled - Easy integration with external frontends
+------------------+ WebSocket (frames) +------------------+
| | ----------------------------> | |
| React Frontend | | FastAPI Backend |
| (Vite + Nginx) | <---------------------------- | (YOLOv8 + UV) |
| | JSON (detections) | |
+------------------+ +------------------+
| |
| HTTP POST /upload |
| ----------------------------------------------> |
| |
| SSE (progress + results) |
| <---------------------------------------------- |
| |
v v
+------------------+ +------------------+
| User Webcam | | YOLOv8n Model |
| or Video File | | (6.5 MB .pt) |
+------------------+ +------------------+
| Component | Technology | Purpose |
|---|---|---|
| Frontend | React 19 + Vite 8 | User interface, webcam capture, visualization |
| Backend | FastAPI + Uvicorn | API server, WebSocket handler, video processing |
| Model | YOLOv8n (Nano) | Object detection inference |
| Proxy | Nginx | Static file serving, WebSocket proxy |
| Container | Docker Compose | Orchestration and deployment |
The model detects the following beverage container classes from the COCO dataset:
| Class ID | Label | Description |
|---|---|---|
| 39 | bottle | Plastic bottles, water bottles, soda bottles |
| 40 | wine glass | Wine glasses, champagne flutes |
| 41 | cup | Coffee cups, mugs, plastic cups |
The model uses YOLOv8n pre-trained on COCO and filters detections to only these beverage-related classes.
The fastest way to get started is using Docker Compose:
# Clone the repository
git clone <repository-url>
cd yolo-demo
# Start all services
docker compose up --build
# Access the application
# Frontend: http://localhost:3000
# Backend API: http://localhost:8000
# API Docs: http://localhost:8000/docsStop the services:
docker compose downView logs:
docker compose logs -f backend
docker compose logs -f frontend- Python 3.11+
- Node.js 20+
- pip or uv package manager
cd backend
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Run the server
python main.py
# Server will start at http://localhost:8000cd frontend
# Install dependencies
npm install
# Start development server
npm run dev
# Frontend will start at http://localhost:5173Real-time object detection via WebSocket connection.
Message Types (Client -> Server):
-
Binary Frame - JPEG-encoded image bytes
Send raw JPEG bytes for detection -
Config Message - JSON text to update settings
{ "type": "config", "threshold": 0.5 }
Response Types (Server -> Client):
-
Detection Result
{ "type": "detection", "detections": [ { "class_id": 39, "label": "bottle", "confidence": 0.923, "bbox": [120.5, 80.2, 250.8, 400.1] } ], "inference_ms": 23.45, "avg_ms": 25.12, "p95_ms": 32.50, "p99_ms": 45.20, "frames": 30 } -
Config Acknowledgment
{ "type": "config_ack", "confidence": 0.5 } -
Error
{ "type": "error", "detail": "could not decode frame" }
Upload a video file for batch processing with SSE progress streaming.
Request:
- Content-Type:
multipart/form-data - Body:
file- Video file (MP4, WebM, MOV)
Response: Server-Sent Events stream
SSE Event Types:
-
Start Event
{ "type": "start", "job_id": "uuid", "total_frames": 1500, "fps": 30 } -
Progress Event (per frame)
{ "type": "progress", "frame": 100, "total_frames": 1500, "progress": 6.7, "detections": [...] } -
Complete Event
{ "type": "complete", "job_id": "uuid", "total_frames": 1500, "results": [...] }
Performance benchmarks measured on different hardware configurations:
| Hardware | Avg (ms) | P95 (ms) | P99 (ms) | FPS |
|---|---|---|---|---|
| Apple M1 (CPU) | 45 | 52 | 58 | ~22 |
| Apple M2 Pro (CPU) | 32 | 38 | 42 | ~31 |
| Intel i7-12700K (CPU) | 38 | 45 | 50 | ~26 |
| NVIDIA RTX 3060 (GPU) | 8 | 10 | 12 | ~125 |
| NVIDIA RTX 4090 (GPU) | 3 | 4 | 5 | ~333 |
Note: Benchmarks include model inference only, not pre/post-processing or network latency.
| Model | Size | mAP@50 | Inference (CPU) | Inference (GPU) |
|---|---|---|---|---|
| YOLOv8n | 6.5 MB | 37.3 | 45 ms | 8 ms |
| YOLOv8s | 22 MB | 44.9 | 85 ms | 12 ms |
| YOLOv8m | 52 MB | 50.2 | 180 ms | 18 ms |
| YOLOv8l | 84 MB | 52.9 | 320 ms | 25 ms |
| YOLOv8x | 131 MB | 53.9 | 480 ms | 35 ms |
| Component | RAM Usage |
|---|---|
| Backend (idle) | ~500 MB |
| Backend (active) | ~1.2 GB |
| Frontend (Nginx) | ~15 MB |
| Model (YOLOv8n) | ~50 MB |
| Resolution | FPS (CPU) | FPS (GPU) |
|---|---|---|
| 480p | 18-22 | 80-100 |
| 720p | 12-15 | 60-80 |
| 1080p | 6-8 | 40-50 |
- Architecture: YOLOv8 (Ultralytics)
- Variant: Nano (smallest, fastest)
- Size: 6.5 MB (3.2M parameters)
- Training Data: COCO 2017 (80 classes)
- Input Size: 640x640 (auto-scaled)
- Output: Bounding boxes + class probabilities
To use your own trained YOLOv8 model:
- Place your
.ptfile in themodels/directory - Update the
MODEL_PATHenvironment variable:export MODEL_PATH=/app/models/your-model.pt - Or modify
docker-compose.yml:environment: - MODEL_PATH=/app/models/your-model.pt
# Install ultralytics
pip install ultralytics
# Train on custom dataset
yolo detect train data=your-dataset.yaml model=yolov8n.pt epochs=100
# Export to ONNX for faster inference (optional)
yolo export model=runs/detect/train/weights/best.pt format=onnxTrain a custom beverage detection model using Google Colab's free GPU and Roboflow for dataset management.
The custom Roboflow dataset includes 9 beverage container classes:
| Class ID | Label | Description |
|---|---|---|
| 0 | bottle-glass | Glass bottles |
| 1 | bottle-plastic | Plastic bottles |
| 2 | cup-disposable | Disposable cups |
| 3 | cup-handle | Cups with handles |
| 4 | glass-mug | Glass mugs |
| 5 | glass-normal | Regular glasses |
| 6 | glass-wine | Wine glasses |
| 7 | gym bottle | Sports/gym bottles |
| 8 | tin can | Tin cans |
Create a new Colab notebook and run the following cells:
Cell 1: Install Dependencies
!pip install ultralytics roboflowCell 2: Download Dataset from Roboflow
from roboflow import Roboflow
rf = Roboflow(api_key="YOUR_ROBOFLOW_API_KEY")
project = rf.workspace("").project("")
dataset = project.version(1).download("yolov8")Get your API key from: Roboflow Dashboard → Settings → API Key
Cell 3: Train the Model
from ultralytics import YOLO
# Load pretrained YOLOv8n
model = YOLO('yolov8n.pt')
# Train on custom dataset
results = model.train(
data='/content/beverage-containers-1/data.yaml',
epochs=100,
imgsz=640,
batch=16,
name='beverage-detector'
)Cell 4: Validate the Model
# Run validation
metrics = model.val()
print(f"mAP50: {metrics.box.map50:.3f}")
print(f"mAP50-95: {metrics.box.map:.3f}")Cell 5: Download Trained Model
from google.colab import files
# Download best weights
files.download('/content/runs/detect/beverage-detector/weights/best.pt')- Download
best.ptfrom Colab - Place it in the
models/directory:mv ~/Downloads/best.pt models/beverage-model.pt - Update
docker-compose.yml:environment: - MODEL_PATH=/app/models/beverage-model.pt
- Update class filtering in
backend/detector.pyto use class IDs 0-8 instead of COCO IDs
If you prefer not to use the Roboflow API:
-
Zip your dataset locally:
zip -r dataset.zip train/ valid/ test/ data.yaml
-
Upload and unzip in Colab:
from google.colab import files # Upload the zip file uploaded = files.upload() # Unzip !unzip dataset.zip
-
Train with local data.yaml:
model.train(data='/content/data.yaml', epochs=100)
| Variable | Default | Description |
|---|---|---|
MODEL_PATH |
../models/yolov8n.pt |
Path to YOLOv8 model weights |
VITE_API_URL |
(auto-detect) | Backend API URL for frontend |
VITE_WS_URL |
(auto-detect) | WebSocket URL for frontend |
| Parameter | Default | Range | Description |
|---|---|---|---|
confidence |
0.45 | 0.1-0.9 | Minimum confidence threshold |
window_size |
30 | 10-100 | Rolling window for benchmark stats |
yolo-demo/
├── backend/
│ ├── main.py # FastAPI application entry point
│ ├── detector.py # YOLOv8 wrapper class
│ ├── benchmark.py # Latency tracking utilities
│ ├── requirements.txt # Python dependencies
│ ├── Dockerfile # Backend container definition
│ └── .dockerignore # Docker build exclusions
│
├── frontend/
│ ├── src/
│ │ ├── App.jsx # Main React component
│ │ ├── App.css # Application styles
│ │ ├── components/
│ │ │ ├── Webcam.jsx # Webcam capture component
│ │ │ ├── Canvas.jsx # Bounding box overlay
│ │ │ └── VideoUpload.jsx # Video upload UI
│ │ └── hooks/
│ │ ├── useDetection.js # WebSocket hook
│ │ └── useUpload.js # Upload/SSE hook
│ ├── package.json # Node.js dependencies
│ ├── vite.config.js # Vite configuration
│ ├── nginx.conf # Production nginx config
│ ├── Dockerfile # Frontend container definition
│ └── .dockerignore # Docker build exclusions
│
├── models/
│ └── yolov8n.pt # Pre-trained YOLOv8 nano weights
│
├── docker-compose.yml # Multi-service orchestration
└── README.md # This file
WebSocket Connection Failed
- Ensure backend is running on port 8000
- Check firewall settings
- Verify CORS configuration
Model Not Found
- Confirm
yolov8n.ptexists inmodels/directory - Check
MODEL_PATHenvironment variable
Slow Inference
- Consider using GPU acceleration
- Reduce input resolution
- Use a smaller model variant (yolov8n)
Docker Build Fails
- Ensure Docker Desktop is running
- Check disk space (needs ~5GB)
- Try
docker compose build --no-cache
- Fork the repository
- Create a feature branch (
git checkout -b feature/amazing-feature) - Commit changes (
git commit -m 'Add amazing feature') - Push to branch (
git push origin feature/amazing-feature) - Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Ultralytics YOLOv8 - State-of-the-art object detection
- FastAPI - Modern Python web framework
- React - UI component library
- COCO Dataset - Training data for pre-trained models
- Roboflow - Dataset management and annotation
- Google Colab - Free GPU for model training