10 changes: 10 additions & 0 deletions benchmarks/.gitignore
@@ -0,0 +1,10 @@
# Benchmark results
results/

# Python cache
__pycache__/
*.pyc

# Profiling outputs
*.svg
*.bin
332 changes: 332 additions & 0 deletions benchmarks/README.md
@@ -0,0 +1,332 @@
# EventKit Performance Benchmarks

This directory contains load testing scenarios for validating EventKit's performance characteristics.

## Quick Start

### 1. Start EventKit

Choose your queue mode:

**AsyncQueue Mode (Single-Server, In-Process)**:
```bash
# Terminal 1: Start EventKit API with AsyncQueue
GCP_PROJECT_ID=eventkit-benchmark \
GCP_GCS_BUCKET=eventkit-events \
STORAGE_EMULATOR_HOST=http://localhost:4443 \
EVENTKIT_RING_BUFFER_ENABLED=true \
EVENTKIT_QUEUE_MODE=async \
uv run uvicorn eventkit.api.app:app --host=0.0.0.0 --port=8000
```

**PubSub Mode (Distributed, External Queue)**:
```bash
# Terminal 1: Start Pub/Sub + GCS emulators
docker compose up -d pubsub-emulator gcs-emulator

# Terminal 2: Start EventKit API with PubSub
GCP_PROJECT_ID=eventkit-benchmark \
GCP_GCS_BUCKET=eventkit-events \
PUBSUB_EMULATOR_HOST=localhost:8085 \
STORAGE_EMULATOR_HOST=http://localhost:9023 \
EVENTKIT_RING_BUFFER_ENABLED=true \
EVENTKIT_QUEUE_MODE=pubsub \
EVENTKIT_PUBSUB_TOPIC=eventkit-events \
uv run uvicorn eventkit.api.app:app --host=0.0.0.0 --port=8000

# Terminal 3: Start EventSubscriptionCoordinator workers
# (Implementation specific - see streaming/coordinator.py for examples)
```

### 2. Run All Benchmarks

```bash
# Test AsyncQueue mode
./benchmarks/run_benchmarks.sh async

# Or test PubSub mode
./benchmarks/run_benchmarks.sh pubsub
```

Results will be saved to `benchmarks/results/{queue_mode}/`.

## Manual Testing

### Run Locust UI

For interactive testing with charts:

```bash
locust -f benchmarks/locustfile.py --host=http://localhost:8000
```

Then open http://localhost:8089 in your browser.

### Run Specific Scenario

```bash
# Baseline throughput test
locust -f benchmarks/locustfile.py --host=http://localhost:8000 \
--users=100 --spawn-rate=10 --run-time=60s --headless \
BaselineUser

# Realistic workload
locust -f benchmarks/locustfile.py --host=http://localhost:8000 \
--users=50 --spawn-rate=10 --run-time=60s --headless \
RealisticUser

# Payload size comparison
locust -f benchmarks/locustfile.py --host=http://localhost:8000 \
--users=30 --spawn-rate=5 --run-time=60s --headless \
PayloadSizeUser
```

## Benchmark Scenarios

### 1. BaselineUser
**Goal**: Find maximum sustained throughput

- Sends tiny identify events (~50 bytes)
- High request rate (wait 1-5ms)
- Measures raw processing capability
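As a rough illustration, a user class of this kind can be expressed in Locust as sketched below. This is a minimal sketch only: the class name, the `/v1/events` endpoint, and the payload shape are assumptions for illustration; the maintained scenario lives in `benchmarks/locustfile.py`.

```python
from locust import HttpUser, between, task


class BaselineUserSketch(HttpUser):
    """Hypothetical baseline-throughput user; see benchmarks/locustfile.py for the real class."""

    # 1-5 ms wait keeps the request rate as high as Locust allows
    wait_time = between(0.001, 0.005)

    @task
    def send_identify(self) -> None:
        # Tiny identify payload (~50 bytes); the endpoint path is an assumed example
        self.client.post("/v1/events", json={"type": "identify", "userId": "user-123"})
```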

### 2. PayloadSizeUser
**Goal**: Measure impact of payload size

- 40% tiny (~50 bytes)
- 30% small (~500 bytes)
- 20% medium (~5KB)
- 10% large (~50KB)

### 3. RealisticUser
**Goal**: Simulate real CDP traffic

- 60% Track events
- 30% Identify events
- 10% Page events
- Varied properties (3-30 per event)
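For reference, a payload of the kind RealisticUser generates might look like the sketch below. The field names follow common CDP conventions (`type`, `event`, `userId`, `properties`) and are assumptions; EventKit's Pydantic models define the actual schema.

```python
import random
import uuid
from datetime import datetime, timezone


def make_track_event(n_properties: int) -> dict:
    """Build a hypothetical track event; field names are assumed, not EventKit's canonical schema."""
    return {
        "type": "track",
        "event": "Product Viewed",
        "userId": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        # Property count varies per event, mirroring the 3-30 range above
        "properties": {f"prop_{i}": random.random() for i in range(n_properties)},
    }


payload = make_track_event(random.randint(3, 30))
```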

### 4. BurstTrafficUser
**Goal**: Test spike handling

- Step load: 10 → 50 → 100 users
- Fast event generation
- Tests ring buffer and queue behavior
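In Locust, a step load like this is typically expressed with a `LoadTestShape`, which applies to the whole test run. The schedule below is only a sketch with assumed step durations; the actual shape used by BurstTrafficUser may differ.

```python
from locust import LoadTestShape


class StepLoadSketch(LoadTestShape):
    """Hypothetical 10 -> 50 -> 100 step schedule; durations are assumed for illustration."""

    steps = [(60, 10), (120, 50), (180, 100)]  # (end of step in seconds, target users)

    def tick(self):
        run_time = self.get_run_time()
        for end_time, users in self.steps:
            if run_time < end_time:
                return users, 10  # (user count, spawn rate)
        return None  # returning None stops the test after the last step
```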

### 5. ErrorRateUser
**Goal**: Measure error handling overhead

- Configurable error rate (default: 10%)
- Invalid events (missing 'type' field)
- Tests error store performance

## Results

After running benchmarks, you'll find:

```
benchmarks/results/
├── baseline_stats.csv          # Request/response stats
├── baseline_stats_history.csv  # Time-series data
├── baseline_failures.csv       # Error details
├── baseline.html               # Visual report
├── baseline.log                # Console output
└── ... (same for each scenario)
```

## Analyzing Results

### View HTML Reports

Open any `.html` file in your browser for interactive charts and graphs.

### CSV Analysis

```bash
# Install pandas if needed
uv pip install pandas matplotlib

# Analyze results
python benchmarks/analyze_results.py benchmarks/results/
```
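If you prefer an ad-hoc look instead of the bundled script, a quick pandas sketch like the one below works on Locust's CSV export. The column names (`Name`, `Request Count`, `Requests/s`, percentile columns) follow Locust's current export format and may vary by Locust version.

```python
import pandas as pd

# Ad-hoc sketch; analyze_results.py is the maintained analysis script.
stats = pd.read_csv("benchmarks/results/async/baseline_stats.csv")

# The "Aggregated" row summarizes all endpoints for the run
agg = stats[stats["Name"] == "Aggregated"]
print(agg[["Request Count", "Failure Count", "Requests/s", "95%", "99%"]].to_string(index=False))
```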

### Key Metrics to Look For

**Throughput**:
- Requests/second (RPS)
- Total requests completed
- Failure rate

**Latency**:
- Average response time
- p50, p95, p99 percentiles
- Max latency

**Resource Usage** (monitor separately):
- CPU utilization
- Memory usage
- Disk I/O
- GCS write rate

## Profiling

### CPU Profiling with py-spy

```bash
# Terminal 1: Start EventKit
uv run py-spy record --native -o profile.svg -- \
uvicorn eventkit.api.app:app --host=0.0.0.0 --port=8000

# Terminal 2: Run load test
locust -f benchmarks/locustfile.py --host=http://localhost:8000 \
--users=50 --spawn-rate=10 --run-time=60s --headless \
RealisticUser

# View flame graph
open profile.svg
```

### Memory Profiling with memray

```bash
# Install memray
uv pip install memray

# Profile memory usage
uv run memray run -o memory.bin \
  -m uvicorn eventkit.api.app:app --host=0.0.0.0 --port=8000

# Generate flame graph
memray flamegraph memory.bin
```

## Comparing Queue Modes

EventKit supports two queue modes with different trade-offs:

### AsyncQueue (Single-Server)
**Best for**: Development, single-instance deployments

**Characteristics**:
- In-process Python `asyncio.Queue`
- Low latency (microseconds to enqueue)
- No external dependencies
- Simpler architecture
- Limited to single server's resources

**Benchmark this to measure**:
- Maximum throughput for single instance
- Memory pressure under load
- Ring buffer → queue → loader latency
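Conceptually, this mode boils down to the standard asyncio producer/consumer pattern sketched below; EventKit's actual worker wiring and batching logic live in the application code.

```python
import asyncio


async def main() -> None:
    # Sketch of the in-process pattern AsyncQueue mode relies on; not EventKit's actual code.
    queue: asyncio.Queue = asyncio.Queue(maxsize=10_000)

    async def worker(name: str) -> None:
        while True:
            event = await queue.get()
            # ... batch events and flush to storage here ...
            queue.task_done()

    workers = [asyncio.create_task(worker(f"worker-{i}")) for i in range(4)]

    for i in range(1_000):
        await queue.put({"type": "identify", "userId": str(i)})  # enqueue is sub-millisecond

    await queue.join()  # wait until every queued event has been processed
    for w in workers:
        w.cancel()


asyncio.run(main())
```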

### PubSub (Distributed)
**Best for**: Production, horizontal scaling

**Characteristics**:
- External queue (Google Cloud Pub/Sub)
- Higher latency (milliseconds to publish)
- Distributed processing
- Horizontal scalability
- More complex architecture

**Benchmark this to measure**:
- Network overhead (API β†’ Pub/Sub)
- Multi-worker scalability
- Fault tolerance (nack/redelivery)
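The publish path being exercised here looks roughly like the sketch below, which assumes the Pub/Sub emulator from `docker compose`, the `google-cloud-pubsub` client library, and a topic that already exists in the emulator.

```python
import json
import os

from google.cloud import pubsub_v1

# Point the client at the emulator (matches the docker compose setup above)
os.environ.setdefault("PUBSUB_EMULATOR_HOST", "localhost:8085")

publisher = pubsub_v1.PublisherClient()
# Assumes the eventkit-events topic has already been created in the emulator
topic_path = publisher.topic_path("eventkit-benchmark", "eventkit-events")

data = json.dumps({"type": "identify", "userId": "user-123"}).encode("utf-8")
future = publisher.publish(topic_path, data=data)  # each publish adds a network round trip
print(future.result())  # blocks until the emulator returns a message ID
```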

### Running Comparisons

```bash
# Benchmark AsyncQueue (single emulator needed)
docker compose up -d gcs-emulator
./benchmarks/run_benchmarks.sh async

# Benchmark PubSub (both emulators needed)
docker compose up -d pubsub-emulator gcs-emulator
./benchmarks/run_benchmarks.sh pubsub

# Compare results
diff benchmarks/results/async/baseline.log \
benchmarks/results/pubsub/baseline.log

# Stop emulators when done
docker compose down
```

**What to expect**:
- **AsyncQueue**: Higher throughput, lower latency (no network hops)
- **PubSub**: Lower throughput (network + serialization overhead), but horizontally scalable

## Configuration Tuning

Test different configurations by setting environment variables:

```bash
# Increase batch size
export EVENTKIT_EVENT_LOADER_BATCH_SIZE=1000
export EVENTKIT_ASYNC_QUEUE_WORKERS=8

# Disable ring buffer to test direct queue
export EVENTKIT_RING_BUFFER_ENABLED=false

# Adjust flush interval
export EVENTKIT_EVENT_LOADER_FLUSH_INTERVAL=60

# Run benchmarks
./benchmarks/run_benchmarks.sh async
```
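How these variables map onto configuration is specific to EventKit's settings module. As a rough illustration only, an env-prefixed settings object (sketched here with `pydantic-settings`, with assumed field names and defaults) would pick them up like this:

```python
from pydantic_settings import BaseSettings, SettingsConfigDict


class BenchmarkSettings(BaseSettings):
    """Illustrative only; EventKit's real settings module may use different names and defaults."""

    model_config = SettingsConfigDict(env_prefix="EVENTKIT_")

    event_loader_batch_size: int = 500
    async_queue_workers: int = 4
    ring_buffer_enabled: bool = True
    event_loader_flush_interval: int = 30


print(BenchmarkSettings())  # reflects any EVENTKIT_* variables exported above
```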

## Expected Results

### Baseline Throughput
- **Target**: 10,000+ events/sec
- **Realistic**: 30,000-50,000 events/sec (local)
- **Bottleneck**: Pydantic validation, Parquet serialization

### Realistic Workload
- **Target**: 5,000+ events/sec
- **Realistic**: 10,000-20,000 events/sec (local)
- **Latency**: p99 < 50ms

### Resource Usage
- **CPU**: 50-80% (4 cores) at 10k events/sec
- **Memory**: 200-500 MB (steady state)
- **Peak Memory**: < 2 GB during spikes

## Troubleshooting

### Locust fails to connect

```bash
# Check EventKit is running
curl http://localhost:8000/health

# Check metrics endpoint
curl http://localhost:9090/metrics
```

### High failure rate

- Check EventKit logs for errors
- Verify event payloads are valid
- Reduce load (lower users/spawn-rate)

### Inconsistent results

- Run benchmarks 3x and average results
- Close other applications
- Ensure CPU isn't throttled
- Check disk space (ring buffer)

## Next Steps

1. **Run baselines**: Establish current performance
2. **Identify bottlenecks**: Use profiling to find hot paths
3. **Optimize**: Focus on highest-impact improvements
4. **Validate**: Re-run benchmarks to measure improvement
5. **Document**: Update README with actual numbers

## Related

- [EventKit Architecture](../ARCHITECTURE.md)
- [EventKit README](../README.md)
1 change: 1 addition & 0 deletions benchmarks/__init__.py
@@ -0,0 +1 @@
"""EventKit performance benchmarks."""