10 changes: 10 additions & 0 deletions benchmarks/.gitignore
@@ -0,0 +1,10 @@
# Benchmark results
results/

# Python cache
__pycache__/
*.pyc

# Profiling outputs
*.svg
*.bin
332 changes: 332 additions & 0 deletions benchmarks/README.md
@@ -0,0 +1,332 @@
# EventKit Performance Benchmarks

This directory contains load testing scenarios for validating EventKit's performance characteristics.

## Quick Start

### 1. Start EventKit

Choose your queue mode:

**AsyncQueue Mode (Single-Server, In-Process)**:
```bash
# Terminal 1: Start EventKit API with AsyncQueue
GCP_PROJECT_ID=eventkit-benchmark \
GCP_GCS_BUCKET=eventkit-events \
STORAGE_EMULATOR_HOST=http://localhost:4443 \
EVENTKIT_RING_BUFFER_ENABLED=true \
EVENTKIT_QUEUE_MODE=async \
uv run uvicorn eventkit.api.app:app --host=0.0.0.0 --port=8000
```

**PubSub Mode (Distributed, External Queue)**:
```bash
# Terminal 1: Start Pub/Sub + GCS emulators
docker compose up -d pubsub-emulator gcs-emulator

# Terminal 2: Start EventKit API with PubSub
GCP_PROJECT_ID=eventkit-benchmark \
GCP_GCS_BUCKET=eventkit-events \
PUBSUB_EMULATOR_HOST=localhost:8085 \
STORAGE_EMULATOR_HOST=http://localhost:9023 \
EVENTKIT_RING_BUFFER_ENABLED=true \
EVENTKIT_QUEUE_MODE=pubsub \
EVENTKIT_PUBSUB_TOPIC=eventkit-events \
uv run uvicorn eventkit.api.app:app --host=0.0.0.0 --port=8000

# Terminal 3: Start EventSubscriptionCoordinator workers
# (Implementation specific - see streaming/coordinator.py for examples)
```

### 2. Run All Benchmarks

```bash
# Test AsyncQueue mode
./benchmarks/run_benchmarks.sh async

# Or test PubSub mode
./benchmarks/run_benchmarks.sh pubsub
```

Results will be saved to `benchmarks/results/{queue_mode}/`.

## Manual Testing

### Run Locust UI

For interactive testing with charts:

```bash
locust -f benchmarks/locustfile.py --host=http://localhost:8000
```

Then open http://localhost:8089 in your browser.

### Run Specific Scenario

```bash
# Baseline throughput test
locust -f benchmarks/locustfile.py --host=http://localhost:8000 \
--users=100 --spawn-rate=10 --run-time=60s --headless \
BaselineUser

# Realistic workload
locust -f benchmarks/locustfile.py --host=http://localhost:8000 \
--users=50 --spawn-rate=10 --run-time=60s --headless \
RealisticUser

# Payload size comparison
locust -f benchmarks/locustfile.py --host=http://localhost:8000 \
--users=30 --spawn-rate=5 --run-time=60s --headless \
PayloadSizeUser
```

## Benchmark Scenarios

### 1. BaselineUser
**Goal**: Find maximum sustained throughput

- Sends tiny identify events (~50 bytes)
- High request rate (wait 1-5ms)
- Measures raw processing capability
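As a rough illustration, a user class of this kind can be expressed in Locust as sketched below. This is a minimal sketch only: the class name, the `/v1/events` endpoint, and the payload shape are assumptions for illustration; the maintained scenario lives in `benchmarks/locustfile.py`.

```python
from locust import HttpUser, between, task


class BaselineUserSketch(HttpUser):
    """Hypothetical baseline-throughput user; see benchmarks/locustfile.py for the real class."""

    # 1-5 ms wait keeps the request rate as high as Locust allows
    wait_time = between(0.001, 0.005)

    @task
    def send_identify(self) -> None:
        # Tiny identify payload (~50 bytes); the endpoint path is an assumed example
        self.client.post("/v1/events", json={"type": "identify", "userId": "user-123"})
```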

### 2. PayloadSizeUser
**Goal**: Measure impact of payload size

- 40% tiny (~50 bytes)
- 30% small (~500 bytes)
- 20% medium (~5KB)
- 10% large (~50KB)

### 3. RealisticUser
**Goal**: Simulate real CDP traffic

- 60% Track events
- 30% Identify events
- 10% Page events
- Varied properties (3-30 per event)
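For reference, a payload of the kind RealisticUser generates might look like the sketch below. The field names follow common CDP conventions (`type`, `event`, `userId`, `properties`) and are assumptions; EventKit's Pydantic models define the actual schema.

```python
import random
import uuid
from datetime import datetime, timezone


def make_track_event(n_properties: int) -> dict:
    """Build a hypothetical track event; field names are assumed, not EventKit's canonical schema."""
    return {
        "type": "track",
        "event": "Product Viewed",
        "userId": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        # Property count varies per event, mirroring the 3-30 range above
        "properties": {f"prop_{i}": random.random() for i in range(n_properties)},
    }


payload = make_track_event(random.randint(3, 30))
```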

### 4. BurstTrafficUser
**Goal**: Test spike handling

- Step load: 10 → 50 → 100 users
- Fast event generation
- Tests ring buffer and queue behavior
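In Locust, a step load like this is typically expressed with a `LoadTestShape`, which applies to the whole test run. The schedule below is only a sketch with assumed step durations; the actual shape used by BurstTrafficUser may differ.

```python
from locust import LoadTestShape


class StepLoadSketch(LoadTestShape):
    """Hypothetical 10 -> 50 -> 100 step schedule; durations are assumed for illustration."""

    steps = [(60, 10), (120, 50), (180, 100)]  # (end of step in seconds, target users)

    def tick(self):
        run_time = self.get_run_time()
        for end_time, users in self.steps:
            if run_time < end_time:
                return users, 10  # (user count, spawn rate)
        return None  # returning None stops the test after the last step
```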

### 5. ErrorRateUser
**Goal**: Measure error handling overhead

- Configurable error rate (default: 10%)
- Invalid events (missing 'type' field)
- Tests error store performance

## Results

After running benchmarks, you'll find:

```
benchmarks/results/
├── baseline_stats.csv          # Request/response stats
├── baseline_stats_history.csv  # Time-series data
├── baseline_failures.csv       # Error details
├── baseline.html               # Visual report
├── baseline.log                # Console output
└── ... (same for each scenario)
```

## Analyzing Results

### View HTML Reports

Open any `.html` file in your browser for interactive charts and graphs.

### CSV Analysis

```bash
# Install pandas if needed
uv pip install pandas matplotlib

# Analyze results
python benchmarks/analyze_results.py benchmarks/results/
```
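If you prefer an ad-hoc look instead of the bundled script, a quick pandas sketch like the one below works on Locust's CSV export. The column names (`Name`, `Request Count`, `Requests/s`, percentile columns) follow Locust's current export format and may vary by Locust version.

```python
import pandas as pd

# Ad-hoc sketch; analyze_results.py is the maintained analysis script.
stats = pd.read_csv("benchmarks/results/async/baseline_stats.csv")

# The "Aggregated" row summarizes all endpoints for the run
agg = stats[stats["Name"] == "Aggregated"]
print(agg[["Request Count", "Failure Count", "Requests/s", "95%", "99%"]].to_string(index=False))
```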

### Key Metrics to Look For

**Throughput**:
- Requests/second (RPS)
- Total requests completed
- Failure rate

**Latency**:
- Average response time
- p50, p95, p99 percentiles
- Max latency

**Resource Usage** (monitor separately):
- CPU utilization
- Memory usage
- Disk I/O
- GCS write rate

## Profiling

### CPU Profiling with py-spy

```bash
# Terminal 1: Start EventKit
uv run py-spy record --native -o profile.svg -- \
uvicorn eventkit.api.app:app --host=0.0.0.0 --port=8000

# Terminal 2: Run load test
locust -f benchmarks/locustfile.py --host=http://localhost:8000 \
--users=50 --spawn-rate=10 --run-time=60s --headless \
RealisticUser

# View flame graph
open profile.svg
```

### Memory Profiling with memray

```bash
# Install memray
uv pip install memray

# Profile memory usage
uv run memray run -o memory.bin \
  -m uvicorn eventkit.api.app:app --host=0.0.0.0 --port=8000

# Generate flame graph
memray flamegraph memory.bin
```

## Comparing Queue Modes

EventKit supports two queue modes with different trade-offs:

### AsyncQueue (Single-Server)
**Best for**: Development, single-instance deployments

**Characteristics**:
- In-process Python `asyncio.Queue`
- Low latency (microseconds to enqueue)
- No external dependencies
- Simpler architecture
- Limited to single server's resources

**Benchmark this to measure**:
- Maximum throughput for single instance
- Memory pressure under load
- Ring buffer → queue → loader latency
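Conceptually, this mode boils down to the standard asyncio producer/consumer pattern sketched below; EventKit's actual worker wiring and batching logic live in the application code.

```python
import asyncio


async def main() -> None:
    # Sketch of the in-process pattern AsyncQueue mode relies on; not EventKit's actual code.
    queue: asyncio.Queue = asyncio.Queue(maxsize=10_000)

    async def worker(name: str) -> None:
        while True:
            event = await queue.get()
            # ... batch events and flush to storage here ...
            queue.task_done()

    workers = [asyncio.create_task(worker(f"worker-{i}")) for i in range(4)]

    for i in range(1_000):
        await queue.put({"type": "identify", "userId": str(i)})  # enqueue is sub-millisecond

    await queue.join()  # wait until every queued event has been processed
    for w in workers:
        w.cancel()


asyncio.run(main())
```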

### PubSub (Distributed)
**Best for**: Production, horizontal scaling

**Characteristics**:
- External queue (Google Cloud Pub/Sub)
- Higher latency (milliseconds to publish)
- Distributed processing
- Horizontal scalability
- More complex architecture

**Benchmark this to measure**:
- Network overhead (API β†’ Pub/Sub)
- Multi-worker scalability
- Fault tolerance (nack/redelivery)
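The publish path being exercised here looks roughly like the sketch below, which assumes the Pub/Sub emulator from `docker compose`, the `google-cloud-pubsub` client library, and a topic that already exists in the emulator.

```python
import json
import os

from google.cloud import pubsub_v1

# Point the client at the emulator (matches the docker compose setup above)
os.environ.setdefault("PUBSUB_EMULATOR_HOST", "localhost:8085")

publisher = pubsub_v1.PublisherClient()
# Assumes the eventkit-events topic has already been created in the emulator
topic_path = publisher.topic_path("eventkit-benchmark", "eventkit-events")

data = json.dumps({"type": "identify", "userId": "user-123"}).encode("utf-8")
future = publisher.publish(topic_path, data=data)  # each publish adds a network round trip
print(future.result())  # blocks until the emulator returns a message ID
```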

### Running Comparisons

```bash
# Benchmark AsyncQueue (single emulator needed)
docker compose up -d gcs-emulator
./benchmarks/run_benchmarks.sh async

# Benchmark PubSub (both emulators needed)
docker compose up -d pubsub-emulator gcs-emulator
./benchmarks/run_benchmarks.sh pubsub

# Compare results
diff benchmarks/results/async/baseline.log \
benchmarks/results/pubsub/baseline.log

# Stop emulators when done
docker compose down
```

**What to expect**:
- **AsyncQueue**: Higher throughput, lower latency (no network hops)
- **PubSub**: Lower throughput (network + serialization overhead), but horizontally scalable

## Configuration Tuning

Test different configurations by setting environment variables:

```bash
# Increase batch size
export EVENTKIT_EVENT_LOADER_BATCH_SIZE=1000
export EVENTKIT_ASYNC_QUEUE_WORKERS=8

# Disable ring buffer to test direct queue
export EVENTKIT_RING_BUFFER_ENABLED=false

# Adjust flush interval
export EVENTKIT_EVENT_LOADER_FLUSH_INTERVAL=60

# Run benchmarks
./benchmarks/run_benchmarks.sh async
```
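How these variables map onto configuration is specific to EventKit's settings module. As a rough illustration only, an env-prefixed settings object (sketched here with `pydantic-settings`, with assumed field names and defaults) would pick them up like this:

```python
from pydantic_settings import BaseSettings, SettingsConfigDict


class BenchmarkSettings(BaseSettings):
    """Illustrative only; EventKit's real settings module may use different names and defaults."""

    model_config = SettingsConfigDict(env_prefix="EVENTKIT_")

    event_loader_batch_size: int = 500
    async_queue_workers: int = 4
    ring_buffer_enabled: bool = True
    event_loader_flush_interval: int = 30


print(BenchmarkSettings())  # reflects any EVENTKIT_* variables exported above
```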

## Expected Results

### Baseline Throughput
- **Target**: 10,000+ events/sec
- **Realistic**: 30,000-50,000 events/sec (local)
- **Bottleneck**: Pydantic validation, Parquet serialization

### Realistic Workload
- **Target**: 5,000+ events/sec
- **Realistic**: 10,000-20,000 events/sec (local)
- **Latency**: p99 < 50ms

### Resource Usage
- **CPU**: 50-80% (4 cores) at 10k events/sec
- **Memory**: 200-500 MB (steady state)
- **Peak Memory**: < 2 GB during spikes

## Troubleshooting

### Locust fails to connect

```bash
# Check EventKit is running
curl http://localhost:8000/health

# Check metrics endpoint
curl http://localhost:9090/metrics
```

### High failure rate

- Check EventKit logs for errors
- Verify event payloads are valid
- Reduce load (lower users/spawn-rate)

### Inconsistent results

- Run benchmarks 3x and average results
- Close other applications
- Ensure CPU isn't throttled
- Check disk space (ring buffer)

## Next Steps

1. **Run baselines**: Establish current performance
2. **Identify bottlenecks**: Use profiling to find hot paths
3. **Optimize**: Focus on highest-impact improvements
4. **Validate**: Re-run benchmarks to measure improvement
5. **Document**: Update README with actual numbers

## Related

- [EventKit Architecture](../ARCHITECTURE.md)
- [EventKit README](../README.md)
1 change: 1 addition & 0 deletions benchmarks/__init__.py
@@ -0,0 +1 @@
"""EventKit performance benchmarks."""