The TTB Label Verifier is designed with a fail-open architecture that prioritizes availability and rapid recovery over starting at full capability. The system can operate in a degraded mode with reduced capabilities rather than failing completely.
Traditional approach (fail-closed):
- Application won't start until all dependencies are available
- Long initialization times (15-20 minutes)
- Single point of failure (Ollama model download)
- Poor user experience during deployment
Our approach (fail-open):
- Application starts immediately with available backends
- Fast RTO: 2-3 minutes to operational
- Graceful degradation when Ollama unavailable
- Progressive enhancement as capabilities come online
Degraded Mode
When: Ollama model not yet downloaded or unavailable
Capabilities:
- ✅ All validation endpoints functional
- ✅ Batch processing works
- ❌ Ollama OCR unavailable (requests that specify it receive 503 with Retry-After)
- ❌ Higher-accuracy AI analysis not available
Health Status:
```json
{
  "status": "degraded",
  "backends": {
    "ollama": {"available": false, "error": "Model 'llama3.2-vision' not found"}
  },
  "capabilities": {
    "ocr_backends": [],
    "degraded_mode": true
  }
}
```
Full Capability Mode
When: All backends available
Capabilities:
- ✅ Ollama OCR (high accuracy)
- ✅ Users can choose backend per request
- ✅ All features operational
Health Status:
```json
{
  "status": "healthy",
  "backends": {
    "ollama": {"available": true, "error": null}
  },
  "capabilities": {
    "ocr_backends": ["ollama"],
    "degraded_mode": false
  }
}
```
Components:
- FastAPI web framework
- Health endpoint (GET /health) - Returns system status
- Verify endpoint (POST /verify) - Single label verification
- Batch endpoint (POST /verify/batch) - Multi-label verification
- Error handling - Returns 503 with Retry-After when Ollama requested but unavailable
- Ollama Backend - Lazy initialization, high accuracy (~10s), requires model download
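The 503-with-Retry-After behavior can be sketched as a small routing rule. This is a hedged illustration, not the project's actual code: the helper name and the 60-second retry hint are assumptions.

```python
def route_verify_request(backend: str, ollama_available: bool) -> tuple[int, dict]:
    """Return (HTTP status, response headers) for a /verify request.

    Hypothetical sketch of the error-handling contract: asking for the
    Ollama backend while it is still unavailable yields 503 + Retry-After.
    """
    if backend == "ollama" and not ollama_available:
        # Fail-open: tell the client when to retry instead of blocking startup.
        return 503, {"Retry-After": "60"}  # "60" is an illustrative value
    return 200, {}
```

A client that honors Retry-After can simply re-submit once the background model download finishes, without any server-side queuing.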
- Brand name validation (fuzzy matching, 90% threshold)
- ABV validation (product-specific tolerances)
- Net contents validation (volume extraction)
- Government warning validation (exact format matching)
- Bottler information extraction
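The brand-name rule above can be sketched with the standard library. Using difflib's similarity ratio as the fuzzy metric is an assumption; the validator's actual matching algorithm is not shown in this document.

```python
from difflib import SequenceMatcher

BRAND_MATCH_THRESHOLD = 0.90  # the 90% threshold from the validation rules above

def brand_matches(expected: str, extracted: str) -> bool:
    """Fuzzy-compare the expected brand name against OCR-extracted text."""
    ratio = SequenceMatcher(None, expected.lower(), extracted.lower()).ratio()
    return ratio >= BRAND_MATCH_THRESHOLD

# e.g. brand_matches("Old Tom Distillery", "0ld Tom Distillery") tolerates the
# 'O' -> '0' OCR slip (similarity ~0.94) while rejecting unrelated text.
```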
Timeline:
T+0:00 Instance launched
T+0:30 Docker installed
T+1:00 Ollama container started
T+2:00 Verifier app deployed (DEGRADED MODE) ✅ Traffic served
T+2:01 Background model download begins
T+10:00 Model download completes
T+10:30 Ollama backend available (FULL CAPABILITY) ✅
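The timeline above can be observed from outside with a simple readiness probe: keep polling GET /health (a 200 with "degraded" still means traffic is being served) until the payload reports full capability. A minimal sketch, with placeholder timing values and base URL:

```python
import json
import time
import urllib.request

def is_fully_capable(health: dict) -> bool:
    """True once /health reports the healthy, non-degraded state."""
    return (
        health.get("status") == "healthy"
        and not health.get("capabilities", {}).get("degraded_mode", True)
    )

def wait_for_full_capability(base_url: str, timeout_s: int = 900, poll_s: int = 30) -> bool:
    """Poll /health until the Ollama backend comes online or we time out."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
                if is_fully_capable(json.load(resp)):
                    return True  # roughly T+10:30 in the timeline above
        except OSError:
            pass  # app still starting; degraded mode will begin answering shortly
        time.sleep(poll_s)
    return False
```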
Key Features:
- Non-blocking deployment - App deploys immediately, doesn't wait for model
- Background download - 6.7 GB model downloads in parallel with traffic serving
- Self-healing - If model not in S3, downloads from Ollama and exports to S3
- Space-aware - Uses /home instead of /tmp for large file downloads
- Application Load Balancer - HTTPS termination, health checks
- EC2 t3.medium - Docker host (2 vCPU, 4 GB RAM, 50 GB disk)
- Docker Compose - Container orchestration
- S3 - Model artifact storage for fast recovery
- SSM - Remote access without SSH keys
- ACM - TLS certificate management
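The two-container stack above could be wired together with a Compose file along these lines. This is a sketch only: the image names, port, and volume paths are illustrative assumptions, not the project's actual configuration.

```yaml
# Hypothetical docker-compose sketch; names and paths are placeholders.
services:
  ollama:
    image: ollama/ollama
    volumes:
      - /home/ollama:/root/.ollama   # /home has headroom for the 6.7 GB model
    restart: unless-stopped
  verifier:
    image: ttb-label-verifier:latest # placeholder image name
    ports:
      - "8000:8000"                  # reachable only from the ALB security group
    environment:
      - OLLAMA_HOST=http://ollama:11434
    depends_on:
      - ollama
    restart: unless-stopped
```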
| Scenario | Old Design | New Design | Improvement |
|---|---|---|---|
| Cold start (no S3 model) | 15-20 min | 2-3 min (degraded) | 87% faster |
| Warm start (S3 model exists) | 8-10 min | 2-3 min (degraded) | 75% faster |
| Full capability | 15-20 min | 10-12 min (full) | 40% faster |
| Backend | Speed | Accuracy | Use Case |
|---|---|---|---|
| Ollama | ~10s | Excellent | High-accuracy requirements, edge cases |
Endpoint: GET /health
Purpose: Single source of truth for system status
Used by: ALB health checks, monitoring systems, operators
Response Fields:
- status: Overall system health (healthy, degraded)
- backends: Per-backend availability and errors
- capabilities: Available OCR backends and mode
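Those fields could be assembled as below. This assumes a hypothetical build_health_response helper taking a per-backend probe result; the real implementation is not shown in this document.

```python
def build_health_response(ollama: dict) -> dict:
    """Assemble the /health payload from an Ollama probe result.

    `ollama` is a probe result such as {"available": True, "error": None};
    the shape mirrors the response contract described above.
    """
    degraded = not ollama.get("available", False)
    return {
        "status": "degraded" if degraded else "healthy",
        "backends": {"ollama": ollama},
        "capabilities": {
            "ocr_backends": [] if degraded else ["ollama"],
            "degraded_mode": degraded,
        },
    }
```

Keeping degraded responses at HTTP 200 lets ALB health checks pass while the model download is still in progress, which is what makes the fail-open deployment work.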
- Application logs: Docker container stdout/stderr
- Model download logs: /var/log/ollama-model-download.log
- System logs: CloudWatch via SSM agent (optional)
Critical Metrics:
- Health endpoint status (should be 200, may show degraded)
- Response time percentiles (p50, p95, p99)
- Error rates by endpoint
- Backend availability (Ollama should reach and stay at 100% once the model download completes)
Alerts:
- Ollama unavailable >1 hour (warning - prolonged degradation)
- Error rate >5% (warning)
- Response time p95 >10s (warning)
- Public IP: EC2 instance has public IP (required for Docker Hub, S3, SSM in default VPC)
- Security Groups: Only ALB can reach port 8000, no SSH access
- HTTPS: Enforced via ALB with ACM certificate
- Future Enhancement: Remove public IP by adding VPC endpoints + NAT Gateway (see infrastructure/FUTURE_ENHANCEMENTS.md)
- SSM: Remote access via AWS Systems Manager (no SSH keys)
- OIDC: GitHub Actions uses OIDC (no long-lived credentials)
- IAM: Least privilege roles for EC2 and GitHub Actions
- TLS 1.2+: Enforced on ALB
- No PII: Label images contain product info, not personal data
- Audit Trail: CloudTrail logs all API calls, ALB logs all requests
- Unit Tests - Individual components (validators, parsers)
- Integration Tests - API endpoints with mocked backends
- E2E Tests - Full stack with real OCR backends
- Load Tests - Performance and concurrency validation
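As a flavor of the unit-test layer, here is a hedged sketch assuming a hypothetical validate_government_warning helper; the warning wording itself is the text mandated by 27 CFR Part 16, which is why the validator can demand an exact (whitespace-normalized) match.

```python
import re

# The federally mandated health warning statement (27 CFR Part 16).
REQUIRED_WARNING = (
    "GOVERNMENT WARNING: (1) According to the Surgeon General, women should not "
    "drink alcoholic beverages during pregnancy because of the risk of birth "
    "defects. (2) Consumption of alcoholic beverages impairs your ability to "
    "drive a car or operate machinery, and may cause health problems."
)

def validate_government_warning(label_text: str) -> bool:
    """Exact-match check, tolerant only of whitespace differences from OCR.

    Hypothetical helper standing in for the project's real validator.
    """
    normalize = lambda s: re.sub(r"\s+", " ", s).strip()
    return normalize(REQUIRED_WARNING) in normalize(label_text)

def test_warning_present():
    assert validate_government_warning("Brand X Ale. " + REQUIRED_WARNING)

def test_warning_missing():
    assert not validate_government_warning("Brand X Ale. Drink responsibly.")
```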
- Model download progress endpoint - Real-time visibility into background download
- Metrics endpoint - Prometheus-compatible metrics
- Graceful shutdown - Drain connections before termination
- Remove public IP - Add VPC endpoints + NAT Gateway
- Multi-AZ deployment - HA across availability zones
- Auto-scaling - Scale based on request rate
- Artifact storage - Save request/response for debugging
- Webhooks - Async notification when batch complete
- Rate limiting - Protect against abuse
See infrastructure/FUTURE_ENHANCEMENTS.md for detailed enhancement proposals.
```
Internet
  ↓
CloudFront (WAF)
  ↓
AWS API Gateway
  ├── Authentication (API keys)
  ├── Rate limiting
  ├── Request throttling
  └── CloudWatch metrics
  ↓
EKS Cluster
  ├── UI Node Pool
  ├── API Node Pool
  └── Ollama Node Pool
```
Why API Gateway?
- ✅ Handles authentication/authorization
- ✅ Built-in rate limiting
- ✅ Request validation
- ✅ CloudWatch integration
- ✅ Usage plans and quotas
- ✅ No code changes needed