Production-grade JSON-RPC load balancer with intelligent routing, auto-failover, circuit breaking, and observability for Ethereum mainnet RPC providers.
- Overview
- Real-World Analogy
- Business Purpose
- Architecture Choices
- Features
- API Reference
- Local Development
- Docker Compose Setup
- Testing
- Monitoring & Observability
- Requirements Verification
A high-performance load balancer that distributes Ethereum JSON-RPC requests across multiple providers (Infura, Alchemy, etc.) with:
- Intelligent Routing: Round-robin or weighted strategies based on latency
- Auto-Failover: Circuit breakers automatically disable unhealthy providers
- Smart Caching: Redis-backed cache for deterministic RPC calls (finalized blocks, transactions)
- Observability: Prometheus metrics, structured logging, Grafana dashboards
- Operator Dashboard: Real-time monitoring UI for provider analytics and health
Tech Stack: TypeScript, Express, Redis, Prometheus, Grafana, Loki, Tempo
Backend:
- Runtime: Node.js 18+
- Language: TypeScript 5.9
- Framework: Express 5.x
- Cache: Redis 7.0 (ioredis client)
- Circuit Breaker: Opossum
- HTTP Client: Axios
- Validation: Zod
- Metrics: Prometheus + prom-client
- Dashboards: Grafana
- Logging: Pino (structured JSON logs)
- Log Aggregation: Loki
- Distributed Tracing: Tempo
- Alerting: NodeMailer (email alerts)
Frontend (Operator Dashboard):
- Framework: React 18+ with TypeScript
- Build Tool: Vite
- Styling: TailwindCSS
- Charts: Chart.js / Recharts
- HTTP Client: Axios
- State Management: React Context / Zustand
Infrastructure & Tooling:
- Containerization: Docker + Docker Compose
- Testing: Vitest
- Load Testing: Autocannon
- Code Quality: ESLint + Prettier
Problem: At Luganodes, we rely on third-party RPC providers (Infura, Alchemy) to interact with Ethereum. Challenges:
- Cost Optimization: Some providers are expensive; naive round-robin wastes money on slow/unreliable providers
- Reliability: A single provider outage causes service downtime
- Performance: Redundant requests (e.g., fetching the same block 1000 times) are wasteful
Solution: This load balancer:
- Saves Money: Routes traffic to cost-effective, high-performing providers
- Increases Uptime: Auto-failover ensures 99.9%+ availability
- Boosts Speed: Caches ~40-60% of requests, reducing latency by 80%+
- Provides Visibility: Grafana dashboards show which providers are reliable/expensive
╔═════════════════════════════════════════════════════════════════════════════════╗
║ ║
║ CLIENT APPLICATIONS ║
║ (Web3 Apps, dApps, Wallets, Backend Services) ║
║ ║
╚═══════════════════════════════════════╦═════════════════════════════════════════╝
║
║ JSON-RPC Requests
║ POST / {"jsonrpc":"2.0", "method":"...", ...}
║
▼
╔════════════════════════════════════════════════════════════════════════════════╗
║ ETHEREUM RPC LOAD BALANCER ║
║ (Port 8080) ║
╠════════════════════════════════════════════════════════════════════════════════╣
║ ║
║ ┌───────────────────────────────────────────────────────────────────────────┐ ║
║ │ REQUEST HANDLER (Express.js) │ ║
║ │ - Generate Correlation ID (UUID) │ ║
║ │ - Validate JSON-RPC payload │ ║
║ │ - Structured logging (Pino) │ ║
║ │ - CORS & middleware chain │ ║
║ └───────────────────────────────────┬───────────────────────────────────────┘ ║
║ │ ║
║ ▼ ║
║ ┌───────────────────────────────────────────────────────────────────────────┐ ║
║ │ INTELLIGENT CACHE LAYER │ ║
║ │ ┌─────────────────────────────────────────────────────────────────────┐ │ ║
║ │ │ Cache Decision Engine │ │ ║
║ │ │ - Is method cacheable? (eth_getBlockByNumber YES, eth_call NO) │ │ ║
║ │ │ - Contains "latest"? -> Skip cache │ │ ║
║ │ │ - Generate cache key: method + params + chain │ │ ║
║ │ └───────────────────────────┬─────────────────────────────────────────┘ │ ║
║ │ │ │ ║
║ │ ┌────────────────────┴─────────────────────┐ │ ║
║ │ │ │ │ ║
║ │ Cache HIT Cache MISS │ ║
║ │ │ │ │ ║
║ │ │ ┌──────────────────────────┐ │ │ ║
║ │ └───>│ REDIS CACHE │ │ │ ║
║ │ │ (Port 6379) │ │ │ ║
║ │ ├──────────────────────────┤ │ │ ║
║ │ │ Finalized: INF TTL │ │ │ ║
║ │ │ Recent: 5min TTL │ │ │ ║
║ │ │ Unfinalized: 30s TTL │ │ │ ║
║ │ └──────────────────────────┘ │ │ ║
║ │ │ │ │ ║
║ │ │ Return cached │ Forward request │ ║
║ │ │ response ▼ │ ║
║ └──────────────────────┼────────────────────────────────────────────────────┘ ║
║ │ │ ║
║ │ ▼ ║
║ ┌──────────────────────┼────────────────────────────────────────────────────┐ ║
║ │ PROVIDER MANAGER │ │ ║
║ │ │ │ ║
║ │ ┌───────────────────▼────────────────────────────────────────────────┐ │ ║
║ │ │ ROUTING STRATEGY SELECTOR │ │ ║
║ │ │ ┌──────────────────────┐ ┌────────────────────────────────────┐ │ │ ║
║ │ │ │ Round-Robin │ │ Weighted (EWMA Latency) │ │ │ ║
║ │ │ │ Equal distribution │ │ Faster providers get more load │ │ │ ║
║ │ │ └──────────────────────┘ └────────────────────────────────────┘ │ │ ║
║ │ └───────────────────────────┬────────────────────────────────────────┘ │ ║
║ │ │ │ ║
║ │ ┌───────────────────────────▼───────────────────────────────────────┐ │ ║
║ │ │ CIRCUIT BREAKER (Opossum) │ │ ║
║ │ │ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐ │ │ ║
║ │ │ │ CLOSED │->│ OPEN │->│ HALF_OPEN │->│ CLOSED │ │ │ ║
║ │ │ │ (Normal) │ │ (Failed) │ │ (Testing) │ │(Recovered) │ │ │ ║
║ │ │ └────────────┘ └────────────┘ └────────────┘ └────────────┘ │ │ ║
║ │ │ - Failure threshold: 50% errors in window │ │ ║
║ │ │ - Timeout: 5s per request │ │ ║
║ │ │ - Reset timeout: 30s exponential backoff │ │ ║
║ │ └───────────────────────────────────────────────────────────────────┘ │ ║
║ └───────────────────────────────────────────────────────────────────────────┘ ║
║ ║
╚═════════════════════════╦═══════════════════╦═══════════════════╦══════════════╝
║ ║ ║
▼ ▼ ▼
╔═════════════════════════╗ ╔════════════════════╗ ╔══════════════════╗
║ INFURA PROVIDER ║ ║ ALCHEMY PROVIDER ║ ║ QUICKNODE ║
║ (Ethereum Mainnet) ║ ║ (Ethereum Mainnet) ║ ║ (Backup) ║
╟─────────────────────────╢ ╟────────────────────╢ ╟──────────────────╢
║ Status: Healthy ║ ║ Status: Healthy ║ ║ Status: Healthy ║
║ Latency: 142ms ║ ║ Latency: 98ms ║ ║ Latency: 210ms ║
║ Weight: 32% ║ ║ Weight: 46% ║ ║ Weight: 22% ║
║ Requests: 4,521 ║ ║ Requests: 6,783 ║ ║ Requests: 2,156 ║
║ Errors: 12 (0.27%) ║ ║ Errors: 3 (0.04%) ║ ║ Errors: 45 (2%) ║
╚═══════════════╦═════════╝ ╚══════════╦═════════╝ ╚══════════╦═══════╝
║ ║ ║
╚══════════════════════╩══════════════════════╝
║
▼
╔═════════════════════════════════════════╗
║ ETHEREUM MAINNET BLOCKCHAIN ║
║ (Decentralized Network) ║
╚═════════════════════════════════════════╝
╔═════════════════════════════════════════════════════════════════════════════════╗
║ OBSERVABILITY & MONITORING STACK ║
╠═════════════════════════════════════════════════════════════════════════════════╣
║ ║
║ ┌─────────────────────┐ ┌─────────────────────┐ ┌─────────────────────────┐ ║
║ │ PROMETHEUS │ │ GRAFANA │ │ LOKI │ ║
║ │ (Port 9090) │ │ (Port 3001) │ │ (Port 3100) │ ║
║ ├─────────────────────┤ ├─────────────────────┤ ├─────────────────────────┤ ║
║ │ - Metrics scraping │->│ - Live dashboards │ │ - Log aggregation │ ║
║ │ - Time-series DB │ │ - Visualization │ │ - Full-text search │ ║
║ │ - PromQL queries │ │ - Alert manager │ │ - Log retention │ ║
║ │ - 15s scrape rate │ │ - Multi-tenancy │ │ - JSON parsing │ ║
║ └─────────────────────┘ └─────────────────────┘ └─────────────────────────┘ ║
║ ║
║ ┌───────────────────────────────────────────────────────────────────────────┐ ║
║ │ TEMPO (Distributed Tracing - Port 3200) │ ║
║ │ - End-to-end request tracing - Latency waterfall visualization │ ║
║ │ - Span correlation - Performance bottleneck detection │ ║
║ └───────────────────────────────────────────────────────────────────────────┘ ║
║ ║
║ EMAIL ALERTING (NodeMailer) ║
║ - All providers down (CRITICAL) - Cache hit rate < 30% (WARNING) ║
║ - Only 1 provider remaining (WARN) - Error rate > 5% (WARNING) ║
║ ║
╚═════════════════════════════════════════════════════════════════════════════════╝
╔═════════════════════════════════════════════════════════════════════════════════╗
║ ADMIN & OPERATOR INTERFACE ║
╠═════════════════════════════════════════════════════════════════════════════════╣
║ ║
║ ┌───────────────────────────────────────────────────────────────────────────┐ ║
║  │                         ADMIN API (Port 8081)                             │  ║
║  │              Authentication: HTTP Basic Auth (admin:changeme)             │  ║
║ ├───────────────────────────────────────────────────────────────────────────┤ ║
║ │ Endpoints: │ ║
║ │ - POST /admin/providers -> Add new RPC provider │ ║
║ │ - DELETE /admin/providers/:id -> Remove provider │ ║
║ │ - PATCH /admin/providers/:id -> Update weight/status │ ║
║ │ - POST /admin/providers/:id/enable -> Force enable │ ║
║ │ - POST /admin/providers/:id/disable -> Force disable │ ║
║ │ - GET /admin/cache/stats -> Cache metrics │ ║
║ │ - DELETE /admin/cache -> Clear cache (all/provider/pattern) │ ║
║ └───────────────────────────────────────────────────────────────────────────┘ ║
║ ║
║ ┌───────────────────────────────────────────────────────────────────────────┐ ║
║ │ OPERATOR DASHBOARD (React + TypeScript + Vite) │ ║
║ │ Real-time monitoring interface for DevOps/SRE teams │ ║
║ ├───────────────────────────────────────────────────────────────────────────┤ ║
║ │ Features: │ ║
║ │ - Live provider health status cards │ ║
║ │ - Request distribution pie/bar charts (Chart.js) │ ║
║ │ - Cache hit/miss rate trends │ ║
║ │ - Circuit breaker state visualization │ ║
║ │ - Provider latency comparison graphs │ ║
║ │ - Error rate alerts & notifications │ ║
║ │ - One-click provider enable/disable │ ║
║ └───────────────────────────────────────────────────────────────────────────┘ ║
║ ║
╚═════════════════════════════════════════════════════════════════════════════════╝
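Request validation (step 2 of the flow below) relies on Zod. As a rough illustration of what that guard could look like — the schema and function names here are hypothetical, not the repository's actual code:

```typescript
import { z } from "zod";

// Hypothetical JSON-RPC 2.0 request schema; the real project may differ.
const jsonRpcRequestSchema = z.object({
  jsonrpc: z.literal("2.0"),
  method: z.string().min(1),
  params: z.array(z.unknown()).optional(),
  id: z.union([z.string(), z.number(), z.null()]),
});

export type JsonRpcRequest = z.infer<typeof jsonRpcRequestSchema>;

// Returns a parsed request, or a JSON-RPC error object (-32600 Invalid Request).
export function validateRpcPayload(body: unknown):
  | { ok: true; request: JsonRpcRequest }
  | { ok: false; error: { code: number; message: string } } {
  const result = jsonRpcRequestSchema.safeParse(body);
  if (!result.success) {
    return { ok: false, error: { code: -32600, message: "Invalid Request" } };
  }
  return { ok: true, request: result.data };
}
```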
┏━━━━━━━━━━━━━━━━━━━━━━━┓
┃ CLIENT ┃
┃ (Web3 Application) ┃
┗━━━━━━━━━┯━━━━━━━━━━━━━┛
│
╔═══════════════════▼═══════════════════╗
║ 1. POST / (JSON-RPC Request) ║
║ { ║
║ "jsonrpc": "2.0", ║
║ "method": "eth_getBlockByNumber", ║
║ "params": ["0x12A4B7C", true], ║
║ "id": 1 ║
║ } ║
╚═══════════════════════════════════════╝
│
▼
╔═════════════════════════════════════════════════════════╗
║ 2. REQUEST HANDLER ║
║ ┌──────────────────────────────────────────────────┐ ║
║ │ - Generate Correlation ID: "req_abc123xyz" │ ║
║ │ - Validate JSON-RPC format │ ║
║ │ - Log incoming request │ ║
║ │ - Start latency timer │ ║
║ └──────────────────────────────────────────────────┘ ║
╚═════════════════════════════════════════════════════════╝
│
▼
╔═════════════════════════════════════════════════════════╗
║ 3. CACHE DECISION ENGINE ║
║ ┌──────────────────────────────────────────────────┐ ║
║ │ Is method cacheable? │ ║
║ │ -> eth_getBlockByNumber YES │ ║
║ │ -> eth_blockNumber NO │ ║
║ │ │ ║
║ │ Contains "latest" parameter? NO │ ║
║ │ │ ║
║ │ Generate Cache Key: │ ║
║ │ "eth:1:getBlockByNumber:0x12A4B7C:true" │ ║
║ └──────────────────────────────────────────────────┘ ║
╚═════════════════════════════════════════════════════════╝
│
┌─────────────────┴─────────────────┐
│ │
▼ ▼
┌─────────────────────────┐ ┌─────────────────────────┐
│ Cache HIT │ │ Cache MISS │
│ (Redis lookup: found) │ │ (No cached data) │
└───────────┬─────────────┘ └─────────┬───────────────┘
│ │
│ ▼
│ ╔════════════════════════════════════════╗
│ ║ 4. PROVIDER SELECTION ║
│ ║ ┌──────────────────────────────────┐ ║
│ ║ │ Routing Strategy: WEIGHTED │ ║
│ ║ │ │ ║
│ ║ │ Available Providers: │ ║
│ ║ │ - Infura (142ms) -> 32% load │ ║
│ ║ │ - Alchemy (98ms) -> 46% load │ ║
│ ║ │ - QuickNode (210ms) -> 22% load │ ║
│ ║ │ │ ║
│ ║ │ Random: 0.521 │ ║
│ ║ │ Selected: Alchemy (fastest!) │ ║
│ ║ └──────────────────────────────────┘ ║
│ ╚════════════════════════════════════════╝
│ │
│ ▼
│ ╔════════════════════════════════════════╗
│ ║ 5. CIRCUIT BREAKER CHECK ║
│ ║ ┌──────────────────────────────────┐ ║
│ ║ │ Provider: Alchemy │ ║
│ ║ │ State: CLOSED (Healthy) │ ║
│ ║ │ Recent Errors: 3/1000 (0.3%) │ ║
│ ║ │ Last Success: 2s ago │ ║
│ ║ │ Result: Allow request to pass │ ║
│ ║ └──────────────────────────────────┘ ║
│ ╚════════════════════════════════════════╝
│ │
│ ▼
│ ╔════════════════════════════════════════╗
│ ║ 6. FORWARD TO PROVIDER ║
│ ║ POST https://eth-mainnet.g.alchemy... ║
│ ║ Timeout: 5000ms ║
│ ╚════════════════════════════════════════╝
│ │
│ │ Latency: 98ms
│ ▼
│ ╔════════════════════════════════════════╗
│ ║ 7. RESPONSE RECEIVED ║
│ ║ { ║
│ ║ "jsonrpc": "2.0", ║
│ ║ "id": 1, ║
│ ║ "result": { ... block data ... } ║
│ ║ } ║
│ ╚════════════════════════════════════════╝
│ │
│ ▼
│ ╔════════════════════════════════════════╗
│ ║ 8. CACHE RESPONSE ║
│ ║ ┌──────────────────────────────────┐ ║
│ ║ │ Block: 19,400,000 │ ║
│ ║ │ Current: 19,500,000 │ ║
│ ║ │ Diff: 100,000 blocks > 64 │ ║
│ ║ │ Status: FINALIZED │ ║
│ ║ │ TTL: Infinite (1 year) │ ║
│ ║ │ Store in Redis: SUCCESS │ ║
│ ║ └──────────────────────────────────┘ ║
│ ╚════════════════════════════════════════╝
│ │
└───────────────────────────────┘
│
▼
╔═════════════════════════════════════════════════════════╗
║ 9. RETURN RESPONSE WITH METADATA ║
║ ┌──────────────────────────────────────────────────┐ ║
║ │ Headers: │ ║
║ │ - X-Cache-Hit: false │ ║
║ │ - X-Provider-Id: alchemy │ ║
║ │ - X-Correlation-Id: req_abc123xyz │ ║
║ │ - X-Response-Time: 98ms │ ║
║ │ │ ║
║ │ Body: { "jsonrpc": "2.0", "result": {...} } │ ║
║ └──────────────────────────────────────────────────┘ ║
╚═════════════════════════════════════════════════════════╝
│
▼
┏━━━━━━━━━━━━━━━━━━━━━━━┓
┃ CLIENT ┃
┃ (Response received) ┃
┗━━━━━━━━━━━━━━━━━━━━━━━┛
╔═══════════════════════════════════════════════════════╗
║ 10. METRICS & LOGGING ║
║ - Prometheus: rpc_requests_total{provider=alchemy}++ ║
║ - Pino: {"correlationId":"req_abc123xyz", ║
║ "method":"eth_getBlockByNumber", ║
║ "cacheHit":false, "latency":98} ║
╚═══════════════════════════════════════════════════════╝
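Step 10 above is where per-request telemetry gets emitted. A minimal sketch of how this could be wired with prom-client and Pino (metric and field names are illustrative and may not match the project's exact instrumentation):

```typescript
import { Counter, Histogram } from "prom-client";
import pino from "pino";

const logger = pino();

// Hypothetical metric names; the real Grafana dashboards may query different ones.
const rpcRequestsTotal = new Counter({
  name: "rpc_requests_total",
  help: "Total JSON-RPC requests by provider, method, and cache outcome",
  labelNames: ["provider", "method", "cache_hit"] as const,
});

const rpcLatencySeconds = new Histogram({
  name: "rpc_request_duration_seconds",
  help: "Upstream request latency in seconds",
  labelNames: ["provider", "method"] as const,
  buckets: [0.05, 0.1, 0.25, 0.5, 1, 2, 5],
});

export function recordRequest(opts: {
  correlationId: string;
  provider: string;
  method: string;
  cacheHit: boolean;
  latencyMs: number;
}) {
  rpcRequestsTotal.inc({
    provider: opts.provider,
    method: opts.method,
    cache_hit: String(opts.cacheHit),
  });
  rpcLatencySeconds.observe(
    { provider: opts.provider, method: opts.method },
    opts.latencyMs / 1000
  );
  // Structured log line with the correlation ID, mirroring the example above.
  logger.info(
    {
      correlationId: opts.correlationId,
      method: opts.method,
      cacheHit: opts.cacheHit,
      latency: opts.latencyMs,
    },
    "rpc request completed"
  );
}
```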
╔══════════════════════════════════════╗
║ ║
║ CLOSED (Healthy) ║
║ ║
║ - All requests pass through ║
║ - Monitor failure rate ║
║ - Track error count in window ║
║ - Normal operation mode ║
║ ║
╚═════════════╦════════════════════════╝
║
║ Threshold Exceeded
║ (e.g., 50% errors in 10s window)
║ or 3 consecutive failures
║
▼
╔══════════════════════════════════════╗
║ ║
║ OPEN (Unhealthy) ║
║ ║
║ - Block ALL requests ║
║ - Fast-fail immediately (no delay) ║
║ - Return error instantly ║
║ - Wait for cooldown period ║
║ - Timer: 30s (exponential backoff) ║
║ ║
╚═════════════╦════════════════════════╝
║
║ After resetTimeout
║ (30s -> 60s -> 120s -> ...)
║
▼
╔══════════════════════════════════════╗
║ ║
║ HALF_OPEN (Testing) ║
║ ║
║ - Allow limited test requests ║
║ - Monitor closely for success ║
║ - One request at a time ║
║ - Decide: Recover or re-open ║
║ ║
╚══════╦══════════════════════╦════════╝
║ ║
SUCCESS ║ ║ FAILURE
(Provider OK) ║ ║ (Still broken)
▼ ▼
╔════════════════════╗ ╔═════════════════════╗
║ CLOSED ║ ║ OPEN ║
║ (Auto-recovered) ║ ║ (Retry later) ║
║ Resume normal ops ║ ║ Increase backoff ║
╚════════════════════╝ ╚═════════════════════╝
┌───────────────────────────────────────────────────────────────────────────┐
│ CIRCUIT BREAKER CONFIGURATION │
├───────────────────────────────────────────────────────────────────────────┤
│ Timeout: 5000ms (per request) │
│ Error Threshold: 50% (errors in rolling window) │
│ Reset Timeout: 30000ms (initial), exponential backoff │
│ Rolling Window: 10 seconds │
│ Volume Threshold: 10 requests (minimum before triggering) │
│ Failure Detector: HTTP 5xx, Timeout, Network Error │
└───────────────────────────────────────────────────────────────────────────┘
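The configuration table maps closely onto Opossum's options. A hedged sketch of creating one breaker per provider with those values (the `forwardToProvider` helper is a stand-in for illustration, not the project's actual upstream call):

```typescript
import CircuitBreaker from "opossum";
import axios from "axios";

// Stand-in for the real upstream call: POST the JSON-RPC body to one provider.
async function forwardToProvider(providerUrl: string, payload: unknown) {
  const response = await axios.post(providerUrl, payload, { timeout: 5000 });
  return response.data;
}

// One breaker per provider, using the thresholds from the table above.
export function createProviderBreaker(providerUrl: string) {
  const breaker = new CircuitBreaker(
    (payload: unknown) => forwardToProvider(providerUrl, payload),
    {
      timeout: 5000,                // fail a request after 5s
      errorThresholdPercentage: 50, // open after 50% errors in the window
      resetTimeout: 30000,          // try HALF_OPEN after 30s
      rollingCountTimeout: 10000,   // 10-second rolling statistics window
      volumeThreshold: 10,          // need at least 10 requests before opening
    }
  );

  breaker.on("open", () => console.warn(`${providerUrl} circuit OPEN`));
  breaker.on("halfOpen", () => console.info(`${providerUrl} circuit HALF_OPEN`));
  breaker.on("close", () => console.info(`${providerUrl} circuit CLOSED`));

  return breaker;
}

// Usage: createProviderBreaker(url).fire(jsonRpcPayload)
```

Note that Opossum's resetTimeout is a single fixed value; the exponential backoff shown in the state diagram would be layered on top of this configuration.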
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ TTL STRATEGY REFERENCE TABLE ┃
┣━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┫
┃ ┃
┃ ╔════════════════════════╦═══════════════╦══════════════════════════════╗ ┃
┃ ║ BLOCK AGE ║ TTL ║ REASON ║ ┃
┃ ╠════════════════════════╬═══════════════╬══════════════════════════════╣ ┃
┃ ║ >64 blocks from head ║ INF (1 year) ║ Finalized, immutable ║ ┃
┃ ║ 13-64 blocks from head ║ 5 minutes ║ Likely finalized, safe ║ ┃
┃ ║ 1-12 blocks from head ║ 30 seconds ║ Unfinalized, may reorg ║ ┃
┃ ║ "latest" parameter ║ NEVER CACHE ║ Always refers to chain head ║ ┃
┃ ╚════════════════════════╩═══════════════╩══════════════════════════════╝ ┃
┃ ┃
┃ ╔════════════════════════════════════════╦═══════════════════════════════╗ ┃
┃ ║ METHOD ║ CACHEABLE? ║ ┃
┃ ╠════════════════════════════════════════╬═══════════════════════════════╣ ┃
┃ ║ eth_getBlockByNumber (finalized) ║ YES (with TTL logic) ║ ┃
┃ ║ eth_getBlockByHash ║ YES (infinite TTL) ║ ┃
┃ ║ eth_getTransactionByHash ║ YES (infinite TTL) ║ ┃
┃ ║ eth_getTransactionReceipt ║ YES (infinite TTL) ║ ┃
┃ ║ eth_blockNumber ║ NO (always current) ║ ┃
┃ ║ eth_gasPrice ║ NO (highly volatile) ║ ┃
┃ ║ eth_call ║ NO (state-dependent) ║ ┃
┃ ║ eth_getBalance (with "latest") ║ NO (changes every block) ║ ┃
┃ ╚════════════════════════════════════════╩═══════════════════════════════╝ ┃
┃ ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
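To make the TTL table concrete, here is a rough sketch of how the decision could be applied when writing to Redis via ioredis. The thresholds follow the table and the key format mirrors the earlier example, but the names are illustrative, not the project's actual code:

```typescript
import Redis from "ioredis";

const redis = new Redis(6379);
const ONE_YEAR_SECONDS = 365 * 24 * 60 * 60; // "infinite" TTL for finalized data

// Pick a TTL (in seconds) from the block's distance to the chain head.
function ttlForBlock(blockNumber: number, headNumber: number): number | null {
  const depth = headNumber - blockNumber;
  if (depth > 64) return ONE_YEAR_SECONDS; // finalized, effectively immutable
  if (depth >= 13) return 5 * 60;          // likely finalized
  if (depth >= 1) return 30;               // unfinalized, may reorg
  return null;                             // head / "latest": never cache
}

export async function cacheBlockResponse(
  cacheKey: string,          // e.g. "eth:1:getBlockByNumber:0x12A4B7C:true"
  response: unknown,
  blockNumber: number,
  headNumber: number
): Promise<void> {
  const ttl = ttlForBlock(blockNumber, headNumber);
  if (ttl === null) return;  // skip caching entirely
  await redis.set(cacheKey, JSON.stringify(response), "EX", ttl);
}
```

Requests carrying a "latest" parameter never reach this path, matching the NEVER CACHE row above.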
╔═══════════════════════════════════════════════════════════════════════════════╗
║ CACHE PERFORMANCE METRICS (Expected) ║
╠═══════════════════════════════════════════════════════════════════════════════╣
║ Cache Hit Rate: 40-60% (typical Web3 workload) ║
║ Latency Reduction: ~80% (500ms -> 100ms for cached requests) ║
║ Cost Savings: 30-50% reduction in provider API calls ║
║ Memory Usage (Redis): ~100MB for 50,000 cached blocks ║
║ Eviction Policy: allkeys-lru (Least Recently Used) ║
║ Max Memory: 256MB (configurable) ║
╚═══════════════════════════════════════════════════════════════════════════════╝
This project follows a clean layered architecture for maintainability, testability, and separation of concerns:
Benefits:
- Separation of Concerns: Each layer has a single, well-defined responsibility
- Testability: Easy to mock dependencies and write unit tests for each layer
- Maintainability: Changes in one layer don't cascade to others (loose coupling)
- Scalability: Individual layers can be optimized or replaced independently
- Reusability: Service layer logic can be reused across different controllers
- Clarity: New developers can quickly understand the codebase structure
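As a purely illustrative example of this separation (hypothetical names, not the repository's actual files): the controller only translates HTTP to and from a service interface, so the service logic can be unit-tested or mocked without Express, Redis, or live providers.

```typescript
import type { Request, Response } from "express";

// Service layer: pure business logic, no Express types.
export interface RpcService {
  handle(request: { method: string; params: unknown[]; id: number | string }): Promise<unknown>;
}

// Controller layer: translates HTTP <-> service calls, nothing else.
export function makeRpcController(service: RpcService) {
  return async (req: Request, res: Response) => {
    try {
      const result = await service.handle(req.body);
      res.json({ jsonrpc: "2.0", id: req.body.id, result });
    } catch (err) {
      res.status(502).json({
        jsonrpc: "2.0",
        id: req.body?.id ?? null,
        error: { code: -32000, message: "Upstream provider error" },
      });
    }
  };
}

// In tests, RpcService can be replaced with a stub, so the controller is
// exercised without touching providers, the cache, or circuit breakers.
```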
✅ HTTP server accepting JSON-RPC requests on port 8080
✅ Multiple backend RPC providers (configurable via environment variables)
✅ Routing Strategies:
- round-robin: Distribute evenly across healthy providers
- weighted: Route based on EWMA latency (faster providers get more traffic; see the weighted-routing sketch after this feature list)
✅ Admin API for provider management (add/remove/enable/disable/update weights)
✅ Periodic health checks with staggered jitter (30s cycle)
✅ Success/failure rate tracking per provider
✅ Circuit Breaker: Disables providers after N failures, auto-recovery with exponential backoff
✅ Configurable thresholds (timeout, error threshold, reset timeout)
✅ Cacheable Methods: eth_getBlockByNumber, eth_getBlockByHash, eth_getTransactionByHash, eth_getTransactionReceipt
✅ Non-Cacheable Methods: eth_blockNumber, eth_gasPrice, eth_call, any call with "latest" parameter
✅ TTL Strategy:
- Infinite TTL for finalized blocks (more than 64 blocks behind the head)
- 5-minute TTL for recent blocks (13-64 blocks behind the head)
- 30-second TTL for unfinalized blocks (within 12 blocks of the head)
✅ Redis-backed with automatic key generation
✅ Structured Logging (Pino): Request tracing with correlation IDs
✅ Prometheus Metrics:
- Per-provider: request count, success/failure rate, latency, circuit breaker state
- System-wide: cache hit rate, active providers, error rate by type
✅ Alerting: Email alerts for critical conditions (all providers down, cache issues)
✅ Grafana Dashboards: Pre-configured dashboards for provider health, request distribution, cache stats
✅ Request Retry: Automatically retries with a different provider on failure
✅ Docker Compose: Full stack (Redis, Prometheus, Grafana, Loki, Tempo)
✅ Load Testing: Autocannon-based load test scripts with realistic traffic patterns
✅ Operator Dashboard: React-based UI for real-time monitoring
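As referenced in the routing strategies above, here is a minimal sketch of weighted selection driven by an EWMA of observed latency. The smoothing factor and data structures are assumptions for illustration, not the project's exact algorithm:

```typescript
interface ProviderState {
  id: string;
  healthy: boolean;
  ewmaLatencyMs: number; // exponentially weighted moving average of latency
}

const ALPHA = 0.2; // smoothing factor: higher reacts faster to latency changes

// Update a provider's EWMA after each completed request.
export function updateLatency(p: ProviderState, observedMs: number): void {
  p.ewmaLatencyMs = ALPHA * observedMs + (1 - ALPHA) * p.ewmaLatencyMs;
}

// Weight providers by inverse latency: faster providers get more traffic.
export function pickWeighted(providers: ProviderState[]): ProviderState {
  const healthy = providers.filter((p) => p.healthy);
  if (healthy.length === 0) throw new Error("no healthy providers");

  const weights = healthy.map((p) => 1 / Math.max(p.ewmaLatencyMs, 1));
  const total = weights.reduce((a, b) => a + b, 0);

  let r = Math.random() * total;
  for (let i = 0; i < healthy.length; i++) {
    r -= weights[i];
    if (r <= 0) return healthy[i];
  }
  return healthy[healthy.length - 1];
}
```

With inverse-latency weights, the example providers in the architecture diagram (142ms / 98ms / 210ms) work out to roughly the 32% / 46% / 22% split shown there.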
For detailed verification of all problem statement requirements, see:
📄 REQUIREMENTS-VERIFICATION.md
Summary: ✅ ALL REQUIREMENTS MET (100%)
- ✅ Core Functionality (9/9)
- ✅ Health Monitoring (7/7)
- ✅ Intelligent Caching (10/10)
- ✅ Observability (9/9)
- ✅ Alerting (6/6)
- ✅ Bonus Features (3/3)
For complete API documentation, see API-REFERENCE.md.
Public Endpoints (Port 8080):
- POST / - JSON-RPC proxy for Ethereum requests
- GET /providers - View provider statistics (read-only)
- GET /health - Health check
- GET /health/detailed - Detailed health information
- GET /metrics - Prometheus metrics
Admin Endpoints (Port 8081) - Basic Auth Required (admin:changeme):
- GET /admin/providers - View all providers
- GET /admin/providers/:id - View specific provider
- POST /admin/providers - Add new provider
- PATCH /admin/providers/:id - Update provider (weight, enable/disable)
- DELETE /admin/providers/:id - Remove provider
- POST /admin/providers/:id/enable - Force enable provider
- POST /admin/providers/:id/disable - Force disable provider
- GET /admin/cache/stats - View cache statistics
- DELETE /admin/cache - Clear cache (all/provider/pattern)
Authentication: Admin endpoints use HTTP Basic Authentication (not JWT).
# Example: View all providers
curl -u admin:changeme http://localhost:8081/admin/providers
# Example: JSON-RPC request
curl -X POST http://localhost:8080 \
-H "Content-Type: application/json" \
-d '{
"jsonrpc": "2.0",
"method": "eth_blockNumber",
"params": [],
"id": 1
}'

→ See API-REFERENCE.md for complete documentation with request/response examples.
- Node.js 18+ (Download)
- Docker & Docker Compose (Download)
- RPC Provider API Keys: Infura, Alchemy, or any Ethereum JSON-RPC provider (optional - public endpoints available)
1. Clone the repository

   git clone https://github.com/HarshitPG/Ethereum-RPC-Load-Balancer.git
   cd Ethereum-RPC-Load-Balancer

2. Install backend dependencies

   npm install

3. Configure environment variables

   cp .env.example .env

   Edit .env with your configuration. The project includes working public RPC endpoints, or you can add your own API keys:

   # Public endpoints (works out of the box)
   INFURA_URL=https://eth.llamarpc.com
   ALCHEMY_URL=https://ethereum.publicnode.com

   # Or use your own keys
   # INFURA_URL=https://mainnet.infura.io/v3/YOUR_API_KEY
   # ALCHEMY_URL=https://eth-mainnet.g.alchemy.com/v2/YOUR_API_KEY

4. Start all services (Redis, Prometheus, Grafana, Loki, Tempo)

   docker-compose up -d

   This spins up:
   - Redis (Port 6379)
   - Prometheus (Port 9090)
   - Grafana (Port 3001 - username: admin, password: admin)
   - Loki (Port 3100)
   - Tempo (Port 3200)

5. Start the backend application

   # Development mode (hot reload)
   npm run dev

   # Production mode
   npm run build
   npm start

   Backend runs on:
   - Public API: http://localhost:8080
   - Admin API: http://localhost:8081

6. Test the backend API

   # Health check
   curl http://localhost:8080/health

   # JSON-RPC request
   curl -X POST http://localhost:8080 \
     -H "Content-Type: application/json" \
     -d '{
       "jsonrpc": "2.0",
       "method": "eth_blockNumber",
       "params": [],
       "id": 1
     }'

   # View providers
   curl http://localhost:8080/providers

   # Admin API (basic auth required)
   curl http://localhost:8081/admin/providers \
     -u admin:changeme
1. Navigate to the dashboard directory

   cd dashboard

2. Install frontend dependencies

   npm install

3. Configure API endpoint (if needed)

   Edit dashboard/.env or dashboard/src/config.ts to point to your backend:

   VITE_API_URL=http://localhost:8080
   VITE_ADMIN_API_URL=http://localhost:8081

4. Start the frontend development server

   npm run dev

   Dashboard runs on: http://localhost:5173

5. Build for production

   npm run build

   Output will be in dashboard/dist/
- Backend API: http://localhost:8080
- Admin API: http://localhost:8081 (Basic Auth: admin:changeme)
- Frontend Dashboard: http://localhost:5173
- Grafana: http://localhost:3001 (admin:admin)
- Prometheus: http://localhost:9090
- Redis: localhost:6379
To stop all services:

docker-compose down

To view logs:

# All services
docker-compose logs -f

# Specific service
docker-compose logs -f redis
docker-compose logs -f prometheus
docker-compose logs -f grafana

To reset data (clear cache, metrics):

docker-compose down -v
rm -rf data/

Pre-configured dashboards are automatically provisioned:
1. RPC Load Balancer Overview
   - Total requests, cache hit rate, error rate
   - Provider health status
   - Request distribution

2. Provider Analytics
   - Per-provider request count, latency, errors
   - Circuit breaker state transitions
   - Success/failure rates

3. Cache Performance
   - Hit/miss rates by method
   - TTL distribution
   - Memory usage
# Run all tests
npm test
# Run specific test file
npm test test/1-core-functionality.test.ts
# Run with coverage
npm test -- --coverage

Test suites:

- Core Functionality (1-core-functionality.test.ts) - Provider registration, routing strategies, request handling
- Health Monitoring (2-health-monitoring.test.ts) - Health checks, circuit breakers, auto-failover
- Caching Layer (3-caching-layer.test.ts) - Cache hit/miss, TTL strategy, invalidation
- Observability (4-observability.test.ts) - Logging, metrics, correlation IDs
- Alerting (5-alerting.test.ts) - Email alerts, alert conditions
- Circuit Breaker Integration (7-circuit-breaker-integration.test.ts) - Failure detection, state transitions
- Performance & Load Testing (8-performance-load-testing.test.ts) - Throughput, latency under load
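For reference, a hedged example of what a small Vitest case for the round-robin strategy could look like. The helper is defined inline for the example and is not the project's actual implementation:

```typescript
import { describe, it, expect } from "vitest";

// Hypothetical pure round-robin helper used only for this example.
function makeRoundRobin<T>(items: T[]) {
  let index = 0;
  return () => {
    const item = items[index % items.length];
    index++;
    return item;
  };
}

describe("round-robin routing", () => {
  it("cycles evenly through healthy providers", () => {
    const next = makeRoundRobin(["infura", "alchemy", "quicknode"]);
    const picks = Array.from({ length: 6 }, () => next());
    expect(picks).toEqual([
      "infura", "alchemy", "quicknode",
      "infura", "alchemy", "quicknode",
    ]);
  });
});
```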
# Minimal load test (30 seconds)
bash scripts/load-test-minimal.sh
# Custom load test with autocannon
npx autocannon -c 50 -d 60 -m POST \
-H "Content-Type: application/json" \
-b '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
http://localhost:8080

Expected Results:
- Throughput: 500-1000 req/s (cached), 100-200 req/s (uncached)
- Latency (p95): <100ms (cached), <500ms (uncached)
- Error Rate: <1%