Production-grade JSON-RPC load balancer with intelligent routing, auto-failover, circuit breaking, and observability for Ethereum mainnet RPC providers.
- Overview
- Real-World Analogy
- Business Purpose
- Architecture Choices
- Features
- API Reference
- Local Development
- Docker Compose Setup
- Testing
- Monitoring & Observability
- Requirements Verification
A high-performance load balancer that distributes Ethereum JSON-RPC requests across multiple providers (Infura, Alchemy, etc.) with:
- Intelligent Routing: Round-robin or weighted strategies based on latency
- Auto-Failover: Circuit breakers automatically disable unhealthy providers
- Smart Caching: Redis-backed cache for deterministic RPC calls (finalized blocks, transactions)
- Observability: Prometheus metrics, structured logging, Grafana dashboards
- Operator Dashboard: Real-time monitoring UI for provider analytics and health
Tech Stack: TypeScript, Express, Redis, Prometheus, Grafana, Loki, Tempo
Backend:
- Runtime: Node.js 18+
- Language: TypeScript 5.9
- Framework: Express 5.x
- Cache: Redis 7.0 (ioredis client)
- Circuit Breaker: Opossum
- HTTP Client: Axios
- Validation: Zod
- Metrics: Prometheus + prom-client
- Dashboards: Grafana
- Logging: Pino (structured JSON logs)
- Log Aggregation: Loki
- Distributed Tracing: Tempo
- Alerting: NodeMailer (email alerts)
Frontend (Operator Dashboard):
- Framework: React 18+ with TypeScript
- Build Tool: Vite
- Styling: TailwindCSS
- Charts: Chart.js / Recharts
- HTTP Client: Axios
- State Management: React Context / Zustand
Infrastructure & Tooling:
- Containerization: Docker + Docker Compose
- Testing: Vitest
- Load Testing: Autocannon
- Code Quality: ESLint + Prettier
Problem: At Luganodes, we rely on third-party RPC providers (Infura, Alchemy) to interact with Ethereum. Challenges:
- Cost Optimization: Some providers are expensive; naive round-robin wastes money on slow/unreliable providers
- Reliability: A single provider outage causes service downtime
- Performance: Redundant requests (e.g., fetching the same block 1000 times) are wasteful
Solution: This load balancer:
- Saves Money: Routes traffic to cost-effective, high-performing providers
- Increases Uptime: Auto-failover ensures 99.9%+ availability
- Boosts Speed: Caches ~40-60% of requests, reducing latency by 80%+
- Provides Visibility: Grafana dashboards show which providers are reliable/expensive
╔═════════════════════════════════════════════════════════════════════════════════╗
║ ║
║ CLIENT APPLICATIONS ║
║ (Web3 Apps, dApps, Wallets, Backend Services) ║
║ ║
╚═══════════════════════════════════════╦═════════════════════════════════════════╝
║
║ JSON-RPC Requests
║ POST / {"jsonrpc":"2.0", "method":"...", ...}
║
▼
╔════════════════════════════════════════════════════════════════════════════════╗
║ ETHEREUM RPC LOAD BALANCER ║
║ (Port 8080) ║
╠════════════════════════════════════════════════════════════════════════════════╣
║ ║
║ ┌───────────────────────────────────────────────────────────────────────────┐ ║
║ │ REQUEST HANDLER (Express.js) │ ║
║ │ - Generate Correlation ID (UUID) │ ║
║ │ - Validate JSON-RPC payload │ ║
║ │ - Structured logging (Pino) │ ║
║ │ - CORS & middleware chain │ ║
║ └───────────────────────────────────┬───────────────────────────────────────┘ ║
║ │ ║
║ ▼ ║
║ ┌───────────────────────────────────────────────────────────────────────────┐ ║
║ │ INTELLIGENT CACHE LAYER │ ║
║ │ ┌─────────────────────────────────────────────────────────────────────┐ │ ║
║ │ │ Cache Decision Engine │ │ ║
║ │ │ - Is method cacheable? (eth_getBlockByNumber YES, eth_call NO) │ │ ║
║ │ │ - Contains "latest"? -> Skip cache │ │ ║
║ │ │ - Generate cache key: method + params + chain │ │ ║
║ │ └───────────────────────────┬─────────────────────────────────────────┘ │ ║
║ │ │ │ ║
║ │ ┌────────────────────┴─────────────────────┐ │ ║
║ │ │ │ │ ║
║ │ Cache HIT Cache MISS │ ║
║ │ │ │ │ ║
║ │ │ ┌──────────────────────────┐ │ │ ║
║ │ └───>│ REDIS CACHE │ │ │ ║
║ │ │ (Port 6379) │ │ │ ║
║ │ ├──────────────────────────┤ │ │ ║
║ │ │ Finalized: INF TTL │ │ │ ║
║ │ │ Recent: 5min TTL │ │ │ ║
║ │ │ Unfinalized: 30s TTL │ │ │ ║
║ │ └──────────────────────────┘ │ │ ║
║ │ │ │ │ ║
║ │ │ Return cached │ Forward request │ ║
║ │ │ response ▼ │ ║
║ └──────────────────────┼────────────────────────────────────────────────────┘ ║
║ │ │ ║
║ │ ▼ ║
║ ┌──────────────────────┼────────────────────────────────────────────────────┐ ║
║ │ PROVIDER MANAGER │ │ ║
║ │ │ │ ║
║ │ ┌───────────────────▼────────────────────────────────────────────────┐ │ ║
║ │ │ ROUTING STRATEGY SELECTOR │ │ ║
║ │ │ ┌──────────────────────┐ ┌────────────────────────────────────┐ │ │ ║
║ │ │ │ Round-Robin │ │ Weighted (EWMA Latency) │ │ │ ║
║ │ │ │ Equal distribution │ │ Faster providers get more load │ │ │ ║
║ │ │ └──────────────────────┘ └────────────────────────────────────┘ │ │ ║
║ │ └───────────────────────────┬────────────────────────────────────────┘ │ ║
║ │ │ │ ║
║ │ ┌───────────────────────────▼───────────────────────────────────────┐ │ ║
║ │ │ CIRCUIT BREAKER (Opossum) │ │ ║
║ │ │ ┌────────────┐ ┌────────────┐ ┌────────────┐ ┌────────────┐ │ │ ║
║ │ │ │ CLOSED │->│ OPEN │->│ HALF_OPEN │->│ CLOSED │ │ │ ║
║ │ │ │ (Normal) │ │ (Failed) │ │ (Testing) │ │(Recovered) │ │ │ ║
║ │ │ └────────────┘ └────────────┘ └────────────┘ └────────────┘ │ │ ║
║ │ │ - Failure threshold: 50% errors in window │ │ ║
║ │ │ - Timeout: 5s per request │ │ ║
║ │ │ - Reset timeout: 30s exponential backoff │ │ ║
║ │ └───────────────────────────────────────────────────────────────────┘ │ ║
║ └───────────────────────────────────────────────────────────────────────────┘ ║
║ ║
╚═════════════════════════╦═══════════════════╦═══════════════════╦══════════════╝
║ ║ ║
▼ ▼ ▼
╔═════════════════════════╗ ╔════════════════════╗ ╔══════════════════╗
║ INFURA PROVIDER ║ ║ ALCHEMY PROVIDER ║ ║ QUICKNODE ║
║ (Ethereum Mainnet) ║ ║ (Ethereum Mainnet) ║ ║ (Backup) ║
╟─────────────────────────╢ ╟────────────────────╢ ╟──────────────────╢
║ Status: Healthy ║ ║ Status: Healthy ║ ║ Status: Healthy ║
║ Latency: 142ms ║ ║ Latency: 98ms ║ ║ Latency: 210ms ║
║ Weight: 32% ║ ║ Weight: 46% ║ ║ Weight: 22% ║
║ Requests: 4,521 ║ ║ Requests: 6,783 ║ ║ Requests: 2,156 ║
║ Errors: 12 (0.27%) ║ ║ Errors: 3 (0.04%) ║ ║ Errors: 45 (2%) ║
╚═══════════════╦═════════╝ ╚══════════╦═════════╝ ╚══════════╦═══════╝
║ ║ ║
╚══════════════════════╩══════════════════════╝
║
▼
╔═════════════════════════════════════════╗
║ ETHEREUM MAINNET BLOCKCHAIN ║
║ (Decentralized Network) ║
╚═════════════════════════════════════════╝
╔═════════════════════════════════════════════════════════════════════════════════╗
║ OBSERVABILITY & MONITORING STACK ║
╠═════════════════════════════════════════════════════════════════════════════════╣
║ ║
║ ┌─────────────────────┐ ┌─────────────────────┐ ┌─────────────────────────┐ ║
║ │ PROMETHEUS │ │ GRAFANA │ │ LOKI │ ║
║ │ (Port 9090) │ │ (Port 3001) │ │ (Port 3100) │ ║
║ ├─────────────────────┤ ├─────────────────────┤ ├─────────────────────────┤ ║
║ │ - Metrics scraping │->│ - Live dashboards │ │ - Log aggregation │ ║
║ │ - Time-series DB │ │ - Visualization │ │ - Full-text search │ ║
║ │ - PromQL queries │ │ - Alert manager │ │ - Log retention │ ║
║ │ - 15s scrape rate │ │ - Multi-tenancy │ │ - JSON parsing │ ║
║ └─────────────────────┘ └─────────────────────┘ └─────────────────────────┘ ║
║ ║
║ ┌───────────────────────────────────────────────────────────────────────────┐ ║
║ │ TEMPO (Distributed Tracing - Port 3200) │ ║
║ │ - End-to-end request tracing - Latency waterfall visualization │ ║
║ │ - Span correlation - Performance bottleneck detection │ ║
║ └───────────────────────────────────────────────────────────────────────────┘ ║
║ ║
║ EMAIL ALERTING (NodeMailer) ║
║ - All providers down (CRITICAL) - Cache hit rate < 30% (WARNING) ║
║ - Only 1 provider remaining (WARN) - Error rate > 5% (WARNING) ║
║ ║
╚═════════════════════════════════════════════════════════════════════════════════╝
╔═════════════════════════════════════════════════════════════════════════════════╗
║ ADMIN & OPERATOR INTERFACE ║
╠═════════════════════════════════════════════════════════════════════════════════╣
║ ║
║ ┌───────────────────────────────────────────────────────────────────────────┐ ║
║  │                         ADMIN API (Port 8081)                             │  ║
║  │              Authentication: HTTP Basic Auth (admin:changeme)             │  ║
║ ├───────────────────────────────────────────────────────────────────────────┤ ║
║ │ Endpoints: │ ║
║ │ - POST /admin/providers -> Add new RPC provider │ ║
║ │ - DELETE /admin/providers/:id -> Remove provider │ ║
║ │ - PATCH /admin/providers/:id -> Update weight/status │ ║
║ │ - POST /admin/providers/:id/enable -> Force enable │ ║
║ │ - POST /admin/providers/:id/disable -> Force disable │ ║
║ │ - GET /admin/cache/stats -> Cache metrics │ ║
║ │ - DELETE /admin/cache -> Clear cache (all/provider/pattern) │ ║
║ └───────────────────────────────────────────────────────────────────────────┘ ║
║ ║
║ ┌───────────────────────────────────────────────────────────────────────────┐ ║
║ │ OPERATOR DASHBOARD (React + TypeScript + Vite) │ ║
║ │ Real-time monitoring interface for DevOps/SRE teams │ ║
║ ├───────────────────────────────────────────────────────────────────────────┤ ║
║ │ Features: │ ║
║ │ - Live provider health status cards │ ║
║ │ - Request distribution pie/bar charts (Chart.js) │ ║
║ │ - Cache hit/miss rate trends │ ║
║ │ - Circuit breaker state visualization │ ║
║ │ - Provider latency comparison graphs │ ║
║ │ - Error rate alerts & notifications │ ║
║ │ - One-click provider enable/disable │ ║
║ └───────────────────────────────────────────────────────────────────────────┘ ║
║ ║
╚═════════════════════════════════════════════════════════════════════════════════╝
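Request validation (step 2 of the flow below) relies on Zod. As a rough illustration of what that guard could look like — the schema and function names here are hypothetical, not the repository's actual code:

```typescript
import { z } from "zod";

// Hypothetical JSON-RPC 2.0 request schema; the real project may differ.
const jsonRpcRequestSchema = z.object({
  jsonrpc: z.literal("2.0"),
  method: z.string().min(1),
  params: z.array(z.unknown()).optional(),
  id: z.union([z.string(), z.number(), z.null()]),
});

export type JsonRpcRequest = z.infer<typeof jsonRpcRequestSchema>;

// Returns a parsed request, or a JSON-RPC error object (-32600 Invalid Request).
export function validateRpcPayload(body: unknown):
  | { ok: true; request: JsonRpcRequest }
  | { ok: false; error: { code: number; message: string } } {
  const result = jsonRpcRequestSchema.safeParse(body);
  if (!result.success) {
    return { ok: false, error: { code: -32600, message: "Invalid Request" } };
  }
  return { ok: true, request: result.data };
}
```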
┏━━━━━━━━━━━━━━━━━━━━━━━┓
┃ CLIENT ┃
┃ (Web3 Application) ┃
┗━━━━━━━━━┯━━━━━━━━━━━━━┛
│
╔═══════════════════▼═══════════════════╗
║ 1. POST / (JSON-RPC Request) ║
║ { ║
║ "jsonrpc": "2.0", ║
║ "method": "eth_getBlockByNumber", ║
║ "params": ["0x12A4B7C", true], ║
║ "id": 1 ║
║ } ║
╚═══════════════════════════════════════╝
│
▼
╔═════════════════════════════════════════════════════════╗
║ 2. REQUEST HANDLER ║
║ ┌──────────────────────────────────────────────────┐ ║
║ │ - Generate Correlation ID: "req_abc123xyz" │ ║
║ │ - Validate JSON-RPC format │ ║
║ │ - Log incoming request │ ║
║ │ - Start latency timer │ ║
║ └──────────────────────────────────────────────────┘ ║
╚═════════════════════════════════════════════════════════╝
│
▼
╔═════════════════════════════════════════════════════════╗
║ 3. CACHE DECISION ENGINE ║
║ ┌──────────────────────────────────────────────────┐ ║
║ │ Is method cacheable? │ ║
║ │ -> eth_getBlockByNumber YES │ ║
║ │ -> eth_blockNumber NO │ ║
║ │ │ ║
║ │ Contains "latest" parameter? NO │ ║
║ │ │ ║
║ │ Generate Cache Key: │ ║
║ │ "eth:1:getBlockByNumber:0x12A4B7C:true" │ ║
║ └──────────────────────────────────────────────────┘ ║
╚═════════════════════════════════════════════════════════╝
│
┌─────────────────┴─────────────────┐
│ │
▼ ▼
┌─────────────────────────┐ ┌─────────────────────────┐
│ Cache HIT │ │ Cache MISS │
│ (Redis lookup: found) │ │ (No cached data) │
└───────────┬─────────────┘ └─────────┬───────────────┘
│ │
│ ▼
│ ╔════════════════════════════════════════╗
│ ║ 4. PROVIDER SELECTION ║
│ ║ ┌──────────────────────────────────┐ ║
│ ║ │ Routing Strategy: WEIGHTED │ ║
│ ║ │ │ ║
│ ║ │ Available Providers: │ ║
│ ║ │ - Infura (142ms) -> 32% load │ ║
│ ║ │ - Alchemy (98ms) -> 46% load │ ║
│ ║ │ - QuickNode (210ms) -> 22% load │ ║
│ ║ │ │ ║
│ ║ │ Random: 0.521 │ ║
│ ║ │ Selected: Alchemy (fastest!) │ ║
│ ║ └──────────────────────────────────┘ ║
│ ╚════════════════════════════════════════╝
│ │
│ ▼
│ ╔════════════════════════════════════════╗
│ ║ 5. CIRCUIT BREAKER CHECK ║
│ ║ ┌──────────────────────────────────┐ ║
│ ║ │ Provider: Alchemy │ ║
│ ║ │ State: CLOSED (Healthy) │ ║
│ ║ │ Recent Errors: 3/1000 (0.3%) │ ║
│ ║ │ Last Success: 2s ago │ ║
│ ║ │ Result: Allow request to pass │ ║
│ ║ └──────────────────────────────────┘ ║
│ ╚════════════════════════════════════════╝
│ │
│ ▼
│ ╔════════════════════════════════════════╗
│ ║ 6. FORWARD TO PROVIDER ║
│ ║ POST https://eth-mainnet.g.alchemy... ║
│ ║ Timeout: 5000ms ║
│ ╚════════════════════════════════════════╝
│ │
│ │ Latency: 98ms
│ ▼
│ ╔════════════════════════════════════════╗
│ ║ 7. RESPONSE RECEIVED ║
│ ║ { ║
│ ║ "jsonrpc": "2.0", ║
│ ║ "id": 1, ║
│ ║ "result": { ... block data ... } ║
│ ║ } ║
│ ╚════════════════════════════════════════╝
│ │
│ ▼
│ ╔════════════════════════════════════════╗
│ ║ 8. CACHE RESPONSE ║
│ ║ ┌──────────────────────────────────┐ ║
│ ║ │ Block: 19,400,000 │ ║
│ ║ │ Current: 19,500,000 │ ║
│ ║ │ Diff: 100,000 blocks > 64 │ ║
│ ║ │ Status: FINALIZED │ ║
│ ║ │ TTL: Infinite (1 year) │ ║
│ ║ │ Store in Redis: SUCCESS │ ║
│ ║ └──────────────────────────────────┘ ║
│ ╚════════════════════════════════════════╝
│ │
└───────────────────────────────┘
│
▼
╔═════════════════════════════════════════════════════════╗
║ 9. RETURN RESPONSE WITH METADATA ║
║ ┌──────────────────────────────────────────────────┐ ║
║ │ Headers: │ ║
║ │ - X-Cache-Hit: false │ ║
║ │ - X-Provider-Id: alchemy │ ║
║ │ - X-Correlation-Id: req_abc123xyz │ ║
║ │ - X-Response-Time: 98ms │ ║
║ │ │ ║
║ │ Body: { "jsonrpc": "2.0", "result": {...} } │ ║
║ └──────────────────────────────────────────────────┘ ║
╚═════════════════════════════════════════════════════════╝
│
▼
┏━━━━━━━━━━━━━━━━━━━━━━━┓
┃ CLIENT ┃
┃ (Response received) ┃
┗━━━━━━━━━━━━━━━━━━━━━━━┛
╔═══════════════════════════════════════════════════════╗
║ 10. METRICS & LOGGING ║
║ - Prometheus: rpc_requests_total{provider=alchemy}++ ║
║ - Pino: {"correlationId":"req_abc123xyz", ║
║ "method":"eth_getBlockByNumber", ║
║ "cacheHit":false, "latency":98} ║
╚═══════════════════════════════════════════════════════╝
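Step 10 above is where per-request telemetry gets emitted. A minimal sketch of how this could be wired with prom-client and Pino (metric and field names are illustrative and may not match the project's exact instrumentation):

```typescript
import { Counter, Histogram } from "prom-client";
import pino from "pino";

const logger = pino();

// Hypothetical metric names; the real Grafana dashboards may query different ones.
const rpcRequestsTotal = new Counter({
  name: "rpc_requests_total",
  help: "Total JSON-RPC requests by provider, method, and cache outcome",
  labelNames: ["provider", "method", "cache_hit"] as const,
});

const rpcLatencySeconds = new Histogram({
  name: "rpc_request_duration_seconds",
  help: "Upstream request latency in seconds",
  labelNames: ["provider", "method"] as const,
  buckets: [0.05, 0.1, 0.25, 0.5, 1, 2, 5],
});

export function recordRequest(opts: {
  correlationId: string;
  provider: string;
  method: string;
  cacheHit: boolean;
  latencyMs: number;
}) {
  rpcRequestsTotal.inc({
    provider: opts.provider,
    method: opts.method,
    cache_hit: String(opts.cacheHit),
  });
  rpcLatencySeconds.observe(
    { provider: opts.provider, method: opts.method },
    opts.latencyMs / 1000
  );
  // Structured log line with the correlation ID, mirroring the example above.
  logger.info(
    {
      correlationId: opts.correlationId,
      method: opts.method,
      cacheHit: opts.cacheHit,
      latency: opts.latencyMs,
    },
    "rpc request completed"
  );
}
```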
╔══════════════════════════════════════╗
║ ║
║ CLOSED (Healthy) ║
║ ║
║ - All requests pass through ║
║ - Monitor failure rate ║
║ - Track error count in window ║
║ - Normal operation mode ║
║ ║
╚═════════════╦════════════════════════╝
║
║ Threshold Exceeded
║ (e.g., 50% errors in 10s window)
║ or 3 consecutive failures
║
▼
╔══════════════════════════════════════╗
║ ║
║ OPEN (Unhealthy) ║
║ ║
║ - Block ALL requests ║
║ - Fast-fail immediately (no delay) ║
║ - Return error instantly ║
║ - Wait for cooldown period ║
║ - Timer: 30s (exponential backoff) ║
║ ║
╚═════════════╦════════════════════════╝
║
║ After resetTimeout
║ (30s -> 60s -> 120s -> ...)
║
▼
╔══════════════════════════════════════╗
║ ║
║ HALF_OPEN (Testing) ║
║ ║
║ - Allow limited test requests ║
║ - Monitor closely for success ║
║ - One request at a time ║
║ - Decide: Recover or re-open ║
║ ║
╚══════╦══════════════════════╦════════╝
║ ║
SUCCESS ║ ║ FAILURE
(Provider OK) ║ ║ (Still broken)
▼ ▼
╔════════════════════╗ ╔═════════════════════╗
║ CLOSED ║ ║ OPEN ║
║ (Auto-recovered) ║ ║ (Retry later) ║
║ Resume normal ops ║ ║ Increase backoff ║
╚════════════════════╝ ╚═════════════════════╝
┌───────────────────────────────────────────────────────────────────────────┐
│ CIRCUIT BREAKER CONFIGURATION │
├───────────────────────────────────────────────────────────────────────────┤
│ Timeout: 5000ms (per request) │
│ Error Threshold: 50% (errors in rolling window) │
│ Reset Timeout: 30000ms (initial), exponential backoff │
│ Rolling Window: 10 seconds │
│ Volume Threshold: 10 requests (minimum before triggering) │
│ Failure Detector: HTTP 5xx, Timeout, Network Error │
└───────────────────────────────────────────────────────────────────────────┘
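The configuration table maps closely onto Opossum's options. A hedged sketch of creating one breaker per provider with those values (the `forwardToProvider` helper is a stand-in for illustration, not the project's actual upstream call):

```typescript
import CircuitBreaker from "opossum";
import axios from "axios";

// Stand-in for the real upstream call: POST the JSON-RPC body to one provider.
async function forwardToProvider(providerUrl: string, payload: unknown) {
  const response = await axios.post(providerUrl, payload, { timeout: 5000 });
  return response.data;
}

// One breaker per provider, using the thresholds from the table above.
export function createProviderBreaker(providerUrl: string) {
  const breaker = new CircuitBreaker(
    (payload: unknown) => forwardToProvider(providerUrl, payload),
    {
      timeout: 5000,                // fail a request after 5s
      errorThresholdPercentage: 50, // open after 50% errors in the window
      resetTimeout: 30000,          // try HALF_OPEN after 30s
      rollingCountTimeout: 10000,   // 10-second rolling statistics window
      volumeThreshold: 10,          // need at least 10 requests before opening
    }
  );

  breaker.on("open", () => console.warn(`${providerUrl} circuit OPEN`));
  breaker.on("halfOpen", () => console.info(`${providerUrl} circuit HALF_OPEN`));
  breaker.on("close", () => console.info(`${providerUrl} circuit CLOSED`));

  return breaker;
}

// Usage: createProviderBreaker(url).fire(jsonRpcPayload)
```

Note that Opossum's resetTimeout is a single fixed value; the exponential backoff shown in the state diagram would be layered on top of this configuration.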
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ TTL STRATEGY REFERENCE TABLE ┃
┣━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┫
┃ ┃
┃ ╔════════════════════════╦═══════════════╦══════════════════════════════╗ ┃
┃ ║ BLOCK AGE ║ TTL ║ REASON ║ ┃
┃ ╠════════════════════════╬═══════════════╬══════════════════════════════╣ ┃
┃ ║ >64 blocks from head ║ INF (1 year) ║ Finalized, immutable ║ ┃
┃ ║ 13-64 blocks from head ║ 5 minutes ║ Likely finalized, safe ║ ┃
┃ ║ 1-12 blocks from head ║ 30 seconds ║ Unfinalized, may reorg ║ ┃
┃ ║ "latest" parameter ║ NEVER CACHE ║ Always refers to chain head ║ ┃
┃ ╚════════════════════════╩═══════════════╩══════════════════════════════╝ ┃
┃ ┃
┃ ╔════════════════════════════════════════╦═══════════════════════════════╗ ┃
┃ ║ METHOD ║ CACHEABLE? ║ ┃
┃ ╠════════════════════════════════════════╬═══════════════════════════════╣ ┃
┃ ║ eth_getBlockByNumber (finalized) ║ YES (with TTL logic) ║ ┃
┃ ║ eth_getBlockByHash ║ YES (infinite TTL) ║ ┃
┃ ║ eth_getTransactionByHash ║ YES (infinite TTL) ║ ┃
┃ ║ eth_getTransactionReceipt ║ YES (infinite TTL) ║ ┃
┃ ║ eth_blockNumber ║ NO (always current) ║ ┃
┃ ║ eth_gasPrice ║ NO (highly volatile) ║ ┃
┃ ║ eth_call ║ NO (state-dependent) ║ ┃
┃ ║ eth_getBalance (with "latest") ║ NO (changes every block) ║ ┃
┃ ╚════════════════════════════════════════╩═══════════════════════════════╝ ┃
┃ ┃
┗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┛
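To make the TTL table concrete, here is a rough sketch of how the decision could be applied when writing to Redis via ioredis. The thresholds follow the table and the key format mirrors the earlier example, but the names are illustrative, not the project's actual code:

```typescript
import Redis from "ioredis";

const redis = new Redis(6379);
const ONE_YEAR_SECONDS = 365 * 24 * 60 * 60; // "infinite" TTL for finalized data

// Pick a TTL (in seconds) from the block's distance to the chain head.
function ttlForBlock(blockNumber: number, headNumber: number): number | null {
  const depth = headNumber - blockNumber;
  if (depth > 64) return ONE_YEAR_SECONDS; // finalized, effectively immutable
  if (depth >= 13) return 5 * 60;          // likely finalized
  if (depth >= 1) return 30;               // unfinalized, may reorg
  return null;                             // head / "latest": never cache
}

export async function cacheBlockResponse(
  cacheKey: string,          // e.g. "eth:1:getBlockByNumber:0x12A4B7C:true"
  response: unknown,
  blockNumber: number,
  headNumber: number
): Promise<void> {
  const ttl = ttlForBlock(blockNumber, headNumber);
  if (ttl === null) return;  // skip caching entirely
  await redis.set(cacheKey, JSON.stringify(response), "EX", ttl);
}
```

Requests carrying a "latest" parameter never reach this path, matching the NEVER CACHE row above.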
╔═══════════════════════════════════════════════════════════════════════════════╗
║ CACHE PERFORMANCE METRICS (Expected) ║
╠═══════════════════════════════════════════════════════════════════════════════╣
║ Cache Hit Rate: 40-60% (typical Web3 workload) ║
║ Latency Reduction: ~80% (500ms -> 100ms for cached requests) ║
║ Cost Savings: 30-50% reduction in provider API calls ║
║ Memory Usage (Redis): ~100MB for 50,000 cached blocks ║
║ Eviction Policy: allkeys-lru (Least Recently Used) ║
║ Max Memory: 256MB (configurable) ║
╚═══════════════════════════════════════════════════════════════════════════════╝
This project follows a clean layered architecture for maintainability, testability, and separation of concerns:
Benefits:
- Separation of Concerns: Each layer has a single, well-defined responsibility
- Testability: Easy to mock dependencies and write unit tests for each layer
- Maintainability: Changes in one layer don't cascade to others (loose coupling)
- Scalability: Individual layers can be optimized or replaced independently
- Reusability: Service layer logic can be reused across different controllers
- Clarity: New developers can quickly understand the codebase structure
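As a purely illustrative example of this separation (hypothetical names, not the repository's actual files): the controller only translates HTTP to and from a service interface, so the service logic can be unit-tested or mocked without Express, Redis, or live providers.

```typescript
import type { Request, Response } from "express";

// Service layer: pure business logic, no Express types.
export interface RpcService {
  handle(request: { method: string; params: unknown[]; id: number | string }): Promise<unknown>;
}

// Controller layer: translates HTTP <-> service calls, nothing else.
export function makeRpcController(service: RpcService) {
  return async (req: Request, res: Response) => {
    try {
      const result = await service.handle(req.body);
      res.json({ jsonrpc: "2.0", id: req.body.id, result });
    } catch (err) {
      res.status(502).json({
        jsonrpc: "2.0",
        id: req.body?.id ?? null,
        error: { code: -32000, message: "Upstream provider error" },
      });
    }
  };
}

// In tests, RpcService can be replaced with a stub, so the controller is
// exercised without touching providers, the cache, or circuit breakers.
```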
✅ HTTP server accepting JSON-RPC requests on port 8080
✅ Multiple backend RPC providers (configurable via environment variables)
✅ Routing Strategies:
- round-robin: Distribute evenly across healthy providers
- weighted: Route based on EWMA latency (faster providers get more traffic; see the weighted-routing sketch after this feature list)
✅ Admin API for provider management (add/remove/enable/disable/update weights)
✅ Periodic health checks with staggered jitter (30s cycle)
✅ Success/failure rate tracking per provider
✅ Circuit Breaker: Disables providers after N failures, auto-recovery with exponential backoff
✅ Configurable thresholds (timeout, error threshold, reset timeout)
✅ Cacheable Methods: eth_getBlockByNumber, eth_getBlockByHash, eth_getTransactionByHash, eth_getTransactionReceipt
✅ Non-Cacheable Methods: eth_blockNumber, eth_gasPrice, eth_call, any call with "latest" parameter
✅ TTL Strategy:
- Infinite TTL for finalized blocks (more than 64 blocks behind the head)
- 5-minute TTL for recent blocks (13-64 blocks behind the head)
- 30-second TTL for unfinalized blocks (within 12 blocks of the head)
✅ Redis-backed with automatic key generation
✅ Structured Logging (Pino): Request tracing with correlation IDs
✅ Prometheus Metrics:
- Per-provider: request count, success/failure rate, latency, circuit breaker state
- System-wide: cache hit rate, active providers, error rate by type
✅ Alerting: Email alerts for critical conditions (all providers down, cache issues)
✅ Grafana Dashboards: Pre-configured dashboards for provider health, request distribution, cache stats
✅ Request Retry: Automatically retries with a different provider on failure
✅ Docker Compose: Full stack (Redis, Prometheus, Grafana, Loki, Tempo)
✅ Load Testing: Autocannon-based load test scripts with realistic traffic patterns
✅ Operator Dashboard: React-based UI for real-time monitoring
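As referenced in the routing strategies above, here is a minimal sketch of weighted selection driven by an EWMA of observed latency. The smoothing factor and data structures are assumptions for illustration, not the project's exact algorithm:

```typescript
interface ProviderState {
  id: string;
  healthy: boolean;
  ewmaLatencyMs: number; // exponentially weighted moving average of latency
}

const ALPHA = 0.2; // smoothing factor: higher reacts faster to latency changes

// Update a provider's EWMA after each completed request.
export function updateLatency(p: ProviderState, observedMs: number): void {
  p.ewmaLatencyMs = ALPHA * observedMs + (1 - ALPHA) * p.ewmaLatencyMs;
}

// Weight providers by inverse latency: faster providers get more traffic.
export function pickWeighted(providers: ProviderState[]): ProviderState {
  const healthy = providers.filter((p) => p.healthy);
  if (healthy.length === 0) throw new Error("no healthy providers");

  const weights = healthy.map((p) => 1 / Math.max(p.ewmaLatencyMs, 1));
  const total = weights.reduce((a, b) => a + b, 0);

  let r = Math.random() * total;
  for (let i = 0; i < healthy.length; i++) {
    r -= weights[i];
    if (r <= 0) return healthy[i];
  }
  return healthy[healthy.length - 1];
}
```

With inverse-latency weights, the example providers in the architecture diagram (142ms / 98ms / 210ms) work out to roughly the 32% / 46% / 22% split shown there.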
For detailed verification of all problem statement requirements, see:
📄 REQUIREMENTS-VERIFICATION.md
Summary: ✅ ALL REQUIREMENTS MET (100%)
- ✅ Core Functionality (9/9)
- ✅ Health Monitoring (7/7)
- ✅ Intelligent Caching (10/10)
- ✅ Observability (9/9)
- ✅ Alerting (6/6)
- ✅ Bonus Features (3/3)
For complete API documentation, see API-REFERENCE.md.
Public Endpoints (Port 8080):
- POST / - JSON-RPC proxy for Ethereum requests
- GET /providers - View provider statistics (read-only)
- GET /health - Health check
- GET /health/detailed - Detailed health information
- GET /metrics - Prometheus metrics
Admin Endpoints (Port 8081) - Basic Auth Required (admin:changeme):
- GET /admin/providers - View all providers
- GET /admin/providers/:id - View specific provider
- POST /admin/providers - Add new provider
- PATCH /admin/providers/:id - Update provider (weight, enable/disable)
- DELETE /admin/providers/:id - Remove provider
- POST /admin/providers/:id/enable - Force enable provider
- POST /admin/providers/:id/disable - Force disable provider
- GET /admin/cache/stats - View cache statistics
- DELETE /admin/cache - Clear cache (all/provider/pattern)
Authentication: Admin endpoints use HTTP Basic Authentication (not JWT).
# Example: View all providers
curl -u admin:changeme http://localhost:8081/admin/providers
# Example: JSON-RPC request
curl -X POST http://localhost:8080 \
-H "Content-Type: application/json" \
-d '{
"jsonrpc": "2.0",
"method": "eth_blockNumber",
"params": [],
"id": 1
}'

→ See API-REFERENCE.md for complete documentation with request/response examples.
- Node.js 18+ (Download)
- Docker & Docker Compose (Download)
- RPC Provider API Keys: Infura, Alchemy, or any Ethereum JSON-RPC provider (optional - public endpoints available)
1. Clone the repository

   git clone https://github.com/HarshitPG/Ethereum-RPC-Load-Balancer.git
   cd Ethereum-RPC-Load-Balancer

2. Install backend dependencies

   npm install

3. Configure environment variables

   cp .env.example .env

   Edit .env with your configuration. The project includes working public RPC endpoints, or you can add your own API keys:

   # Public endpoints (works out of the box)
   INFURA_URL=https://eth.llamarpc.com
   ALCHEMY_URL=https://ethereum.publicnode.com

   # Or use your own keys
   # INFURA_URL=https://mainnet.infura.io/v3/YOUR_API_KEY
   # ALCHEMY_URL=https://eth-mainnet.g.alchemy.com/v2/YOUR_API_KEY

4. Start all services (Redis, Prometheus, Grafana, Loki, Tempo)

   docker-compose up -d

   This spins up:
   - Redis (Port 6379)
   - Prometheus (Port 9090)
   - Grafana (Port 3001 - username: admin, password: admin)
   - Loki (Port 3100)
   - Tempo (Port 3200)

5. Start the backend application

   # Development mode (hot reload)
   npm run dev

   # Production mode
   npm run build
   npm start

   Backend runs on:
   - Public API: http://localhost:8080
   - Admin API: http://localhost:8081

6. Test the backend API

   # Health check
   curl http://localhost:8080/health

   # JSON-RPC request
   curl -X POST http://localhost:8080 \
     -H "Content-Type: application/json" \
     -d '{
       "jsonrpc": "2.0",
       "method": "eth_blockNumber",
       "params": [],
       "id": 1
     }'

   # View providers
   curl http://localhost:8080/providers

   # Admin API (basic auth required)
   curl http://localhost:8081/admin/providers \
     -u admin:changeme
1. Navigate to the dashboard directory

   cd dashboard

2. Install frontend dependencies

   npm install

3. Configure API endpoint (if needed)

   Edit dashboard/.env or dashboard/src/config.ts to point to your backend:

   VITE_API_URL=http://localhost:8080
   VITE_ADMIN_API_URL=http://localhost:8081

4. Start the frontend development server

   npm run dev

   Dashboard runs on: http://localhost:5173

5. Build for production

   npm run build

   Output will be in dashboard/dist/
- Backend API: http://localhost:8080
- Admin API: http://localhost:8081 (Basic Auth: admin:changeme)
- Frontend Dashboard: http://localhost:5173
- Grafana: http://localhost:3001 (admin:admin)
- Prometheus: http://localhost:9090
- Redis: localhost:6379
To stop all services:

docker-compose down

To view logs:

# All services
docker-compose logs -f

# Specific service
docker-compose logs -f redis
docker-compose logs -f prometheus
docker-compose logs -f grafana

To reset data (clear cache, metrics):

docker-compose down -v
rm -rf data/

Pre-configured dashboards are automatically provisioned:
1. RPC Load Balancer Overview
   - Total requests, cache hit rate, error rate
   - Provider health status
   - Request distribution

2. Provider Analytics
   - Per-provider request count, latency, errors
   - Circuit breaker state transitions
   - Success/failure rates

3. Cache Performance
   - Hit/miss rates by method
   - TTL distribution
   - Memory usage
# Run all tests
npm test
# Run specific test file
npm test test/1-core-functionality.test.ts
# Run with coverage
npm test -- --coverage

Test suites:

- Core Functionality (1-core-functionality.test.ts) - Provider registration, routing strategies, request handling
- Health Monitoring (2-health-monitoring.test.ts) - Health checks, circuit breakers, auto-failover
- Caching Layer (3-caching-layer.test.ts) - Cache hit/miss, TTL strategy, invalidation
- Observability (4-observability.test.ts) - Logging, metrics, correlation IDs
- Alerting (5-alerting.test.ts) - Email alerts, alert conditions
- Circuit Breaker Integration (7-circuit-breaker-integration.test.ts) - Failure detection, state transitions
- Performance & Load Testing (8-performance-load-testing.test.ts) - Throughput, latency under load
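For reference, a hedged example of what a small Vitest case for the round-robin strategy could look like. The helper is defined inline for the example and is not the project's actual implementation:

```typescript
import { describe, it, expect } from "vitest";

// Hypothetical pure round-robin helper used only for this example.
function makeRoundRobin<T>(items: T[]) {
  let index = 0;
  return () => {
    const item = items[index % items.length];
    index++;
    return item;
  };
}

describe("round-robin routing", () => {
  it("cycles evenly through healthy providers", () => {
    const next = makeRoundRobin(["infura", "alchemy", "quicknode"]);
    const picks = Array.from({ length: 6 }, () => next());
    expect(picks).toEqual([
      "infura", "alchemy", "quicknode",
      "infura", "alchemy", "quicknode",
    ]);
  });
});
```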
# Minimal load test (30 seconds)
bash scripts/load-test-minimal.sh
# Custom load test with autocannon
npx autocannon -c 50 -d 60 -m POST \
-H "Content-Type: application/json" \
-b '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
http://localhost:8080

Expected Results:
- Throughput: 500-1000 req/s (cached), 100-200 req/s (uncached)
- Latency (p95): <100ms (cached), <500ms (uncached)
- Error Rate: <1%