Skip to content

GouBuliya/TokenRouter

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

62 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

TokenRouter

LLM API Gateway with Intelligent Cache Optimization

Go Version License Tests Coverage GitHub Stars GitHub Forks GitHub Issues GitHub Pull Requests Release Last Commit

TokenRouter Banner


🎯 Why TokenRouter?

LLM providers charge 10x more for cache misses vs cache hits. TokenRouter transforms your LLM infrastructure:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Client A   │────▢│                                                         │────▢│   DeepSeek  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€     β”‚   TokenRouter Gateway                                    β”‚     β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚   Client B   │────▢│   Cache Optimization β€’ Deduplication β€’ Cost Tracking    │────▢│   OpenAI    β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€     β”‚                                                         β”‚     β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚   Client C   │────▢│                                                         │────▢│  Anthropic  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
Problem TokenRouter Solution Impact
Low cache hit rate (<30%) Structural convergence via Chunker + Arranger + Canonicalizer Cache hits >70%
Inconsistent tool ordering Alphabetical normalization for cross-user cache sharing Cross-user cache sharing
Duplicate concurrent requests In-memory deduplication (zero upstream calls) Eliminate redundant calls
No cost visibility Real-time Prometheus metrics (cache savings, dedup savings) Track every dollar saved

Result: Cache hit rates >70%, cost reduction up to 90%


πŸ“Š Performance Metrics

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    TokenRouter Performance Dashboard                      β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                          β”‚
β”‚  Throughput        P99 Latency      Cache Hit Rate      Cost Savings    β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”       β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”‚
β”‚  β”‚ 10,000   β”‚     β”‚  <50ms   β”‚     β”‚   >70%   β”‚       β”‚  Up to   β”‚     β”‚
β”‚  β”‚  req/s   β”‚     β”‚          β”‚     β”‚          β”‚       β”‚   90%    β”‚     β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜       β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β”‚
β”‚                                                                          β”‚
β”‚  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ 95%   β”‚
β”‚                                                                          β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Based on load testing with 10,000 concurrent requests:

Metric Value Baseline Improvement
Throughput 10,000 req/s 1,000 req/s 10x
P99 Latency <50ms 200ms 75%↓
Cache Hit Rate >70% <30% 2.3x
Cost Savings Up to 90% 0% 90%↓
Dedup Rate >5% 0% New

Star History Chart


πŸ— Architecture

Every incoming request flows through this pipeline:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”
β”‚Inbound  │──▢│Chunker  │──▢│Arranger  │──▢│Canonicalizer  │──▢│CacheInjector│──▢│Hasher │──▢│Dedup │──▢│Outbound │──▢│Proxy  β”‚
β”‚Adapter  β”‚   β”‚         β”‚   β”‚          β”‚   β”‚               β”‚   β”‚             β”‚   β”‚       β”‚   β”‚      β”‚   β”‚Adapter  β”‚   β”‚       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”˜
     β”‚              β”‚              β”‚                β”‚                  β”‚                β”‚           β”‚            β”‚
     β”‚              β”‚              β”‚                β”‚                  β”‚                β”‚           β”‚            β”‚
  Parse to      Split into    Order blocks:    Deterministic     Inject vendor-   Compute     Check     Build      Forward
  Envelope      Block types   System→Tool→     JSON serialization  specific cache  hashes     for       vendor-    to upstream
                              History→Query                         directives                 duplicates specific  format

Core Components

Component Function Impact Performance
Chunker Splits messages into System/Tool/History/Query blocks Structured processing <1ms
Arranger Orders blocks: System β†’ Tool (sorted) β†’ History β†’ Query Cache prefix alignment <1ms
Canonicalizer Deterministic JSON serialization Byte-perfect hash stability <2ms
CacheInjector Vendor-specific cache directives Maximize vendor KV cache <1ms
Hasher PrefixHash (cache) + FullHash (dedup) Intelligent routing <1ms
Dedup In-flight request deduplication Zero redundant calls <1ms

Total Pipeline Overhead: <10ms (P99)


πŸ“ˆ Comparison

Feature Comparison

Feature TokenRouter Cloudflare AI Gateway LiteLLM
KV Cache Optimization βœ… Structural convergence ❌ Passthrough only ❌ Passthrough only
Request Deduplication βœ… In-memory ❌ No ❌ No
Tool Normalization βœ… Alphabetical sort ❌ No ❌ No
Cost Tracking βœ… Real-time Prometheus ⚠️ Paid feature ⚠️ Basic
Open Source βœ… Full ❌ Proprietary βœ… Full
Self-Hosted βœ… Yes ❌ Cloud only βœ… Yes
Streaming Support βœ… Full βœ… Limited βœ… Full
Multi-Provider βœ… DeepSeek/OpenAI/Anthropic βœ… Multiple βœ… Multiple

Cost Comparison (1M tokens)

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    Cost per 1M Tokens (USD)                      β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                 β”‚
β”‚  Direct API Call    β”‚β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ”‚ $1.00   β”‚
β”‚  (no optimization)  β”‚                                β”‚         β”‚
β”‚                     β”‚                                β”‚         β”‚
β”‚  With TokenRouter   β”‚β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ                           β”‚ $0.10   β”‚
β”‚  (70% cache hit)    β”‚                                β”‚         β”‚
β”‚                     β”‚                                β”‚         β”‚
β”‚  Savings            β”‚β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ    β”‚ 90% ↓   β”‚
β”‚                     β”‚                                β”‚         β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸš€ Quick Start

Docker (Recommended)

# Clone repository
git clone https://github.com/GouBuliya/TokenRouter.git
cd TokenRouter/deployments

# Start all services
docker compose up -d

# View logs
docker compose logs -f

Access:

Source Build

# Clone repository
git clone https://github.com/GouBuliya/TokenRouter.git
cd TokenRouter

# Build
make build

# Run tests
make test

# Run locally (requires Postgres and Redis)
cp .env.example .env
# Edit .env with your API keys
make dev

πŸ’‘ Usage Examples

1. Create API Key

curl -X POST http://localhost:8080/admin/api-keys \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-key",
    "quota_usd": 100
  }'

Response:

{
  "id": "uuid-here",
  "key": "sk-tr-abc123...",
  "quota_usd": 100
}

⚠️ Save the key immediately - it's only shown once!

2. Chat Completion

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-tr-abc123..." \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'

3. With Tools

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-tr-abc123..." \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [
      {"role": "user", "content": "What is the weather in Beijing?"}
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_weather",
          "parameters": {
            "type": "object",
            "properties": {
              "city": {"type": "string"}
            },
            "required": ["city"]
          }
        }
      }
    ]
  }'

πŸ”§ Configuration

Environment Variables

Variable Description Default Required
PORT HTTP server port 8080 ❌
DATABASE_URL Postgres connection string - βœ…
REDIS_URL Redis connection string - βœ…
DEEPSEEK_API_KEY DeepSeek API key - βœ…
CACHE_INJECT_ENABLED Enable cache injection true ❌
DEDUP_ENABLED Enable request deduplication true ❌
TOOL_SORT_ENABLED Enable tool alphabetical sorting true ❌
DEDUP_TTL Deduplication TTL 2m ❌
LOG_LEVEL Log level (debug/info/warn/error) info ❌

See .env.example for full list.

Configuration Templates

Development Environment
PORT=8080
LOG_LEVEL=debug
DATABASE_URL=postgres://tokenrouter:tokenrouter@localhost:5432/tokenrouter?sslmode=disable
REDIS_URL=redis://localhost:6379/0
DEEPSEEK_API_KEY=sk-xxx
DEDUP_ENABLED=true
CACHE_INJECT_ENABLED=true
RATE_LIMIT_ENABLED=false  # Disable for development
Production Environment (Small Scale)
PORT=8080
LOG_LEVEL=warn
DATABASE_URL=postgres://user:pass@db.example.com:5432/tokenrouter?sslmode=require
REDIS_URL=redis://redis.example.com:6379/0
DEEPSEEK_API_KEY=sk-xxx
DB_MAX_OPEN_CONNS=50
DB_MAX_IDLE_CONNS=10
DB_CONN_MAX_LIFETIME=30m
AUTH_CACHE_TTL=5m
Production Environment (High Concurrency)
PORT=8080
LOG_LEVEL=error
DATABASE_URL=postgres://user:pass@db.example.com:5432/tokenrouter?sslmode=require
REDIS_URL=redis://redis-cluster.example.com:6379/0

# High concurrency settings
GLOBAL_CONCURRENT_LIMIT=10000
STREAM_CONCURRENT_LIMIT=6000
NON_STREAM_CONCURRENT_LIMIT=4000
PROVIDER_CONCURRENT_LIMIT=1000

DB_MAX_OPEN_CONNS=100
DB_MAX_IDLE_CONNS=25
DB_CONN_MAX_LIFETIME=1h

# Connection pool optimization
PROXY_MAX_IDLE_CONNS=10000
PROXY_MAX_IDLE_CONNS_PER_HOST=1000
PROXY_MAX_CONNS_PER_HOST=10000
PROXY_IDLE_CONN_TIMEOUT=90s

πŸ“š Documentation

Getting Started

Architecture

API Reference

Development


🀝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

Development Workflow

# Fork and clone
git clone https://github.com/YOUR_USERNAME/TokenRouter.git
cd TokenRouter

# Create branch
git checkout -b feature/your-feature

# Make changes and test
make test
make lint

# Commit and push
git commit -am "feat: add your feature"
git push origin feature/your-feature

# Open Pull Request

Good First Issues

Look for issues labeled good first issue to get started.

Contributors


πŸ“„ License

This project is licensed under the Apache License 2.0.


πŸ™ Acknowledgments


πŸ“¬ Contact


Made with ❀️ for the AI community

⬆️ Back to top | πŸ“– Documentation | 🀝 Contributing

Star this repo Fork this repo Follow us

About

πŸš€ LLM API Gateway with Intelligent Cache Optimization. Reduce LLM API costs by up to 90% through structural request optimization and vendor KV cache maximization. Built with Go.

Topics

Resources

License

Code of conduct

Contributing

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages