High-performance API gateway for Hanzo AI services. Routes 133 API endpoints across production clusters with rate limiting, authentication forwarding, CORS, circuit breakers, and telemetry -- all driven by declarative JSON configuration.
Hanzo Gateway is the unified API entry point for all Hanzo platform traffic. It sits behind Hanzo Ingress (L7 reverse proxy) and routes requests to internal services with per-endpoint rate limiting, header forwarding, and circuit breaker protection.
| Cluster | Domain | Endpoints | Rate Limit (global) | Rate Limit (per IP) |
|---|---|---|---|---|
| hanzo-k8s | api.hanzo.ai | 133 | 5,000 req/s | 100 req/s |
The gateway can also be deployed independently by other organizations with their own configuration.
For full documentation, see docs.hanzo.ai/docs/services/gateway.
```
               Internet
                  |
         +--------+--------+
         | Cloudflare CDN  |
         +--------+--------+
                  |
        +---------+--------+
        |  Hanzo Ingress   |
        | (L7 TLS/routing) |
        +---------+--------+
                  |
        +---------+--------+
        |  Hanzo Gateway   |
        |  133 endpoints   |
        +---+-----+----+---+
            |     |    |
         Cloud   IAM  Commerce
          API          API
```
These endpoints are fully compatible with the OpenAI API format. Point any OpenAI SDK client at https://api.hanzo.ai and it works out of the box.
| Method | Path | Backend | Description |
|---|---|---|---|
| POST | `/v1/chat/completions` | cloud-api:8000 | Chat completions (streaming and non-streaming) |
| POST | `/v1/completions` | cloud-api:8000 | Text completions |
| POST | `/v1/messages` | cloud-api:8000 | Anthropic Messages API compatibility |
| GET | `/v1/models` | cloud-api:8000 | List available models |
| POST | `/v1/embeddings` | cloud-api:8000 | Text embedding generation |
| POST | `/v1/images/generations` | cloud-api:8000 | Image generation |
| POST | `/v1/audio/transcriptions` | cloud-api:8000 | Audio transcription (Whisper) |
| POST | `/v1/audio/speech` | cloud-api:8000 | Text-to-speech synthesis |
| POST | `/v1/zap` | cloud-api:8000 | Hanzo Zap (structured extraction) |
| POST | `/v1/async-invoke` | cloud-api:8000 | Async inference (long-running jobs) |
| GET | `/v1/async-invoke/{id}/status` | cloud-api:8000 | Poll async job status |
| GET | `/v1/async-invoke/{id}` | cloud-api:8000 | Retrieve async job result |
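Because these routes speak the OpenAI wire format, a client needs nothing gateway-specific. The sketch below builds (but does not send) a standard chat-completion request using only Python's standard library; the API key is a placeholder, and the body shape mirrors the curl example in the authentication section.

```python
import json
import urllib.request

def build_chat_request(api_key: str, model: str, content: str) -> urllib.request.Request:
    """Build a standard OpenAI-format chat completion request for the gateway."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": content}],
    }).encode()
    return urllib.request.Request(
        "https://api.hanzo.ai/v1/chat/completions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

req = build_chat_request("hk_live_placeholder", "zen4-pro", "Hello")
# urllib.request.urlopen(req) would perform the call; omitted here.
```

Any OpenAI SDK configured with the gateway as its base URL sends requests of exactly this shape.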
All platform routes are available at both `/{service}/*` and `/v1/{service}/*`.
| Path prefix | Backend | Description |
|---|---|---|
| `/auth/*` | iam:8000 | IAM, authentication, OAuth |
| `/cloud/*` | cloud-api:8000 | Cloud API (projects, deployments) |
| `/commerce/*` | commerce:8001 | Commerce (orders, payments, products) |
| `/analytics/*` | analytics | Unified analytics and events |
| `/billing/*` | billing | Usage metering and invoicing |
| `/console/*` | console | Admin console API |
| `/agents/*` | agents | Agent orchestration |
| `/search/*` | search | AI-powered search |
| `/vector/*` | vector | Vector database operations |
| `/operative/*` | operative | Computer-use automation |
| `/bot/*` | bot | Bot framework (REST + WebSocket) |
| `/kms/*` | kms | Key management service |
| `/platform/*` | platform | PaaS deployment API |
| `/functions/*` | functions | Serverless functions |
| `/web3/*` | web3 | Web3 and blockchain APIs |
| `/pricing/*` | pricing | Model pricing and rate cards |
| `/pricing/model/{name}` | pricing | Single model price lookup |
| Path | Description |
|---|---|
| `/__health` | Gateway health check (port 8080) |
| `/health` | Application health check |
| `/pubsub/healthz` | PubSub health |
| `/pubsub/varz` | PubSub variables / metrics |
| `/pubsub/connz` | PubSub connections |
| `/pubsub/subsz` | PubSub subscriptions |
| `/pubsub/jsz` | PubSub JetStream |
Hanzo Gateway proxies all LLM requests through the Hanzo Cloud API (cloud-api), which handles model routing, load balancing, and provider selection. The gateway itself is provider-agnostic -- it forwards authenticated requests and streams responses back to the client.
```
Client                Gateway              Cloud API            Provider
  |                      |                     |                    |
  |-- POST /v1/chat ---->|                     |                    |
  |   model: "zen4"      |-- forward --------->|                    |
  |                      |                     |-- route to tier -->|
  |                      |                     |    (Fireworks)     |
  |<---- streaming ------|<---- streaming -----|<--- streaming -----|
```
- The client sends a request to `api.hanzo.ai/v1/chat/completions` with a `model` field.
- The gateway forwards the request (with all auth headers) to the Cloud API backend.
- The Cloud API resolves the model name to a provider and endpoint based on the model's tier and availability.
- Responses stream back through the gateway to the client with no buffering.
| Tier | Models (examples) | Provider | Notes |
|---|---|---|---|
| Free | `zen3-nano`, `zen4-mini` | Hanzo DO cluster | Best-effort, rate-limited |
| Standard | `zen4-pro`, `zen3-vl`, `zen4-coder-flash` | Fireworks, Together | Low latency, high availability |
| Premium | `zen4`, `zen4-max`, `zen4-ultra` | Fireworks | Dedicated capacity, highest throughput |
| Third-party | `gpt-4o`, `claude-sonnet-4-20250514`, `gemini-2.5-pro` | OpenAI, Anthropic, Google | Pass-through with unified billing |
The gateway does not need to know about model tiers -- it passes all requests to the Cloud API, which handles routing logic, fallback, and retries. Model availability is returned by `GET /v1/models`.
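For illustration only, the tier table above can be read as a static lookup. The gateway never consults such a table (routing lives in the Cloud API), and the model names below are just the examples from the table:

```python
# Illustrative sketch: maps the example model names from the tier table
# to their tier. Real routing is resolved dynamically by the Cloud API.
MODEL_TIERS = {
    "zen3-nano": "free", "zen4-mini": "free",
    "zen4-pro": "standard", "zen3-vl": "standard", "zen4-coder-flash": "standard",
    "zen4": "premium", "zen4-max": "premium", "zen4-ultra": "premium",
    "gpt-4o": "third-party", "claude-sonnet-4-20250514": "third-party",
    "gemini-2.5-pro": "third-party",
}

def tier_for(model: str) -> str:
    """Return the tier for a model name, or 'unknown' if unlisted."""
    return MODEL_TIERS.get(model, "unknown")
```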
All requests to api.hanzo.ai require a valid API key. Keys are issued through the Hanzo Console and scoped to a project.
Pass your API key in the Authorization header using the Bearer scheme:
```bash
curl https://api.hanzo.ai/v1/chat/completions \
  -H "Authorization: Bearer hk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zen4-pro",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```

```
Client --> Gateway --> Cloud API --> IAM (hanzo.id)
                                      |
                                      Validate key
                                      Resolve org/project
                                      Check rate limits
                                      Return user context
```
The gateway validates bearer JWTs against IAM (hanzo.id) using JWKS and re-emits the three canonical identity headers. Opaque API keys (`hk-`, `sk-`, `fw_`, `hz_`, `pk-`) pass through to the backend services for validation.
After JWT validation, the gateway emits exactly three canonical identity headers to downstream services and strips every other vendor/legacy variant on ingress:
- `X-User-Id` -- user ID from JWT `sub` claim
- `X-Org-Id` -- org slug from JWT `owner` claim
- `X-Roles` -- comma-joined role names from JWT `roles` claim
Auxiliary headers emitted by the gateway (derivatives of the JWT):
- `X-User-Email` -- email from JWT `email` claim
- `X-Phone-Number` -- phone from JWT `phone_number` / `phone` claim
- `X-User-IsAdmin` -- `"true"` when the JWT asserts `isAdmin`
Standard passthrough headers:
- `Authorization` -- Bearer token or API key
- `Content-Type` -- request body encoding
- `Accept` -- response format preference
- `X-Request-ID` -- client-provided request tracing ID
Headers stripped unconditionally on ingress (never trusted from clients): `X-User-Id`, `X-Org-Id`, `X-Roles`, `X-User-Email`, `X-Phone-Number`, `X-User-IsAdmin`, `X-User-Role`, `X-User-Roles`, `X-User-Name`, `X-Tenant-Id`, `X-Org`, and every `X-IAM-*` / `X-HANZO-*` variant.
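The split between JWTs (validated at the gateway) and opaque keys (passed through) can be sketched as a prefix check. The prefixes below are reproduced exactly as listed above; note the curl example elsewhere shows an `hk_live_` key, so the gateway's actual matching is likely more permissive than this sketch:

```python
# Prefixes as listed in this document; real matching logic may differ.
OPAQUE_KEY_PREFIXES = ("hk-", "sk-", "fw_", "hz_", "pk-")

def classify_credential(token: str) -> str:
    """Rough sketch: opaque API keys pass through to backends;
    anything shaped like a JWT is validated against IAM via JWKS."""
    if token.startswith(OPAQUE_KEY_PREFIXES):
        return "opaque-passthrough"
    if token.count(".") == 2:  # header.payload.signature
        return "jwt-validate"
    return "reject"
```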
The gateway enforces rate limits at two levels: global (across all clients) and per-client (by IP address).
```json
{
  "extra_config": {
    "qos/ratelimit/router": {
      "max_rate": 5000,
      "client_max_rate": 100,
      "strategy": "ip"
    }
  }
}
```

| Parameter | Description | Default |
|---|---|---|
| `max_rate` | Total requests/second across all clients | 5,000 |
| `client_max_rate` | Requests/second per client IP | 100 |
| `strategy` | Client identification method | `ip` |
Individual endpoints can override the global limits. This is useful for high-traffic inference routes or sensitive administrative endpoints:
```json
{
  "endpoint": "/v1/chat/completions",
  "method": "POST",
  "extra_config": {
    "qos/ratelimit/router": {
      "max_rate": 10000,
      "client_max_rate": 50,
      "strategy": "ip",
      "every": "1s"
    }
  }
}
```

The `every` field sets the time window for the rate counter. The default is `"1s"` (per second); set it to `"1m"` for per-minute limits.
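To make the two-level semantics concrete, here is a minimal fixed-window counter sketch of `max_rate` (global) plus `client_max_rate` (per IP). The gateway's actual limiter implementation may differ (e.g. token bucket vs fixed window):

```python
import time
from collections import defaultdict

class TwoLevelRateLimiter:
    """Fixed-window sketch: a global cap and a per-client cap per window."""
    def __init__(self, max_rate: int, client_max_rate: int, every: float = 1.0):
        self.max_rate = max_rate
        self.client_max_rate = client_max_rate
        self.every = every
        self.window_start = time.monotonic()
        self.global_count = 0
        self.per_client = defaultdict(int)

    def allow(self, client_ip: str) -> bool:
        now = time.monotonic()
        if now - self.window_start >= self.every:
            # New window: reset all counters.
            self.window_start = now
            self.global_count = 0
            self.per_client.clear()
        if self.global_count >= self.max_rate:
            return False  # global cap hit -> 429 for everyone
        if self.per_client[client_ip] >= self.client_max_rate:
            return False  # this client's cap hit -> 429 for this IP only
        self.global_count += 1
        self.per_client[client_ip] += 1
        return True
```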
When a client exceeds their rate limit, the gateway returns:
```http
HTTP/1.1 429 Too Many Requests
Content-Type: application/json

{"message": "rate limit exceeded"}
```
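Clients should back off and retry on 429. A hedged sketch with capped exponential backoff (the response above documents no `Retry-After` header, so none is assumed; the 8-second cap is arbitrary):

```python
import time

def with_backoff(send, max_attempts: int = 5, base_delay: float = 0.5, sleep=time.sleep):
    """Retry `send()` while it returns HTTP 429, doubling the delay each time.
    `send` is any callable returning an object with a `.status` attribute."""
    for attempt in range(max_attempts):
        resp = send()
        if resp.status != 429:
            return resp
        sleep(min(base_delay * (2 ** attempt), 8.0))
    return resp  # give up; caller sees the final 429
```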
Structured logging is enabled by default with the `[GATEWAY]` prefix:
```json
{
  "extra_config": {
    "telemetry/logging": {
      "level": "INFO",
      "prefix": "[GATEWAY]",
      "syslog": false,
      "stdout": true
    }
  }
}
```

Log levels: `DEBUG`, `INFO`, `WARNING`, `ERROR`, `CRITICAL`.
```bash
# Gateway health (always returns 200 when the process is up)
curl http://localhost:8080/__health

# Application health (checks backend connectivity)
curl https://api.hanzo.ai/health
```

The gateway exposes Prometheus-compatible metrics for scraping. Key metrics include:
- Request count by endpoint and status code
- Response latency histograms
- Backend connection pool utilization
- Circuit breaker state transitions
- Rate limiter rejection counts
Backend failures are automatically isolated. When a backend exceeds the error threshold, the circuit opens and requests are rejected immediately until the backend recovers. This prevents cascade failures across services.
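A minimal sketch of the open/half-open/closed behavior described above. The threshold and recovery timings are illustrative, not the gateway's actual values:

```python
import time

class CircuitBreaker:
    """Opens after `threshold` consecutive failures; after `reset_after`
    seconds it lets a trial request through (half-open)."""
    def __init__(self, threshold: int = 5, reset_after: float = 30.0, clock=time.monotonic):
        self.threshold = threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow_request(self) -> bool:
        if self.opened_at is None:
            return True  # closed: traffic flows normally
        # Open: reject immediately until the recovery window elapses.
        return self.clock() - self.opened_at >= self.reset_after

    def record_success(self):
        self.failures = 0
        self.opened_at = None  # close the circuit

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = self.clock()  # trip the circuit
```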
```bash
# Build gateway binary
make build

# Build ingress sidecar binary
make build-ingress

# Run tests
make test

# Validate all configs
make validate
```

```bash
# Run with default config
./gateway run -c configs/hanzo/gateway.json
```

```bash
# Pull and run the latest image
docker run -p 8080:8080 ghcr.io/hanzoai/gateway:latest

# Build from source
make docker
```

```yaml
services:
  gateway:
    image: ghcr.io/hanzoai/gateway:latest
    ports:
      - "8080:8080"
    volumes:
      - ./configs/hanzo/gateway.json:/etc/gateway/gateway.json:ro
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost:8080/__health"]
      interval: 10s
      timeout: 3s
      retries: 3
    restart: unless-stopped
```

Save as `compose.yml` and run:

```bash
docker compose up -d
```

Hanzo Gateway runs on the hanzo-k8s DOKS cluster (`do-sfo3-hanzo-k8s`) in the `hanzo` namespace. Continuous deployment is handled by GitHub Actions -- every push to `main` builds a new image, applies the ConfigMap, and performs a rolling restart.
```bash
# Apply config and restart pods
make deploy-hanzo

# Check status
make status

# Tail logs
make logs-hanzo
```

| Property | Value |
|---|---|
| Image | `ghcr.io/hanzoai/gateway:latest` |
| Replicas | 2 |
| Service type | ClusterIP (behind Ingress) |
| Namespace | `hanzo` |
| K8s context | `do-sfo3-hanzo-k8s` |
| Health check | `GET /__health` :8080 |
| CI/CD | GitHub Actions (`deploy.yml`) |
```
k8s/
  hanzo/
    deployment.yaml   # Gateway deployment (2 replicas)
    service.yaml      # ClusterIP service
    ingress.yaml      # Ingress resource for api.hanzo.ai
```
All routing is defined in JSON configuration files. Each cluster has its own config.
1. Edit the config file:

   ```bash
   $EDITOR configs/hanzo/gateway.json
   ```

2. Validate the config:

   ```bash
   make validate
   ```

3. Deploy:

   ```bash
   make deploy-hanzo
   ```
The Makefile creates a ConfigMap from the JSON file and triggers a rolling restart.
```json
{
  "version": 3,
  "name": "Hanzo API Gateway",
  "port": 8080,
  "timeout": "120s",
  "extra_config": {
    "router": {
      "return_error_msg": true
    },
    "qos/ratelimit/router": {
      "max_rate": 5000,
      "client_max_rate": 100,
      "strategy": "ip"
    },
    "telemetry/logging": {
      "level": "INFO",
      "prefix": "[GATEWAY]",
      "stdout": true
    }
  },
  "endpoints": [
    {
      "endpoint": "/v1/chat/completions",
      "method": "POST",
      "input_headers": ["*"],
      "output_encoding": "no-op",
      "backend": [{
        "url_pattern": "/api/chat/completions",
        "host": ["http://cloud-api.hanzo.svc.cluster.local:8000"],
        "encoding": "no-op"
      }]
    }
  ]
}
```

```
configs/
  hanzo/
    gateway.json   # Hanzo API Gateway config (133 endpoints)
    ingress.json   # Hanzo Ingress sidecar config
k8s/
  hanzo/           # K8s manifests for hanzo-k8s cluster
cmd/
  gateway/         # Gateway binary entry point
  ingress/         # Ingress sidecar binary entry point
tests/             # Integration tests
Dockerfile         # Multi-stage build (Go 1.25 + Alpine 3.23)
Makefile           # Build, test, validate, deploy commands
```
| Domain | Via | Target |
|---|---|---|
| `*.hanzo.ai` | Cloudflare | hanzo-k8s LB -> Ingress -> Gateway |
Hanzo Gateway is one of four products in the Hanzo AI infrastructure stack:
| Product | Role | Repository |
|---|---|---|
| Hanzo Ingress | L7 reverse proxy, TLS termination, load balancing | hanzoai/ingress |
| Hanzo Gateway | API gateway, rate limiting, endpoint routing | hanzoai/gateway |
| Hanzo Engine | GPU inference engine, model serving | hanzoai/engine |
| Hanzo Edge | On-device inference runtime (mobile, web, embedded) | hanzoai/edge |
```
Internet -> Ingress (TLS/L7) -> Gateway (API routing) -> Engine (inference) / Cloud API / Services
                                                         Edge (on-device, client-side)
```
See also:
- Hanzo Cloud API -- Backend that handles model routing and provider selection
- Hanzo LLM Gateway -- Unified proxy for 100+ LLM providers (used by Cloud API)
- Hanzo MCP -- Model Context Protocol tools (260+ tools)
- Hanzo SDK (Python) -- Python client library
- Hanzo SDK (JS) -- TypeScript client library
MIT -- see LICENSE.