Nenya AI Gateway

A lightweight, zero-dependency AI API Gateway written in Go. Nenya sits between your AI coding clients and upstream LLM providers, adding secret redaction, context management, agent routing, and MCP tool integration — all with transparent SSE streaming. Security-hardened: non-root execution, mlock for secrets, seccomp + no-new-privileges.

Compatible with any provider that implements the OpenAI Chat Completions API or the Anthropic Messages API. Built-in adapters with specialized handling ship for 23 providers.

How Nenya handles requests

+----------------------------------------------+
| Client (Cursor / OpenCode / Aider / etc.)    |
| OpenAI-compatible request                    |
| POST /v1/chat/completions + Bearer token     |
| or                                           |
| Anthropic Messages API request               |
| POST /v1/messages + x-api-key                |
+----------------------------------------------+
                        |
                        v
+----------------------------------------------+
| Nenya Gateway                                |
| - auth check + RBAC enforcement              |
| - parse JSON + extract model                 |
| - resolve agent/provider                     |
| - optional cache (HIT => replay SSE)         |
| - optional MCP context/tool injection        |
+----------------------------------------------+
                        |
                        v
+----------------------------------------------+
| Interceptor Chain (pluggable, best-effort)   |
| - RedactInterceptor  (regex patterns)        |
| - EntropyInterceptor (high-entropy strings)  |
| - TFIDFInterceptor   (relevance scoring)     |
| - BouncerInterceptor (engine summarization)  |
+----------------------------------------------+
                        |
                        v
+----------------------------------------------+
| Token Budget Trimming (if payload > hard     |
| limit) drops oldest non-system messages and  |
| applies token-aware middle-out truncation    |
+----------------------------------------------+
                        |
                        v
+----------------------------------------------+
| Routing                                      |
|  A) Standard forwarding                      |
|     - fallback chain + circuit breaker + RL  |
|  B) MCP multi-turn tool loop (if enabled)    |
|     - buffer SSE, execute MCP tools, re-send |
|  C) Context-limit retry                      |
|     - on upstream 413/context_exceeded,      |
|       summarize payload, retry with fallback |
+----------------------------------------------+
                        |
                        v
+----------------------------------------------+
| Upstream LLM Providers                       |
| Anthropic | Gemini | DeepSeek | Mistral | ...|
+----------------------------------------------+
                        |
                        |  SSE stream
                        v
+----------------------------------------------+
| Nenya SSE Pipeline                           |
| - adapter response transforms                |
| - (optional) OpenAI→Anthropic conversion     |
| - usage accounting + stream filter           |
| - flush + (optional) cache capture           |
| - (optional) MCP auto-save                   |
+----------------------------------------------+
                        |
                        v
+----------------------------------------------+
| Client receives transparent SSE output       |
+----------------------------------------------+

Flow notes:

/v1/* endpoints require client bearer auth; /healthz, /statsz, /metrics do not.
Pipeline failures degrade gracefully and forward the request instead of returning a 500.
MCP-enabled agents can run local/remote tools without exposing MCP complexity to the client.
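
The best-effort behavior of the interceptor chain can be illustrated with a minimal sketch. The `Interceptor` interface, `RunChain`, and `demoChain` below are hypothetical names chosen for illustration, not Nenya's actual internal API. A stage that fails is skipped and the payload from the last successful stage is forwarded.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Interceptor is a hypothetical interface for one pipeline stage.
type Interceptor interface {
	Name() string
	Process(payload string) (string, error)
}

// funcInterceptor adapts a plain function to the Interceptor interface.
type funcInterceptor struct {
	name string
	fn   func(string) (string, error)
}

func (f funcInterceptor) Name() string                     { return f.name }
func (f funcInterceptor) Process(p string) (string, error) { return f.fn(p) }

// RunChain applies each interceptor in order. A failing stage is skipped and
// its name recorded; the request is never rejected because a stage failed.
func RunChain(payload string, chain []Interceptor) (string, []string) {
	var skipped []string
	for _, ic := range chain {
		out, err := ic.Process(payload)
		if err != nil {
			skipped = append(skipped, ic.Name())
			continue // keep the payload produced by the last successful stage
		}
		payload = out
	}
	return payload, skipped
}

// demoChain builds a two-stage chain: a working redactor and a failing summarizer.
func demoChain() []Interceptor {
	return []Interceptor{
		funcInterceptor{"redact", func(p string) (string, error) {
			return strings.ReplaceAll(p, "hunter2", "[REDACTED]"), nil
		}},
		funcInterceptor{"summarize", func(p string) (string, error) {
			return "", errors.New("engine unavailable")
		}},
	}
}

func main() {
	out, skipped := RunChain("password is hunter2", demoChain())
	fmt.Println("forwarded payload:", out)
	fmt.Println("skipped stages:", skipped)
}
```

In this sketch the redaction still applies even though the summarization stage fails, which mirrors the "degrade gracefully and forward" note above.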

Features

Routing & Agents

Config-driven provider registry — add providers via JSON, zero code changes
23 built-in providers with specialized adapters for wire format differences
Dynamic model discovery — fetches live model catalogs from providers at startup and on reload
Model registry — reference models by string shorthand with automatic provider/context resolution
Multi-provider model resolution — when a model exists in multiple providers, all are added to the agent's fallback chain
Three-tier model resolution — config overrides > discovered models > static registry
Per-model wire format — models from multi-format gateways (OpenCode Zen) auto-convert between OpenAI, Anthropic, and Gemini wire formats based on the model's format attribute
Agent fallback chains — round-robin or sequential with circuit breaker and automatic failover
Latency-aware routing — auto-reorder targets by historical median response time with ±5% jitter to prevent thundering herd
Per-agent system prompts — inline or file-based
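
As an illustration of latency-aware routing, the sketch below orders targets by median response time after applying a random ±5% factor. The `Target` type, `Median`, and `OrderByLatency` are assumptions made for this example and may differ from how Nenya tracks latency internally. In particular, a target with no history gets a median of 0 here and therefore sorts first.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// Target is a hypothetical record of recent response times for one upstream.
type Target struct {
	Name        string
	LatenciesMs []float64
}

// Median returns the median of xs without modifying it (0 for an empty slice).
func Median(xs []float64) float64 {
	if len(xs) == 0 {
		return 0
	}
	s := append([]float64(nil), xs...)
	sort.Float64s(s)
	mid := len(s) / 2
	if len(s)%2 == 1 {
		return s[mid]
	}
	return (s[mid-1] + s[mid]) / 2
}

// OrderByLatency returns target names sorted by jittered median latency.
// rnd must return values in [0, 1); jitterFrac of 0.05 means roughly ±5%.
func OrderByLatency(targets []Target, jitterFrac float64, rnd func() float64) []string {
	type scored struct {
		name  string
		score float64
	}
	all := make([]scored, 0, len(targets))
	for _, t := range targets {
		factor := 1 + jitterFrac*(2*rnd()-1) // in [1-jitterFrac, 1+jitterFrac)
		all = append(all, scored{t.Name, Median(t.LatenciesMs) * factor})
	}
	sort.SliceStable(all, func(i, j int) bool { return all[i].score < all[j].score })

	names := make([]string, len(all))
	for i, s := range all {
		names[i] = s.name
	}
	return names
}

func main() {
	targets := []Target{
		{Name: "provider-a", LatenciesMs: []float64{100, 120, 110}},
		{Name: "provider-b", LatenciesMs: []float64{50, 60, 70}},
		{Name: "provider-c", LatenciesMs: []float64{200}},
	}
	fmt.Println(OrderByLatency(targets, 0.05, rand.Float64))
}
```

The jitter only changes the order of targets whose medians are within a few percent of each other, so near-equal upstreams share load instead of every request piling onto the single fastest one.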

Security & Privacy

Tier-0 regex secret filter — always-on redaction of AWS keys, GitHub tokens, passwords, etc.
3-Tier content pipeline — pluggable interceptor chain: regex redaction, entropy filtering, TF-IDF relevance scoring, engine summarization
Context window compaction — sliding window summarization with configurable engine
Stale tool call pruning — compact old assistant+tool response pairs to save tokens
Thought pruning — strip reasoning blocks from assistant message history
Input validation — strict body limits, JSON sanitization, header filtering
Graceful degradation — never blocks requests due to engine or pipeline failures
Role-Based Access Control (RBAC) — per-API key roles (admin, user, read-only) with agent and endpoint restrictions
Secure memory — mlock-protected token storage, read-only sealing, core dump prevention
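
To make the entropy filtering step more concrete, here is a small sketch that computes Shannon entropy per whitespace-separated token and masks long, high-entropy tokens. The function names, the minimum length, and the threshold are illustrative assumptions; the actual EntropyInterceptor may tokenize and score differently.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// ShannonEntropy returns the entropy of s in bits per character.
func ShannonEntropy(s string) float64 {
	counts := make(map[rune]int)
	n := 0
	for _, r := range s {
		counts[r]++
		n++
	}
	if n == 0 {
		return 0
	}
	var h float64
	for _, c := range counts {
		p := float64(c) / float64(n)
		h -= p * math.Log2(p)
	}
	return h
}

// RedactHighEntropy masks tokens that are at least minLen bytes long and whose
// entropy meets the threshold. Note that strings.Fields collapses runs of
// whitespace, so the original spacing is not preserved in this sketch.
func RedactHighEntropy(text string, minLen int, threshold float64) string {
	fields := strings.Fields(text)
	for i, f := range fields {
		if len(f) >= minLen && ShannonEntropy(f) >= threshold {
			fields[i] = "[REDACTED]"
		}
	}
	return strings.Join(fields, " ")
}

func main() {
	fmt.Println(RedactHighEntropy("token 9fK2xLq8ZpW4mT7v here", 16, 3.5))
}
```

Ordinary words are short and repetitive, so they fall below both limits, while random-looking keys tend to be long with many distinct characters.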

Hardening (Deployment Security)

Secure memory (default): All tokens stored in mlock-protected RAM, sealed read-only after init, core dumps disabled
Non-root execution — runs as UID 65532 with dropped capabilities
Memory protection — LimitMEMLOCK=infinity and LimitCORE=0 in systemd
Read-only filesystem — immutable root + private /tmp
Seccomp + no-new-privileges — restricted syscalls, prevents privilege escalation
Zero-trust secrets — loaded via systemd credentials or container mounts, never to disk
Socket activation — seamless restarts with zero dropped connections

Reliability

Zero external dependencies — Go standard library only
Hot reload — systemctl reload nenya for zero-downtime config changes
Circuit breaker — per agent+provider+model with automatic failover, exponential backoff, and semantic error classification
Rate limiting — per upstream host (RPM/TPM) with per-provider overrides
Response cache — in-memory LRU with SHA-256 fingerprinting and optional semantic similarity search
Graceful shutdown — 5s grace period for in-flight requests, MCP client cleanup
Context-limit auto-retry — upstream context-length errors trigger summarization and retry
Local engine lifecycle — pre-load and manage local Ollama models with LRU eviction
Structured errors — all error responses include error_kind field for programmatic diagnostics
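
The response cache idea (an in-memory LRU keyed by a SHA-256 fingerprint) can be sketched as follows. `Fingerprint`, `LRU`, and the choice of hashed fields are assumptions for illustration; the real cache may hash different request fields and is presumably safe for concurrent use, which this sketch is not.

```go
package main

import (
	"container/list"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Fingerprint hashes the model name and message contents into a cache key.
// Each field is length-prefixed so ("ab","c") and ("a","bc") hash differently.
func Fingerprint(model string, messages []string) string {
	h := sha256.New()
	fmt.Fprintf(h, "%d:%s", len(model), model)
	for _, m := range messages {
		fmt.Fprintf(h, "|%d:%s", len(m), m)
	}
	return hex.EncodeToString(h.Sum(nil))
}

type entry struct {
	key, value string
}

// LRU is a minimal least-recently-used cache (not safe for concurrent use).
type LRU struct {
	capacity int
	order    *list.List // front = most recently used
	items    map[string]*list.Element
}

func NewLRU(capacity int) *LRU {
	return &LRU{capacity: capacity, order: list.New(), items: make(map[string]*list.Element)}
}

// Get returns the cached value and marks it as recently used.
func (c *LRU) Get(key string) (string, bool) {
	el, ok := c.items[key]
	if !ok {
		return "", false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).value, true
}

// Put stores a value and evicts the least recently used entry when over capacity.
func (c *LRU) Put(key, value string) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).value = value
		c.order.MoveToFront(el)
		return
	}
	c.items[key] = c.order.PushFront(&entry{key, value})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

func main() {
	cache := NewLRU(2)
	key := Fingerprint("gemini-2.5-flash", []string{"system: be brief", "user: hello"})
	cache.Put(key, "cached SSE body")
	v, hit := cache.Get(key)
	fmt.Println(hit, v)
}
```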

MCP Tool Integration

Tool discovery — connect to MCP servers for automatic tool injection
Multi-turn execution — intercept tool calls, execute against MCP servers, forward results
Auto-search — pre-fetch relevant context from MCP servers before forwarding
Auto-save — persist assistant responses to MCP memory servers
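
The multi-turn execution loop can be pictured roughly as below. `ToolCall`, `Reply`, and `RunToolLoop` are simplified, hypothetical shapes; the real loop works on buffered SSE streams and MCP client sessions, which this sketch replaces with plain function arguments.

```go
package main

import "fmt"

// ToolCall and Reply are hypothetical, simplified shapes for this example.
type ToolCall struct {
	Name string
	Args string
}

type Reply struct {
	Text      string
	ToolCalls []ToolCall
}

// RunToolLoop calls the model, executes any requested tools, appends the
// results to the transcript, and re-sends. It stops when the model answers
// without tool calls (ok = true) or when maxTurns is exhausted (ok = false).
func RunToolLoop(
	transcript []string,
	model func([]string) Reply,
	tool func(ToolCall) string,
	maxTurns int,
) (answer string, final []string, ok bool) {
	for turn := 0; turn < maxTurns; turn++ {
		reply := model(transcript)
		if len(reply.ToolCalls) == 0 {
			return reply.Text, transcript, true
		}
		for _, tc := range reply.ToolCalls {
			transcript = append(transcript, "tool:"+tc.Name+"="+tool(tc))
		}
	}
	return "", transcript, false
}

func main() {
	// Fake model: asks for one tool call, then answers once it sees the result.
	model := func(t []string) Reply {
		if len(t) == 1 {
			return Reply{ToolCalls: []ToolCall{{Name: "search", Args: "nenya"}}}
		}
		return Reply{Text: "done"}
	}
	tool := func(tc ToolCall) string { return "result for " + tc.Args }

	answer, transcript, ok := RunToolLoop([]string{"user: find nenya"}, model, tool, 4)
	fmt.Println(ok, answer, transcript)
}
```

Bounding the loop with a turn limit keeps a model that keeps requesting tools from holding the client connection open indefinitely.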

Quick Start

Run with Podman

Create minimal config and secrets:

mkdir -p config secrets
cat > config/config.json << 'EOF'
{
  "server": { "listen_addr": ":8080" },
  "agents": {
    "default": {
      "strategy": "fallback",
      "models": ["gemini-2.5-flash"]
    }
  }
}
EOF

cat > secrets/provider_keys.json << 'EOF'
{
  "provider_keys": {
    "gemini": "AIza..."
  }
}
EOF

# Unquoted EOF so the shell expands $(openssl ...) into a random token
cat > secrets/client.json << EOF
{
  "client_token": "nk-$(openssl rand -hex 32)"
}
EOF

Run the container:

podman run -d \
  --name nenya \
  -p 8080:8080 \
  -v ./config:/etc/nenya:ro \
  -v ./secrets:/run/secrets/nenya:ro \
  -e NENYA_SECRETS_DIR=/run/secrets/nenya \
  --cap-drop=ALL \
  --cap-add=IPC_LOCK \
  --security-opt=no-new-privileges:true \
  --read-only \
  --tmpfs /tmp:rw,noexec,nosuid,size=64M \
  ghcr.io/gumieri/nenya:latest

Test it:

curl -H "Authorization: Bearer $(jq -r '.client_token' secrets/client.json)" \
  http://localhost:8080/healthz

Or Install via Package Manager

Nenya provides native packages for major Linux distributions and community package managers:

Distribution	Command
Debian/Ubuntu (.deb)	Download `nenya_<version>_linux_amd64.deb` from the release page and run `sudo dpkg -i`
Fedora/RHEL (.rpm)	Download `nenya-<version>.x86_64.rpm` from the release page and run `sudo rpm -i`
Arch Linux (.pkg.tar.zst)	Download `nenya-<version>-x86_64.pkg.tar.zst` from the release page and run `sudo pacman -U`
Arch Linux (AUR)	`yay -S nenya-bin` (or your preferred AUR helper)
Nix/NixOS	Add `gumieri/nur-packages` to your NUR registry and use `nenya`

All packages install the binary to /usr/bin/nenya and include systemd service and socket units. After install, enable and start:

sudo systemctl enable --now nenya.socket
sudo systemctl enable --now nenya.service

Runtime Configuration

Nenya supports standard environment variables for deployment portability:

Variable	Default	Description
`PORT`	`8080`	Listening port (overrides `server.listen_addr`)
`HOST`	—	Optional bind address (e.g. `127.0.0.1`). Only used when combined with `PORT`
`NENYA_CONFIG_DIR`	`/etc/nenya/`	Configuration directory path
`NENYA_CONFIG_FILE`	—	Single config file path (takes precedence over `NENYA_CONFIG_DIR`)
`NENYA_SECRETS_DIR`	—	Secrets directory (overrides `CREDENTIALS_DIRECTORY`)

Example usage:

PORT=9090 HOST=127.0.0.1 ./nenya --config /path/to/config.json

Or in Docker:

docker run -e PORT=9090 -p 9090:9090 ghcr.io/gumieri/nenya:latest
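
The precedence between `PORT`, `HOST`, and `server.listen_addr` described in the table can be sketched like this. `ResolveListenAddr` is a hypothetical helper written for this example, and the `:8080` fallback for an empty config value is an assumption based on the documented default.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// ResolveListenAddr picks the listen address from the config value and the
// PORT/HOST environment values, following the rules in the table above.
func ResolveListenAddr(configAddr, host, port string) string {
	if port == "" {
		// HOST alone is ignored; fall back to the configured address.
		if configAddr == "" {
			return ":8080" // assumed default when nothing is configured
		}
		return configAddr
	}
	// PORT overrides server.listen_addr; an empty host yields ":PORT".
	return net.JoinHostPort(host, port)
}

func main() {
	addr := ResolveListenAddr(":8080", os.Getenv("HOST"), os.Getenv("PORT"))
	fmt.Println("listening on", addr)
}
```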

Or Choose Your Deployment

Deploy Bare Metal (systemd) — Direct binary install, socket activation, hot reload
Deploy Container (Podman/Docker Compose) — compose.yml, image verification, security hardening
Deploy Kubernetes (Helm) — Helm chart, ConfigMap/Secret, ingress setup

API Endpoints

All /v1/* endpoints require Authorization: Bearer <client_token> or Bearer <api_key_token>. API keys support RBAC enforcement — agent scoping, endpoint allowlists, role-based permissions (admin bypasses all checks).

Endpoint	Auth	Description
`POST /v1/chat/completions`	Bearer + RBAC	OpenAI-compatible chat with SSE streaming, agent fallback, MCP multi-turn
`POST /v1/messages`	Bearer + RBAC	Anthropic Messages API with bidirectional format conversion
`GET /v1/models`	Bearer + RBAC	Live model catalog from discovered providers + static registry (context window, max tokens)
`POST /v1/embeddings`	Bearer + RBAC	Passthrough proxy
`POST /v1/responses`	Bearer + RBAC	Passthrough proxy
`POST /v1/images/generations`	Bearer + RBAC	Image generation (OpenAI-compatible)
`POST /v1/audio/transcriptions`	Bearer + RBAC	Audio transcription (Whisper-compatible, multipart support)
`POST /v1/audio/speech`	Bearer + RBAC	Text-to-speech synthesis (OpenAI-compatible)
`POST /v1/moderations`	Bearer + RBAC	Content moderation (OpenAI-compatible)
`POST /v1/rerank`	Bearer + RBAC	Re-ranking API (Cohere/Jina/Voyage-compatible)
`POST /v1/a2a`	Bearer + RBAC	Agent-to-Agent protocol (Google A2A)
`GET /v1/files`	Bearer + RBAC	File listing, upload, retrieval, deletion
`POST /v1/batches`	Bearer + RBAC	Batch API operations
`POST /proxy/{provider}/*`	Bearer + RBAC	Arbitrary provider endpoint passthrough (all HTTP methods, SSE streaming)
`GET /healthz`	None	Engine health probe
`GET /statsz`	None	Token usage, circuit breaker state, MCP server status
`GET /metrics`	None	Prometheus-compatible metrics
`GET /debug/pprof/*`	Bearer	Go profiling endpoints (disabled by default, see `debug.pprof_enabled`)

See docs/PASSTHROUGH_PROXY.md for detailed passthrough proxy usage.
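
For clients written in Go, a request to the chat endpoint might be built as in the sketch below. `NewChatRequest` and the `NENYA_TOKEN` environment variable are hypothetical names for this example, and the model name is taken from the Quick Start config. The request body follows the general OpenAI Chat Completions shape rather than anything specific to Nenya.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"os"
)

type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
	Stream   bool          `json:"stream"`
}

// NewChatRequest builds a streaming POST /v1/chat/completions request
// authenticated with a bearer token.
func NewChatRequest(baseURL, token, model, prompt string) (*http.Request, error) {
	body, err := json.Marshal(chatRequest{
		Model:    model,
		Messages: []chatMessage{{Role: "user", Content: prompt}},
		Stream:   true,
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// NENYA_TOKEN is an assumed variable name holding the client token.
	req, err := NewChatRequest("http://localhost:8080", os.Getenv("NENYA_TOKEN"), "gemini-2.5-flash", "Hello")
	if err != nil {
		fmt.Fprintln(os.Stderr, "build request:", err)
		os.Exit(1)
	}
	// Pass req to http.DefaultClient.Do and read the SSE stream from resp.Body.
	fmt.Println(req.Method, req.URL.String())
}
```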

Documentation

Document	Description
Providers	All 23 providers, capabilities matrix, special behaviors, adding custom providers
Configuration	Full config reference, directory mode, all sections and fields
Deploy Bare Metal	Systemd unit, config.d layout, secrets, hot reload
Deploy Container	Podman/Docker Compose, image verification, security notes
Deploy Kubernetes	Helm chart usage, ConfigMap/Secret, ingress setup
Passthrough Proxy	Raw provider endpoint proxying, SSE streaming, auth injection
Architecture	Package DAG, request lifecycle, circuit breaker, SSE pipeline
MCP Integration	MCP server integration, tool discovery, multi-turn execution
Adapters	Adapter system internals, auth styles, capability flags
Secrets Format	Systemd credentials, env var fallback, container/K8s deployment
Security	Vulnerability reporting policy

License

Apache 2.0. See LICENSE.
