A lightweight, zero-dependency AI API Gateway written in Go. Nenya sits between your AI coding clients and upstream LLM providers, adding secret redaction, context management, agent routing, and MCP tool integration — all with transparent SSE streaming. Security-hardened: non-root execution, mlock for secrets, seccomp + no-new-privileges.
Compatible with any provider that implements the OpenAI Chat Completions API or the Anthropic Messages API. Built-in adapters with specialized handling ship for 23 providers.
+----------------------------------------------+
| Client (Cursor / OpenCode / Aider / etc.) |
| OpenAI-compatible request |
| POST /v1/chat/completions + Bearer token |
| or |
| Anthropic Messages API request |
| POST /v1/messages + x-api-key |
+----------------------------------------------+
|
v
+----------------------------------------------+
| Nenya Gateway |
| - auth check + RBAC enforcement |
| - parse JSON + extract model |
| - resolve agent/provider |
| - optional cache (HIT => replay SSE) |
| - optional MCP context/tool injection |
+----------------------------------------------+
|
v
+----------------------------------------------+
| Interceptor Chain (pluggable, best-effort) |
| - RedactInterceptor (regex patterns) |
| - EntropyInterceptor (high-entropy strings) |
| - TFIDFInterceptor (relevance scoring) |
| - BouncerInterceptor (engine summarization) |
+----------------------------------------------+
|
v
+----------------------------------------------+
| Token Budget Trimming (if payload > hard |
| limit) drops oldest non-system messages and |
| applies token-aware middle-out truncation |
+----------------------------------------------+
|
v
+----------------------------------------------+
| Routing |
| A) Standard forwarding |
| - fallback chain + circuit breaker + RL |
| B) MCP multi-turn tool loop (if enabled) |
| - buffer SSE, execute MCP tools, re-send |
| C) Context-limit retry |
| - on upstream 413/context_exceeded, |
| summarize payload, retry with fallback |
+----------------------------------------------+
|
v
+----------------------------------------------+
| Upstream LLM Providers |
| Anthropic | Gemini | DeepSeek | Mistral | ...|
+----------------------------------------------+
|
| SSE stream
v
+----------------------------------------------+
| Nenya SSE Pipeline |
| - adapter response transforms |
| - (optional) OpenAI→Anthropic conversion |
| - usage accounting + stream filter |
| - flush + (optional) cache capture |
| - (optional) MCP auto-save |
+----------------------------------------------+
|
v
+----------------------------------------------+
| Client receives transparent SSE output |
+----------------------------------------------+
Flow notes:
- `/v1/*` endpoints require client bearer auth; `/healthz`, `/statsz`, and `/metrics` do not.
- Pipeline failures degrade gracefully and forward the request instead of returning a 500.
- MCP-enabled agents can run local/remote tools without exposing MCP complexity to the client.
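
The best-effort behaviour of the interceptor chain can be sketched roughly as follows. This is a minimal illustration, not Nenya's actual code: the `Interceptor` interface, the two stage types, `defaultChain`, and `runChain` are hypothetical names, and the AWS-style key pattern is only an example.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// Interceptor is a hypothetical interface for one pipeline stage.
type Interceptor interface {
	Name() string
	Apply(text string) (string, error)
}

// redactInterceptor masks every match of its pattern.
type redactInterceptor struct{ pattern *regexp.Regexp }

func (r redactInterceptor) Name() string { return "redact" }
func (r redactInterceptor) Apply(text string) (string, error) {
	return r.pattern.ReplaceAllString(text, "[REDACTED]"), nil
}

// failingInterceptor simulates a stage whose backing engine is unavailable.
type failingInterceptor struct{}

func (failingInterceptor) Name() string { return "summarize" }
func (failingInterceptor) Apply(string) (string, error) {
	return "", errors.New("engine unavailable")
}

// defaultChain builds an example chain: one working stage, one failing stage.
func defaultChain() []Interceptor {
	return []Interceptor{
		redactInterceptor{regexp.MustCompile(`AKIA[0-9A-Z]{16}`)},
		failingInterceptor{},
	}
}

// runChain applies each stage in order. A failing stage is skipped and its
// name recorded, so the caller can still forward the last good text.
func runChain(text string, chain []Interceptor) (string, []string) {
	var skipped []string
	for _, ic := range chain {
		out, err := ic.Apply(text)
		if err != nil {
			skipped = append(skipped, ic.Name())
			continue
		}
		text = out
	}
	return text, skipped
}

func main() {
	out, skipped := runChain("key=AKIAABCDEFGHIJKLMNOP", defaultChain())
	fmt.Println(out, skipped)
}
```

In this sketch a stage error never aborts the request; it only shows up in the list of skipped stages, which a gateway could log or export as a metric.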
- Config-driven provider registry — add providers via JSON, zero code changes
- 23 built-in providers with specialized adapters for wire format differences
- Dynamic model discovery — fetches live model catalogs from providers at startup and on reload
- Model registry — reference models by string shorthand with automatic provider/context resolution
- Multi-provider model resolution — when a model exists in multiple providers, all are added to the agent's fallback chain
- Three-tier model resolution — config overrides > discovered models > static registry
- Per-model wire format — models from multi-format gateways (OpenCode Zen) auto-convert between OpenAI, Anthropic, and Gemini wire formats based on the model's `format` attribute
- Agent fallback chains — round-robin or sequential with circuit breaker and automatic failover
- Latency-aware routing — auto-reorder targets by historical median response time with ±5% jitter to prevent thundering herd
- Per-agent system prompts — inline or file-based
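
Latency-aware routing with jitter might look something like the sketch below. It assumes each target keeps a slice of recent response times; the `target` type and the `median`, `jitter`, `orderByLatency`, `demoTargets`, and `newRNG` helpers are hypothetical and not taken from Nenya's source.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
	"time"
)

// target is a hypothetical routing candidate with recent latency samples.
type target struct {
	Name    string
	Samples []time.Duration
}

// median returns the median of samples, or 0 when there are none.
func median(samples []time.Duration) time.Duration {
	if len(samples) == 0 {
		return 0
	}
	s := append([]time.Duration(nil), samples...) // copy; leave the input untouched
	sort.Slice(s, func(i, j int) bool { return s[i] < s[j] })
	mid := len(s) / 2
	if len(s)%2 == 1 {
		return s[mid]
	}
	return (s[mid-1] + s[mid]) / 2
}

// jitter scales d by a random factor in [0.95, 1.05).
func jitter(d time.Duration, rng *rand.Rand) time.Duration {
	factor := 0.95 + 0.10*rng.Float64()
	return time.Duration(float64(d) * factor)
}

// orderByLatency returns the targets sorted by jittered median latency.
// Targets without samples score 0 and therefore sort first in this sketch.
func orderByLatency(targets []target, rng *rand.Rand) []target {
	type scored struct {
		t     target
		score time.Duration
	}
	list := make([]scored, len(targets))
	for i, t := range targets {
		list[i] = scored{t, jitter(median(t.Samples), rng)}
	}
	sort.SliceStable(list, func(i, j int) bool { return list[i].score < list[j].score })
	out := make([]target, len(list))
	for i, s := range list {
		out[i] = s.t
	}
	return out
}

// newRNG returns a seeded generator so runs can be reproduced.
func newRNG(seed int64) *rand.Rand { return rand.New(rand.NewSource(seed)) }

// demoTargets returns two example targets with clearly different latencies.
func demoTargets() []target {
	ms := time.Millisecond
	return []target{
		{Name: "slow", Samples: []time.Duration{480 * ms, 500 * ms, 520 * ms}},
		{Name: "fast", Samples: []time.Duration{90 * ms, 100 * ms, 110 * ms}},
	}
}

func main() {
	for _, t := range orderByLatency(demoTargets(), newRNG(1)) {
		fmt.Println(t.Name, median(t.Samples))
	}
}
```

The jitter only matters when medians are within about 10% of each other: near-equal targets then trade places between requests instead of all traffic converging on one of them.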
- Tier-0 regex secret filter — always-on redaction of AWS keys, GitHub tokens, passwords, etc.
- 3-Tier content pipeline — pluggable interceptor chain: regex redaction, entropy filtering, TF-IDF relevance scoring, engine summarization
- Context window compaction — sliding window summarization with configurable engine
- Stale tool call pruning — compact old assistant+tool response pairs to save tokens
- Thought pruning — strip reasoning blocks from assistant message history
- Input validation — strict body limits, JSON sanitization, header filtering
- Graceful degradation — never blocks requests due to engine or pipeline failures
- Role-Based Access Control (RBAC) — per-API key roles (admin, user, read-only) with agent and endpoint restrictions
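
As an illustration of entropy filtering, the sketch below computes Shannon entropy per token and masks long, high-entropy tokens. The function names, the whitespace tokenization, and the length and threshold values are all assumptions for this example; the README does not specify how the real EntropyInterceptor is tuned.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// shannonEntropy returns the entropy of s in bits per byte.
func shannonEntropy(s string) float64 {
	if s == "" {
		return 0
	}
	var counts [256]int
	for i := 0; i < len(s); i++ {
		counts[s[i]]++
	}
	n := float64(len(s))
	var h float64
	for _, c := range counts {
		if c == 0 {
			continue
		}
		p := float64(c) / n
		h -= p * math.Log2(p)
	}
	return h
}

// redactHighEntropy masks whitespace-separated tokens that are at least
// minLen bytes long and whose entropy is at or above threshold.
// Note: strings.Fields collapses runs of whitespace into single spaces.
func redactHighEntropy(text string, minLen int, threshold float64) string {
	fields := strings.Fields(text)
	for i, f := range fields {
		if len(f) >= minLen && shannonEntropy(f) >= threshold {
			fields[i] = "[REDACTED]"
		}
	}
	return strings.Join(fields, " ")
}

func main() {
	// Example values only: tokens of 16+ bytes with at least 3.5 bits/byte.
	fmt.Println(redactHighEntropy("token sk9Xq2LmZ7pRt4Vb8Nw3 rest", 16, 3.5))
}
```

A length floor keeps ordinary short words from being masked, since even random-looking short strings carry too little signal to classify reliably.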
- Secure memory — mlock-protected token storage, read-only sealing, core dump prevention
- Secure memory (default): All tokens stored in mlock-protected RAM, sealed read-only after init, core dumps disabled
- Non-root execution — runs as UID 65532 with dropped capabilities
- Memory protection — `LimitMEMLOCK=infinity` and `LimitCORE=0` in systemd
- Read-only filesystem — immutable root + private `/tmp`
- Seccomp + no-new-privileges — restricted syscalls, prevents privilege escalation
- Zero-trust secrets — loaded via systemd credentials or container mounts, never to disk
- Socket activation — seamless restarts with zero dropped connections
- Zero external dependencies — Go standard library only
- Hot reload — `systemctl reload nenya` for zero-downtime config changes
- Circuit breaker — per agent+provider+model with automatic failover, exponential backoff, and semantic error classification
- Rate limiting — per upstream host (RPM/TPM) with per-provider overrides
- Response cache — in-memory LRU with SHA-256 fingerprinting and optional semantic similarity search
- Graceful shutdown — 5s grace period for in-flight requests, MCP client cleanup
- Context-limit auto-retry — upstream context-length errors trigger summarization and retry
- Local engine lifecycle — pre-load and manage local Ollama models with LRU eviction
- Structured errors — all error responses include an `error_kind` field for programmatic diagnostics
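
A circuit breaker with exponential backoff, as listed above, could be sketched like this. It is a deliberately simplified model under stated assumptions: there is no half-open state and no error classification, and the `breaker` type, its fields, and the demo helpers are hypothetical rather than Nenya's real implementation.

```go
package main

import (
	"fmt"
	"time"
)

// breaker is a hypothetical per-target circuit breaker.
type breaker struct {
	threshold int           // consecutive failures before the circuit opens
	base      time.Duration // first cool-down period
	max       time.Duration // upper bound for the cool-down
	failures  int
	openUntil time.Time
}

// cooldown doubles the base period for every failure past the threshold,
// capped at max.
func (b *breaker) cooldown() time.Duration {
	d := b.base
	for i := b.threshold; i < b.failures && d < b.max; i++ {
		d *= 2
	}
	if d > b.max {
		d = b.max
	}
	return d
}

// Allow reports whether a request may be sent at time now.
func (b *breaker) Allow(now time.Time) bool { return !now.Before(b.openUntil) }

// Record updates the breaker with the outcome of a request.
func (b *breaker) Record(success bool, now time.Time) {
	if success {
		b.failures = 0
		b.openUntil = time.Time{}
		return
	}
	b.failures++
	if b.failures >= b.threshold {
		b.openUntil = now.Add(b.cooldown())
	}
}

// newDemoBreaker returns a breaker with example settings.
func newDemoBreaker() *breaker {
	return &breaker{threshold: 2, base: time.Second, max: 8 * time.Second}
}

// demoStart returns a fixed timestamp so the example is deterministic.
func demoStart() time.Time { return time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC) }

func main() {
	b, now := newDemoBreaker(), demoStart()
	for i := 1; i <= 5; i++ {
		b.Record(false, now)
		fmt.Printf("failure %d: allowed=%v cooldown=%v\n", i, b.Allow(now), b.cooldown())
	}
}
```

Passing the clock in as a parameter keeps the breaker easy to test; a fallback chain would call `Allow` for each candidate target and skip the ones whose circuit is still open.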
- Tool discovery — connect to MCP servers for automatic tool injection
- Multi-turn execution — intercept tool calls, execute against MCP servers, forward results
- Auto-search — pre-fetch relevant context from MCP servers before forwarding
- Auto-save — persist assistant responses to MCP memory servers
Create minimal config and secrets:
mkdir -p config secrets
cat > config/config.json << 'EOF'
{
"server": { "listen_addr": ":8080" },
"agents": {
"default": {
"strategy": "fallback",
"models": ["gemini-2.5-flash"]
}
}
}
EOF
cat > secrets/provider_keys.json << 'EOF'
{
"provider_keys": {
"gemini": "AIza..."
}
}
EOF
# Unquoted EOF here, so that $(openssl ...) is expanded by the shell
cat > secrets/client.json << EOF
{
"client_token": "nk-$(openssl rand -hex 32)"
}
EOF

Run the container:
podman run -d \
--name nenya \
-p 8080:8080 \
-v ./config:/etc/nenya:ro \
-v ./secrets:/run/secrets/nenya:ro \
-e NENYA_SECRETS_DIR=/run/secrets/nenya \
--cap-drop=ALL \
--cap-add=IPC_LOCK \
--security-opt=no-new-privileges:true \
--read-only \
--tmpfs /tmp:rw,noexec,nosuid,size=64M \
ghcr.io/gumieri/nenya:latest

Test it:
curl -H "Authorization: Bearer $(jq -r '.client_token' secrets/client.json)" \
http://localhost:8080/healthz

Nenya provides native packages for major Linux distributions and community package managers:
| Distribution | Command |
|---|---|
| Debian/Ubuntu (.deb) | Download nenya_<version>_linux_amd64.deb from the release page and run sudo dpkg -i |
| Fedora/RHEL (.rpm) | Download nenya-<version>.x86_64.rpm from the release page and run sudo rpm -i |
| Arch Linux (.pkg.tar.zst) | Download nenya-<version>-x86_64.pkg.tar.zst from the release page and run sudo pacman -U |
| Arch Linux (AUR) | yay -S nenya-bin (or your preferred AUR helper) |
| Nix/NixOS | Add gumieri/nur-packages to your NUR registry and use nenya |
All packages install the binary to /usr/bin/nenya and include systemd service and socket units. After install, enable and start:
sudo systemctl enable --now nenya.socket
sudo systemctl enable --now nenya.service

Nenya supports standard environment variables for deployment portability:
| Variable | Default | Description |
|---|---|---|
| `PORT` | `8080` | Listening port (overrides `server.listen_addr`) |
| `HOST` | — | Optional bind address (e.g. `127.0.0.1`). Only used when combined with `PORT` |
| `NENYA_CONFIG_DIR` | `/etc/nenya/` | Configuration directory path |
| `NENYA_CONFIG_FILE` | — | Single config file path (takes precedence over `NENYA_CONFIG_DIR`) |
| `NENYA_SECRETS_DIR` | — | Secrets directory (overrides `CREDENTIALS_DIRECTORY`) |
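
The precedence between `PORT`, `HOST`, and `server.listen_addr` described in the table can be pictured with the sketch below. `resolveListenAddr` is a hypothetical helper, and treating an unset `PORT` as "fall back to the config value, then to `:8080`" is an assumption about how the default applies.

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// resolveListenAddr picks the listen address. PORT wins over the configured
// address, and HOST is only honoured when PORT is set.
func resolveListenAddr(configAddr, host, port string) string {
	if port != "" {
		return net.JoinHostPort(host, port)
	}
	if configAddr != "" {
		return configAddr
	}
	return ":8080" // assumed fallback when nothing is configured
}

func main() {
	addr := resolveListenAddr(":8080", os.Getenv("HOST"), os.Getenv("PORT"))
	fmt.Println("listening on", addr)
}
```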
Example usage:
PORT=9090 HOST=127.0.0.1 ./nenya --config /path/to/config.json

Or in Docker:
docker run -e PORT=9090 -p 9090:9090 ghcr.io/gumieri/nenya:latest

- Deploy Bare Metal (systemd) — Direct binary install, socket activation, hot reload
- Deploy Container (Podman/Docker Compose) — compose.yml, image verification, security hardening
- Deploy Kubernetes (Helm) — Helm chart, ConfigMap/Secret, ingress setup
All /v1/* endpoints require Authorization: Bearer <client_token> or Bearer <api_key_token>.
API keys support RBAC enforcement — agent scoping, endpoint allowlists, role-based permissions (admin bypasses all checks).
| Endpoint | Auth | Description |
|---|---|---|
| `POST /v1/chat/completions` | Bearer + RBAC | OpenAI-compatible chat with SSE streaming, agent fallback, MCP multi-turn |
| `POST /v1/messages` | Bearer + RBAC | Anthropic Messages API with bidirectional format conversion |
| `GET /v1/models` | Bearer + RBAC | Live model catalog from discovered providers + static registry (context window, max tokens) |
| `POST /v1/embeddings` | Bearer + RBAC | Passthrough proxy |
| `POST /v1/responses` | Bearer + RBAC | Passthrough proxy |
| `POST /v1/images/generations` | Bearer + RBAC | Image generation (OpenAI-compatible) |
| `POST /v1/audio/transcriptions` | Bearer + RBAC | Audio transcription (Whisper-compatible, multipart support) |
| `POST /v1/audio/speech` | Bearer + RBAC | Text-to-speech synthesis (OpenAI-compatible) |
| `POST /v1/moderations` | Bearer + RBAC | Content moderation (OpenAI-compatible) |
| `POST /v1/rerank` | Bearer + RBAC | Re-ranking API (Cohere/Jina/Voyage-compatible) |
| `POST /v1/a2a` | Bearer + RBAC | Agent-to-Agent protocol (Google A2A) |
| `GET /v1/files` | Bearer + RBAC | File listing, upload, retrieval, deletion |
| `POST /v1/batches` | Bearer + RBAC | Batch API operations |
| `POST /proxy/{provider}/*` | Bearer + RBAC | Arbitrary provider endpoint passthrough (all HTTP methods, SSE streaming) |
| `GET /healthz` | None | Engine health probe |
| `GET /statsz` | None | Token usage, circuit breaker state, MCP server status |
| `GET /metrics` | None | Prometheus-compatible metrics |
| `GET /debug/pprof/*` | Bearer | Go profiling endpoints (disabled by default, see `debug.pprof_enabled`) |
See docs/PASSTHROUGH_PROXY.md for detailed passthrough proxy usage.
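
For clients written in Go, a request to the chat endpoint listed in the table could be built along these lines. This is a sketch under assumptions: `newChatRequest` and the struct types are hypothetical, the body uses the standard OpenAI chat fields, and passing the agent name `default` as the model is an assumption about how agents are addressed.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// chatMessage and chatRequest mirror the basic OpenAI chat request shape.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
	Stream   bool          `json:"stream"`
}

// newChatRequest builds a streaming chat request with bearer authentication.
// It does not send the request; pass it to an http.Client to do so.
func newChatRequest(baseURL, token, model, prompt string) (*http.Request, error) {
	body, err := json.Marshal(chatRequest{
		Model:    model,
		Messages: []chatMessage{{Role: "user", Content: prompt}},
		Stream:   true,
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// "nk-example" stands in for the client_token from secrets/client.json.
	req, err := newChatRequest("http://localhost:8080", "nk-example", "default", "Hello")
	if err != nil {
		fmt.Println("build request:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String())
}
```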
| Document | Description |
|---|---|
| Providers | All 23 providers, capabilities matrix, special behaviors, adding custom providers |
| Configuration | Full config reference, directory mode, all sections and fields |
| Deploy Bare Metal | Systemd unit, config.d layout, secrets, hot reload |
| Deploy Container | Podman/Docker Compose, image verification, security notes |
| Deploy Kubernetes | Helm chart usage, ConfigMap/Secret, ingress setup |
| Passthrough Proxy | Raw provider endpoint proxying, SSE streaming, auth injection |
| Architecture | Package DAG, request lifecycle, circuit breaker, SSE pipeline |
| MCP Integration | MCP server integration, tool discovery, multi-turn execution |
| Adapters | Adapter system internals, auth styles, capability flags |
| Secrets Format | Systemd credentials, env var fallback, container/K8s deployment |
| Security | Vulnerability reporting policy |
Apache 2.0. See LICENSE.