Ollama Proxy

A minimal asynchronous proxy for non-streaming Ollama requests.

Exposes a single stable HTTP endpoint, distributes requests across healthy backends discovered automatically by scanning a configured localhost port range, and retries on failure.

Features

POST /api/generate — non-streaming only
Least-in-flight load balancing with port tie-breaking
Automatic backend discovery via periodic health probes (GET /api/tags)
Immediate unhealthy marking on request or probe failure; re-enabled only after a successful scan
Configurable retry with per-request backend exclusion
Debug endpoints: GET /__proxy/health, GET /__proxy/backends

Project layout

proxy/
    app.py          FastAPI app, routes, lifespan
    config.py       Settings loaded from environment variables
    state.py        Backend dataclass and registry helpers
    scanner.py      Background scan loop and health probes
    forwarder.py    Backend selection, request forwarding, retry logic
main.py             Entry point (uvicorn)
tests/
    test_proxy.py   Full test suite (25 tests)

Requirements

Python ≥ 3.10
fastapi, httpx, uvicorn[standard]

Install:

pip install -r requirements.txt

Running

python3 main.py

Or with uvicorn directly:

uvicorn proxy.app:app --host 0.0.0.0 --port 11434

Configuration

All settings are read from environment variables. Defaults are shown.

Variable	Default	Description
`PROXY_HOST`	`0.0.0.0`	Bind address for the proxy
`PROXY_PORT`	`11434`	Bind port for the proxy
`BACKEND_HOST`	`127.0.0.1`	Host where backends are reachable
`BACKEND_PORT_START`	`24000`	First port in the scan range (inclusive)
`BACKEND_PORT_END`	`24100`	Last port in the scan range (inclusive)
`SCAN_INTERVAL_SECONDS`	`5`	Seconds between full backend scans
`PROBE_TIMEOUT_SECONDS`	`1.0`	Timeout for each health probe
`MAX_RETRIES`	`2`	Max additional attempts after the first failure (3 total)
`BACKEND_CONNECT_TIMEOUT_SECONDS`	`2.0`	TCP connect timeout for forwarded requests
`BACKEND_READ_TIMEOUT_SECONDS`	`600.0`	Read timeout for forwarded requests
`REJECT_STREAMING`	`true`	Reject requests with `"stream": true`
`LOG_LEVEL`	`INFO`	Logging level (`DEBUG`, `INFO`, `WARNING`, …)

API

`POST /api/generate`

Proxy endpoint. Accepts an Ollama generate request body and returns the backend response unchanged.

Request body (example):

{
  "model": "my-model",
  "prompt": "Hello",
  "stream": false
}

"stream": true is rejected with 400 Bad Request.
"stream" is defaulted to false if absent.

Responses:

Status	Meaning
`200`	Backend response (JSON forwarded unchanged)
`400`	Invalid request (streaming requested or malformed JSON)
`503`	No healthy backends available
`502`	All backend attempts failed

`GET /__proxy/health`

{
  "status": "ok",
  "healthy_backends": 3
}

`GET /__proxy/backends`

{
  "backends": [
    {"port": 24001, "healthy": true, "in_flight": 1},
    {"port": 24002, "healthy": false, "in_flight": 0}
  ]
}

Backend discovery

Each Ollama worker is expected on a port within [BACKEND_PORT_START, BACKEND_PORT_END], typically exposed via a reverse SSH tunnel terminating on localhost.

The scanner probes GET http://127.0.0.1:<port>/api/tags every SCAN_INTERVAL_SECONDS. A backend is healthy only when the probe returns HTTP 200. On any failure (connection error, timeout, non-200) the backend is marked unhealthy. Recovery happens only when a later scan succeeds.

On a forwarded request failure the backend is also marked unhealthy immediately and the request is retried on another healthy backend (up to MAX_RETRIES additional attempts).

Running tests

pip install -r requirements-dev.txt
python3 -m pytest tests/ -v

The test suite (25 tests) covers:

Streaming rejection
No-healthy-backends 503
Successful forwarding
Retry on 5xx and transport errors
No retry on 4xx
Least-in-flight and port tie-breaking selection
Immediate unhealthy marking on failures
Backend exclusion within a single request
Scanner probe success and failure cases
State helper correctness
Debug endpoint responses

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
proxy		proxy
sge		sge
tests		tests
.gitignore		.gitignore
DEFINITION.md		DEFINITION.md
LICENSE		LICENSE
README.md		README.md
main.py		main.py
pyproject.toml		pyproject.toml
requirements-dev.txt		requirements-dev.txt
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Ollama Proxy

Features

Project layout

Requirements

Running

Configuration

API

`POST /api/generate`

`GET /__proxy/health`

`GET /__proxy/backends`

Backend discovery

Running tests

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Ollama Proxy

Features

Project layout

Requirements

Running

Configuration

API

POST /api/generate

GET /__proxy/health

GET /__proxy/backends

Backend discovery

Running tests

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`POST /api/generate`

`GET /__proxy/health`

`GET /__proxy/backends`

Packages