gridaco · softmarshmallow · Jun 12, 2026
diff --git a/SECURITY.md b/SECURITY.md
@@ -402,6 +402,34 @@ http://localhost:*`. The nonce is generated in the proxy, exposed
    the BYOK provider; key material never returns to renderer. Closes
    the exfil path even if all four layers above were bypassed.
 
+**Endpoint providers (local LLMs, #806).** The agent host additionally
+serves `/providers/endpoints/*` — CRUD over user-configured
+OpenAI-compatible endpoints (Ollama preset, self-hosted gateways),
+persisted at `${userData}/endpoints.json`. The split that keeps layer 5
+intact: an endpoint **config** (base URL + registered model list) is
+plain readable config the renderer may list back, while an endpoint's
+optional **API key** rides the `/secrets/*` surface under the endpoint's
+id (the secrets-route allowlist admits configured endpoint ids) and is
+never readable. The config validator
+(`packages/grida-ai-agent/src/protocol/endpoints.ts`) pins the shape —
+http(s) URL, bounded sizes, unknown fields dropped — so a config write
+cannot smuggle credentials or blobs into the readable store. The
+`base_url` is user-owned egress by design (the desktop user points their
+own agent at their own endpoint — same trust model as BYOK), and the
+routes sit behind the same CORS/Referer/Basic-Auth stack as everything
+else. The `/providers/endpoints/probe` route makes the host GET a
+user-supplied URL's model listing (the renderer's grida.co origin cannot
+reach a local Ollama itself) — the same egress a configured run already
+performs; responses are parsed and reduced to
+`{id, tool_call, contextWindow}` rows with bounded reads (timeout + size
+cap), never proxied raw. On sandboxed
+platforms the srt network policy additionally bounds all of this
+structurally: outbound to **localhost** is permitted via the
+`allowLocalBinding` local-ip rule (how the user's own `ollama serve` is
+reached), while a config pointing at an arbitrary **remote** host is
+blocked unless that host is in the enumerated `allowed_domains` — a
+hostile config cannot turn the sidecar into an open exfil channel.
+
 **Electron-side hardening (mandatory; see the
 [Electron security checklist](https://www.electronjs.org/docs/latest/tutorial/security)).**
 `contextIsolation: true`, `nodeIntegration: false`, `sandbox: true`,
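
Reviewer sketch (not part of the patch): the paragraph above says the config validator pins the shape to an http(s) URL, bounded sizes, and dropped unknown fields. The real validator lives in `packages/grida-ai-agent/src/protocol/endpoints.ts` and is not shown in this diff, so the following is only a minimal illustration of that idea. The `models` field name, the function names, and every numeric limit are assumptions.

```typescript
// Hypothetical sketch, NOT the validator from protocol/endpoints.ts.
// `models`, the helper names, and all limits below are assumed for illustration.

interface ModelEntry {
  id: string;
  tool_call?: boolean;
  contextWindow?: number;
}

interface EndpointConfig {
  id: string;
  base_url: string;
  models: ModelEntry[];
}

const MAX_ID_LENGTH = 128; // assumed bound
const MAX_URL_LENGTH = 2048; // assumed bound
const MAX_MODELS = 256; // assumed bound

function isBoundedString(value: unknown, max: number): value is string {
  return typeof value === "string" && value.length > 0 && value.length <= max;
}

function isHttpUrl(value: string): boolean {
  try {
    const { protocol } = new URL(value);
    return protocol === "http:" || protocol === "https:";
  } catch {
    return false; // not parseable as a URL at all
  }
}

function sanitizeModel(input: unknown): ModelEntry | null {
  if (typeof input !== "object" || input === null) return null;
  const { id, tool_call, contextWindow } = input as Record<string, unknown>;
  if (!isBoundedString(id, MAX_ID_LENGTH)) return null;
  // Build a fresh object so unknown fields are never copied through.
  const model: ModelEntry = { id };
  if (typeof tool_call === "boolean") model.tool_call = tool_call;
  if (typeof contextWindow === "number" && Number.isInteger(contextWindow) && contextWindow > 0) {
    model.contextWindow = contextWindow;
  }
  return model;
}

function sanitizeEndpointConfig(input: unknown): EndpointConfig | null {
  if (typeof input !== "object" || input === null) return null;
  const { id, base_url, models } = input as Record<string, unknown>;
  if (!isBoundedString(id, MAX_ID_LENGTH)) return null;
  if (!isBoundedString(base_url, MAX_URL_LENGTH) || !isHttpUrl(base_url)) return null;
  if (!Array.isArray(models) || models.length > MAX_MODELS) return null;
  const cleaned: ModelEntry[] = [];
  for (const entry of models) {
    const model = sanitizeModel(entry);
    if (model === null) return null;
    cleaned.push(model);
  }
  // Only the allowlisted fields survive; an `api_key` in the input is dropped.
  return { id, base_url, models: cleaned };
}
```

Rebuilding the object from an allowlist, rather than deleting known-bad keys, is what makes "unknown fields dropped" hold for fields nobody anticipated.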

diff --git a/desktop/src/preload.ts b/desktop/src/preload.ts
@@ -445,6 +445,18 @@ const bridge: DesktopBridge = {
     },
   },
 
+  providers: {
+    list_endpoints: () => agentClient.providers.list_endpoints(),
+    set_endpoint: async (config) => {
+      await agentClient.providers.set_endpoint(config);
+    },
+    delete_endpoint: async (id) => {
+      await agentClient.providers.delete_endpoint(id);
+    },
+    info: () => agentClient.providers.info(),
+    probe_endpoint: (baseUrl) => agentClient.providers.probe_endpoint(baseUrl),
+  },
+
   agent: {
     run: (opts, onChunk) =>
       // Fresh runs always return a stream (only `reconnect` may return
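
Reviewer sketch (not part of the patch): `probe_endpoint` above forwards to the host's probe route, which SECURITY.md describes as parsing the model listing and reducing it to `{id, tool_call, contextWindow}` rows under a size cap instead of proxying it raw. The host code is not in this diff, so the following only illustrates the reduce step, assuming the OpenAI-style listing shape `{ "data": [{ "id": ... }] }`. The function name and both limits are hypothetical, and the timeout and capability detection are left out.

```typescript
// Hypothetical sketch of the "parse and reduce" step; names and limits are assumed.
// tool_call / contextWindow detection is out of scope here, so rows carry only `id`.

interface ProbeRow {
  id: string;
  tool_call?: boolean;
  contextWindow?: number;
}

const MAX_BODY_CHARS = 1000000; // assumed size cap on the response body
const MAX_ROWS = 512; // assumed cap on the number of reported models

function reduceModelListing(body: string): ProbeRow[] {
  if (body.length > MAX_BODY_CHARS) {
    throw new Error("model listing exceeds size cap");
  }
  const parsed: unknown = JSON.parse(body);
  if (typeof parsed !== "object" || parsed === null) return [];
  const data = (parsed as Record<string, unknown>).data;
  if (!Array.isArray(data)) return [];
  const rows: ProbeRow[] = [];
  for (const entry of data.slice(0, MAX_ROWS)) {
    if (typeof entry !== "object" || entry === null) continue;
    const id = (entry as Record<string, unknown>).id;
    // Copy only the id; every other upstream field is discarded.
    if (typeof id === "string" && id.length > 0) rows.push({ id });
  }
  return rows;
}
```

Because each row is built from scratch, nothing the remote server adds to its response can reach the renderer through this path.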

diff --git a/docs/editor/desktop/_category_.json b/docs/editor/desktop/_category_.json
@@ -0,0 +1,8 @@
+{
+  "label": "Desktop",
+  "link": {
+    "type": "generated-index",
+    "title": "Grida Desktop",
+    "description": "Guides for the Grida Desktop app."
+  }
+}
diff --git a/docs/editor/desktop/img/local-models-configured.webp b/docs/editor/desktop/img/local-models-configured.webp
diff --git a/docs/editor/desktop/local-models.md b/docs/editor/desktop/local-models.md
@@ -0,0 +1,127 @@
+---
+title: Local Models (Ollama)
+description: Run the Grida Desktop agent on AI models that live on your own machine — no account, no API key.
+keywords:
+  - ollama
+  - local llm
+  - local ai
+  - byok
+  - grida desktop
+  - ai agent
+format: md
+doc_tasks:
+  - update
+---
+
+# Local Models (Ollama)
+
+Grida Desktop's AI agent can run on models that live entirely on your own
+machine, served by [Ollama](https://ollama.com). There is no account to
+create and no API key to paste — your prompts, files, and the model's
+responses never leave your computer.
+
+You can use local models alongside provider keys (OpenRouter, Vercel), or
+as your only setup.
+
+## Requirements
+
+- **Grida Desktop** installed.
+- **Ollama** installed and running (`ollama serve` — the desktop Ollama app
+  runs it for you).
+- At least one model pulled, for example:
+
+  ```sh
+  ollama pull gpt-oss:20b
+  ```
+
+A note on expectations: local models vary widely in how well they drive
+the agent. The agent leans on tool calling (reading and writing files,
+running commands, planning), and small models often handle this poorly.
+Models in the ~30B class and up are recommended for agent tasks.
+
+## Set up Ollama
+
+Open **Settings** from the app menu, find the **Local Models** card, and
+click **Set up Ollama**. The base URL is prefilled with Ollama's local
+address (`http://localhost:11434/v1`), and the models you have pulled are
+detected automatically.
+
+![The Local Models card after setup, with an auto-detected model and its context window and tool-support badges](./img/local-models-configured.webp)
+
+Review the list and click **Save**:
+
+- Each detected model shows its **context window** and **tool-calling**
+  support as read-only badges. These come from the endpoint itself and
+  refresh whenever you open Settings (and on **Detect**, useful after you
+  `ollama pull` a new model). For a model that is currently loaded, the
+  context window is the size your server actually allocated; otherwise it
+  is the model's maximum.
+- A model you add manually by id (for example on a gateway that doesn't
+  report capabilities) keeps editable fields instead — there, you are
+  the data source. Manually added models default to a conservative
+  `8192` context.
+
+The first model in the list is the default — background work like session
+titles and summaries also runs on it.
+
+## Use a local model
+
+Registered models appear in the model picker in every agent composer,
+grouped under the endpoint name (for example `gpt-oss:20b · Ollama`).
+Pick one and chat as usual. Everything the agent does — reading your
+workspace files, making edits, planning — runs against the local model.
+Each session remembers the model it ran with.
+
+If you have no provider key configured at all, the agent uses your Ollama
+setup automatically.
+
+## Models without tool support
+
+The agent works through tool calls, so a model that cannot make them
+loses most of its abilities. Tool support is detected per model — Ollama
+reports it, and `ollama show <model>` lists `tools` when a model supports
+tool calling. When you select a model without tool support, the composer
+shows a warning, but you can still chat with it.
+
+## Troubleshooting
+
+- **The model errors immediately.** Check that Ollama is running: open
+  `http://localhost:11434` in a browser — it should answer
+  `Ollama is running`.
+- **A model is missing from the picker.** Only registered models appear.
+  Click **Detect** in **Settings → Local Models** after pulling a new
+  model, or add its id manually.
+- **Long sessions stop or degrade.** The detected context window may be
+  larger than what your serving configuration actually allows (it
+  converges to the served size once the model has been loaded). To pin a
+  smaller value, set an override in the config file — see below.
+- **Slow responses.** Local speed is your hardware's speed. Smaller
+  models respond faster but handle agent tasks worse.
+
+## Other OpenAI-compatible endpoints
+
+The base URL accepts any OpenAI-compatible server on your machine, so a
+local gateway or inference server such as LiteLLM or vLLM works the same
+way: point the base URL at it and register its models. If it needs an API
+key, save it in the card's **API key** field (it appears once the
+endpoint is saved) — the key is stored by the agent host and never shown
+back. Ollama itself needs no key.
+
+## Advanced: the config file
+
+Endpoint settings (but not API keys) live as plain JSON in `endpoints.json`
+(the settings card links to it). Detected values refresh automatically, so
+hand-edits to them won't stick — if an endpoint reports a value that is
+wrong for your setup (for example, your server caps context below the
+model's maximum), pin the correction in the model's `overrides` instead.
+Overrides always win over detected values, and detection never touches
+them:
+
+```json
+{
+  "id": "gemma4:31b-mlx",
+  "tool_call": true,
+  "contextWindow": 262144,
+  "overrides": { "contextWindow": 32768 }
+}
+```
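
Reviewer sketch (not part of the patch): the last section of the guide states two rules, that overrides always win over detected values and that detection never touches them. The resolution code is not in this diff, so the following is a minimal sketch of those two rules only. The type and function names are hypothetical; the field names come from the JSON example above.

```typescript
// Hypothetical sketch; type and function names are assumed, field names
// (`tool_call`, `contextWindow`, `overrides`) follow the JSON example.

interface ModelCapabilities {
  tool_call?: boolean;
  contextWindow?: number;
}

interface RegisteredModel extends ModelCapabilities {
  id: string;
  overrides?: ModelCapabilities;
}

// Rule 1: overrides win. Spread them last so they shadow detected values.
function effectiveCapabilities(model: RegisteredModel): ModelCapabilities {
  const { tool_call, contextWindow, overrides } = model;
  return { tool_call, contextWindow, ...overrides };
}

// Rule 2: detection refreshes the detected fields but leaves `overrides` alone.
function applyDetection(model: RegisteredModel, detected: ModelCapabilities): RegisteredModel {
  return { ...model, ...detected, overrides: model.overrides };
}

const model: RegisteredModel = {
  id: "gemma4:31b-mlx",
  tool_call: true,
  contextWindow: 262144,
  overrides: { contextWindow: 32768 },
};

console.log(effectiveCapabilities(model));
// → { tool_call: true, contextWindow: 32768 }
```

With the entry from the guide, a later detection that reports a different `contextWindow` changes the stored detected value, while the effective value stays at the pinned `32768`.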