diff --git a/.agents/skills/debug-inference/SKILL.md b/.agents/skills/debug-inference/SKILL.md new file mode 100644 index 00000000..26f87b91 --- /dev/null +++ b/.agents/skills/debug-inference/SKILL.md @@ -0,0 +1,345 @@ +--- +name: debug-inference +description: Debug why inference.local or external inference setup is failing. Use when the user cannot reach a local model server, has provider base URL issues, sees inference verification failures, hits protocol mismatches, or needs to diagnose inference on local vs remote gateways. Trigger keywords - debug inference, inference.local, local inference, ollama, vllm, sglang, trtllm, NIM, inference failing, model server unreachable, failed to verify inference endpoint, host.openshell.internal. +--- + +# Debug Inference + +Diagnose why OpenShell inference is failing and recommend exact fix commands. + +Use `openshell` CLI commands to inspect the active gateway, provider records, managed inference config, and sandbox behavior. Use a short sandbox probe when needed to confirm end-to-end routing. + +## Overview + +OpenShell supports two different inference paths. Diagnose the correct one first. + +1. **Managed inference** through `https://inference.local` + - Configured by `openshell inference set` + - Shared by every sandbox on the active gateway + - Credentials and model are injected by OpenShell +2. **Direct external inference** to hosts like `api.openai.com` + - Controlled by `network_policies` + - Requires the application to call the external host directly + - Requires provider attachment and network access to be configured separately + +For local or self-hosted engines such as Ollama, vLLM, SGLang, TRT-LLM, and many NIM deployments, the most common managed inference pattern is an `openai` provider with `OPENAI_BASE_URL` pointing at a host the gateway can reach. 
+ +## Prerequisites + +- `openshell` is on the PATH +- The active gateway is running +- You know the failing setup, or can infer it from commands and config + +## Tools Available + +Use these commands first: + +```bash +# Which gateway is active, and can the CLI reach it? +openshell status + +# Show managed inference config for inference.local +openshell inference get + +# Inspect the provider record referenced by inference.local +openshell provider get + +# Inspect gateway topology details when remote/local confusion is suspected +openshell gateway info + +# Run a minimal end-to-end probe from a sandbox +openshell sandbox create -- curl https://inference.local/v1/chat/completions --json '{"messages":[{"role":"user","content":"hello"}],"max_tokens":10}' +``` + +## Workflow + +When the user asks to debug inference, run diagnostics automatically in this order. Stop and report findings as soon as a root cause is identified. + +### Determine Context + +Establish these facts first: + +1. Is the application calling `https://inference.local` or a direct external host? +2. Which gateway is active, and is it local, remote, or cloud? +3. Which provider and model are configured for managed inference? +4. Is the upstream local to the gateway host, or somewhere else? + +### Step 0: Check the Active Gateway + +Run: + +```bash +openshell status +openshell gateway info +``` + +Look for: + +- Active gateway name and endpoint +- Whether the gateway is local or remote +- Whether `host.openshell.internal` would point to the local machine or a remote host + +Common mistake: + +- **Laptop-local model + remote gateway**: `host.openshell.internal` points to the remote gateway host, not your laptop. A laptop-local Ollama or vLLM server will not be reachable without a tunnel or shared reachable network path. 
+ +### Step 1: Check Whether Managed Inference Is Configured + +Run: + +```bash +openshell inference get +``` + +Interpretation: + +- **`Not configured`**: `inference.local` has no backend yet. Fix by configuring it: + + ```bash + openshell inference set --provider <provider> --model <model> + ``` + +- **Provider and model shown**: Continue to provider inspection. + +### Step 2: Inspect the Provider Record + +Run: + +```bash +openshell provider get +``` + +Check: + +- Provider type matches the client API shape + - `openai` for OpenAI-compatible engines such as Ollama, vLLM, SGLang, TRT-LLM, and many NIM deployments + - `anthropic` for Anthropic Messages API + - `nvidia` for NVIDIA-hosted OpenAI-compatible endpoints +- Required credential key exists +- `*_BASE_URL` override is correct when using a self-hosted endpoint + +Fix examples: + +```bash +openshell provider create --name ollama --type openai --credential OPENAI_API_KEY=empty --config OPENAI_BASE_URL=http://host.openshell.internal:11434/v1 + +openshell provider update ollama --type openai --credential OPENAI_API_KEY=empty --config OPENAI_BASE_URL=http://host.openshell.internal:11434/v1 +``` + +### Step 3: Check Local Host Reachability + +For host-backed local inference, confirm the upstream server: + +- Binds to `0.0.0.0`, not only `127.0.0.1` +- Runs on the same machine as the gateway +- Is reachable through `host.openshell.internal`, the host's LAN IP, or another reachable hostname + +Common mistakes: + +- **Base URL uses `127.0.0.1` or `localhost`**: usually wrong for managed inference. Replace with `host.openshell.internal` or the host's LAN IP. +- **Server binds only to loopback**: reconfigure it to bind to `0.0.0.0`. +- **Inference engine runs as a system service**: changing the bind address may require updating the service configuration and restarting the service before the new listener becomes reachable. 
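The first two mistakes above reduce to one rewrite: replace a loopback host with a gateway-reachable one. A hypothetical helper (not part of the CLI) sketches the rule, assuming `host.openshell.internal` as the reachable name:

```python
from urllib.parse import urlparse, urlunparse

LOOPBACK_HOSTS = {"127.0.0.1", "localhost", "::1"}

def fix_local_base_url(base_url: str, gateway_host: str = "host.openshell.internal") -> str:
    """Rewrite loopback hosts, which the gateway cannot reach, to a
    gateway-reachable hostname. Port and path are preserved."""
    parsed = urlparse(base_url)
    if parsed.hostname in LOOPBACK_HOSTS:
        netloc = gateway_host if parsed.port is None else f"{gateway_host}:{parsed.port}"
        parsed = parsed._replace(netloc=netloc)
    return urlunparse(parsed)

# e.g. fix_local_base_url("http://127.0.0.1:11434/v1")
#      -> "http://host.openshell.internal:11434/v1"
```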
+ +### Step 4: Check Request Shape + +Managed inference only works for `https://inference.local` and supported inference API paths. + +Supported patterns include: + +- `POST /v1/chat/completions` +- `POST /v1/completions` +- `POST /v1/responses` +- `POST /v1/messages` +- `GET /v1/models` + +Common mistakes: + +- **Wrong scheme**: `http://inference.local` instead of `https://inference.local` +- **Unsupported path**: request does not match a known inference API +- **Protocol mismatch**: Anthropic client against an `openai` provider, or vice versa + +Fix guidance: + +- Use a supported path and provider type +- Point OpenAI-compatible SDKs at `https://inference.local/v1` +- If the SDK requires an API key, pass any non-empty placeholder such as `test` + +### Step 5: Probe from a Sandbox + +Run a minimal request from inside a sandbox: + +```bash +openshell sandbox create -- curl https://inference.local/v1/chat/completions --json '{"messages":[{"role":"user","content":"hello"}],"max_tokens":10}' +``` + +Interpretation: + +- **`cluster inference is not configured`**: set the managed route with `openshell inference set` +- **`connection not allowed by policy`** on `inference.local`: unsupported method or path +- **`no compatible route`**: provider type and client API shape do not match +- **Connection refused / upstream unavailable / verification failures**: base URL, bind address, topology, or credentials are wrong + +### Step 6: Reapply or Repair the Managed Route + +After fixing the provider, repoint `inference.local`: + +```bash +openshell inference set --provider <provider> --model <model> +``` + +If the endpoint is intentionally offline and you only want to save the config: + +```bash +openshell inference set --provider <provider> --model <model> --no-verify +``` + +Inference updates are hot-reloaded to all sandboxes on the active gateway within about 5 seconds by default. 
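Before moving on, the managed-route matching rules from Steps 4 and 5 can be summarized in a small illustrative checker (the real logic lives in the OpenShell proxy, not in this helper):

```python
from urllib.parse import urlparse

# Method/path pairs that managed inference intercepts, per Step 4.
SUPPORTED_PATTERNS = {
    ("POST", "/v1/chat/completions"),
    ("POST", "/v1/completions"),
    ("POST", "/v1/responses"),
    ("POST", "/v1/messages"),
    ("GET", "/v1/models"),
}

def is_managed_inference_request(method: str, url: str) -> bool:
    """True when a request would be handled as managed inference."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or parsed.hostname != "inference.local":
        return False  # wrong scheme or host: evaluated by network policy instead
    return (method.upper(), parsed.path) in SUPPORTED_PATTERNS
```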
+ +### Step 7: Diagnose Direct External Inference + +If the application calls `api.openai.com`, `api.anthropic.com`, or another external host directly, this is not a managed inference issue. + +Check instead: + +1. The application is configured to call the external hostname directly +2. A provider with the needed credentials exists +3. The sandbox is launched with that provider attached +4. `network_policies` allow that host, port, and HTTP rules + +Use the `generate-sandbox-policy` skill when the user needs help authoring policy YAML. + +## Fix: Local Host Inference Timeouts (Firewall) + +Use this fix when a sandbox can reach `https://inference.local`, but OpenShell reports an upstream timeout against a host-local backend such as Ollama. + +Example symptom: + +```json +{"error":"request to http://host.docker.internal:11434/v1/models timed out"} +``` + +### When This Happens + +This failure commonly appears on Linux hosts that: + +- Run the OpenShell gateway in Docker +- Route `inference.local` to a host-local OpenAI-compatible endpoint such as Ollama +- Have a host firewall or networking configuration that denies container-to-host traffic by default + +In this case, OpenShell routing is usually working correctly. The failing hop is container-to-host traffic on the backend port. + +### Why CoreDNS Is Not the Cause + +This is not the same issue as the Colima CoreDNS fix. + +OpenShell injects `host.docker.internal` and `host.openshell.internal` into sandbox pods with `hostAliases`. That path bypasses cluster DNS lookup. If the request still times out, the usual cause is host firewall or network policy, not CoreDNS. + +### Verify the Problem + +1. Confirm the model server works on the host: + + ```bash + curl -sS http://127.0.0.1:11434/v1/models + ``` + +2. Confirm the host gateway address also works on the host: + + ```bash + curl -sS http://172.17.0.1:11434/v1/models + ``` + +3. 
Test the same endpoint from the OpenShell cluster container: + + ```bash + docker exec openshell-cluster-<name> wget -qO- -T 5 http://host.docker.internal:11434/v1/models + ``` + +If steps 1 and 2 succeed but step 3 times out, the host firewall or network configuration is blocking the container-to-host path. + +### Fix + +Allow the Docker bridge network used by the OpenShell cluster to reach the host-local inference port. The exact command depends on your firewall tooling (iptables, nftables, firewalld, UFW, etc.), but the rule should allow: + +- **Source**: the Docker bridge subnet used by the OpenShell cluster container (commonly `172.18.0.0/16`) +- **Destination**: the host gateway IP injected into sandbox pods for `host.docker.internal` (commonly `172.17.0.1`) +- **Port**: the inference server port (e.g. `11434/tcp` for Ollama) + +To find the actual values on your system: + +```bash +# Docker bridge subnet for the OpenShell cluster network +docker network inspect $(docker network ls --filter name=openshell -q) --format '{{range .IPAM.Config}}{{.Subnet}}{{end}}' + +# Host gateway IP visible from inside the container +docker exec openshell-cluster-<name> cat /etc/hosts | grep host.docker.internal +``` + +Adjust the source subnet, destination IP, or port to match your local Docker network layout. + +### Verify the Fix + +1. Re-run the cluster container check: + + ```bash + docker exec openshell-cluster-<name> wget -qO- -T 5 http://host.docker.internal:11434/v1/models + ``` + +2. Re-test from a sandbox: + + ```bash + curl -sS https://inference.local/v1/models + ``` + +Both commands should return the upstream model list. 
+ +### If It Still Fails + +- Confirm the backend listens on a host-reachable address: `ss -ltnp | rg ':11434\b'` +- Confirm the provider points at the host alias path you expect: `openshell provider get <name>` +- Confirm the active inference route: `openshell inference get` +- Inspect sandbox logs for upstream timeout details: `openshell logs --since 10m` + +## Common Failure Patterns + +| Symptom | Likely cause | Fix | +|---------|--------------|-----| +| `openshell inference get` shows `Not configured` | No managed inference route configured | `openshell inference set --provider <provider> --model <model>` | +| `failed to verify inference endpoint` | Bad base URL, wrong credentials, wrong provider type, or upstream not reachable | Fix provider config, then rerun `openshell inference set`; use `--no-verify` only when the endpoint is intentionally offline | +| Base URL uses `127.0.0.1` | Loopback points at the wrong runtime | Use `host.openshell.internal` or another gateway-reachable host | +| Local engine works only when gateway is local | Gateway moved to remote host | Run the engine on the gateway host, add a tunnel, or use direct external access | +| `connection not allowed by policy` on `inference.local` | Unsupported path or method | Use a supported inference API path | +| `no compatible route` | Provider type does not match request shape | Switch provider type or change the client API | +| Direct call to external host is denied | Missing policy or provider attachment | Update `network_policies` and launch sandbox with the right provider | +| SDK fails on empty auth token | Client requires a non-empty API key even though OpenShell injects the real one | Use any placeholder token such as `test` | +| Upstream timeout from container to host-local backend | Host firewall or network config blocks container-to-host traffic | Allow the Docker bridge subnet to reach the inference port on the host gateway IP (see firewall fix section above) | + +## Full Diagnostic Dump + +Run this when you 
want a compact report before deciding on a fix: + +```bash +echo "=== Gateway Status ===" +openshell status + +echo "=== Gateway Info ===" +openshell gateway info + +echo "=== Managed Inference ===" +openshell inference get + +echo "=== Providers ===" +openshell provider list + +echo "=== Selected Provider ===" +openshell provider get + +echo "=== Sandbox Probe ===" +openshell sandbox create -- curl https://inference.local/v1/chat/completions --json '{"messages":[{"role":"user","content":"hello"}],"max_tokens":10}' +``` + +When you report back, state: + +1. Which inference path is failing (`inference.local` vs direct external) +2. Whether gateway topology is part of the problem +3. The most likely root cause +4. The exact fix commands the user should run diff --git a/.agents/skills/openshell-cli/SKILL.md b/.agents/skills/openshell-cli/SKILL.md index 211c0e19..132c9968 100644 --- a/.agents/skills/openshell-cli/SKILL.md +++ b/.agents/skills/openshell-cli/SKILL.md @@ -208,7 +208,7 @@ openshell sandbox delete sandbox-1 sandbox-2 sandbox-3 # Multiple at once This is the most important multi-step workflow. It enables a tight feedback cycle where sandbox policy is refined based on observed activity. -**Key concept**: Policies have static fields (immutable after creation: `filesystem_policy`, `landlock`, `process`) and dynamic fields (hot-reloadable on a running sandbox: `network_policies`, `inference`). Only dynamic fields can be updated without recreating the sandbox. +**Key concept**: Policies have static fields (immutable after creation: `filesystem_policy`, `landlock`, `process`) and one dynamic field (`network_policies`). Only `network_policies` can be updated without recreating the sandbox. ``` Create sandbox with initial policy @@ -272,7 +272,7 @@ Edit `current-policy.yaml` to allow the blocked actions. 
**For policy content au - Enforcement modes (`audit` vs `enforce`) - Binary matching patterns -Only `network_policies` and `inference` sections can be modified at runtime. If `filesystem_policy`, `landlock`, or `process` need changes, the sandbox must be recreated. +Only `network_policies` can be modified at runtime. If `filesystem_policy`, `landlock`, or `process` need changes, the sandbox must be recreated. ### Step 5: Push the updated policy @@ -564,4 +564,5 @@ $ openshell sandbox upload --help |-------|------------| | `generate-sandbox-policy` | Creating or modifying policy YAML content (network rules, L7 inspection, access presets, endpoint configuration) | | `debug-openshell-cluster` | Diagnosing cluster startup or health failures | +| `debug-inference` | Diagnosing `inference.local`, host-backed local inference, and provider base URL issues | | `tui-development` | Developing features for the OpenShell TUI (`openshell term`) | diff --git a/.agents/skills/openshell-cli/cli-reference.md b/.agents/skills/openshell-cli/cli-reference.md index 59ab9e3d..e344f20d 100644 --- a/.agents/skills/openshell-cli/cli-reference.md +++ b/.agents/skills/openshell-cli/cli-reference.md @@ -270,7 +270,7 @@ View sandbox logs. Supports one-shot and streaming. ### `openshell policy set --policy ` -Update the policy on a live sandbox. Only dynamic fields (`network_policies`, `inference`) can be changed at runtime. +Update the policy on a live sandbox. Only the dynamic `network_policies` field can be changed at runtime. | Flag | Default | Description | |------|---------|-------------| diff --git a/.agents/skills/triage-issue/SKILL.md b/.agents/skills/triage-issue/SKILL.md index 1fe72276..083dcdd8 100644 --- a/.agents/skills/triage-issue/SKILL.md +++ b/.agents/skills/triage-issue/SKILL.md @@ -91,7 +91,7 @@ Check whether the issue body contains a substantive agent diagnostic section. Lo > > This issue was opened without an agent investigation. 
> - > OpenShell is an agent-first project — before we triage this, please point your coding agent at the repo and have it investigate. Your agent can load skills like `debug-openshell-cluster` (for cluster issues), `openshell-cli` (for usage questions), or `generate-sandbox-policy` (for policy help). + > OpenShell is an agent-first project - before we triage this, please point your coding agent at the repo and have it investigate. Your agent can load skills like `debug-openshell-cluster` (for cluster issues), `debug-inference` (for inference setup issues), `openshell-cli` (for usage questions), or `generate-sandbox-policy` (for policy help). > > See [CONTRIBUTING.md](https://github.com/NVIDIA/OpenShell/blob/main/CONTRIBUTING.md#before-you-open-an-issue) for the full workflow. > @@ -123,6 +123,7 @@ Based on the sub-agent's analysis, also attempt to validate the report directly: - For bug reports: check the relevant code paths, look for the described failure mode - For feature requests: assess feasibility against the existing architecture - For cluster/infrastructure issues: reference the `debug-openshell-cluster` skill's known failure patterns +- For inference and provider-topology issues: reference the `debug-inference` skill's known failure patterns - For CLI/usage issues: reference the `openshell-cli` skill's command reference ## Step 5: Classify diff --git a/.github/ISSUE_TEMPLATE/bug_report.yml b/.github/ISSUE_TEMPLATE/bug_report.yml index 9018938e..a4b531a0 100644 --- a/.github/ISSUE_TEMPLATE/bug_report.yml +++ b/.github/ISSUE_TEMPLATE/bug_report.yml @@ -8,7 +8,7 @@ body: value: | ## Agent-First Troubleshooting - OpenShell is an agent-first project. Before filing this bug, point your coding agent at the repo and have it investigate using the available skills (`debug-openshell-cluster`, `openshell-cli`, etc.). See [CONTRIBUTING.md](https://github.com/NVIDIA/OpenShell/blob/main/CONTRIBUTING.md) for the full skills table. + OpenShell is an agent-first project. 
Before filing this bug, point your coding agent at the repo and have it investigate using the available skills (`debug-openshell-cluster`, `debug-inference`, `openshell-cli`, etc.). See [CONTRIBUTING.md](https://github.com/NVIDIA/OpenShell/blob/main/CONTRIBUTING.md) for the full skills table. - type: textarea id: agent-diagnostic @@ -18,9 +18,10 @@ body: Paste the output from your agent's investigation of this bug. What skills did it load? What did it find? What did it try? placeholder: | Example: - - Loaded `debug-openshell-cluster` skill - - Ran `openshell doctor logs` — found CoreDNS timeout errors - - Agent attempted DNS config fix but the issue persists because... + - Loaded `debug-inference` skill + - Ran `openshell inference get` and `openshell provider get ollama` + - Found `OPENAI_BASE_URL=http://127.0.0.1:11434/v1`, which is unreachable from the gateway + - Updated the provider to use `host.openshell.internal`, but the issue persists because the gateway is remote validations: required: true @@ -71,7 +72,7 @@ body: options: - label: I pointed my agent at the repo and had it investigate this issue required: true - - label: I loaded relevant skills (e.g., `debug-openshell-cluster`, `openshell-cli`) + - label: I loaded relevant skills (e.g., `debug-openshell-cluster`, `debug-inference`, `openshell-cli`) required: true - label: My agent could not resolve this — the diagnostic above explains why required: true diff --git a/.github/ISSUE_TEMPLATE/config.yml b/.github/ISSUE_TEMPLATE/config.yml index fa5c48f2..da08fcd1 100644 --- a/.github/ISSUE_TEMPLATE/config.yml +++ b/.github/ISSUE_TEMPLATE/config.yml @@ -3,9 +3,9 @@ contact_links: - name: Have a question? url: https://github.com/NVIDIA/OpenShell/blob/main/CONTRIBUTING.md#agent-skills-for-contributors about: > - Point your agent at the repo. It has skills for CLI usage, cluster - debugging, policy generation, and more. See CONTRIBUTING.md for the - full skills table. + Point your agent at the repo. 
It has skills for CLI usage, cluster and + inference debugging, policy generation, and more. See CONTRIBUTING.md + for the full skills table. - name: Security vulnerability? url: https://github.com/NVIDIA/OpenShell/blob/main/SECURITY.md about: > diff --git a/.github/workflows/issue-triage.yml b/.github/workflows/issue-triage.yml index 39da3b4e..241af7f6 100644 --- a/.github/workflows/issue-triage.yml +++ b/.github/workflows/issue-triage.yml @@ -53,7 +53,7 @@ jobs: body: [ 'This issue appears to have been opened without an agent investigation.', '', - 'OpenShell is an agent-first project — please point your coding agent at the repo and have it diagnose this before we triage. Your agent can load skills like `debug-openshell-cluster`, `openshell-cli`, and `generate-sandbox-policy`.', + 'OpenShell is an agent-first project - please point your coding agent at the repo and have it diagnose this before we triage. Your agent can load skills like `debug-openshell-cluster`, `debug-inference`, `openshell-cli`, and `generate-sandbox-policy`.', '', 'See [CONTRIBUTING.md](https://github.com/NVIDIA/OpenShell/blob/main/CONTRIBUTING.md#before-you-open-an-issue) for the full workflow.', ].join('\n') diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 70bf5ef4..e4e4f834 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -7,7 +7,7 @@ OpenShell is built agent-first. We design systems and use agents to implement th This project ships with [agent skills](#agent-skills-for-contributors) that can diagnose problems, explore the codebase, generate policies, and walk you through common workflows. Before filing an issue: 1. Clone the repo and point your coding agent at it. -2. Load the relevant skill — `debug-openshell-cluster` for cluster problems, `openshell-cli` for usage questions, `generate-sandbox-policy` for policy help. +2. 
Load the relevant skill - `debug-openshell-cluster` for cluster problems, `debug-inference` for inference setup problems, `openshell-cli` for usage questions, `generate-sandbox-policy` for policy help. 3. Have your agent investigate. Let it run diagnostics, read the architecture docs, and attempt a fix. 4. If the agent cannot resolve it, open an issue **with the agent's diagnostic output attached**. The issue template requires this. @@ -16,12 +16,13 @@ This project ships with [agent skills](#agent-skills-for-contributors) that can - A real bug that your agent confirmed and could not fix. - A feature proposal with a design — not a "please build this" request. - An infrastructure problem that the `debug-openshell-cluster` skill could not resolve. +- An inference setup problem that the `debug-inference` skill could not resolve. - Security vulnerabilities must follow [SECURITY.md](SECURITY.md) — **not** GitHub issues. ### When NOT to Open an Issue - Questions about how things work — your agent can answer these from the codebase and architecture docs. -- Configuration problems — your agent can diagnose these with `openshell-cli` and `debug-openshell-cluster`. +- Configuration problems - your agent can diagnose these with `openshell-cli`, `debug-openshell-cluster`, and `debug-inference`. - "How do I..." requests — the skills cover CLI usage, policy generation, TUI development, and more. ## Agent Skills for Contributors @@ -32,6 +33,7 @@ Skills live in `.agents/skills/`. 
Your agent's harness can discover and load the |----------|-------|---------| | Getting Started | `openshell-cli` | CLI usage, sandbox lifecycle, provider management, BYOC workflows | | Getting Started | `debug-openshell-cluster` | Diagnose cluster startup failures and health issues | +| Getting Started | `debug-inference` | Diagnose `inference.local`, host-backed local inference, and direct external inference setup issues | | Contributing | `create-spike` | Investigate a problem, produce a structured GitHub issue | | Contributing | `build-from-issue` | Plan and implement work from a GitHub issue (maintainer workflow) | | Contributing | `create-github-issue` | Create well-structured GitHub issues | diff --git a/README.md b/README.md index b30b8134..f44dc815 100644 --- a/README.md +++ b/README.md @@ -205,7 +205,7 @@ cd OpenShell # Point your agent here — it will discover the skills in .agents/skills/ automatically ``` -Your agent can load skills for CLI usage (`openshell-cli`), cluster troubleshooting (`debug-openshell-cluster`), policy generation (`generate-sandbox-policy`), and more. See [CONTRIBUTING.md](CONTRIBUTING.md) for the full skills table. +Your agent can load skills for CLI usage (`openshell-cli`), cluster troubleshooting (`debug-openshell-cluster`), inference troubleshooting (`debug-inference`), policy generation (`generate-sandbox-policy`), and more. See [CONTRIBUTING.md](CONTRIBUTING.md) for the full skills table. ## Built With Agents diff --git a/architecture/security-policy.md b/architecture/security-policy.md index efbcee76..60d29130 100644 --- a/architecture/security-policy.md +++ b/architecture/security-policy.md @@ -84,7 +84,7 @@ Policy fields fall into two categories based on when they are enforced: | Category | Fields | Enforcement Point | Updatable? | |----------|--------|-------------------|------------| | **Static** | `filesystem_policy`, `landlock`, `process` | Applied once in the child process `pre_exec` (after `fork()`, before `exec()`). 
Kernel-level Landlock rulesets and UID/GID changes cannot be reversed. | No -- immutable after sandbox creation | -| **Dynamic** | `network_policies`, `inference` | Evaluated at runtime by the OPA engine on every proxy CONNECT request and L7 rule check. The OPA engine can be atomically replaced. | Yes -- via `openshell policy set` | +| **Dynamic** | `network_policies` | Evaluated at runtime by the OPA engine on every proxy CONNECT request and L7 rule check. The OPA engine can be atomically replaced. | Yes -- via `openshell policy set` | Attempting to change a static field in an update request returns an `INVALID_ARGUMENT` error with a message indicating which field cannot be modified. See `crates/openshell-server/src/grpc.rs` -- `validate_static_fields_unchanged()`. @@ -1092,9 +1092,6 @@ network_policies: binaries: - { path: /usr/local/bin/python3.13 } -inference: - allowed_routes: - - local ``` --- diff --git a/docs/about/architecture.md b/docs/about/architecture.md index 5497c61e..07864bb3 100644 --- a/docs/about/architecture.md +++ b/docs/about/architecture.md @@ -51,11 +51,11 @@ Every outbound connection from agent code passes through the same decision path: 1. The agent process opens an outbound connection (API call, package install, git clone, and so on). 2. The proxy inside the sandbox intercepts the connection and identifies which binary opened it. -3. The proxy queries the policy engine with the destination, port, and calling binary. -4. The policy engine returns one of three decisions: - - **Allow** — the destination and binary match a policy block. Traffic flows directly to the external service. - - **Route for inference** — no policy block matched, but inference routing is configured. The privacy router intercepts the request, strips the original credentials, injects the configured backend credentials, and forwards to the managed model endpoint. - - **Deny** — no match and no inference route. The connection is blocked and logged. +3. 
If the target is `https://inference.local`, the proxy handles it as managed inference before policy evaluation. OpenShell strips sandbox-supplied credentials, injects the configured backend credentials, and forwards the request to the managed model endpoint. +4. For every other destination, the proxy queries the policy engine with the destination, port, and calling binary. +5. The policy engine returns one of two decisions: + - **Allow** - the destination and binary match a policy block. Traffic flows directly to the external service. + - **Deny** - no policy block matched. The connection is blocked and logged. For REST endpoints with TLS termination enabled, the proxy also decrypts TLS and checks each HTTP request against per-method, per-path rules before allowing it through. diff --git a/docs/inference/configure.md b/docs/inference/configure.md index bf0103a7..fb048dc9 100644 --- a/docs/inference/configure.md +++ b/docs/inference/configure.md @@ -58,10 +58,10 @@ $ openshell provider create \ --name my-local-model \ --type openai \ --credential OPENAI_API_KEY=empty-if-not-required \ - --config OPENAI_BASE_URL=http://192.168.10.15/v1 + --config OPENAI_BASE_URL=http://host.openshell.internal:11434/v1 ``` -Use `--config OPENAI_BASE_URL` to point to any OpenAI-compatible server running on your network. Set `OPENAI_API_KEY` to a dummy value if the server does not require authentication. +Use `--config OPENAI_BASE_URL` to point to any OpenAI-compatible server running where the gateway runs. For host-backed local inference, use `host.openshell.internal` or the host's LAN IP. Avoid `127.0.0.1` and `localhost`. Set `OPENAI_API_KEY` to a dummy value if the server does not require authentication. :::: @@ -131,8 +131,14 @@ response = client.chat.completions.create( The client-supplied `model` and `api_key` values are not sent upstream. The privacy router injects the real credentials from the configured provider and rewrites the model before forwarding. 
+Some SDKs require a non-empty API key even though `inference.local` does not use the sandbox-provided value. In those cases, pass any placeholder such as `test` or `unused`. + Use this endpoint when inference should stay local to the host for privacy and security reasons. External providers that should be reached directly belong in `network_policies` instead. +When the upstream runs on the same machine as the gateway, bind it to `0.0.0.0` and point the provider at `host.openshell.internal` or the host's LAN IP. `127.0.0.1` and `localhost` usually fail because the request originates from the gateway or sandbox runtime, not from your shell. + +If the gateway runs on a remote host or behind a cloud deployment, `host.openshell.internal` points to that remote machine, not to your laptop. A laptop-local Ollama or vLLM process is not reachable from a remote gateway unless you add your own tunnel or shared network path. + ### Verify the Endpoint from a Sandbox `openshell inference set` and `openshell inference update` verify the resolved upstream endpoint by default before saving the configuration. If the endpoint is not live yet, retry with `--no-verify` to persist the route without the probe. @@ -152,11 +158,13 @@ A successful response confirms the privacy router can reach the configured backe - Gateway-scoped: Every sandbox using the active gateway sees the same `inference.local` backend. - HTTPS only: `inference.local` is intercepted only for HTTPS traffic. +- Hot reload: Provider and inference changes are picked up within about 5 seconds by default. ## Next Steps Explore related topics: - To understand the inference routing flow and supported API patterns, refer to {doc}`index`. +- To follow a complete Ollama-based local setup, refer to {doc}`/tutorials/local-inference-ollama`. - To control external endpoints, refer to [Policies](/sandboxes/policies.md). - To manage provider records, refer to {doc}`../sandboxes/manage-providers`. 
diff --git a/docs/inference/index.md b/docs/inference/index.md index 842a3fb1..3a34affb 100644 --- a/docs/inference/index.md +++ b/docs/inference/index.md @@ -43,9 +43,9 @@ If code calls an external inference host directly, that traffic is evaluated onl | Property | Detail | |---|---| | Credentials | No sandbox API keys needed. Credentials come from the configured provider record. | -| Configuration | One provider and one model define sandbox inference. | +| Configuration | One provider and one model define sandbox inference for the active gateway. Every sandbox on that gateway sees the same `inference.local` backend. | | Provider support | OpenAI, Anthropic, and NVIDIA providers all work through the same endpoint. | -| Hot-refresh | OpenShell picks up provider credential changes and inference updates without recreating sandboxes. | +| Hot-refresh | OpenShell picks up provider credential changes and inference updates without recreating sandboxes. Changes propagate within about 5 seconds by default. | ## Supported API Patterns diff --git a/docs/sandboxes/manage-sandboxes.md b/docs/sandboxes/manage-sandboxes.md index d2790d57..5306120a 100644 --- a/docs/sandboxes/manage-sandboxes.md +++ b/docs/sandboxes/manage-sandboxes.md @@ -121,7 +121,7 @@ OpenShell Terminal combines sandbox status and live logs in a single real-time d $ openshell term ``` -Use the terminal to spot blocked connections marked `action=deny` and inference interceptions marked `action=inspect_for_inference`. If a connection is blocked unexpectedly, add the host to your network policy. Refer to {doc}`policies` for the workflow. +Use the terminal to spot blocked connections marked `action=deny` and inference-related proxy activity. If a connection is blocked unexpectedly, add the host to your network policy. Refer to {doc}`policies` for the workflow. 
## Port Forwarding diff --git a/docs/sandboxes/policies.md b/docs/sandboxes/policies.md index 9c182de6..3ee7b50d 100644 --- a/docs/sandboxes/policies.md +++ b/docs/sandboxes/policies.md @@ -30,7 +30,7 @@ Use this page to apply and iterate policy changes on running sandboxes. For a fu ## Policy Structure -A policy has static sections `filesystem_policy`, `landlock`, and `process` that are locked at sandbox creation, and dynamic sections `network_policies` and `inference` that are hot-reloadable on a running sandbox. +A policy has static sections `filesystem_policy`, `landlock`, and `process` that are locked at sandbox creation, and a dynamic section `network_policies` that is hot-reloadable on a running sandbox. ```yaml version: 1 @@ -63,9 +63,6 @@ network_policies: binaries: - path: /usr/bin/curl -# Dynamic: hot-reloadable. Routing hints this sandbox can use for inference (e.g. local, nvidia). -inference: - allowed_routes: [local] ``` Static sections are locked at sandbox creation. Changing them requires destroying and recreating the sandbox. @@ -76,8 +73,7 @@ Dynamic sections can be updated on a running sandbox with `openshell policy set` | `filesystem_policy` | Static | Controls which directories the agent can access on disk. Paths are split into `read_only` and `read_write` lists. Any path not listed in either list is inaccessible. Set `include_workdir: true` to automatically add the agent's working directory to `read_write`. [Landlock LSM](https://docs.kernel.org/security/landlock.html) enforces these restrictions at the kernel level. | | `landlock` | Static | Configures Landlock LSM enforcement behavior. Set `compatibility` to `best_effort` (use the highest ABI the host kernel supports) or `hard_requirement` (fail if the required ABI is unavailable). | | `process` | Static | Sets the OS-level identity for the agent process. `run_as_user` and `run_as_group` default to `sandbox`. Root (`root` or `0`) is rejected. 
The agent also runs with seccomp filters that block dangerous system calls. | -| `network_policies` | Dynamic | Controls network access for the sandbox. Each block has a name, a list of endpoints (host, port, protocol, and optional rules), and a list of binaries allowed to use those endpoints.
Every outbound connection goes through the proxy, which queries the {doc}`policy engine <../about/architecture>` with the destination and calling binary. A connection is allowed only when both match an entry in the same policy block.
For endpoints with `protocol: rest` and `tls: terminate`, each HTTP request is also checked against that endpoint's `rules` (method and path).
Endpoints without `protocol` or `tls` allow the TCP stream through without inspecting payloads.
If no endpoint matches and inference routes are configured, the request may be rerouted for inference. Otherwise the connection is denied. | -| `inference` | Dynamic | Controls which inference routing backends the sandbox can use. Set `allowed_routes` to a list of route names (for example, `[local]` or `[local, nvidia]`). When an outbound request does not match any `network_policies` entry, the proxy checks whether the destination matches a configured inference route. If it does and the route is in `allowed_routes`, the request is forwarded to that backend. | +| `network_policies` | Dynamic | Controls network access for ordinary outbound traffic from the sandbox. Each block has a name, a list of endpoints (host, port, protocol, and optional rules), and a list of binaries allowed to use those endpoints.
Every outbound connection except `https://inference.local` goes through the proxy, which queries the {doc}`policy engine <../about/architecture>` with the destination and calling binary. A connection is allowed only when both match an entry in the same policy block.
For endpoints with `protocol: rest` and `tls: terminate`, each HTTP request is also checked against that endpoint's `rules` (method and path).
Endpoints without `protocol` or `tls` allow the TCP stream through without inspecting payloads.
If no endpoint matches, the connection is denied. Configure managed inference separately through {doc}`../inference/configure`. | ## Apply a Custom Policy @@ -137,7 +133,7 @@ The following steps outline the hot-reload policy update workflow. $ openshell policy get --full > current-policy.yaml ``` -4. Edit the YAML: add or adjust `network_policies` entries, binaries, `access` or `rules`, or `inference.allowed_routes`. +4. Edit the YAML: add or adjust `network_policies` entries, binaries, `access`, or `rules`. 5. Push the updated policy. Exit codes: 0 = loaded, 1 = validation failed, 124 = timeout. diff --git a/docs/tutorials/index.md b/docs/tutorials/index.md index 04638807..fcfa968d 100644 --- a/docs/tutorials/index.md +++ b/docs/tutorials/index.md @@ -43,6 +43,15 @@ Launch Claude Code in a sandbox, diagnose a policy denial, and iterate on a cust +++ {bdg-secondary}`Tutorial` ::: + +:::{grid-item-card} Local Inference with Ollama +:link: local-inference-ollama +:link-type: doc + +Route inference to a local Ollama server, verify it from a sandbox, and reuse the same pattern for other OpenAI-compatible engines. ++++ +{bdg-secondary}`Tutorial` +::: :::: ```{toctree} @@ -50,4 +59,5 @@ Launch Claude Code in a sandbox, diagnose a policy denial, and iterate on a cust First Network Policy GitHub Push Access +Local Inference with Ollama ``` diff --git a/docs/tutorials/local-inference-ollama.md b/docs/tutorials/local-inference-ollama.md new file mode 100644 index 00000000..f8e82273 --- /dev/null +++ b/docs/tutorials/local-inference-ollama.md @@ -0,0 +1,156 @@ +--- +title: + page: Run Local Inference with Ollama + nav: Local Inference with Ollama +description: Configure inference.local to route sandbox requests to a local Ollama server running on the gateway host. 
+topics: +- Generative AI +- Cybersecurity +tags: +- Tutorial +- Inference Routing +- Ollama +- Local Inference +- Sandbox +content: + type: tutorial + difficulty: technical_intermediate + audience: + - engineer +--- + + + +# Run Local Inference with Ollama + +This tutorial shows how to route sandbox inference to a model running locally. + +:::{note} +This tutorial uses Ollama because it is easy to install and run locally, but you can substitute other inference engines such as vLLM, SGLang, TRT-LLM, and NVIDIA NIM by changing the startup command, base URL, and model name. +::: + +After completing this tutorial, you will know how to: + +- Expose a local inference server to OpenShell sandboxes. +- Verify end-to-end inference from inside a sandbox. + +## Prerequisites + +- A working OpenShell installation. Complete the {doc}`/get-started/quickstart` before proceeding. + +If your gateway runs on a remote host or in a cloud deployment, Ollama must also run there. Another common scenario is running a model and the gateway on different nodes in the same local network. + +Install [Ollama](https://ollama.com/) with: + +```console +$ curl -fsSL https://ollama.com/install.sh | sh +``` + +## Step 1: Start Ollama on All Interfaces + +By default, Ollama listens only on the loopback address (`127.0.0.1`), which is not reachable from the OpenShell gateway or sandboxes. Start Ollama so it listens on all interfaces: + +```console +$ OLLAMA_HOST=0.0.0.0:11434 ollama serve +``` + +:::{tip} +If you see `Error: listen tcp 0.0.0.0:11434: bind: address already in use`, Ollama is already running as a system service. Stop it first, then start it manually with the correct bind address: + +```console +$ systemctl stop ollama +$ OLLAMA_HOST=0.0.0.0:11434 ollama serve +``` +::: + +## Step 2: Pull a Model + +In a second terminal, pull a lightweight model: + +```console +$ ollama run qwen3.5:0.8b +``` + +This downloads the model and starts an interactive session. Type `/bye` to exit the session. 
The model stays available for inference after you exit. + +:::{note} +`qwen3.5:0.8b` is a good smoke-test target for verifying your local inference setup, but it is best suited for simple tasks. For more complex coding, reasoning, or agent workflows, use a stronger open model such as Nemotron or another larger open-source model that fits your hardware. +::: + +Confirm the model is available: + +```console +$ ollama ps +``` + +You should see `qwen3.5:0.8b` in the output. + +## Step 3: Create a Provider for Ollama + +Create an OpenAI-compatible provider that points at Ollama through `host.openshell.internal`: + +```console +$ openshell provider create \ + --name ollama \ + --type openai \ + --credential OPENAI_API_KEY=empty \ + --config OPENAI_BASE_URL=http://host.openshell.internal:11434/v1 +``` + +This works because OpenShell injects `host.openshell.internal` so sandboxes and the gateway can refer back to the gateway host machine. If that hostname is not the best fit for your environment, you can also use the host's LAN IP. + +## Step 4: Configure Local Inference with Ollama + +Set the managed inference route for the active gateway: + +```console +$ openshell inference set --provider ollama --model qwen3.5:0.8b +``` + +If the command succeeds, OpenShell has verified that the upstream is reachable and accepts the expected OpenAI-compatible request shape. + +Confirm the saved config: + +```console +$ openshell inference get +``` + +You should see `Provider: ollama` and `Model: qwen3.5:0.8b`. + +## Step 5: Verify from Inside a Sandbox + +Run a simple request through `https://inference.local`: + +```console +$ openshell sandbox create -- \ + curl https://inference.local/v1/chat/completions \ + --json '{"messages":[{"role":"user","content":"hello"}],"max_tokens":10}' +``` + +The response should be JSON from the upstream model. The `model` reported in the response may show the real model resolved by OpenShell. 
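If the application inside the sandbox is Python rather than `curl`, the same probe can be sketched with the standard library. This is a hedged sketch: the helper names are illustrative, not part of OpenShell, and only the OpenAI-style response shape shown above is assumed.

```python
import json
import urllib.request

INFERENCE_URL = "https://inference.local/v1/chat/completions"

def extract_reply(body: dict) -> str:
    # Pull the assistant text out of an OpenAI-style response body.
    return body["choices"][0]["message"]["content"]

def chat(prompt: str, max_tokens: int = 10) -> str:
    # Send one chat completion through the managed inference route.
    # No API key is needed; credentials come from the provider record.
    req = urllib.request.Request(
        INFERENCE_URL,
        data=json.dumps(
            {"messages": [{"role": "user", "content": prompt}],
             "max_tokens": max_tokens}
        ).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return extract_reply(json.load(resp))

# Inside the sandbox: print(chat("hello"))
```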
+
## Troubleshooting

If setup fails, check these first:

- Ollama is bound to `0.0.0.0`, not only `127.0.0.1`
- `OPENAI_BASE_URL` uses `http://host.openshell.internal:11434/v1`
- The gateway host can reach Ollama (same machine, or a routable network path)
- The configured model exists in Ollama
- The app calls `https://inference.local`, not `http://inference.local`

Useful commands:

```console
$ openshell status
$ openshell inference get
$ openshell provider get ollama
```

## Next Steps

- To learn more about managed inference, refer to {doc}`/inference/index`.
- To configure a different self-hosted backend, refer to {doc}`/inference/configure`.