2 changes: 1 addition & 1 deletion README.md
@@ -36,7 +36,7 @@ uv tool install -U openshell
### Create a sandbox

```bash
openshell sandbox create -- claude # or opencode, codex, copilot
```

A gateway is created automatically on first use. To deploy on a remote host instead, pass `--remote user@host` to the create command.
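
For example, a remote deployment might look like the following (hypothetical target; replace `user@host` with your own SSH destination):

```bash
# Create the sandbox on a remote host instead of locally (user@host is a placeholder)
openshell sandbox create --remote user@host -- claude
```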
4 changes: 2 additions & 2 deletions docs/about/supported-agents.md
@@ -9,8 +9,8 @@ The following table summarizes the agents that run in OpenShell sandboxes. All a
| [Codex](https://developers.openai.com/codex) | [`base`](https://github.com/NVIDIA/OpenShell-Community/tree/main/sandboxes/base) | No coverage | Pre-installed. Requires a custom policy with OpenAI endpoints and Codex binary paths. Requires `OPENAI_API_KEY`. |
| [GitHub Copilot CLI](https://docs.github.com/en/copilot/github-copilot-in-the-cli) | [`base`](https://github.com/NVIDIA/OpenShell-Community/tree/main/sandboxes/base) | Full coverage | Pre-installed. Works out of the box. Requires `GITHUB_TOKEN` or `COPILOT_GITHUB_TOKEN`. |
| [OpenClaw](https://openclaw.ai/) | [`openclaw`](https://github.com/NVIDIA/OpenShell-Community/tree/main/sandboxes/openclaw) | Bundled | Agent orchestration layer. Launch with `openshell sandbox create --from openclaw`. |
| [Ollama](https://ollama.com/) | [`ollama`](https://github.com/NVIDIA/OpenShell-Community/tree/main/sandboxes/ollama) | Bundled | Run cloud and local models. Includes Claude Code, Codex, and OpenCode. Launch with `openshell sandbox create --from ollama`. |

More community agent sandboxes are available in the {doc}`../sandboxes/community-sandboxes` catalog.

For a complete support matrix, refer to the {doc}`../reference/support-matrix` page.
6 changes: 6 additions & 0 deletions docs/inference/configure.md
@@ -81,6 +81,12 @@ $ openshell provider create \

Use `--config OPENAI_BASE_URL` to point to any OpenAI-compatible server running where the gateway runs. For host-backed local inference, use `host.openshell.internal` or the host's LAN IP. Avoid `127.0.0.1` and `localhost`. Set `OPENAI_API_KEY` to a dummy value if the server does not require authentication.

:::{tip}
For a self-contained setup, the Ollama community sandbox bundles Ollama inside the sandbox itself, so no host-level provider is needed. See {doc}`/tutorials/inference-ollama` for details.
:::

Ollama also supports cloud-hosted models using the `:cloud` tag suffix (e.g., `qwen3.5:cloud`).
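
Cloud models run with the same commands as local ones, with no `ollama pull` or local GPU needed. For example:

```console
$ ollama run qwen3.5:cloud
```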

::::

::::{tab-item} Anthropic
2 changes: 1 addition & 1 deletion docs/sandboxes/community-sandboxes.md
@@ -43,7 +43,7 @@ The following community sandboxes are available in the catalog.
| Sandbox | Description |
|---|---|
| `base` | Foundational image with system tools and dev environment |
| `ollama` | Ollama with cloud and local model support, Claude Code, OpenCode, and Codex pre-installed. Use `ollama launch` inside the sandbox to start coding agents with zero config. |
| `openclaw` | Open agent manipulation and control |
| `sdg` | Synthetic data generation workflows |

221 changes: 221 additions & 0 deletions docs/tutorials/inference-ollama.md
@@ -0,0 +1,221 @@
---
title:
page: Inference with Ollama
nav: Inference with Ollama
description: Run local and cloud models inside an OpenShell sandbox using the Ollama community sandbox, or route sandbox requests to a host-level Ollama server.
topics:
- Generative AI
- Cybersecurity
tags:
- Tutorial
- Inference Routing
- Ollama
- Local Inference
- Sandbox
content:
type: tutorial
difficulty: technical_intermediate
audience:
- engineer
---

<!--
SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0
-->

# Run Local Inference with Ollama

This tutorial covers two ways to use Ollama with OpenShell:

1. **Ollama sandbox (recommended)** — a self-contained sandbox with Ollama, Claude Code, OpenCode, and Codex pre-installed. One command to start.
2. **Host-level Ollama** — run Ollama on the gateway host and route sandbox inference to it. Useful when you want a single Ollama instance shared across multiple sandboxes.

After completing this tutorial, you will know how to:

- Launch the Ollama community sandbox for a batteries-included experience.
- Use `ollama launch` to start coding agents inside a sandbox.
- Expose a host-level Ollama server to sandboxes through `inference.local`.

## Prerequisites

- A working OpenShell installation. Complete the {doc}`/get-started/quickstart` before proceeding.

## Option A: Ollama Community Sandbox (Recommended)

The Ollama community sandbox bundles Ollama, Claude Code, OpenCode, and Codex into a single image. Ollama starts automatically when the sandbox launches.

### Step 1: Create the Sandbox

```console
$ openshell sandbox create --from ollama
```

This pulls the community sandbox image, applies the bundled policy, and drops you into a shell with Ollama running.
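
Before launching an agent, you can optionally confirm the bundled server is up using standard Ollama commands (exact output varies by version):

```console
$ ollama --version
$ ollama ps   # lists models currently loaded into memory
```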


### Step 2: Chat with a Model

Chat with a local model:

```console
$ ollama run qwen3.5
```

Or run a cloud model:

```console
$ ollama run kimi-k2.5:cloud
```


Or use `ollama launch` to start a coding agent with Ollama as the model backend:

```console
$ ollama launch claude
$ ollama launch codex
$ ollama launch opencode
```

For CI/CD and automated workflows, `ollama launch` supports a headless mode:

```console
$ ollama launch claude --yes --model qwen3.5
```

### Model Recommendations

| Use case | Model | Notes |
|---|---|---|
| Smoke test | `qwen3.5:0.8b` | Fast, lightweight, good for verifying setup |
| Coding and reasoning | `qwen3.5` | Strong tool calling support for agentic workflows |
| Complex tasks | `nemotron-3-super` | 122B parameter model, needs 48GB+ VRAM |
| No local GPU | `qwen3.5:cloud` | Runs on Ollama's cloud infrastructure, no `ollama pull` required |

:::{note}
Cloud models use the `:cloud` tag suffix and do not require local hardware:

```console
$ ollama run qwen3.5:cloud
```
:::

### Tool Calling

Agentic workflows (Claude Code, Codex, OpenCode) rely on tool calling. The following models have reliable tool calling support: Qwen 3.5, Nemotron-3-Super, GLM-5, and Kimi-K2.5. Check the [Ollama model library](https://ollama.com/library) for the latest models.
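
To check tool calling for a particular model yourself, you can send a request with a `tools` array to Ollama's OpenAI-compatible endpoint. This is a sketch that assumes Ollama's default port 11434 inside the sandbox; `get_weather` is a hypothetical function defined only for this test:

```console
$ curl http://127.0.0.1:11434/v1/chat/completions --json '{
    "model": "qwen3.5",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }]
  }'
```

A model with reliable tool calling should answer with a `tool_calls` entry referencing `get_weather` rather than replying in plain text.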

### Updating Ollama

To update Ollama inside a running sandbox:

```console
$ update-ollama
```

Or auto-update on every sandbox start:

```console
$ openshell sandbox create --from ollama -e OLLAMA_UPDATE=1
```

## Option B: Host-Level Ollama

Use this approach when you want a single Ollama instance on the gateway host, shared across multiple sandboxes through `inference.local`.

:::{note}
This approach uses Ollama because it is easy to install and run locally, but you can substitute other inference engines such as vLLM, SGLang, TRT-LLM, and NVIDIA NIM by changing the startup command, base URL, and model name.
:::
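
As an illustration, routing to a vLLM server on the host would change only the provider name, base URL, and model name. This sketch assumes vLLM's default port 8000 and mirrors the provider flags used for Ollama in Step 3 below:

```console
$ openshell provider create \
    --name vllm \
    --type openai \
    --credential OPENAI_API_KEY=empty \
    --config OPENAI_BASE_URL=http://host.openshell.internal:8000/v1
```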

### Step 1: Install and Start Ollama

Install [Ollama](https://ollama.com/) on the gateway host:

```console
$ curl -fsSL https://ollama.com/install.sh | sh
```

Start Ollama on all interfaces so it is reachable from sandboxes:

```console
$ OLLAMA_HOST=0.0.0.0:11434 ollama serve
```

:::{tip}
If you see `Error: listen tcp 0.0.0.0:11434: bind: address already in use`, Ollama is already running as a system service. Stop it first:

```console
$ systemctl stop ollama
$ OLLAMA_HOST=0.0.0.0:11434 ollama serve
```
:::

### Step 2: Pull a Model

In a second terminal, pull and load a model:

```console
$ ollama run qwen3.5:0.8b
```

Type `/bye` to exit the interactive session. The model stays loaded.
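
You can confirm the model is still resident before moving on:

```console
$ ollama ps
```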

### Step 3: Create a Provider

Create an OpenAI-compatible provider pointing at the host Ollama:

```console
$ openshell provider create \
--name ollama \
--type openai \
--credential OPENAI_API_KEY=empty \
--config OPENAI_BASE_URL=http://host.openshell.internal:11434/v1
```

OpenShell injects `host.openshell.internal` so sandboxes and the gateway can reach the host machine. You can also use the host's LAN IP.
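
To sanity-check reachability before routing inference, you can list the server's models from inside a sandbox (assuming the sandbox policy allows direct access to the host port; `/v1/models` is part of the OpenAI-compatible API):

```console
$ openshell sandbox create -- \
  curl http://host.openshell.internal:11434/v1/models
```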

### Step 4: Set Inference Routing

```console
$ openshell inference set --provider ollama --model qwen3.5:0.8b
```

Confirm:

```console
$ openshell inference get
```

### Step 5: Verify from a Sandbox

```console
$ openshell sandbox create -- \
curl https://inference.local/v1/chat/completions \
--json '{"messages":[{"role":"user","content":"hello"}],"max_tokens":10}'
```

The response should be JSON from the model.
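
If you only want the generated text, the response can be piped through `jq` from inside the sandbox (assuming `jq` is available in the image; `.choices[0].message.content` is the standard response path for OpenAI-style chat completions):

```console
$ curl -s https://inference.local/v1/chat/completions \
    --json '{"messages":[{"role":"user","content":"hello"}],"max_tokens":10}' \
    | jq -r '.choices[0].message.content'
```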

## Troubleshooting

Common issues and fixes:

- **Ollama not reachable from sandbox** — Ollama must be bound to `0.0.0.0`, not `127.0.0.1`. This applies to host-level Ollama only; the community sandbox handles this automatically.
- **`OPENAI_BASE_URL` wrong** — Use `http://host.openshell.internal:11434/v1`, not `localhost` or `127.0.0.1`.
- **Model not found** — Run `ollama ps` to confirm the model is loaded. Run `ollama pull <model>` if needed.
- **HTTPS vs HTTP** — Code inside sandboxes must call `https://inference.local`, not `http://`.
- **AMD GPU driver issues** — Ollama v0.18+ requires ROCm 7 drivers for AMD GPUs. Update your drivers if you see GPU detection failures.

Useful commands:

```console
$ openshell status
$ openshell inference get
$ openshell provider get ollama
```

## Next Steps

- To learn more about managed inference, refer to {doc}`/inference/index`.
- To configure a different self-hosted backend, refer to {doc}`/inference/configure`.
- To explore more community sandboxes, refer to {doc}`/sandboxes/community-sandboxes`.