2 changes: 1 addition & 1 deletion README.md
@@ -36,7 +36,7 @@ uv tool install -U openshell
### Create a sandbox

```bash
openshell sandbox create -- claude # or opencode, codex, copilot
```

A gateway is created automatically on first use. To deploy on a remote host instead, pass `--remote user@host` to the create command.
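
For example, a remote deployment might look like the following (hypothetical target; replace `user@host` with your own SSH destination):

```bash
# Create the sandbox on a remote host instead of locally (user@host is a placeholder)
openshell sandbox create --remote user@host -- claude
```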
4 changes: 2 additions & 2 deletions docs/about/supported-agents.md
@@ -9,8 +9,8 @@ The following table summarizes the agents that run in OpenShell sandboxes. All a
| [Codex](https://developers.openai.com/codex) | [`base`](https://github.com/NVIDIA/OpenShell-Community/tree/main/sandboxes/base) | No coverage | Pre-installed. Requires a custom policy with OpenAI endpoints and Codex binary paths. Requires `OPENAI_API_KEY`. |
| [GitHub Copilot CLI](https://docs.github.com/en/copilot/github-copilot-in-the-cli) | [`base`](https://github.com/NVIDIA/OpenShell-Community/tree/main/sandboxes/base) | Full coverage | Pre-installed. Works out of the box. Requires `GITHUB_TOKEN` or `COPILOT_GITHUB_TOKEN`. |
| [OpenClaw](https://openclaw.ai/) | [`openclaw`](https://github.com/NVIDIA/OpenShell-Community/tree/main/sandboxes/openclaw) | Bundled | Agent orchestration layer. Launch with `openshell sandbox create --from openclaw`. |
| [Ollama](https://ollama.com/) | [`ollama`](https://github.com/NVIDIA/OpenShell-Community/tree/main/sandboxes/ollama) | Bundled | Run cloud and local models. Includes Claude Code, Codex, and OpenCode. Launch with `openshell sandbox create --from ollama`. |

More community agent sandboxes are available in the {doc}`../sandboxes/community-sandboxes` catalog.

For a complete support matrix, refer to the {doc}`../reference/support-matrix` page.
6 changes: 6 additions & 0 deletions docs/inference/configure.md
@@ -81,6 +81,12 @@ $ openshell provider create \

Use `--config OPENAI_BASE_URL` to point to any OpenAI-compatible server running where the gateway runs. For host-backed local inference, use `host.openshell.internal` or the host's LAN IP. Avoid `127.0.0.1` and `localhost`. Set `OPENAI_API_KEY` to a dummy value if the server does not require authentication.

:::{tip}
For a self-contained setup, the Ollama community sandbox bundles Ollama inside the sandbox itself, so no host-level provider is needed. See {doc}`/tutorials/inference-ollama` for details.
:::

Ollama also supports cloud-hosted models using the `:cloud` tag suffix (e.g., `qwen3.5:cloud`).
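
Cloud models run with the same commands as local ones, with no `ollama pull` or local GPU needed. For example:

```console
$ ollama run qwen3.5:cloud
```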

::::

::::{tab-item} Anthropic
2 changes: 1 addition & 1 deletion docs/sandboxes/community-sandboxes.md
@@ -43,7 +43,7 @@ The following community sandboxes are available in the catalog.
| Sandbox | Description |
|---|---|
| `base` | Foundational image with system tools and dev environment |
| `ollama` | Ollama with cloud and local model support, Claude Code, OpenCode, and Codex pre-installed. Use `ollama launch` inside the sandbox to start coding agents with zero config. |
| `openclaw` | Open agent manipulation and control |
| `sdg` | Synthetic data generation workflows |

221 changes: 221 additions & 0 deletions docs/tutorials/inference-ollama.md
@@ -0,0 +1,221 @@
---
title:
page: Inference with Ollama
nav: Inference with Ollama
description: Run local and cloud models inside an OpenShell sandbox using the Ollama community sandbox, or route sandbox requests to a host-level Ollama server.
topics:
- Generative AI
- Cybersecurity
tags:
- Tutorial
- Inference Routing
- Ollama
- Local Inference
- Sandbox
content:
type: tutorial
difficulty: technical_intermediate
audience:
- engineer
---

<!--
SPDX-FileCopyrightText: Copyright (c) 2025-2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0
-->

# Run Local Inference with Ollama

This tutorial covers two ways to use Ollama with OpenShell:

1. **Ollama sandbox (recommended)** — a self-contained sandbox with Ollama, Claude Code, OpenCode, and Codex pre-installed. One command to start.
2. **Host-level Ollama** — run Ollama on the gateway host and route sandbox inference to it. Useful when you want a single Ollama instance shared across multiple sandboxes.

After completing this tutorial, you will know how to:

- Launch the Ollama community sandbox for a batteries-included experience.
- Use `ollama launch` to start coding agents inside a sandbox.
- Expose a host-level Ollama server to sandboxes through `inference.local`.

## Prerequisites

- A working OpenShell installation. Complete the {doc}`/get-started/quickstart` before proceeding.

## Option A: Ollama Community Sandbox (Recommended)

The Ollama community sandbox bundles Ollama, Claude Code, OpenCode, and Codex into a single image. Ollama starts automatically when the sandbox launches.

### Step 1: Create the Sandbox

```console
$ openshell sandbox create --from ollama
```

This pulls the community sandbox image, applies the bundled policy, and drops you into a shell with Ollama running.
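
Before launching an agent, you can optionally confirm the bundled server is up using standard Ollama commands (exact output varies by version):

```console
$ ollama --version
$ ollama ps   # lists models currently loaded into memory
```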


### Step 2: Chat with a Model

Chat with a local model:

```console
$ ollama run qwen3.5
```

Or run a cloud model:

```console
$ ollama run kimi-k2.5:cloud
```


Or use `ollama launch` to start a coding agent with Ollama as the model backend:

```console
$ ollama launch claude
$ ollama launch codex
$ ollama launch opencode
```

For CI/CD and automated workflows, `ollama launch` supports a headless mode:

```console
$ ollama launch claude --yes --model qwen3.5
```

### Model Recommendations

| Use case | Model | Notes |
|---|---|---|
| Smoke test | `qwen3.5:0.8b` | Fast, lightweight, good for verifying setup |
| Coding and reasoning | `qwen3.5` | Strong tool calling support for agentic workflows |
| Complex tasks | `nemotron-3-super` | 122B parameter model, needs 48GB+ VRAM |
| No local GPU | `qwen3.5:cloud` | Runs on Ollama's cloud infrastructure, no `ollama pull` required |

:::{note}
Cloud models use the `:cloud` tag suffix and do not require local hardware:

```console
$ ollama run qwen3.5:cloud
```
:::

### Tool Calling

Agentic workflows (Claude Code, Codex, OpenCode) rely on tool calling. The following models have reliable tool calling support: Qwen 3.5, Nemotron-3-Super, GLM-5, and Kimi-K2.5. Check the [Ollama model library](https://ollama.com/library) for the latest models.
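
To check tool calling for a particular model yourself, you can send a request with a `tools` array to Ollama's OpenAI-compatible endpoint. This is a sketch that assumes Ollama's default port 11434 inside the sandbox; `get_weather` is a hypothetical function defined only for this test:

```console
$ curl http://127.0.0.1:11434/v1/chat/completions --json '{
    "model": "qwen3.5",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }]
  }'
```

A model with reliable tool calling should answer with a `tool_calls` entry referencing `get_weather` rather than replying in plain text.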

### Updating Ollama

To update Ollama inside a running sandbox:

```console
$ update-ollama
```

Or auto-update on every sandbox start:

```console
$ openshell sandbox create --from ollama -e OLLAMA_UPDATE=1
```

## Option B: Host-Level Ollama

Use this approach when you want a single Ollama instance on the gateway host, shared across multiple sandboxes through `inference.local`.

:::{note}
This approach uses Ollama because it is easy to install and run locally, but you can substitute other inference engines such as vLLM, SGLang, TRT-LLM, and NVIDIA NIM by changing the startup command, base URL, and model name.
:::
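
As an illustration, routing to a vLLM server on the host would change only the provider name, base URL, and model name. This sketch assumes vLLM's default port 8000 and mirrors the provider flags used for Ollama in Step 3 below:

```console
$ openshell provider create \
    --name vllm \
    --type openai \
    --credential OPENAI_API_KEY=empty \
    --config OPENAI_BASE_URL=http://host.openshell.internal:8000/v1
```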

### Step 1: Install and Start Ollama

Install [Ollama](https://ollama.com/) on the gateway host:

```console
$ curl -fsSL https://ollama.com/install.sh | sh
```

Start Ollama on all interfaces so it is reachable from sandboxes:

```console
$ OLLAMA_HOST=0.0.0.0:11434 ollama serve
```

:::{tip}
If you see `Error: listen tcp 0.0.0.0:11434: bind: address already in use`, Ollama is already running as a system service. Stop it first:

```console
$ systemctl stop ollama
$ OLLAMA_HOST=0.0.0.0:11434 ollama serve
```
:::

### Step 2: Pull a Model

In a second terminal, pull and load a model:

```console
$ ollama run qwen3.5:0.8b
```

Type `/bye` to exit the interactive session. The model stays loaded.
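
You can confirm the model is still resident before moving on:

```console
$ ollama ps
```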

### Step 3: Create a Provider

Create an OpenAI-compatible provider pointing at the host Ollama:

```console
$ openshell provider create \
--name ollama \
--type openai \
--credential OPENAI_API_KEY=empty \
--config OPENAI_BASE_URL=http://host.openshell.internal:11434/v1
```

OpenShell injects `host.openshell.internal` so sandboxes and the gateway can reach the host machine. You can also use the host's LAN IP.
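
To sanity-check reachability before routing inference, you can list the server's models from inside a sandbox (assuming the sandbox policy allows direct access to the host port; `/v1/models` is part of the OpenAI-compatible API):

```console
$ openshell sandbox create -- \
  curl http://host.openshell.internal:11434/v1/models
```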

### Step 4: Set Inference Routing

```console
$ openshell inference set --provider ollama --model qwen3.5:0.8b
```

Confirm:

```console
$ openshell inference get
```

### Step 5: Verify from a Sandbox

```console
$ openshell sandbox create -- \
curl https://inference.local/v1/chat/completions \
--json '{"messages":[{"role":"user","content":"hello"}],"max_tokens":10}'
```

The response should be JSON from the model.
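
If you only want the generated text, the response can be piped through `jq` from inside the sandbox (assuming `jq` is available in the image; `.choices[0].message.content` is the standard response path for OpenAI-style chat completions):

```console
$ curl -s https://inference.local/v1/chat/completions \
    --json '{"messages":[{"role":"user","content":"hello"}],"max_tokens":10}' \
    | jq -r '.choices[0].message.content'
```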

## Troubleshooting

Common issues and fixes:

- **Ollama not reachable from sandbox** — Ollama must be bound to `0.0.0.0`, not `127.0.0.1`. This applies to host-level Ollama only; the community sandbox handles this automatically.
- **`OPENAI_BASE_URL` wrong** — Use `http://host.openshell.internal:11434/v1`, not `localhost` or `127.0.0.1`.
- **Model not found** — Run `ollama ps` to confirm the model is loaded. Run `ollama pull <model>` if needed.
- **HTTPS vs HTTP** — Code inside sandboxes must call `https://inference.local`, not `http://`.
- **AMD GPU driver issues** — Ollama v0.18+ requires ROCm 7 drivers for AMD GPUs. Update your drivers if you see GPU detection failures.

Useful commands:

```console
$ openshell status
$ openshell inference get
$ openshell provider get ollama
```

## Next Steps

- To learn more about managed inference, refer to {doc}`/inference/index`.
- To configure a different self-hosted backend, refer to {doc}`/inference/configure`.
- To explore more community sandboxes, refer to {doc}`/sandboxes/community-sandboxes`.