diff --git a/README.md b/README.md
index cbdcc6cb..954923b6 100644
--- a/README.md
+++ b/README.md
@@ -36,7 +36,7 @@ uv tool install -U openshell
 ### Create a sandbox
 
 ```bash
-openshell sandbox create -- claude # or opencode, codex, copilot, ollama
+openshell sandbox create -- claude # or opencode, codex, copilot
 ```
 
 A gateway is created automatically on first use. To deploy on a remote host instead, pass `--remote user@host` to the create command.
diff --git a/docs/about/supported-agents.md b/docs/about/supported-agents.md
index 309139f0..664156ad 100644
--- a/docs/about/supported-agents.md
+++ b/docs/about/supported-agents.md
@@ -9,8 +9,8 @@ The following table summarizes the agents that run in OpenShell sandboxes. All a
 | [Codex](https://developers.openai.com/codex) | [`base`](https://github.com/NVIDIA/OpenShell-Community/tree/main/sandboxes/base) | No coverage | Pre-installed. Requires a custom policy with OpenAI endpoints and Codex binary paths. Requires `OPENAI_API_KEY`. |
 | [GitHub Copilot CLI](https://docs.github.com/en/copilot/github-copilot-in-the-cli) | [`base`](https://github.com/NVIDIA/OpenShell-Community/tree/main/sandboxes/base) | Full coverage | Pre-installed. Works out of the box. Requires `GITHUB_TOKEN` or `COPILOT_GITHUB_TOKEN`. |
 | [OpenClaw](https://openclaw.ai/) | [`openclaw`](https://github.com/NVIDIA/OpenShell-Community/tree/main/sandboxes/openclaw) | Bundled | Agent orchestration layer. Launch with `openshell sandbox create --from openclaw`. |
-| [Ollama](https://ollama.com/) | [`ollama`](https://github.com/NVIDIA/OpenShell-Community/tree/main/sandboxes/ollama) | Bundled | Run cloud and local models. Includes Claude Code, Codex, and OpenClaw. Launch with `openshell sandbox create --from ollama`. |
+| [Ollama](https://ollama.com/) | [`ollama`](https://github.com/NVIDIA/OpenShell-Community/tree/main/sandboxes/ollama) | Bundled | Run cloud and local models. Includes Claude Code, Codex, and OpenCode. Launch with `openshell sandbox create --from ollama`. |
 
 More community agent sandboxes are available in the {doc}`../sandboxes/community-sandboxes` catalog.
 
-For a complete support matrix, refer to the {doc}`../reference/support-matrix` page.
\ No newline at end of file
+For a complete support matrix, refer to the {doc}`../reference/support-matrix` page.
diff --git a/docs/inference/configure.md b/docs/inference/configure.md
index a0d17498..4b24f958 100644
--- a/docs/inference/configure.md
+++ b/docs/inference/configure.md
@@ -81,6 +81,12 @@ $ openshell provider create \
 Use `--config OPENAI_BASE_URL` to point to any OpenAI-compatible server running where the gateway runs. For host-backed local inference, use `host.openshell.internal` or the host's LAN IP. Avoid `127.0.0.1` and `localhost`. Set `OPENAI_API_KEY` to a dummy value if the server does not require authentication.
 
+:::{tip}
+For a self-contained setup, the Ollama community sandbox bundles Ollama inside the sandbox itself — no host-level provider needed. See {doc}`/tutorials/inference-ollama` for details.
+:::
+
+Ollama also supports cloud-hosted models using the `:cloud` tag suffix (e.g., `qwen3.5:cloud`).
+
 ::::
 
 ::::{tab-item} Anthropic
diff --git a/docs/sandboxes/community-sandboxes.md b/docs/sandboxes/community-sandboxes.md
index d2924657..0e3df84b 100644
--- a/docs/sandboxes/community-sandboxes.md
+++ b/docs/sandboxes/community-sandboxes.md
@@ -43,7 +43,7 @@ The following community sandboxes are available in the catalog.
 
 | Sandbox | Description |
 |---|---|
 | `base` | Foundational image with system tools and dev environment |
-| `ollama` | Ollama with cloud and local model support, Claude Code, Codex, and OpenClaw pre-installed |
+| `ollama` | Ollama with cloud and local model support, Claude Code, OpenCode, and Codex pre-installed. Use `ollama launch` inside the sandbox to start coding agents with zero config. |
 | `openclaw` | Open agent manipulation and control |
 | `sdg` | Synthetic data generation workflows |
diff --git a/docs/tutorials/inference-ollama.md b/docs/tutorials/inference-ollama.md
new file mode 100644
index 00000000..7bcc3dd4
--- /dev/null
+++ b/docs/tutorials/inference-ollama.md
@@ -0,0 +1,221 @@
+---
+title:
+  page: Inference with Ollama
+  nav: Inference with Ollama
+description: Run local and cloud models inside an OpenShell sandbox using the Ollama community sandbox, or route sandbox requests to a host-level Ollama server.
+topics:
+- Generative AI
+- Cybersecurity
+tags:
+- Tutorial
+- Inference Routing
+- Ollama
+- Local Inference
+- Sandbox
+content:
+  type: tutorial
+  difficulty: technical_intermediate
+  audience:
+  - engineer
+---

# Run Local Inference with Ollama

This tutorial covers two ways to use Ollama with OpenShell:

1. **Ollama sandbox (recommended)** — a self-contained sandbox with Ollama, Claude Code, OpenCode, and Codex pre-installed. One command to start.
2. **Host-level Ollama** — run Ollama on the gateway host and route sandbox inference to it. Useful when you want a single Ollama instance shared across multiple sandboxes.

After completing this tutorial, you will know how to:

- Launch the Ollama community sandbox for a batteries-included experience.
- Use `ollama launch` to start coding agents inside a sandbox.
- Expose a host-level Ollama server to sandboxes through `inference.local`.

## Prerequisites

- A working OpenShell installation. Complete the {doc}`/get-started/quickstart` before proceeding.

## Option A: Ollama Community Sandbox (Recommended)

The Ollama community sandbox bundles Ollama, Claude Code, OpenCode, and Codex into a single image. Ollama starts automatically when the sandbox launches.

### Step 1: Create the Sandbox

```console
$ openshell sandbox create --from ollama
```

This pulls the community sandbox image, applies the bundled policy, and drops you into a shell with Ollama running.
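Before launching an agent, you can sanity-check the bundled server from inside the sandbox. `ollama list` shows models that have been downloaded, and `ollama ps` shows models currently loaded in memory; both are standard Ollama CLI commands, and empty output is normal in a fresh sandbox:

```console
$ ollama list
$ ollama ps
```

If these commands cannot connect to a server, Ollama has not finished starting; wait a moment or re-create the sandbox before continuing.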
+ +::: + +### Step 2: Chat with a Model + +Chat with a local model + +```console +$ ollama run qwen3.5 +``` + +Or a cloud model + +```console +$ ollama run kimi-k2.5:cloud +``` + + +Or use `ollama launch` to start a coding agent with Ollama as the model backend: + +```console +$ ollama launch claude +$ ollama launch codex +$ ollama launch opencode +``` + +For CI/CD and automated workflows, `ollama launch` supports a headless mode: + +```console +$ ollama launch claude --yes --model qwen3.5 +``` + +### Model Recommendations + +| Use case | Model | Notes | +|---|---|---| +| Smoke test | `qwen3.5:0.8b` | Fast, lightweight, good for verifying setup | +| Coding and reasoning | `qwen3.5` | Strong tool calling support for agentic workflows | +| Complex tasks | `nemotron-3-super` | 122B parameter model, needs 48GB+ VRAM | +| No local GPU | `qwen3.5:cloud` | Runs on Ollama's cloud infrastructure, no `ollama pull` required | + +:::{note} +Cloud models use the `:cloud` tag suffix and do not require local hardware. + +```console +$ openshell sandbox create --from ollama +``` +::: + +### Tool Calling + +Agentic workflows (Claude Code, Codex, OpenCode) rely on tool calling. The following models have reliable tool calling support: Qwen 3.5, Nemotron-3-Super, GLM-5, and Kimi-K2.5. Check the [Ollama model library](https://ollama.com/library) for the latest models. + +### Updating Ollama + +To update Ollama inside a running sandbox: + +```console +$ update-ollama +``` + +Or auto-update on every sandbox start: + +```console +$ openshell sandbox create --from ollama -e OLLAMA_UPDATE=1 +``` + +## Option B: Host-Level Ollama + +Use this approach when you want a single Ollama instance on the gateway host, shared across multiple sandboxes through `inference.local`. 
+ +:::{note} +This approach uses Ollama because it is easy to install and run locally, but you can substitute other inference engines such as vLLM, SGLang, TRT-LLM, and NVIDIA NIM by changing the startup command, base URL, and model name. +::: + +### Step 1: Install and Start Ollama + +Install [Ollama](https://ollama.com/) on the gateway host: + +```console +$ curl -fsSL https://ollama.com/install.sh | sh +``` + +Start Ollama on all interfaces so it is reachable from sandboxes: + +```console +$ OLLAMA_HOST=0.0.0.0:11434 ollama serve +``` + +:::{tip} +If you see `Error: listen tcp 0.0.0.0:11434: bind: address already in use`, Ollama is already running as a system service. Stop it first: + +```console +$ systemctl stop ollama +$ OLLAMA_HOST=0.0.0.0:11434 ollama serve +``` +::: + +### Step 2: Pull a Model + +In a second terminal, pull a model: + +```console +$ ollama run qwen3.5:0.8b +``` + +Type `/bye` to exit the interactive session. The model stays loaded. + +### Step 3: Create a Provider + +Create an OpenAI-compatible provider pointing at the host Ollama: + +```console +$ openshell provider create \ + --name ollama \ + --type openai \ + --credential OPENAI_API_KEY=empty \ + --config OPENAI_BASE_URL=http://host.openshell.internal:11434/v1 +``` + +OpenShell injects `host.openshell.internal` so sandboxes and the gateway can reach the host machine. You can also use the host's LAN IP. + +### Step 4: Set Inference Routing + +```console +$ openshell inference set --provider ollama --model qwen3.5:0.8b +``` + +Confirm: + +```console +$ openshell inference get +``` + +### Step 5: Verify from a Sandbox + +```console +$ openshell sandbox create -- \ + curl https://inference.local/v1/chat/completions \ + --json '{"messages":[{"role":"user","content":"hello"}],"max_tokens":10}' +``` + +The response should be JSON from the model. + +## Troubleshooting + +Common issues and fixes: + +- **Ollama not reachable from sandbox** — Ollama must be bound to `0.0.0.0`, not `127.0.0.1`. 
This applies to host-level Ollama only; the community sandbox handles this automatically.
- **`OPENAI_BASE_URL` wrong** — Use `http://host.openshell.internal:11434/v1`, not `localhost` or `127.0.0.1`.
- **Model not found** — Run `ollama ps` to confirm the model is loaded. Run `ollama pull <model>` if needed.
- **HTTPS vs HTTP** — Code inside sandboxes must call `https://inference.local`, not `http://`.
- **AMD GPU driver issues** — Ollama v0.18+ requires ROCm 7 drivers for AMD GPUs. Update your drivers if you see GPU detection failures.

Useful commands:

```console
$ openshell status
$ openshell inference get
$ openshell provider get ollama
```

## Next Steps

- To learn more about managed inference, refer to {doc}`/inference/index`.
- To configure a different self-hosted backend, refer to {doc}`/inference/configure`.
- To explore more community sandboxes, refer to {doc}`/sandboxes/community-sandboxes`.
diff --git a/docs/tutorials/local-inference-ollama.md b/docs/tutorials/local-inference-ollama.md
deleted file mode 100644
index f8e82273..00000000
--- a/docs/tutorials/local-inference-ollama.md
+++ /dev/null
@@ -1,156 +0,0 @@
----
-title:
-  page: Run Local Inference with Ollama
-  nav: Local Inference with Ollama
-description: Configure inference.local to route sandbox requests to a local Ollama server running on the gateway host.
-topics:
-- Generative AI
-- Cybersecurity
-tags:
-- Tutorial
-- Inference Routing
-- Ollama
-- Local Inference
-- Sandbox
-content:
-  type: tutorial
-  difficulty: technical_intermediate
-  audience:
-  - engineer
----
-
-# Run Local Inference with Ollama
-
-This tutorial shows how to route sandbox inference to a model running locally.
-
-:::{note}
-This tutorial uses Ollama because it is easy to install and run locally, but you can substitute other inference engines such as vLLM, SGLang, TRT-LLM, and NVIDIA NIM by changing the startup command, base URL, and model name.
-::: - -After completing this tutorial, you will know how to: - -- Expose a local inference server to OpenShell sandboxes. -- Verify end-to-end inference from inside a sandbox. - -## Prerequisites - -- A working OpenShell installation. Complete the {doc}`/get-started/quickstart` before proceeding. - -If your gateway runs on a remote host or in a cloud deployment, Ollama must also run there. Another common scenario is running a model and the gateway on different nodes in the same local network. - -Install [Ollama](https://ollama.com/) with: - -```console -$ curl -fsSL https://ollama.com/install.sh | sh -``` - -## Step 1: Start Ollama on All Interfaces - -By default, Ollama listens only on the loopback address (`127.0.0.1`), which is not reachable from the OpenShell gateway or sandboxes. Start Ollama so it listens on all interfaces: - -```console -$ OLLAMA_HOST=0.0.0.0:11434 ollama serve -``` - -:::{tip} -If you see `Error: listen tcp 0.0.0.0:11434: bind: address already in use`, Ollama is already running as a system service. Stop it first, then start it manually with the correct bind address: - -```console -$ systemctl stop ollama -$ OLLAMA_HOST=0.0.0.0:11434 ollama serve -``` -::: - -## Step 2: Pull a Model - -In a second terminal, pull a lightweight model: - -```console -$ ollama run qwen3.5:0.8b -``` - -This downloads the model and starts an interactive session. Type `/bye` to exit the session. The model stays available for inference after you exit. - -:::{note} -`qwen3.5:0.8b` is a good smoke-test target for verifying your local inference setup, but it is best suited for simple tasks. For more complex coding, reasoning, or agent workflows, use a stronger open model such as Nemotron or another larger open-source model that fits your hardware. -::: - -Confirm the model is available: - -```console -$ ollama ps -``` - -You should see `qwen3.5:0.8b` in the output. 
- -## Step 3: Create a Provider for Ollama - -Create an OpenAI-compatible provider that points at Ollama through `host.openshell.internal`: - -```console -$ openshell provider create \ - --name ollama \ - --type openai \ - --credential OPENAI_API_KEY=empty \ - --config OPENAI_BASE_URL=http://host.openshell.internal:11434/v1 -``` - -This works because OpenShell injects `host.openshell.internal` so sandboxes and the gateway can refer back to the gateway host machine. If that hostname is not the best fit for your environment, you can also use the host's LAN IP. - -## Step 4: Configure Local Inference with Ollama - -Set the managed inference route for the active gateway: - -```console -$ openshell inference set --provider ollama --model qwen3.5:0.8b -``` - -If the command succeeds, OpenShell has verified that the upstream is reachable and accepts the expected OpenAI-compatible request shape. - -Confirm the saved config: - -```console -$ openshell inference get -``` - -You should see `Provider: ollama` and `Model: qwen3.5:0.8b`. - -## Step 5: Verify from Inside a Sandbox - -Run a simple request through `https://inference.local`: - -```console -$ openshell sandbox create -- \ - curl https://inference.local/v1/chat/completions \ - --json '{"messages":[{"role":"user","content":"hello"}],"max_tokens":10}' -``` - -The response should be JSON from the upstream model. The `model` reported in the response may show the real model resolved by OpenShell. 
- -## Troubleshooting - -If setup fails, check these first: - -- Ollama is bound to `0.0.0.0`, not only `127.0.0.1` -- `OPENAI_BASE_URL` uses `http://host.openshell.internal:11434/v1` -- The gateway and Ollama run on the same machine -- The configured model exists in Ollama -- The app calls `https://inference.local`, not `http://inference.local` - -Useful commands: - -```console -$ openshell status -$ openshell inference get -$ openshell provider get ollama -``` - -## Next Steps - -- To learn more about managed inference, refer to {doc}`/inference/index`. -- To configure a different self-hosted backend, refer to {doc}`/inference/configure`.