docs(spark): add local Ollama inference setup section#678

Open
paritoshd-nv wants to merge 4 commits into NVIDIA:main from paritoshd-nv:docs/spark-local-ollama-setup

Conversation

paritoshd-nv (Contributor) commented Mar 23, 2026

Add step-by-step instructions for setting up local inference with Ollama on DGX Spark, covering NVIDIA runtime verification, Ollama install and model pre-load, OLLAMA_HOST=0.0.0.0 configuration, and sandbox connection with verification.

Fixes #314, #385

Type of Change

  • Code change for a new feature, bug fix, or refactor.
  • Code change with doc updates.
  • Doc only. Prose changes without code sample modifications.
  • Doc only. Includes code sample changes.

Testing

  • make check passes.
  • npm test passes.
  • make docs builds without warnings. (for doc-only changes)

Checklist

General

Code Changes

  • make format applied (TypeScript and Python).
  • Tests added or updated for new or changed behavior.
  • No secrets, API keys, or credentials committed.
  • Doc pages updated for any user-facing behavior changes (new commands, changed defaults, new features, bug fixes that contradict existing docs).

Doc Changes

  • Follows the style guide. Try running the update-docs agent skill to draft compliant changes. For example, prompt your agent with "/update-docs catch up the docs for the new changes I made in this PR."
  • New pages include the SPDX license header and frontmatter.
  • Cross-references and links verified.

Summary by CodeRabbit

  • Documentation
    • Added a step-by-step guide to set up local GPU inference with Ollama: validate GPU/container runtime, install and verify Ollama, pre-load a large model, configure the service for network access, install client tools, connect to the local HTTPS inference endpoint with verification, and run an agent test using a sample prompt and session.


coderabbitai bot commented Mar 23, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 00cac537-810f-4ccb-b012-7ece5ad893eb

📥 Commits

Reviewing files that changed from the base of the PR and between 0ff8614 and 8d02c4d.

📒 Files selected for processing (1)
  • spark-install.md
✅ Files skipped from review due to trivial changes (1)
  • spark-install.md

📝 Walkthrough

Walkthrough

Added a new "Setup Local Inference (Ollama)" documentation section to spark-install.md that provides an end-to-end local GPU inference setup: validate NVIDIA Docker runtime, install and configure Ollama (including pulling a model and enabling external binding), install OpenShell and NemoClaw, and verify/connect to inference.local.

Changes

Cohort / File(s): Documentation — Local inference workflow (docs/spark-install.md)
Summary: Added "Setup Local Inference (Ollama)" section (+92 lines). Describes steps to test the NVIDIA Docker runtime and configure it with nvidia-ctk, install and verify Ollama (curl to :11434), pull nemotron-3-super:120b and preload it, set OLLAMA_HOST=0.0.0.0 via a systemd override, install OpenShell and NemoClaw, and run nemoclaw ... connect / openclaw agent ... --local with explicit curl checks to https://inference.local/v1/models.
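For orientation, the walkthrough's flow condenses to the following hedged shell outline. The CUDA image tag and the Ollama install URL are assumptions, and the commands are echoed rather than executed here, since running them needs DGX hardware and a live sandbox:

```shell
# Condensed outline of the documented setup flow; illustrative only.
# The CUDA tag and install URL below are assumptions, not from the PR diff.
STEPS='docker run --rm --runtime=nvidia --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
curl -fsSL https://ollama.com/install.sh | sh
curl -sf http://127.0.0.1:11434
ollama pull nemotron-3-super:120b
sudo systemctl restart ollama   # after setting OLLAMA_HOST=0.0.0.0 via override
curl -sf https://inference.local/v1/models'
echo "$STEPS"
```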

Sequence Diagram(s)

sequenceDiagram
  actor User
  participant Host
  participant Docker
  participant Ollama
  participant Systemd
  participant OpenShell
  participant NemoClaw

  User->>Host: run `docker run --gpus all --rm nvidia/cuda:... nvidia-smi`
  Host->>Docker: attempt GPU runtime
  alt GPU runtime missing
    Host->>Host: run `nvidia-ctk` to configure runtime & restart Docker
  end
  User->>Host: install Ollama (official script)
  Host->>Ollama: start service (default localhost:11434)
  User->>Ollama: curl http://127.0.0.1:11434 (verify)
  User->>Ollama: `ollama pull nemotron-3-super:120b` (preload)
  User->>Systemd: create override to set `OLLAMA_HOST=0.0.0.0`
  Systemd->>Ollama: restart service (listen 0.0.0.0:11434)
  User->>OpenShell: install (script) & choose "Local Ollama" + model
  User->>NemoClaw: install (script), run `nemoclaw ... connect`
  NemoClaw->>Host: curl -sf https://inference.local/v1/models (validate routing)
  User->>OpenShell: run `openclaw agent ... --local` (start agent using local inference)

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Poem

"I hopped through docs with eager paws—
tested GPUs and fixed the claws.
Pulled a model, opened the door,
now local inference roams the floor.
— 🐇"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

Linked Issues check: ❓ Inconclusive
Explanation: The PR adds documentation for local Ollama setup on DGX Spark, addressing issue #314's need to document reliable in-sandbox routing to local Ollama. However, it is purely documentation and does not implement fixes for the underlying 403 Forbidden routing issue itself.
Resolution: Clarify whether this documentation PR is intended to supplement a code fix for issue #314's routing problem, or if code changes addressing the 403 error are expected in a separate PR.
✅ Passed checks (4 passed)
Description check: ✅ Passed. Check skipped - CodeRabbit's high-level summary is enabled.
Title check: ✅ Passed. The title concisely and accurately describes the main change: adding local Ollama inference setup documentation to spark-install.md.
Out of Scope Changes check: ✅ Passed. All changes are within scope: the PR adds step-by-step setup documentation for local Ollama inference on DGX Spark, which aligns with documenting reliable in-sandbox routing mentioned in issue #314.
Docstring Coverage: ✅ Passed. No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@spark-install.md`:
- Around line 169-177: Add an explicit check for the local inference proxy in
Step 6: after running the "nemoclaw my-assistant connect" command and before
"openclaw agent --agent main --local ...", run a curl GET against
http://inference.local/api/tags, capture the response body to
/tmp/inference_tags.json and assert the HTTP status is 200 so the documentation
verifies the non-403 fallback path is working; reference the existing step
commands ("nemoclaw my-assistant connect" and "openclaw agent --agent main
--local -m ...") so the check is placed inside the sandbox and fails the doc
verification if inference.local returns 403 or non-200.
- Line 157: Replace the netstat-based listener check "sudo netstat -nap | grep
11434" with an ss-based check: update the line that mentions netstat to use ss
to list listening TCP sockets with numeric ports and process info (for example
using ss with listen, tcp, numeric and process flags and filtering for port
11434) so the doc uses the standard iproute2 tool present on Ubuntu 24.04.
- Line 105: Replace the failing runtime verification command that uses the plain
"ubuntu" image; update the Docker command string "docker run --rm
--runtime=nvidia --gpus all ubuntu nvidia-smi" to use an NVIDIA CUDA image (for
example an nvidia/cuda:<tag>-runtime image such as
nvidia/cuda:11.8-runtime-ubuntu20.04) so that nvidia-smi is present in the
container; keep the same flags (--rm --runtime=nvidia --gpus all) and the final
command (nvidia-smi) but swap the image name to a CUDA runtime image.
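The two command-level fixes above can be sketched as follows. The ss output line is canned (the pid and fd values are made up) and the CUDA tag is illustrative, so the snippet runs without GPU hardware:

```shell
# 1) Listener check with ss (iproute2 ships by default on Ubuntu 24.04).
#    On the host you would run:  sudo ss -tlnp | grep ':11434'
#    Here the filter is demonstrated against canned ss output:
sample='LISTEN 0 4096 *:11434 *:* users:(("ollama",pid=1234,fd=3))'
echo "$sample" | grep ':11434'

# 2) Runtime verification with a CUDA image so nvidia-smi exists inside the
#    container; echoed rather than executed since it needs an NVIDIA GPU.
CMD="docker run --rm --runtime=nvidia --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi"
echo "$CMD"
```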

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6ab3c5b3-d63d-4252-9a95-e07ac7cddea3

📥 Commits

Reviewing files that changed from the base of the PR and between ffa1283 and 321962f.

📒 Files selected for processing (1)
  • spark-install.md

netstat requires net-tools which is not installed by default on
Ubuntu 24.04. ss from iproute2 is available by default and is
more reliable for verifying listening sockets.

Signed-off-by: Paritosh Dixit <paritoshd@nvidia.com>
Add explicit curl to https://inference.local/v1/models inside the
sandbox to validate the proxy route before running the agent. This
prevents fallback paths from masking regressions in the fix for NVIDIA#314.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
spark-install.md (1)

142-152: Add a hardening note when binding Ollama to 0.0.0.0.

Line 148 intentionally exposes Ollama on all interfaces; add a short warning to restrict network access (trusted LAN only / firewall), since Ollama is typically unauthenticated by default.

Suggested wording
 ### Step 4: Configure Ollama to Listen on All Interfaces

 By default Ollama binds to `127.0.0.1`, which is not reachable from inside the sandbox container. Configure it to listen on all interfaces:
+> Security note: `OLLAMA_HOST=0.0.0.0` exposes Ollama on your network. Restrict access with host firewall rules or trusted-network isolation.
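For reference, the override this step relies on is typically a systemd drop-in like the sketch below (the path follows systemd's standard drop-in convention; apply it with sudo systemctl daemon-reload followed by sudo systemctl restart ollama):

```ini
# /etc/systemd/system/ollama.service.d/override.conf
# Binds Ollama to all interfaces; pair with the firewall restrictions noted above.
[Service]
Environment="OLLAMA_HOST=0.0.0.0"
```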
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@spark-install.md` around lines 142 - 152, Add a short hardening warning to
Step 4 near the OLLAMA_HOST=0.0.0.0 instruction: note that binding Ollama to
0.0.0.0 exposes the service to all network interfaces and should only be done on
a trusted LAN or behind a firewall, and recommend restricting access via
firewall rules or local network-only interfaces if Ollama is unauthenticated by
default; reference the OLLAMA_HOST=0.0.0.0 override.conf instruction so readers
know where to apply the caution.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@spark-install.md`:
- Around line 176-181: Update the probe command that currently reads `curl -s
https://inference.local/v1/models` so it fails fast on non-2xx responses;
replace it with a curl invocation that returns non-zero on non-success (for
example `curl -sSf https://inference.local/v1/models`) or explicitly assert HTTP
200 (for example `curl -s -o /dev/null -w '%{http_code}'
https://inference.local/v1/models | grep -q '^200$'`) so the step gates success
when `inference.local` does not return 200.
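The status-gating variant suggested above can be factored into a small helper so the logic is visible without a live proxy; the real call against https://inference.local/v1/models is shown only in comments:

```shell
# Fail-fast pattern: succeed only on HTTP 200. On the sandbox you would run:
#   code=$(curl -s -o /dev/null -w '%{http_code}' https://inference.local/v1/models)
#   assert_http_200 "$code" || { echo "proxy returned $code" >&2; exit 1; }
assert_http_200() { [ "$1" = "200" ]; }   # non-zero exit for any other status
assert_http_200 200 && echo "proxy reachable"
assert_http_200 403 || echo "a 403 would fail the check"
```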

---

Nitpick comments:
In `@spark-install.md`:
- Around line 142-152: Add a short hardening warning to Step 4 near the
OLLAMA_HOST=0.0.0.0 instruction: note that binding Ollama to 0.0.0.0 exposes the
service to all network interfaces and should only be done on a trusted LAN or
behind a firewall, and recommend restricting access via firewall rules or local
network-only interfaces if Ollama is unauthenticated by default; reference the
OLLAMA_HOST=0.0.0.0 override.conf instruction so readers know where to apply the
caution.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3569242d-fbcd-4439-8540-752cb0c458c5

📥 Commits

Reviewing files that changed from the base of the PR and between 321962f and a9dbc13.

📒 Files selected for processing (1)
  • spark-install.md


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (3)
spark-install.md (3)

157-157: Add sudo to the ss command for complete process information.

The -p flag requires elevated privileges to display process information. While the command will work without sudo, it won't show the full process details that help verify Ollama is the service listening on port 11434.

📝 Suggested fix
-ss -tlnp | grep 11434
+sudo ss -tlnp | grep 11434
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@spark-install.md` at line 157, Update the command that checks listeners to
run with elevated privileges so process info is shown; change the existing "ss
-tlnp | grep 11434" invocation to run under sudo (i.e., prefix with sudo) so the
-p flag can return full process details and confirm Ollama is the process on
port 11434.

138-140: Consider providing a non-interactive alternative for model preloading.

The current step requires users to manually type /bye to exit, which breaks automation. Consider adding a note about a non-interactive approach or mention that this step is optional (the model will be loaded on first actual use).

📝 Suggested documentation improvement
 Run it briefly to pre-load weights into unified memory, then exit:

 ollama run nemotron-3-super:120b
 # type /bye to exit

+> Note: This step is optional. The model will be loaded automatically on first use, but pre-loading can reduce initial inference latency.
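One non-interactive alternative (an assumption worth noting in the doc, based on Ollama's documented API behavior): the generate endpoint loads a model into memory when given an empty prompt, so no interactive /bye is needed. The command is echoed rather than executed here because it requires a running Ollama on :11434:

```shell
# Non-interactive preload via the Ollama HTTP API; an empty prompt makes
# the server load the model without generating any output.
PRELOAD="curl -s http://127.0.0.1:11434/api/generate -d '{\"model\": \"nemotron-3-super:120b\", \"prompt\": \"\"}'"
echo "$PRELOAD"
```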

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In @spark-install.md around lines 138 - 140, Update the "ollama run
nemotron-3-super:120b" step to note that it is optional and that the model will
be loaded on first use, and add a short non-interactive alternative so
automation isn't blocked; reference the interactive shutdown token "/bye" and
describe using a one-shot or piped/timeout-based invocation as the
non-interactive approach and include a brief example sentence explaining reduced
latency from preloading.



---

173-173: Clarify the sandbox name reference.

The command references `my-assistant` as the sandbox name, but this name isn't defined in the "Setup Local Inference (Ollama)" section. Consider adding a note that this is the default sandbox name created during onboarding, or reference where users should have created this sandbox.


📝 Suggested clarification
+Connect to your sandbox (the default name is `my-assistant` if created during onboarding):
+
 # Connect to the sandbox
 nemoclaw my-assistant connect

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In @spark-install.md at line 173, Clarify that the sandbox name "my-assistant"
used in the command "nemoclaw my-assistant connect" is the default sandbox
created during onboarding (or point to where users should create it). Update the
"Setup Local Inference (Ollama)" section to either mention that onboarding
creates a sandbox named "my-assistant" or add a brief note/instruction telling
users how to create/choose a sandbox before running "nemoclaw my-assistant
connect" so the reference is explicit and not ambiguous.



🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In @spark-install.md:


Nitpick comments:
In @spark-install.md:

  • Line 157: Update the command that checks listeners to run with elevated
    privileges so process info is shown; change the existing "ss -tlnp | grep 11434"
    invocation to run under sudo (i.e., prefix with sudo) so the -p flag can return
    full process details and confirm Ollama is the process on port 11434.
  • Around line 138-140: Update the "ollama run nemotron-3-super:120b" step to
    note that it is optional and that the model will be loaded on first use, and add
    a short non-interactive alternative so automation isn't blocked; reference the
    interactive shutdown token "/bye" and describe using a one-shot or
    piped/timeout-based invocation as the non-interactive approach and include a
    brief example sentence explaining reduced latency from preloading.
  • Line 173: Clarify that the sandbox name "my-assistant" used in the command
    "nemoclaw my-assistant connect" is the default sandbox created during onboarding
    (or point to where users should create it). Update the "Setup Local Inference
    (Ollama)" section to either mention that onboarding creates a sandbox named
    "my-assistant" or add a brief note/instruction telling users how to
    create/choose a sandbox before running "nemoclaw my-assistant connect" so the
    reference is explicit and not ambiguous.


---

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 216e11da-1053-4127-b4a3-815b93174dc0

📥 Commits

Reviewing files that changed from the base of the PR and between a9dbc13 and 909f98e.

📒 Files selected for processing (1)
  • spark-install.md


paritoshd-nv force-pushed the docs/spark-local-ollama-setup branch from 909f98e to 0ff8614 on March 23, 2026 at 01:53
Use curl -sf so the check exits non-zero on HTTP errors (403, 503,
etc.), preventing a silent 403 from masking a proxy routing regression.

Signed-off-by: Paritosh Dixit <paritoshd@nvidia.com>
paritoshd-nv force-pushed the docs/spark-local-ollama-setup branch from 0ff8614 to 8d02c4d on March 23, 2026 at 01:58


Development

Successfully merging this pull request may close these issues.

[Bug] inference.local returns HTTP 403 inside sandbox when using Ollama local inference on DGX Spark

1 participant