Conversation
Add 35 new tests across 4 test files covering critical paths that previously had zero test coverage: - onboard/validate: API key validation (success, HTTP errors, timeouts, network failures) and key masking - onboard/config: save/load round-trip, missing file handling, config clearing - commands/logs: sandbox state detection, spawn argument construction, --follow/--lines/--run-id flags, error handling - commands/connect: spawn arguments, openshell errors, exit codes All tests follow the existing status.test.ts conventions (vitest, vi.mock, captureLogger). Total suite: 57 tests (22 existing + 35 new). No source files modified.
Address code review feedback:
- Extract captureLogger and mockSpawnProc into src/test-helpers.ts
to eliminate duplication between connect.test.ts and logs.test.ts
- Replace realistic API key fixture ('nvapi-...') in validate.test.ts
with 'test-api-key-placeholder' to avoid secret scanner triggers
Address second CodeRabbit review: spawn errors resolve with exit
code 1, which triggers the follow-up 'exited with code' guidance.
Tests now assert both the error message and the exit guidance.
Also explain why mockExec's callback shape is correct — Node's
promisify(exec) resolves with { stdout, stderr }, matching the
existing status.test.ts pattern.
…tibility On cgroup v2 hosts (Ubuntu 24.04, DGX Spark), k3s-in-Docker requires cgroupns=host in the Docker daemon config. Without this, kubelet fails to initialize, the 'openshell' namespace is never created, and gateway startup times out with: 'timed out waiting for namespace openshell to exist' This was previously caught by a preflight check (NVIDIA#62) that was removed in NVIDIA#248. This commit restores the check in the onboard preflight: - Detects cgroup v2 by reading /proc/mounts for 'cgroup2' - Checks /etc/docker/daemon.json for 'default-cgroupns-mode: host' - On mismatch, exits with a clear error and remediation steps (run 'sudo nemoclaw setup-spark' or manually configure daemon.json) - Shows a success message when cgroup v2 is properly configured Also improves GPU display for DGX Spark: shows 'unified memory' instead of 'VRAM' when the Spark GPU is detected (since GB10 uses unified memory architecture where nvidia-smi cannot report dedicated VRAM). Fixes NVIDIA#280
📝 WalkthroughWalkthroughThe changes introduce Docker cgroup v2 compatibility detection within the onboarding preflight checks, extend GPU detection messaging to handle DGX Spark systems, and add comprehensive test coverage for the CLI connect and logs commands alongside configuration and validation functionality testing. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Poem
🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
nemoclaw/src/onboard/validate.test.ts (1)
54-58: Make theheadersmatcher less brittle.
headersis currently asserted as an exact object value. If request options later include extra headers, this test can fail for a non-functional change. Prefer nested partial matching.♻️ Proposed change
expect(mockFetch).toHaveBeenCalledWith( "https://integrate.api.nvidia.com/v1/models", expect.objectContaining({ - headers: { Authorization: `Bearer ${apiKey}` }, + headers: expect.objectContaining({ + Authorization: `Bearer ${apiKey}`, + }), }), );🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nemoclaw/src/onboard/validate.test.ts` around lines 54 - 58, The test currently asserts headers as an exact object which is brittle; update the assertion in the expect(mockFetch).toHaveBeenCalledWith call to use nested partial matching so only the Authorization header is required — replace the headers value with expect.objectContaining({ Authorization: `Bearer ${apiKey}` }) (i.e., use expect.objectContaining for the overall options and for the headers field) so additional headers won't break the test while still verifying Authorization; locate the call referencing mockFetch and adjust the matcher accordingly.nemoclaw/src/commands/logs.test.ts (1)
260-264: Strengthen sandbox-resolution verification by asserting the spawn target too.This test currently verifies
execcontainsmy-sandbox, but it can still miss regressions wherespawn("openshell", ["sandbox","connect", ...])uses the default sandbox.Suggested assertion addition
expect(exec).toHaveBeenCalledWith( expect.stringContaining("my-sandbox"), expect.anything(), expect.anything(), ); + expect(spawn).toHaveBeenCalledWith( + "openshell", + expect.arrayContaining(["sandbox", "connect", "my-sandbox"]), + expect.anything(), + );🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nemoclaw/src/commands/logs.test.ts` around lines 260 - 264, The test currently only checks exec was called with a string containing "my-sandbox"; also assert the spawn target/args to ensure the correct sandbox is passed: add an assertion on the exec call arguments (the array/args parameter passed to the spawn-like call) to include "sandbox", "connect", and "my-sandbox" (e.g., using expect.arrayContaining on the second argument to exec), referencing the existing exec call and the expected spawn invocation ("openshell", ["sandbox","connect", ...]) to catch regressions that use the default sandbox.nemoclaw/src/test-helpers.ts (1)
11-11: Update the docstring to match actual logger capture behavior.The helper captures
info,warn, anderror(not onlyinfo). A quick wording tweak will avoid confusion.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@nemoclaw/src/test-helpers.ts` at line 11, Docstring mismatch: the comment "Create a logger that captures all info() calls into an array." is inaccurate because the test logger captures info(), warn(), and error(); update that docstring in nemoclaw/src/test-helpers.ts to explicitly state it captures info(), warn(), and error() calls into the array (i.e., replace "info() calls" with "info(), warn(), and error() calls") so the comment matches the actual behavior of the test logger helper.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@bin/lib/onboard.js`:
- Around line 45-52: The current logic infers isCgroupV2 by reading /proc/mounts
via mountInfo and readFileSync("/proc/mounts"), which gives false positives on
hybrid hosts; replace that heuristic in the isCgroupV2 determination with
Docker's reported cgroup mode by invoking docker info --format
'{{.CgroupVersion}}' and treating "2" as v2, and additionally inspect daemon
config files (/etc/docker/daemon.json and ~/.config/docker/daemon.json) for
cgroupns/cgroup_version settings to respect explicit daemon config; update the
code paths that set isCgroupV2 and the returned needsFix flag (symbols:
isCgroupV2, the block that currently reads mountInfo and the surrounding
try/catch) to use Docker's output and daemon.json checks and preserve existing
error handling and fallback behavior.
---
Nitpick comments:
In `@nemoclaw/src/commands/logs.test.ts`:
- Around line 260-264: The test currently only checks exec was called with a
string containing "my-sandbox"; also assert the spawn target/args to ensure the
correct sandbox is passed: add an assertion on the exec call arguments (the
array/args parameter passed to the spawn-like call) to include "sandbox",
"connect", and "my-sandbox" (e.g., using expect.arrayContaining on the second
argument to exec), referencing the existing exec call and the expected spawn
invocation ("openshell", ["sandbox","connect", ...]) to catch regressions that
use the default sandbox.
In `@nemoclaw/src/onboard/validate.test.ts`:
- Around line 54-58: The test currently asserts headers as an exact object which
is brittle; update the assertion in the expect(mockFetch).toHaveBeenCalledWith
call to use nested partial matching so only the Authorization header is required
— replace the headers value with expect.objectContaining({ Authorization:
`Bearer ${apiKey}` }) (i.e., use expect.objectContaining for the overall options
and for the headers field) so additional headers won't break the test while
still verifying Authorization; locate the call referencing mockFetch and adjust
the matcher accordingly.
In `@nemoclaw/src/test-helpers.ts`:
- Line 11: Docstring mismatch: the comment "Create a logger that captures all
info() calls into an array." is inaccurate because the test logger captures
info(), warn(), and error(); update that docstring in
nemoclaw/src/test-helpers.ts to explicitly state it captures info(), warn(), and
error() calls into the array (i.e., replace "info() calls" with "info(), warn(),
and error() calls") so the comment matches the actual behavior of the test
logger helper.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: add686c4-8bfc-41a1-bf82-615e9ad24081
📒 Files selected for processing (6)
bin/lib/onboard.jsnemoclaw/src/commands/connect.test.tsnemoclaw/src/commands/logs.test.tsnemoclaw/src/onboard/config.test.tsnemoclaw/src/onboard/validate.test.tsnemoclaw/src/test-helpers.ts
| const mountInfo = fs.readFileSync("/proc/mounts", "utf-8"); | ||
| // cgroup v2 mounts as "cgroup2" type on /sys/fs/cgroup | ||
| isCgroupV2 = mountInfo.includes("cgroup2"); | ||
| } catch { | ||
| return { needsFix: false, isCgroupV2: false }; | ||
| } | ||
|
|
||
| if (!isCgroupV2) return { needsFix: false, isCgroupV2: false }; |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Docker-reported cgroup version:"
docker info --format '{{.CgroupVersion}}' || echo "docker_info_unavailable"
echo
echo "cgroup2 entries in /proc/mounts:"
grep -n 'cgroup2' /proc/mounts || true
echo
echo "default-cgroupns-mode in common daemon configs:"
for f in /etc/docker/daemon.json "$HOME/.config/docker/daemon.json"; do
if [ -r "$f" ]; then
val=$(jq -r '."default-cgroupns-mode" // "unset"' "$f" 2>/dev/null || echo "parse_error")
echo "$f -> $val"
else
echo "$f -> missing_or_unreadable"
fi
doneRepository: NVIDIA/NemoClaw
Length of output: 375
🏁 Script executed:
# First, locate and examine the onboard.js file
fd -t f "onboard.js" --exec cat -n {} \;Repository: NVIDIA/NemoClaw
Length of output: 50371
🏁 Script executed:
# Find and examine runner.js
fd -t f "runner.js" --exec head -100 {} \;Repository: NVIDIA/NemoClaw
Length of output: 1576
🏁 Script executed:
# Search for docker info usage
rg "docker info" --type js --type ts -B 2 -A 2Repository: NVIDIA/NemoClaw
Length of output: 274
Use Docker-reported cgroup mode to avoid false preflight failures.
Line 47 infers cgroup v2 from /proc/mounts (includes("cgroup2")), which can be true on hybrid hosts even when Docker is not using cgroup v2. This causes a false-positive needsFix=true that incorrectly blocks onboarding on systems like Ubuntu 24.04 or DGX Spark where the kernel supports cgroup v2 but Docker is configured to use v1.
Replace the /proc/mounts heuristic with docker info --format '{{.CgroupVersion}}' to detect Docker's actual cgroup mode. Also check both /etc/docker/daemon.json and ~/.config/docker/daemon.json for the cgroupns setting, as daemon configs can be in multiple standard locations.
Suggested patch
function checkDockerCgroupCompat() {
if (process.platform !== "linux") return { needsFix: false, isCgroupV2: false };
- // Check if cgroup v2 (unified hierarchy)
- let isCgroupV2 = false;
- try {
- const mountInfo = fs.readFileSync("/proc/mounts", "utf-8");
- // cgroup v2 mounts as "cgroup2" type on /sys/fs/cgroup
- isCgroupV2 = mountInfo.includes("cgroup2");
- } catch {
- return { needsFix: false, isCgroupV2: false };
- }
+ // Prefer Docker's actual runtime cgroup mode over host mount heuristics.
+ const cgroupVersion = runCapture(`docker info --format '{{.CgroupVersion}}'`, {
+ ignoreError: true,
+ }).trim();
+ const isCgroupV2 = cgroupVersion === "2";
if (!isCgroupV2) return { needsFix: false, isCgroupV2: false };
// Check Docker daemon config for default-cgroupns-mode
try {
- // Check daemon.json directly for the cgroupns setting
let cgroupnsHost = false;
- try {
- const daemonJson = JSON.parse(fs.readFileSync("/etc/docker/daemon.json", "utf-8"));
- cgroupnsHost = daemonJson["default-cgroupns-mode"] === "host";
- } catch {
- // No daemon.json or parse error — not configured
- }
+ const daemonConfigPaths = ["/etc/docker/daemon.json"];
+ if (process.env.HOME) {
+ daemonConfigPaths.push(path.join(process.env.HOME, ".config", "docker", "daemon.json"));
+ }
+ for (const configPath of daemonConfigPaths) {
+ try {
+ const daemonJson = JSON.parse(fs.readFileSync(configPath, "utf-8"));
+ cgroupnsHost = daemonJson["default-cgroupns-mode"] === "host";
+ if (cgroupnsHost) break;
+ } catch {
+ // Try next known daemon config location
+ }
+ }
if (cgroupnsHost) return { needsFix: false, isCgroupV2: true };
// Cgroup v2 without cgroupns=host — this will break k3s-in-Docker
return { needsFix: true, isCgroupV2: true };Also applies to: 59-60
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@bin/lib/onboard.js` around lines 45 - 52, The current logic infers isCgroupV2
by reading /proc/mounts via mountInfo and readFileSync("/proc/mounts"), which
gives false positives on hybrid hosts; replace that heuristic in the isCgroupV2
determination with Docker's reported cgroup mode by invoking docker info
--format '{{.CgroupVersion}}' and treating "2" as v2, and additionally inspect
daemon config files (/etc/docker/daemon.json and ~/.config/docker/daemon.json)
for cgroupns/cgroup_version settings to respect explicit daemon config; update
the code paths that set isCgroupV2 and the returned needsFix flag (symbols:
isCgroupV2, the block that currently reads mountInfo and the surrounding
try/catch) to use Docker's output and daemon.json checks and preserve existing
error handling and fallback behavior.
|
Closing — this issue was already resolved by #248 (merged). The cgroup v2 preflight check has been removed entirely, making this fix redundant. |
Fixes #280
Automated PR via GoGetAJob
Summary by CodeRabbit
New Features
Tests