Skip to content

fix(security): pin SSH host keys via trust-on-first-use instead of StrictHostKeyChecking=no#691

Open
ericksoa wants to merge 4 commits intomainfrom
fix/ssh-host-key-tofu
Open

fix(security): pin SSH host keys via trust-on-first-use instead of StrictHostKeyChecking=no#691
ericksoa wants to merge 4 commits intomainfrom
fix/ssh-host-key-tofu

Conversation

@ericksoa
Copy link
Contributor

@ericksoa ericksoa commented Mar 23, 2026

Summary

All 7 SSH/SCP/rsync calls in the Brev deploy() function used StrictHostKeyChecking=no, enabling MITM attacks that could capture .env files containing NVIDIA_API_KEY, GITHUB_TOKEN, and messaging bot tokens during SCP transfer.

Fix: trust-on-first-use (TOFU) — accept the host key on the initial connectivity probe (unavoidable for fresh VMs), then immediately pin it via ssh-keyscan and enforce StrictHostKeyChecking=yes for all subsequent connections including the SCP of secrets.

  • Initial probe: StrictHostKeyChecking=no (only runs echo ok, no secrets transferred)
  • ssh-keyscan -H captures host key → temp known_hosts file (mode 0600)
  • 6 remaining calls: StrictHostKeyChecking=yes with pinned known_hosts
  • try/finally cleanup of temp directory on both success and failure

Test plan

  • 219/219 tests pass
  • Manual: nemoclaw deploy <instance> completes successfully with Brev VM
  • Manual: verify subsequent SSH calls fail if host key changes mid-deploy

Summary by CodeRabbit

  • Bug Fixes
    • Deployments now pin remote host keys after initial SSH reachability and enforce strict host-key verification for subsequent connections, reducing MITM risk.
    • Temporary trust files and environment upload artifacts are reliably removed on success, failure, or timeout, improving cleanup and deployment robustness.

…rictHostKeyChecking=no

Addresses PSIRT bug 6002727 (HIGH-7, CWE-295). The deploy() function
had 7 uses of StrictHostKeyChecking=no for SSH/SCP/rsync connections
to Brev remote VMs, disabling host key verification and enabling MITM
attacks that could capture .env files containing NVIDIA_API_KEY,
GITHUB_TOKEN, and TELEGRAM_BOT_TOKEN during SCP transfer.

Fix: after the initial SSH connectivity probe succeeds, immediately
capture the host key via ssh-keyscan and pin it to a temporary
known_hosts file. All subsequent SSH/SCP/rsync calls use
StrictHostKeyChecking=yes with the pinned known_hosts, rejecting any
host key change. The temp directory is cleaned up in a finally block.
@coderabbitai
Copy link

coderabbitai bot commented Mar 23, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7f83f849-5f08-4b0b-8926-7ac95524cd1a

📥 Commits

Reviewing files that changed from the base of the PR and between ed3167b and ecb7509.

📒 Files selected for processing (1)
  • bin/nemoclaw.js
🚧 Files skipped from review as they are similar to previous changes (1)
  • bin/nemoclaw.js

📝 Walkthrough

Walkthrough

Refactors bin/nemoclaw.js deploy flow to implement SSH Trust-On-First-Use: initial probe allows unknown host keys, then ssh-keyscan pins the host key to a temporary known_hosts, switches subsequent ssh/scp/rsync to StrictHostKeyChecking=yes with UserKnownHostsFile pointing to that file, and ensures cleanup in finally blocks. (47 words)

Changes

Cohort / File(s) Summary
SSH TOFU Deployment Security
bin/nemoclaw.js
Adds temporary khDir/known_hosts creation and host-key pinning via ssh -G + ssh-keyscan; initial readiness probe still uses StrictHostKeyChecking=no, then switches all ssh, rsync, and scp calls to an sshOpts using StrictHostKeyChecking=yes and UserKnownHostsFile pointing to the generated file. Restructures control flow: main try contains host-key pinning, syncing, .env creation/upload and remote setup/start; nested try/finally handles .env temp cleanup; outer finally removes khDir. Ensures khDir is removed on SSH timeout.

Sequence Diagram

sequenceDiagram
    actor User
    participant Script as bin/nemoclaw.js
    participant VM as Remote VM
    participant LocalFS as Local FS
    participant SSH as SSH

    User->>Script: deploy(instanceName)
    Script->>VM: initial SSH readiness probe (allow unknown host key)
    Script->>LocalFS: create khDir / temp known_hosts
    Script->>SSH: run `ssh -G name` to resolve hostname
    Script->>SSH: ssh-keyscan -> retrieve host key
    SSH-->>LocalFS: write host key into temp known_hosts
    Script->>Script: set sshOpts (StrictHostKeyChecking=yes, UserKnownHostsFile=khDir)
    
    rect rgba(76,175,80,0.5)
    Script->>VM: SSH / rsync / scp using pinned host key (sshOpts)
    Script->>VM: upload .env (within try, cleaned in nested finally)
    Script->>VM: run remote setup and start commands
    end

    alt finish or error
        Script->>LocalFS: remove khDir (outer finally)
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I hopped to scan the host key bright,
Pinned it down in a temp-known light.
Finally blocks tidy up my trail,
No blind SSH trust to derail.
Deployment hops along just right.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The pull request title clearly and accurately describes the main security fix: switching from StrictHostKeyChecking=no to a trust-on-first-use approach with pinned SSH host keys.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/ssh-host-key-tofu

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
bin/nemoclaw.js (1)

160-160: Use OpenSSH standard directives to prevent probe side-effects on known_hosts.

Line 160 uses StrictHostKeyChecking=no, which still automatically adds new host keys to the user known-hosts file by default. For a side-effect-free probe, use the documented OpenSSH directives UserKnownHostsFile=none and GlobalKnownHostsFile=none:

🔒 Probe hardening suggestion
-      execFileSync("ssh", ["-o", "ConnectTimeout=5", "-o", "StrictHostKeyChecking=no", name, "echo", "ok"], { encoding: "utf-8", stdio: "ignore" });
+      execFileSync("ssh", [
+        "-o", "ConnectTimeout=5",
+        "-o", "StrictHostKeyChecking=no",
+        "-o", "UserKnownHostsFile=none",
+        "-o", "GlobalKnownHostsFile=none",
+        name, "echo", "ok",
+      ], { encoding: "utf-8", stdio: "ignore" });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/nemoclaw.js` at line 160, The SSH probe currently invoked via
execFileSync(...) uses "-o", "StrictHostKeyChecking=no" which still writes new
host keys into known_hosts; modify the argument list passed to the execFileSync
call (the ssh invocation) to replace or augment that option with the OpenSSH
directives "-o", "UserKnownHostsFile=/dev/null" and "-o",
"GlobalKnownHostsFile=/dev/null" (or "none" on platforms that support it)
alongside the existing "-o", "ConnectTimeout=5" so the probe does not alter
known_hosts and retains the timeout/behavior of the original execFileSync("ssh",
...) call.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bin/nemoclaw.js`:
- Around line 175-180: The host-key pinning (execFileSync("ssh-keyscan", ...),
fs.writeFileSync(knownHostsFile, ...)) can throw before the outer try/finally
that removes khDir; fix by ensuring khDir is created before calling execFileSync
and wrapping the ssh-keyscan + writeFileSync sequence (the hostKeys/execFileSync
and knownHostsFile write) in a local try/catch that on any error synchronously
removes khDir (fs.rmSync(khDir, { recursive: true, force: true })) and then
rethrows the error; reference the hostKeys variable, execFileSync call,
knownHostsFile write, and khDir cleanup so the temporary directory is always
removed even if ssh-keyscan fails.

---

Nitpick comments:
In `@bin/nemoclaw.js`:
- Line 160: The SSH probe currently invoked via execFileSync(...) uses "-o",
"StrictHostKeyChecking=no" which still writes new host keys into known_hosts;
modify the argument list passed to the execFileSync call (the ssh invocation) to
replace or augment that option with the OpenSSH directives "-o",
"UserKnownHostsFile=/dev/null" and "-o", "GlobalKnownHostsFile=/dev/null" (or
"none" on platforms that support it) alongside the existing "-o",
"ConnectTimeout=5" so the probe does not alter known_hosts and retains the
timeout/behavior of the original execFileSync("ssh", ...) call.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 36bc020e-89bf-4a1c-bb46-a1657f2aad18

📥 Commits

Reviewing files that changed from the base of the PR and between a09cfa9 and 553f579.

📒 Files selected for processing (1)
  • bin/nemoclaw.js

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
bin/nemoclaw.js (1)

159-160: ⚠️ Potential issue | 🟠 Major

Add UserKnownHostsFile=/dev/null to the readiness probe.

With only StrictHostKeyChecking=no, OpenSSH automatically records the host key to its default ~/.ssh/known_hosts file. This persists a trust decision after the temporary directory is cleaned up, permanently retaining any key accepted during the probe (including an attacker-controlled key if the host is compromised at that moment).

🔧 Suggested change
-      execFileSync("ssh", ["-o", "ConnectTimeout=5", "-o", "StrictHostKeyChecking=no", name, "echo", "ok"], { encoding: "utf-8", stdio: "ignore" });
+      execFileSync("ssh", [
+        "-o", "ConnectTimeout=5",
+        "-o", "StrictHostKeyChecking=no",
+        "-o", "UserKnownHostsFile=/dev/null",
+        name,
+        "echo",
+        "ok",
+      ], { encoding: "utf-8", stdio: "ignore" });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@bin/nemoclaw.js` around lines 159 - 160, The SSH readiness probe uses
execFileSync to call "ssh" with options including "StrictHostKeyChecking=no"
(see the execFileSync invocation); update the probe to also pass
"UserKnownHostsFile=/dev/null" among the "-o" options so the host key is not
persisted to the default known_hosts file (i.e., add "-o",
"UserKnownHostsFile=/dev/null" to the argument list passed to execFileSync for
the readiness check).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bin/nemoclaw.js`:
- Around line 176-180: The ssh-keyscan call uses the raw variable name and
ignores SSH config overrides, causing it to scan the wrong endpoint; update the
logic around ssh-keyscan/knownHostsFile so you first run ssh -G name (i.e., call
ssh -G ${name} and parse the HostName and Port it returns) to determine the
resolved HostName and Port, then pass that resolved host (and port if
non-default) to ssh-keyscan instead of the original name so knownHostsFile
matches the same endpoint used later by ssh/rsync; keep references to
ssh-keyscan, name, knownHostsFile and the ssh -G resolution step when
implementing the change.

---

Outside diff comments:
In `@bin/nemoclaw.js`:
- Around line 159-160: The SSH readiness probe uses execFileSync to call "ssh"
with options including "StrictHostKeyChecking=no" (see the execFileSync
invocation); update the probe to also pass "UserKnownHostsFile=/dev/null" among
the "-o" options so the host key is not persisted to the default known_hosts
file (i.e., add "-o", "UserKnownHostsFile=/dev/null" to the argument list passed
to execFileSync for the readiness check).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: bb368518-1dfc-4af8-8832-75ea734b7506

📥 Commits

Reviewing files that changed from the base of the PR and between 553f579 and 51340d8.

📒 Files selected for processing (1)
  • bin/nemoclaw.js

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@bin/nemoclaw.js`:
- Around line 179-182: The code currently writes ssh-keyscan output directly to
knownHostsFile (using execFileSync and fs.writeFileSync), which can create an
empty known_hosts and break subsequent SSH calls; update the logic in the block
that computes sshConfigOut, realHost and hostKeys so that after calling
execFileSync("ssh-keyscan", ["-H", realHost]) you validate hostKeys (e.g., if
(!hostKeys || !hostKeys.trim()) ) before calling fs.writeFileSync; on empty
output either retry the keyscan a limited number of times or throw/exit with a
clear error mentioning realHost and execFileSync so you do not overwrite
knownHostsFile with empty content. Ensure references: sshConfigOut, realHost,
hostKeys, knownHostsFile, execFileSync, fs.writeFileSync.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 57b2789c-bc66-4e6b-a4b4-16388724bf05

📥 Commits

Reviewing files that changed from the base of the PR and between 51340d8 and ed3167b.

📒 Files selected for processing (1)
  • bin/nemoclaw.js

@ericksoa ericksoa self-assigned this Mar 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant