bug: Claude Code crashes on GB200 (ARM64 64K page kernel) #400


### Agent Diagnostic

**Investigated by:** Claude Code (debug-openshell-cluster skill)
**Cluster:** `openshell` · `https://127.0.0.1:8080` · v0.0.7
**Sandbox:** `strong-treefrog` · image `ghcr.io/nvidia/openshell-community/sandboxes/base@sha256:97a6a668...`

Cluster is fully healthy — this is not a cluster startup issue.

```
openshell status        → Connected (v0.0.7)
node status             → Ready (k3s v1.35.2+k3s1)
openshell-0             → Running, 0 restarts
```

**Root cause: the Claude Code binary aborts on ARM64 kernels with 64K page size.**

| Property | Value |
|----------|-------|
| Kernel | `6.14.0-1013-nvidia-64k` |
| Architecture | `aarch64` |
| `getconf PAGE_SIZE` | `65536` (64K) |
| `/usr/local/bin/claude` | ELF ARM64 standalone binary, 222MB |
| Bundled runtime | Bun v1.3.1 (confirmed via `grep -boa "Bun v"`) |
| Crash signal | `SIGABRT` (exit 134, not SIGSEGV) |
| Seccomp | Disabled — not the cause |
| Node.js v22.22.1 | Works fine in same sandbox |

Bun's JavaScript engine (JavaScriptCore) performs memory layout operations assuming ≤16K page alignment. On a 64K page kernel, these assumptions are violated and JSC calls `abort()` before any user code executes. `node` itself is unaffected — the NPM-based install path uses Node.js and avoids this entirely.

Commands run:

```bash
openshell status
openshell doctor exec -- kubectl get pods -A -o wide
openshell doctor exec -- kubectl -n openshell exec strong-treefrog -- uname -a
openshell doctor exec -- kubectl -n openshell exec strong-treefrog -- getconf PAGE_SIZE
openshell doctor exec -- kubectl -n openshell exec strong-treefrog -- /usr/local/bin/claude --version
openshell doctor exec -- kubectl -n openshell exec strong-treefrog -- sh -c 'cat /proc/self/status | grep -E "Seccomp|NoNewPrivs|CapEff"'
openshell doctor exec -- kubectl -n openshell exec strong-treefrog -- ldd /usr/local/bin/claude
openshell doctor exec -- kubectl -n openshell exec strong-treefrog -- sh -c 'ls -lh /usr/local/bin/claude'
openshell doctor exec -- kubectl -n openshell exec strong-treefrog -- sh -c 'grep -boa "Bun v" /usr/local/bin/claude'
openshell doctor exec -- kubectl -n openshell exec strong-treefrog -- node -e "process.exit(0)"  # ✓ works
```

### Description

The default Claude Code installer produces a binary that aborts immediately at startup on GB200 nodes.

```
$ openshell sandbox create -- claude
...
Created sandbox: strong-treefrog
```
```
$ openshell sandbox connect strong-treefrog
sandbox@strong-treefrog:~$ claude
Aborted (core dumped)
```

This is a known issue with the Claude Code installer on these machines. The workaround is to install via NPM instead:

```bash
npm install -g @anthropic-ai/claude-code
```

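OpenShell should consider using the NPM-based install path for Claude on affected systems. It is unclear whether this is specific to Grace-based systems (GB200) or affects ARM64 in general. The agent diagnostic points to the 64K kernel page size rather than the GB200 hardware itself, so testing on other aarch64 machines, with both 4K and 64K page kernels, would narrow it down.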
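One way the install-path choice could be made is sketched below. It is only an illustration: `choose_install_method` is a hypothetical helper, not an existing OpenShell function, and the 16K threshold is taken from the agent diagnostic rather than independently verified. The script only prints the command it would run.

```bash
#!/usr/bin/env bash
# Sketch: pick an install path for Claude Code from the kernel page size.
# choose_install_method is a hypothetical helper, not part of OpenShell.

# Usage: choose_install_method <arch> <page_size_bytes>
# Prints "npm" for aarch64 with pages larger than 16K (assumed threshold),
# otherwise prints "native".
choose_install_method() {
  local arch="$1" page_size="$2"
  if [ "$arch" = "aarch64" ] && [ "$page_size" -gt 16384 ]; then
    echo "npm"
  else
    echo "native"
  fi
}

# Inspect the current machine with the same tools used in the diagnostic.
method="$(choose_install_method "$(uname -m)" "$(getconf PAGE_SIZE)")"
echo "install method: $method"
if [ "$method" = "npm" ]; then
  # Dry run only: show the workaround command instead of executing it.
  echo "would run: npm install -g @anthropic-ai/claude-code"
fi
```

Keying on page size rather than on the GB200 model name means the same check would also cover other aarch64 hosts booted with a 64K page kernel, if they turn out to be affected.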

### Reproduction Steps

1. On a GB200 node, create a Claude sandbox: `openshell sandbox create -- claude`
2. Connect to the sandbox: `openshell sandbox connect <name>`
3. Run `claude`
4. Observe: `Aborted (core dumped)`
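To confirm that step 4 is an abort rather than a segfault, the exit status can be decoded right after the crash. The sketch below relies on the common shell convention that a process killed by signal N reports status 128 + N; `describe_exit` is a hypothetical helper name. A status above 128 can also come from a program that chose that exit code itself, so treat the result as a hint.

```bash
#!/usr/bin/env bash
# Sketch: translate a shell exit status into the signal that caused it.
# describe_exit is a hypothetical helper for illustration.

# Usage: describe_exit <exit_status>
describe_exit() {
  local status="$1"
  if [ "$status" -gt 128 ]; then
    # Statuses above 128 conventionally encode 128 + signal number;
    # kill -l maps the number back to a signal name.
    echo "killed by SIG$(kill -l "$((status - 128))")"
  else
    echo "exited with status $status"
  fi
}

# 134 is the status reported in the diagnostic table (134 - 128 = 6).
describe_exit 134
```

Inside the sandbox this would be used as `claude; describe_exit $?`, so that the status is captured before any other command overwrites it.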

### Environment

- Host: GB200 (Grace Blackwell), aarch64
- OpenShell sandbox with Claude provider

### Logs

```shell

```

### Agent-First Checklist

- [x] I pointed my agent at the repo and had it investigate this issue
- [x] I loaded relevant skills (e.g., `debug-openshell-cluster`, `debug-inference`, `openshell-cli`)
- [x] My agent could not resolve this — the diagnostic above explains why
