Skip to content

bug: --from claude creates nonexistent image pull instead of running agent #422

@Manoj-engineer

Description

@Manoj-engineer

Agent Diagnostic

Loaded skill: openshell-cli (from .agents/skills/openshell-cli/SKILL.md)
Checked Quick Reference table — confirmed correct syntax is:
openshell sandbox create -- claude ← double-dash (run command)
openshell sandbox create --from openclaw ← --from (community image)

Following Workflow 1 (Getting Started) from the skill:

Step 1 — gateway start:
openshell gateway start --port 8082 --recreate
✓ Gateway ready at https://127.0.0.1:8082

Step 3 — sandbox create:
openshell sandbox create --from claude
(used --from instead of -- by mistake — this is the bug trigger)

Pod did not start. Ran openshell doctor exec per debug-openshell-cluster skill:
openshell doctor exec -- kubectl describe pod -n openshell

Output (attempt 1, no registry token):
Failed to pull image "ghcr.io/nvidia/openshell-community/sandboxes/claude:latest"
failed to authorize: unexpected status 403 Forbidden

Skill says --registry-token can be used for private community images.
Added GitHub PAT, recreated gateway. Re-ran sandbox create --from claude.

Output (attempt 2, with registry token):
Failed to pull image "ghcr.io/nvidia/openshell-community/sandboxes/claude:latest"
rpc error: code = NotFound
ghcr.io/nvidia/openshell-community/sandboxes/claude:latest: not found

Checked GHCR package list for nvidia/openshell-community — images present:
sandboxes/base, sandboxes/openclaw, sandboxes/openclaw-nvidia, sandboxes/ollama
sandboxes/claude does NOT exist.

Consulted skill Workflow 5 (BYOC) — --from resolves a name to
ghcr.io/nvidia/openshell-community/sandboxes/:latest.
claude is not a community image. It is an agent binary inside sandboxes/base.

Correct command per Quick Reference table:
openshell sandbox create -- claude
✓ Sandbox created, claude launched successfully.

Root cause: --from accepts any string without validating against the known
community image catalog. Agent names (claude, opencode, codex) are not valid
--from values but the CLI does not reject them — it lets Kubernetes fail at
image pull time with an error that gives no hint the CLI syntax is wrong.

###Description

openshell sandbox create --from claude is accepted by the CLI but fails at
runtime with ImagePullBackOff because sandboxes/claude:latest does not exist.

The error output (403 or "not found") gives no indication that the syntax is
wrong. The user is sent down an auth debugging path for what is a syntax error.

The --from flag accepts four kinds of values (per Workflow 5 in openshell-cli skill):

  • community image name → openshell sandbox create --from openclaw
  • local Dockerfile path → openshell sandbox create --from ./Dockerfile
  • image reference → openshell sandbox create --from registry.io/img:v1
  • directory → openshell sandbox create --from ./my-sandbox-dir

claude, opencode, and codex are not community image names — they are agent
commands run inside sandboxes/base using the double-dash syntax:
openshell sandbox create -- claude

Expected: the CLI should reject --from at parse time:

error: 'claude' is not a community sandbox image
hint: to run claude as an agent, use: openshell sandbox create -- claude

Reproduction Steps

  1. uv tool install -U openshell
  2. openshell gateway start
  3. openshell sandbox create --from claude
  4. Pod enters ErrImagePull / ImagePullBackOff
  5. openshell doctor exec -- kubectl describe pod -n openshell
    → Error says 403 or "not found" — no hint that the syntax is wrong
  6. openshell sandbox create -- claude ← this is the correct syntax

Environment

OS: macOS 14.4 (Apple Silicon)
Docker: Docker Desktop 28.1.1
OpenShell: v0.0.9

Logs

# Without registry token:
Failed to pull image "ghcr.io/nvidia/openshell-community/sandboxes/claude:latest":
failed to authorize: failed to fetch anonymous token: unexpected status 403 Forbidden

# With registry token:
rpc error: code = NotFound
ghcr.io/nvidia/openshell-community/sandboxes/claude:latest: not found

Agent-First Checklist

  • I pointed my agent at the repo and had it investigate this issue
  • I loaded relevant skills (e.g., debug-openshell-cluster, debug-inference, openshell-cli)
  • My agent could not resolve this — the diagnostic above explains why

Metadata

Metadata

Assignees

No one assigned

    Labels

    state:triage-neededOpened without agent diagnostics and needs triage

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions