The OpenCode sandbox runs agent invocations inside an isolated Docker container (aiboard-opencode-sandbox image) against a local Anthropic-compatible llama.cpp server — typically Qwen3.6 served by the sibling local-llm compose project. This gives you a free, offline alternative to the Claude-based providers for low-stakes roles.
It is off by default and entirely independent of the Docker/Claude sandbox. Follow this guide to enable it, verify it, and understand when it's safe to use.
There are two Qwen-target executors. This one (docker-opencode) uses the OpenCode CLI and prompt-engineers the JSON schema with a client-side retry loop. The other (docker-claude-qwen, see ClaudeQwenSandbox.md) uses the Claude CLI and gets server-side schema enforcement via the proxy's tool-call mechanism. They exist side by side specifically so the candidate-evaluation feature can A/B them: choose based on your trust budget for prompt-engineered structure vs. wire-enforced structure, or run both and let the metrics decide.
| Requirement | Check |
|---|---|
| Docker daemon running | docker info |
| llm-net bridge network exists | docker network ls \| Select-String llm-net |
| local-llm compose project running | docker ps \| Select-String llama-server |
| Built OpenCode sandbox image | docker images aiboard-opencode-sandbox |
| Built .NET worker | dotnet build succeeds |
If llm-net is missing, start the sibling project first:
cd ..\local-llm
docker compose up -d
docker network ls | Select-String llm-net # should now appear

Build the sandbox image once, and again whenever docker/opencode-sandbox/ changes:

.\scripts\build-opencode-sandbox.ps1

Or via compose:

docker compose --profile build up opencode-sandbox

Verify:

docker images aiboard-opencode-sandbox

You should see aiboard-opencode-sandbox:latest.
Set AGENT_EXECUTOR=docker-opencode in your environment (or .env.local) to make docker-opencode the explicitly requested executor. Even without this variable, the executor is auto-registered whenever Docker is available; setting it only makes startup fail loudly if Docker isn't reachable:
$env:AGENT_EXECUTOR = "docker-opencode"

At startup the worker logs:
DockerOpenCodeAgentOptions: ImageName=aiboard-opencode-sandbox:latest, NetworkMode=llm-net, ProviderBaseUrl=http://llama-server:8080/v1, ModelName=qwen3.6-35b-a3b
OpenCode executor registered. Ensure the 'llm-net' Docker network exists (start the local-llm compose project) before routing roles to 'docker-opencode'.
Having the executor registered is not the same as using it — nothing routes to docker-opencode until you point a role at it in your workflow config.
{
  "DockerAgents": {
    "OpenCode": {
      "ImageName": "aiboard-opencode-sandbox:latest",
      "NetworkMode": "llm-net",
      "MountHostDockerSocket": false,
      "ProviderBaseUrl": "http://llama-server:8080/v1",
      "AuthToken": "local",
      "ModelName": "qwen3.6-35b-a3b",
      "TimeoutSeconds": 7200,
      "InactivityTimeoutSeconds": 1200,
      "MaxRetriesOnMalformedOutput": 2,
      "PerformanceVolumes": []
    }
  }
}

| Key | Default | Purpose |
|---|---|---|
| ImageName | aiboard-opencode-sandbox:latest | Image to run. |
| NetworkMode | llm-net | Must match the bridge network owned by local-llm. Change only if you renamed that network. |
| MountHostDockerSocket | false | When true, bind-mounts the host Docker daemon socket into the sandbox. Use with a project overlay that installs Docker CLI / Compose when OpenCode must run Docker-backed verification commands. Grants host-Docker control. |
| HostDockerSocketPath | /var/run/docker.sock | Host socket path used when MountHostDockerSocket=true. |
| ContainerDockerSocketPath | /var/run/docker.sock | Container socket path used when MountHostDockerSocket=true. |
| ProviderBaseUrl | http://llama-server:8080/v1 | OpenAI-compatible endpoint exposed by the local llama.cpp proxy. The /v1 suffix is required: the OpenCode @ai-sdk/openai-compatible adapter appends /chat/completions to this prefix. |
| AuthToken | local | Dummy token; llama.cpp validates nothing. Any non-empty string works. |
| ModelName | qwen3.6-35b-a3b | Default model alias when a workflow role doesn't pin one. Both Qwen3.6 variants (qwen3.6-35b-a3b and qwen3.6-35b-a3b-think) are registered in the sandbox image; a per-role model overrides this default. |
| TimeoutSeconds | 7200 | Hard wall-clock cap. The inactivity timer is the normal stuck detector. |
| InactivityTimeoutSeconds | 1200 | Stuck detector; kills the process when no stdout/stderr has appeared for N seconds. Set to null to disable. |
| MaxRetriesOnMalformedOutput | 2 | Retry budget when the model response doesn't parse as Agent Contract JSON. After the final attempt, the executor returns outcome: ERROR with raw output in detail rather than throwing. Bypassed for fatal stderr hints (see "Fatal-hint short-circuit" below). Preceded by a one-shot structurer call on the first parse failure (see "No-think structurer fallback" below). |
| EnableStructurer | true | When the agent's first invocation produces non-empty output that fails to parse as the Agent Contract JSON, run a one-shot follow-up call against StructurerModelName (no-think Qwen by default) asking it to extract the outcome from the prior narrative. Set to false to revert to the v0.0.22 behaviour: re-prompt the same model with a stricter instruction block. |
| StructurerModelName | qwen3.6-35b-a3b | Model alias used by the recovery structurer. Defaults to the no-think variant; structuring is a fast mechanical extraction task where chain-of-thought is unhelpful. |
| StructurerTimeoutSeconds | 180 | Hard wall-clock cap for the structurer subprocess. Tight by design: extraction over a few-KB narrative should take seconds on a warm llama-server. |
| ContainerNamePrefix | aiboard-oc | Prefix for generated container names (shape: aiboard-oc-{tenantHash}-{cardId}-{rand}). Keep the aiboard- prefix so orphaned-container detection still matches. |
| RateLimitPatterns | [] | Additional stderr substrings that should be treated as rate-limit signals, merged with the built-in Anthropic patterns. |
| PerformanceVolumes | [] | Workspace-relative dependency/cache directories to shadow with Docker named volumes. Use only for reproducible folders such as node_modules, .pnpm-store, .gradle, target, or .godot/imported. |
| PerformanceVolumeOwner | agent:agent | Owner applied the first time a performance volume is initialized. Empty skips ownership initialization. |

Performance volumes are opt-in and deterministic per worktree/path. They are intended for slow host-backed dependency trees on Docker Desktop Windows; source files and generated artifacts that must be committed should stay on the normal worktree bind mount.
The sandbox image bakes an opencode.json template that registers both Qwen3.6 virtual models against the @ai-sdk/openai-compatible adapter, with the active model parameterised via the OPENCODE_MODEL_NAME env var. On each docker run, the sandbox entrypoint templates the JSON:
{
  "provider": {
    "llama-server": {
      "npm": "@ai-sdk/openai-compatible",
      "options": { "baseURL": "${OPENCODE_PROVIDER_BASE_URL}", "apiKey": "${OPENCODE_AUTH_TOKEN}" },
      "models": {
        "qwen3.6-35b-a3b": { "name": "Qwen3.6 35B-A3B (no thinking)", "limit": { "context": 131072, "output": 8192 } },
        "qwen3.6-35b-a3b-think": { "name": "Qwen3.6 35B-A3B (thinking)", "limit": { "context": 131072, "output": 8192 } }
      }
    }
  },
  "model": "llama-server/${OPENCODE_MODEL_NAME}",
  "small_model": "llama-server/qwen3.6-35b-a3b-think",
  "compaction": { "auto": true, "prune": true }
}

The executor sets OPENCODE_MODEL_NAME from the workflow role's model field per call. small_model is hardcoded to the -think alias because OpenCode uses it for compaction summaries: better-grounded summaries preserve specific identifiers (file paths, function names) that the no-think variant tends to drop. Compaction happens only at threshold crossings, so the slower call is amortised. See local-llm/Qwen-3.6.md for the full rationale.
limit.context: 131072 matches the server's --ctx-size; OpenCode auto-compacts before hitting that ceiling.
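The placeholder substitution described above can be sketched in a few lines. This is only an illustration in Python: the guide does not show how the real entrypoint performs the templating, and the template here is a trimmed copy that keeps just the placeholder-bearing keys. The helper name `render_opencode_config` is hypothetical.

```python
import json
from string import Template

# Trimmed copy of the template shown above; only placeholder-bearing keys are kept.
OPENCODE_TEMPLATE = """{
  "provider": {
    "llama-server": {
      "npm": "@ai-sdk/openai-compatible",
      "options": { "baseURL": "${OPENCODE_PROVIDER_BASE_URL}", "apiKey": "${OPENCODE_AUTH_TOKEN}" }
    }
  },
  "model": "llama-server/${OPENCODE_MODEL_NAME}",
  "small_model": "llama-server/qwen3.6-35b-a3b-think"
}"""


def render_opencode_config(env):
    """Substitute ${VAR} placeholders from env and parse the result as JSON.

    Hypothetical helper: the real sandbox entrypoint is not shown in this guide.
    """
    # substitute() raises KeyError when a placeholder has no value, so a missing
    # variable fails fast instead of leaving a literal "${...}" in the config.
    rendered = Template(OPENCODE_TEMPLATE).substitute(env)
    return json.loads(rendered)


config = render_opencode_config({
    "OPENCODE_PROVIDER_BASE_URL": "http://llama-server:8080/v1",
    "OPENCODE_AUTH_TOKEN": "local",
    "OPENCODE_MODEL_NAME": "qwen3.6-35b-a3b-think",
})
print(config["model"])  # llama-server/qwen3.6-35b-a3b-think
```

Note that small_model stays fixed regardless of OPENCODE_MODEL_NAME, matching the hardcoded -think alias described above. Values containing quotes or backslashes would need JSON escaping before substitution; the sketch assumes plain strings.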
The executor is registered under provider key docker-opencode. Opt in by changing the role's provider in workflow.github.json and choosing which Qwen variant fits the role:

"gate_checker": {
  "model": "qwen3.6-35b-a3b",
  "provider": "docker-opencode",
  "systemPromptFile": "prompts/gate_checker.md",
  "sections": []
},
"senior_engineer": {
  "model": "qwen3.6-35b-a3b-think",
  "provider": "docker-opencode",
  "systemPromptFile": "prompts/senior_engineer.md",
  "sections": ["Technical Design", "Decisions", "Implementation"]
}

The model field selects between the two registered variants:

- qwen3.6-35b-a3b: fast, no chain-of-thought. Use for tool-call loops, gate checks, estimation, mechanical implementation.
- qwen3.6-35b-a3b-think: same weights with reasoning emitted (~7× tokens, ~30–60s per turn). Use for design, code-review write-ups, QA test plans, summaries (anything single-shot where output quality dominates over latency).

Any unrecognised provider value fails config validation at startup.
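As a rough picture of what that startup check amounts to, here is a minimal Python sketch. The worker's actual validation code and its full provider registry are not shown in this guide, so `KNOWN_PROVIDERS` (limited to the two keys named here) and `validate_providers` are assumptions for illustration.

```python
# Provider keys named in this guide. The worker's real registry is not shown here,
# so treat this set as an assumption.
KNOWN_PROVIDERS = frozenset({"docker-opencode", "docker-claude-qwen"})


def validate_providers(roles):
    """Return one error message per role whose provider is not registered."""
    errors = []
    for role_name, role in roles.items():
        provider = role.get("provider")
        if provider not in KNOWN_PROVIDERS:
            errors.append(f"{role_name}: unrecognised provider {provider!r}")
    return errors


roles = {
    "gate_checker": {"model": "qwen3.6-35b-a3b", "provider": "docker-opencode"},
    # Deliberate typo to show the failure path.
    "senior_engineer": {"model": "qwen3.6-35b-a3b-think", "provider": "docker-opencod"},
}
for message in validate_providers(roles):
    print(message)  # senior_engineer: unrecognised provider 'docker-opencod'
```

The sketch checks only the provider key; per the notes further down, an unknown model alias surfaces at run time as a Model stderr hint rather than at startup.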
.\scripts\smoke-opencode.ps1

The script:

- Confirms Docker, the image, and the llm-net network are all present.
- Runs a minimal one-shot against the sandbox with a trivial prompt.
- Parses the response via the same JSON-extraction strategies the executor uses (fenced block → whole document → trailing balanced braces).
- Asserts the response contains a valid outcome (COMPLETE, NEEDS_INFO, or ERROR).

On failure, the script prints the raw stdout/stderr and a suggested fix.
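The three extraction strategies named in the list above (fenced block, then whole document, then trailing balanced braces) could look roughly like this. It is a Python sketch under stated assumptions, not the executor's actual parser: all function names are hypothetical, and the brace scan does not account for braces inside JSON strings.

```python
import json
import re

FENCE = "`" * 3  # built indirectly so this example can itself sit inside a fenced block
FENCED_BLOCK = re.compile(FENCE + r"(?:json)?\s*\n(.*?)" + FENCE, re.DOTALL)
VALID_OUTCOMES = {"COMPLETE", "NEEDS_INFO", "ERROR"}


def _try_load(text):
    """Return the parsed dict, or None when text is not a JSON object."""
    try:
        value = json.loads(text)
    except json.JSONDecodeError:
        return None
    return value if isinstance(value, dict) else None


def _trailing_object(text):
    """Scan backwards from the last '}' for the '{' that balances it.

    Limitation: braces inside string values are counted too, so such input
    may fail to balance and yield None.
    """
    end = text.rfind("}")
    if end == -1:
        return None
    depth = 0
    for i in range(end, -1, -1):
        if text[i] == "}":
            depth += 1
        elif text[i] == "{":
            depth -= 1
            if depth == 0:
                return _try_load(text[i:end + 1])
    return None


def extract_contract(output):
    """Try each strategy in order; return the first dict carrying a valid outcome."""
    candidates = []
    match = FENCED_BLOCK.search(output)
    if match:
        candidates.append(_try_load(match.group(1)))
    candidates.append(_try_load(output.strip()))
    candidates.append(_trailing_object(output))
    for candidate in candidates:
        if candidate and candidate.get("outcome") in VALID_OUTCOMES:
            return candidate
    return None


print(extract_contract('All done. {"outcome": "COMPLETE", "detail": {"files": 2}}'))
```

Ordering the strategies from most to least explicit means a model that follows the fenced-block convention is never second-guessed by the looser brace scan.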
Two Qwen3.6 variants are exposed by the local-llm proxy as separate model aliases — pick the right one per role rather than choosing a single global mode.
| Role intent | Variant | Why |
|---|---|---|
| gate_checker (pass/fail JSON) | qwen3.6-35b-a3b | Latency dominates; reasoning would crush throughput. |
| estimator (calibrated size) | qwen3.6-35b-a3b | Short, structured answer. |
| code_reviewer (write the review document) | qwen3.6-35b-a3b-think | Synthesis across the whole prompt; quality > latency. |
| implementer (tool-loop edits) | qwen3.6-35b-a3b | Multi-turn tool loop; reasoning would multiply wall time per step. |
| senior_engineer (design) | qwen3.6-35b-a3b-think | Single-shot synthesis where grounded reasoning materially improves output. |
| qa (test plan) | qwen3.6-35b-a3b-think | Same shape as design: single-shot synthesis. |
| specialist_reviewer / senior_specialist_reviewer | qwen3.6-35b-a3b-think | High-stakes reviews benefit from thinking; volume cost is acceptable. |
| merge_resolver (file-edit tool loop) | qwen3.6-35b-a3b | Tool loop. |

Rule of thumb: single-shot synthesis → -think, tool-call loops → base.
Honest caveats:
- Qwen3.6 is below Claude Opus on architecture reasoning (SWE-Bench 73.4% vs 80.8%). Don't use it for irreversible architectural calls without human review.
- The -think variant occasionally hallucinates identifier names (e.g. invents TowerNode when the actual class is Tower). Verify before code-gen acts on these.
- First request after docker compose up or long idle takes 30–120s (cold prefix cache). Defaults are TimeoutSeconds: 7200 hard cap plus InactivityTimeoutSeconds: 1200 stuck detection.

This is guidance, not enforcement — the executor will run any role you point at it. The multi-agent candidate evaluation feature (docs/CandidateEvaluation.md) is the data-driven path to replacing this table with measured win rates per (role, provider).
- Orphan cleanup: startup warns on leftover aiboard-oc-* containers with a docker rm -f suggestion.
- Network errors fire a stderr hint (category Network) pointing to likely fixes (llm-net absent, could not resolve host, etc.).
- Model errors fire a stderr hint (category Model) when the requested alias isn't loaded on llama-server.
- The executor logs the resolved ProviderBaseUrl and ModelName at Info on every run; verify the expected values appear in logs.

When the stderr signature detector fires with one of the fatal categories — Network, Auth, Config, Path — the retry-on-malformed-output loop is bypassed. The executor immediately throws CliInfrastructureException (recorded as INFRASTRUCTURE failure-reason in agent_run.failure_reason).
Why: re-prompting cannot recover an unreachable upstream, a rejected token, a missing provider key, or a wire-path mismatch. Without this, a single 502 from llama-server during a polling run could burn 3 × the inactivity timer (~60 min on default settings) before surfacing — a real cost observed in the v0.0.22 example-project run that prompted this fix.
Model (e.g. "model not found") is intentionally NOT in the fatal list, since a model could be loaded mid-run on a slow-starting llama-server. Retries continue for that case.
If you see a fatal-hint bail in your logs, the operator-actionable fix is in the hint text itself (e.g. "Docker network 'llm-net' does not exist. Start the local-llm compose project first") — not "give the model another try."
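The short-circuit rule and the cost it avoids can be sketched as follows. Only the category names and the default numbers come from this guide; the stderr substrings, the function names, and the Python exception (a stand-in for CliInfrastructureException) are hypothetical.

```python
from typing import Optional

# Category names come from this guide; Model is deliberately not in the fatal set.
FATAL_CATEGORIES = frozenset({"Network", "Auth", "Config", "Path"})

# Hypothetical stderr substrings, for illustration only.
HINT_PATTERNS = (
    ("could not resolve host", "Network"),
    ("network llm-net not found", "Network"),
    ("unauthorized", "Auth"),
    ("model not found", "Model"),
)


class CliInfrastructureError(Exception):
    """Python stand-in for the worker's CliInfrastructureException."""


def classify_stderr(stderr: str) -> Optional[str]:
    """Return the first matching hint category, or None when nothing matches."""
    lowered = stderr.lower()
    for needle, category in HINT_PATTERNS:
        if needle in lowered:
            return category
    return None


def check_before_retry(stderr: str) -> None:
    """Raise instead of retrying when the hint category is fatal."""
    category = classify_stderr(stderr)
    if category in FATAL_CATEGORIES:
        raise CliInfrastructureError(f"{category} failure; re-prompting cannot fix this")


def worst_case_stall_seconds(max_retries: int, inactivity_timeout: int) -> int:
    """Time burned if every attempt (first call plus retries) hits the inactivity timer."""
    return (1 + max_retries) * inactivity_timeout


print(worst_case_stall_seconds(2, 1200) // 60)  # 60 (minutes, on the defaults)
```

With MaxRetriesOnMalformedOutput at 2 and InactivityTimeoutSeconds at 1200, three attempts can stall for 3 × 1200 s, which is the roughly 60 minutes quoted above.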
The most common parse failure for thinking-variant Qwen on heavy-reasoning roles isn't malformed JSON — it's missing JSON. The agent narrates correct work in prose and forgets to emit the {"outcome":"COMPLETE", ...} envelope at the end. Re-prompting the same thinking model with "be stricter" rarely fixes this, because the model already thinks it's done.
When the first attempt produces non-empty output that fails to parse, the executor runs a one-shot structurer call before the retry loop kicks in:
- Spawns a separate Docker container ({ContainerNamePrefix}-struct-{tenant}-{cardId}-{rand}) using the same image and llama-server connection.
- Pins the model to StructurerModelName (default: qwen3.6-35b-a3b, the no-think variant; structuring is a mechanical extraction task where reasoning is counterproductive).
- Sends a tight prompt: "Below is an agent's free-form narrative. Extract ONLY a JSON object matching this schema." No tools, no system prompt, no chain-of-thought.
- If the structurer returns parseable JSON, the executor returns it as the recovered result with a marker in ConversationLog (Recovered via no-think structurer).
- If the structurer fails (timeout, unparseable output, container error), execution falls through to the existing retry-with-stricter-reprompt path. The original v0.0.22 behaviour is preserved as the safety net.

The structurer fires only on the first parse failure, never on subsequent retries — it's a one-shot recovery, not a per-retry helper.
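Put together, the recovery order described in this section (first attempt, one-shot structurer, bounded retries, then an ERROR outcome) might be sketched like this. It is a control-flow illustration in Python with injected callables; `run_with_recovery` and its parameters are hypothetical names, not the executor's API.

```python
import json
from typing import Callable, Optional


def run_with_recovery(
    invoke_agent: Callable[[bool], str],       # argument: use the stricter re-prompt?
    invoke_structurer: Callable[[str], str],   # argument: the agent's raw narrative
    parse: Callable[[str], Optional[dict]],    # returns None when output does not parse
    max_retries: int = 2,
    enable_structurer: bool = True,
) -> dict:
    raw = invoke_agent(False)
    result = parse(raw)
    if result is not None:
        return result

    # One-shot structurer: only after the first failure, only for non-empty output.
    if enable_structurer and raw.strip():
        try:
            recovered = parse(invoke_structurer(raw))
        except Exception:
            recovered = None  # timeout or container error: fall through to retries
        if recovered is not None:
            recovered["ConversationLog"] = "Recovered via no-think structurer"
            return recovered

    # Existing path: re-prompt the same model with stricter instructions.
    for _ in range(max_retries):
        raw = invoke_agent(True)
        result = parse(raw)
        if result is not None:
            return result

    # Final attempt failed: report ERROR with the raw output rather than raising.
    return {"outcome": "ERROR", "detail": raw}


def parse_json_object(text: str) -> Optional[dict]:
    """Minimal parser stand-in: accept only a bare JSON object."""
    try:
        value = json.loads(text)
    except json.JSONDecodeError:
        return None
    return value if isinstance(value, dict) else None


outcome = run_with_recovery(
    invoke_agent=lambda strict: "I implemented the change and all tests pass.",
    invoke_structurer=lambda narrative: '{"outcome": "COMPLETE"}',
    parse=parse_json_object,
)
print(outcome["outcome"])  # COMPLETE
```

In this run the agent is invoked once and no retry is consumed, which is the case the structurer exists to serve.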
Trade-off: structuring infers fields from prose, which is by definition parser-side inference. The structurer prompt tells the model to faithfully summarize the narrative without inventing claims, but a determined hallucination in the agent's prose will be preserved verbatim in the structured output. If you need a stricter wire contract, route the role through docker-claude-qwen (server-enforced schema via --json-schema → tool-call); the structurer is the right answer when you need OpenCode's prompt-engineered path to be viable for thinking workloads.
Operator opt-out: set DockerAgents:OpenCode:EnableStructurer = false in appsettings.json to revert to v0.0.22 behaviour.
Logs to look for:
- Docker/OpenCode invoking structurer for card N (model=...): structurer is firing.
- Docker/OpenCode structurer recovered outcome=COMPLETE: recovery succeeded; no retry consumed.
- Docker/OpenCode structurer ... falling through to retry loop: recovery failed; existing retry path runs.

- Single-slot server. local-llm's llama-server runs with --parallel 1. Candidate execution can start multiple Qwen-target providers in parallel; configure the local-llm named-resource pool with MaxConcurrent: 1 when providers share the same backend. For true parallel local loads, add a second llama-server on a different port and route explicitly.
- Schema enforcement is prompt-engineered, not wire-enforced (today). llama.cpp itself supports response_format: {"type":"json_schema", ...} server-side (per the model card), but the OpenCode CLI doesn't expose a flag to thread it through, so this executor relies on a schema instruction block in the prompt + client-side validation by OpenCodeOutputParser + bounded retry. This is less strict than Claude's --json-schema enforcement; if you see frequent parse failures on a specific role, the long-term fix is to extend OpenCode's CLI surface (or call llama.cpp directly) rather than scale the retry budget.
- 128K context ceiling. Local llama.cpp is configured for 128K. Oversized prompts fail at the server boundary (stderr hint context length).
- No credential staging. Unlike the Claude sandbox, no host directory is mounted into the container; the connection details are just env vars passed through to the entrypoint.

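To make the single-slot point concrete, here is a small Python sketch of what a MaxConcurrent: 1 pool does when two Qwen-target providers are started in parallel. The `ResourcePool` class is a hypothetical stand-in for the worker's named-resource pool, whose implementation is not shown in this guide, and the sleep stands in for a llama-server request.

```python
import threading
import time


class ResourcePool:
    """Caps concurrent users of a shared backend and records the observed peak."""

    def __init__(self, max_concurrent):
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0

    def __enter__(self):
        self._slots.acquire()  # blocks while the single slot is taken
        with self._lock:
            self._active += 1
            self.peak = max(self.peak, self._active)
        return self

    def __exit__(self, exc_type, exc, tb):
        with self._lock:
            self._active -= 1
        self._slots.release()
        return False


def run_candidate(pool, provider, finished):
    with pool:
        time.sleep(0.01)  # stand-in for a llama-server request
        finished.append(provider)


pool = ResourcePool(max_concurrent=1)  # mirrors MaxConcurrent: 1
finished = []
threads = [
    threading.Thread(target=run_candidate, args=(pool, name, finished))
    for name in ("docker-opencode", "docker-claude-qwen")
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(pool.peak)  # 1
```

Both candidates complete, but never at the same time, so the backend only ever sees one request in flight.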
If your project's agent work needs additional tooling baked into this sandbox (a runtime, a compiler, a CLI), overlay it instead of forking. See ProjectOverlays.md for the FROM aiboard-opencode-sandbox:latest pattern, build-script template, and appsettings.json wiring.