Truly useful local AI on Apple Silicon. A worked reference rig across 16 GB, 32 GB, and 64 GB — one architecture, three SoC budgets, so anyone with the Mac they already own can run real models locally.
The wager: a single LLM call is brain-like and primitive — what feels useful (ChatGPT, Claude Code) is a system of models, retrieval, tools, and routing. This project builds that system locally with small open models and proves they can be insanely useful. Read MANIFESTO.md for the why.
Stack. Ollama on the host (native Apple Silicon, unified memory) + Open WebUI in Docker (LAN browser UI) + OpenCode in Docker (agentic coding against a launchd-resident llama-server, driven by the oc wrapper CLI), wired to a five-profile OWUI lineup. Branded mac-llm-lab here; one rename away from any other handle (see Fork checklist).
Architecture: spec.md. Model selection: profiles.md.
Migration note. The coding stack was rebuilt on OpenCode on 2026-06-10, replacing the previous claw-code + LiteLLM-bridge + grammar stack at every memory tier. Rationale and evidence: host/test/docs/OPENCODE-MIGRATION-DECISION.md. The last commit with the old stack intact is tagged claw-stack-final; check out that tag to reproduce the claw baseline.
The fastest path to a working code stack (OpenCode + llama-server + the oc wrapper) is the bundled installer. It's pure Bash, curl-only, no Homebrew required, and strictly idempotent — re-runs are safe on a live system.
git clone https://github.com/<you>/mac-llm-lab.git
cd mac-llm-lab
./wizard/wizard install
The wizard will:
- detect your Mac's RAM and pick a memory tier (16 / 32 / 64 GB); override with the ←/→ arrow keys on the slider
- ask for a topology: full-local (host + client both on this Mac) or client-only (this Mac talks to a host elsewhere on the LAN)
- install Xcode CLT, cmake, llama.cpp, OrbStack, and Ollama, fetch the tier GGUF, install the launchd-resident OpenCode llama-server, build the opencode:local client image, and install the global agent prompt (~/.config/opencode/AGENTS.md) and the oc wrapper (~/.local/bin/oc)
- finish with an end-to-end smoke test: the prompt-injection wire-capture probe plus a real oc run artifact
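The RAM-to-tier step in the list above can be pictured with a short Bash sketch. This is an illustration only: the function name `pick_tier` and the cut-off values are assumptions, not the wizard's actual logic, and the interactive slider override is omitted.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: map installed RAM (in GB) to a memory tier.
# pick_tier and its thresholds are assumptions, not the wizard's real code.

pick_tier() {
  local ram_gb=$1
  if   [ "$ram_gb" -ge 64 ]; then echo 64
  elif [ "$ram_gb" -ge 32 ]; then echo 32
  else                            echo 16   # anything smaller falls into the lowest tier
  fi
}

# On macOS, hw.memsize reports physical memory in bytes.
# The guard keeps the sketch harmless on systems without that key.
if ram_bytes=$(sysctl -n hw.memsize 2>/dev/null); then
  ram_gb=$(( ram_bytes / 1024 / 1024 / 1024 ))
  echo "Detected ${ram_gb} GB RAM -> tier $(pick_tier "$ram_gb")"
fi
```

Rounding down to the nearest tier is a conservative choice here: a machine between two tiers gets the smaller model budget rather than one that may not fit in unified memory.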
After install:
oc # OpenCode TUI on the current directory
oc run "fix the tests" # headless one-shot
oc probe # assert the global prompt reaches the agent
./wizard/wizard doctor # read-only state inspection
./wizard/wizard --help
See wizard/README.md for tier model choices, idempotency guarantees, and trust boundaries (one upstream curl | sh for OrbStack, opt-out instructions included).
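For readers unfamiliar with what idempotent means for an installer, the usual pattern is check first, act only if needed. The sketch below shows that pattern for a single file-install step. `ensure_file` is a hypothetical helper written for this illustration; the wizard's real steps and guarantees are documented in wizard/README.md.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of one idempotent install step.
# ensure_file is an illustrative name, not a function from the wizard.

ensure_file() {
  local src=$1 dest=$2
  # If the destination already holds identical content, do nothing.
  if [ -f "$dest" ] && cmp -s "$src" "$dest"; then
    echo "unchanged: $dest"
    return 0
  fi
  # Otherwise create the parent directory and copy the file into place.
  mkdir -p "$(dirname "$dest")"
  cp "$src" "$dest"
  echo "installed: $dest"
}
```

Running the same call twice reports `installed:` the first time and `unchanged:` the second, which is the property that makes re-runs safe on a live system.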
The wizard installs the code stack only. The five-profile OWUI chat lineup is the broader host/ setup — see Manual / OWUI setup below.
| Profile | Use it for | Backing model |
|---|---|---|
| general | daily driver: chat, code, vision | Qwen3.6-27B Q8_0 |
| fast | snappy triage, no `<think>` | Qwen3.6-35B-A3B MoE Q4 |
| reasoning | hard thinking, planning | Nemotron Super 49B v1.5 Q6 |
| digest | long-context extract | Qwen3-30B-A3B-Instruct-2507 Q4 |
| analyze | long-context reasoning | Qwen3-30B-A3B-Thinking-2507 Q6 |
One profile resident at a time, swapped on demand. Full rationale in profiles.md. Agentic coding runs on a separate, dedicated llama-server (host/llama-server/) driven by OpenCode — that's what the wizard wires up.
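As a rough illustration of swapping the resident profile, the sketch below uses Ollama's HTTP API directly: a generate request with `keep_alive` set to 0 asks Ollama to unload a model, and a generate request with no prompt loads one. It assumes Ollama is listening on its default port (11434) and that the aliases are named after the profiles; `swap_profile` and the two body helpers are hypothetical names, and the project's own orchestration may work differently.

```shell
#!/usr/bin/env bash
# Sketch: unload one Ollama model and preload another via the HTTP API.
# Assumptions: default port 11434, aliases named after the profiles,
# and helper names invented for this example.

OLLAMA_URL=${OLLAMA_URL:-http://localhost:11434}

# keep_alive 0 asks Ollama to unload the model right away.
unload_body() { printf '{"model":"%s","keep_alive":0}' "$1"; }

# A generate request without a prompt loads the model into memory.
load_body() { printf '{"model":"%s"}' "$1"; }

swap_profile() {
  local from=$1 to=$2
  curl -fsS "$OLLAMA_URL/api/generate" -d "$(unload_body "$from")" >/dev/null
  curl -fsS "$OLLAMA_URL/api/generate" -d "$(load_body "$to")" >/dev/null
}
```

For example, `swap_profile general reasoning` would release the daily-driver model before loading the reasoning one, so only one set of weights occupies unified memory at a time.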
If you want the full chat lineup (Open WebUI, the five profiles, the host orchestration CLI) or prefer to install piece-by-piece, each directory has its own README:
1. host/ollama/: install Ollama, stage GGUFs
2. host/ollama/Modelfiles/: ollama create the aliases
3. host/: Open WebUI Docker stack, groups, per-model config
4. host/llama-server/: the dedicated coding llama-server (launchd-resident, tier-parameterized)
5. host/scripts/: install mac-llm-lab-hostctl for orchestration
6. client/: install the mac-llm-lab CLI on your laptop
7. client/opencode/: containerised OpenCode + the oc wrapper
The wizard automates steps 4 and 7 (serving, client image, global prompt, oc). The OWUI chat profiles in steps 1–3 remain manual today.
# 1. Brand: replace `mac-llm-lab` everywhere (LAN hostname, script names, plist Label)
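#    (.git is excluded so the rewrite does not touch Git metadata such as the remote URL)
grep -rl --exclude-dir=.git 'mac-llm-lab\|LLM Lab' . | xargs sed -i '' 's/mac-llm-lab/your-brand/g; s/LLM Lab/Your-Brand/g'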
# 2. Rig username: Modelfile FROM paths point to /Users/nigel/.ollama/gguf/
sed -i '' "s|/Users/nigel/|/Users/$USER/|g" host/ollama/Modelfiles/*.Modelfile
# 3. Repo path: mac-llm-lab-hostctl defaults to ~/Desktop/bench/mac-llm-lab.
#    Either clone there, or set `HOST_REPO=/your/path` in your shell profile.
After step 1, also rename host/ollama/launchd/com.mac-llm-lab.ollama-env.plist to match.
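Because the step 1 rename edits files in place, it can help to preview which files match before running it. The sketch below lists them without changing anything. `brand_hits` is a hypothetical helper name, and skipping the .git directory is an assumption about what you want left untouched.

```shell
#!/usr/bin/env bash
# Sketch: dry-run preview of the files the brand rename would rewrite.
# brand_hits is an illustrative helper, not part of the repo.

brand_hits() {
  local root=${1:-.}
  # -r recurses, -l prints only the names of matching files;
  # Git's own metadata directory is skipped.
  grep -rl --exclude-dir=.git 'mac-llm-lab\|LLM Lab' "$root" | sort
}
```

Running `brand_hits .` from the repo root prints one path per line; if the list looks right, the sed command in step 1 can be run with more confidence.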
Use Chrome or Firefox for long Open WebUI sessions. Safari WebContent retains 10+ GB after closing thinking-mode chats.
MIT — see LICENSE.