Hal0ai · thinmintdev · Jun 12, 2026
diff --git a/CONTENT_BRIEF.md b/CONTENT_BRIEF.md
@@ -12,12 +12,12 @@
 
 hal0 is a homelab AI inference platform: the Linux box you already
 have in the rack, running real OpenAI-compatible inference. It manages
-model **slots** as systemd units with a typed lifecycle state machine,
-exposes an **OpenAI-compatible `/v1/*` API**, and ships with a Vue
-**dashboard** plus a **prewired OpenWebUI** chat tab. One command
-installs on any modern Linux box — Strix Halo iGPU, AMD discrete,
-NVIDIA, or CPU — and it's happy in a privileged Proxmox LXC with
-GPU/NPU passthrough, behind your Traefik.
+model **slots** as podman containers under per-slot systemd units with
+a typed lifecycle state machine, exposes an **OpenAI-compatible
+`/v1/*` API**, and ships with a React **dashboard** plus a **prewired
+OpenWebUI** chat tab. One command installs on any modern Linux box —
+Strix Halo iGPU, AMD discrete, NVIDIA, or CPU — and it's happy in a
+privileged Proxmox LXC with GPU/NPU passthrough, behind your Traefik.
 
 (Synthesised from `hal0/README.md` lines 1–24 and `hal0/PLAN.md` §1.)
 
@@ -62,15 +62,15 @@ Overrides (env vars, from `installer/install.sh`):
 `HAL0_NO_PROBE`, per-backend `HAL0_TOOLBOX_IMAGE_*`.
 
 Status caveat: the installer is real and produces a running
-`hal0-api`. As of v0.1.1 (2026-05-22) **all six toolbox images** —
+`hal0-api`. All six toolbox images —
 `vulkan`, `rocm`, `flm`, `moonshine`, `kokoro`, `comfyui` — are
 published to `ghcr.io/hal0ai/` and pinned by sha256 digest in
-`hal0/manifest.json` (`toolbox_images.*.digest`). FLM chat + embed
-are surfaced in the picker when XDNA hardware and the local toolbox
-image are both present; STT slice deferred. v0.1.1 (2026-05-22) is
-the first install that completes end-to-end on non-Strix-Halo hosts:
-WSL 2 (with systemd), Proxmox VMs, and bare-metal Linux with discrete
-GPUs all probe + wizard cleanly. APIs may shift before v1.0.
+`hal0/manifest.json` (`toolbox_images.*.digest`). Slots run as
+**podman containers** under per-slot `hal0-slot@<name>.service`
+systemd units, orchestrated by hal0-api; container images and tuned
+flags come from slot profiles. FLM chat + embed are surfaced in the
+picker when XDNA hardware and the local toolbox image are both present.
+APIs may shift before v1.0.
 
 ### Installer overhaul (2026-05-15)
 
@@ -177,9 +177,13 @@ and curated models.
    warming → ready → serving ↔ idle → unloading; error sideband).
    Persisted to `state.json`, streamed over SSE. (PLAN §5 Tier 3,
    `src/hal0/slots/state.py`)
-3. **systemd-managed containers** — each slot is an instance of the
-   `hal0-slot@.service` template unit. The API process never holds a
-   model in its own memory. (`ARCHITECTURE.md` §Process model)
+3. **systemd-managed podman containers** — each slot runs as a podman
+   container under a `hal0-slot@<name>.service` unit; container images
+   and tuned flags come from slot profiles. The API process never holds
+   a model in its own memory. An exclusive GPU arbiter swaps the iGPU
+   between LLM serving and ComfyUI image generation. The NPU runs a
+   single FastFlowLM container serving chat + ASR + embeddings.
+   (`ARCHITECTURE.md` §Process model)
 4. **Hardware-aware probe** — detects GPU / NPU / unified memory (UMA
    pool on Strix Halo), writes `/etc/hal0/hardware.json`, surfaces
    VRAM/RAM fit warnings inline in the slot form. (PLAN §6,
@@ -191,9 +195,9 @@ and curated models.
 6. **Bundled prewired OpenWebUI** — chat UI on `:3001`, zero config:
    the installer writes `openwebui.env` pointing at the local hal0 API.
    (PLAN §8)
-7. **Vue 3 + Tailwind 4 dashboard** — 9 views: Dashboard, Slots,
-   Models, Hardware, Logs, Settings, Providers, FirstRun, plus error
-   shell. Dark mode default; SSE for status + log tail. (PLAN §6)
+7. **React 18 + TypeScript + Vite dashboard** — 9 views: Dashboard,
+   Slots, Models, Hardware, Logs, Settings, Providers, FirstRun, plus
+   error shell. Dark mode default; SSE for status + log tail. (PLAN §6)
 8. **Atomic self-update with rollback** — `hal0 update --channel
    stable|nightly`; cosign-verified tarballs swap a
    `/usr/lib/hal0/current` symlink; `--rollback` reverts. (PLAN §9,
@@ -211,12 +215,12 @@ and curated models.
     The dashboard operates ComfyUI as a containerized generation
     engine with a gated inference ⇄ generation iGPU switchover
     (hal0 PRs #686/#690, 2026-06-11 — see "Image generation" below).
-12. **Optional Caddy reverse proxy with basic_auth + Bearer token POC**
-    — `install.sh --auth=basic` provisions Caddy, writes a hashed
-    `basic_auth`, mints a Bearer token, and round-trips a self-test
-    against `https://${HAL0_HOSTNAME}/api/health` before exiting
-    (commits `ba79427`, `f62902c`; install.sh lines 294+ and 645+).
-    Trusted-LAN posture remains the default (`--auth=off`).
+12. **Trusted-LAN default, no built-in auth** — hal0-api binds
+    `0.0.0.0:8080` with no network gate (ADR-0012; Caddy / `--auth=basic`
+    removed in v0.3). Deploy behind your own Traefik/nginx/Cloudflare
+    Tunnel when you need authentication or TLS. The Origin allowlist
+    and HMAC session cookie still gate the chat-proxy WebSocket path.
+    (`src/hal0/api/routes/agents/_auth.py`)
 13. **Proxmox host-pressure segment (LXC deployments)** — drop a
     read-only `PVEAuditor` API token into Settings → "Proxmox
     integration" and the dashboard's unified-memory bar shows the
@@ -238,8 +242,8 @@ The five always-present slots (`BUILTIN_SLOTS` in
 
 | Slot | What it does | Default backend |
 |---|---|---|
-| `primary` | Chat / general LLM (`/v1/chat/completions`, `/v1/completions`) | llama.cpp (Vulkan) |
-| `embed`   | Embeddings (`/v1/embeddings`) and rerank (`/v1/rerankings`) | llama.cpp (Vulkan) |
+| `primary` | Chat / general LLM (`/v1/chat/completions`, `/v1/completions`) | llama-server (Vulkan) |
+| `embed`   | Embeddings (`/v1/embeddings`) and rerank (`/v1/rerankings`) | llama-server (Vulkan) |
 | `stt`     | Speech-to-text (`/v1/audio/transcriptions`) | Moonshine |
 | `tts`     | Text-to-speech (`/v1/audio/speech`) | Kokoro |
 | `img`     | Image generation (`/v1/images/generations`) | ComfyUI (ROCm) |
@@ -316,16 +320,22 @@ From `hal0/PLAN.md` §1 + `src/hal0/providers/`:
 
 | Provider | Hardware | What it serves |
 |---|---|---|
-| **llama.cpp** | Vulkan (default) / ROCm (opt-in) | chat, embed, rerank, vision |
-| **FLM** | AMD XDNA NPU (opt-in) | chat / embed / ASR multiplex (chat + embed surfaced to picker today; STT slice deferred) |
-| **Moonshine** | CPU / Vulkan | STT (`/v1/audio/transcriptions`) |
+| **llama.cpp** (llama-server) | Vulkan (default) / ROCm (opt-in) | chat, embed, rerank, vision |
+| **FLM** | AMD XDNA NPU (opt-in) | chat / embed / ASR multiplex — one FastFlowLM container serves all three |
+| **Moonshine** | CPU only | STT (`/v1/audio/transcriptions`) |
 | **Kokoro** | CPU / Vulkan | TTS (`/v1/audio/speech`) |
-| **ComfyUI** | ROCm (Strix Halo iGPU class) | Image gen (`/v1/images/generations`) |
+| **ComfyUI** | ROCm (Strix Halo iGPU class) | Image gen (`/v1/images/generations`) — exclusive GPU arbiter swaps iGPU between inference and generation |
 
-All five are first-class in v1. Each provider is a class with
+All five are first-class in v1. Every slot runs as a **podman container**
+under a `hal0-slot@<name>.service` unit; container images and tuned flags
+come from slot profiles. Each provider is a class with
 `build_env() / start_cmd() / health() / infer()` — stateless, swappable
 (`ARCHITECTURE.md` §Key boundaries).
 
+**Provider name in TOML**: use `llama-server` (not `llama.cpp`) in slot
+TOML files — `_VALID_PROVIDERS` = `{llama-server, flm, moonshine, kokoro}`
+(`src/hal0/config/schema.py:89`).
+
 ### FLM NPU (AMD XDNA) deep-dive
 
 FLM is live as the NPU provider — opt-in, surfaced in the picker only
@@ -652,7 +662,7 @@ only `models_dir` raises (see comments at
   (`src/hal0/api/routes/models.py:148`,
   `src/hal0/registry/detect.py:140`)
 - Per-slot live metrics endpoint — `GET /api/slots/metrics` reads
-  docker container cgroup memory + `ActiveEnterTimestampMonotonic` +
+  podman container cgroup memory + `ActiveEnterTimestampMonotonic` +
   scraped llama-server `/metrics`; falls back to the systemd unit's
   own `MemoryCurrent` for native-host slots
   (`src/hal0/api/routes/slots.py:376–467`)
@@ -927,10 +937,11 @@ a stale placeholder.
   versioned dir), `/etc/hal0/` (config, preserved across updates),
   `/var/lib/hal0/` (models, registry, OpenWebUI state). `HAL0_HOME`
   overrides all of the above for dev installs. (PLAN §2)
-- **systemd template units** — slots are `hal0-slot@<name>.service`
-  instances (`packaging/systemd/hal0-slot@.service`), not per-slot
-  hand-written units. One template, N instances, all rendered from
-  config.
+- **systemd template units + podman containers** — slots are
+  `hal0-slot@<name>.service` instances that each launch a **podman**
+  container, rather than per-slot hand-written units or Docker Compose.
+  One template, N instances, all rendered from config. Container images
+  and flags come from slot profiles managed by hal0-api.
 - **Atomic TOML config** — every config write is
   `NamedTemporaryFile(delete=False) + os.replace()`; a failed write
   leaves the prior file intact (PLAN §5 Tier 1).
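
The atomic TOML write named in the last hunk above (`NamedTemporaryFile(delete=False) + os.replace()`) can be sketched as follows. This is a minimal illustration, not hal0's actual code: the function name `atomic_write_text`, the temp-file naming, and the `fsync` call are assumptions added here.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(path: Path, text: str) -> None:
    """Replace `path` with `text` so readers see the old or the new file, never a partial one."""
    # The temp file must live in the target directory: os.replace() is only
    # atomic when source and destination are on the same filesystem.
    tmp = tempfile.NamedTemporaryFile(
        "w",
        dir=path.parent,
        prefix=path.name + ".",
        suffix=".tmp",
        delete=False,
        encoding="utf-8",
    )
    try:
        with tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())  # assumption: durability matters, not just atomicity
        os.replace(tmp.name, path)
    except BaseException:
        # A failed write leaves the prior file intact; only the temp file is dropped.
        try:
            os.unlink(tmp.name)
        except FileNotFoundError:
            pass
        raise
```

Calling `atomic_write_text(Path("slot.toml"), new_text)` either swaps in the full new contents or raises with the previous file unchanged.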

diff --git a/docs/SITE-FIXES.md b/docs/SITE-FIXES.md
@@ -18,8 +18,8 @@
 - [ ] **Dashboard framework wrong** — "Vue 3" → **React 18 + TypeScript + Vite** · `src/pages/index.astro:64`
 - [ ] **Version eyebrow 1–2 minors stale** → `0.3.2-alpha.1` (sweep ALL version strings; ideally template from release manifest) · `src/pages/index.astro:13`, `src/content/docs/docs/index.mdx:16`, `src/content/docs/docs/operate/updates.mdx:13`
 - [ ] **Caddy / `--auth=basic` / auto-HTTPS = broken install** — entire auth/TLS subsystem retired (ADR-0012; API binds `0.0.0.0:8080` open). Delete/rewrite to the honest `no-auth-default` story · `src/pages/index.astro:361`, `src/content/docs/docs/operate/auth.mdx:228,288`
-- [ ] **Per-slot toolbox containers / `hal0-slot@.service` / 6 published images** — superseded by the single `lemond` runtime; slots are logical name→model mappings. Rewrite slot + provider-matrix docs · `getting-started/install.mdx:154`, `reference/provider-matrix.mdx:31`, `slots/what-is-a-slot.mdx:53`, `custom-slots.mdx:65`
-- [ ] **Provider matrix = pre-Lemonade set** (Moonshine/ComfyUI/Kokoro first-class) — rewrite around Lemonade-unified runtime · `reference/provider-matrix.mdx:12,39,62,76,90,99,110`, `api/audio.mdx:15`
+- [ ] **Per-slot container runtime story** — slots run as **podman** containers under `hal0-slot@<name>.service` units (not Docker, not lemond); container images + flags from profiles; GPU arbiter for iGPU; NPU = single FastFlowLM container. Rewrite slot + provider-matrix + install docs · `getting-started/install.mdx:66,103,138,156`, `reference/provider-matrix.mdx:31`, `slots/what-is-a-slot.mdx:48-52`, `custom-slots.mdx:65`
+- [ ] **Provider matrix uses wrong provider names** — `llama.cpp` → `llama-server` in TOML examples and matrix; `_VALID_PROVIDERS = {llama-server, flm, moonshine, kokoro}` · `reference/provider-matrix.mdx:20`, `hardware/amd-discrete.mdx:69`, `hardware/nvidia.mdx:160`, `reference/config-schema.mdx:50`
 - [ ] **Hermes-Agent labelled "(roadmap: soon)" but SHIPPED** — promote to shipped; document install + dashboard chat surface · `src/pages/index.astro:346`
 
 ## ⛔ BLOCKED — backend must reconcile code↔docs first (do NOT edit site yet)
@@ -51,7 +51,7 @@ _(Most marketing-worthy first. Each is real + user-facing.)_
 - [ ] **Bundle-tier first-run picker** — Lite/Default/Pro/Max/LMX-Omni, RAM-gated · `api/routes/bundles.py:46-82`
 - [ ] **NPU FLM trio** — agent + stt-npu + embed-npu from one `flm serve` · `CONTEXT.md:115-136`
 - [ ] **Bundled pi-coder agent** (single-pick with Hermes) · `agents/pi_coder.py`
-- [ ] **Lemonade admin panel** — `GET/POST /api/lemonade/config` · `api/routes/lemonade_admin.py:297`
+- [ ] **Slot container admin** — document the container-runtime management story (podman slot containers, GPU arbiter, profile-driven image selection); the previous anchor `api/routes/lemonade_admin.py` is removed in the lemonade-removal epic
 - [ ] **Prometheus metrics endpoint** · `api/routes/health.py:112`
 - [ ] **Merged journal SSE** — `/api/journal` + `/stream` + Lemonade log proxy · `api/routes/journal.py:203`
 - [ ] **HMAC session cookie** for agents chat proxy (HttpOnly, SameSite=Lax, 8h) · `api/agents/_auth.py`
@@ -67,7 +67,7 @@ _New items beyond the drift report. Full detail + verification notes in the back
 
 ### 🔴 HIGH — OSS / broken copy-paste
 - [ ] **B2 — Scrub private IPs/domains from the website repo** · `astro.config.mjs:93` (`allowedHosts ['.thinmint.dev']` → localhost) · `operate/openwebui.mdx:79,81` (`hal0.thinmint.dev` → `hal0.local`) · `operate/auth.mdx:31,331,337` (`10.0.1.230` — goes away with B1)
-- [ ] **B3 — Fix invalid `--provider` / `provider =` examples (they error on validation)** · valid set `{lemonade,llama-server,flm,moonshine,kokoro}` · `hardware/amd-discrete.mdx:69`, `hardware/nvidia.mdx:160`, `reference/config-schema.mdx:50`
+- [ ] **B3 — Fix invalid `--provider` / `provider =` examples (they error on validation)** · valid set `{llama-server,flm,moonshine,kokoro}` (lemonade removed in phase-F epic) · `hardware/amd-discrete.mdx:69`, `hardware/nvidia.mdx:160`, `reference/config-schema.mdx:50`
 
 ### 🟡 MEDIUM
 - [ ] **B6 — Remove `hal0-slot@.service` from install step 5** (template removed PR-9) · `getting-started/install.mdx:137-148`
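
For the B3 sweep, a small scanner can flag `provider =` and `--provider` examples whose value falls outside the valid set quoted above. This is a hypothetical helper, not part of hal0: `find_invalid_providers` and the regex are illustrative, and the set is copied from B3 rather than imported from `src/hal0/config/schema.py`.

```python
import re

# Copied from item B3 above; the authoritative definition is `_VALID_PROVIDERS`.
VALID_PROVIDERS = frozenset({"llama-server", "flm", "moonshine", "kokoro"})

# Matches `provider = "name"` (TOML) and `--provider name` / `--provider=name` (CLI).
_PROVIDER_RE = re.compile(
    r"""(?:\bprovider\s*=\s*["']|--provider[=\s]+)([A-Za-z0-9_.-]+)"""
)


def find_invalid_providers(text: str) -> list[tuple[int, str]]:
    """Return (line number, provider name) for every example outside the valid set."""
    hits: list[tuple[int, str]] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name in _PROVIDER_RE.findall(line):
            if name not in VALID_PROVIDERS:
                hits.append((lineno, name))
    return hits
```

Run over the text of each `.mdx` file, it reports values such as `llama.cpp` or `lemonade` with their line numbers. The pattern only recognises quoted TOML values and unquoted CLI arguments, so other spellings would need a wider regex.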

diff --git a/src/content/docs/docs/getting-started/install.mdx b/src/content/docs/docs/getting-started/install.mdx
@@ -60,10 +60,10 @@ below).
 ## Deployment shapes
 
 hal0 doesn't care which slice of your homelab it runs on, as long as
-the kernel speaks systemd and Docker. Three shapes worth naming:
+the host runs systemd and podman. Three shapes worth naming:
 
 - **Bare-metal Linux.** The simplest case. `hal0-api`, `hal0-openwebui`,
-  and the per-slot toolbox containers all live on the host.
+  and the per-slot podman containers all live on the host.
 - **VM.** Works if you give it enough RAM and pass the GPU through.
   Eats more host memory than an LXC for the same workload.
 - **Privileged LXC on Proxmox (recommended for homelabs).** GPU and
@@ -99,8 +99,9 @@ loads one.
 
 2. **x86_64.** ARM is not currently supported.
 
-3. **Docker reachable.** The slot toolboxes run as containers. The
-   installer checks `docker ps` before doing anything destructive.
+3. **Podman reachable.** Slot inference containers run under podman.
+   The installer checks podman availability before doing anything
+   destructive.
 
 4. **At least 20 GB free under `/var/lib`.** Models, registry, and
    OpenWebUI state land there. Override via `--models-dir=/path` if
@@ -113,7 +114,7 @@ loads one.
 ## What the installer does
 
 <Steps>
-1. **Pre-flight checks.** Verifies systemd, x86\_64, Docker, free
+1. **Pre-flight checks.** Verifies systemd, x86\_64, podman, free
    space, and free ports. Bails before touching anything if a check
    fails. Re-runnable on its own as `hal0 doctor`.
 
@@ -134,10 +135,11 @@ loads one.
    pull a model and flip it on yourself. Existing files are never
    overwritten on re-run.
 
-5. **systemd units.** Drops `hal0-api.service`,
-   `hal0-openwebui.service`, and the `hal0-slot@.service` template into
-   `/etc/systemd/system/`. Reloads the daemon, enables and starts the
-   API plus OpenWebUI.
+5. **systemd units.** Drops `hal0-api.service` and
+   `hal0-openwebui.service` into `/etc/systemd/system/`. The
+   `hal0-slot@.service` template is written separately for per-slot
+   podman containers. Reloads the daemon, enables and starts the API
+   plus OpenWebUI.
 
 6. **`hal0` on PATH.** Symlinks `/usr/local/bin/hal0` →
    `${VENV_DIR}/bin/hal0` so the CLI works without sourcing anything.
@@ -153,7 +155,7 @@ The installer needs `sudo` to drop systemd unit files in
 `/etc/systemd/system/`, create the `/usr/lib/hal0/` tree, and
 optionally write the `/usr/local/bin/hal0` symlink. The `hal0-api`
 service itself runs as the unprivileged `hal0` user, as do the
-per-slot toolbox containers (each declares `User=hal0` in the
+per-slot podman containers (each declares `User=hal0` in the
 template unit).
 </Aside>
 
@@ -209,22 +211,15 @@ then `/var/lib/hal0/models`. The path is persisted as
 `[models].pull_root` and auto-included in `[models].roots` so a fresh
 install scans the existing tree on first boot.
 
-## Authentication & HTTPS (optional)
+## Authentication & HTTPS
 
-The default install has no auth in front. Fine on a trusted home LAN
-behind your Traefik or Caddy. For exposed deployments add
-`--auth=basic`:
-
-```sh frame="terminal"
-sudo bash installer/install.sh --auth=basic
-```
-
-The installer prompts for an admin user and password, installs Caddy,
-generates a TLS cert (self-signed for `.local` hostnames, Let's
-Encrypt for real domains), mints a Bearer token, and round-trips
-`https://${HAL0_HOSTNAME}/api/health` as a self-test before exiting.
-See [Authentication & HTTPS](/docs/operate/auth/) for the full flow
-including client-side cert trust and ACME setup.
+hal0-api binds `0.0.0.0:8080` with no built-in auth or TLS. On a
+trusted home LAN behind your Traefik, this is the correct default.
+For exposed deployments, put a reverse proxy (Traefik, nginx,
+Cloudflare Tunnel) in front and handle auth and TLS there.
+See [Authentication & Security](/docs/operate/auth/) for recommended
+patterns including WebSocket passthrough and the MCP transport
+allowlists.
 
 ## From a clone
 
@@ -258,7 +253,7 @@ hal0 doctor
 ```
 
 Re-runs the pre-flight pack against the live host. Handy after a
-kernel upgrade, a Docker daemon swap, or whenever something feels off.
+kernel upgrade, a podman update, or whenever something feels off.
 
 ## Next steps