From bafcec19cf2aaa64be068b93d5d2348f18283f48 Mon Sep 17 00:00:00 2001
From: "Alexander (via Claude)" <alexander@awideweb.com>
Date: Fri, 12 Jun 2026 02:30:46 -0400
Subject: [PATCH] =?UTF-8?q?content:=20lemonade=20=E2=86=92=20container-run?=
 =?UTF-8?q?time=20sweep=20(hal0=20Phase=20F)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

- CONTENT_BRIEF: podman per-slot containers, llama-server toolbox
  images, GPU arbiter + FLM trio, honest no-auth posture, React 18
- install/provider-matrix/what-is-a-slot/index: podman container
  claims and llama-server naming; Caddy/basic-auth copy replaced with
  reverse-proxy-at-edge guidance
- SITE-FIXES tracker items updated to the container story

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
---
 CONTENT_BRIEF.md                              | 85 +++++++++++--------
 docs/SITE-FIXES.md                            |  8 +-
 .../docs/docs/getting-started/install.mdx     | 47 +++++-----
 .../docs/docs/reference/provider-matrix.mdx   | 58 +++++++------
 .../docs/docs/slots/what-is-a-slot.mdx        | 33 +++----
 src/pages/index.astro                         | 10 +--
 6 files changed, 126 insertions(+), 115 deletions(-)
diff --git a/CONTENT_BRIEF.md b/CONTENT_BRIEF.md
index 27177fb..b1153a2 100644
--- a/CONTENT_BRIEF.md
+++ b/CONTENT_BRIEF.md
@@ -12,12 +12,12 @@
 
 hal0 is a homelab AI inference platform: the Linux box you already
 have in the rack, running real OpenAI-compatible inference. It manages
-model **slots** as systemd units with a typed lifecycle state machine,
-exposes an **OpenAI-compatible `/v1/*` API**, and ships with a Vue
-**dashboard** plus a **prewired OpenWebUI** chat tab. One command
-installs on any modern Linux box — Strix Halo iGPU, AMD discrete,
-NVIDIA, or CPU — and it's happy in a privileged Proxmox LXC with
-GPU/NPU passthrough, behind your Traefik.
+model **slots** as podman containers under per-slot systemd units with
+a typed lifecycle state machine, exposes an **OpenAI-compatible
+`/v1/*` API**, and ships with a React **dashboard** plus a **prewired
+OpenWebUI** chat tab. One command installs on any modern Linux box —
+Strix Halo iGPU, AMD discrete, NVIDIA, or CPU — and it's happy in a
+privileged Proxmox LXC with GPU/NPU passthrough, behind your Traefik.
 
 (Synthesised from `hal0/README.md` lines 1–24 and `hal0/PLAN.md` §1.)
 
@@ -62,15 +62,15 @@ Overrides (env vars, from `installer/install.sh`):
 `HAL0_NO_PROBE`, per-backend `HAL0_TOOLBOX_IMAGE_*`.
 
 Status caveat: the installer is real and produces a running
-`hal0-api`. As of v0.1.1 (2026-05-22) **all six toolbox images** —
+`hal0-api`. All six toolbox images —
 `vulkan`, `rocm`, `flm`, `moonshine`, `kokoro`, `comfyui` — are
 published to `ghcr.io/hal0ai/` and pinned by sha256 digest in
-`hal0/manifest.json` (`toolbox_images.*.digest`). FLM chat + embed
-are surfaced in the picker when XDNA hardware and the local toolbox
-image are both present; STT slice deferred. v0.1.1 (2026-05-22) is
-the first install that completes end-to-end on non-Strix-Halo hosts:
-WSL 2 (with systemd), Proxmox VMs, and bare-metal Linux with discrete
-GPUs all probe + wizard cleanly. APIs may shift before v1.0.
+`hal0/manifest.json` (`toolbox_images.*.digest`). Slots run as
+**podman containers** under per-slot `hal0-slot@<name>.service`
+systemd units, orchestrated by hal0-api; container images and tuned
+flags come from slot profiles. FLM chat + embed are surfaced in the
+picker when XDNA hardware and the local toolbox image are both present.
+APIs may shift before v1.0.
 
 ### Installer overhaul (2026-05-15)
 
@@ -177,9 +177,13 @@ and curated models.
    warming → ready → serving ↔ idle → unloading; error sideband).
    Persisted to `state.json`, streamed over SSE. (PLAN §5 Tier 3,
    `src/hal0/slots/state.py`)
-3. **systemd-managed containers** — each slot is an instance of the
-   `hal0-slot@.service` template unit. The API process never holds a
-   model in its own memory. (`ARCHITECTURE.md` §Process model)
+3. **systemd-managed podman containers** — each slot runs as a podman
+   container under a `hal0-slot@<name>.service` unit; container images
+   and tuned flags come from slot profiles. The API process never holds
+   a model in its own memory. An exclusive GPU arbiter swaps the iGPU
+   between LLM serving and ComfyUI image generation. The NPU runs a
+   single FastFlowLM container serving chat + ASR + embeddings.
+   (`ARCHITECTURE.md` §Process model)
 4. **Hardware-aware probe** — detects GPU / NPU / unified memory (UMA
    pool on Strix Halo), writes `/etc/hal0/hardware.json`, surfaces
    VRAM/RAM fit warnings inline in the slot form. (PLAN §6,
@@ -191,9 +195,9 @@ and curated models.
 6. **Bundled prewired OpenWebUI** — chat UI on `:3001`, zero config:
    the installer writes `openwebui.env` pointing at the local hal0 API.
    (PLAN §8)
-7. **Vue 3 + Tailwind 4 dashboard** — 9 views: Dashboard, Slots,
-   Models, Hardware, Logs, Settings, Providers, FirstRun, plus error
-   shell. Dark mode default; SSE for status + log tail. (PLAN §6)
+7. **React 18 + TypeScript + Vite dashboard** — 9 views: Dashboard,
+   Slots, Models, Hardware, Logs, Settings, Providers, FirstRun, plus
+   error shell. Dark mode default; SSE for status + log tail. (PLAN §6)
 8. **Atomic self-update with rollback** — `hal0 update --channel
    stable|nightly`; cosign-verified tarballs swap a
    `/usr/lib/hal0/current` symlink; `--rollback` reverts. (PLAN §9,
@@ -211,12 +215,12 @@ and curated models.
     The dashboard operates ComfyUI as a containerized generation
     engine with a gated inference ⇄ generation iGPU switchover
     (hal0 PRs #686/#690, 2026-06-11 — see "Image generation" below).
-12. **Optional Caddy reverse proxy with basic_auth + Bearer token POC**
-    — `install.sh --auth=basic` provisions Caddy, writes a hashed
-    `basic_auth`, mints a Bearer token, and round-trips a self-test
-    against `https://${HAL0_HOSTNAME}/api/health` before exiting
-    (commits `ba79427`, `f62902c`; install.sh lines 294+ and 645+).
-    Trusted-LAN posture remains the default (`--auth=off`).
+12. **Trusted-LAN default — no built-in auth** — hal0-api binds
+    `0.0.0.0:8080` with no network gate (ADR-0012; Caddy / `--auth=basic`
+    removed in v0.3). Deploy behind your own Traefik/nginx/Cloudflare
+    Tunnel when you need authentication or TLS. The Origin allowlist
+    and HMAC session cookie still gate the chat-proxy WebSocket path.
+    (`src/hal0/api/routes/agents/_auth.py`)
 13. **Proxmox host-pressure segment (LXC deployments)** — drop a
     read-only `PVEAuditor` API token into Settings → "Proxmox
     integration" and the dashboard's unified-memory bar shows the
@@ -238,8 +242,8 @@ The five always-present slots (`BUILTIN_SLOTS` in
 
 | Slot | What it does | Default backend |
 |---|---|---|
-| `primary` | Chat / general LLM (`/v1/chat/completions`, `/v1/completions`) | llama.cpp (Vulkan) |
-| `embed`   | Embeddings (`/v1/embeddings`) and rerank (`/v1/rerankings`) | llama.cpp (Vulkan) |
+| `primary` | Chat / general LLM (`/v1/chat/completions`, `/v1/completions`) | llama-server (Vulkan) |
+| `embed`   | Embeddings (`/v1/embeddings`) and rerank (`/v1/rerankings`) | llama-server (Vulkan) |
 | `stt`     | Speech-to-text (`/v1/audio/transcriptions`) | Moonshine |
 | `tts`     | Text-to-speech (`/v1/audio/speech`) | Kokoro |
 | `img`     | Image generation (`/v1/images/generations`) | ComfyUI (ROCm) |
@@ -316,16 +320,22 @@ From `hal0/PLAN.md` §1 + `src/hal0/providers/`:
 
 | Provider | Hardware | What it serves |
 |---|---|---|
-| **llama.cpp** | Vulkan (default) / ROCm (opt-in) | chat, embed, rerank, vision |
-| **FLM** | AMD XDNA NPU (opt-in) | chat / embed / ASR multiplex (chat + embed surfaced to picker today; STT slice deferred) |
-| **Moonshine** | CPU / Vulkan | STT (`/v1/audio/transcriptions`) |
+| **llama.cpp** (llama-server) | Vulkan (default) / ROCm (opt-in) | chat, embed, rerank, vision |
+| **FLM** | AMD XDNA NPU (opt-in) | chat / embed / ASR multiplex — one FastFlowLM container serves all three |
+| **Moonshine** | CPU only | STT (`/v1/audio/transcriptions`) |
 | **Kokoro** | CPU / Vulkan | TTS (`/v1/audio/speech`) |
-| **ComfyUI** | ROCm (Strix Halo iGPU class) | Image gen (`/v1/images/generations`) |
+| **ComfyUI** | ROCm (Strix Halo iGPU class) | Image gen (`/v1/images/generations`) — exclusive GPU arbiter swaps iGPU between inference and generation |
 
-All five are first-class in v1. Each provider is a class with
+All five are first-class in v1. Every slot runs as a **podman container**
+under a `hal0-slot@<name>.service` unit; container images + tuned flags
+come from profiles. Each provider is a class with
 `build_env() / start_cmd() / health() / infer()` — stateless, swappable
 (`ARCHITECTURE.md` §Key boundaries).
 
+**Provider name in TOML**: use `llama-server` (not `llama.cpp`) in slot
+TOML files — `_VALID_PROVIDERS` = `{llama-server, flm, moonshine, kokoro}`
+(`src/hal0/config/schema.py:89`).
+
 ### FLM NPU (AMD XDNA) deep-dive
 
 FLM is live as the NPU provider — opt-in, surfaced in the picker only
@@ -652,7 +662,7 @@ only `models_dir` raises (see comments at
   (`src/hal0/api/routes/models.py:148`,
   `src/hal0/registry/detect.py:140`)
 - Per-slot live metrics endpoint — `GET /api/slots/metrics` reads
-  docker container cgroup memory + `ActiveEnterTimestampMonotonic` +
+  podman container cgroup memory + `ActiveEnterTimestampMonotonic` +
   scraped llama-server `/metrics`; falls back to the systemd unit's
   own `MemoryCurrent` for native-host slots
   (`src/hal0/api/routes/slots.py:376–467`)
@@ -927,10 +937,11 @@ a stale placeholder.
   versioned dir), `/etc/hal0/` (config, preserved across updates),
   `/var/lib/hal0/` (models, registry, OpenWebUI state). `HAL0_HOME`
   overrides all of the above for dev installs. (PLAN §2)
-- **systemd template units** — slots are `hal0-slot@<name>.service`
-  instances (`packaging/systemd/hal0-slot@.service`), not per-slot
-  hand-written units. One template, N instances, all rendered from
-  config.
+- **systemd template units + podman containers** — slots are
+  `hal0-slot@<name>.service` instances that each launch a **podman**
+  container; not per-slot hand-written units and not Docker Compose.
+  One template, N instances, all rendered from config. Container images
+  and flags come from slot profiles managed by hal0-api.
 - **Atomic TOML config** — every config write is
   `NamedTemporaryFile(delete=False) + os.replace()`; a failed write
   leaves the prior file intact (PLAN §5 Tier 1).
diff --git a/docs/SITE-FIXES.md b/docs/SITE-FIXES.md
index 1d3675f..84b78a9 100644
--- a/docs/SITE-FIXES.md
+++ b/docs/SITE-FIXES.md
@@ -18,8 +18,8 @@
 - [ ] **Dashboard framework wrong** — "Vue 3" → **React 18 + TypeScript + Vite** · `src/pages/index.astro:64`
 - [ ] **Version eyebrow 1–2 minors stale** → `0.3.2-alpha.1` (sweep ALL version strings; ideally template from release manifest) · `src/pages/index.astro:13`, `src/content/docs/docs/index.mdx:16`, `src/content/docs/docs/operate/updates.mdx:13`
 - [ ] **Caddy / `--auth=basic` / auto-HTTPS = broken install** — entire auth/TLS subsystem retired (ADR-0012; API binds `0.0.0.0:8080` open). Delete/rewrite to the honest `no-auth-default` story · `src/pages/index.astro:361`, `src/content/docs/docs/operate/auth.mdx:228,288`
-- [ ] **Per-slot toolbox containers / `hal0-slot@.service` / 6 published images** — superseded by the single `lemond` runtime; slots are logical name→model mappings. Rewrite slot + provider-matrix docs · `getting-started/install.mdx:154`, `reference/provider-matrix.mdx:31`, `slots/what-is-a-slot.mdx:53`, `custom-slots.mdx:65`
-- [ ] **Provider matrix = pre-Lemonade set** (Moonshine/ComfyUI/Kokoro first-class) — rewrite around Lemonade-unified runtime · `reference/provider-matrix.mdx:12,39,62,76,90,99,110`, `api/audio.mdx:15`
+- [ ] **Per-slot container runtime story** — slots run as **podman** containers under `hal0-slot@<name>.service` units (not Docker, not lemond); container images + flags from profiles; GPU arbiter for iGPU; NPU = single FastFlowLM container. Rewrite slot + provider-matrix + install docs · `getting-started/install.mdx:66,103,138,156`, `reference/provider-matrix.mdx:31`, `slots/what-is-a-slot.mdx:48-52`, `custom-slots.mdx:65`
+- [ ] **Provider matrix uses wrong provider names** — `llama.cpp` → `llama-server` in TOML examples and matrix; `_VALID_PROVIDERS = {llama-server, flm, moonshine, kokoro}` · `reference/provider-matrix.mdx:20`, `hardware/amd-discrete.mdx:69`, `hardware/nvidia.mdx:160`, `reference/config-schema.mdx:50`
 - [ ] **Hermes-Agent labelled "(roadmap: soon)" but SHIPPED** — promote to shipped; document install + dashboard chat surface · `src/pages/index.astro:346`
 
 ## ⛔ BLOCKED — backend must reconcile code↔docs first (do NOT edit site yet)
@@ -51,7 +51,7 @@ _(Most marketing-worthy first. Each is real + user-facing.)_
 - [ ] **Bundle-tier first-run picker** — Lite/Default/Pro/Max/LMX-Omni, RAM-gated · `api/routes/bundles.py:46-82`
 - [ ] **NPU FLM trio** — agent + stt-npu + embed-npu from one `flm serve` · `CONTEXT.md:115-136`
 - [ ] **Bundled pi-coder agent** (single-pick with Hermes) · `agents/pi_coder.py`
-- [ ] **Lemonade admin panel** — `GET/POST /api/lemonade/config` · `api/routes/lemonade_admin.py:297`
+- [ ] **Slot container admin** — document the container-runtime management story (podman slot containers, GPU arbiter, profile-driven image selection); `api/routes/lemonade_admin.py` is removed in the lemonade-removal epic
 - [ ] **Prometheus metrics endpoint** · `api/routes/health.py:112`
 - [ ] **Merged journal SSE** — `/api/journal` + `/stream` + Lemonade log proxy · `api/routes/journal.py:203`
 - [ ] **HMAC session cookie** for agents chat proxy (HttpOnly, SameSite=Lax, 8h) · `api/agents/_auth.py`
@@ -67,7 +67,7 @@ _New items beyond the drift report. Full detail + verification notes in the back
 
 ### 🔴 HIGH — OSS / broken copy-paste
 - [ ] **B2 — Scrub private IPs/domains from the website repo** · `astro.config.mjs:93` (`allowedHosts ['.thinmint.dev']` → localhost) · `operate/openwebui.mdx:79,81` (`hal0.thinmint.dev` → `hal0.local`) · `operate/auth.mdx:31,331,337` (`10.0.1.230` — goes away with B1)
-- [ ] **B3 — Fix invalid `--provider` / `provider =` examples (they error on validation)** · valid set `{lemonade,llama-server,flm,moonshine,kokoro}` · `hardware/amd-discrete.mdx:69`, `hardware/nvidia.mdx:160`, `reference/config-schema.mdx:50`
+- [ ] **B3 — Fix invalid `--provider` / `provider =` examples (they error on validation)** · valid set `{llama-server,flm,moonshine,kokoro}` (lemonade removed in phase-F epic) · `hardware/amd-discrete.mdx:69`, `hardware/nvidia.mdx:160`, `reference/config-schema.mdx:50`
 
 ### 🟡 MEDIUM
 - [ ] **B6 — Remove `hal0-slot@.service` from install step 5** (template removed PR-9) · `getting-started/install.mdx:137-148`
diff --git a/src/content/docs/docs/getting-started/install.mdx b/src/content/docs/docs/getting-started/install.mdx
index ccb6a23..f80bc6a 100644
--- a/src/content/docs/docs/getting-started/install.mdx
+++ b/src/content/docs/docs/getting-started/install.mdx
@@ -60,10 +60,10 @@ below).
 ## Deployment shapes
 
 hal0 doesn't care which slice of your homelab it runs on, as long as
-the kernel speaks systemd and Docker. Three shapes worth naming:
+the host runs systemd and podman. Three shapes worth naming:
 
 - **Bare-metal Linux.** The simplest case. `hal0-api`, `hal0-openwebui`,
-  and the per-slot toolbox containers all live on the host.
+  and the per-slot podman containers all live on the host.
 - **VM.** Works if you give it enough RAM and pass the GPU through.
   Eats more host memory than an LXC for the same workload.
 - **Privileged LXC on Proxmox (recommended for homelabs).** GPU and
@@ -99,8 +99,9 @@ loads one.
 
 2. **x86_64.** ARM is not currently supported.
 
-3. **Docker reachable.** The slot toolboxes run as containers. The
-   installer checks `docker ps` before doing anything destructive.
+3. **Podman reachable.** Slot inference containers run under podman.
+   The installer checks podman availability before doing anything
+   destructive.
 
 4. **At least 20 GB free under `/var/lib`.** Models, registry, and
    OpenWebUI state land there. Override via `--models-dir=/path` if
@@ -113,7 +114,7 @@ loads one.
 ## What the installer does
 
 <Steps>
-1. **Pre-flight checks.** Verifies systemd, x86\_64, Docker, free
+1. **Pre-flight checks.** Verifies systemd, x86\_64, podman, free
    space, and free ports. Bails before touching anything if a check
    fails. Re-runnable on its own as `hal0 doctor`.
 
@@ -134,10 +135,11 @@ loads one.
    pull a model and flip it on yourself. Existing files are never
    overwritten on re-run.
 
-5. **systemd units.** Drops `hal0-api.service`,
-   `hal0-openwebui.service`, and the `hal0-slot@.service` template into
-   `/etc/systemd/system/`. Reloads the daemon, enables and starts the
-   API plus OpenWebUI.
+5. **systemd units.** Drops `hal0-api.service` and
+   `hal0-openwebui.service` into `/etc/systemd/system/`. The
+   `hal0-slot@.service` template is written separately for per-slot
+   podman containers. Reloads the daemon, enables and starts the API
+   plus OpenWebUI.
 
 6. **`hal0` on PATH.** Symlinks `/usr/local/bin/hal0` →
    `${VENV_DIR}/bin/hal0` so the CLI works without sourcing anything.
@@ -153,7 +155,7 @@ The installer needs `sudo` to drop systemd unit files in
 `/etc/systemd/system/`, create the `/usr/lib/hal0/` tree, and
 optionally write the `/usr/local/bin/hal0` symlink. The `hal0-api`
 service itself runs as the unprivileged `hal0` user, as do the
-per-slot toolbox containers (each declares `User=hal0` in the
+per-slot podman containers (each declares `User=hal0` in the
 template unit).
 </Aside>
 
@@ -209,22 +211,15 @@ then `/var/lib/hal0/models`. The path is persisted as
 `[models].pull_root` and auto-included in `[models].roots` so a fresh
 install scans the existing tree on first boot.
 
-## Authentication & HTTPS (optional)
+## Authentication & HTTPS
 
-The default install has no auth in front. Fine on a trusted home LAN
-behind your Traefik or Caddy. For exposed deployments add
-`--auth=basic`:
-
-```sh frame="terminal"
-sudo bash installer/install.sh --auth=basic
-```
-
-The installer prompts for an admin user and password, installs Caddy,
-generates a TLS cert (self-signed for `.local` hostnames, Let's
-Encrypt for real domains), mints a Bearer token, and round-trips
-`https://${HAL0_HOSTNAME}/api/health` as a self-test before exiting.
-See [Authentication & HTTPS](/docs/operate/auth/) for the full flow
-including client-side cert trust and ACME setup.
+hal0-api binds `0.0.0.0:8080` with no built-in auth or TLS. On a
+trusted home LAN behind your Traefik, this is the correct default.
+For exposed deployments, put a reverse proxy (Traefik, nginx,
+Cloudflare Tunnel) in front and handle auth and TLS there.
+See [Authentication & Security](/docs/operate/auth/) for recommended
+patterns including WebSocket passthrough and the MCP transport
+allowlists.
 
 ## From a clone
 
@@ -258,7 +253,7 @@ hal0 doctor
 ```
 
 Re-runs the pre-flight pack against the live host. Handy after a
-kernel upgrade, a Docker daemon swap, or whenever something feels off.
+kernel upgrade, a podman update, or whenever something feels off.
 
 ## Next steps
 
diff --git a/src/content/docs/docs/reference/provider-matrix.mdx b/src/content/docs/docs/reference/provider-matrix.mdx
index 91faea3..b27d554 100644
--- a/src/content/docs/docs/reference/provider-matrix.mdx
+++ b/src/content/docs/docs/reference/provider-matrix.mdx
@@ -1,40 +1,42 @@
 ---
 title: Provider matrix
-description: llama.cpp, FLM, Moonshine, Kokoro, ComfyUI — what each handles, on what hardware.
+description: llama-server, FLM, Moonshine, Kokoro, ComfyUI — what each handles, on what hardware.
 sidebar:
   order: 3
 ---
 
 import { Aside } from '@astrojs/starlight/components';
 
-Five providers ship first-class in v0.1.0-alpha. Each is a class with
-a small contract (`build_env() / start_cmd() / health() / infer()`)
-that makes them stateless and swappable. The slot lifecycle is
-provider-agnostic; what changes between providers is the workload
+Five providers ship first-class. Each runs as a **podman container**
+under a `hal0-slot@<name>.service` unit; container images and tuned
+flags come from slot profiles managed by hal0-api. Each provider is a
+class with a small contract (`build_env() / start_cmd() / health() /
+infer()`) that makes them stateless and swappable. The slot lifecycle
+is provider-agnostic; what changes between providers is the workload
 they serve and the hardware they target.
 
 ## The matrix
 
-| Provider     | Hardware                              | What it serves                                |
-|--------------|---------------------------------------|-----------------------------------------------|
-| **llama.cpp**| Vulkan (default) / ROCm (opt-in)      | chat, embed, rerank, vision                   |
-| **FLM**      | AMD XDNA NPU (opt-in)                 | chat / embed / ASR multiplex                  |
-| **Moonshine**| CPU only                              | STT (`/v1/audio/transcriptions`)              |
-| **Kokoro**   | CPU / Vulkan                          | TTS (`/v1/audio/speech`)                      |
-| **ComfyUI**  | ROCm (Strix Halo iGPU class)          | Image gen (`/v1/images/generations`)          |
+| Provider          | TOML name      | Hardware                         | What it serves                          |
+|-------------------|----------------|----------------------------------|-----------------------------------------|
+| **llama-server**  | `llama-server` | Vulkan (default) / ROCm (opt-in) | chat, embed, rerank, vision             |
+| **FLM**           | `flm`          | AMD XDNA NPU (opt-in)            | chat / embed / ASR multiplex            |
+| **Moonshine**     | `moonshine`    | CPU only                         | STT (`/v1/audio/transcriptions`)        |
+| **Kokoro**        | `kokoro`       | CPU / Vulkan                     | TTS (`/v1/audio/speech`)                |
+| **ComfyUI**       | `comfyui`      | ROCm (Strix Halo iGPU class)     | Image gen (`/v1/images/generations`)    |
 
-All five are first-class as of v0.1.0-alpha.1. FLM is opt-in because
-XDNA NPU support depends on AMD's driver stack being present (kernel
->= 6.11 with the `amdxdna` driver on the host) and a local FLM toolbox
-image; the picker only advertises NPU when both are detected.
+All five are first-class. FLM is opt-in because XDNA NPU support
+depends on AMD's driver stack being present (kernel >= 6.11 with the
+`amdxdna` driver on the host) and a local FLM toolbox image; the
+picker only advertises NPU when both are detected.
 
 All six toolbox images are published to `ghcr.io/hal0ai/` with pinned
 sha256 digests in `manifest.json`: `hal0-toolbox-vulkan`,
 `hal0-toolbox-rocm`, `hal0-toolbox-flm`, `hal0-toolbox-moonshine`,
 `hal0-toolbox-kokoro`, `hal0-toolbox-comfyui`. (Six images, five
-providers — llama.cpp ships both a Vulkan and a ROCm toolbox.)
+providers — llama-server ships both a Vulkan and a ROCm toolbox.)
 
-## llama.cpp
+## llama-server
 
 The default for `primary` and `embed`. Handles:
 
@@ -55,7 +57,7 @@ Backend modes:
   where Vulkan leaves performance on the table.
 
 The CUDA path on NVIDIA uses CUDA-backed llama.cpp through the same
-provider.
+provider. Use `provider = "llama-server"` in slot TOML for Vulkan, ROCm, and CUDA alike.
 
 ## FLM
 
@@ -110,12 +112,12 @@ alongside a primary chat model.
 
 Every provider implements:
 
-| Method        | What it does                                                |
-|---------------|-------------------------------------------------------------|
-| `build_env()` | Compute the env file the systemd unit will consume.         |
-| `start_cmd()` | The argv to run inside the toolbox image.                   |
-| `health()`    | Cheap probe to decide `warming → ready`.                    |
-| `infer()`     | The request path the dispatcher proxies to.                 |
+| Method        | What it does                                                    |
+|---------------|-----------------------------------------------------------------|
+| `build_env()` | Compute the env file the systemd unit and container will use.   |
+| `start_cmd()` | The argv passed to podman to launch the container.              |
+| `health()`    | Cheap probe to decide `warming → ready`.                        |
+| `infer()`     | The request path the dispatcher proxies to.                     |
 
 The slot lifecycle (`offline → pulling → starting → warming → ready
 → serving ↔ idle → unloading`) is identical across providers. Adding
@@ -124,7 +126,7 @@ changes required.
 
 <Aside type="note">
   The architecture allows third-party providers (MCP shims, custom
-  shims for proprietary endpoints). v0.1.0-alpha.1 ships the five
-  providers above (llama.cpp, FLM, Moonshine, Kokoro, ComfyUI); v0.2
-  expands.
+  shims for proprietary endpoints). The five providers above
+  (llama-server, FLM, Moonshine, Kokoro, ComfyUI) are first-class;
+  all run as podman containers under per-slot systemd units.
 </Aside>
diff --git a/src/content/docs/docs/slots/what-is-a-slot.mdx b/src/content/docs/docs/slots/what-is-a-slot.mdx
index 33da6d5..7f461cc 100644
--- a/src/content/docs/docs/slots/what-is-a-slot.mdx
+++ b/src/content/docs/docs/slots/what-is-a-slot.mdx
@@ -15,9 +15,11 @@ slot that owns the model, and the slot answers.
 
 Concretely, each slot is a real systemd unit (e.g.
 `hal0-slot@primary.service`, an instance of the `hal0-slot@.service`
-template) running inside your LXC. `systemctl status hal0-slot@primary`
-works the way you expect it to. The slot shares the LXC's unified
-memory pool with any other Proxmox tenants on the same node.
+template) that launches a **podman container** on the host.
+`systemctl status hal0-slot@primary` works the way you expect it to.
+Container images and tuned flags come from slot profiles managed by
+hal0-api. The slot shares the LXC's unified memory pool with any
+other Proxmox tenants on the same node.
 
 ## Why slots exist
 
@@ -45,11 +47,12 @@ Each slot has:
 - A **name** (`primary`, `embed`, `stt`, `tts`, `img`, or a user-defined name).
 - A **model assignment** (a registry ref like
   `qwen2.5-0.5b-instruct-q4_k_m`).
-- A **provider** (`llama.cpp`, `flm`, `moonshine`, `kokoro`, `comfyui`)
-  that knows how to build the env, start the process, and run a health
-  probe.
+- A **provider** (`llama-server`, `flm`, `moonshine`, `kokoro`, or
+  `comfyui`) that knows how to build the env, start the container, and
+  run a health probe.
 - A **systemd unit**, an instance of the `hal0-slot@.service`
-  template (e.g. `hal0-slot@primary.service`).
+  template (e.g. `hal0-slot@primary.service`), which launches a
+  podman container.
 - A **port** in the range `8081`–`8099`, bound to `127.0.0.1` only.
 - A **state file** at `/var/lib/hal0/slots/<name>/state.json`,
   updated atomically on every transition and streamed to clients
@@ -103,14 +106,14 @@ to that slot's local port.
 
 ## What a slot is not
 
-- **Not a container manager.** Slots use plain systemd template
-  units, not Docker Compose or Kubernetes. Containerised backends
-  (toolbox images for FLM, ROCm, ComfyUI, etc.) are an implementation
-  detail of each provider.
-- **Not a model cache.** Weights live under `/mnt/ai-models/local`
-  with the index at `/var/lib/hal0/registry/registry.toml` (see the
-  [model registry](/docs/slots/model-registry/)); slots only reference
-  registry entries.
+- **Not a container orchestrator.** Slots use `hal0-slot@.service`
+  systemd template units that each launch one podman container — not
+  Docker Compose or Kubernetes. Container images and flags are an
+  implementation detail supplied by slot profiles.
+- **Not a model cache.** Weights live under `/var/lib/hal0/models/`
+  (default) with the index at `/var/lib/hal0/registry/registry.toml`
+  (see the [model registry](/docs/slots/model-registry/)); slots only
+  reference registry entries.
 - **Not multi-tenant inside hal0.** Slot names are global to the
   install. There's no per-user partitioning in the v0.1 alpha line;
   agent / multi-tenant work is on the v0.2 roadmap. Multi-tenancy
diff --git a/src/pages/index.astro b/src/pages/index.astro
index 0764579..5d254f5 100644
--- a/src/pages/index.astro
+++ b/src/pages/index.astro
@@ -61,14 +61,14 @@ const features = [
   },
   {
     tag: 'dashboard',
-    title: 'Vue 3 + Tailwind 4 admin UI',
+    title: 'React 18 + Tailwind admin UI',
     body: 'Dark-by-default operator console. SSE-backed status and log tail. Capability cards group flat slots into embed / voice / image picks for one-click ops.',
   },
 ];
 
 const providers = [
   {
-    prov: 'llama.cpp',
+    prov: 'llama-server',
     ver: 'b9279',
     hw: 'Vulkan (default) · ROCm · CUDA',
     uses: 'chat · embed · rerank · vision',
@@ -276,9 +276,9 @@ const themes: RoadmapTheme[] = [
     blurb: 'The /v1/* surface and the engines behind it.',
     shipped: [
       { t: 'OpenAI-compatible /v1/* API', d: 'Chat, completions, embeddings, rerank, transcriptions, speech, images. Every OpenAI SDK works unchanged against the local box.' },
-      { t: 'Five-provider stack', d: 'llama.cpp (Vulkan / ROCm / CUDA) for chat and embed, FLM for the XDNA NPU, Moonshine for STT, Kokoro for TTS, ComfyUI for image generation.' },
+      { t: 'Five-provider stack', d: 'llama-server (Vulkan / ROCm / CUDA) for chat and embed, FLM for the XDNA NPU, Moonshine for STT, Kokoro for TTS, ComfyUI for image generation. Each runs as a podman container under per-slot systemd units.' },
       { t: 'Image generation', d: 'POST /v1/images/generations served by a ComfyUI provider on ROCm. Curated SDXL Turbo / SD 1.5 / Flux Schnell with license badges.' },
-      { t: 'FLM NPU provider live', d: 'Self-contained ghcr.io/hal0ai/hal0-toolbox-flm:v1, pinned by sha256. Chat + embed surfaced in the picker only when XDNA hardware is present.' },
+      { t: 'FLM NPU provider live', d: 'Self-contained ghcr.io/hal0ai/hal0-toolbox-flm:v1, pinned by sha256. Single FastFlowLM container serves chat + ASR + embeddings on the NPU; surfaced in the picker only when XDNA hardware is present.' },
     ],
     soon: [],
     later: [
@@ -294,7 +294,7 @@ const themes: RoadmapTheme[] = [
       { t: 'Slot lifecycle state machine', d: 'Atomic transitions (offline → pulling → starting → warming → ready → serving), persisted and SSE-streamed.' },
       { t: 'Capability slots overlay', d: 'Embed / Voice / Img capability cards plus an NPU backend rollup, backed by /etc/hal0/capabilities.toml.' },
       { t: 'embed-rerank built-in slot', d: 'Auto-created on first enable: bge-reranker-v2-m3-q4_k_m on :8086 with --reranking. /v1/rerankings stays separate from chat.' },
-      { t: 'Per-slot live metrics', d: 'GET /api/slots/metrics reads docker cgroup memory + ActiveEnterTimestampMonotonic + scraped /metrics.' },
+      { t: 'Per-slot live metrics', d: 'GET /api/slots/metrics reads podman container cgroup memory + ActiveEnterTimestampMonotonic + scraped /metrics.' },
       { t: 'Orchestrator drift reconcile', d: 'apply() reconciles capabilities.toml ↔ slots/*.toml on every call, not only on selection diff.' },
     ],
     soon: [