Hal0ai · thinmintdev · Jun 17, 2026
diff --git a/docs/guides/configure.mdx b/docs/guides/configure.mdx
@@ -23,7 +23,7 @@ with it.
 | File | What it holds |
 |------|---------------|
 | `hal0.toml` | Top-level config: `[meta] schema_version`, `[slots] port_range_start/end`, `[models] store`, telemetry channel. |
-| `api.env` | Environment for the hal0-api service (systemd `EnvironmentFile`) — feature flags and env knobs like `HAL0_MODEL_STORE`, `HAL0_COMFYUI_SWITCHOVER_ENABLED`, `HAL0_MCP_ALLOWED_HOSTS`. |
+| `api.env` | Environment for the hal0-api service (systemd `EnvironmentFile`) — feature flags and env knobs like `HAL0_MODEL_STORE`, `HAL0_MCP_ALLOWED_HOSTS`. |
 | `upstreams.toml` | External / upstream provider routing. |
 | `providers.toml` | Provider credential references (key **names**, never plaintext secrets). |
 | `profiles.toml` | Optional profile catalogue. Absent → built-in seed profiles are used. |

diff --git a/docs/guides/generate-images.mdx b/docs/guides/generate-images.mdx
@@ -21,9 +21,9 @@ reloaded.
 
 <Aside type="caution">
 Switching the iGPU to generation **stops the LLM stack** (chat bots and
-memory extraction go dark) for as long as generation holds the GPU. The
-switchover write path is gated off by default — see
-[Enable the switchover](#enable-the-switchover).
+memory extraction go dark) for as long as generation holds the GPU. hal0
+switches modes automatically when a render needs the iGPU, and
+idle-restore hands the GPU back to inference afterwards.
 </Aside>
 
 ## Read the generation-engine status
@@ -86,8 +86,7 @@ Body fields:
 Responses: **202** (`switching`, runs in the background — poll the
 `switchover` block on `/status`), **200** (`noop`, already in the target
 mode), **409** (switch already in flight, or a busy queue without
-`force`), **501** (switchover disabled on this host), **503** (arbiter not
-wired).
+`force`), **503** (arbiter not wired).
 
 Pin or unpin image mode independently:
 
@@ -97,20 +96,6 @@ curl -X POST http://localhost:8080/api/comfyui/pin \
   -d '{"pinned": true}'
 ```
 
-### Enable the switchover
-
-Both write paths (`/switchover` and `/pin`) stay gated behind an
-environment flag because the switch takes the LLM stack offline — an
-operator decision per host. Set it on hal0-api:
-
-```sh
-HAL0_COMFYUI_SWITCHOVER_ENABLED=1
-```
-
-Without it, both endpoints answer **501** (`comfyui.switchover_disabled`).
-See [Configure](/docs/guides/configure) for where to set environment
-keys (`api.env`).
-
 ## Generate an image
 
 The OpenAI-compatible endpoint translates your request into a ComfyUI
@@ -170,5 +155,5 @@ caps a single render at 10 minutes before it gives up.
 ## See also
 
 - [Manage slots](/docs/guides/manage-slots) — the `img` slot lifecycle.
-- [Configure](/docs/guides/configure) — set `HAL0_COMFYUI_SWITCHOVER_ENABLED` and the `[image]` slot defaults.
+- [Configure](/docs/guides/configure) — set the `[image]` slot defaults.
 - [Observe the system](/docs/guides/logs-and-activity) — tail `hal0-slot@img`.
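With the 501 branch removed, a client of `POST /api/comfyui/switchover` only has to distinguish the four documented responses. The following is a minimal sketch of that handling, not code from this PR: the function names `interpret_switchover` and `request_generation_mode` and the outcome labels are hypothetical, the base URL is taken from the guide's `curl` example, and the `{"mode": "generation"}` body mirrors what the custom node sends.

```python
import json
import urllib.error
import urllib.request

# Outcome labels are illustrative names chosen for this sketch; they are not
# values returned by hal0-api.
_OUTCOMES = {
    200: "noop",         # already in the target mode
    202: "switching",    # accepted; poll the `switchover` block on /status
    409: "conflict",     # switch already in flight, or busy queue without `force`
    503: "unavailable",  # arbiter not wired
}


def interpret_switchover(status_code: int) -> str:
    """Map a documented switchover status code to a short label."""
    return _OUTCOMES.get(status_code, "unexpected")


def request_generation_mode(
    base_url: str = "http://localhost:8080", timeout: float = 2.0
) -> str:
    """POST a switch to generation mode and label the result (hypothetical helper)."""
    body = json.dumps({"mode": "generation"}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/api/comfyui/switchover",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return interpret_switchover(resp.status)
    except urllib.error.HTTPError as exc:
        # urllib raises for 4xx/5xx; the code still carries the meaning.
        return interpret_switchover(exc.code)
    except (urllib.error.URLError, OSError):
        return "unreachable"
```

Under this mapping a stray 501 from an older host falls through to `"unexpected"`, which makes a version mismatch visible instead of silently treating it as a disabled feature.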
diff --git a/installer/comfyui/custom_nodes/hal0_gpu_gate.py b/installer/comfyui/custom_nodes/hal0_gpu_gate.py
@@ -1,28 +1,29 @@
-"""hal0 GPU gate — ComfyUI custom node that 403-blocks job submission while
-the Strix Halo iGPU is serving the LLM stack.
+"""hal0 GPU prepare hook — asks hal0 to enter image mode before job submit.
 
 The ComfyUI container is RESIDENT on hal0 (its web UI stays up in both GPU
-modes so users can build workflows/prompts any time), but GPU memory is
-exclusive: running a generation while the LLM slots hold GTT would OOM or
-evict them. hal0-api's GpuArbiter guards its own dispatch path, and this
-middleware closes the remaining door — the web UI's own "Queue Prompt"
-(``POST /prompt`` / ``POST /api/prompt``), which goes straight to ComfyUI.
+modes so users can build workflows/prompts any time), and queueing a prompt is
+now allowed from either mode. When the web UI's own "Queue Prompt"
+(``POST /prompt`` / ``POST /api/prompt``) arrives while hal0 still reports
+inference mode, this middleware asks hal0-api's GpuArbiter to switch to image
+mode and waits briefly for the switch to complete before allowing ComfyUI to
+continue.
 
 Deployment: this single file is dropped into the host-mounted
 ``custom_nodes`` dir (``/mnt/ai-models/comfyui/custom_nodes/``) — no image
 rebuild. ComfyUI imports it at startup; ``_install()`` appends an aiohttp
 middleware to the PromptServer app (the same mechanism ComfyUI-Login uses).
 The container runs with host networking, so hal0-api is on loopback.
 
-Fail-open by design: if hal0-api is unreachable the gate allows the prompt —
-a broken hal0 must never brick standalone ComfyUI use. The pure decision
-logic (``should_block``) is unit-tested in the hal0 repo
+Fail-open by design: if hal0-api is unreachable or the switch request fails,
+the hook allows the prompt — a broken hal0 must never brick standalone ComfyUI
+use. The pure decision logic (``should_prepare_image_mode``) is unit-tested in the hal0 repo
 (tests/comfyui/test_hal0_gpu_gate.py); the aiohttp wiring is exercised by
 the CT105 live verification.
 """
 
 import json
 import os
+import time
 import urllib.error
 import urllib.request
 
@@ -31,41 +32,52 @@
 HAL0_STATUS_URL = os.environ.get(
     "HAL0_COMFYUI_STATUS_URL", "http://127.0.0.1:8080/api/comfyui/status"
 )
+HAL0_SWITCHOVER_URL = os.environ.get(
+    "HAL0_COMFYUI_SWITCHOVER_URL", "http://127.0.0.1:8080/api/comfyui/switchover"
+)
 _STATUS_TIMEOUT_S = 2.0
 
+
+def _float_env(name: str, default: float) -> float:
+    try:
+        return float(os.environ.get(name, str(default)))
+    except ValueError:
+        return default
+
+
+_SWITCH_TIMEOUT_S = _float_env("HAL0_COMFYUI_SWITCH_TIMEOUT_S", 120.0)
+_SWITCH_POLL_S = _float_env("HAL0_COMFYUI_SWITCH_POLL_S", 0.5)
+
 #: Job-submission routes — and ONLY those. Everything the editor needs
 #: (/object_info, /queue GET, workflow save/load, uploads) always passes.
 _BLOCK_PATHS = frozenset({"/prompt", "/api/prompt"})
 
-#: Mirrors ComfyUI's own /prompt error envelope so the frontend renders the
-#: message instead of a generic failure toast.
-GATE_BODY = {
-    "error": {
-        "type": "hal0_gpu_gate",
-        "message": (
-            "The GPU is in inference mode (LLM slots loaded). Flip the "
-            "Image Gen switch in the hal0 dashboard, then queue again."
-        ),
-        "details": "hal0 GpuArbiter mode is 'inference'; generation is gated.",
-        "extra_info": {},
-    },
-    "node_errors": {},
-}
+def _is_prompt_submit(method: str, path: str) -> bool:
+    return method == "POST" and path in _BLOCK_PATHS
 
 
-def should_block(method: str, path: str, status: "dict | None") -> bool:
-    """True iff this request is a job submission while the GPU serves LLMs.
+def should_prepare_image_mode(method: str, path: str, status: "dict | None") -> bool:
+    """True iff this prompt submit should ask hal0 to enter image mode.
 
     ``status`` is hal0-api's /api/comfyui/status JSON (or None when
     unreachable / unparseable → fail-open).
     """
-    if method != "POST" or path not in _BLOCK_PATHS:
+    if not _is_prompt_submit(method, path):
         return False
     if not isinstance(status, dict):
         return False
     return status.get("mode") == "inference"
 
 
+def should_block(method: str, path: str, status: "dict | None") -> bool:
+    """Backward-compatible name for older tests/imports.
+
+    Prompt submission is no longer blocked by this custom node; it prepares
+    image mode best-effort, then lets ComfyUI handle the prompt.
+    """
+    return False
+
+
 def _fetch_status() -> "dict | None":
     """Blocking status fetch via urllib (run in a thread by the middleware).
 
@@ -80,8 +92,55 @@ def _fetch_status() -> "dict | None":
         return None
 
 
+def _post_switchover() -> bool:
+    """Ask hal0-api to enter generation mode; True means accepted/noop.
+
+    stdlib-only on purpose: custom nodes can't assume extra deps in the
+    ComfyUI venv.
+    """
+    body = json.dumps({"mode": "generation"}).encode("utf-8")
+    req = urllib.request.Request(
+        HAL0_SWITCHOVER_URL,
+        data=body,
+        headers={"Content-Type": "application/json"},
+        method="POST",
+    )
+    try:
+        with urllib.request.urlopen(req, timeout=_STATUS_TIMEOUT_S) as resp:
+            return resp.status in (200, 202)
+    except urllib.error.HTTPError as exc:
+        # 409 can mean a switch is already in flight; the caller polls status.
+        return exc.code == 409
+    except (urllib.error.URLError, OSError, ValueError):
+        return False
+
+
+def prepare_image_mode() -> None:
+    """Best-effort inference → generation handoff before forwarding /prompt.
+
+    Fail-open: exceptions and timeouts return without blocking the prompt.
+    """
+    status = _fetch_status()
+    if not should_prepare_image_mode("POST", "/prompt", status):
+        return
+    if not _post_switchover():
+        return
+
+    deadline = time.monotonic() + max(0.0, _SWITCH_TIMEOUT_S)
+    while time.monotonic() < deadline:
+        status = _fetch_status()
+        if not isinstance(status, dict):
+            return
+        if status.get("mode") == "generation":
+            return
+        sw = status.get("switchover")
+        if isinstance(sw, dict) and sw.get("error"):
+            return
+        time.sleep(max(0.05, _SWITCH_POLL_S))
+
+
 def _install() -> None:
-    """Attach the gate middleware to ComfyUI's PromptServer (fail-soft)."""
+    """Attach the prepare middleware to ComfyUI's PromptServer (fail-soft)."""
     try:
         import asyncio
 
@@ -90,14 +149,15 @@ def _install() -> None:
 
         @web.middleware
         async def hal0_gpu_gate_middleware(request, handler):
-            if request.method == "POST" and request.path in _BLOCK_PATHS:
-                status = await asyncio.to_thread(_fetch_status)
-                if should_block(request.method, request.path, status):
-                    return web.json_response(GATE_BODY, status=403)
+            if _is_prompt_submit(request.method, request.path):
+                await asyncio.to_thread(prepare_image_mode)
             return await handler(request)
 
         PromptServer.instance.app.middlewares.append(hal0_gpu_gate_middleware)
-        print(f"[hal0_gpu_gate] /prompt gated on hal0 GPU mode ({HAL0_STATUS_URL})")
+        print(
+            "[hal0_gpu_gate] /prompt prepares hal0 image mode "
+            f"({HAL0_STATUS_URL} → {HAL0_SWITCHOVER_URL})"
+        )
     except Exception as exc:  # outside ComfyUI (unit tests) or API drift
         print(f"[hal0_gpu_gate] not installed: {exc}")
 

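The polling loop in `prepare_image_mode` has four ways to stop: generation mode reached, status unreachable, a reported switchover error, or the deadline. One way to exercise each exit without a live hal0-api is to inject the status fetch, the clock, and the sleep. The sketch below assumes a hypothetical helper `wait_for_generation` that is not part of this PR; it mirrors the loop from the diff but returns a label for the exit taken, where the real hook simply returns `None` in every case.

```python
import time
from typing import Callable, Optional


def wait_for_generation(
    fetch_status: Callable[[], Optional[dict]],
    timeout_s: float = 120.0,
    poll_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    now: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until generation mode, an error, an unreachable API, or the deadline.

    Hypothetical test seam: the returned label only says why polling stopped.
    Every outcome is fail-open, so the prompt is forwarded regardless.
    """
    deadline = now() + max(0.0, timeout_s)
    while now() < deadline:
        status = fetch_status()
        if not isinstance(status, dict):
            return "unreachable"  # hal0-api went away mid-switch
        if status.get("mode") == "generation":
            return "ready"
        sw = status.get("switchover")
        if isinstance(sw, dict) and sw.get("error"):
            return "switch_error"
        sleep(max(0.05, poll_s))  # same 50 ms floor as the diff
    return "timeout"


def make_fake_clock():
    """Return (now, sleep) callables backed by a counter, so tests run instantly."""
    state = {"t": 0.0}

    def fake_now() -> float:
        return state["t"]

    def fake_sleep(seconds: float) -> None:
        state["t"] += seconds

    return fake_now, fake_sleep
```

A scripted sequence of status dicts plus the fake clock is enough to drive the loop deterministically; passing the real `_fetch_status` and leaving the defaults in place would reproduce the production behaviour.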
diff --git a/installer/install.sh b/installer/install.sh
@@ -953,6 +953,7 @@ done
 ui_step "ComfyUI control scripts"
 
 COMFYUI_SCRIPTS_SRC="${REPO_ROOT}/installer/comfyui/scripts"
+COMFYUI_CUSTOM_NODES_SRC="${REPO_ROOT}/installer/comfyui/custom_nodes"
 COMFYUI_DIR="/opt/comfyui"
 COMFYUI_MODELS_ROOT="/mnt/ai-models/comfyui"
 
@@ -973,6 +974,13 @@ else
     done
     info "ensured ${COMFYUI_MODELS_ROOT}/{models,output,input,user,custom_nodes}"
 
+    if [[ -d "${COMFYUI_CUSTOM_NODES_SRC}" ]]; then
+        install -m0644 "${COMFYUI_CUSTOM_NODES_SRC}"/*.py "${COMFYUI_MODELS_ROOT}/custom_nodes/"
+        info "wrote ComfyUI custom nodes → ${COMFYUI_MODELS_ROOT}/custom_nodes/"
+    else
+        warn "${COMFYUI_CUSTOM_NODES_SRC} not found — ComfyUI custom nodes not installed"
+    fi
+
     # Place extra_model_paths.yaml if not already present (operator may have a
     # customised copy — never overwrite).
     _EXTRA_PATHS_SRC="${REPO_ROOT}/installer/comfyui/extra_model_paths.yaml"

diff --git a/scripts/deploy.sh b/scripts/deploy.sh
@@ -145,6 +145,21 @@ elif [[ "$DO_BUILD" -eq 0 ]]; then
     warn "skipping UI build (--no-build)"
 fi
 
+# ── 2b. Sync runtime-mounted ComfyUI custom nodes ─────────────────────────────
+# ComfyUI imports custom nodes from the persistent model share, not the source
+# checkout. Keep shipped hal0 nodes in sync during runtime deploys; the ComfyUI
+# slot still needs a restart to import changed node code.
+comfy_nodes_src="${REPO_ROOT}/installer/comfyui/custom_nodes"
+comfy_nodes_dst="${HAL0_COMFYUI_CUSTOM_NODES_DIR:-/mnt/ai-models/comfyui/custom_nodes}"
+if [[ -d "$comfy_nodes_src" ]]; then
+    if install -d "$comfy_nodes_dst" 2>/dev/null \
+        && install -m0644 "$comfy_nodes_src"/*.py "$comfy_nodes_dst"/ 2>/dev/null; then
+        info "ComfyUI custom nodes synced → ${comfy_nodes_dst}"
+    else
+        warn "could not sync ComfyUI custom nodes to ${comfy_nodes_dst}"
+    fi
+fi
+
 # ── 3. Re-assert group-shared ownership ───────────────────────────────────────
 # The reset + build above just (re)created files as the deploying user. Hand the
 # tree back to the shared group so the hal0 service user (Hermes, agents) can

diff --git a/src/hal0/api/__init__.py b/src/hal0/api/__init__.py
@@ -1105,7 +1105,7 @@ def create_app() -> FastAPI:
     )
     app.include_router(slots.router, prefix="/api/slots", tags=["slots"])
     # Read-only ComfyUI "generation engine" status for the slots-page Image-Gen
-    # tab (docker + systemd + ComfyUI HTTP), plus the feature-gated switchover.
+    # tab (docker + systemd + ComfyUI HTTP), plus arbiter switchover controls.
     app.include_router(comfyui.router, prefix="/api/comfyui", tags=["comfyui"])
     app.include_router(models.router, prefix="/api/models", tags=["models"])
     # Issue #311: HuggingFace Hub discovery (search proxy). Sits next

diff --git a/src/hal0/api/routes/comfyui.py b/src/hal0/api/routes/comfyui.py
@@ -1,4 +1,4 @@
-"""Read-only ComfyUI "generation engine" status aggregator (+ gated switchover).
+"""ComfyUI "generation engine" status aggregator + control routes.
 
 The dashboard models ComfyUI as ONE containerized generation engine, not a list
 of per-model slots (a single run loads many cooperating models at once, and it
@@ -23,8 +23,7 @@
 in the background behind a 202; the ``switchover`` block on /status tracks the
 transition. The API no longer shells out — the ``/opt/comfyui`` control scripts
 stay on disk for manual ops only. ``POST /api/comfyui/pin`` toggles the
-arbiter's manual pin (blocks idle-restore). Both write paths stay feature-gated
-behind ``HAL0_COMFYUI_SWITCHOVER_ENABLED`` (501 when off).
+arbiter's manual pin (blocks idle-restore).
 """
 
 from __future__ import annotations
@@ -445,27 +444,6 @@ async def _run_switch(arbiter: Any, mode: str, *, pin: bool = False, force: bool
         _switch["target"] = None
 
 
-def _gate_closed() -> JSONResponse | None:
-    """501 when the operator hasn't enabled the GPU-switch write path."""
-    if os.environ.get("HAL0_COMFYUI_SWITCHOVER_ENABLED", "0") == "1":
-        return None
-    return JSONResponse(
-        status_code=501,
-        content={
-            "error": {
-                "code": "comfyui.switchover_disabled",
-                "message": (
-                    "ComfyUI switchover is disabled on this host. It stops "
-                    "the LLM stack (bots + memory extraction go dark) while "
-                    "generation holds the iGPU; set "
-                    "HAL0_COMFYUI_SWITCHOVER_ENABLED=1 on hal0-api to "
-                    "enable it."
-                ),
-            }
-        },
-    )
-
-
 def _arbiter_unavailable() -> JSONResponse:
     return JSONResponse(
         status_code=503,
@@ -490,13 +468,10 @@ async def comfyui_switchover(request: Request, background_tasks: BackgroundTasks
     background — track completion via the ``switchover`` block on
     ``GET /status``.
 
-    Stays gated behind ``HAL0_COMFYUI_SWITCHOVER_ENABLED`` because the switch
-    takes the LLM stack (bots + memory extraction) offline (an operator
-    decision per host), not because the path is unwired.
+    The endpoint is always available when the SlotManager/GpuArbiter is wired:
+    ComfyUI prompt submission can call it as the implicit handoff before
+    enqueueing a render.
     """
-    gate = _gate_closed()
-    if gate is not None:
-        return gate
     try:
         body = await request.json()
     except ValueError:
@@ -577,12 +552,9 @@ async def comfyui_switchover(request: Request, background_tasks: BackgroundTasks
 async def comfyui_pin(request: Request) -> JSONResponse:
     """Toggle the arbiter's manual pin (holds image mode against idle-restore).
 
-    Body: ``{"pinned": bool}``. Gated by the same env flag as the switchover —
-    pinning only matters when the GPU-switch write path is live.
+    Body: ``{"pinned": bool}``. Pinning disables idle auto-restore while image
+    mode is active.
     """
-    gate = _gate_closed()
-    if gate is not None:
-        return gate
     try:
         body = await request.json()
     except ValueError: