Merged
2 changes: 1 addition & 1 deletion docs/guides/configure.mdx
@@ -23,7 +23,7 @@ with it.
| File | What it holds |
|------|---------------|
| `hal0.toml` | Top-level config: `[meta] schema_version`, `[slots] port_range_start/end`, `[models] store`, telemetry channel. |
| `api.env` | Environment for the hal0-api service (systemd `EnvironmentFile`) — feature flags and env knobs like `HAL0_MODEL_STORE`, `HAL0_COMFYUI_SWITCHOVER_ENABLED`, `HAL0_MCP_ALLOWED_HOSTS`. |
| `api.env` | Environment for the hal0-api service (systemd `EnvironmentFile`) — feature flags and env knobs like `HAL0_MODEL_STORE`, `HAL0_MCP_ALLOWED_HOSTS`. |
| `upstreams.toml` | External / upstream provider routing. |
| `providers.toml` | Provider credential references (key **names**, never plaintext secrets). |
| `profiles.toml` | Optional profile catalogue. Absent → built-in seed profiles are used. |
25 changes: 5 additions & 20 deletions docs/guides/generate-images.mdx
@@ -21,9 +21,9 @@ reloaded.

<Aside type="caution">
Switching the iGPU to generation **stops the LLM stack** (chat bots and
memory extraction go dark) for as long as generation holds the GPU. The
switchover write path is gated off by default — see
[Enable the switchover](#enable-the-switchover).
memory extraction go dark) for as long as generation holds the GPU. hal0
does this automatically when a render needs the iGPU, then idle-restore
hands it back to inference.
</Aside>

## Read the generation-engine status
@@ -86,8 +86,7 @@ Body fields:
Responses: **202** (`switching`, runs in the background — poll the
`switchover` block on `/status`), **200** (`noop`, already in the target
mode), **409** (switch already in flight, or a busy queue without
`force`), **501** (switchover disabled on this host), **503** (arbiter not
wired).
`force`), **503** (arbiter not wired).
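
As a rough illustration of how a client might act on these responses, the sketch below maps each status code to a next step and decides when polling `/status` can stop. The action labels and both function names are hypothetical; the `mode` and `switchover.error` fields are assumed to have the shape that the `hal0_gpu_gate.py` hook in this PR reads.

```python
from __future__ import annotations


def classify_switchover_response(status_code: int) -> str:
    """Map a POST /api/comfyui/switchover status code to a client action.

    The returned labels are illustrative only, not part of the hal0 API.
    """
    if status_code == 202:
        return "poll"  # switch accepted; it runs in the background
    if status_code == 200:
        return "done"  # noop: already in the target mode
    if status_code == 409:
        return "retry"  # switch in flight, or busy queue without force
    if status_code == 503:
        return "unavailable"  # arbiter not wired
    return "error"


def switch_finished(status: dict | None, target: str) -> bool:
    """True once polling /status can stop.

    Assumes /status returns a top-level ``mode`` and a ``switchover``
    block that may carry an ``error`` value.
    """
    if not isinstance(status, dict):
        return True  # unreachable or unparseable: stop rather than spin
    if status.get("mode") == target:
        return True  # the switch completed
    sw = status.get("switchover")
    return isinstance(sw, dict) and bool(sw.get("error"))
```

Keeping both decisions as pure functions means they can be unit-tested without a running hal0-api; the HTTP calls and the sleep between polls stay in the caller.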

Pin or unpin image mode independently:

@@ -97,20 +96,6 @@ curl -X POST http://localhost:8080/api/comfyui/pin \
-d '{"pinned": true}'
```
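
For scripted use, the same pin request can be built with the Python standard library. This is a minimal sketch: `build_pin_request` is a hypothetical helper name, and the `Content-Type` header is an assumption that mirrors what the `hal0_gpu_gate.py` hook sends to `/switchover`.

```python
import json
import urllib.request

# URL taken from the curl example above.
PIN_URL = "http://localhost:8080/api/comfyui/pin"


def build_pin_request(pinned: bool, url: str = PIN_URL) -> urllib.request.Request:
    """Build (but do not send) a POST that sets the arbiter's manual pin."""
    body = json.dumps({"pinned": pinned}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        # Assumed header; the endpoint takes a JSON body.
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen(req, timeout=...)` sends it; a non-2xx answer surfaces as `urllib.error.HTTPError`.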

### Enable the switchover

Both write paths (`/switchover` and `/pin`) stay gated behind an
environment flag because the switch takes the LLM stack offline — an
operator decision per host. Set it on hal0-api:

```sh
HAL0_COMFYUI_SWITCHOVER_ENABLED=1
```

Without it, both endpoints answer **501** (`comfyui.switchover_disabled`).
See [Configure](/docs/guides/configure) for where to set environment
keys (`api.env`).

## Generate an image

The OpenAI-compatible endpoint translates your request into a ComfyUI
@@ -170,5 +155,5 @@ caps a single render at 10 minutes before it gives up.
## See also

- [Manage slots](/docs/guides/manage-slots) — the `img` slot lifecycle.
- [Configure](/docs/guides/configure) — set `HAL0_COMFYUI_SWITCHOVER_ENABLED` and the `[image]` slot defaults.
- [Configure](/docs/guides/configure) — set the `[image]` slot defaults.
- [Observe the system](/docs/guides/logs-and-activity) — tail `hal0-slot@img`.
126 changes: 93 additions & 33 deletions installer/comfyui/custom_nodes/hal0_gpu_gate.py
@@ -1,28 +1,29 @@
"""hal0 GPU gate — ComfyUI custom node that 403-blocks job submission while
the Strix Halo iGPU is serving the LLM stack.
"""hal0 GPU prepare hook — asks hal0 to enter image mode before job submit.

The ComfyUI container is RESIDENT on hal0 (its web UI stays up in both GPU
modes so users can build workflows/prompts any time), but GPU memory is
exclusive: running a generation while the LLM slots hold GTT would OOM or
evict them. hal0-api's GpuArbiter guards its own dispatch path, and this
middleware closes the remaining door — the web UI's own "Queue Prompt"
(``POST /prompt`` / ``POST /api/prompt``), which goes straight to ComfyUI.
modes so users can build workflows/prompts any time), and queueing a prompt is
now allowed from either mode. When the web UI's own "Queue Prompt"
(``POST /prompt`` / ``POST /api/prompt``) arrives while hal0 still reports
inference mode, this middleware asks hal0-api's GpuArbiter to switch to image
mode and waits for the switch to complete (up to
``HAL0_COMFYUI_SWITCH_TIMEOUT_S``, 120 s by default) before allowing ComfyUI
to continue.

Deployment: this single file is dropped into the host-mounted
``custom_nodes`` dir (``/mnt/ai-models/comfyui/custom_nodes/``) — no image
rebuild. ComfyUI imports it at startup; ``_install()`` appends an aiohttp
middleware to the PromptServer app (the same mechanism ComfyUI-Login uses).
The container runs with host networking, so hal0-api is on loopback.

Fail-open by design: if hal0-api is unreachable the gate allows the prompt —
a broken hal0 must never brick standalone ComfyUI use. The pure decision
logic (``should_block``) is unit-tested in the hal0 repo
Fail-open by design: if hal0-api is unreachable or the switch request fails,
the hook allows the prompt — a broken hal0 must never brick standalone ComfyUI
use. The pure decision logic (``should_prepare_image_mode``) is unit-tested in the hal0 repo
(tests/comfyui/test_hal0_gpu_gate.py); the aiohttp wiring is exercised by
the CT105 live verification.
"""

import json
import os
import time
import urllib.error
import urllib.request

@@ -31,41 +32,52 @@
HAL0_STATUS_URL = os.environ.get(
"HAL0_COMFYUI_STATUS_URL", "http://127.0.0.1:8080/api/comfyui/status"
)
HAL0_SWITCHOVER_URL = os.environ.get(
"HAL0_COMFYUI_SWITCHOVER_URL", "http://127.0.0.1:8080/api/comfyui/switchover"
)
_STATUS_TIMEOUT_S = 2.0


def _float_env(name: str, default: float) -> float:
    try:
        return float(os.environ.get(name, str(default)))
    except ValueError:
        return default


_SWITCH_TIMEOUT_S = _float_env("HAL0_COMFYUI_SWITCH_TIMEOUT_S", 120.0)
_SWITCH_POLL_S = _float_env("HAL0_COMFYUI_SWITCH_POLL_S", 0.5)

#: Job-submission routes — and ONLY those. Everything the editor needs
#: (/object_info, /queue GET, workflow save/load, uploads) always passes.
_BLOCK_PATHS = frozenset({"/prompt", "/api/prompt"})

#: Mirrors ComfyUI's own /prompt error envelope so the frontend renders the
#: message instead of a generic failure toast.
GATE_BODY = {
"error": {
"type": "hal0_gpu_gate",
"message": (
"The GPU is in inference mode (LLM slots loaded). Flip the "
"Image Gen switch in the hal0 dashboard, then queue again."
),
"details": "hal0 GpuArbiter mode is 'inference'; generation is gated.",
"extra_info": {},
},
"node_errors": {},
}
def _is_prompt_submit(method: str, path: str) -> bool:
    return method == "POST" and path in _BLOCK_PATHS


def should_block(method: str, path: str, status: "dict | None") -> bool:
"""True iff this request is a job submission while the GPU serves LLMs.
def should_prepare_image_mode(method: str, path: str, status: "dict | None") -> bool:
"""True iff this prompt submit should ask hal0 to enter image mode.

``status`` is hal0-api's /api/comfyui/status JSON (or None when
unreachable / unparseable → fail-open).
"""
if method != "POST" or path not in _BLOCK_PATHS:
if not _is_prompt_submit(method, path):
return False
if not isinstance(status, dict):
return False
return status.get("mode") == "inference"


def should_block(method: str, path: str, status: "dict | None") -> bool:
    """Backward-compatible name for older tests/imports.

    Prompt submission is no longer blocked by this custom node; it prepares
    image mode best-effort, then lets ComfyUI handle the prompt.
    """
    return False


def _fetch_status() -> "dict | None":
"""Blocking status fetch via urllib (run in a thread by the middleware).

@@ -80,8 +92,55 @@ def _fetch_status() -> "dict | None":
return None


def _post_switchover() -> bool:
    """Ask hal0-api to enter generation mode; True means accepted/noop.

    stdlib-only on purpose: custom nodes can't assume extra deps in the
    ComfyUI venv.
    """
    body = json.dumps({"mode": "generation"}).encode("utf-8")
    req = urllib.request.Request(
        HAL0_SWITCHOVER_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=_STATUS_TIMEOUT_S) as resp:
            return resp.status in (200, 202)
    except urllib.error.HTTPError as exc:
        # 409 can mean a switch is already in flight; the caller
        # (prepare_image_mode) polls status until it settles.
        return exc.code == 409
    except (urllib.error.URLError, OSError, ValueError):
        return False


def prepare_image_mode() -> None:
    """Best-effort inference → generation handoff before forwarding /prompt.

    Fail-open: exceptions and timeouts return without blocking the prompt.
    """
    status = _fetch_status()
    if not should_prepare_image_mode("POST", "/prompt", status):
        return
    if not _post_switchover():
        return

    deadline = time.monotonic() + max(0.0, _SWITCH_TIMEOUT_S)
    while time.monotonic() < deadline:
        status = _fetch_status()
        if not isinstance(status, dict):
            return
        if status.get("mode") == "generation":
            return
        sw = status.get("switchover")
        if isinstance(sw, dict) and sw.get("error"):
            return
        time.sleep(max(0.05, _SWITCH_POLL_S))


def _install() -> None:
"""Attach the gate middleware to ComfyUI's PromptServer (fail-soft)."""
"""Attach the prepare middleware to ComfyUI's PromptServer (fail-soft)."""
try:
import asyncio

@@ -90,14 +149,15 @@ def _install() -> None:

@web.middleware
async def hal0_gpu_gate_middleware(request, handler):
if request.method == "POST" and request.path in _BLOCK_PATHS:
status = await asyncio.to_thread(_fetch_status)
if should_block(request.method, request.path, status):
return web.json_response(GATE_BODY, status=403)
if _is_prompt_submit(request.method, request.path):
await asyncio.to_thread(prepare_image_mode)
return await handler(request)

PromptServer.instance.app.middlewares.append(hal0_gpu_gate_middleware)
print(f"[hal0_gpu_gate] /prompt gated on hal0 GPU mode ({HAL0_STATUS_URL})")
print(
"[hal0_gpu_gate] /prompt prepares hal0 image mode "
f"({HAL0_STATUS_URL} → {HAL0_SWITCHOVER_URL})"
)
except Exception as exc: # outside ComfyUI (unit tests) or API drift
print(f"[hal0_gpu_gate] not installed: {exc}")

8 changes: 8 additions & 0 deletions installer/install.sh
@@ -953,6 +953,7 @@ done
ui_step "ComfyUI control scripts"

COMFYUI_SCRIPTS_SRC="${REPO_ROOT}/installer/comfyui/scripts"
COMFYUI_CUSTOM_NODES_SRC="${REPO_ROOT}/installer/comfyui/custom_nodes"
COMFYUI_DIR="/opt/comfyui"
COMFYUI_MODELS_ROOT="/mnt/ai-models/comfyui"

@@ -973,6 +974,13 @@ else
done
info "ensured ${COMFYUI_MODELS_ROOT}/{models,output,input,user,custom_nodes}"

if [[ -d "${COMFYUI_CUSTOM_NODES_SRC}" ]]; then
install -m0644 "${COMFYUI_CUSTOM_NODES_SRC}"/*.py "${COMFYUI_MODELS_ROOT}/custom_nodes/"
info "wrote ComfyUI custom nodes → ${COMFYUI_MODELS_ROOT}/custom_nodes/"
else
warn "${COMFYUI_CUSTOM_NODES_SRC} not found — ComfyUI custom nodes not installed"
fi

# Place extra_model_paths.yaml if not already present (operator may have a
# customised copy — never overwrite).
_EXTRA_PATHS_SRC="${REPO_ROOT}/installer/comfyui/extra_model_paths.yaml"
15 changes: 15 additions & 0 deletions scripts/deploy.sh
@@ -145,6 +145,21 @@ elif [[ "$DO_BUILD" -eq 0 ]]; then
warn "skipping UI build (--no-build)"
fi

# ── 2b. Sync runtime-mounted ComfyUI custom nodes ─────────────────────────────
# ComfyUI imports custom nodes from the persistent model share, not the source
# checkout. Keep shipped hal0 nodes in sync during runtime deploys; the ComfyUI
# slot still needs a restart to import changed node code.
comfy_nodes_src="${REPO_ROOT}/installer/comfyui/custom_nodes"
comfy_nodes_dst="${HAL0_COMFYUI_CUSTOM_NODES_DIR:-/mnt/ai-models/comfyui/custom_nodes}"
if [[ -d "$comfy_nodes_src" ]]; then
    if install -d "$comfy_nodes_dst" 2>/dev/null \
        && install -m0644 "$comfy_nodes_src"/*.py "$comfy_nodes_dst"/ 2>/dev/null; then
        info "ComfyUI custom nodes synced → ${comfy_nodes_dst}"
    else
        warn "could not sync ComfyUI custom nodes to ${comfy_nodes_dst}"
    fi
fi

# ── 3. Re-assert group-shared ownership ───────────────────────────────────────
# The reset + build above just (re)created files as the deploying user. Hand the
# tree back to the shared group so the hal0 service user (Hermes, agents) can
2 changes: 1 addition & 1 deletion src/hal0/api/__init__.py
@@ -1105,7 +1105,7 @@ def create_app() -> FastAPI:
)
app.include_router(slots.router, prefix="/api/slots", tags=["slots"])
# Read-only ComfyUI "generation engine" status for the slots-page Image-Gen
# tab (docker + systemd + ComfyUI HTTP), plus the feature-gated switchover.
# tab (docker + systemd + ComfyUI HTTP), plus arbiter switchover controls.
app.include_router(comfyui.router, prefix="/api/comfyui", tags=["comfyui"])
app.include_router(models.router, prefix="/api/models", tags=["models"])
# Issue #311: HuggingFace Hub discovery (search proxy). Sits next
42 changes: 7 additions & 35 deletions src/hal0/api/routes/comfyui.py
@@ -1,4 +1,4 @@
"""Read-only ComfyUI "generation engine" status aggregator (+ gated switchover).
"""ComfyUI "generation engine" status aggregator + control routes.

The dashboard models ComfyUI as ONE containerized generation engine, not a list
of per-model slots (a single run loads many cooperating models at once, and it
@@ -23,8 +23,7 @@
in the background behind a 202; the ``switchover`` block on /status tracks the
transition. The API no longer shells out — the ``/opt/comfyui`` control scripts
stay on disk for manual ops only. ``POST /api/comfyui/pin`` toggles the
arbiter's manual pin (blocks idle-restore). Both write paths stay feature-gated
behind ``HAL0_COMFYUI_SWITCHOVER_ENABLED`` (501 when off).
arbiter's manual pin (blocks idle-restore).
"""

from __future__ import annotations
@@ -445,27 +444,6 @@ async def _run_switch(arbiter: Any, mode: str, *, pin: bool = False, force: bool
_switch["target"] = None


def _gate_closed() -> JSONResponse | None:
"""501 when the operator hasn't enabled the GPU-switch write path."""
if os.environ.get("HAL0_COMFYUI_SWITCHOVER_ENABLED", "0") == "1":
return None
return JSONResponse(
status_code=501,
content={
"error": {
"code": "comfyui.switchover_disabled",
"message": (
"ComfyUI switchover is disabled on this host. It stops "
"the LLM stack (bots + memory extraction go dark) while "
"generation holds the iGPU; set "
"HAL0_COMFYUI_SWITCHOVER_ENABLED=1 on hal0-api to "
"enable it."
),
}
},
)


def _arbiter_unavailable() -> JSONResponse:
return JSONResponse(
status_code=503,
@@ -490,13 +468,10 @@ async def comfyui_switchover(request: Request, background_tasks: BackgroundTasks
background — track completion via the ``switchover`` block on
``GET /status``.

Stays gated behind ``HAL0_COMFYUI_SWITCHOVER_ENABLED`` because the switch
takes the LLM stack (bots + memory extraction) offline (an operator
decision per host), not because the path is unwired.
The endpoint is always available when the SlotManager/GpuArbiter is wired:
ComfyUI prompt submission can call it as the implicit handoff before
enqueueing a render.
"""
gate = _gate_closed()
if gate is not None:
return gate
try:
body = await request.json()
except ValueError:
@@ -577,12 +552,9 @@ async def comfyui_switchover(request: Request, background_tasks: BackgroundTasks
async def comfyui_pin(request: Request) -> JSONResponse:
"""Toggle the arbiter's manual pin (holds image mode against idle-restore).

Body: ``{"pinned": bool}``. Gated by the same env flag as the switchover —
pinning only matters when the GPU-switch write path is live.
Body: ``{"pinned": bool}``. Pinning disables idle auto-restore while image
mode is active.
"""
gate = _gate_closed()
if gate is not None:
return gate
try:
body = await request.json()
except ValueError: