fix: graceful shutdown on external (UPS/qm/ACPI) shutdown #3319
Is StartOS's existing SIGTERM handling already sufficient for this use case?
Good question, and digging in, you're right to push on it. startd already handles SIGTERM. In … So the graceful path is already wired on SIGTERM. Re-evaluating this PR against that, most of it is redundant: …
So at the startd level the existing handling looks sufficient. That also makes me distrust the #3235 root-cause writeup: it claims the units "lack … Given that, I don't want to merge a mostly-cosmetic PR. I'd rather reproduce first: spin a …
Adds an opt-in wait that blocks until graceful teardown (services.shutdown_all) completes, via a watch-backed completion signal on RpcContext. It defaults to false over the API (the frontend keeps its immediate reply); the CLI will default it to true, with --nowait to opt out. Groundwork for a systemd pre-shutdown barrier so external shutdowns (UPS/qm/ACPI) tear down containers gracefully.
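A minimal sketch of the completion-signal idea, not the code in this PR. The PR describes a watch-backed signal on RpcContext; to stay dependency-free this sketch substitutes std's Mutex and Condvar, which is an assumption. ClosedSignal, mark_closed, is_closed and handle_shutdown are hypothetical names; wait_closed is the name used in the PR summary, but its signature here is assumed.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical stand-in for the `closed` watch on RpcContext.
/// Cloning shares the same underlying flag.
#[derive(Clone, Default)]
pub struct ClosedSignal {
    inner: Arc<(Mutex<bool>, Condvar)>,
}

impl ClosedSignal {
    pub fn new() -> Self {
        Self::default()
    }

    /// Called once teardown has finished; wakes every waiter.
    pub fn mark_closed(&self) {
        let (lock, cvar) = &*self.inner;
        *lock.lock().unwrap() = true;
        cvar.notify_all();
    }

    /// Blocks until `mark_closed` has been called.
    /// Returns immediately if teardown already completed.
    pub fn wait_closed(&self) {
        let (lock, cvar) = &*self.inner;
        let mut closed = lock.lock().unwrap();
        while !*closed {
            closed = cvar.wait(closed).unwrap();
        }
    }

    pub fn is_closed(&self) -> bool {
        *self.inner.0.lock().unwrap()
    }
}

/// Hypothetical handler: with `wait == false` it replies at once (API default),
/// with `wait == true` it blocks until teardown is done (CLI default).
/// The return value reports whether teardown had completed at reply time.
pub fn handle_shutdown(signal: &ClosedSignal, wait: bool) -> bool {
    if wait {
        signal.wait_closed();
    }
    signal.is_closed()
}

fn main() {
    let signal = ClosedSignal::new();
    let teardown = {
        let signal = signal.clone();
        thread::spawn(move || {
            // Stands in for services.shutdown_all().
            thread::sleep(Duration::from_millis(50));
            signal.mark_closed();
        })
    };
    let immediate = handle_shutdown(&signal, false);
    let waited = handle_shutdown(&signal, true);
    teardown.join().unwrap();
    println!("immediate reply saw teardown complete: {immediate}");
    println!("waiting reply saw teardown complete: {waited}");
}
```

The waiting call can only return after mark_closed has run, which is the property a pre-shutdown barrier would rely on; the non-waiting call makes no such promise.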
Adds two systemd units (startos-shutdown.service / startos-restart.service) with DefaultDependencies=no, so that each binds specifically to poweroff/halt or to reboot/kexec. The units are ordered After=startd.service and Before their respective targets; their ExecStop calls start-cli server shutdown/restart, which waits for graceful container teardown before systemd proceeds. start-cli authenticates locally via the rpc authcookie. This makes externally-initiated shutdowns (UPS/qm/ACPI) graceful.
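The ordering argument can be illustrated with a toy model, assuming only the documented systemd rule that units with an ordering dependency are stopped in the inverse of their start order. This is a sketch, not systemd's job engine: it ignores Before=, Conflicts= and targets, and is_after and stops_before are hypothetical helper names.

```rust
use std::collections::{HashMap, HashSet};

/// True if `unit` is ordered After= `other`, directly or transitively.
/// `graph` maps each unit to the units named in its After= setting.
fn is_after(graph: &HashMap<&str, Vec<&str>>, unit: &str, other: &str) -> bool {
    let mut seen = HashSet::new();
    let mut stack = vec![unit];
    while let Some(current) = stack.pop() {
        if !seen.insert(current) {
            continue; // already visited
        }
        if let Some(deps) = graph.get(current) {
            for &dep in deps {
                if dep == other {
                    return true;
                }
                stack.push(dep);
            }
        }
    }
    false
}

/// Stop order is the inverse of start order: a unit that starts
/// after `other` is stopped before `other`.
fn stops_before(graph: &HashMap<&str, Vec<&str>>, unit: &str, other: &str) -> bool {
    is_after(graph, unit, other)
}

fn main() {
    let mut graph = HashMap::new();
    // The barrier unit is ordered After=startd.service, as in the commit message.
    graph.insert("startos-shutdown.service", vec!["startd.service"]);

    let barrier_first = stops_before(&graph, "startos-shutdown.service", "startd.service");
    println!("barrier stops before startd: {barrier_first}");
}
```

In this model the barrier's stop action, and therefore its ExecStop, is scheduled while startd.service is still running, which is what lets start-cli reach startd and wait for teardown.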
33bccf2 to 6177d02
Thanks for the review and merge. Since this went in ahead of a local build, I'm watching the master CI now (Automated Tests + the compile/Debian-package matrix) and will fix forward immediately if anything trips. One tracked follow-up: the new …
Summary
Makes externally-initiated shutdowns graceful, so StartOS service containers are torn down by startd before systemd proceeds. This covers qm shutdown, ACPI, and UPS/NUT-triggered shutdown -h (the dependency for #3317). Closes #3235.
Previously, only a shutdown initiated from the UI or via start-cli server shutdown ran startd's graceful teardown; a system-initiated shutdown let systemd stop everything around startd, terminating services abruptly (data-corruption risk for Lightning/Bitcoin workloads).
Approach (revised per review)
startd already handles SIGTERM by running the same graceful teardown as a UI shutdown. The gap is purely ordering: systemd ordering is symmetric, so because startd starts before the things it manages, it is stopped after them, which is the opposite of what we need. Two small pre-shutdown barrier units invert that:
core/startos-shutdown.service (poweroff/halt) and core/startos-restart.service (reboot/kexec). DefaultDependencies=no binds each to its specific target (not the generic shutdown.target), so each fires only on its own mode. Each unit is ordered After=startd.service (and Before=/Conflicts= its target), and its ExecStop calls start-cli server shutdown/restart, which now waits for graceful teardown, while startd is still up. start-cli authenticates locally via /run/startos/rpc.authcookie. startd then re-issues the matching final poweroff/reboot, which is idempotent since systemd is already on its way there.
server shutdown/restart gain a wait (core/src/shutdown.rs): a watch-backed completion signal on RpcContext (wait_closed(), fired after services.shutdown_all()). It defaults to false over the API (the frontend keeps its immediate reply; it already sends {}) and to true on the CLI, with --nowait to opt out.
Files
core/src/shutdown.rs: ShutdownParams { wait }, wait-on-teardown.
core/src/context/rpc.rs: closed watch + wait_closed().
core/startos-shutdown.service, core/startos-restart.service: barrier units.
Makefile, debian/startos/postinst: install + enable.
core/locales/i18n.yaml: help.arg.nowait (×5 locales).
Validation status
cargo check, make ts-bindings (generate ShutdownParams.ts + SDK rebuild), cross-layer typechecks, and a VM test of an external poweroff and reboot on beta-9 (diff teardown logs against a UI shutdown). Pushed for review per @dr-bonez.