Fix serve quadlet: SELinux label and ready-signal healthcheck#18
Merged
Two regressions in generate_container_serve_quadlet, caught when the generator produced its first real deployment on the test server.

First: the container crashed on startup with 'Config file not found: /etc/psi/config.yaml', because the generated quadlet did not set SecurityLabelType. Without container_runtime_t, the container runs under the default container_t SELinux type, which cannot read /etc/psi (labeled etc_t) via a :ro bind mount. :Z is not an option, because it would relabel the host dir, which we never want on shared config. Setting SecurityLabelType=container_runtime_t is the standard workaround and matches what generate_container_provider_setup_quadlet already does.

Second: quadlet emits Type=notify for .container units by default and expects podman to send sd_notify(READY=1). psi serve does not call sd_notify itself, so the unit sat in 'activating' until systemd's TimeoutStartSec killed it. Notify=healthy plus a HealthCmd that curls the /healthz endpoint through the unix socket makes podman fire the ready signal once the first healthcheck passes. HealthStartPeriod=60s gives HSM login + cache decrypt enough headroom before the first probe.
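As a rough illustration of the keys this change adds, here is a minimal sketch of a helper that renders them. The function name `render_serve_container_keys` and the socket path are hypothetical; the real `generate_container_serve_quadlet` in `psi/unitgen.py` may be structured quite differently. Only the key names and values are taken from this PR.

```python
def render_serve_container_keys(socket_path: str) -> list[str]:
    """Return the [Container] keys this PR adds to the serve quadlet.

    Hypothetical helper for illustration; not the project's actual code.
    """
    # Probe /healthz over the unix socket, as described in the PR.
    health_cmd = f"curl -sf --unix-socket {socket_path} http://localhost/healthz"
    return [
        # Lets the container read /etc/psi (etc_t) without relabeling it.
        "SecurityLabelType=container_runtime_t",
        # Ready signal is sent once the first healthcheck passes.
        "Notify=healthy",
        f"HealthCmd={health_cmd}",
        "HealthInterval=30s",
        "HealthRetries=10",
        # Headroom for HSM login and cache decrypt before the first probe.
        "HealthStartPeriod=60s",
        "HealthTimeout=5s",
    ]


if __name__ == "__main__":
    # The socket path is a placeholder, not the project's real path.
    print("\n".join(["[Container]", *render_serve_container_keys("/run/psi/psi.sock")]))
```

Keeping the health settings together in one place makes it harder for the serve generator to drift from the provider-setup generator again.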
Summary

Two regressions in `generate_container_serve_quadlet` caught on the first real deployment of the generator output:

- Container crashed with `Config file not found: /etc/psi/config.yaml` because the quadlet did not set `SecurityLabelType`. Without `container_runtime_t` the container runs as `container_t` and cannot read `/etc/psi` (labeled `etc_t`) via a `:ro` mount. Using `:Z` would relabel the host config dir, which we never want.
- Unit stuck in `activating` forever, because quadlet emits `Type=notify` by default and expects `sd_notify(READY=1)`. `psi serve` does not call `sd_notify` itself.

Why

First real use of `psi systemd install --mode container` on the test server exposed both gaps. `generate_container_provider_setup_quadlet` already sets `SecurityLabelType=container_runtime_t`; the serve generator was inconsistent. The Butane-managed quadlet that had been running previously had both `SecurityLabelType` and `Notify=healthy` + `HealthCmd`; the generator dropped them.

What changes

`psi/unitgen.py`

- `SecurityLabelType=container_runtime_t`
- `Notify=healthy`
- `HealthCmd=curl -sf --unix-socket <sock> http://localhost/healthz`
- `HealthInterval=30s`, `HealthRetries=10`, `HealthStartPeriod=60s`, `HealthTimeout=5s`

`HealthStartPeriod=60s` gives HSM login plus encrypted cache decrypt enough headroom before the first probe; Nitrokey HSM startup alone can take 25 seconds.

`tests/test_unitgen.py`

- `test_serve_quadlet_has_security_label_type`: asserts the label is emitted
- `test_serve_quadlet_has_notify_healthy`: asserts `Notify=healthy` and a `HealthCmd` pointing at `/healthz` with a start period long enough for HSM startup

Test plan

- `uv run ruff check psi/ tests/`: clean
- `uv run ruff format --check psi/ tests/`: clean
- `uv run ty check`: clean
- `uv run pytest -q`: 296 passed (2 new)
- `psi-secrets.service`: confirm the unit goes `active` within `HealthStartPeriod` of starting and that `podman exec psi-secrets psi cache status` works
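The two new tests in `tests/test_unitgen.py` are only described here, not shown. The sketch below is one way such assertions could look, run against a hand-written sample unit rather than the real generator output. `SAMPLE_QUADLET`, `parse_keys`, `seconds`, and `check_serve_quadlet` are hypothetical names, and the socket path is a placeholder; the 25-second figure is the Nitrokey startup time quoted above.

```python
import re

# Hand-written sample; the real generator output may contain more keys.
SAMPLE_QUADLET = """\
[Container]
SecurityLabelType=container_runtime_t
Notify=healthy
HealthCmd=curl -sf --unix-socket /run/psi/psi.sock http://localhost/healthz
HealthInterval=30s
HealthRetries=10
HealthStartPeriod=60s
HealthTimeout=5s
"""

HSM_STARTUP_SECONDS = 25  # Nitrokey HSM startup figure quoted in this PR


def parse_keys(unit_text: str) -> dict[str, str]:
    """Collect Key=Value lines, skipping section headers, comments, and blanks."""
    keys: dict[str, str] = {}
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("[", "#")):
            continue
        key, _, value = line.partition("=")
        keys[key] = value
    return keys


def seconds(value: str) -> int:
    """Parse a plain '<n>s' duration; other systemd time formats are not handled."""
    match = re.fullmatch(r"(\d+)s", value)
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    return int(match.group(1))


def check_serve_quadlet(unit_text: str) -> None:
    """Assert the properties the two new tests are described as covering."""
    keys = parse_keys(unit_text)
    assert keys["SecurityLabelType"] == "container_runtime_t"
    assert keys["Notify"] == "healthy"
    assert keys["HealthCmd"].endswith("/healthz")
    # The start period must exceed the slowest known startup step.
    assert seconds(keys["HealthStartPeriod"]) > HSM_STARTUP_SECONDS


if __name__ == "__main__":
    check_serve_quadlet(SAMPLE_QUADLET)
```

Comparing the start period against a named constant, rather than asserting the literal `60s`, keeps the test meaningful if the value is tuned later.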