From b3016f19e713ae7bf25f22f561ce77ac3c4b1304 Mon Sep 17 00:00:00 2001
From: Joe Doss
Date: Wed, 8 Apr 2026 10:30:19 -0500
Subject: [PATCH] docs: add secret cache reference; update README and provider docs
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

New docs/secret-cache.md is the authoritative reference for the secret
cache: what it is, why it exists, threat model, TPM and HSM backend
details, on-disk envelope format, CLI reference, deployment walkthroughs
for native/container × TPM/HSM, cold-boot caveat, rotation, and
troubleshooting.

README updates: new Secret cache configuration section, cache CLI
subcommand reference, FCOS deployment note about auto-wired backend
credentials, and a clarification that HSM-in-container-mode wiring is
auto-generated when cache.backend is hsm. The 'How it works' block now
mentions the in-memory dict path.

Infisical provider doc: corrects the stale claim that secret values
never touch disk (still true with the cache disabled, qualified for when
it is enabled), fixes the stale psi-secrets-setup.service name to
psi-infisical-setup.service, describes the cache-aware lookup path, and
rewrites the security model table.

Nitrokey HSM provider doc: updates the example serve quadlet to match
current generator output (ContainerName, SecurityLabelType,
Notify=healthy, HealthCmd, HealthStartPeriod) and switches the image tag
from :dev to :latest.

CLAUDE.md: adds psi/cache.py and psi/cache_backends.py to the package
layout, updates module dependency arrows, and adds psi cache CLI
subcommands to the reference.

Replaces all homelab-specific examples (project names, instance URLs)
with neutral placeholders across all docs.
--- CLAUDE.md | 28 ++- README.md | 117 ++++++++-- docs/infisical-provider.md | 109 +++++---- docs/nitrokeyhsm-provider.md | 12 +- docs/secret-cache.md | 433 +++++++++++++++++++++++++++++++++++ 5 files changed, 625 insertions(+), 74 deletions(-) create mode 100644 docs/secret-cache.md diff --git a/CLAUDE.md b/CLAUDE.md index 5c6863f..2964169 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -74,15 +74,17 @@ psi/ ├── __init__.py Version string ├── provider.py SecretProvider Protocol + registry + parse_mapping() ├── models.py Generic models (SystemdScope, WorkloadConfig, etc.) -├── settings.py PsiSettings — YAML config with providers dict +├── settings.py PsiSettings — YAML config; providers dict + CacheConfig ├── secret.py Shell driver commands (store/lookup/delete/list) -├── serve.py HTTP server on Unix socket — dispatches to providers -├── setup.py Boot-time orchestration — provider-aware +├── serve.py HTTP server on Unix socket — dispatches to providers + cache +├── setup.py Boot-time orchestration — provider-aware, populates cache +├── cache.py Single-file encrypted cache (envelope + in-memory dict) +├── cache_backends.py TPM (systemd-creds + AES-GCM) and HSM (PKCS#11) cache backends ├── output.py TTY-aware output (Rich tables or JSON) -├── systemd.py Query systemd timer/unit status +├── systemd.py Query systemd timer/unit status + daemon_reload helper ├── unitgen.py Generators for systemd unit/quadlet file contents ├── installer.py Orchestrate systemd unit installation -├── cli.py Typer CLI — core commands + provider subcommands +├── cli.py Typer CLI — core + provider + cache subcommands │ ├── providers/ │ ├── __init__.py Provider factory (create_provider) @@ -113,9 +115,15 @@ cli.py → settings.py, setup.py, secret.py, installer.py providers/infisical/cli.py, providers/nitrokeyhsm/cli.py serve.py → provider.py (open_all_providers, parse_mapping, close_all_providers) +serve.py → cache.py, cache_backends.py (Cache, make_backend — optional, degrades gracefully) secret.py → 
provider.py (get_provider, parse_mapping) setup.py → providers/infisical/ (InfisicalProvider, InfisicalConfig, resolve_auth) +setup.py → cache.py, cache_backends.py (eager cache population when enabled) +setup.py → systemd.py (daemon_reload helper, D-Bus-first fallback) +installer.py → systemd.py (daemon_reload helper) provider.py → providers/__init__.py (create_provider) +cache.py → files.py (write_bytes_secure for atomic writes) +cache_backends.py → providers/nitrokeyhsm/ (crypto, pkcs11, pin — HSM backend reuses the provider) providers/infisical/__init__.py → providers/infisical/api.py, models.py providers/infisical/api.py → providers/infisical/auth.py, token.py @@ -164,7 +172,7 @@ workloads: myapp: provider: infisical unit: myapp.container - depends_on: [psi-secrets-setup.service] + depends_on: [psi-infisical-setup.service] secrets: - project: myproject path: /myapp @@ -173,7 +181,7 @@ workloads: windmill-worker@: provider: infisical secrets: - - project: homelab + - project: myproject path: /windmill infisical: provider: nitrokeyhsm @@ -188,6 +196,12 @@ psi setup Discover secrets, register, generate drop psi install Generate containers.conf.d/psi.conf psi systemd install Generate systemd units +# Secret cache (optional) +psi cache init --backend {tpm,hsm} Provision cache encryption key +psi cache status [--verify] Show cache status (fast) or decrypt and count (slow) +psi cache refresh Re-run setup to repopulate the cache +psi cache invalidate Drop an entry and persist + # Infisical provider psi infisical login Test authentication psi infisical env Fetch secrets as env vars diff --git a/README.md b/README.md index 74c34b0..a774f5a 100644 --- a/README.md +++ b/README.md @@ -25,18 +25,23 @@ no container spawned per lookup, just a fast HTTP request to a local socket. 
Boot time: psi serve → starts the lookup service on /run/psi/psi.sock opens all configured providers (Infisical client, HSM session) + decrypts state_dir/cache.enc into memory (if cache enabled) psi setup --provider nitrokeyhsm → registers HSM-backed workloads (instant, local-only) psi setup --provider infisical → discovers secrets from Infisical, registers with Podman, - writes systemd drop-ins (retries if Infisical is starting) + writes systemd drop-ins, populates the encrypted cache Container start: Podman → Secret=myapp--DB_HOST,type=env,target=DB_HOST → shell driver calls: curl /run/psi/psi.sock/lookup/{secret_id} - → PSI reads JSON mapping from state_dir - → dispatches to the correct provider (infisical or nitrokeyhsm) - → returns decrypted/fetched value to Podman → injected as env var + → PSI checks the in-memory cache → hit → return plaintext (no I/O, no crypto) + → miss → dispatch to provider, cache the result + → returns value to Podman → injected as env var ``` +The optional [secret cache](docs/secret-cache.md) lets lookups survive upstream provider +outages by decrypting a single encrypted file at `psi serve` startup and holding the dict +in memory. Disabled by default — see the cache doc for the threat model. + ## Quick start ### 1. Install @@ -110,7 +115,9 @@ Secrets are fetched from Infisical at container start time. ### Provider: Infisical -Fetches secrets from Infisical at lookup time. No secret values stored on disk. +Fetches secrets from Infisical at lookup time. By default no secret values are stored on disk — +only coordinate mappings. Enable the [secret cache](docs/secret-cache.md) if you want lookups +to survive Infisical outages (the cache is encrypted-at-rest with a TPM or HSM key). ```yaml providers: @@ -160,6 +167,41 @@ PIN resolution order: `$CREDENTIALS_DIRECTORY/hsm-pin` → config `pin` → `PSI See the [Nitrokey HSM provider reference](docs/nitrokeyhsm-provider.md) for the full documentation. 
+### Secret cache + +Opt-in single-file encrypted cache. With the cache enabled, `psi-infisical-setup` eagerly +fetches every configured secret value at boot and writes an encrypted bundle to +`state_dir/cache.enc`. `psi serve` decrypts it once at startup and serves lookups from +memory — upstream provider outages no longer stop containers from starting. + +```yaml +cache: + enabled: true + backend: hsm # 'tpm' or 'hsm'. Required for the cache to populate. +``` + +The TPM backend uses a 32-byte AES-256 key sealed by `systemd-creds` to the host TPM2. +The HSM backend reuses the existing Nitrokey hybrid envelope (RSA-OAEP + AES-256-GCM), +unwrapping the AES key via PKCS#11 at `psi serve` startup. + +```bash +# One-time provisioning (host) +sudo psi cache init --backend tpm # or --backend hsm + +# Inspect — fast path, no crypto +sudo podman exec -i psi-secrets psi cache status + +# Full verify — decrypts and counts entries +sudo podman exec -i psi-secrets psi cache status --verify + +# Refresh the cache from providers (e.g. after rotating a secret) +sudo podman exec -i psi-secrets psi cache refresh +``` + +See the [secret cache reference](docs/secret-cache.md) for the threat model, envelope +format, deployment walkthroughs (native TPM, container TPM, container HSM), and +troubleshooting. + ### Workloads Each workload specifies which provider handles its secrets: @@ -194,17 +236,17 @@ workloads: windmill-server: provider: infisical secrets: - - project: homelab + - project: myproject path: /windmill # shared secrets (DB_HOST, REDIS_URL, etc.) 
- - project: homelab + - project: myproject path: /windmill/server # server-specific (MODE=server) windmill-worker-1: provider: infisical secrets: - - project: homelab + - project: myproject path: /windmill # same shared secrets - - project: homelab + - project: myproject path: /windmill/worker # worker-specific (MODE=worker, NUM_WORKERS) ``` @@ -230,9 +272,9 @@ workloads: provider: infisical depends_on: [psi-infisical-setup.service] secrets: - - project: homelab + - project: myproject path: /windmill - - project: homelab + - project: myproject path: /windmill/worker ``` @@ -256,17 +298,17 @@ workloads: windmill-server: provider: infisical secrets: - - project: homelab + - project: myproject path: /windmill - - project: homelab + - project: myproject path: /windmill/server windmill-worker@: provider: infisical secrets: - - project: homelab + - project: myproject path: /windmill - - project: homelab + - project: myproject path: /windmill/worker ``` @@ -446,6 +488,19 @@ psi install Generate containers.conf.d/psi.conf psi systemd install Generate systemd units (--mode native or container) ``` +### Secret cache + +``` +psi cache init --backend tpm Provision a TPM2-sealed AES key and empty cache.enc +psi cache init --backend hsm Write an empty cache.enc wrapped with the HSM public key +psi cache status Print backend, file metadata, and on-disk tag (fast) +psi cache status --verify Same, plus decrypt and report the entry count (slow) +psi cache refresh Re-run setup to repopulate the cache from providers +psi cache invalidate Drop a single entry and persist the change +``` + +See the [secret cache reference](docs/secret-cache.md) for full documentation. + ### Infisical provider ``` @@ -529,11 +584,23 @@ operators like `&&`, pipes, or redirection are not interpreted. sudo psi systemd install --mode container --image ghcr.io/quickvm/psi:latest --enable ``` +Or run the same command inside a one-shot psi container if you do not have a native `psi` +binary on the host. 
The container needs `/etc/containers/systemd` mounted read-write, plus +the PSI config directory and the D-Bus and podman sockets; see [secret-cache.md](docs/secret-cache.md) for +the exact invocation. + Generates per-provider setup units based on configured providers: - `psi-secrets.container` — long-running lookup service - `psi-{provider}-setup.container` — oneshot per provider (e.g. `psi-infisical-setup`, `psi-nitrokeyhsm-setup`) - `psi-tls-renew.timer` + service — daily TLS renewal (if configured) +When the [secret cache](docs/secret-cache.md) is configured, the generator automatically +adds the HSM or TPM unseal wiring to both `psi-secrets.container` and the +`psi-{provider}-setup.container` files. For the HSM backend that means the pcscd socket +volume, `CREDENTIALS_DIRECTORY`, `LoadCredentialEncrypted=hsm-pin`, and an +`After=pcscd.service` ordering. For the TPM backend that means +`LoadCredentialEncrypted=psi-cache-key`. + The per-provider split allows independent systemd ordering. For example, Infisical can depend on the HSM setup unit for its bootstrap secrets, while other services depend on the Infisical setup unit: @@ -565,8 +632,8 @@ installs quadlet files (`pcscd.container`, `pcscd-socket.volume`).
**Configure PSI serve to use pcscd:** -The PSI serve container needs the pcscd socket volume and a systemd ordering -dependency: +The PSI serve container needs the pcscd socket volume, the systemd credential for +the PIN, and an ordering dependency on `pcscd.service`: ```ini # psi-secrets.container @@ -575,19 +642,19 @@ After=network-online.target pcscd.service [Container] Volume=pcscd-socket:/run/pcscd:rw -``` - -**PIN delivery via TPM-sealed credential:** +Volume=/run/credentials/psi-secrets.service:/run/credentials:ro +Environment=CREDENTIALS_DIRECTORY=/run/credentials -```ini [Service] LoadCredentialEncrypted=hsm-pin - -[Container] -Volume=/run/credentials/psi-secrets.service:/run/credentials:ro -Environment=CREDENTIALS_DIRECTORY=/run/credentials ``` +`psi systemd install --mode container` emits all of this automatically when the +[secret cache](docs/secret-cache.md) is configured with `backend: hsm`, and also +propagates it to `psi-{provider}-setup.container` so the setup path can populate +the cache. Workloads using the nitrokeyhsm *provider* without the cache backend +still need the wiring done by hand (or via Butane). + See [Nitrokey HSM setup](#nitrokey-hsm-setup) for PIN encryption instructions. **For Butane/Ignition deployments**, include the pcscd quadlet files in your diff --git a/docs/infisical-provider.md b/docs/infisical-provider.md index e26b484..eaf1f96 100644 --- a/docs/infisical-provider.md +++ b/docs/infisical-provider.md @@ -1,12 +1,19 @@ # Infisical Provider The Infisical provider fetches secrets from an Infisical instance at -container start time. No secret values are stored on disk — only -coordinate mappings that tell PSI where to fetch the real value. +container start time. By default only coordinate mappings +(`{project, path, key}`) are stored on disk — secret values are fetched +live on each lookup. 
+ +If the [secret cache](secret-cache.md) is enabled, `psi-infisical-setup` +eagerly fetches every configured secret value during boot and stores the +encrypted bundle at `state_dir/cache.enc`. Lookups then resolve from +memory and survive provider outages. See `docs/secret-cache.md` for the +threat model and deployment walkthrough. ## Prerequisites -- **Infisical instance** running and accessible (e.g. `https://infisical.inf7.dev`) +- **Infisical instance** running and accessible (e.g. `https://app.infisical.com` or a self-hosted URL) - **Machine identity** configured in Infisical with access to the required projects and environments - **PSI serve container** running with network access to the Infisical API @@ -16,26 +23,26 @@ coordinate mappings that tell PSI where to fetch the real value. ```yaml providers: infisical: - api_url: https://infisical.inf7.dev + api_url: https://app.infisical.com verify_ssl: true token: ttl: 300 # seconds to cache auth tokens projects: - homelab: - id: "4cd8d50c-a987-4f87-a001-5a1950c84397" - environment: homelab + myproject: + id: "project-uuid" + environment: prod auth: method: universal-auth - client_id: "3f90d6e5-..." - client_secret: "cc68176587..." + client_id: "client-id" + client_secret: "client-secret" workloads: myapp: provider: infisical unit: myapp.container - depends_on: [psi-secrets-setup.service] + depends_on: [psi-infisical-setup.service] secrets: - - project: homelab + - project: myproject path: /myapp ``` @@ -94,7 +101,7 @@ in Infisical and registers them with Podman: psi setup ``` -Or via the systemd oneshot unit (`psi-secrets-setup.service`). +Or via the systemd oneshot unit `psi-infisical-setup.service`. What happens during `psi setup` for each Infisical workload: @@ -105,15 +112,21 @@ What happens during `psi setup` for each Infisical workload: 3. **Register** — for each discovered secret, create a Podman secret with the `shell` driver. 
The secret's content is a JSON mapping: ```json - {"provider": "infisical", "project": "homelab", "path": "/myapp", "key": "DB_HOST"} + {"provider": "infisical", "project": "myproject", "path": "/myapp", "key": "DB_HOST"} ``` This mapping is stored at `/var/lib/psi/{SECRET_ID}`. 4. **Generate drop-in** — write a systemd drop-in at `{workload}.container.d/50-secrets.conf` that maps each secret to an environment variable via `Secret=` directives +5. **Populate the cache** (cache enabled only) — each listed secret + already carries its value in the API response, so setup also encrypts + the full bundle with the configured cache backend and atomically + writes `state_dir/cache.enc`. No extra API calls. -No secret values touch disk during setup. Only the coordinates -(project, path, key) are stored. +Only coordinates are registered with Podman. With the cache disabled, no +secret values touch disk. With the cache enabled, values are on disk but +encrypted by the TPM or HSM backend — see +[secret-cache.md](secret-cache.md) for the threat model. ### Secret Naming Convention @@ -141,19 +154,26 @@ Container start curl --unix-socket /run/psi/psi.sock http://localhost/lookup/{SECRET_ID} ``` -What happens inside `InfisicalProvider.lookup()`: +What happens inside `_handle_lookup` in `psi/serve.py`: 1. Read the JSON mapping from `/var/lib/psi/{SECRET_ID}` -2. Parse the `provider` field — dispatches to Infisical -3. Extract the coordinate: `project` alias, `path`, `key` -4. Resolve the project config and auth from settings -5. Obtain an auth token (from cache if valid, or re-authenticate) -6. Call the Infisical API: +2. Parse the `provider` field +3. **If the cache is enabled and has this entry**, return the cached bytes + immediately — no provider round trip, no crypto on the hot path. This + is where provider-outage resilience comes from. +4. 
Otherwise dispatch to `InfisicalProvider.lookup()`, which resolves + project/auth/token and calls `GET /api/v4/secrets/{key}?projectId=...&environment=...&secretPath=...` -7. Return the secret value bytes to Podman +5. If the provider lookup succeeds after a cache miss, insert the value into + the in-memory cache and re-encrypt `cache.enc` so future lookups do not + have to touch Infisical again. +6. Return the value bytes to Podman. -The secret value is fetched live from Infisical on every container start. -If Infisical is down, the lookup fails and the container won't start. +With the cache disabled, every container start is a live round trip — if +Infisical is down, the lookup fails and the container will not start. +With the cache enabled and populated, container starts continue to work +for as long as the in-memory dict has the entry, regardless of Infisical +availability. ## Token Caching @@ -178,7 +198,7 @@ providers: tls: certificates: web: - project: homelab + project: myproject profile_id: "profile-uuid" common_name: "web.example.com" ttl: "90d" @@ -209,13 +229,13 @@ Migrate existing secrets into Infisical from external sources: ```bash # From a .env file -psi infisical import env-file .env --project homelab --path /myapp +psi infisical import env-file .env --project myproject --path /myapp # From existing Podman secrets -psi infisical import podman-secret --all --project homelab --path /myapp +psi infisical import podman-secret --all --project myproject --path /myapp # From quadlet .container files -psi infisical import quadlet myapp.container --project homelab --path /myapp +psi infisical import quadlet myapp.container --project myproject --path /myapp ``` ## CLI Commands @@ -225,13 +245,13 @@ psi infisical import quadlet myapp.container --project homelab --path /myapp psi infisical login # Fetch secrets as environment variables -eval "$(psi infisical env --project homelab --path /myapp)" +eval "$(psi infisical env --project myproject --path /myapp)" # Fetch secrets as
KEY=VALUE (for env files) -psi infisical env --project homelab --path /myapp --format env > /run/app/env +psi infisical env --project myproject --path /myapp --format env > /run/app/env # Write a single secret to a file -psi infisical write-file DB_CERT /etc/ssl/db.pem --project homelab +psi infisical write-file DB_CERT /etc/ssl/db.pem --project myproject # TLS management psi infisical tls issue @@ -239,20 +259,27 @@ psi infisical tls renew psi infisical tls status # Import from external sources -psi infisical import env-file .env --project homelab --path /app -psi infisical import podman-secret --all --project homelab --path /app -psi infisical import quadlet app.container --project homelab --path /app +psi infisical import env-file .env --project myproject --path /app +psi infisical import podman-secret --all --project myproject --path /app +psi infisical import quadlet app.container --project myproject --path /app ``` ## Security Model | Component | Where | Protection | |---|---|---| -| Secret values | Infisical server only | Never written to disk — fetched live at lookup time | -| Coordinate mappings | Disk (`/var/lib/psi/`) | Only contain project/path/key — no secret data | +| Coordinate mappings | Disk (`/var/lib/psi/{SECRET_ID}`, mode `0600`) | Only contain project/path/key — no secret data | | Auth credentials | Config file (`/etc/psi/config.yaml`) | `client_id`/`client_secret` or IAM role (no static creds) | -| Auth tokens | Disk (cache) | Short-lived (default 300s), per-auth keyed | - -The key property: **secret values never touch the host filesystem**. They -flow from Infisical API → PSI serve process → Podman → container -environment, entirely in memory. 
+| Auth tokens | Disk (`/var/lib/psi/.token.{hash}.json`) | Short-lived (default 300s), per-auth keyed | +| Secret values (cache disabled) | Infisical server only, transient in `psi serve` memory during a lookup | Never written to disk | +| Secret values (cache enabled) | `state_dir/cache.enc`, encrypted by TPM or HSM backend | Opaque ciphertext — see [secret-cache.md](secret-cache.md) threat model | + +With the cache disabled, secret values never touch the host filesystem — +they flow from Infisical API → PSI serve process → Podman → container +environment, entirely in memory. The trade-off is that an Infisical +outage stops every PSI-backed container from starting. + +With the cache enabled, the plaintext trust boundary moves from +"Infisical server only" to "this host's TPM-sealed or HSM-wrapped +ciphertext on disk, plus `psi serve` process memory". See +[secret-cache.md](secret-cache.md) for a full analysis. diff --git a/docs/nitrokeyhsm-provider.md b/docs/nitrokeyhsm-provider.md index 883db21..2e09b2a 100644 --- a/docs/nitrokeyhsm-provider.md +++ b/docs/nitrokeyhsm-provider.md @@ -69,8 +69,13 @@ credential: ```ini [Container] -Image=ghcr.io/quickvm/psi:dev +ContainerName=psi-secrets +Image=ghcr.io/quickvm/psi:latest Exec=serve +SecurityLabelType=container_runtime_t +Notify=healthy +HealthCmd=curl -sf --unix-socket /run/psi/psi.sock http://localhost/healthz +HealthStartPeriod=60s Volume=pcscd-socket:/run/pcscd:rw Volume=/run/credentials/psi-secrets.service:/run/credentials:ro Environment=CREDENTIALS_DIRECTORY=/run/credentials @@ -79,6 +84,11 @@ Environment=CREDENTIALS_DIRECTORY=/run/credentials LoadCredentialEncrypted=hsm-pin ``` +`psi systemd install --mode container` emits this automatically when the +[secret cache](secret-cache.md) is configured with `backend: hsm`. The +`HealthStartPeriod=60s` gives HSM login and the (optional) cache decrypt +enough headroom: health check failures during the start period are not +counted, so the slow startup does not mark the container unhealthy.
+ ## Encrypting a Secret (Store) ```bash diff --git a/docs/secret-cache.md b/docs/secret-cache.md new file mode 100644 index 0000000..4609453 --- /dev/null +++ b/docs/secret-cache.md @@ -0,0 +1,433 @@ +# Secret Cache + +PSI's secret cache is a single encrypted file that lets `psi serve` resolve +lookups from memory when the upstream provider is unreachable. Without it, +every container start is a live HTTP round trip to Infisical — if Infisical +is down, every PSI-backed container that needs to start fails with +`no such secret`. + +The cache is opt-in: it only activates once `cache.backend` is configured. +If you do not set a backend, PSI behaves exactly as before and every lookup +goes live. + +## Why it exists + +A provider outage (a planned upgrade, a database failure, a network +partition, or a `podman-auto-update` cycle that kills an in-flight start) +stops every PSI-backed container on the host from restarting with +`no such secret`. This affects `podman-auto-update` restarts, healthcheck +recovery, manual restarts, and anything else that exercises the start path +while the provider is unavailable. Tuning `TimeoutStartSec` and adding +`Restart=on-failure` on individual services is necessary but does not +address the underlying structural issue: **PSI is a single point of +dependency for the entire fleet, with zero degradation tolerance**. + +The cache removes that hard dependency at container start. `psi-infisical-setup` eagerly fetches every +configured secret value at boot and writes the encrypted bundle to disk. +`psi serve` reads it once at startup, holds the decrypted dict in memory, +and serves lookups from RAM for the lifetime of the process. A provider +outage cannot stop containers from starting as long as the cache was +populated at least once and holds the secrets those containers need.
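The serve-from-memory behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not PSI's `psi/serve.py`: the `SecretCache` class, the `lookup` function, and the `persist` callback (a stand-in for "re-encrypt the whole dict and atomically replace `cache.enc`") are hypothetical names, and encryption is left out entirely.

```python
from typing import Callable, Dict, Optional


class SecretCache:
    """Hypothetical in-memory cache: decrypted once at startup, then served from RAM."""

    def __init__(self, entries: Dict[str, bytes],
                 persist: Callable[[Dict[str, bytes]], None]) -> None:
        self._entries = dict(entries)
        # Stand-in for "re-encrypt the whole dict and atomically replace cache.enc".
        self._persist = persist

    def get(self, secret_id: str) -> Optional[bytes]:
        return self._entries.get(secret_id)

    def put(self, secret_id: str, value: bytes) -> None:
        self._entries[secret_id] = value
        self._persist(dict(self._entries))

    def invalidate(self, secret_id: str) -> None:
        # Only rewrite the file if an entry was actually removed.
        if self._entries.pop(secret_id, None) is not None:
            self._persist(dict(self._entries))


def lookup(secret_id: str, cache: Optional[SecretCache],
           fetch: Callable[[str], bytes]) -> bytes:
    """Cache hit: return immediately. Miss: ask the provider, then cache the result."""
    if cache is not None:
        cached = cache.get(secret_id)
        if cached is not None:
            return cached  # no provider round trip, no crypto on the hot path
    value = fetch(secret_id)  # raises if the provider is unreachable
    if cache is not None:
        cache.put(secret_id, value)
    return value


def provider_down(secret_id: str) -> bytes:
    """Fake fetcher that simulates an upstream outage."""
    raise ConnectionError(f"provider unreachable while resolving {secret_id}")
```

With `provider_down` as the fetcher, a lookup for a cached ID still returns its value, while an ID that was never cached raises `ConnectionError`. That is the outage behavior the cache is meant to absorb: it only helps for entries that were populated before the provider went away.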
+ +## How it fits in + +``` +Setup (boot, or `psi cache refresh`): + psi-{provider}-setup.service + → queries provider (Infisical API) + → encrypts the bundle with the configured backend + → atomically writes state_dir/cache.enc + +Container start (runtime): + Podman → shell driver → curl /run/psi/psi.sock/lookup/{SECRET_ID} + → PSI checks in-memory dict (populated at serve startup) + → hit → return plaintext (no crypto, no I/O) + → miss → call provider, insert into dict, save file, return plaintext +``` + +Lookups never decrypt the cache file directly. The file is decrypted once +at `psi serve` startup and held as a `dict[str, bytes]` for the lifetime of +the process. Writes (setup refresh, cache miss fill, manual invalidation) +re-encrypt the entire dict and atomically replace `cache.enc`. + +## Threat model + +The cache introduces **secrets-at-rest** where there were none before. +Before enabling it, confirm that storing encrypted secret values on this +host fits your deployment: + +| Component | Where | Protection | +|---|---|---| +| Cache file ciphertext | Disk (`state_dir/cache.enc`, mode `0600`) | AES-256-GCM, key unavailable without TPM or HSM | +| AES key (TPM backend) | `systemd-creds` credential file, sealed to TPM2 PCR 7 | Only decryptable by this host's TPM while PCR 7 (Secure Boot state) matches its value at sealing time | +| AES key (HSM backend) | Ephemeral, generated per save, RSA-OAEP-wrapped with the HSM public key | Unwrapping requires the HSM private key, which never leaves the device | +| Plaintext dict | `psi serve` process memory | Unchanged from running without the cache: root on the host can already read it | + +What the cache **does not** defend against: + +- Root on the host reading `psi serve` process memory +- Root invoking the lookup API directly via `curl` +- A compromised workload container that has PSI secrets injected + +What the cache **does** defend against: + +- Offline disk theft, backup tarballs, filesystem snapshots leaking + plaintext — the cache file is opaque ciphertext +- Drive disposal or RMA +- Forensic disk captures +
+If the plaintext-in-process-memory trust boundary is unacceptable for your +deployment, do not enable the cache — accept the provider-is-a-SPOF +trade-off instead. + +## Configuration + +Add a `cache:` block at the top level of `/etc/psi/config.yaml`: + +```yaml +cache: + enabled: true # default: true + backend: hsm # 'tpm' or 'hsm'. Required for the cache to populate. + # path: /var/lib/psi/cache.enc # default: state_dir / cache.enc +``` + +If `cache.backend` is unset, `psi serve` logs a warning at startup and falls +back to uncached live lookups. Existing installs that upgrade but do +not set a backend continue to work exactly as before. + +## Backends + +### TPM2 via `systemd-creds` + +AES-256-GCM with a 32-byte key generated by `psi cache init --backend tpm`, +sealed to the host TPM2 via `systemd-creds encrypt --tpm2-pcrs=7`, and +delivered to `psi serve` through `LoadCredentialEncrypted=psi-cache-key`. + +**Use when:** the host has TPM2 and no Nitrokey HSM, or when you want +independent encryption for the cache that does not depend on HSM hardware. + +**Trade-offs:** the key is sealed to TPM2 PCR 7, which measures the +Secure Boot state. A change to that state (for example a Secure Boot +db/dbx update, which can arrive with a firmware update) or a TPM clear +will make the key undecryptable, and you will need to re-run +`psi cache init` to provision a new one. The next setup run then +repopulates the cache. + +### Nitrokey HSM via PKCS#11 + +Hybrid RSA-OAEP-SHA256 + AES-256-GCM using the existing Nitrokey HSM +envelope format from `psi/providers/nitrokeyhsm/crypto.py`. A fresh AES key +is generated per save, wrapped with the HSM public key (software-side), and +stored inline in the envelope. Decryption requires the HSM private key, +which never leaves the device. + +**Use when:** you already have a Nitrokey HSM configured for PSI bootstrap +secrets and want the same hardware root of trust for the cache.
+ +**Trade-offs:** `psi serve` startup opens a PKCS#11 session, logs in, and +performs one RSA-OAEP unwrap to recover the AES key — roughly 20–30 seconds +on a Nitrokey HSM 2. Subsequent saves use the cached public key and are +software-only, so steady-state write latency is milliseconds. See the +`HealthStartPeriod=60s` note in the deployment section. + +### Envelope format on disk + +``` +magic (4 bytes) "PSIC" +version (1 byte) 0x01 +backend_tag (1 byte) 0x01 = TPM, 0x02 = HSM +payload (variable) backend-specific +``` + +**TPM payload:** +``` +nonce(12) || AES-256-GCM_ciphertext_and_tag +``` + +**HSM payload:** +``` +key_len(2) || RSA-OAEP_wrapped_AES_key || nonce(12) || AES-256-GCM_ciphertext_and_tag +``` + +Inside the AES-GCM ciphertext is a JSON object: + +```json +{ + "version": 1, + "written_at": 1775626954, + "entries": { + "workload--SECRET_KEY": "" + } +} +``` + +The outer magic and backend tag are plaintext so `psi cache status` can +report what the file thinks it was written by without opening the backend. +Everything else — including the list of secret IDs and the entry count — is +inside the encrypted payload. + +## CLI reference + +### `psi cache init --backend {tpm,hsm}` + +Provisions the encryption key and writes an empty `cache.enc`. Required once +per host when first enabling the cache. Running it again is destructive: it +replaces the key (TPM) or overwrites `cache.enc` with an empty bundle (HSM), +so previously cached entries are lost until the next setup run repopulates +them. + +```bash +# TPM — generates 32 random bytes, seals via systemd-creds encrypt, +# writes the sealed file to /etc/psi/cache.key +sudo psi cache init --backend tpm + +# HSM — no key to generate; the HSM already has its RSA keypair. +# Writes an empty cache.enc wrapped with the HSM public key. +sudo psi cache init --backend hsm +``` + +Running `psi cache init` without `--backend` prints the available backends +and exits non-zero by design — the backend choice is deliberately explicit.
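The envelope layout from the "Envelope format on disk" section can be framed and unframed with the standard library alone. This is a sketch that stops at the byte boundaries: it performs no AES-GCM or RSA-OAEP. Two details are assumptions not stated in this document: that `key_len` is an unsigned big-endian 16-bit integer, and the hypothetical helper names `pack_header`, `parse_header`, and `split_payload`.

```python
import struct
from typing import Dict, Tuple

MAGIC = b"PSIC"
VERSION = 0x01
TAG_TPM = 0x01
TAG_HSM = 0x02
NONCE_LEN = 12
GCM_TAG_LEN = 16  # AES-GCM authentication tag appended to the ciphertext

# magic (4 bytes) | version (1 byte) | backend_tag (1 byte) -> 6-byte plaintext header
HEADER = struct.Struct(">4sBB")


def pack_header(backend_tag: int) -> bytes:
    return HEADER.pack(MAGIC, VERSION, backend_tag)


def parse_header(blob: bytes) -> Tuple[int, int, bytes]:
    """Return (version, backend_tag, payload); raise ValueError on a bad header."""
    if len(blob) < HEADER.size:
        raise ValueError("truncated envelope header")
    magic, version, tag = HEADER.unpack_from(blob)
    if magic != MAGIC:
        raise ValueError("not a PSI cache file")
    if version != VERSION:
        raise ValueError(f"unsupported envelope version {version}")
    if tag not in (TAG_TPM, TAG_HSM):
        raise ValueError(f"unknown backend tag {tag:#04x}")
    return version, tag, blob[HEADER.size:]


def split_payload(tag: int, payload: bytes) -> Dict[str, bytes]:
    """Split a backend payload into wrapped key (HSM only), nonce, and ciphertext+tag."""
    wrapped_key = b""
    if tag == TAG_HSM:
        if len(payload) < 2:
            raise ValueError("truncated key_len")
        (key_len,) = struct.unpack_from(">H", payload)  # assumed big-endian
        wrapped_key = payload[2:2 + key_len]
        if len(wrapped_key) != key_len:
            raise ValueError("truncated wrapped AES key")
        payload = payload[2 + key_len:]
    if len(payload) < NONCE_LEN + GCM_TAG_LEN:
        raise ValueError("payload too short for nonce and GCM tag")
    return {
        "wrapped_key": wrapped_key,
        "nonce": payload[:NONCE_LEN],
        "ciphertext_and_tag": payload[NONCE_LEN:],
    }
```

Because only the 6-byte header is plaintext, a status check can report the backend from `parse_header` alone without touching any key; `split_payload` is only needed on the decrypt path.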
+ +### `psi cache status` + +Prints config, file metadata, and the on-disk backend tag without opening +any backend. No HSM session, no TPM unseal, no decrypt. Sub-second on any +host. + +```text +Cache path: /var/lib/psi/cache.enc +Enabled: True +Backend: hsm +File size: 35894 bytes +Last written: 2026-04-08T05:52:33.478981+00:00 +On-disk tag: hsm (version 1) +Entries: not counted (pass --verify to decrypt) +``` + +### `psi cache status --verify` + +Same as above plus entry count. Requires the backend to be reachable — opens +a PKCS#11 session for HSM or reads the sealed credential for TPM. On HSM +this takes 20–30 seconds end to end. Use when you want to confirm the cache +actually decrypts and how many entries it holds. + +### `psi cache refresh` + +Re-runs `psi setup` to pull fresh secret values from every configured +provider and atomically replace the cache. Use after rotating a secret in +Infisical. + +### `psi cache invalidate ` + +Drops a single entry from the cache and persists the change. Next lookup +for that ID will fall through to the provider and cache the new value on +return. 
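`psi cache refresh` and `psi cache invalidate` both finish by rewriting the whole cache file. The module map notes that `cache.py` uses `write_bytes_secure` from `files.py` for atomic writes; its signature is not shown here, so `write_bytes_atomic` below is a hypothetical stand-in illustrating the usual pattern: write a temporary file in the same directory, fsync it, then rename it over the target so a crash never leaves a half-written `cache.enc`.

```python
import contextlib
import os
import tempfile


def write_bytes_atomic(path: str, data: bytes, mode: int = 0o600) -> None:
    """Hypothetical stand-in for files.write_bytes_secure: all-or-nothing replace."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the destination directory so os.replace is a
    # same-filesystem rename, which POSIX guarantees to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".cache.", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            os.fchmod(f.fileno(), mode)  # mkstemp already uses 0600; be explicit
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # data reaches disk before the rename
        os.replace(tmp_path, path)
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_path)
        raise
    # Persist the rename itself by syncing the directory entry.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Writing the temporary file next to the target (rather than in `/tmp`) matters: `os.replace` across filesystems fails with an error instead of performing an atomic rename.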
+ +## Deployment walkthroughs + +### Native mode + TPM backend + +One-time provisioning: + +```bash +sudo psi cache init --backend tpm +# Writes /etc/psi/cache.key (TPM-sealed) and /var/lib/psi/cache.enc (empty) + +sudo psi systemd install --mode native +# Regenerates psi-secrets.service with LoadCredentialEncrypted=psi-cache-key +# and StateDirectory=psi + +sudo systemctl daemon-reload +sudo systemctl restart psi-secrets.service psi-infisical-setup.service +sudo psi cache status +``` + +### Container mode + TPM backend + +One-time provisioning: + +```bash +# Provision the sealed key on the host (systemd-creds is a host binary) +sudo psi cache init --backend tpm + +# Regenerate quadlets inside a one-shot psi container +sudo podman run --rm \ + --network host \ + --security-opt label=type:container_runtime_t \ + -v /etc/psi:/etc/psi:ro \ + -v /etc/containers/systemd:/etc/containers/systemd:Z \ + -v /run/podman/podman.sock:/run/podman/podman.sock \ + -v /run/dbus/system_bus_socket:/run/dbus/system_bus_socket \ + -v /etc/pki/tls/certs/ca-bundle.crt:/etc/ssl/certs/ca-certificates.crt:ro \ + -e SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt \ + ghcr.io/quickvm/psi:latest \ + systemd install --mode container --image ghcr.io/quickvm/psi:latest + +sudo systemctl daemon-reload +sudo systemctl restart psi-secrets.service psi-infisical-setup.service +sudo podman exec -i psi-secrets psi cache status +``` + +### Container mode + HSM backend + +Prereq: the Nitrokey HSM provider is already configured and working +(`psi nitrokeyhsm preflight` passes, `hsm-pin` is available via +`LoadCredentialEncrypted`, pcscd sidecar is running). 
+ +Add the cache config to `/etc/psi/config.yaml`: + +```yaml +cache: + enabled: true + backend: hsm +``` + +Regenerate quadlets inside the psi container so both `psi-secrets.container` +and `psi-{provider}-setup.container` pick up the HSM wiring +(pcscd socket mount, `LoadCredentialEncrypted=hsm-pin`, `After=pcscd.service`): + +```bash +sudo podman run --rm \ + --network host \ + --security-opt label=type:container_runtime_t \ + -v /etc/psi:/etc/psi:ro \ + -v /etc/containers/systemd:/etc/containers/systemd:Z \ + -v /run/podman/podman.sock:/run/podman/podman.sock \ + -v /run/dbus/system_bus_socket:/run/dbus/system_bus_socket \ + -v /etc/pki/tls/certs/ca-bundle.crt:/etc/ssl/certs/ca-certificates.crt:ro \ + -e SSL_CERT_FILE=/etc/ssl/certs/ca-certificates.crt \ + ghcr.io/quickvm/psi:latest \ + systemd install --mode container --image ghcr.io/quickvm/psi:latest + +sudo systemctl daemon-reload +sudo systemctl restart psi-secrets.service psi-infisical-setup.service +``` + +`psi cache init` is not strictly required on the HSM backend — `psi-infisical-setup` +creates the encrypted cache file on its first successful run. Running it +explicitly is a useful smoke test that confirms the HSM session, PIN, and +file permissions are all working before you commit to the new config. + +Verify: + +```bash +# Fast status — sub-second +sudo podman exec -i psi-secrets psi cache status + +# Full verify — opens HSM session, decrypts, counts entries +sudo podman exec -i psi-secrets psi cache status --verify + +# Check setup log for the populate path +sudo journalctl -u psi-infisical-setup.service --since '5 minutes ago' \ + | grep -E 'Writing.*cache|unavailable|WARNING' +``` + +You want to see `Writing N entries to secret cache` in the setup log. 
If +you instead see `Secret cache backend hsm unavailable during setup: No HSM +PIN found`, the setup quadlet is missing the HSM wiring — either your +config did not enable the cache when the quadlets were regenerated, or you +are running an older psi image. + +### Startup timing (HSM) + +On the HSM backend, `psi-secrets.service` takes roughly 20–30 seconds to +reach `active`: + +1. PKCS#11 module load, slot enumeration, session open +2. HSM login with the resolved PIN +3. Fetch the RSA public key from the HSM +4. RSA-OAEP unwrap of the AES key to decrypt the cache +5. AES-GCM decrypt and JSON parse of the entries +6. Open the Unix socket and start serving + +The generated serve quadlet sets `HealthStartPeriod=60s` to give this chain +enough headroom. Podman sends `sd_notify(READY=1)` once the first +healthcheck succeeds against `/healthz`. + +## Cold-boot caveat + +The cache only protects against provider outages **after it has been +populated at least once**. The first `psi-infisical-setup` run on a new +install still requires the provider to be reachable. If you provision a +host during an active Infisical outage: + +1. `psi-infisical-setup` exhausts its retries and fails +2. `cache.enc` stays empty +3. Container starts fall through to live lookups, which also fail + +Options: + +- Wait for the provider to come back and re-run setup: + `sudo systemctl restart psi-infisical-setup.service` +- Restore `cache.enc` from a backup if you have one that was written with + the same TPM or HSM key + +This is a trade-off of the push-at-rotation-time design — cold-boot +resilience depends on the provider being up during initial provisioning. 
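+
+The cold-boot failure sequence above, and the refresh guarantee under
+"Rotation", both follow from one rule: replace `cache.enc` only after a
+complete, successful fetch, and replace it atomically. A minimal sketch
+under stated assumptions: `populate`, `fetch_all`, and `encode` are
+hypothetical (`encode` stands in for building the encrypted envelope),
+and the retry count and exponential backoff delays are illustrative, not
+psi's actual setup backoff.
+
+```python
+import os
+import tempfile
+import time
+from typing import Callable, Dict
+
+
+def atomic_write(path: str, data: bytes) -> None:
+    """Write to a temp file in the same directory, then rename it over path."""
+    directory = os.path.dirname(os.path.abspath(path))
+    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".cache-", suffix=".tmp")
+    try:
+        with os.fdopen(fd, "wb") as f:
+            f.write(data)
+            f.flush()
+            os.fsync(f.fileno())
+        # rename(2) is atomic on POSIX; same directory means same filesystem.
+        # A production version would also fsync the directory afterwards.
+        os.replace(tmp, path)
+    except BaseException:
+        os.unlink(tmp)
+        raise
+
+
+def populate(
+    path: str,
+    fetch_all: Callable[[], Dict[str, str]],
+    encode: Callable[[Dict[str, str]], bytes],
+    attempts: int = 5,
+    base_delay: float = 2.0,
+    sleep: Callable[[float], None] = time.sleep,
+) -> bool:
+    """Fetch every secret, then swap the cache file; True if it was replaced."""
+    for attempt in range(attempts):
+        try:
+            entries = fetch_all()
+        except Exception:
+            if attempt < attempts - 1:
+                sleep(base_delay * 2 ** attempt)  # exponential backoff
+            continue
+        atomic_write(path, encode(entries))
+        return True
+    return False  # every attempt failed; the existing file was never touched
+```
+
+With the provider down, `populate` returns `False` and leaves whatever is
+on disk (an empty cache or no file at all) untouched. On a host with a
+populated cache the old file survives byte for byte, because the temporary
+file is renamed into place only after `fetch_all` and `encode` both succeed.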
+ +## Rotation
+
+```bash
+# Rotate a secret in Infisical, then refresh the entire cache:
+sudo psi cache refresh
+
+# Or invalidate a single entry and let the next lookup pull it fresh:
+sudo psi cache invalidate myapp--DATABASE_URL
+```
+
+`psi cache refresh` re-runs the same code path as
+`psi-infisical-setup.service`, pulls every configured workload secret,
+encrypts the bundle, and atomically replaces `cache.enc`. Running
+`refresh` during an Infisical outage is safe — it will retry via the
+existing setup backoff, and on failure it leaves the old cache untouched.
+
+## Troubleshooting
+
+### Setup says `Secret cache backend hsm unavailable during setup: No HSM PIN found`
+
+The `psi-{provider}-setup.container` quadlet is missing the HSM wiring.
+Fix: regenerate the quadlets from the current psi image and daemon-reload.
+The generator propagates HSM mounts automatically when `cache.backend: hsm`
+is set in config.
+
+```bash
+# Regenerate with the current image (full podman run flags as in the
+# container-mode walkthrough above), then reload
+sudo podman run --rm ... ghcr.io/quickvm/psi:latest systemd install --mode container ...
+sudo systemctl daemon-reload
+sudo systemctl restart psi-infisical-setup.service
+```
+
+### `psi serve` logs `Secret cache is enabled but no backend is configured`
+
+`cache.enabled: true` is set (the default) but `cache.backend` is not. Set
+`cache.backend` to `tpm` or `hsm` in config, run `psi cache init` with the
+matching `--backend`, and restart the serve unit.
+
+### `psi cache status --verify` reports zero entries after a successful setup
+
+Check the setup log for `Writing N entries to secret cache`. If that line
+is absent, setup hit a warning about the cache backend before it could
+populate the cache.
The two common causes:
+
+- The setup container cannot reach the HSM (pcscd volume or PIN credential
+  not wired up — regenerate quadlets)
+- Infisical returned zero secrets for the workload (check the "Found N
+  secrets" log line earlier in the same setup run)
+
+### `cache.enc` exists but `psi cache status --verify` returns `Entries: unreadable`
+
+Either:
+
+- The backend configured in config no longer matches the backend tag on
+  disk (e.g. config changed from hsm to tpm but the file was written by
+  hsm). `psi cache status` will show the mismatch under `On-disk tag`.
+  Re-run `psi cache init` with the configured `--backend` to recreate the
+  file.
+- The TPM cache key was re-sealed to a new PCR set, or the HSM lost its
+  private key. On TPM, re-run `psi cache init --backend tpm` to seal a new
+  key. On HSM, `psi cache init` does not generate keys, so the HSM keypair
+  must be restored first; then re-run `psi cache init --backend hsm`.
+
+### Serve container hangs in `activating`
+
+`HealthStartPeriod=60s` gives HSM startup 60 seconds before the first
+healthcheck. If HSM startup routinely exceeds that on your hardware, the
+container will be killed and restarted in a loop. Confirm the issue with
+`sudo journalctl -u psi-secrets.service` and raise `HealthStartPeriod` in
+the generated quadlet (or use the TPM backend, which has much lower
+startup latency).
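+
+The two causes listed for `Entries: unreadable` can be told apart in a
+fixed order: compare the plaintext backend tag with the configured backend
+first, and only then try the backend. A minimal sketch, assuming a
+hypothetical `decrypt` callable that raises on failure;
+`diagnose_unreadable` is an illustration, not a psi subcommand.
+
+```python
+from typing import Callable
+
+# Tag values from the envelope format: byte 5 of the plaintext header.
+BACKEND_TAGS = {0x01: "tpm", 0x02: "hsm"}
+
+
+def diagnose_unreadable(configured: str, blob: bytes,
+                        decrypt: Callable[[bytes], object]) -> str:
+    """Classify why a cache file will not decrypt (hypothetical helper)."""
+    if len(blob) < 6 or blob[:4] != b"PSIC":
+        return "not a psi cache envelope"
+    tag = blob[5]
+    on_disk = BACKEND_TAGS.get(tag, f"unknown tag 0x{tag:02x}")
+    if on_disk != configured:
+        # Cause 1: config and file disagree; no backend call is needed.
+        return f"mismatch: config={configured}, file={on_disk}"
+    try:
+        decrypt(blob)  # only now touch the TPM or HSM
+    except Exception as exc:
+        # Cause 2: right backend, but the key is wrong or missing.
+        return f"key: {configured} backend failed to decrypt ({exc})"
+    return "ok"
+```
+
+Checking the tag first keeps the cheap, backend-free comparison ahead of a
+20–30 second HSM session, the same split that `psi cache status` and
+`psi cache status --verify` make.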