diff --git a/docs/specs/2026-05-29-vault-relay-server-design.md b/docs/specs/2026-05-29-vault-relay-server-design.md
new file mode 100644
index 0000000..d8afae7
--- /dev/null
+++ b/docs/specs/2026-05-29-vault-relay-server-design.md
@@ -0,0 +1,325 @@
+# Vault: relay-server side (Postgres + `/api/vault`)
+
+Status: spec, 2026-05-29
+Scope: **Astation relay-server only.** This is the server half of the "vault"
+feature. The atem-side CLI (`atem vault …`) is already merged (Atem repo PR #10)
+and calls the endpoints defined here.
+
+> This spec is self-contained: you do not need the Atem repo to start. The
+> atem-side counterpart is `Atem/designs/vault.md` (data model + auth model) and
+> `Atem/designs/vault-implementation-plan.md` (the client). This doc restates
+> everything the relay-server needs and is the authoritative server contract.
+
+## What we're building
+
+A **vault** is a small, durable, append-only, shared context store. Multiple
+atems (each driving its own coding agent) read/write a common vault so agents
+working toward one goal hand off notes/decisions without a human copy-pasting
+between terminals.
+
+The relay-server (this repo) hosts vaults over HTTP, backed by **Postgres**, and
+enforces access control. atems are the only clients; agents never talk to the
+vault directly.
+
+The atem CLI already shipped and issues exactly these calls (so the contract is
+fixed):
+
+```
+Auth header (all requests):  Authorization: session <session_id>
+Query       (all requests):  ?id=<client_id>   (atem instance_id)
+
+POST /api/vault                      {summary}                    -> {vault_id}
+GET  /api/vault                                                   -> [{vault_id, summary}]
+GET  /api/vault/<vault_id>           [?since=<seq>&history=true]  -> [VaultEntry]
+POST /api/vault/<vault_id>           {text, entry_id?}            -> {entry_no, version, seq}
+POST /api/vault/<vault_id>/summary   {text}                       -> {} (200 OK)
+
+VaultEntry = {seq, entry_no, version, kind, writer_id, content, created_at}
+```
+
+(The atem client types live in `Atem/src/vault_client.rs`: `VaultEntry`,
+`CreatedVault {vault_id}`, `VaultListItem {vault_id, summary}`,
+`WriteResult {entry_no, version, seq}`. Match these field names exactly or the
+client's deserialization breaks.)
+
+## Why this is non-trivial here
+
+The relay-server today is **fully in-memory**: every store is
+`RwLock<…>` (`SessionStore`, `RelayHub`, `RtcSessionStore`,
+`SessionVerifyCache`, `VoiceSessionStore`; see `src/main.rs:25-33`). `Cargo.toml`
+has **no database crate**. Vault content must survive restarts and be readable by
+an atem that was offline when it was written, so the vault is the **first
+persistent store** in this service. That is the bulk of the new work: add
+Postgres, a migration, a connection pool in `AppState`, and a `vault_routes.rs`
+module following the existing route conventions.
+
+## Data model (Postgres)
+
+Two tables. `vaults` holds mutable per-vault metadata plus a denormalized
+`writer_list` for fast authz. `vault_entries` is append-only and versioned.
+
+```sql
+CREATE TABLE vaults (
+    vault_id         TEXT PRIMARY KEY,              -- short, URL-safe, e.g. "v-7Kf3qD"
+    summary          TEXT NOT NULL DEFAULT '',      -- mutable description
+    work_session_id  TEXT NOT NULL,                 -- the work session this vault belongs to (see Auth)
+    created_by       TEXT NOT NULL,                 -- client_id of creator
+    created_at       TIMESTAMPTZ NOT NULL DEFAULT now(),
+    writer_list      TEXT[] NOT NULL DEFAULT '{}',  -- denormalized content-writer client_ids
+    next_entry_no    INT NOT NULL DEFAULT 1         -- per-vault entry-number allocator
+);
+
+CREATE TABLE vault_entries (
+    seq         BIGSERIAL PRIMARY KEY,       -- global write order (also the --since cursor)
+    vault_id    TEXT NOT NULL REFERENCES vaults(vault_id),
+    entry_no    INT NOT NULL,                -- per-vault: 1,2,3 -> shown as e1, e2, e3
+    version     INT NOT NULL,                -- per-entry: 1,2,3 -> shown as v1, v2, v3
+    kind        TEXT NOT NULL,               -- 'content' | 'summary'
+    writer_id   TEXT NOT NULL,               -- client_id that wrote this row
+    content     TEXT NOT NULL,
+    created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
+    UNIQUE (vault_id, entry_no, version)
+);
+
+CREATE INDEX vault_entries_by_vault_seq ON vault_entries (vault_id, seq);
+```
+
+### Append vs. override (no separate "override" kind)
+
+- **Append** (`POST /api/vault/<vault_id>` with `{text}`, no `entry_id`):
+  allocate `entry_no = vaults.next_entry_no` then `next_entry_no += 1`; insert
+  with `version = 1`.
+- **Override / edit** (`POST /api/vault/<vault_id>` with `{text, entry_id: N}`):
+  keep `entry_no = N`; insert with
+  `version = max(version where entry_no = N) + 1`.
+
+Append-vs-override is fully derivable from `version` (v1 = first write, v2+ =
+edit), so there is **no** `override` kind. `kind ∈ {content, summary}` only.
+Both operations must be transactional (allocate `entry_no`/compute `version` and
+insert inside one transaction, for example after updating or locking the vault's
+row so writers to one vault serialize) so concurrent writers can't collide on
+`(vault_id, entry_no, version)`.
+
+### Render semantics (drives the read queries)
+
+- **Current view** (`GET /api/vault/<vault_id>` with no `history`): for each
+  `entry_no`, return only the row with the **highest `version`**, ordered by
+  `entry_no` ascending. SQL: `DISTINCT ON (entry_no) … ORDER BY entry_no,
+  version DESC` (or a window function).
+- **History** (`?history=true`): return **every** row ordered by `seq` ascending.
+- **Incremental** (`?since=<seq>`): only rows with `seq > <seq>`. This can
+  combine with either view above (history-since, or current view filtered). In
+  v1 of the server, `since` applies to the history query; the client uses it for
+  the `watch` cursor.
+
+The atem client renders these; the server just returns the rows as JSON arrays
+of `VaultEntry`.
+
+## Auth & work-session resolution
+
+This is the **one real design decision** for the server. Everything above is
+fixed by the client contract; this part depends on how relay sessions bind to a
+"work session."
+
+Two tokens arrive on every request:
+
+| Token | Source | Role |
+|-------|--------|------|
+| `session_id` (in `Authorization: session <session_id>`) | the atem↔Astation session | **authenticates**: proves the caller is a real, granted atem, and resolves its `work_session_id` |
+| `client_id` (in `?id=<client_id>`) | the atem's persistent `instance_id` (UUID) | **authorizes**: checked against the vault's `writer_list` |
+
+Authorization predicates (enforce server-side):
+
+```
+can_read(vault, caller):
+    caller.work_session_id == vault.work_session_id   -- in the same work session
+    OR caller.client_id = ANY(vault.writer_list)      -- past content-writer
+
+can_write(vault, caller):                             -- append / override content
+    caller.work_session_id == vault.work_session_id   -- in-session only
+```
+
+- In-session atems get full read + write.
+- Out-of-session atems get **read-only** access to content, and only if their
+  `client_id` is in `writer_list` (they contributed earlier, in a prior session).
+- `set-summary` uses the **read** predicate (summary is mutable + low-stakes; any
+  atem that can see the vault may update it).
+- `writer_list` gains the caller's `client_id` (deduplicated) only on **content
+  writes** (append/override), not on reads or set-summary.
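
The two predicates above are small enough to express as pure functions, which keeps them unit-testable without a database. The following is a minimal sketch in Rust, not the relay-server's actual code: `VaultMeta` and `Caller` are hypothetical names introduced for illustration, and only the fields `work_session_id`, `writer_list`, and `client_id` come from this spec.

```rust
/// Hypothetical projection of a `vaults` row: only the fields authz needs.
struct VaultMeta {
    work_session_id: String,
    writer_list: Vec<String>,
}

/// Hypothetical resolved caller: the session yields `work_session_id`,
/// the `?id=` query parameter yields `client_id`.
struct Caller {
    work_session_id: String,
    client_id: String,
}

/// Content writes (append / override) are allowed in-session only.
fn can_write(vault: &VaultMeta, caller: &Caller) -> bool {
    caller.work_session_id == vault.work_session_id
}

/// Reads are allowed in-session, or for a past content-writer.
fn can_read(vault: &VaultMeta, caller: &Caller) -> bool {
    can_write(vault, caller)
        || vault.writer_list.iter().any(|w| w == &caller.client_id)
}
```

Under this sketch a handler would return 403 when the relevant predicate is false, and the set-summary handler would call `can_read` rather than `can_write`.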
+
+### Resolving `work_session_id`: pick one (recommended: A)
+
+The "work session" is the set of atems collaborating with one Astation. In the
+relay's room model that is the room keyed by the astation_id (`code`), where
+multiple atems share one room (`PairRoom.atem_txs` is keyed by atem_id;
+`src/relay.rs:30-39`). So **`work_session_id` should resolve to the astation_id
+the session is bound to**, NOT to `session-{session_id}` (that would isolate each
+atem in its own room and break sharing).
+
+- **Option A (recommended): bind sessions to an astation_id, use that.**
+  Persist the astation_id on the relay `Session` when it's granted (today
+  `Session` stores `hostname` but not astation_id; `src/auth.rs:15-24`). Vault
+  auth looks up the session, reads its astation_id, and uses it as
+  `work_session_id`. Clean, no extra params, matches the room model. Requires a
+  small change to session creation/grant to capture astation_id.
+
+- **Option B: caller passes the work session explicitly + server verifies.**
+  Add `?work_session=` to vault requests and verify the session is
+  authorized for it (via `SessionVerifyCache`, `src/session_verify.rs`, which
+  already verifies sessions against Astation). No `Session` schema change, but it
+  adds a param the atem client does **not** send today, so it needs an atem-side
+  change too. Avoid unless A is infeasible.
+
+- **Option C (interim/testing only): `work_session_id = session_id`.**
+  Trivial, but each atem is its own work session → no cross-atem sharing. Only
+  acceptable as a first vertical slice to exercise the CRUD before wiring real
+  session→astation binding. Do not ship as the final behavior.
+
+**Session validation itself** reuses the existing machinery: validate the
+`session_id` the same way the WS `?session=` path does (`src/relay.rs:227-261`:
+session must exist and be `Granted`), and/or `SessionVerifyCache`
+(`src/session_verify.rs`) for cross-service verification. Return **401** for a
+missing/invalid session, **403** when the session is valid but `can_read`/
+`can_write` fails.
+
+## `atem_id` sanitizer change (required for non-ASCII ids)
+
+atem now generates ids that may contain non-ASCII (Chinese/Japanese/Korean
+hostnames) and percent-encodes them into the relay URL. The current sanitizer
+**does not percent-decode**, so an id that is still percent-encoded loses its
+`%` signs in the filter and comes out mangled (`src/relay.rs:303-308`):
+
+```rust
+// NOTE: the integer type passed to `gen` is elided in this excerpt; u32 is assumed.
+let atem_id = params.atem_id
+    .as_deref()
+    .map(|s| s.chars().filter(|c| c.is_alphanumeric() || *c == '-' || *c == '_' || *c == '.').collect::<String>())
+    .filter(|s| !s.is_empty())
+    .unwrap_or_else(|| format!("atem-{:x}", rand::thread_rng().gen::<u32>()));
+```
+
+Update it to **percent-decode first**, then keep non-ASCII while restricting
+ASCII to `[A-Za-z0-9-]` (matching atem's own rule in
+`Atem/designs/atem-identity.md`):
+
+```rust
+// Decode first; a malformed encoding falls back to the empty string.
+let decoded = urlencoding::decode(params.atem_id.as_deref().unwrap_or(""))
+    .map(|c| c.into_owned())
+    .unwrap_or_default();
+// Keep every non-ASCII char; restrict ASCII to alphanumerics and '-'.
+let atem_id: String = decoded
+    .chars()
+    .filter(|c| !c.is_ascii() || c.is_ascii_alphanumeric() || *c == '-')
+    .collect();
+let atem_id = if atem_id.is_empty() {
+    // NOTE: integer type elided in this excerpt; u32 is assumed.
+    format!("atem-{:x}", rand::thread_rng().gen::<u32>())
+} else {
+    atem_id
+};
+```
+
+(`urlencoding` is already a dependency. `is_alphanumeric()` previously also
+allowed non-ASCII alphanumerics; the new filter is explicit about the rule. Note
+that `_` and `.`, which the old filter accepted, are dropped by the new one.)
+
+This is independent of the vault tables and can land as its own small commit.
+
+## Implementation tasks
+
+Follow the existing relay-server conventions: a `vault_routes.rs` module with
+handlers returning `Result<Json<…>, (StatusCode, Json<…>)>` (see
+`src/rtc_session.rs:432-449`), errors as `(StatusCode, Json(json!({"error": …})))`,
+request types deriving `Deserialize, Validate`, inline `#[cfg(test)] mod tests`
+with `#[tokio::test]` + `tower::ServiceExt::oneshot`.
+
+### Task 1: Add Postgres + a `VaultStore`
+- `Cargo.toml`: add `sqlx = { version = "0.7", features = ["runtime-tokio", "postgres", "macros", "chrono", "uuid"] }` (sqlx fits the existing tokio/async style; alternatives `tokio-postgres` or `deadpool-postgres` are acceptable).
+- Add a migration (`migrations/0001_vault.sql` with the two `CREATE TABLE`s and the index above) and run it at startup (sqlx `migrate!`), or document `sqlx migrate run`.
+- `src/vault_store.rs`: a `VaultStore { pool: sqlx::PgPool }` with methods:
+  - `create_vault(work_session_id, created_by, summary) -> vault_id`
+  - `list_readable(work_session_id, client_id) -> Vec<VaultListItem>`
+  - `read(vault_id, since: Option<i64>, history: bool) -> Vec<VaultEntry>`
+  - `append(vault_id, writer_id, text) -> WriteResult` (txn: allocate entry_no, version=1)
+  - `override_entry(vault_id, entry_no, writer_id, text) -> WriteResult` (txn: version=max+1)
+  - `set_summary(vault_id, text)`
+  - `get_meta(vault_id) -> {work_session_id, writer_list}` (for authz)
+  - `add_writer(vault_id, client_id)` (dedup; on content writes)
+- Connection string from env (`DATABASE_URL`); document in README + `docker-compose`.
+
+### Task 2: Wire `VaultStore` into `AppState`
+- `src/main.rs:25-33`: add `pub vault: VaultStore` to `AppState`.
+- Build the pool in `main()` before constructing `AppState`; fail fast if
+  `DATABASE_URL` is unset/unreachable.
+
+### Task 3: Session→work_session resolution (Option A)
+- Extend the relay `Session` (`src/auth.rs:15-24`) + session create/grant to
+  capture the astation_id the session is bound to.
+- Add a helper `resolve_caller(headers, query) -> Result<…>`
+  that parses `Authorization: session <session_id>` + `?id=<client_id>`,
+  validates the session (Granted), and returns the bound astation_id as
+  `work_session_id`. 401 on bad session.
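
The parsing half of the Task 3 helper can be sketched with the standard library alone. This is an illustration under stated assumptions, not the relay-server's implementation: `parse_session_header`, the `Caller` struct, and the `HashMap` standing in for the real session store are all hypothetical, the error is a bare status code instead of the `(StatusCode, Json<…>)` tuple the handlers use, and case-insensitive matching of the `session` scheme is an assumption the spec does not state.

```rust
use std::collections::HashMap;

/// Hypothetical resolved caller; the spec names only these two fields.
#[derive(Debug, PartialEq)]
struct Caller {
    work_session_id: String,
    client_id: String,
}

/// Extracts the id from an `Authorization` value shaped `session <session_id>`.
/// Returns None for any other scheme, or a missing or multi-token id.
fn parse_session_header(value: &str) -> Option<&str> {
    let mut parts = value.trim().splitn(2, char::is_whitespace);
    let scheme = parts.next()?;
    let id = parts.next()?.trim();
    // Assumption: the scheme is matched case-insensitively.
    if scheme.eq_ignore_ascii_case("session")
        && !id.is_empty()
        && !id.contains(char::is_whitespace)
    {
        Some(id)
    } else {
        None
    }
}

/// `granted` stands in for the session store: it maps a Granted session_id
/// to the astation_id it is bound to (Option A). Err(401) on a bad session.
fn resolve_caller(
    auth_header: Option<&str>,
    client_id: &str,
    granted: &HashMap<String, String>,
) -> Result<Caller, u16> {
    let session_id = auth_header.and_then(parse_session_header).ok_or(401u16)?;
    let astation_id = granted.get(session_id).ok_or(401u16)?;
    Ok(Caller {
        work_session_id: astation_id.clone(),
        client_id: client_id.to_string(),
    })
}
```

Keeping the header parsing separate from the session lookup lets the 401 paths (missing header, wrong scheme, unknown session) be tested without a running relay.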
+
+### Task 4: `vault_routes.rs` handlers + route wiring
+- Implement the 5 endpoints exactly per the contract above, each calling
+  `resolve_caller` then the `VaultStore`, enforcing `can_read`/`can_write`
+  (403 on failure), and returning the documented JSON.
+- `POST /api/vault/<vault_id>`: branch on `entry_id` present → `override_entry`
+  else `append`; then `add_writer`.
+- Register routes in `main.rs` alongside the others (`main.rs:153-229`).
+
+### Task 5: `atem_id` sanitizer
+- Apply the `src/relay.rs:303-308` change above. Add a unit test with a
+  percent-encoded CJK `atem_id` asserting it round-trips (decoded, non-ASCII
+  preserved).
+
+### Task 6: Tests
+- Handler tests with `oneshot` for: create→read roundtrip, append then
+  override (current view shows v2, history shows v1+v2), `--since` filtering,
+  authz (in-session read+write; out-of-session past-writer read-only → 403 on
+  write; stranger → 403 on read), set-summary.
+- Store tests against a test Postgres. Options: `sqlx::test` fixtures, a
+  testcontainers Postgres, or abstract `VaultStore` behind a trait with an
+  in-memory impl for handler tests + a thin live-DB integration test. Pick one
+  and note it; don't block all tests on a live DB.
+
+### Task 7: Notification (v1.5, optional in this pass)
+- After a committed content write, broadcast `vault-updated {vault_id, seq}` to
+  the relay room for that `work_session_id` (the room keyed by astation_id), so
+  watching atems re-read. The relay already broadcasts Astation→all
+  (`src/relay.rs:314-318`); reuse that path. The atem `watch` subscriber is a
+  separate atem-side v1.5 task; the server can land the broadcast now or defer.
+
+## Acceptance test
+
+1. `DATABASE_URL` set; migrations applied; relay-server running.
+2. Two atems paired to the same Astation (same room/work session).
+3. atem A: `atem vault new --summary "auth refactor"` → prints `v-XXXX`.
+4. atem A: `atem vault write --vault-id v-XXXX --text "decided: JWT in cookie"`.
+5. atem B: `atem vault read --vault-id v-XXXX` → sees the entry.
+6. atem A: `atem vault write --vault-id v-XXXX --entry-id 1 --text "JWT, 15m exp"`.
+7. atem B: `atem vault read --vault-id v-XXXX --history` → shows `e1 v1` + `e1 v2`.
+8. An atem **not** in the work session and **not** in `writer_list` → `read`
+   returns 403. An out-of-session past-writer → `read` OK, `write` → 403.
+
+## Build / deploy
+
+```bash
+# relay-server dir
+cargo build
+cargo test                          # see Task 6 re: DB-backed tests
+DATABASE_URL=postgres://… cargo run
+```
+
+- Add a `postgres` service to `docker-compose.yml` / `docker-compose.dev.yml`
+  and pass `DATABASE_URL` to the relay-server container.
+- Document `DATABASE_URL` alongside the existing env vars (`CORS_ORIGIN`,
+  `PORT`, `PUBLIC_BASE_URL`, …).
+
+## Open questions
+
+1. **Session→astation binding (Task 3).** Confirm the relay `Session` can carry
+   the astation_id (Option A). If sessions are minted without knowing the bound
+   astation_id, fall back to Option B (explicit `?work_session=` + verify), but
+   that needs a matching atem-side change.
+2. **vault_id generation.** Short, URL-safe, not trivially enumerable (authz is
+   server-enforced so secrecy isn't load-bearing, but don't use sequential ids).
+   Suggest `v-` + 8–12 base62 random chars.
+3. **Summary history.** v1 keeps `summary` as a mutable column only. The schema
+   supports logging `kind='summary'` rows later if history is wanted.
+4. **`writer_list` growth.** Append-only; fine for small teams. Revisit GC if a
+   vault accrues many one-off writers.
+5. **DB-backed test strategy (Task 6).** Decide: `sqlx::test` vs testcontainers
+   vs store-trait + in-memory. Affects CI.
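
To make the "store-trait + in-memory" option more concrete, here is a rough sketch of what an in-memory stand-in could look like. It models a single vault and mirrors the append, override, current-view, history, and `since` rules from this spec. Everything in it is hypothetical (`MemVault`, `Entry`, the method names, and returning `None` when overriding an entry that does not exist); in Postgres `seq` is global across vaults, whereas here it is local to the one vault being modelled.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory row, a subset of the `vault_entries` columns.
#[derive(Clone, Debug, PartialEq)]
struct Entry {
    seq: u64,
    entry_no: u32,
    version: u32,
    writer_id: String,
    content: String,
}

/// Hypothetical single-vault store for handler tests (no database).
struct MemVault {
    next_seq: u64,
    next_entry_no: u32,
    rows: Vec<Entry>,
}

impl MemVault {
    fn new() -> Self {
        MemVault { next_seq: 1, next_entry_no: 1, rows: Vec::new() }
    }

    /// Append: allocate the next entry_no and write it as version 1.
    fn append(&mut self, writer_id: &str, text: &str) -> Entry {
        let entry_no = self.next_entry_no;
        self.next_entry_no += 1;
        self.push(entry_no, 1, writer_id, text)
    }

    /// Override: keep entry_no, write max(version) + 1.
    /// Assumption: overriding an unknown entry yields None.
    fn override_entry(&mut self, entry_no: u32, writer_id: &str, text: &str) -> Option<Entry> {
        let max = self
            .rows
            .iter()
            .filter(|r| r.entry_no == entry_no)
            .map(|r| r.version)
            .max()?;
        Some(self.push(entry_no, max + 1, writer_id, text))
    }

    fn push(&mut self, entry_no: u32, version: u32, writer_id: &str, text: &str) -> Entry {
        let row = Entry {
            seq: self.next_seq,
            entry_no,
            version,
            writer_id: writer_id.to_string(),
            content: text.to_string(),
        };
        self.next_seq += 1;
        self.rows.push(row.clone());
        row
    }

    /// History: every row in seq order, optionally only rows with seq > since.
    fn history(&self, since: Option<u64>) -> Vec<Entry> {
        self.rows
            .iter()
            .filter(|r| since.map_or(true, |s| r.seq > s))
            .cloned()
            .collect()
    }

    /// Current view: the highest version per entry_no, ordered by entry_no.
    fn current(&self) -> Vec<Entry> {
        let mut latest: BTreeMap<u32, &Entry> = BTreeMap::new();
        for r in &self.rows {
            // Rows are stored in seq order and versions only grow,
            // so a later row for the same entry_no replaces the earlier one.
            latest.insert(r.entry_no, r);
        }
        latest.into_values().cloned().collect()
    }
}
```

A store like this would let the handler tests from Task 6 cover steps 4 to 7 of the acceptance test without Postgres, leaving the transactional behaviour to a thin live-DB integration test.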