feat(pitr): anchor commit-timestamp after each backup by paulocsanz · Pull Request #75 · railwayapp-templates/postgres-ssl

paulocsanz · 2026-05-11T21:46:48Z

Summary

Emits one transactional pg_logical_emit_message(true, 'rwy_pitr_anchor', '') from the backup-watcher right after each successful pgbackrest backup. Produces an XLOG_XACT_COMMIT record with a tracked commit timestamp, populating pg_commit_ts/ and (on the next checkpoint) newest_commit_ts_xid in pg_control.

Why: the PITR picker (mono/usePgbackrestProbe) reads GREATEST(pg_last_committed_xact(), pg_xact_commit_timestamp(newest_commit_ts_xid from pg_control_checkpoint())) as the only safe ceiling for recovery_target_time. On a brand-new cluster with a base backup but zero user commits, both return NULL and the picker can't anchor a restore: any target makes recovery FATAL with "recovery ended before configured recovery target was reached", because recovery_target_time only stops at commit records.

Anchoring once per backup eliminates that dead-end. Replaces the prior UX of telling users to manually CREATE TABLE _warmup(); DROP TABLE _warmup;.

Idempotent: every backup re-fires the emit; once user commits exist it's invisible noise (one trivial transaction, no table side effect). Failure is non-fatal — psql errors are logged and the next backup retries.
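
A minimal sketch of what the post-backup emit could look like in the watcher. The function name `emit_pitr_anchor` and the psql flags are assumptions for illustration, not taken from the actual script; connection settings are assumed to come from the usual PG* environment variables.

```shell
#!/usr/bin/env bash
# Hypothetical helper; the real watcher's function name and psql flags may differ.
emit_pitr_anchor() {
  # Transactional (first argument true) logical message: produces a commit
  # record without creating or touching any table.
  if ! psql -X -q -At -v ON_ERROR_STOP=1 \
      -c "SELECT pg_logical_emit_message(true, 'rwy_pitr_anchor', '')" >/dev/null; then
    echo "pitr anchor emit failed; will retry after the next backup" >&2
  fi
  # Non-fatal by design: always return 0 so the watcher loop keeps running.
  return 0
}
```

Calling `emit_pitr_anchor` right after the backup command reports success would match the behavior described above: a failed emit is logged and the next backup fires it again.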

Test plan

bash -n pgbackrest-backup-watcher.sh syntax-clean.
Manual: fresh cluster, PITR enabled, no user writes → wait for initial full backup → confirm SELECT (pg_last_committed_xact()).timestamp returns a non-NULL value within ~30s of backup completion.
Manual: existing cluster with active writes → backup completes, no observable behavior change (anchor is one extra trivial commit, lost in the noise).
Manual: restored cluster on a noCommitsYet source → with this change, the picker resolves on the next 30s tick and recovery_target_time restores work normally.
Regression: archive heartbeat (emit_wal_heartbeat) still fires independently.
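
The fresh-cluster check in the manual steps could be scripted roughly as below. `anchor_present` is a hypothetical helper name; the sketch assumes psql can connect via the environment and that track_commit_timestamp is enabled on the cluster.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeeds once the cluster has a tracked commit timestamp.
anchor_present() {
  local ts
  # -At prints a bare value; a NULL timestamp comes back as an empty string.
  ts=$(psql -X -At -c "SELECT (pg_last_committed_xact()).timestamp") || return 2
  [ -n "$ts" ]
}
```

Exit status 0 means a commit timestamp exists, 1 means the query returned NULL (no anchor yet), and 2 means psql itself failed.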

Emit one transactional `pg_logical_emit_message(true, 'rwy_pitr_anchor', '')` immediately after each successful pgbackrest backup. Produces an XLOG_XACT_COMMIT record with a tracked commit timestamp, populating `pg_commit_ts/` and `newest_commit_ts_xid` so the PITR picker (mono/usePgbackrestProbe) has a ceiling to clamp `recovery_target_time` against. Eliminates the noCommitsYet dead-end: a fresh cluster with a base backup no longer leaves the picker without an anchor, and the prior UX of telling users to "connect and CREATE TABLE _warmup" goes away. Idempotent — repeated on every backup; invisible noise once user commits exist. Failure is non-fatal; next backup retries.

The post-backup anchor commit (this PR's primary change) deliberately gives `recovery_target_time` a stop record on previously-idle sources — which is exactly what the test was asserting must NOT happen. With the anchor, recovery now lands cleanly on the watcher's emit, no FATAL. Drop the test and its `setup_idle_source` helper (no other callers). Update the file-level coverage map to point idleRestore coverage at `t_pitr_target_xid_routes_xid_through_stack` and to call out that target_time on a quiet source is now happy-path territory. The wrapper.sh comment block about `recovery_target_xid` winning over `_TIME` for exactness on idle sources is still accurate (the picker clamps to lastCommittedTxnAt for precision regardless of anchor behavior) — no change there.

…sleep The test waited a fixed 6s after the second `docker run -d`, then grepped the docker logs once for the "PITR replay staged" line. On slow CI runners the wrapper hadn't reached `configure_pgbackrest_recovery` within 6s (`docker run -d` returns at create-time, not at process-start, and the wrapper does cert checks + conf rendering before staging). The race flaked once and then failed deterministically on two consecutive runs. The staging line wasn't missing; it just hadn't been written yet. Poll the log for up to 30s instead. Same exit semantics; just stops flaking when the host is slow.
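
The poll-instead-of-sleep change can be sketched as follows. The helper name `wait_for_log_line`, its arguments, and the one-second interval are assumptions for illustration; the actual test may structure the loop differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: re-check `docker logs` once per second until the
# pattern appears or the attempt budget (default 30) runs out.
wait_for_log_line() {
  local container="$1"
  local pattern="$2"
  local timeout="${3:-30}"
  local i=0
  while [ "$i" -lt "$timeout" ]; do
    # docker logs replays both stdout and stderr of the container; merge them.
    if docker logs "$container" 2>&1 | grep -qF -- "$pattern"; then
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  return 1
}
```

A call such as `wait_for_log_line "$container_id" "PITR replay staged" 30 || exit 1` keeps the same pass/fail outcome as the single grep while tolerating a slow wrapper start; `container_id` here is a placeholder for whatever the test already captures from `docker run -d`.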

paulocsanz added 3 commits May 11, 2026 18:46

paulocsanz merged commit d37575b into main May 11, 2026
1 check passed

paulocsanz merged 3 commits into main from pc/pitr-anchor-after-backup
