PostgreSQL High Availability Template for Railway

This template deploys a production-ready, highly-available PostgreSQL cluster on Railway with automatic failover, read replicas, and transparent connection routing.

Features

3-node PostgreSQL cluster with streaming replication
Automatic failover in <10 seconds using Patroni
2 standby replicas for automatic failover and read scaling
HAProxy load balancer with separate read-write and read-only endpoints
Transparent routing - single endpoint for all database connections
Automatic recovery - failed nodes rejoin as replicas
Health monitoring - Patroni REST API for cluster status
Multi-version support - PostgreSQL 14, 15, 16, 17, and 18

Architecture

Application
    ↓
HAProxy ← Load balanced endpoint
    ├─ :5432 (read-write) → Primary only
    └─ :5433 (read-only)  → Replicas (round-robin)
    ↓
PostgreSQL Cluster
    ├─ postgres-1 (Leader)  ← Writes + Reads
    ├─ postgres-2 (Standby) ← Reads + Failover ready
    └─ postgres-3 (Standby) ← Reads + Failover ready
         ↓
    etcd (3 nodes) ← Distributed consensus

Services Deployed

postgres-1, postgres-2, postgres-3 - PostgreSQL with Patroni orchestration
etcd-1, etcd-2, etcd-3 - Distributed key-value store for leader election
haproxy - Load balancer with automatic primary detection

Total: 7 services

Point-in-time recovery (opt-in)

The postgres-patroni image ships with pgBackRest installed but dormant. When WAL_ARCHIVE_BUCKET is unset, the image behaves identically to a vanilla HA Postgres image — no archiving, no config mutation. When set, the current Patroni leader archives WAL continuously to S3-compatible storage in async mode (archive_mode=on); standbys carry the same config + binary so promotion instantly enables archiving with no config change.

The image reads a tool-agnostic WAL_ARCHIVE_* / WAL_RECOVER_FROM_* env contract. patroni-runner translates WAL_ARCHIVE_* to pgBackRest's native PGBACKREST_REPO1_S3_* so the main pgbackrest.conf's repo1 always = the cluster's own archive bucket. WAL_RECOVER_FROM_* is materialised in a separate /etc/pgbackrest/pgbackrest-recovery-source.conf (only that file references the source bucket) and is NEVER exported as env vars — pgBackRest's option resolution is command-line > env > config

default, so a global env export would silently override the --config we pass during recovery and could leak archive-push to the source bucket. With this isolation:

WAL_ARCHIVE_* only → standalone archiving cluster. Main conf's repo1 = archive bucket.
WAL_RECOVER_FROM_* only → cluster restoring from a source bucket; recovery-source conf has the source. After promote the cluster runs as plain non-archiving Postgres.
WAL_ARCHIVE_* + WAL_RECOVER_FROM_* → restored cluster with PITR re-enabled. Main conf's repo1 = own archive bucket (post-promote archive-push); recovery-source conf has source bucket (archive-get during replay only). Source and own bucket never appear in the same config file, so archive-push from the fork can never spray into the source bucket.

Operator-facing env contract:

Env var	Purpose
`WAL_ARCHIVE_BUCKET`	bucket name — gates archiving on this cluster
`WAL_ARCHIVE_ENDPOINT`	S3-compatible endpoint (e.g. `fly.storage.tigris.dev`)
`WAL_ARCHIVE_REGION`	bucket region
`WAL_ARCHIVE_KEY` / `WAL_ARCHIVE_SECRET`	bucket credentials
`WAL_ARCHIVE_PATH`	path prefix where archive-push writes (default `/pgbackrest`)
`WAL_RECOVER_FROM_BUCKET` / `_ENDPOINT` / `_REGION` / `_KEY` / `_SECRET` / `_PATH`	source-bucket coordinates on a PITR-restored cluster; `archive-get` reads source WAL from here during replay. Set by backboard on restore; not normally a manual knob.
`POSTGRES_RECOVERY_TARGET_TIME`	ISO 8601 timestamp; stages archive-recovery replay on next start
`POSTGRES_ARCHIVE_TIMEOUT`	seconds Postgres waits before forcing a WAL switch (default `60`)
`WAL_BACKUP_FULL_INTERVAL_HOURS`	image-owned full base-backup cadence (default `168` = weekly; `0` disables periodic fulls). Initial / gap-recovery fulls fire regardless.
`WAL_BACKUP_DIFF_INTERVAL_HOURS`	image-owned differential base-backup cadence (default `24`; `0` disables)
`WAL_BACKUP_RETENTION_FULL`	full backups kept by `pgbackrest expire` (default `4`)
`WAL_BACKUP_RETENTION_DIFF`	differentials kept by `pgbackrest expire` (default `14`)
`WAL_HEARTBEAT_DISABLED`	set to `1` to disable the idle-DB WAL heartbeat (advanced; reduces archive cost on quiet DBs at the price of stale PITR ceiling)

Image-level tuning knobs (pgBackRest-native, internal):

Env var	Purpose
`WAL_DROP_THRESHOLD_MB`	`pg_wal/` size at which the archive-push wrapper drops failing segments to keep Postgres running (default `500`). Outside the `PGBACKREST_` namespace because pgBackRest treats unknown `PGBACKREST_` vars as config options and warns about them on every push.
`PGBACKREST_ARCHIVE_PUSH_PROCESS_MAX`	parallel workers for `archive-push`. Default auto-sized as `clamp(cpus/8, 2, 8)`.
`PGBACKREST_ARCHIVE_GET_PROCESS_MAX`	parallel workers for `archive-get`. Default `1` (WAL replay is serial).
`PGBACKREST_BACKUP_PROCESS_MAX`	parallel workers for `backup`. Default auto-sized as `clamp(cpus/4, 1, 16)`.
`PGBACKREST_RESTORE_PROCESS_MAX`	parallel workers for `restore`. Default auto-sized as `clamp(cpus, 1, 32)`.

When WAL_ARCHIVE_BUCKET is set, patroni-runner writes archive_mode=on, archive_command='/usr/local/bin/pgbackrest-archive-push-wrapper.sh %p', and archive_timeout (default 60, override via POSTGRES_ARCHIVE_TIMEOUT) into the Patroni-generated cluster config, and renders /etc/pgbackrest/pgbackrest.conf with operator-policy defaults (archive-async=y, archive-push-queue-max=5GiB, per-command process-max sized off cgroup-detected vCPU, compress-type=zst, spool-path=$PGDATA/pgbackrest-spool so segments staged but not yet pushed survive container restarts).

Patroni propagates the same config to every node, but under archive_mode=on only the current leader fires archive_command. Standbys hold the pgBackRest binary and config so promotion instantly enables archiving with no config re-push. Residual failover RPO is archive_timeout (60s) plus failover-detection time; Patroni's promotion logic flushes any pending archive_status/.ready markers on the new leader to close most of that gap on planned switchover.

The never-halt guarantee comes from two orthogonal thresholds:

archive-push-queue-max=5GiB (image-baked) governs the spool. Trips on transient S3 stalls — async worker keeps retrying and most segments eventually land. Generous buffer to absorb multi-hour outages.
WAL_DROP_THRESHOLD_MB=500 (default) governs pg_wal/ when pgbackrest's foreground returns non-zero. Trips on hard failures (bad creds, deleted bucket) where retrying without operator intervention has zero chance of success. The archive_command is the wrapper script (not pgbackrest directly), which measures pg_wal/ on failure and drops segments past the threshold to keep Postgres up.

Either threshold tripping truncates the PITR window; the database keeps running. This is the explicit reason this image uses pgBackRest instead of wal-g — synchronous archive_command shapes halt Postgres on full pg_wal, requiring manual intervention to recover.

patroni-runner runs pgbackrest stanza-create automatically once Postgres becomes reachable — idempotent, runs from every node, gated on WAL_ARCHIVE_BUCKET being set (skipped in dual-repo mode so we never mutate the source's stanza).

Env-var changes on existing clusters: patroni-runner includes a one-shot DCS reconcile that runs after Patroni's REST API comes up. When WAL_ARCHIVE_BUCKET is set but DCS lacks archive params (or vice versa), the reconcile patches DCS via PATCH /config so the env-var intent is authoritative. Because archive_mode is PGC_POSTMASTER, applying the patch flags pending_restart on the node — operators must run patronictl restart <cluster> <node> (or trigger one redeploy per node through Railway) for archiving to actually start or stop. The dashboard PITR enable/disable flow handles the rolling restart automatically; raw env-var users need one extra restart per node.

PITR restore creates a new service

Per RFC, PITR restore creates a brand-new Postgres service in the project (typically a single-node SSL Postgres, not a fresh HA cluster); the source HA cluster stays online and untouched. The restored service's volume is populated from the source's snapshot, then booted with WAL_RECOVER_FROM_* pointing at the source's bucket and POSTGRES_RECOVERY_TARGET_TIME set — archive-get reads source WAL during replay. After promote, the restored service has no archive bucket of its own and runs as plain non-archiving Postgres until the operator opts in via the standard PITR-enable flow.

Source and restored services therefore never share a write path: there is no risk of the recovered timeline overwriting the source's ongoing WAL chain. The previous "mandatory repo-path divergence" guard and the .pgbackrest_source_path sentinel are gone.

For backwards compatibility, when POSTGRES_RECOVERY_TARGET_TIME is set on an HA primary volume directly, patroni-runner still creates recovery.signal and writes the matching recovery settings into postgresql.auto.conf before starting Patroni. The restore_command references --config=/etc/pgbackrest/pgbackrest-recovery-source.conf so archive-get during replay reads only the source bucket. Two filesystem stamps coordinate "exactly once per successful promote": .pitr_staging is written when replay is handed to Postgres, .pitr_configured is written on the boot AFTER Postgres consumes recovery.signal (which it only removes on successful promote). A failed replay leaves .pitr_staging behind WITHOUT .pitr_configured — fix env vars and restart, the next boot re-stages cleanly.

Image-owned base backups

When WAL_ARCHIVE_BUCKET is set, patroni-runner spawns a leader-only backup-watcher task that polls every 60s. Every iteration the watcher asks the local Patroni REST API (GET /leader, 200 = leader, 503 = replica) — replicas no-op the iteration, so only the current leader runs backups. After failover the new leader's watcher takes over within one poll cycle. pgBackRest's stanza locks are the second-line guarantee against concurrent backups across cluster nodes.

The watcher runs pgbackrest backup against the archive bucket when one of these conditions holds:

Initial backup — no full has been recorded on this volume. Triggers a --type=full immediately; pgBackRest brackets the base in pg_backup_start/pg_backup_stop and waits for the closing WAL to archive before declaring success, so a broken archive_command fails the backup loudly instead of producing an unrestorable base.
Gap recovery — either the archive-push wrapper dropped a segment (touches $PGDATA/.pgbackrest_gap_pending) or pg_stat_archiver.failed_count grew since the last full. Once archive failures have been quiescent for WAL_BACKUP_GAP_RESOLVED_GRACE_SECONDS (default 300s), runs a fresh full so the PITR window resumes from the new base. The dropped segment itself remains unrestorable; everything from the new base forward is.
Periodic — WAL_BACKUP_FULL_INTERVAL_HOURS (default 168 h / weekly) for fulls, WAL_BACKUP_DIFF_INTERVAL_HOURS (default 24 h) for differentials. Set either to 0 to disable that schedule.

Every iteration also emits a tiny non-transactional WAL record (pg_logical_emit_message) so archive_timeout=60 flushes a segment on idle DBs — without it the picker's "latest restorable" lags wall-clock by the checkpoint interval (default 5 min) on quiet services. Disable via WAL_HEARTBEAT_DISABLED=1.

State persists at $PGDATA/.pgbackrest_backup_state (key=value lines: last_full_at, last_diff_at, last_full_failed_count). The bucket-side pgbackrest --stanza=main info --output=json is the canonical source of truth; the local file is a cache that survives restarts.

Known gap: pgBackRest's archive-push-queue-max trip drops segments without going through the archive-push wrapper and without incrementing failed_count, so neither gap signal fires. Until log parsing or LSN-lag detection lands, queue-max-trip gaps are sealed by the next periodic full rather than promptly.

Per-cluster archive paths

Each cluster archives under a sub-prefix derived from its system_identifier: ${WAL_ARCHIVE_PATH}/cluster-<system_identifier>. The path is persisted in $PGDATA/.pgbackrest_repo_path so the archive-push wrapper, the backup watcher, pgbackrest stanza-create, and out-of- band invocations (e.g. mono's PITR coverage probe over SSH) all converge on the same value. patroni-runner rewrites the rendered /etc/pgbackrest/pgbackrest.conf's repo1-path= line once pg_control is on disk, so an SSH-driven pgbackrest info (which inherits no shell env from the container) reads the per-cluster path correctly.

Why per-cluster: a wipe-and-reuse-bucket cycle (operator drops the data volume, redeploys against the same WAL_ARCHIVE_BUCKET) produces a brand-new system_identifier from initdb. Without discrimination, pgBackRest's stanza-create would refuse the new cluster on system-id mismatch and the new cluster's WAL would never land — silent data loss for any operator who didn't notice. With per-cluster paths, the new cluster lands at cluster-<new_sysid>, the previous cluster's archive stays at cluster-<old_sysid>, and both histories coexist.

WAL_RECOVER_FROM_PATH on a restored cluster must point at the specific source-side cluster-<sysid> sub-prefix the user wants to restore from — pgbackrest restore reads from one path. Backboard discovers per-cluster sub-prefixes by listing the bucket and surfaces them as separate "histories" in the restore UI.

Retention

For PITR-enabled clusters, pgbackrest expire is the sole WAL retention authority — no Postgres GUCs, no bucket-side lifecycle policy. Backup manifests pin the WAL needed to make each backup restorable; expire releases both together when a backup ages out. Earlier iterations proposed a bucket-side TTL as a safety net but it's superfluous: any TTL shorter than expire's horizon would yank WAL out from under live manifests, and any TTL ≥ that horizon is redundant.

pgbackrest expire runs automatically after each pgbackrest backup the watcher invokes, removing fulls/diffs beyond WAL_BACKUP_RETENTION_FULL / WAL_BACKUP_RETENTION_DIFF, plus the WAL their manifests no longer pin. The default retention (full=4, diff=14, weekly fulls + daily diffs) covers approximately a four-week PITR window before the oldest full ages out.

Quick Start

Deploy to Railway

Click "Deploy Template" in Railway marketplace
Configure variables (or use defaults):
- POSTGRES_USER - Database username (default: railway)
- POSTGRES_PASSWORD - Auto-generated secure password
- POSTGRES_DB - Database name (default: railway)
Wait for deployment (~2-3 minutes)
Connect using the provided DATABASE_URL

Connection Strings

Once deployed, connect to your database via HAProxy:

# Primary (read-write) - From Railway private network
postgresql://railway:${POSTGRES_PASSWORD}@haproxy.railway.internal:5432/railway

# Replicas (read-only) - For read scaling
postgresql://railway:${POSTGRES_PASSWORD}@haproxy.railway.internal:5433/railway

# From external (via TCP proxy)
postgresql://railway:${POSTGRES_PASSWORD}@${HAPROXY_TCP_PROXY_DOMAIN}/railway

HAProxy automatically routes connections:

Port 5432: Routes to current Patroni leader (read-write)
Port 5433: Load-balances across healthy replicas (read-only)

Configuration

Environment Variables

Variable	Default	Description
`POSTGRES_USER`	`railway`	Database superuser
`POSTGRES_PASSWORD`	Auto-generated	Superuser password
`POSTGRES_DB`	`railway`	Default database name
`PATRONI_SCOPE`	`pg-ha-cluster`	Cluster identifier
`PATRONI_TTL`	`30`	Leader lease TTL (seconds)
`PATRONI_LOOP_WAIT`	`10`	Health check interval
`POSTGRES_NODES`	(required for HAProxy)	Node list in format `host:pgport:patroniport,...`
`HAPROXY_MAX_CONN`	`1000`	Maximum concurrent connections
`HAPROXY_CHECK_INTERVAL`	`3s`	Backend health check interval

Scaling HAProxy

HAProxy is stateless and can be scaled horizontally via Railway replicas:

# In haproxy/railway.toml
[deploy]
numReplicas = 3

Monitoring

Health Checks

Each service exposes health endpoints:

# PostgreSQL + Patroni
curl http://postgres-1.railway.internal:8008/health
curl http://postgres-1.railway.internal:8008/cluster  # Full cluster status
curl http://postgres-1.railway.internal:8008/primary  # 200 if primary
curl http://postgres-1.railway.internal:8008/replica  # 200 if replica

# etcd
curl http://etcd-1.railway.internal:2379/health

# HAProxy stats dashboard
curl http://haproxy.railway.internal:8404/stats

Cluster Status

Check Patroni cluster status from any PostgreSQL node:

curl http://postgres-1.railway.internal:8008/cluster

Response:

{
  "members": [
    {
      "name": "postgres-1",
      "role": "leader",
      "state": "running",
      "timeline": 1,
      "lag": 0
    },
    {
      "name": "postgres-2",
      "role": "replica",
      "state": "streaming",
      "timeline": 1,
      "lag": 0
    },
    {
      "name": "postgres-3",
      "role": "replica",
      "state": "streaming",
      "timeline": 1,
      "lag": 0
    }
  ]
}

HAProxy Stats

Access the HAProxy stats dashboard at http://haproxy.railway.internal:8404/stats for real-time backend health and connection metrics.

Failover Behavior

Automatic Failover (Primary Crashes)

Timeline:

T+0s   postgres-1 (leader) crashes
T+3s   HAProxy health check fails (3s interval)
T+6s   HAProxy marks backend DOWN (fall 3)
T+8s   Patroni elects new leader via etcd
T+10s  HAProxy routes to new primary

Impact:

Existing write connections: Dropped (apps retry)
Existing read connections: Unaffected (if using :5433)
New connections: Routed to new primary
Total downtime: ~10 seconds

Automatic Recovery (Failed Node Returns)

When postgres-1 recovers:

T+0s   postgres-1 restarts
T+3s   Patroni registers with etcd
T+4s   Discovers postgres-2 is leader
T+5s   Rejoins as replica
T+10s  Begins streaming replication
T+12s  HAProxy adds to replica pool

Result: Original primary rejoins as replica, not leader.

Manual Switchover

To manually switch leaders (zero-downtime):

# From inside any PostgreSQL container
patronictl -c /etc/patroni/patroni.yml switchover

# Follow prompts to select new leader

Local Development

Test the cluster locally with Docker Compose:

docker-compose up -d

This starts all 7 services on your local machine.

Connect to the cluster:

# Read-write (primary)
psql postgresql://railway:railway@localhost:5432/railway

# Read-only (replicas)
psql postgresql://railway:railway@localhost:5433/railway

View HAProxy stats:

open http://localhost:8404/stats

Troubleshooting

Cluster won't start

Check etcd is healthy:

curl http://etcd-1.railway.internal:2379/health

Check Patroni logs for PostgreSQL services:

Railway Dashboard → postgres-1 → Logs

Verify private networking is enabled in Railway project settings

Split-brain (multiple leaders)

This should never happen due to etcd quorum, but if it does:

Stop all PostgreSQL services
Clear etcd data: etcdctl del --prefix /service/
Restart services in order: etcd → postgres-1 → postgres-2 → postgres-3

High replication lag

Check lag from Patroni API:

curl http://postgres-2.railway.internal:8008/ | jq '.replication[0].lag'

If lag is high (>1GB):

Check network connectivity between nodes
Increase wal_keep_size in patroni configuration
Consider increasing PostgreSQL resources

HAProxy connection errors

Check HAProxy stats:

curl http://haproxy.railway.internal:8404/stats

Verify backends are healthy:

curl http://postgres-1.railway.internal:8008/health

Check HAProxy logs for backend failures

Cost Estimation

Resource allocation:

3 PostgreSQL: 2 vCPU, 2GB RAM each + 10GB volume
3 etcd: 0.5 vCPU, 512MB RAM each
1 HAProxy: 0.5 vCPU, 512MB RAM

Estimated cost (Railway Pro):

Compute: ~$60-120/month
Storage: ~$7.50/month (30GB)
Total: ~$70-130/month

Comparison:

AWS RDS Multi-AZ (db.t4g.small): ~$120/month
GCP Cloud SQL HA: ~$80/month

Backups

Railway Pro includes automatic volume snapshots:

Frequency: Daily
Retention: 6 days (daily), 27 days (weekly), 89 days (monthly)
Max 10 backups per volume

Manual backup:

pg_dump -h haproxy.railway.internal -U railway railway > backup.sql

Restore from snapshot:

Railway Dashboard → postgres-1 → Volumes → Snapshots
Click "Restore" on desired snapshot
Create new service from snapshot

Upgrading PostgreSQL

This template supports PostgreSQL 14, 15, 16, 17, and 18. To upgrade:

Create a logical backup:

pg_dumpall -h haproxy.railway.internal -U railway > cluster_backup.sql

Deploy new template with updated PostgreSQL version

Restore data:

psql -h new-haproxy.railway.internal -U railway < cluster_backup.sql

Update application DATABASE_URL to new cluster
Delete old cluster after verification

Security

All passwords are auto-generated and encrypted at rest
Private networking isolates cluster from public internet
SSL enabled by default for PostgreSQL connections
Recommend enabling Railway's 2FA for project access

Performance Tuning

PostgreSQL

Patroni dynamically generates PostgreSQL configuration. Override via environment variables:

postgresql:
  parameters:
    shared_buffers: 512MB        # 25% of RAM
    effective_cache_size: 2GB    # 50-75% of RAM
    max_connections: 300         # Increase for high concurrency
    work_mem: 4MB                # Per-query memory

HAProxy

Adjust via environment variables:

HAPROXY_MAX_CONN=2000           # More concurrent connections
HAPROXY_CHECK_INTERVAL=1s       # Faster failover detection
HAPROXY_TIMEOUT_CLIENT=60m      # Longer idle connections

Docker Images

Pre-built images are published to GitHub Container Registry:

# PostgreSQL + Patroni (multiple versions)
ghcr.io/railwayapp/postgres-ha/postgres-patroni:18
ghcr.io/railwayapp/postgres-ha/postgres-patroni:17
ghcr.io/railwayapp/postgres-ha/postgres-patroni:16
ghcr.io/railwayapp/postgres-ha/postgres-patroni:15
ghcr.io/railwayapp/postgres-ha/postgres-patroni:14

# etcd
ghcr.io/railwayapp/postgres-ha/etcd:3.5.16

# HAProxy
ghcr.io/railwayapp/postgres-ha/haproxy:3.2

Support

Issues: https://github.com/railwayapp/postgres-ha/issues
Documentation: https://docs.railway.app/
Community: https://discord.gg/railway

License

MIT License - See LICENSE file for details

Credits

Built with:

PostgreSQL - World's most advanced open source database
Patroni - Template for PostgreSQL HA with Python
etcd - Distributed reliable key-value store
HAProxy - Reliable, high-performance TCP/HTTP load balancer
Railway - Infrastructure, instantly

Name		Name	Last commit message	Last commit date
Latest commit History 325 Commits
.github/workflows		.github/workflows
common		common
etcd		etcd
haproxy		haproxy
postgres-patroni		postgres-patroni
test		test
.gitignore		.gitignore
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

PostgreSQL High Availability Template for Railway

Features

Architecture

Services Deployed

Point-in-time recovery (opt-in)

PITR restore creates a new service

Image-owned base backups

Per-cluster archive paths

Retention

Quick Start

Deploy to Railway

Connection Strings

Configuration

Environment Variables

Scaling HAProxy

Monitoring

Health Checks

Cluster Status

HAProxy Stats

Failover Behavior

Automatic Failover (Primary Crashes)

Automatic Recovery (Failed Node Returns)

Manual Switchover

Local Development

Troubleshooting

Cluster won't start

Split-brain (multiple leaders)

High replication lag

HAProxy connection errors

Cost Estimation

Backups

Upgrading PostgreSQL

Security

Performance Tuning

PostgreSQL

HAProxy

Docker Images

Support

License

Credits

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages