
pg0 start deletes data directory on stale postmaster.pid (data loss after unclean shutdown) #6

@rustamk

Description

@rustamk

Bug Description

pg0 start deletes the entire data directory when it detects a stale postmaster.pid lock file, causing complete data loss. This happens after any unclean shutdown (VM reboot, power loss, kill -9, OOM kill).

Expected Behavior

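PostgreSQL itself handles a stale postmaster.pid gracefully. On startup it checks whether the PID recorded in the lock file still belongs to a live process. If it does not, PostgreSQL replaces the lock file and runs normal crash recovery on the existing data directory. pg0 start should do the same: clear the stale lock and recover, not wipe all data.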

Actual Behavior

  1. VM reboots (or postgres killed without clean shutdown)
  2. postmaster.pid left in data directory (stale lock)
  3. pg0 start detects stale lock → logs "Stale lock file detected; removing file to attempt process recovery"
  4. pg0 deletes the entire ~/.pg0/instances/<name>/data/ directory
  5. Runs initdb to create a fresh empty database
  6. All data is permanently lost

Evidence

Strings extracted from the pg0 binary confirm this code path:

"Stale lock file detected; removing file to attempt process recovery:"
"Deleting data directory: "
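Anyone who wants to check their own pg0 binary can approximate the `strings | grep` check in Python. This is a minimal sketch. The binary path is something you supply (for example from `which pg0`), and the helper name `find_messages` is hypothetical. It only reports whether each message occurs as a byte sequence in the file. It does not show which code path uses the message.

```python
from pathlib import Path

# Log messages quoted in this issue. Finding them in the binary suggests,
# but does not prove, that the delete-on-stale-lock path is compiled in.
MESSAGES = [
    "Stale lock file detected; removing file to attempt process recovery:",
    "Deleting data directory: ",
]


def find_messages(binary_path, messages=MESSAGES):
    """Return {message: bool}: whether each message's UTF-8 bytes occur in the file."""
    data = Path(binary_path).read_bytes()
    return {msg: msg.encode("utf-8") in data for msg in messages}


# Example usage (path is an assumption):
#   for msg, found in find_messages("/usr/local/bin/pg0").items():
#       print("FOUND  " if found else "missing", repr(msg))
```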

Impact

  • Data loss is effectively silent: apart from log lines, there is no warning, no confirmation prompt, and no backup
  • README states "Persistent data — survives restarts, stored in ~/.pg0/" which is misleading
  • Any production or development use with embedded PG is at risk of total data loss on crash/reboot
  • We lost ~13,000 knowledge graph nodes and 752 documents from a Hindsight memory database

Environment

  • pg0 version: 0.12.0
  • OS: Linux 6.17.0-1009-gcp (x64), GCE VM
  • PostgreSQL: 18.1.0 (bundled)
  • Instance name: hindsight-embed-openclaw

Suggested Fix

When a stale postmaster.pid is detected:

  1. Remove only the postmaster.pid file
  2. Attempt to start PostgreSQL with the existing data directory
  3. Only if the cluster is actually corrupt (e.g., WAL recovery fails), offer to reinitialize — with an explicit --force flag, never silently
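Steps 1 and 2 could look roughly like the Python sketch below. It assumes a POSIX system and relies on the standard postmaster.pid layout, where the first line holds the postmaster's PID. The function names `pid_is_running` and `clear_stale_lock` are hypothetical and are not pg0's actual code. The sketch checks liveness with signal 0. A PID that is still alive is treated as a real lock and left untouched. Otherwise only the lock file is removed, and the data directory is never touched.

```python
import os
from pathlib import Path


def pid_is_running(pid: int) -> bool:
    """Best-effort liveness check using signal 0 (POSIX only)."""
    if pid <= 0:
        return False  # guard: os.kill(0 or -1, ...) would target process groups
    try:
        os.kill(pid, 0)  # sends no signal; only checks that the PID exists
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists but owned by another user; conservatively "running"
    return True


def clear_stale_lock(data_dir) -> str:
    """Remove ONLY a stale postmaster.pid; never delete anything else.

    Returns "no-lock", "running", or "removed-stale-lock".
    """
    lock = Path(data_dir) / "postmaster.pid"
    if not lock.exists():
        return "no-lock"
    try:
        # First line of postmaster.pid is the postmaster's PID.
        pid = int(lock.read_text().splitlines()[0].strip())
    except (ValueError, IndexError):
        pid = -1  # empty or unparseable lock file: treat as stale
    if pid_is_running(pid):
        # After a reboot the PID may belong to an unrelated process. Rather
        # than guess, leave the lock alone and let PostgreSQL's own check decide.
        return "running"
    lock.unlink()  # step 1: delete the lock file and nothing else
    return "removed-stale-lock"

# Step 2 would then start PostgreSQL on the untouched data directory, so
# its normal crash recovery (WAL replay) can run.
```

Because PostgreSQL already replaces a stale lock itself, step 1 is arguably optional. What matters is that no code path removes the data directory unless the user explicitly opts in, as step 3 describes.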

Workaround

We added a systemd shutdown hook that cleanly stops pg0 before the VM shuts down, to avoid leaving a stale postmaster.pid after a reboot. We are also running periodic pg_dump backups.
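Periodic backups with rotation could be sketched as shown below. Only the pg_dump options are real: `--format=custom`, `--file`, and `--dbname`, which accepts a connection string. Everything else is a hypothetical example, including the helper names, the `*.dump` naming scheme, the retention count, and the connection string. pg0's actual host, port, user, and database name for an instance are not given here, so you would need to fill them in.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def build_dump_command(conninfo: str, out_file: Path) -> list:
    """pg_dump in custom format, restorable later with pg_restore."""
    return ["pg_dump", "--format=custom", f"--file={out_file}", f"--dbname={conninfo}"]


def prune_backups(backup_dir: Path, keep: int) -> list:
    """Delete all but the newest `keep` *.dump files; return the deleted paths."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    dumps = sorted(backup_dir.glob("*.dump"))  # UTC timestamped names sort chronologically
    doomed = dumps[:-keep]
    for path in doomed:
        path.unlink()
    return doomed


def run_backup(conninfo: str, backup_dir: Path, keep: int = 7) -> Path:
    """Take one dump, then rotate old ones. Raises if pg_dump fails."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    out_file = backup_dir / f"backup-{stamp}.dump"
    subprocess.run(build_dump_command(conninfo, out_file), check=True)
    prune_backups(backup_dir, keep)  # only prune after a successful dump
    return out_file

# Hypothetical usage (connection string is a placeholder):
#   run_backup("postgresql://USER@localhost:PORT/DBNAME", Path.home() / "pg0-backups")
```

Pruning runs only after pg_dump succeeds, so a failing dump never rotates away the last good backup.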
