SQLite log WAL grows rapidly from self-watched CODEX_HOME inotify events #11

@th317erd

This is a sanitized report for a severe SQLite WAL growth issue observed while running the codext fork. A related upstream Codex issue is open at openai#28997, but the newest evidence was collected from codext/vendored-codex processes, so this may be fork-specific or fork-amplified.

Additional sanitized findings from a deeper investigation:

The strongest evidence now points to a self-watch / self-log loop around $CODEX_HOME.

Likely loop

  1. Codex writes a diagnostic row to logs_2.sqlite.
  2. SQLite updates logs_2.sqlite-wal.
  3. A Codex inotify watcher sees logs_2.sqlite-wal modified.
  4. Codex records a TRACE row for that inotify event into the same logs_2.sqlite.
  5. That write modifies logs_2.sqlite-wal again.
  6. Repeat.
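
As a rough illustration of why this loop never settles, the steps above can be put into a toy model. It is only a sketch: the `watchers` parameter (how many processes log the same event) and the assumption of exactly one inotify event per row written are simplifications, not measured behavior.

```python
def rows_logged(rounds: int, watchers: int, suppress_own_files: bool) -> int:
    """Count log rows produced by one initial write to logs_2.sqlite.

    Toy model: every row written modifies the WAL once, and each watcher
    (hypothetical count) turns that MODIFY event into one new TRACE row.
    """
    total = 1    # the initial diagnostic row (step 1)
    pending = 1  # rows written in the previous round
    for _ in range(rounds):
        if suppress_own_files:
            break  # events for logs_2.sqlite* are never logged, loop ends
        pending *= watchers  # steps 3-4: one TRACE row per watcher per write
        total += pending
    return total
```

With one watcher the row count grows by one per round and never stops (`rows_logged(5, 1, False)` is 6); with two watchers it compounds (`rows_logged(5, 2, False)` is 63); with suppression it stays at the single original row.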

Important environment note

The newest evidence was collected while the affected session was being run through codext, which vendors/runs a codex binary. The process model involved a codext app-server plus remote codext sessions, each launching vendored codex processes. This may be fork-specific, fork-amplified, or caused by an interaction between upstream Codex file watching/logging code and codext's app-server/remote process model.

Observed state

Sanitized paths:

CODEX_HOME=/home/<user>/.codex-work
SQLite DB=/home/<user>/.codex-work/logs_2.sqlite
SQLite WAL=/home/<user>/.codex-work/logs_2.sqlite-wal

Even after cleanup, the WAL grew back to roughly 12 GB (sizes in bytes):

11,884,873,392  /home/<user>/.codex-work/logs_2.sqlite-wal
115,650,560     /home/<user>/.codex-work/logs_2.sqlite
23,101,440      /home/<user>/.codex-work/logs_2.sqlite-shm

Earlier samples during active sessions showed the WAL growing at roughly 35 to 36 MB/s:

2,492,175,672 -> 2,842,709,392 bytes in 10 seconds
5,953,070,432 -> 6,316,977,672 bytes in 10 seconds
8,921,312,072 -> 9,101,376,672 bytes in 5 seconds

This is after a previous stale/closed WAL reached 219 GB and filled the /home filesystem.
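
For reference, the growth samples above work out as follows. The extrapolation to 219 GB assumes decimal gigabytes and a constant rate, neither of which was measured.

```python
# (start_bytes, end_bytes, seconds) copied from the samples above
samples = [
    (2_492_175_672, 2_842_709_392, 10),
    (5_953_070_432, 6_316_977_672, 10),
    (8_921_312_072, 9_101_376_672, 5),
]


def growth_rate_mb_per_s(start: int, end: int, seconds: float) -> float:
    """Average WAL growth in decimal megabytes per second."""
    return (end - start) / seconds / 1_000_000


rates = [growth_rate_mb_per_s(*s) for s in samples]
mean_rate = sum(rates) / len(rates)

# Assumption: 219 GB means 219e9 bytes and the rate stays constant.
hours_to_219_gb = 219e9 / (mean_rate * 1_000_000) / 3600

for rate in rates:
    print(f"{rate:.1f} MB/s")
print(f"about {hours_to_219_gb:.1f} hours to reach 219 GB at the mean rate")
```

At these rates a WAL of that size could accumulate in under two hours of active sessions.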

SQLite log-table evidence

Counts from the affected logs_2.sqlite:

total_rows=47,565
TRACE=42,794
INFO=3,533
DEBUG=864
WARN=356
ERROR=18
inotify_rows=39,532
logs_2.sqlite-wal mentions=28,727

Top repeated inotify messages:

28,699  inotify event: Event { wd: WatchDescriptor { id: 1, fd: (Weak) }, mask: EventMask(MODIFY), cookie: 0, name: Some("logs_2.sqlite-wal") }
 8,049  inotify event: Event { wd: WatchDescriptor { id: 1, fd: (Weak) }, mask: EventMask(MODIFY), cookie: 0, name: Some("logs_2.sqlite") }
   147  inotify event: Event { wd: WatchDescriptor { id: 1, fd: (Weak) }, mask: EventMask(MODIFY), cookie: 0, name: Some("state_5.sqlite-wal") }

There are also unrelated file-open watcher events for system files such as ld.so.cache, locale.alias, and passwd, which appear to come from a separate /etc watch. The disk-filling loop is the one involving logs_2.sqlite*.
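
Counts like the ones above can be gathered with a few aggregate queries. The sketch below is hedged: the table name `logs` and the columns `level` and `message` are hypothetical stand-ins, since the actual schema of logs_2.sqlite is not shown in this report.

```python
import sqlite3


def summarize(conn: sqlite3.Connection):
    """Return (rows per level, inotify row count, top inotify messages).

    Assumes a hypothetical table logs(level TEXT, message TEXT).
    """
    by_level = dict(
        conn.execute("SELECT level, COUNT(*) FROM logs GROUP BY level")
    )
    inotify_rows = conn.execute(
        "SELECT COUNT(*) FROM logs WHERE message LIKE 'inotify event:%'"
    ).fetchone()[0]
    top = conn.execute(
        "SELECT COUNT(*) AS n, message FROM logs "
        "WHERE message LIKE 'inotify event:%' "
        "GROUP BY message ORDER BY n DESC LIMIT 3"
    ).fetchall()
    return by_level, inotify_rows, top


# Small in-memory demo with made-up rows.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE logs (level TEXT, message TEXT)")
demo.executemany(
    "INSERT INTO logs VALUES (?, ?)",
    [("TRACE", "inotify event: logs_2.sqlite-wal")] * 3
    + [("TRACE", "inotify event: logs_2.sqlite"), ("INFO", "session started")],
)
by_level, inotify_rows, top = summarize(demo)
```

When pointing this at a live database, opening it read-only with `sqlite3.connect("file:<path>?mode=ro", uri=True)` avoids adding writes of your own to the WAL.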

Kernel inotify evidence

The active affected process had an inotify file descriptor with:

FD 30 anon_inode:inotify
  inotify wd:1 ino:c6860b ...

The inode maps to $CODEX_HOME:

hex c6860b == decimal 13010443
13010443 /home/<user>/.codex-work

The repeated SQLite log rows also use WatchDescriptor { id: 1, ... }, and the row names are logs_2.sqlite-wal / logs_2.sqlite. That ties the logged file events directly to a watcher rooted at $CODEX_HOME, not just to a random project directory.
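
The inode check above can be scripted. This is a minimal sketch that reads the hexadecimal `ino:` field from the `inotify` lines of `/proc/<pid>/fdinfo/<fd>` and compares it with a directory's inode; the function names are made up for illustration.

```python
import os
import re

# Matches lines such as: "inotify wd:1 ino:c6860b ..."
_INOTIFY_INO = re.compile(r"^inotify\b.*?\bino:([0-9a-fA-F]+)", re.MULTILINE)


def parse_inotify_inodes(fdinfo_text: str) -> list[int]:
    """Return the watched inode numbers listed in an inotify fdinfo dump."""
    return [int(hex_ino, 16) for hex_ino in _INOTIFY_INO.findall(fdinfo_text)]


def fd_watches_path(fdinfo_text: str, path: str) -> bool:
    """True if any inotify watch in the dump targets the inode of `path`.

    Caveat: inode numbers are only unique per filesystem, so a strict
    check would also compare the device (the sdev field).
    """
    return os.stat(path).st_ino in parse_inotify_inodes(fdinfo_text)
```

For the line quoted above, `parse_inotify_inodes("inotify wd:1 ino:c6860b ...")` returns `[13010443]`, matching the inode of $CODEX_HOME.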

Open file holders

The affected codext/vendored codex process held open handles to:

/home/<user>/.codex-work/logs_2.sqlite
/home/<user>/.codex-work/logs_2.sqlite-wal
/home/<user>/.codex-work/logs_2.sqlite-shm

Other active codext app-server/remote processes held handles to a separate profile's logs_2.sqlite-wal.
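
One way to list holders like these without extra tools is to walk `/proc/<pid>/fd` on Linux. The helper below is a sketch with a made-up name; it only sees processes whose fd directories the current user is allowed to read.

```python
import os


def pids_holding(path: str, proc_root: str = "/proc") -> list[int]:
    """Return PIDs that have `path` open, based on /proc/<pid>/fd symlinks."""
    target = os.path.realpath(path)
    # A file that was unlinked while still open shows up with this suffix.
    wanted = {target, target + " (deleted)"}
    holders = set()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited or permission denied
        for fd in fds:
            try:
                if os.readlink(os.path.join(fd_dir, fd)) in wanted:
                    holders.add(int(entry))
                    break
            except OSError:
                continue  # fd closed while we were looking
    return sorted(holders)
```

Running it against logs_2.sqlite-wal before and after restarting sessions shows whether any process still holds the old file.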

Process model clue

The process tree included:

node .../bin/codext ... app-server --listen ws://127.0.0.1:<port>
.../codex ... app-server --listen ws://127.0.0.1:<port>
node .../bin/codext ... --remote ws://127.0.0.1:<port> -C /home/<user>/Projects/<repo>
.../codex ... --remote ws://127.0.0.1:<port> -C /home/<user>/Projects/<repo>
node .../bin/codext ... --sandbox danger-full-access --ask-for-approval never
.../codex ... --sandbox danger-full-access --ask-for-approval never

The exact project names have been omitted intentionally. Non-Codex dev-server processes running in one project did not hold handles to logs_2.sqlite*; only Codex/codext processes did.

Trust/root clue

The current Git repo root resolved correctly to a project directory under /home/<user>/Projects/<repo>, but the UI reportedly displayed a trust warning for /home/<user>, not the project root. If a broad home directory becomes a trusted/watch surface, it can include $CODEX_HOME and therefore Codex's own SQLite state.

Even without that clue, the kernel inotify evidence above shows a watcher rooted at /home/<user>/.codex-work.
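
To make the "broad watch surface" concern concrete, here is a small sketch of a check for whether a watch root would deliver events for files directly inside $CODEX_HOME. It assumes normalized absolute POSIX paths with no symlinks; the function name is hypothetical.

```python
from pathlib import PurePosixPath


def watch_covers_codex_home(watch_root: str, codex_home: str, recursive: bool) -> bool:
    """True if a watch on `watch_root` sees files directly in `codex_home`."""
    root = PurePosixPath(watch_root)
    home = PurePosixPath(codex_home)
    if root == home:
        # A watch on the directory itself reports its direct children,
        # including logs_2.sqlite-wal.
        return True
    # An ancestor only reaches into codex_home when watched recursively.
    return recursive and root in home.parents
```

Under these assumptions, a recursive watch on /home/<user> covers /home/<user>/.codex-work, while a recursive watch on /home/<user>/Projects/<repo> does not.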

Relevant source areas

These upstream files look relevant:

codex-rs/file-watcher/src/lib.rs
codex-rs/app-server/src/fs_watch.rs
codex-rs/app-server/src/skills_watcher.rs
codex-rs/tui/src/onboarding/onboarding_screen.rs

Specifically:

  • file-watcher can watch requested paths and can fall back to the nearest existing ancestor.
  • app-server exposes fs/watch.
  • skills roots can be watched recursively.
  • trust onboarding falls back to cwd if no Git root is resolved.
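
The ancestor fallback in the first bullet is one way a narrow request could widen into a watch on $CODEX_HOME. The sketch below only illustrates the general technique; it is not taken from codex-rs/file-watcher, and whether the real code path behaves this way is unverified.

```python
from pathlib import Path
from typing import Optional


def nearest_existing_ancestor(path: str) -> Optional[Path]:
    """Return `path` if it exists, else its closest existing ancestor."""
    requested = Path(path)
    for candidate in (requested, *requested.parents):
        if candidate.exists():
            return candidate
    return None
```

If a watcher were asked for a not-yet-created subdirectory of $CODEX_HOME (a hypothetical example), this fallback would resolve to $CODEX_HOME itself, which is the directory the SQLite files live in.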

Expected behavior

Codex should not log file watcher events for its own SQLite diagnostic/state files into the same SQLite log sink.

At minimum, file watcher trace logging should suppress:

logs_2.sqlite
logs_2.sqlite-wal
logs_2.sqlite-shm
state_5.sqlite
state_5.sqlite-wal
state_5.sqlite-shm
goals_1.sqlite*
memories_1.sqlite*
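
A suppression check for the names above could be as small as one pattern. This is a sketch, not the project's code: matching any numeric version suffix and the `-journal` sidecar are assumptions added here to keep the filter from going stale.

```python
import re

# Base names from the list above; "\d+" and "-journal" are assumptions.
_OWN_SQLITE = re.compile(
    r"(logs|state|goals|memories)_\d+\.sqlite(-wal|-shm|-journal)?"
)


def is_own_sqlite_file(name: str) -> bool:
    """True for Codex's own SQLite files and their sidecar files."""
    return _OWN_SQLITE.fullmatch(name) is not None


def should_log_watch_event(name: str) -> bool:
    """Skip trace logging for events on the log sink's own files."""
    return not is_own_sqlite_file(name)
```

Filtering on the file name before the TRACE row is built breaks the loop even if the watch on $CODEX_HOME itself stays in place.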

Better fixes:

  1. Never watch $CODEX_HOME as a filesystem watch root unless explicitly required.
  2. Never log watcher events for Codex's own SQLite files into the SQLite log sink.
  3. Respect log-level filtering before inserting TRACE rows into logs_2.sqlite.
  4. Add WAL size limits, checkpointing, rotation, or emergency safeguards so diagnostic logs cannot consume hundreds of GB.
  5. If a broad home directory trust target is selected, exclude Codex state directories from any watch surfaces.
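
For item 4, SQLite already provides the building blocks: `PRAGMA wal_autocheckpoint`, `PRAGMA journal_size_limit`, and `PRAGMA wal_checkpoint(TRUNCATE)`. The sketch below shows them from Python's `sqlite3` module; the helper names and the 64 MiB limit are illustrative choices, not values from Codex.

```python
import sqlite3


def open_bounded_log_db(path: str, wal_limit_bytes: int = 64 * 1024 * 1024):
    """Open a WAL-mode database with checkpointing safeguards enabled."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL").fetchall()
    # Try a checkpoint automatically once the WAL reaches 1000 pages.
    conn.execute("PRAGMA wal_autocheckpoint=1000").fetchall()
    # Cap the WAL size left on disk after a checkpoint resets the log.
    conn.execute(f"PRAGMA journal_size_limit={int(wal_limit_bytes)}").fetchall()
    return conn


def truncate_wal(conn: sqlite3.Connection) -> bool:
    """Checkpoint and truncate the WAL to zero bytes; False if blocked."""
    busy, _wal_frames, _checkpointed = conn.execute(
        "PRAGMA wal_checkpoint(TRUNCATE)"
    ).fetchone()
    return busy == 0
```

These settings bound the WAL only when checkpoints can actually complete. A checkpoint can be held back by other connections with open read transactions, which may matter given that several processes hold the same database here, so a size-based emergency stop on logging would still be a useful backstop.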

Local mitigation being tested

The local mitigation is to move SQLite-backed runtime state outside the watched CODEX_HOME path:

sqlite_home = "/home/<user>/.local/share/codex-sqlite/work"

This was added after the issue was observed. Already-running Codex/codext processes can continue holding the old logs_2.sqlite* files until restarted.
