retention: archive stale results/*.txt on startup (D half of flood fix)#354
Open
retention: archive stale results/*.txt on startup (D half of flood fix)#354
Conversation
First half of the D+C DM-fallback redesign owner approved after the 2026-04-15 DM-flood post-mortem. This PR lands D (separate retention policy) as a standalone hygiene win. C (event-driven fs.watch wiring for the DM fallback) will stack on top as a follow-up PR. ### What this does - Adds `src/archive-stale-results.py` — a small one-shot script that walks `results/*.txt`, moves any file whose mtime is older than `$RETENTION_HOURS` (default 24) under `results/archive-YYYY-MM-DD/`. Files inside existing `archive-*` subdirectories are never touched. Honors `DRY_RUN=1` for safe preview. - Invokes it from `src/startup.sh` right after `mkdir -p tasks results data` — i.e. before any long-running service starts iterating `results/`. Any accumulated backlog from a prior quiet day gets swept out of the service iteration paths at boot time. ### Why this first, before any DM fallback rewiring Today's incident proved that `results/` accumulates long-dead files indefinitely: task-*, question-*, briefing-*, insight-*, friction-* from voice-originated work that was never spoken because voice wasn't connected at the time. Any code that scans `results/` for delivery — my reverted #352 task-bridge wiring, #349's discord-bridge polling loop, or any future design — is going to re-deliver the entire backlog on first tick after a cold boot unless the directory is actually kept small. D (retention) fixes the underlying hygiene concern independently of the DM-fallback question. Even if we never wire fallback again, the directory staying bounded is an unambiguous win. With D in place, C (the event-driven follow-up) can use `fs.watch` without needing a complex "pre-existing files at boot" escape hatch — D has already moved them out of the way. ### Retention window rationale Defaulted to **24 hours**. This is a judgment call: - Too short (e.g. 1h) — risks archiving same-day results the user might still want the voice agent to speak when they reconnect. - Too long (e.g. 7d) — bounds the accumulation less aggressively. - 24h feels right: anything older than one full day is effectively dead mail. Overridable per-run via `RETENTION_HOURS=N`. ### Verification - `bash -n src/startup.sh` clean - `ast.parse` clean on `archive-stale-results.py` - Dry run against fresh clean results/: "no stale files to archive" - Fabricated a 2-day-old test file, DRY_RUN=1 correctly flagged it, then real run correctly moved it to `results/archive-2026-04-15/`. - Does not affect `results/calls/` (subdirectory, skipped by `f.is_file()` check). ### What's not in this PR - **C (event-driven fs.watch wiring)** — stacks on this. Will replace the current `readdirSync(RESULT_DIR)` scan in task-bridge.ts with an fs.watch listener that only reacts to newly-created files. Separate PR, coming next. - **Full architectural redo of DM fallback delivery** — out of scope; the pause-on-DM-fallback rule from the post-mortem still holds. This PR is pure hygiene. ### References - Post-mortem: `notes/post-mortem-dm-flood-2026-04-15.md` - Feedback memory: `feedback_results_dir_is_graveyard.md` - Owner's D+C decision: Discord 23:20 local Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
sonichi
commented
Apr 15, 2026
Owner
Author
sonichi
left a comment
There was a problem hiding this comment.
MacBook review: LGTM. Clean retention sweep — archives stale results/*.txt on startup before services can iterate them. Configurable via RETENTION_HOURS env var, DRY_RUN support, date-stamped archive dirs. Good defensive fix for the DM flood class of bug.
liususan091219
added a commit
that referenced
this pull request
Apr 16, 2026
…#354 Each script reproduces the bug (before the fix) and verifies it's resolved (after the fix). All POCs pass on current main. - poc-pr353-open-file.sh (11/11) — 18s polling timeout in open_file - poc-pr355-subtitled-pending.sh (9/9) — false positive subtitled_pending - poc-pr332-team-tier-revert.sh (9/9) — team-tier -C /tmp broke codex - poc-pr325-bodhi-dep.sh (7/7) — bodhi dep pointed at deleted repo - poc-pr354-retention-sweep.sh — retention sweep for stale results Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This was referenced Apr 16, 2026
josuachavesolmos
approved these changes
Apr 16, 2026
josuachavesolmos
left a comment
There was a problem hiding this comment.
Reviewed by Sutando proactive loop.
APPROVE — core logic is sound. All downstream consumers verified safe (discord-bridge, telegram-bridge, daily-insight, agent-api — none recurse into subdirectories). Placement in startup.sh is correct (after mkdir, before services).
Two optional hardening suggestions:
- [MEDIUM] Guard against
RETENTION_HOURS=0(archives all in-flight results). Clamp withmax(1, RETENTION_HOURS). - [MEDIUM] Wrap
int()parse in try/except so invalid values fall back to 24h instead of silently skipping via|| true.
Neither is blocking — safe to merge as-is.
Collaborator
Reproduces the results/ unbounded growth bug and verifies the archival fix. Run: bash scripts/test-pr354-retention-sweep.sh Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
First half of the D+C DM-fallback redesign you approved after the 2026-04-15 DM-flood post-mortem. Lands D (separate retention policy) as a standalone hygiene PR. C (event-driven
fs.watchwiring for the fallback delivery path) will stack on top as a follow-up.What this does
src/archive-stale-results.py(new, 85 lines) — one-shot script that walksresults/*.txt, moves any file whose mtime is older thanRETENTION_HOURS(default24) intoresults/archive-YYYY-MM-DD/. HonorsDRY_RUN=1for preview. Skips files already underarchive-*subdirs and the non-.txtcalls/subdirectory.src/startup.sh(+6 lines) — invokes the script right aftermkdir -p tasks results data, i.e. before any long-running service starts iteratingresults/. On fresh boots the backlog gets swept out of the delivery path at startup.Why this first, independent of any DM-fallback rewiring
Today's incident proved
results/accumulates long-dead files indefinitely (task-/question-/briefing-/insight-/friction- from voice work that was never spoken because voice wasn't connected). Any future code that scansresults/for delivery — reverted #352, #349's polling loop, or anything we do next — will flood again on first tick unless the directory is actually kept small. This PR makes the directory small independent of the fallback question. Even if we never wire fallback again, boundedresults/is an unambiguous win.Retention window = 24h
Judgment call, overridable per-run via
RETENTION_HOURS=N:Test plan
bash -n src/startup.shcleanast.parseclean onarchive-stale-results.pyDRY_RUN=1) against fresh cleanresults/— "no stale files to archive"results/task-test-old.txt, dry-run correctly flagged it, real run moved it toresults/archive-2026-04-15/results/calls/subdirectory is skipped (f.is_file()check)Not in this PR
🤖 Generated with Claude Code
Fixes #362