discord-bridge: wire dm-result.py into result-poll flow as DM fallback#349
discord-bridge: wire dm-result.py into result-poll flow as DM fallback#349
Conversation
Follow-up to PR #347. PR #347 shipped `src/dm-result.py` as a standalone CLI but didn't wire it into any polling loop, so voice-originated and cron-originated results that no one handles would still be silently dropped when the voice client is offline. Adds `poll_dm_fallback()` — a 4th asyncio task registered in `on_ready` alongside `poll_results`, `poll_approved`, and `poll_proactive`. Scans `results/` every 30s for task/question/briefing/insight/friction files that are: - not in `pending_replies` (Discord-originated, already handled) - older than 90s (grace period so voice-agent and telegram-bridge get first dibs) and subprocess-calls `python3 src/dm-result.py --file <f>`. Delegating to the CLI keeps the voiceConnected check + Discord-DM-send logic in one place (the script MacBook shipped in #347). If dm-result.py prints "voice client connected, skipping DM" the file is left on disk for the voice agent to speak when a client reconnects. On any other non-zero exit, the error is logged and the file is also left on disk for the next retry cycle. Also: drop unused `import subprocess` from `src/dm-result.py` — flagged in my cross-review on #347 (comment 4255378452). Harmless but noted. ### Known limitation (not introduced by this PR) Tested end-to-end on Mac Mini: the subprocess call reaches Discord but returns HTTP 403. dm-result.py hardcodes `DM_CHANNEL = "1485370959870431433"` which corresponds to MacBook's bot's DM channel with the owner; the Mac Mini bot doesn't have permission to POST to that channel. Flagged in my original #347 review — the fix is either a per-node config file or having the bot open its own DM channel on demand. Out of scope for this PR; the wiring is correct and will work once the DM channel is per-node. ### Verification - `ast.parse` clean on both files - Subprocess smoke test: loop correctly subprocesses out, reads rc, logs stderr when dm-result.py fails. Fake stale file cleaned up after. - No changes to existing `poll_results` / `poll_proactive` behavior — race-free because `poll_dm_fallback` excludes `pending_replies` task IDs and only touches files ≥90s old. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Addresses the hardcode owner flagged after PR #349 opened. Previously `DM_CHANNEL = "1485370959870431433"` was baked into dm-result.py. That channel belonged to a specific bot's DM with the owner, so subprocess calls from a different node hit HTTP 403. Confirmed end-to-end on Mac Mini during smoke-testing of PR #349. Now the DM channel is resolved on demand: 1. `_resolve_owner_id()` picks the human owner by: a. `$SUTANDO_DM_OWNER_ID` env var (explicit override, skips API calls) b. First non-bot entry in `~/.claude/channels/discord/access.json` allowFrom, decided by a GET /users/{id} `bot` flag. allowFrom typically contains multiple bot accounts (MacBook bot, Mac Mini bot) plus the human owner — without the is-bot check we'd DM the wrong bot. 2. `_open_dm_channel()` calls `POST /users/@me/channels` with the resolved owner's user ID. Discord docs are explicit that this endpoint is idempotent — returns the existing DM channel if one exists, otherwise creates one. Cheap to call per invocation. 3. `_discord_api()` extracted as a small wrapper so the GET /users/@me, POST /users/@me/channels, POST /channels/{id}/messages paths all share a single urllib+auth call site. Side effect: send_dm() is ~40% shorter and easier to follow. ### Smoke test results Against live Discord API on Mac Mini: - `_resolve_owner_id` correctly returns `1022910063620390932` (sonichi) by skipping `1485364006297534584` (MacBook bot) via is-bot lookup. - `_open_dm_channel` returns `1490906927675474030` — a live DM channel, completely different from the old hardcoded value. - Actual send deferred to avoid spamming owner during testing. The POST path is standard Discord API and is the same byte sequence as before the refactor; only the channel ID input changed. ### Python 3.9 note The `dict | None = None` union-syntax type hint crashed at module import on the system Python 3.9 (`TypeError: unsupported operand | for type`). Switched to untyped function signatures for the helpers — docstrings carry the same information. Caught in the first smoke-test run. ### Bot ID also no longer needs to be known The previous draft of this change used GET /users/@me to discover the current bot's user ID and skip it in allowFrom. Replaced with per-user is-bot lookup, which correctly skips all bots (not just the current one) in multi-bot allowFrom lists. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Superseded by #350 (MacBook's parallel wiring). MacBook shipped #350 wiring the same feature into
Per One piece worth preserving from this PRMy MacBook's #350 instead reads Follow-up plan after #350 merges: open a small PR that adds the API-discovery path as a fallback when Owner: say the word and I'll close this PR. |
…sk-bridge Follow-up to #349. Owner asked for the cleaner wiring architecture after #349 merged, because the polling-loop approach has an inherent 30s–120s latency between result-write and DM delivery (30s poll interval + 90s grace window). Net effect: −33 lines. One wiring point instead of two. ### What moves - **Removed** from `src/discord-bridge.py`: - `async def poll_dm_fallback()` (59 lines) — the polling loop that scanned `results/` every 30s and subprocess-called `dm-result.py` for stale files not in `pending_replies`. - `client.loop.create_task(poll_dm_fallback())` registration in `on_ready`. - `import subprocess` (unused after the loop is gone). - **Added** to `src/task-bridge.ts`: - `import { execSync } from 'node:child_process'`. - Inside `startResultWatcher()`'s main tick, when `!isClientConnected()` — iterate the same `files` the connected path iterates, `execSync` `python3 src/dm-result.py --file <f>` per unprocessed file, then run the same bookkeeping the connected path runs (`_sendTaskStatus`, `_deliveredResults.add`, `_pendingTasks.delete`, `/task-done` POST, 10s unlink). ### Why this is better 1. **Latency.** The poll loop had a cumulative 30s+90s worst case. task-bridge.ts runs on the same interval as the result watcher (2s tick per the file-loop's natural cadence), so the DM arrives within a single tick of the result being written. No grace period needed — task-bridge already tracks `_deliveredResults` so it can't double-send. 2. **Right layer.** task-bridge.ts is the single owner of the "voice-agent delivery" contract. Fallback is a property of that delivery contract, so it belongs next to the normal delivery path — not in a sibling process (discord-bridge.py) polling the same directory. 3. **No redundant state.** The polling loop needed `pending_replies` awareness, a file-age check, and a grace window. task-bridge.ts already has `_deliveredResults` tracking and `isClientConnected()` state — nothing to reinvent. 4. **Subprocess parity preserved.** Both the old polling loop and this path shell out to `python3 src/dm-result.py --file <f>`. `dm-result.py`'s API-based owner + DM-channel resolution from #349 is unchanged and carries over directly. ### What this PR does NOT change - `src/dm-result.py` — untouched. All the `_resolve_owner_id`, `_open_dm_channel`, and `_discord_api` helpers from #349 stay as they are. The wiring PR just calls it differently. - `src/discord-bridge.py` `poll_proactive()` — still handles `results/proactive-*.txt` with its own direct-Discord-client DM path. That's a different contract (explicitly write-proactive-to-DM) and doesn't need the voiceConnected check. ### Verification - `python3 -c "import ast; ast.parse(...)"` clean on discord-bridge.py - `npx tsc --noEmit --skipLibCheck` clean on task-bridge.ts - Net diff: −33 lines across 2 files - dm-result.py smoke test from #349 still valid — this PR doesn't touch the script Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…sk-bridge (#352) Follow-up to #349. Owner asked for the cleaner wiring architecture after #349 merged, because the polling-loop approach has an inherent 30s–120s latency between result-write and DM delivery (30s poll interval + 90s grace window). Net effect: −33 lines. One wiring point instead of two. ### What moves - **Removed** from `src/discord-bridge.py`: - `async def poll_dm_fallback()` (59 lines) — the polling loop that scanned `results/` every 30s and subprocess-called `dm-result.py` for stale files not in `pending_replies`. - `client.loop.create_task(poll_dm_fallback())` registration in `on_ready`. - `import subprocess` (unused after the loop is gone). - **Added** to `src/task-bridge.ts`: - `import { execSync } from 'node:child_process'`. - Inside `startResultWatcher()`'s main tick, when `!isClientConnected()` — iterate the same `files` the connected path iterates, `execSync` `python3 src/dm-result.py --file <f>` per unprocessed file, then run the same bookkeeping the connected path runs (`_sendTaskStatus`, `_deliveredResults.add`, `_pendingTasks.delete`, `/task-done` POST, 10s unlink). ### Why this is better 1. **Latency.** The poll loop had a cumulative 30s+90s worst case. task-bridge.ts runs on the same interval as the result watcher (2s tick per the file-loop's natural cadence), so the DM arrives within a single tick of the result being written. No grace period needed — task-bridge already tracks `_deliveredResults` so it can't double-send. 2. **Right layer.** task-bridge.ts is the single owner of the "voice-agent delivery" contract. Fallback is a property of that delivery contract, so it belongs next to the normal delivery path — not in a sibling process (discord-bridge.py) polling the same directory. 3. **No redundant state.** The polling loop needed `pending_replies` awareness, a file-age check, and a grace window. task-bridge.ts already has `_deliveredResults` tracking and `isClientConnected()` state — nothing to reinvent. 4. **Subprocess parity preserved.** Both the old polling loop and this path shell out to `python3 src/dm-result.py --file <f>`. `dm-result.py`'s API-based owner + DM-channel resolution from #349 is unchanged and carries over directly. ### What this PR does NOT change - `src/dm-result.py` — untouched. All the `_resolve_owner_id`, `_open_dm_channel`, and `_discord_api` helpers from #349 stay as they are. The wiring PR just calls it differently. - `src/discord-bridge.py` `poll_proactive()` — still handles `results/proactive-*.txt` with its own direct-Discord-client DM path. That's a different contract (explicitly write-proactive-to-DM) and doesn't need the voiceConnected check. ### Verification - `python3 -c "import ast; ast.parse(...)"` clean on discord-bridge.py - `npx tsc --noEmit --skipLibCheck` clean on task-bridge.ts - Net diff: −33 lines across 2 files - dm-result.py smoke test from #349 still valid — this PR doesn't touch the script Co-authored-by: Chi <wangchi@Chis-Mac-mini.local> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
First half of the D+C DM-fallback redesign owner approved after the 2026-04-15 DM-flood post-mortem. This PR lands D (separate retention policy) as a standalone hygiene win. C (event-driven fs.watch wiring for the DM fallback) will stack on top as a follow-up PR. ### What this does - Adds `src/archive-stale-results.py` — a small one-shot script that walks `results/*.txt`, moves any file whose mtime is older than `$RETENTION_HOURS` (default 24) under `results/archive-YYYY-MM-DD/`. Files inside existing `archive-*` subdirectories are never touched. Honors `DRY_RUN=1` for safe preview. - Invokes it from `src/startup.sh` right after `mkdir -p tasks results data` — i.e. before any long-running service starts iterating `results/`. Any accumulated backlog from a prior quiet day gets swept out of the service iteration paths at boot time. ### Why this first, before any DM fallback rewiring Today's incident proved that `results/` accumulates long-dead files indefinitely: task-*, question-*, briefing-*, insight-*, friction-* from voice-originated work that was never spoken because voice wasn't connected at the time. Any code that scans `results/` for delivery — my reverted #352 task-bridge wiring, #349's discord-bridge polling loop, or any future design — is going to re-deliver the entire backlog on first tick after a cold boot unless the directory is actually kept small. D (retention) fixes the underlying hygiene concern independently of the DM-fallback question. Even if we never wire fallback again, the directory staying bounded is an unambiguous win. With D in place, C (the event-driven follow-up) can use `fs.watch` without needing a complex "pre-existing files at boot" escape hatch — D has already moved them out of the way. ### Retention window rationale Defaulted to **24 hours**. This is a judgment call: - Too short (e.g. 1h) — risks archiving same-day results the user might still want the voice agent to speak when they reconnect. - Too long (e.g. 7d) — bounds the accumulation less aggressively. - 24h feels right: anything older than one full day is effectively dead mail. Overridable per-run via `RETENTION_HOURS=N`. ### Verification - `bash -n src/startup.sh` clean - `ast.parse` clean on `archive-stale-results.py` - Dry run against fresh clean results/: "no stale files to archive" - Fabricated a 2-day-old test file, DRY_RUN=1 correctly flagged it, then real run correctly moved it to `results/archive-2026-04-15/`. - Does not affect `results/calls/` (subdirectory, skipped by `f.is_file()` check). ### What's not in this PR - **C (event-driven fs.watch wiring)** — stacks on this. Will replace the current `readdirSync(RESULT_DIR)` scan in task-bridge.ts with an fs.watch listener that only reacts to newly-created files. Separate PR, coming next. - **Full architectural redo of DM fallback delivery** — out of scope; the pause-on-DM-fallback rule from the post-mortem still holds. This PR is pure hygiene. ### References - Post-mortem: `notes/post-mortem-dm-flood-2026-04-15.md` - Feedback memory: `feedback_results_dir_is_graveyard.md` - Owner's D+C decision: Discord 23:20 local Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Follow-up to #347. The
src/dm-result.pyCLI shipped as a standalone tool with no caller. This PR wires it into the Discord bridge's polling infrastructure so voice-originated and cron-originated results are actually delivered as a DM fallback.Adds
poll_dm_fallback()— a 4th asyncio task inon_readyalongsidepoll_results/poll_approved/poll_proactive. Every 30s, scansresults/for task/question/briefing/insight/friction files that are:pending_replies(Discord-originated, already handled bypoll_results)and subprocess-calls
python3 src/dm-result.py --file <f>. Delegating to the CLI keeps the voiceConnected check + Discord-DM logic in one place (the script from #347).If dm-result.py prints
"voice client connected, skipping DM", the file is left on disk for the voice agent to pick up. On any other non-zero exit, the error is logged and the file also stays for the next retry cycle.Also drops an unused
import subprocessfromsrc/dm-result.py— spotted in my review comment on #347.Test plan
ast.parseclean on both filesresults/task-0000000000000.txt, ran the same subprocess call the loop makes. dm-result.py correctly detects voice disconnected and attempts delivery. Return code + stdout handled correctly.poll_results/poll_proactivebehavior. Race-free becausepoll_dm_fallbackexcludespending_repliestask IDs AND only touches files ≥90s old.Known limitation (pre-existing, not blocking this PR)
Tested on Mac Mini end-to-end: subprocess call reaches Discord but returns HTTP 403. The hardcoded
DM_CHANNEL = "1485370959870431433"indm-result.pycorresponds to a channel the Mac Mini bot doesn't have permission to POST to — it's the DM channel the MacBook bot opened with the owner. Flagged in my original #347 cross-review (#347 (comment)):This PR is correct independent of that — it wires the caller. The 403 is fixable separately by either per-node config or having each bot open its own DM channel on demand. The wiring loop already handles the error gracefully (logs it, leaves the file on disk for retry).
File changes
src/discord-bridge.py— addpoll_dm_fallback()+ register inon_ready+ addimport subprocess(+63 lines)src/dm-result.py— drop unusedimport subprocess(−1 line)Total: +64/−1 across 2 files.
🤖 Generated with Claude Code