Skip to content

discord-bridge: wire dm-result.py into result-poll flow as DM fallback#349

Merged
sonichi merged 2 commits intomainfrom
feat/wire-dm-fallback
Apr 15, 2026
Merged

discord-bridge: wire dm-result.py into result-poll flow as DM fallback#349
sonichi merged 2 commits intomainfrom
feat/wire-dm-fallback

Conversation

@sonichi
Copy link
Copy Markdown
Owner

@sonichi sonichi commented Apr 15, 2026

Summary

Follow-up to #347. The src/dm-result.py CLI shipped as a standalone tool with no caller. This PR wires it into the Discord bridge's polling infrastructure so voice-originated and cron-originated results are actually delivered as a DM fallback.

Adds poll_dm_fallback() — a 4th asyncio task in on_ready alongside poll_results / poll_approved / poll_proactive. Every 30s, scans results/ for task/question/briefing/insight/friction files that are:

  • Not in pending_replies (Discord-originated, already handled by poll_results)
  • Older than 90 seconds (grace period so voice-agent and telegram-bridge get first dibs)

and subprocess-calls python3 src/dm-result.py --file <f>. Delegating to the CLI keeps the voiceConnected check + Discord-DM logic in one place (the script from #347).

If dm-result.py prints "voice client connected, skipping DM", the file is left on disk for the voice agent to pick up. On any other non-zero exit, the error is logged and the file also stays for the next retry cycle.

Also drops an unused import subprocess from src/dm-result.py — spotted in my review comment on #347.

Test plan

  • ast.parse clean on both files
  • Subprocess smoke test: wrote a fake stale results/task-0000000000000.txt, ran the same subprocess call the loop makes. dm-result.py correctly detects voice disconnected and attempts delivery. Return code + stdout handled correctly.
  • No changes to poll_results / poll_proactive behavior. Race-free because poll_dm_fallback excludes pending_replies task IDs AND only touches files ≥90s old.

Known limitation (pre-existing, not blocking this PR)

Tested on Mac Mini end-to-end: subprocess call reaches Discord but returns HTTP 403. The hardcoded DM_CHANNEL = "1485370959870431433" in dm-result.py corresponds to a channel the Mac Mini bot doesn't have permission to POST to — it's the DM channel the MacBook bot opened with the owner. Flagged in my original #347 cross-review (#347 (comment)):

Hardcoded DM channel ID … fine for single-owner use but doesn't survive if the owner changes Discord accounts or if we ever want multi-owner setups.

This PR is correct independent of that — it wires the caller. The 403 is fixable separately by either per-node config or having each bot open its own DM channel on demand. The wiring loop already handles the error gracefully (logs it, leaves the file on disk for retry).

File changes

  • src/discord-bridge.py — add poll_dm_fallback() + register in on_ready + add import subprocess (+63 lines)
  • src/dm-result.py — drop unused import subprocess (−1 line)

Total: +64/−1 across 2 files.

🤖 Generated with Claude Code

Chi and others added 2 commits April 15, 2026 14:03
Follow-up to PR #347. PR #347 shipped `src/dm-result.py` as a standalone
CLI but didn't wire it into any polling loop, so voice-originated and
cron-originated results that no one handles would still be silently
dropped when the voice client is offline.

Adds `poll_dm_fallback()` — a 4th asyncio task registered in `on_ready`
alongside `poll_results`, `poll_approved`, and `poll_proactive`. Scans
`results/` every 30s for task/question/briefing/insight/friction files
that are:

  - not in `pending_replies` (Discord-originated, already handled)
  - older than 90s (grace period so voice-agent and telegram-bridge get
    first dibs)

and subprocess-calls `python3 src/dm-result.py --file <f>`. Delegating
to the CLI keeps the voiceConnected check + Discord-DM-send logic in one
place (the script MacBook shipped in #347). If dm-result.py prints
"voice client connected, skipping DM" the file is left on disk for the
voice agent to speak when a client reconnects. On any other non-zero
exit, the error is logged and the file is also left on disk for the
next retry cycle.

Also: drop unused `import subprocess` from `src/dm-result.py` — flagged
in my cross-review on #347 (comment 4255378452). Harmless but noted.

### Known limitation (not introduced by this PR)
Tested end-to-end on Mac Mini: the subprocess call reaches Discord but
returns HTTP 403. dm-result.py hardcodes `DM_CHANNEL = "1485370959870431433"`
which corresponds to MacBook's bot's DM channel with the owner; the
Mac Mini bot doesn't have permission to POST to that channel. Flagged
in my original #347 review — the fix is either a per-node config file
or having the bot open its own DM channel on demand. Out of scope for
this PR; the wiring is correct and will work once the DM channel is
per-node.

### Verification
- `ast.parse` clean on both files
- Subprocess smoke test: loop correctly subprocesses out, reads rc, logs
  stderr when dm-result.py fails. Fake stale file cleaned up after.
- No changes to existing `poll_results` / `poll_proactive` behavior —
  race-free because `poll_dm_fallback` excludes `pending_replies` task
  IDs and only touches files ≥90s old.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Addresses the hardcode owner flagged after PR #349 opened.

Previously `DM_CHANNEL = "1485370959870431433"` was baked into dm-result.py.
That channel belonged to a specific bot's DM with the owner, so
subprocess calls from a different node hit HTTP 403. Confirmed
end-to-end on Mac Mini during smoke-testing of PR #349.

Now the DM channel is resolved on demand:

1. `_resolve_owner_id()` picks the human owner by:
   a. `$SUTANDO_DM_OWNER_ID` env var (explicit override, skips API calls)
   b. First non-bot entry in `~/.claude/channels/discord/access.json`
      allowFrom, decided by a GET /users/{id} `bot` flag. allowFrom
      typically contains multiple bot accounts (MacBook bot, Mac Mini
      bot) plus the human owner — without the is-bot check we'd DM the
      wrong bot.

2. `_open_dm_channel()` calls `POST /users/@me/channels` with the
   resolved owner's user ID. Discord docs are explicit that this
   endpoint is idempotent — returns the existing DM channel if one
   exists, otherwise creates one. Cheap to call per invocation.

3. `_discord_api()` extracted as a small wrapper so the GET /users/@me,
   POST /users/@me/channels, POST /channels/{id}/messages paths all share
   a single urllib+auth call site. Side effect: send_dm() is ~40% shorter
   and easier to follow.

### Smoke test results

Against live Discord API on Mac Mini:
- `_resolve_owner_id` correctly returns `1022910063620390932` (sonichi)
  by skipping `1485364006297534584` (MacBook bot) via is-bot lookup.
- `_open_dm_channel` returns `1490906927675474030` — a live DM channel,
  completely different from the old hardcoded value.
- Actual send deferred to avoid spamming owner during testing. The POST
  path is standard Discord API and is the same byte sequence as before
  the refactor; only the channel ID input changed.

### Python 3.9 note

The `dict | None = None` union-syntax type hint crashed at module import
on the system Python 3.9 (`TypeError: unsupported operand | for type`).
Switched to untyped function signatures for the helpers — docstrings
carry the same information. Caught in the first smoke-test run.

### Bot ID also no longer needs to be known

The previous draft of this change used GET /users/@me to discover the
current bot's user ID and skip it in allowFrom. Replaced with per-user
is-bot lookup, which correctly skips all bots (not just the current
one) in multi-bot allowFrom lists.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@sonichi
Copy link
Copy Markdown
Owner Author

sonichi commented Apr 15, 2026

Superseded by #350 (MacBook's parallel wiring).

MacBook shipped #350 wiring the same feature into src/task-bridge.ts startResultWatcher() instead of discord-bridge.py. That's the better layer:

  • task-bridge already watches results/ and already has isClientConnected(), so there's no polling loop or 90s grace window. Responds immediately to state changes.
  • Less surface to review: +47/-7 vs my +178.

Per feedback_two_bot_collaboration.md ("take the better version and close the other"), recommending #349 close in favor of #350. Posted the same recommendation on #350 itself with honest comparison.

One piece worth preserving from this PR

My dm-result.py refactor resolves the owner user ID and DM channel via the Discord API (walks access.json allowFrom, queries /users/{id} for bot flag, opens DM channel via /users/@me/channels). That's zero-config — no per-node .env edits required.

MacBook's #350 instead reads DISCORD_DM_CHANNEL from .env. Simpler but requires per-node config.

Follow-up plan after #350 merges: open a small PR that adds the API-discovery path as a fallback when DISCORD_DM_CHANNEL is unset. Best of both.

Owner: say the word and I'll close this PR.

@sonichi sonichi merged commit c82d864 into main Apr 15, 2026
1 check passed
@sonichi sonichi deleted the feat/wire-dm-fallback branch April 15, 2026 21:14
sonichi pushed a commit that referenced this pull request Apr 15, 2026
…sk-bridge

Follow-up to #349. Owner asked for the cleaner wiring architecture
after #349 merged, because the polling-loop approach has an inherent
30s–120s latency between result-write and DM delivery (30s poll
interval + 90s grace window).

Net effect: −33 lines. One wiring point instead of two.

### What moves

- **Removed** from `src/discord-bridge.py`:
  - `async def poll_dm_fallback()` (59 lines) — the polling loop
    that scanned `results/` every 30s and subprocess-called
    `dm-result.py` for stale files not in `pending_replies`.
  - `client.loop.create_task(poll_dm_fallback())` registration in
    `on_ready`.
  - `import subprocess` (unused after the loop is gone).

- **Added** to `src/task-bridge.ts`:
  - `import { execSync } from 'node:child_process'`.
  - Inside `startResultWatcher()`'s main tick, when
    `!isClientConnected()` — iterate the same `files` the connected
    path iterates, `execSync` `python3 src/dm-result.py --file <f>`
    per unprocessed file, then run the same bookkeeping the
    connected path runs (`_sendTaskStatus`, `_deliveredResults.add`,
    `_pendingTasks.delete`, `/task-done` POST, 10s unlink).

### Why this is better

1. **Latency.** The poll loop had a cumulative 30s+90s worst case.
   task-bridge.ts runs on the same interval as the result watcher
   (2s tick per the file-loop's natural cadence), so the DM arrives
   within a single tick of the result being written. No grace period
   needed — task-bridge already tracks `_deliveredResults` so it
   can't double-send.

2. **Right layer.** task-bridge.ts is the single owner of the
   "voice-agent delivery" contract. Fallback is a property of that
   delivery contract, so it belongs next to the normal delivery path
   — not in a sibling process (discord-bridge.py) polling the same
   directory.

3. **No redundant state.** The polling loop needed `pending_replies`
   awareness, a file-age check, and a grace window. task-bridge.ts
   already has `_deliveredResults` tracking and `isClientConnected()`
   state — nothing to reinvent.

4. **Subprocess parity preserved.** Both the old polling loop and
   this path shell out to `python3 src/dm-result.py --file <f>`.
   `dm-result.py`'s API-based owner + DM-channel resolution from
   #349 is unchanged and carries over directly.

### What this PR does NOT change

- `src/dm-result.py` — untouched. All the `_resolve_owner_id`,
  `_open_dm_channel`, and `_discord_api` helpers from #349 stay as
  they are. The wiring PR just calls it differently.
- `src/discord-bridge.py` `poll_proactive()` — still handles
  `results/proactive-*.txt` with its own direct-Discord-client DM
  path. That's a different contract (explicitly write-proactive-to-DM)
  and doesn't need the voiceConnected check.

### Verification

- `python3 -c "import ast; ast.parse(...)"` clean on discord-bridge.py
- `npx tsc --noEmit --skipLibCheck` clean on task-bridge.ts
- Net diff: −33 lines across 2 files
- dm-result.py smoke test from #349 still valid — this PR doesn't
  touch the script

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
sonichi added a commit that referenced this pull request Apr 15, 2026
…sk-bridge (#352)

Follow-up to #349. Owner asked for the cleaner wiring architecture
after #349 merged, because the polling-loop approach has an inherent
30s–120s latency between result-write and DM delivery (30s poll
interval + 90s grace window).

Net effect: −33 lines. One wiring point instead of two.

### What moves

- **Removed** from `src/discord-bridge.py`:
  - `async def poll_dm_fallback()` (59 lines) — the polling loop
    that scanned `results/` every 30s and subprocess-called
    `dm-result.py` for stale files not in `pending_replies`.
  - `client.loop.create_task(poll_dm_fallback())` registration in
    `on_ready`.
  - `import subprocess` (unused after the loop is gone).

- **Added** to `src/task-bridge.ts`:
  - `import { execSync } from 'node:child_process'`.
  - Inside `startResultWatcher()`'s main tick, when
    `!isClientConnected()` — iterate the same `files` the connected
    path iterates, `execSync` `python3 src/dm-result.py --file <f>`
    per unprocessed file, then run the same bookkeeping the
    connected path runs (`_sendTaskStatus`, `_deliveredResults.add`,
    `_pendingTasks.delete`, `/task-done` POST, 10s unlink).

### Why this is better

1. **Latency.** The poll loop had a cumulative 30s+90s worst case.
   task-bridge.ts runs on the same interval as the result watcher
   (2s tick per the file-loop's natural cadence), so the DM arrives
   within a single tick of the result being written. No grace period
   needed — task-bridge already tracks `_deliveredResults` so it
   can't double-send.

2. **Right layer.** task-bridge.ts is the single owner of the
   "voice-agent delivery" contract. Fallback is a property of that
   delivery contract, so it belongs next to the normal delivery path
   — not in a sibling process (discord-bridge.py) polling the same
   directory.

3. **No redundant state.** The polling loop needed `pending_replies`
   awareness, a file-age check, and a grace window. task-bridge.ts
   already has `_deliveredResults` tracking and `isClientConnected()`
   state — nothing to reinvent.

4. **Subprocess parity preserved.** Both the old polling loop and
   this path shell out to `python3 src/dm-result.py --file <f>`.
   `dm-result.py`'s API-based owner + DM-channel resolution from
   #349 is unchanged and carries over directly.

### What this PR does NOT change

- `src/dm-result.py` — untouched. All the `_resolve_owner_id`,
  `_open_dm_channel`, and `_discord_api` helpers from #349 stay as
  they are. The wiring PR just calls it differently.
- `src/discord-bridge.py` `poll_proactive()` — still handles
  `results/proactive-*.txt` with its own direct-Discord-client DM
  path. That's a different contract (explicitly write-proactive-to-DM)
  and doesn't need the voiceConnected check.

### Verification

- `python3 -c "import ast; ast.parse(...)"` clean on discord-bridge.py
- `npx tsc --noEmit --skipLibCheck` clean on task-bridge.ts
- Net diff: −33 lines across 2 files
- dm-result.py smoke test from #349 still valid — this PR doesn't
  touch the script

Co-authored-by: Chi <wangchi@Chis-Mac-mini.local>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
sonichi pushed a commit that referenced this pull request Apr 15, 2026
First half of the D+C DM-fallback redesign owner approved after the
2026-04-15 DM-flood post-mortem. This PR lands D (separate retention
policy) as a standalone hygiene win. C (event-driven fs.watch wiring
for the DM fallback) will stack on top as a follow-up PR.

### What this does

- Adds `src/archive-stale-results.py` — a small one-shot script that
  walks `results/*.txt`, moves any file whose mtime is older than
  `$RETENTION_HOURS` (default 24) under `results/archive-YYYY-MM-DD/`.
  Files inside existing `archive-*` subdirectories are never touched.
  Honors `DRY_RUN=1` for safe preview.
- Invokes it from `src/startup.sh` right after `mkdir -p tasks results
  data` — i.e. before any long-running service starts iterating
  `results/`. Any accumulated backlog from a prior quiet day gets
  swept out of the service iteration paths at boot time.

### Why this first, before any DM fallback rewiring

Today's incident proved that `results/` accumulates long-dead files
indefinitely: task-*, question-*, briefing-*, insight-*, friction-*
from voice-originated work that was never spoken because voice wasn't
connected at the time. Any code that scans `results/` for delivery —
my reverted #352 task-bridge wiring, #349's discord-bridge polling
loop, or any future design — is going to re-deliver the entire
backlog on first tick after a cold boot unless the directory is
actually kept small.

D (retention) fixes the underlying hygiene concern independently of
the DM-fallback question. Even if we never wire fallback again, the
directory staying bounded is an unambiguous win. With D in place, C
(the event-driven follow-up) can use `fs.watch` without needing a
complex "pre-existing files at boot" escape hatch — D has already
moved them out of the way.

### Retention window rationale

Defaulted to **24 hours**. This is a judgment call:

- Too short (e.g. 1h) — risks archiving same-day results the user
  might still want the voice agent to speak when they reconnect.
- Too long (e.g. 7d) — bounds the accumulation less aggressively.
- 24h feels right: anything older than one full day is effectively
  dead mail. Overridable per-run via `RETENTION_HOURS=N`.

### Verification

- `bash -n src/startup.sh` clean
- `ast.parse` clean on `archive-stale-results.py`
- Dry run against fresh clean results/: "no stale files to archive"
- Fabricated a 2-day-old test file, DRY_RUN=1 correctly flagged it,
  then real run correctly moved it to `results/archive-2026-04-15/`.
- Does not affect `results/calls/` (subdirectory, skipped by
  `f.is_file()` check).

### What's not in this PR

- **C (event-driven fs.watch wiring)** — stacks on this. Will replace
  the current `readdirSync(RESULT_DIR)` scan in task-bridge.ts with
  an fs.watch listener that only reacts to newly-created files.
  Separate PR, coming next.
- **Full architectural redo of DM fallback delivery** — out of scope;
  the pause-on-DM-fallback rule from the post-mortem still holds.
  This PR is pure hygiene.

### References

- Post-mortem: `notes/post-mortem-dm-flood-2026-04-15.md`
- Feedback memory: `feedback_results_dir_is_graveyard.md`
- Owner's D+C decision: Discord 23:20 local

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant