
bug: make subagent completion delivery durable under busy-lane and restart conditions #21

@100yenadmin

Description

Problem

Subagent results are being lost or delayed because completion delivery depends on a synchronous direct announce into a busy parent lane. Restart/drain windows and local loopback tick timeouts make this worse.

Findings so far

  • the dominant failure mode is the announce-path design under congestion, not just transport instability
  • drain/restart windows reject announces outright
  • retries amplify congestion
  • there is no durable spool/inbox to protect results once retries are exhausted
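A toy model of these findings, to make the failure mode concrete. `deliver_without_spool`, `busy_lane`, and `draining_lane` are hypothetical stand-ins, not the actual announce code, and the default of 3 retries is an assumption. A busy lane and a draining lane end the same way: four synchronous announce calls of extra load, then the result is dropped.

```python
from typing import Callable, Optional, Tuple


def deliver_without_spool(
    announce: Callable[[str], bool], payload: str, max_retries: int = 3
) -> Tuple[Optional[str], int]:
    """Toy model of the reported failure mode (hypothetical, not the project's code).

    Returns the delivered payload (or None) and how many announce calls were made.
    """
    attempts = 0
    for _ in range(1 + max_retries):
        attempts += 1  # every retry is another synchronous call into the parent lane
        if announce(payload):
            return payload, attempts
    return None, attempts  # retries exhausted and no durable inbox: the result is gone


def busy_lane(payload: str) -> bool:
    return False  # announce times out because the parent lane is busy


def draining_lane(payload: str) -> bool:
    return False  # restart/drain window rejects the announce outright


print(deliver_without_spool(busy_lane, "subagent-42: done"))      # (None, 4)
print(deliver_without_spool(draining_lane, "subagent-43: done"))  # (None, 4)
```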

Goal

Make completion durable first, delivery second.
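A minimal sketch of "durable first, delivery second", assuming a local append-only JSONL file as the inbox. `CompletionInbox`, `complete`, and the file layout are hypothetical names for illustration, not existing code. The completion is fsync'd before any announce is attempted and is only removed (acked) after a successful delivery, so a failed or timed-out announce can no longer drop it.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable


class CompletionInbox:
    """Hypothetical append-only inbox: one JSON record per line, fsync'd on write."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.path.touch(exist_ok=True)

    def persist(self, completion_id: str, payload: str) -> None:
        """Append the record and fsync it before any delivery is attempted."""
        with self.path.open("a", encoding="utf-8") as f:
            f.write(json.dumps({"id": completion_id, "payload": payload}) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def pending(self) -> list:
        """All records that have not been acknowledged yet."""
        with self.path.open(encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]

    def ack(self, delivered_ids: set) -> None:
        """Drop delivered records: rewrite to a temp file, then atomically replace."""
        remaining = [r for r in self.pending() if r["id"] not in delivered_ids]
        tmp = self.path.with_name(self.path.name + ".tmp")
        with tmp.open("w", encoding="utf-8") as f:
            for record in remaining:
                f.write(json.dumps(record) + "\n")
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)


def complete(inbox: CompletionInbox, completion_id: str, payload: str,
             announce: Callable[[str], bool]) -> bool:
    """Durable first, delivery second: a failed announce leaves the record in the inbox."""
    inbox.persist(completion_id, payload)
    if announce(payload):
        inbox.ack({completion_id})
        return True
    return False


with tempfile.TemporaryDirectory() as workdir:
    inbox = CompletionInbox(Path(workdir) / "completions.jsonl")
    complete(inbox, "c1", "result one", announce=lambda p: False)  # parent lane busy
    complete(inbox, "c2", "result two", announce=lambda p: True)   # delivered and acked
    print([r["id"] for r in inbox.pending()])  # ['c1']
```

A plain file plus `os.replace` keeps the sketch dependency-free; a SQLite table or an existing session store would serve the same role. Fully crash-safe renames on POSIX would also need an fsync of the parent directory, which is omitted here.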

Proposed direction

  • persist failed completion payloads to a durable queue/inbox
  • short-circuit direct announces during drain/restart states
  • deliver later, batched and opportunistically, once the requester lane is available
  • instrument effective timeout, queue depth, and event-loop lag
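A rough sketch of the delivery side of this direction, assuming an asyncio-style event loop. A `deque` stands in for the durable inbox, and `Requester`, `pump`, `Metrics`, and `measure_loop_lag` are hypothetical names. Each pass short-circuits while the lane is draining, waits (rather than retries) while it is busy, and otherwise delivers one batch, removing items only after delivery. Event-loop lag is approximated as how late an `asyncio.sleep` wakes up; effective timeout is not modeled.

```python
import asyncio
import time
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Metrics:
    queue_depth: int = 0
    skipped_while_draining: int = 0
    max_loop_lag_s: float = 0.0


@dataclass
class Requester:
    """Hypothetical requester lane; `draining` models a restart/drain window."""
    draining: bool = False
    busy: bool = False
    received: list = field(default_factory=list)

    async def deliver_batch(self, items: list) -> None:
        self.received.extend(items)


async def pump(pending: deque, lane: Requester, metrics: Metrics, batch_size: int = 10) -> None:
    """One delivery pass over `pending`, which stands in for the durable inbox."""
    metrics.queue_depth = len(pending)
    if lane.draining:
        metrics.skipped_while_draining += 1
        return  # short-circuit: no direct announce while draining; items stay queued
    if lane.busy:
        return  # wait for a later pass instead of retrying into a busy lane
    batch = [pending[i] for i in range(min(batch_size, len(pending)))]
    if batch:
        await lane.deliver_batch(batch)  # one call per batch, not one per completion
        for _ in batch:
            pending.popleft()  # remove only after the batch was delivered
    metrics.queue_depth = len(pending)


async def measure_loop_lag(metrics: Metrics, interval: float = 0.05, samples: int = 5) -> None:
    """Approximate event-loop lag as how much later than requested a sleep wakes up."""
    for _ in range(samples):
        start = time.perf_counter()
        await asyncio.sleep(interval)
        lag = time.perf_counter() - start - interval
        metrics.max_loop_lag_s = max(metrics.max_loop_lag_s, lag)


async def main() -> tuple:
    pending = deque(["r1", "r2", "r3"])
    lane = Requester(draining=True)
    metrics = Metrics()
    await pump(pending, lane, metrics)                # drain window: nothing is announced
    lane.draining = False
    await pump(pending, lane, metrics, batch_size=2)  # delivers r1, r2
    await pump(pending, lane, metrics, batch_size=2)  # delivers r3
    await measure_loop_lag(metrics, samples=2)
    return lane.received, metrics.queue_depth, metrics.skipped_while_draining


received, depth, skipped = asyncio.run(main())
print(received, depth, skipped)  # ['r1', 'r2', 'r3'] 0 1
```

Running a single pump per requester keeps batches from being delivered twice; with several concurrent pumps, items would need to be claimed before the `await`.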

Acceptance criteria

  • no completed subagent result can be silently lost after retry exhaustion
  • during restart/draining states, completions are persisted instead of re-hammering the direct announce path
  • blocked parent lanes delay delivery, but do not destroy result visibility
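These criteria map fairly directly onto checks. A sketch, assuming a hypothetical `FakeLane` and `finish`, with a dict standing in for the durable inbox and 2 retries as an arbitrary default; tests against the real announce path would replace these stand-ins.

```python
from dataclasses import dataclass, field


@dataclass
class FakeLane:
    """Hypothetical parent lane used only for these checks."""
    state: str = "idle"  # "idle" | "busy" | "draining"
    attempts: int = 0
    delivered: list = field(default_factory=list)

    def announce(self, payload: str) -> bool:
        self.attempts += 1
        if self.state != "idle":
            return False
        self.delivered.append(payload)
        return True


def finish(store: dict, lane: FakeLane, cid: str, payload: str, max_retries: int = 2) -> None:
    """Hypothetical completion path: persist first, skip the announce while draining."""
    store[cid] = payload  # durable first (the dict stands in for the on-disk inbox)
    if lane.state == "draining":
        return  # short-circuit: no direct announce during restart/drain
    for _ in range(1 + max_retries):
        if lane.announce(payload):
            del store[cid]  # ack only after a successful delivery
            return


def visible(store: dict, lane: FakeLane, cid: str, payload: str) -> bool:
    return store.get(cid) == payload or payload in lane.delivered


# 1. Retry exhaustion does not silently lose the result.
store, lane = {}, FakeLane(state="busy")
finish(store, lane, "c1", "r1")
assert visible(store, lane, "c1", "r1") and lane.attempts == 3

# 2. Draining persists instead of re-hammering the announce path.
store, lane = {}, FakeLane(state="draining")
finish(store, lane, "c2", "r2")
assert lane.attempts == 0 and store == {"c2": "r2"}

# 3. A blocked lane delays delivery but keeps the result visible.
store, lane = {}, FakeLane(state="busy")
finish(store, lane, "c3", "r3")
assert "r3" not in lane.delivered and visible(store, lane, "c3", "r3")
```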

Notes

This should be traced in code, not just from logs. GitHub-native fix path only.

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working)
Type: No type
Projects: No projects
Relationships: None yet
Development: No branches or pull requests
