Local source: coven-github/issues/01-add-durable-task-queue-and-delivery-idempotency.md
Summary
coven-github should persist accepted webhook work before returning success and should deduplicate GitHub webhook deliveries by X-GitHub-Delivery. The current development path uses an in-process tokio::sync::mpsc channel and can drop tasks when the queue is full while still returning 200 OK to GitHub.
Current Evidence
- crates/webhook/src/routes.rs validates the webhook, parses the event, maps it to a Task, and then calls state.task_tx.try_send(task).
- If try_send fails, the route logs "task queue full - dropping task" and still returns StatusCode::OK.
- crates/github/src/tasks.rs provides an in-memory task store backed by a HashMap.
- README.md marks durable queue / task store as planned and required for hosted reliability.
- DESIGN.md, ROADMAP.md, docs/security.md, and docs/hosted-mvp-plan.md all call out durable task state and delivery idempotency as hosted prerequisites.
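The failure mode in the evidence above can be reproduced in miniature. The sketch below is not the code in crates/webhook/src/routes.rs: it uses the standard library's bounded std::sync::mpsc::sync_channel as a stand-in for the tokio channel so that it has no dependencies, and the Task struct, handle_webhook_lossy, simulate, and the numeric status code are hypothetical names chosen for illustration.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Hypothetical stand-in for the real Task type.
#[derive(Debug)]
pub struct Task {
    pub delivery_id: String,
}

/// Mimics the reported behavior: a full queue is only logged,
/// and the handler answers 200 either way.
pub fn handle_webhook_lossy(tx: &SyncSender<Task>, task: Task) -> u16 {
    if let Err(TrySendError::Full(dropped)) = tx.try_send(task) {
        eprintln!("task queue full - dropping task {}", dropped.delivery_id);
    }
    200
}

/// Sends `sent` tasks into a queue with `capacity` slots while no worker
/// is draining it. Returns (responses that claimed success, tasks queued).
pub fn simulate(capacity: usize, sent: usize) -> (usize, usize) {
    let (tx, rx): (SyncSender<Task>, Receiver<Task>) = sync_channel(capacity);
    let ok = (0..sent)
        .map(|i| {
            let task = Task { delivery_id: format!("d-{}", i) };
            handle_webhook_lossy(&tx, task)
        })
        .filter(|status| *status == 200)
        .count();
    drop(tx); // close the channel so the drain below terminates
    let queued = rx.iter().count();
    (ok, queued)
}
```

Calling simulate(1, 3) reports three success responses but only one queued task, which is the gap between what GitHub is told and what the process will actually run.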
Problem
GitHub considers a 200 OK webhook response a successful delivery. If coven-github drops the task after accepting the webhook, GitHub will not retry it, and the user sees a missing agent response with no obvious recovery path. Similarly, without delivery id idempotency, GitHub redelivery can create duplicate tasks, duplicate comments, duplicate branches, or duplicate PRs.
This is acceptable for local smoke testing, but it is not acceptable for a hosted GitHub App that promises reliable issue, PR, or commit review.
Impact
- Accepted tasks can disappear under backpressure.
- Process restarts lose queued/running/done task history.
- GitHub webhook redelivery can run the same task more than once.
- CovenCave cannot reliably reconstruct task state after a server restart.
- Billing, audit, and support cannot answer whether a task was accepted, ignored, failed, retried, or duplicated.
Proposed Design
Introduce a durable task subsystem with these records:
- webhook_deliveries: delivery id, event type, installation id, repository id/name, action, received time, payload hash, routing result, processing state.
- tasks: task id, delivery id, installation id, repository id/name, familiar id, task kind, target issue/PR/commit, target head SHA, policy snapshot, state, retry count, timestamps.
- task_attempts: task id, attempt number, worker id, start/end time, failure category, result pointer.
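As a rough illustration, the three records above could map onto Rust types like the following. This is a sketch under assumptions: the field names follow the lists above, but every type choice (u64 ids, String hashes and snapshots, SystemTime timestamps, which fields are optional) and the Target enum are guesses, not something the existing crates define.

```rust
use std::time::SystemTime;

/// States named in the acceptance criteria of this issue.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TaskState { Received, Queued, Running, Failed, Completed, Ignored }

/// Hypothetical encoding of "target issue/PR/commit".
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Target { Issue(u64), PullRequest(u64), Commit(String) }

#[derive(Debug, Clone)]
pub struct WebhookDelivery {
    pub delivery_id: String, // value of the X-GitHub-Delivery header
    pub event_type: String,
    pub installation_id: u64,
    pub repository_id: u64,
    pub repository_name: String,
    pub action: Option<String>,
    pub received_at: SystemTime,
    pub payload_hash: String,
    pub routing_result: Option<String>,
    pub processing_state: String,
}

#[derive(Debug, Clone)]
pub struct Task {
    pub task_id: u64,
    pub delivery_id: String, // links back to WebhookDelivery
    pub installation_id: u64,
    pub repository_id: u64,
    pub repository_name: String,
    pub familiar_id: String,
    pub kind: String,
    pub target: Target,
    pub target_head_sha: Option<String>,
    pub policy_snapshot: String,
    pub state: TaskState,
    pub retry_count: u32,
    pub created_at: SystemTime,
    pub updated_at: SystemTime,
}

#[derive(Debug, Clone)]
pub struct TaskAttempt {
    pub task_id: u64,
    pub attempt_number: u32,
    pub worker_id: String,
    pub started_at: SystemTime,
    pub ended_at: Option<SystemTime>,
    pub failure_category: Option<String>,
    pub result_pointer: Option<String>,
}

impl TaskAttempt {
    /// Opens a new attempt; the end time and result are filled in later.
    pub fn start(task_id: u64, attempt_number: u32, worker_id: &str) -> Self {
        TaskAttempt {
            task_id,
            attempt_number,
            worker_id: worker_id.to_string(),
            started_at: SystemTime::now(),
            ended_at: None,
            failure_category: None,
            result_pointer: None,
        }
    }
}
```

Keeping attempts in their own record, rather than overwriting fields on the task, is what lets support and audit answer how many times a task ran and why each run ended.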
The webhook route should:
- Validate HMAC.
- Parse X-GitHub-Delivery.
- Persist the delivery before task dispatch.
- Check whether the delivery has already been routed.
- Persist the task in queued state.
- Enqueue by durable task id.
- Return 202 Accepted or 200 OK only after durable state exists.
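The route steps above can be sketched as a single function over a store. This is a minimal sketch, not the proposed implementation: an in-memory HashMap stands in for the durable store, HMAC validation is reduced to a boolean argument, and the Store and Outcome types as well as the 400 and 401 status codes are assumptions made for illustration.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    Accepted { task_id: u64 },
    Duplicate { task_id: u64 },
    MissingDeliveryId,
    InvalidSignature,
}

/// In-memory stand-in for the durable store (assumption for the sketch).
#[derive(Default)]
pub struct Store {
    deliveries: HashMap<String, u64>, // delivery id -> task id
    queued: Vec<u64>,                 // task ids waiting for a worker
    next_task_id: u64,
}

impl Store {
    pub fn accept(&mut self, signature_ok: bool, delivery_id: Option<&str>) -> Outcome {
        // 1. Validate HMAC (represented by a precomputed boolean here).
        if !signature_ok {
            return Outcome::InvalidSignature;
        }
        // 2. Parse X-GitHub-Delivery; refuse requests that lack it.
        let id = match delivery_id {
            Some(id) if !id.is_empty() => id,
            _ => return Outcome::MissingDeliveryId,
        };
        // 3. A delivery that was already routed maps to its existing task.
        if let Some(&task_id) = self.deliveries.get(id) {
            return Outcome::Duplicate { task_id };
        }
        // 4. Record the delivery and the queued task, then enqueue by id.
        self.next_task_id += 1;
        let task_id = self.next_task_id;
        self.deliveries.insert(id.to_string(), task_id);
        self.queued.push(task_id);
        Outcome::Accepted { task_id }
    }

    pub fn queued_len(&self) -> usize {
        self.queued.len()
    }
}

/// Hypothetical status mapping: success is reported only once state exists.
pub fn status(outcome: Outcome) -> u16 {
    match outcome {
        Outcome::Accepted { .. } => 202,
        Outcome::Duplicate { .. } => 200,
        Outcome::MissingDeliveryId => 400,
        Outcome::InvalidSignature => 401,
    }
}
```

In a real store, steps 3 and 4 would need to happen in one transaction, or rely on a uniqueness constraint on the delivery id, so that two concurrent copies of the same delivery cannot both pass the check.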
The worker should consume task ids from the durable queue, claim them atomically, and update task state after every phase.
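One way to read "claim them atomically" is a compare-and-set on the task state: a worker may move a task from queued to running only if it is still queued. The sketch below shows that idea with a Mutex-guarded map; TaskStore, State, and race_for are hypothetical names, and a database-backed store would presumably express the same check as a conditional update rather than an in-process lock.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State { Queued, Running, Completed }

/// In-memory stand-in for the durable task table (assumption).
#[derive(Clone, Default)]
pub struct TaskStore {
    inner: Arc<Mutex<HashMap<u64, State>>>,
}

impl TaskStore {
    pub fn insert_queued(&self, id: u64) {
        self.inner.lock().unwrap().insert(id, State::Queued);
    }

    /// Check and update happen under one lock, so only one caller can win.
    pub fn try_claim(&self, id: u64) -> bool {
        let mut tasks = self.inner.lock().unwrap();
        if tasks.get(&id) == Some(&State::Queued) {
            tasks.insert(id, State::Running);
            true
        } else {
            false
        }
    }

    /// Records the end of a phase; only a running task can complete.
    pub fn complete(&self, id: u64) -> bool {
        let mut tasks = self.inner.lock().unwrap();
        if tasks.get(&id) == Some(&State::Running) {
            tasks.insert(id, State::Completed);
            true
        } else {
            false
        }
    }

    pub fn state(&self, id: u64) -> Option<State> {
        self.inner.lock().unwrap().get(&id).copied()
    }
}

/// Spawns `workers` threads that all try to claim the same task and
/// returns how many of them succeeded.
pub fn race_for(store: &TaskStore, id: u64, workers: usize) -> usize {
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let store = store.clone();
            thread::spawn(move || store.try_claim(id))
        })
        .collect();
    handles
        .into_iter()
        .map(|handle| handle.join().unwrap())
        .filter(|won| *won)
        .count()
}
```

However many workers race for the same task id, exactly one claim succeeds, which is the property that keeps a redelivered or re-enqueued task from running twice.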
Acceptance Criteria
- Webhook handling refuses or explicitly marks requests missing X-GitHub-Delivery.
- Replaying the same GitHub delivery id does not create a duplicate task.
- A process restart after webhook acceptance does not lose the task.
- A full worker queue does not silently drop work after returning success to GitHub.
- Task states include at least received, queued, running, failed, completed, and ignored.
- Tests cover duplicate delivery, queue-full behavior, restart/reload behavior, and unsupported event routing.
- README status changes from planned to partial or implemented only after the durable path is actually wired.
Test Notes
Add webhook route tests that inject a delivery id and verify persistence before enqueue. Add integration tests with a fake queue/store that fails enqueue and assert the webhook response does not claim success unless the task can be recovered or retried.
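A fake queue and store for these tests might look like the sketch below. Everything in it is hypothetical: the TaskQueue trait, FailingQueue, FakeStore, the handle function, and the choice of 202 and 503 are illustrative names and values, not existing APIs in the repository. The policy it encodes is one possible reading of the requirement: success is reported only when the task has been persisted, and a failed enqueue leaves a marker so the task can be re-enqueued later.

```rust
pub trait TaskQueue {
    fn enqueue(&mut self, task_id: u64) -> Result<(), String>;
}

/// Fake queue that always rejects work, as if it were full.
pub struct FailingQueue;

impl TaskQueue for FailingQueue {
    fn enqueue(&mut self, _task_id: u64) -> Result<(), String> {
        Err("queue full".to_string())
    }
}

/// Fake store that can be told to fail, and records what was persisted.
#[derive(Default)]
pub struct FakeStore {
    pub fail_persist: bool,
    pub persisted: Vec<(String, u64)>, // (delivery id, task id)
    pub needs_requeue: Vec<u64>,       // persisted but not yet enqueued
}

impl FakeStore {
    pub fn persist(&mut self, delivery_id: &str) -> Result<u64, String> {
        if self.fail_persist {
            return Err("store unavailable".to_string());
        }
        let task_id = self.persisted.len() as u64 + 1;
        self.persisted.push((delivery_id.to_string(), task_id));
        Ok(task_id)
    }
}

/// Persist first, enqueue second. Returns the HTTP status to send.
pub fn handle(store: &mut FakeStore, queue: &mut dyn TaskQueue, delivery_id: &str) -> u16 {
    let task_id = match store.persist(delivery_id) {
        Ok(id) => id,
        // Nothing durable exists, so the response must not claim success.
        Err(_) => return 503,
    };
    if queue.enqueue(task_id).is_err() {
        // The task is recoverable from the store; mark it for a later retry.
        store.needs_requeue.push(task_id);
    }
    202
}
```

A test can then assert both halves of the rule: with FailingQueue and a working store the response is 202 and the task id appears in needs_requeue, while with fail_persist set the response is 503 and nothing is recorded.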