Local source: coven-github/issues/12-add-audit-artifact-retention-and-redaction-controls.md
Summary
Hosted coven-github needs an audit and artifact model that records what happened without retaining raw secrets, private checkouts, or unfiltered transcripts. The current local/dev path has task state and docs describing what not to persist, but no durable audit system yet.
Current Evidence
- docs/container-isolation.md says not to persist raw installation tokens, GitHub App private keys, full repository checkouts, unredacted model keys, or arbitrary command output without secret filtering.
- docs/security.md says logs, Check Runs, issue comments, PR bodies, and task APIs must redact tokens.
- docs/architecture.md and HOSTED.md call for usage, audit, retention controls, and task history.
- README.md marks durable queue/task store and hosted tier as planned.
Problem
The hosted product needs enough auditability to debug, bill, support, and satisfy customer trust requirements. At the same time, retaining raw agent transcripts, tokens, full repository contents, or command output can create serious data exposure and compliance risk.
Impact
- Operators cannot answer what triggered a task, which permissions were used, what was published, or why a task failed.
- Customers cannot inspect or delete task artifacts by installation/repo.
- Sensitive data can leak into durable storage or support views.
- Memory governance cannot be audited against actual task inputs and outputs.
Proposed Design
Define artifact classes:
- task_metadata: durable, tenant-scoped, safe by default.
- publication_metadata: durable links to Check Runs, comments, PRs, branches.
- agent_result: durable after schema validation and redaction.
- logs: retained only after filtering and subject to retention policy.
- transcripts: opt-in, scoped, redacted, and retention-limited.
- repo_checkout: never retained after task cleanup.
- tokens/secrets: never retained.
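One way to make the classes above enforceable is a small policy table that every write to durable storage has to consult. The following is a minimal sketch, not a settled design: `ArtifactClass`, `RetentionRule`, `POLICY`, `may_persist`, and the `max_age_days` numbers are all hypothetical names and placeholder values chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ArtifactClass(Enum):
    TASK_METADATA = "task_metadata"
    PUBLICATION_METADATA = "publication_metadata"
    AGENT_RESULT = "agent_result"
    LOGS = "logs"
    TRANSCRIPTS = "transcripts"
    REPO_CHECKOUT = "repo_checkout"
    SECRETS = "tokens/secrets"


@dataclass(frozen=True)
class RetentionRule:
    persist: bool                 # may this class ever reach durable storage?
    requires_redaction: bool      # must the payload be redacted first?
    opt_in: bool                  # must the tenant explicitly enable it?
    max_age_days: Optional[int]   # None means no age limit in this sketch


# Placeholder values; real limits would come from the retention policy.
POLICY = {
    ArtifactClass.TASK_METADATA: RetentionRule(True, False, False, None),
    ArtifactClass.PUBLICATION_METADATA: RetentionRule(True, False, False, None),
    ArtifactClass.AGENT_RESULT: RetentionRule(True, True, False, None),
    ArtifactClass.LOGS: RetentionRule(True, True, False, 30),
    ArtifactClass.TRANSCRIPTS: RetentionRule(True, True, True, 7),
    ArtifactClass.REPO_CHECKOUT: RetentionRule(False, False, False, None),
    ArtifactClass.SECRETS: RetentionRule(False, False, False, None),
}


def may_persist(cls: ArtifactClass, *, redacted: bool, opted_in: bool = False) -> bool:
    """Return True only if the policy allows writing this artifact durably."""
    rule = POLICY[cls]
    if not rule.persist:
        return False
    if rule.requires_redaction and not redacted:
        return False
    if rule.opt_in and not opted_in:
        return False
    return True
```

Keeping the "never retained" classes in the same table as the durable ones means a new storage path cannot skip the check simply because nobody listed the class.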
Add audit events:
- webhook received/routed/ignored;
- task queued/claimed/started/finished/failed;
- token minted with installation id and permission class, but never token value;
- memory read/write/proposal decisions;
- publication decisions and GitHub API target links;
- redaction events.
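The events listed above could share one record shape. The sketch below assumes a hypothetical `AuditEvent` type and a `token_minted` helper; the event kind string, field names, and `FORBIDDEN_KEYS` set are illustrative only. It shows how a "token minted" event can carry the installation id and permission class while rejecting any attempt to attach the token value.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

# Hypothetical guard list: detail keys that must never appear in an audit record.
FORBIDDEN_KEYS = {"token", "token_value", "private_key", "secret"}


@dataclass(frozen=True)
class AuditEvent:
    kind: str
    task_id: str
    installation_id: int
    details: dict = field(default_factory=dict)
    at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def __post_init__(self) -> None:
        # Refuse to construct an event that names a secret-bearing field.
        leaked = FORBIDDEN_KEYS & {key.lower() for key in self.details}
        if leaked:
            raise ValueError(f"audit event must not carry: {sorted(leaked)}")

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


def token_minted(task_id: str, installation_id: int, permission_class: str) -> AuditEvent:
    """Record that a token was minted, without the token itself."""
    return AuditEvent(
        kind="token.minted",
        task_id=task_id,
        installation_id=installation_id,
        details={"permission_class": permission_class},
    )
```

A key-name check like this only catches obvious mistakes; values would still need pattern-based redaction before storage.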
Acceptance Criteria
- Durable stores never persist raw installation token values.
- Task API responses are redacted and tenant-scoped.
- Audit records include trigger, actor, installation, repository, task id, target refs, policy snapshot id, and publication links.
- Retention policy can delete artifacts by installation/repo/task.
- Tests scan stored artifacts for token and secret patterns.
- Documentation explains what is retained in self-hosted and hosted modes.
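The retention criterion (delete by installation/repo/task) can be sketched with an in-memory store. `Artifact` and `ArtifactStore` are hypothetical names, and a real implementation would sit on the planned durable task store. The point of the sketch is that the installation id is always required, so a delete can never cross tenants.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Artifact:
    installation_id: int
    repository: str
    task_id: str
    artifact_class: str
    body: str


class ArtifactStore:
    """In-memory stand-in for a durable, tenant-scoped artifact store."""

    def __init__(self) -> None:
        self._items: List[Artifact] = []

    def put(self, artifact: Artifact) -> None:
        self._items.append(artifact)

    def list_for(self, installation_id: int) -> List[Artifact]:
        return [a for a in self._items if a.installation_id == installation_id]

    def delete(
        self,
        installation_id: int,
        repository: Optional[str] = None,
        task_id: Optional[str] = None,
    ) -> int:
        """Delete artifacts for one installation, optionally narrowed by repo and task.

        Returns the number of artifacts removed.
        """

        def matches(a: Artifact) -> bool:
            return (
                a.installation_id == installation_id
                and (repository is None or a.repository == repository)
                and (task_id is None or a.task_id == task_id)
            )

        kept = [a for a in self._items if not matches(a)]
        removed = len(self._items) - len(kept)
        self._items = kept
        return removed
```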
Test Notes
Add tests that run a fake task containing token-like strings in output. Assert stored task state, logs, Check Run summaries, issue comments, PR bodies, and API responses contain redacted values only.
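A test along these lines might look like the sketch below. Everything here is an assumption made for illustration: `run_fake_task` is a hypothetical stand-in for the real task pipeline, and the two patterns (a GitHub-style prefixed token and a PEM private key block) are examples that would need to be checked against the token formats the service actually handles.

```python
import re

# Example patterns only; the real scanner would need a reviewed pattern list.
TOKEN_PATTERNS = [
    re.compile(r"gh[pousr]_[A-Za-z0-9]{36,}"),
    re.compile(
        r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----",
        re.DOTALL,
    ),
]
REDACTED = "[REDACTED]"


def redact(text: str) -> str:
    """Replace every token-like match with a fixed marker."""
    for pattern in TOKEN_PATTERNS:
        text = pattern.sub(REDACTED, text)
    return text


def find_secrets(text: str) -> list:
    """Return all token-like substrings still present in text."""
    return [m.group(0) for p in TOKEN_PATTERNS for m in p.finditer(text)]


def run_fake_task(output: str) -> dict:
    """Hypothetical pipeline stub: returns what each surface would store or publish."""
    clean = redact(output)
    return {
        "task_state": clean,
        "logs": clean,
        "check_run_summary": clean,
        "issue_comment": clean,
        "pr_body": clean,
        "api_response": clean,
    }


def test_no_token_like_strings_survive() -> None:
    fake_token = "ghs_" + "A" * 36  # synthetic value, not a real credential
    surfaces = run_fake_task(f"cloning with {fake_token}\ndone")
    for name, body in surfaces.items():
        assert find_secrets(body) == [], name
        assert REDACTED in body, name
```

In the real suite, the stub would be replaced by the actual task runner, and the same `find_secrets` scan would run over each stored or published surface.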