Skip to content

Add audit, artifact retention, and redaction controls #12

@romgenie

Description

@romgenie

Local source: coven-github/issues/12-add-audit-artifact-retention-and-redaction-controls.md

Summary

Hosted coven-github needs an audit and artifact model that records what happened without retaining raw secrets, private checkouts, or unfiltered transcripts. The current local/dev path has task state and docs describing what not to persist, but no durable audit system yet.

Current Evidence

  • docs/container-isolation.md says not to persist raw installation tokens, GitHub App private keys, full repository checkouts, unredacted model keys, or arbitrary command output without secret filtering.
  • docs/security.md says logs, Check Runs, issue comments, PR bodies, and task APIs must redact tokens.
  • docs/architecture.md and HOSTED.md call for usage, audit, retention controls, and task history.
  • README.md marks durable queue/task store and hosted tier as planned.

Problem

The hosted product needs enough auditability to debug, bill, support, and satisfy customer trust requirements. At the same time, retaining raw agent transcripts, tokens, full repository contents, or command output can create serious data exposure and compliance risk.

Impact

  • Operators cannot answer what triggered a task, which permissions were used, what was published, or why a task failed.
  • Customers cannot inspect or delete task artifacts by installation/repo.
  • Sensitive data can leak into durable storage or support views.
  • Memory governance cannot be audited against actual task inputs and outputs.

Proposed Design

Define artifact classes:

  • task_metadata: durable, tenant-scoped, safe by default.
  • publication_metadata: durable links to Check Runs, comments, PRs, branches.
  • agent_result: durable after schema validation and redaction.
  • logs: retained only after filtering and retention policy.
  • transcripts: opt-in, scoped, redacted, and retention-limited.
  • repo_checkout: never retained after task cleanup.
  • tokens/secrets: never retained.

Add audit events:

  • webhook received/routed/ignored;
  • task queued/claimed/started/finished/failed;
  • token minted with installation id and permission class, but never token value;
  • memory read/write/proposal decisions;
  • publication decisions and GitHub API target links;
  • redaction events.

Acceptance Criteria

  • Durable stores never persist raw installation token values.
  • Task API responses are redacted and tenant-scoped.
  • Audit records include trigger, actor, installation, repository, task id, target refs, policy snapshot id, and publication links.
  • Retention policy can delete artifacts by installation/repo/task.
  • Tests scan stored artifacts for token and secret patterns.
  • Documentation explains what is retained in self-hosted and hosted modes.

Test Notes

Add tests that run a fake task containing token-like strings in output. Assert stored task state, logs, Check Run summaries, issue comments, PR bodies, and API responses contain redacted values only.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions