
Add export v1 contracts and session summary export #991

Open
wesm wants to merge 5 commits into main from dust-promotion-pr



wesm (Member) commented Jul 4, 2026

This PR turns report/export JSON into explicit v1 contracts for programmatic consumers. It adds shared schema, pricing provenance, and project identity metadata to the usage daily and activity report outputs, and introduces a daemonless session summary export (agentsview export sessions) for headless analytics. The session export is content-free: it supports JSON/NDJSON and includes per-session usage, model, cost, project, worktree, branch, machine, timestamp, and classification metadata, but no transcript content.

Pricing provenance is centralized under internal/export. Reports carry a resolver-derived block with source/table metadata, an RFC 8785-style digest, fallback indicators, cost_source, and a bounded per-model map of effective rates. Source-reported costs are flagged so consumers know when token-times-rate recomputation is not expected, and reasoning tokens are billed at the output rate in the cost breakdowns.
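To make the digest input concrete, here is a minimal Go sketch of an RFC 8785-style digest over a pricing table. It is not the internal/export code: the ModelRate type, its field names, and the sha256: prefix on the result are assumptions, and it approximates the JSON Canonicalization Scheme with encoding/json (sorted map keys, no whitespace, ES6-like float formatting) rather than implementing RFC 8785 exactly.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
)

// ModelRate is a hypothetical per-model pricing row (USD per million tokens).
type ModelRate struct {
	Input  float64 `json:"input_per_mtok"`
	Output float64 `json:"output_per_mtok"`
}

// canonicalJSON approximates RFC 8785 (JCS). Round-tripping through a generic
// value turns every object into a map, which encoding/json emits with sorted
// keys and no insignificant whitespace. Caveat: Go sorts keys by UTF-8 bytes,
// while JCS sorts by UTF-16 code units, so some non-ASCII keys can order differently.
func canonicalJSON(v any) ([]byte, error) {
	raw, err := json.Marshal(v)
	if err != nil {
		return nil, err
	}
	var generic any
	if err := json.Unmarshal(raw, &generic); err != nil {
		return nil, err
	}
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false) // JCS emits <, >, & literally
	if err := enc.Encode(generic); err != nil {
		return nil, err
	}
	// Encoder.Encode appends a newline; canonical output has none.
	return bytes.TrimSuffix(buf.Bytes(), []byte("\n")), nil
}

// PricingDigest hashes the canonical JSON of a pricing table.
func PricingDigest(table map[string]ModelRate) (string, error) {
	canon, err := canonicalJSON(table)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(canon)
	return "sha256:" + hex.EncodeToString(sum[:]), nil
}
```

Because the digest is taken over a canonical byte form, two backends that hold the same effective pricing table should report the same digest regardless of row or insertion order.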

Project identity now persists raw observations at sync/import time and recomputes stable identities at export time. Remote-backed identities are derived from normalized network remotes and carry sha256:-prefixed keys; path-backed fallbacks remain explicit and machine-local. The identity store is preserved through resync and mirrored to PostgreSQL/DuckDB so that CLI and HTTP exports stay aligned across backends.
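A hedged sketch of how a remote-backed key could be derived. The normalization rules below (drop scheme, userinfo, and port; lowercase the host; trim slashes and a trailing .git; treat scp-like git@host:path as equivalent to a URL) and the NormalizeRemote/ProjectKey names are assumptions for illustration, not the rules this PR implements; only the sha256: prefix comes from the description above.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"net/url"
	"regexp"
	"strings"
)

// scpLike matches git's scp-style syntax such as git@github.com:org/repo.git.
var scpLike = regexp.MustCompile(`^(?:[^@/]+@)?([^@:/]+):(.+)$`)

// NormalizeRemote reduces a network git remote to "host/path" using the
// hypothetical rules above. ok is false for local remotes, which would need a
// path-backed fallback identity instead.
func NormalizeRemote(remote string) (string, bool) {
	remote = strings.TrimSpace(remote)
	var host, path string
	if strings.Contains(remote, "://") {
		u, err := url.Parse(remote)
		if err != nil || u.Scheme == "file" || u.Hostname() == "" {
			return "", false
		}
		host, path = u.Hostname(), u.Path // userinfo and port are discarded
	} else {
		m := scpLike.FindStringSubmatch(remote)
		// A one-letter "host" is a Windows drive letter (C:\repo), not a remote.
		if m == nil || len(m[1]) < 2 {
			return "", false
		}
		host, path = m[1], m[2]
	}
	path = strings.TrimSuffix(strings.Trim(path, "/"), ".git")
	if path == "" {
		return "", false
	}
	return strings.ToLower(host) + "/" + path, true
}

// ProjectKey derives a stable, machine-independent key from a network remote.
func ProjectKey(remote string) (string, bool) {
	norm, ok := NormalizeRemote(remote)
	if !ok {
		return "", false
	}
	sum := sha256.Sum256([]byte(norm))
	return "sha256:" + hex.EncodeToString(sum[:]), true
}
```

Under these rules, git@github.com:Org/Repo.git, https://github.com/Org/Repo.git, and ssh://git@github.com:22/Org/Repo all map to the same key, which is what lets sessions from different machines join on one project.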

The new session-summary export adds stable watermark/keyset pagination, cursor-reset signaling, --all, NDJSON meta rows, root/child and automation filtering, and shared pricing/project metadata. Changes to the existing usage/activity payloads are additive: metadata lands in sibling blocks, and daily breakdown arrays are always emitted as arrays rather than omitted.
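Watermark/keyset pagination can be sketched in memory as follows. The Session and Cursor types and the Page function are hypothetical: rows are ordered by (last activity DESC, id DESC), the first page fixes a watermark at the newest activity value, and later pages return only rows strictly after the cursor position and not newer than the watermark. The sketch assumes LastActivity is already a fixed-width, lexically sortable UTC key, and it leaves out cursor signing and cursor-reset signaling.

```go
package main

import "sort"

// Session is a hypothetical export row reduced to its ordering columns.
// LastActivity must be a fixed-width UTC key so string order equals time order.
type Session struct {
	ID           string
	LastActivity string
}

// Cursor is a hypothetical cursor payload.
type Cursor struct {
	Watermark    string // newest LastActivity visible when the export began
	LastActivity string // keyset position: last row of the previous page
	ID           string
}

// precedes reports whether a sorts before b in (LastActivity DESC, ID DESC) order.
func precedes(a, b Session) bool {
	if a.LastActivity != b.LastActivity {
		return a.LastActivity > b.LastActivity
	}
	return a.ID > b.ID
}

// Page returns up to limit rows after cur (nil means first page) plus the cursor
// for the next page, or a nil cursor when the page came back short.
func Page(rows []Session, cur *Cursor, limit int) ([]Session, *Cursor) {
	if limit <= 0 {
		return nil, nil
	}
	sorted := append([]Session(nil), rows...)
	sort.Slice(sorted, func(i, j int) bool { return precedes(sorted[i], sorted[j]) })

	watermark := ""
	if cur != nil {
		watermark = cur.Watermark
	} else if len(sorted) > 0 {
		watermark = sorted[0].LastActivity
	}
	var page []Session
	for _, s := range sorted {
		if s.LastActivity > watermark {
			continue // changed after the export began; not part of this run
		}
		if cur != nil && !precedes(Session{ID: cur.ID, LastActivity: cur.LastActivity}, s) {
			continue // at or before the cursor position: already emitted
		}
		page = append(page, s)
		if len(page) == limit {
			break
		}
	}
	if len(page) < limit {
		return page, nil
	}
	last := page[len(page)-1]
	return page, &Cursor{Watermark: watermark, LastActivity: last.LastActivity, ID: last.ID}
}
```

The id tiebreak is what keeps pages stable when several sessions share a timestamp. In SQLite and PostgreSQL the same position test is usually written as a row-value comparison, (last_activity_at, id) < (?, ?), with an index covering both columns.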

Docs now describe the v1 contract rules, pricing digest input, project identity derivation, cursor behavior, session-export limits, and default exclusion caveats. Golden fixtures pin usage daily, usage daily with breakdowns, activity report, and session export JSON/NDJSON shapes. Stale docs/superpowers design notes were removed, and the shared contract package was renamed from internal/exportcontracts to internal/export.

Reviewers should focus on:

  • shared DTO/resolver code in internal/export
  • project identity capture, fallback, resync preservation, and mirror-backend persistence
  • session summary export query/cursor behavior in internal/db/session_export.go and cmd/agentsview/export.go
  • pricing provenance coupling across SQLite, PostgreSQL, and DuckDB usage/activity paths

The main tradeoff is landing the related export-contract issues together so field names and semantics stay shared across surfaces. This intentionally does not add redaction flags or per-row pricing provenance: raw project paths/remotes are emitted by default, and pricing provenance remains report-level with a bounded per-model map.

wesm added 3 commits July 3, 2026 22:31
Programmatic consumers need stable JSON contracts for report and session-summary exports, including reproducible pricing metadata and project identity that can survive cross-machine aggregation. This commit publishes the final branch state without the private local-path examples that existed only in intermediate local commits.

The export surfaces stay additive while gaining shared schema metadata, pricing provenance, persisted project identity, and a content-free daemonless session summary export for headless analytics.
V1 report metadata should distinguish an empty projects map from legacy absence. Usage summary and activity responses now keep the projects field present, with regression coverage at the DB, activity, service, and HTTP response layers.
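Keeping the projects field present (and breakdown arrays as arrays) comes down to how Go's encoding/json treats nil collections: without omitempty, a nil map or slice still emits its key but with the value null, while an empty non-nil one emits {} or []. A minimal sketch with a hypothetical Report type and pin helper, not the PR's actual DTOs:

```go
package main

import "encoding/json"

// Report is a hypothetical v1 report fragment. Neither field uses omitempty,
// so both keys always appear in the output.
type Report struct {
	Projects map[string]string `json:"projects"`
	Daily    []int             `json:"daily"`
}

// pin replaces nil collections with empty ones so they encode as {} and []
// instead of null.
func pin(r Report) Report {
	if r.Projects == nil {
		r.Projects = map[string]string{}
	}
	if r.Daily == nil {
		r.Daily = []int{}
	}
	return r
}

// encode marshals a report, panicking on the (unexpected) encoding error.
func encode(r Report) string {
	b, err := json.Marshal(r)
	if err != nil {
		panic(err)
	}
	return string(b)
}
```

encode(Report{}) yields {"projects":null,"daily":null}, whereas encode(pin(Report{})) yields {"projects":{},"daily":[]}, which is the shape a golden fixture can pin.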
The v1 activity report golden only needs enough rows to pin the contract surface. The prior full-day default generated hundreds of empty five-minute buckets, making review noisy without adding distinct schema coverage.

Use a narrow custom window around the seeded sessions so the fixture still carries pricing, project identity, partial-range metadata, nonzero buckets, summaries, sessions, and intervals while staying small enough to read.

roborev-ci Bot commented Jul 4, 2026


roborev: Combined Review (c96e40e)

High: internal/postgres/push.go:509 persists raw git remotes into shared PostgreSQL. Since GitRemote is read unchanged from .git/config at internal/sync/engine.go:6628-6630, credential-bearing remotes like https://user:token@github.com/org/repo.git can leak tokens to DB readers/operators. Sanitize before storage or persist only the normalized remote/key, and add a migration to scrub existing rows.

Medium: cmd/agentsview/export.go:196 opens SQLite via openDB, bypassing the write-owner lock/live daemon coordination. The export path can still write through GetOrCreateDatabaseID, so running agentsview export sessions beside a writable server may create an uncoordinated second writer. Route export through the locked write path when metadata may be created, or make export strictly read-only by reading an existing DB ID only.


Reviewers: 2 done | Synthesis: codex, 7s | Total: 17m41s

V1 export metadata is now consumed as join and provenance data, so it must not leak credential-bearing remotes or mutate archives from daemonless read commands. Persist sanitized project remotes and scrub existing observations while preserving remote-derived project keys.

Session-summary export now requires an existing archive database ID and uses the read-only open path, avoiding an uncoordinated SQLite writer beside the daemon.

PostgreSQL and DuckDB now treat explicit pricing rows as the effective pricing table, matching SQLite provenance digests while retaining fallback pricing for empty fresh mirrors. The docs also call out that the new JSON contracts aim for compatibility but may still settle.
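For the remote-sanitization fix, a minimal Go sketch using net/url. SanitizeRemote is a hypothetical helper, not the function this commit adds: it drops the userinfo from URL-form remotes (covering both user:password and bare-token forms), returns an empty string for URL-form remotes it cannot parse, and leaves scp-like remotes untouched. It does not cover the migration that scrubs rows already stored.

```go
package main

import (
	"net/url"
	"strings"
)

// SanitizeRemote strips userinfo (user:password or a bare token) from URL-form
// remotes before storage. Non-URL remotes, including scp-like git@host:org/repo,
// are returned unchanged because the part before @ there is an SSH login name.
func SanitizeRemote(remote string) string {
	if !strings.Contains(remote, "://") {
		return remote
	}
	u, err := url.Parse(remote)
	if err != nil {
		// Unparseable URL: fail closed rather than risk storing a credential.
		return ""
	}
	u.User = nil
	return u.String()
}
```

With this helper, https://user:token@github.com/org/repo.git is stored as https://github.com/org/repo.git, so a normalized project key derived from either form stays the same.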

roborev-ci Bot commented Jul 4, 2026


roborev: Combined Review (367a7ea)

High risk overall: reviewers found one production-blocking export issue plus two medium integrity/ordering issues.

High

  • cmd/agentsview/export.go:203: export sessions requires an existing database_id, but no production path creates one. GetOrCreateDatabaseID is only used in tests, so real archives can keep failing with “database id missing” even after restarting serve.
    • Fix: Initialize the database ID from a writable startup/open path, such as after db.Open/migration during serve, before read-only export relies on it.

Medium

  • internal/db/session_export.go:424: Session export orders and paginates by raw RFC3339Nano text timestamps. Whole-second values like ...00Z sort after fractional values like ...00.999Z lexicographically, so latest-activity ordering and cursor boundaries can be wrong within the same second.

    • Fix: Use a normalized sortable timestamp key for watermark, comparisons, prefix fingerprints, and ORDER BY, such as fixed-width fractional UTC text or parsed epoch/nanosecond components.
  • internal/db/session_export.go:653: Session export cursors are HMAC-signed with a public constant instead of the archive cursor secret, so callers can forge valid cursors and bypass the filter/database/prefix integrity checks that the signature is meant to enforce.

    • Fix: Sign and verify export cursors with db.cursorSecret, optionally domain-separated for session export.
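The fix suggested above can be sketched as HMAC-SHA256 over a domain-separation label plus the cursor payload, keyed by a per-archive secret instead of a public constant. The SignCursor/VerifyCursor names, the payload.signature token format, and the label string are assumptions; the secret parameter stands in for db.cursorSecret.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"errors"
	"strings"
)

// cursorDomain separates session-export signatures from other values signed
// with the same archive secret (hypothetical label).
const cursorDomain = "session-export-cursor/v1\x00"

var errBadCursor = errors.New("invalid session export cursor")

// cursorMAC computes HMAC-SHA256(secret, domain || payload).
func cursorMAC(secret, payload []byte) []byte {
	h := hmac.New(sha256.New, secret)
	h.Write([]byte(cursorDomain))
	h.Write(payload)
	return h.Sum(nil)
}

// SignCursor encodes payload and its MAC as "payload.signature" (base64url).
func SignCursor(secret []byte, payload string) string {
	enc := base64.RawURLEncoding
	return enc.EncodeToString([]byte(payload)) + "." + enc.EncodeToString(cursorMAC(secret, []byte(payload)))
}

// VerifyCursor returns the payload only if its MAC matches under secret.
func VerifyCursor(secret []byte, token string) (string, error) {
	enc := base64.RawURLEncoding
	p, s, ok := strings.Cut(token, ".")
	if !ok {
		return "", errBadCursor
	}
	payload, err := enc.DecodeString(p)
	if err != nil {
		return "", errBadCursor
	}
	sig, err := enc.DecodeString(s)
	if err != nil {
		return "", errBadCursor
	}
	if !hmac.Equal(sig, cursorMAC(secret, payload)) { // constant-time compare
		return "", errBadCursor
	}
	return string(payload), nil
}
```

A cursor minted with a public constant no longer verifies, so the filter, database, and prefix checks inside the payload can be trusted only when the caller never saw the secret.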

Reviewers: 2 done | Synthesis: codex, 10s | Total: 20m12s

Session-summary export is intentionally read-only, so fresh archives need their database identity created by the normal writable startup path rather than by the export command. This keeps daemonless export from opening a second writer while letting archives initialized by serve or sync export immediately.

DuckDB mirrors also need a repair for rows written before remote sanitization. The mirror schema migration now rewrites credential-bearing project identity remotes to their sanitized storage form and records a repair marker so compatibility checks can detect mirrors that still need the scrub.

roborev-ci Bot commented Jul 4, 2026


roborev: Combined Review (ec435ca)

Summary verdict: one medium-severity issue should be fixed before merge.

Medium

  • internal/db/session_export.go:798: last_activity_at is derived with MAX() over raw timestamp text and then reused for ordering, watermarks, and cursor comparisons. SQLite sorts RFC3339/RFC3339Nano strings lexically, so timestamps like ...00.123Z can sort before ...00Z, causing incorrect export ordering, paging, or reported activity when sub-second timestamps are present.

    Fix: Use a normalized sortable timestamp value for max/order/cursor predicates, and add a regression test with mixed Z and fractional timestamps.
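A minimal sketch of the normalized sortable timestamp both reviews ask for: parse each RFC 3339 value and re-render it in UTC with exactly nine fractional digits, so lexical order on the key matches chronological order (for four-digit years). SortKey and the layout constant are hypothetical; storing such a key in a column and using it for MAX(), ORDER BY, watermarks, and cursor comparisons is one option, and parsed epoch nanoseconds is another.

```go
package main

import "time"

// sortKeyLayout is fixed-width: always nine fractional digits and a literal Z
// (values are converted to UTC first), so string order equals time order.
const sortKeyLayout = "2006-01-02T15:04:05.000000000Z"

// SortKey parses an RFC 3339 timestamp, with or without fractional seconds and
// with any UTC offset, and re-renders it as a fixed-width UTC key.
func SortKey(ts string) (string, error) {
	t, err := time.Parse(time.RFC3339Nano, ts)
	if err != nil {
		return "", err
	}
	return t.UTC().Format(sortKeyLayout), nil
}
```

Raw text puts ...00Z after ...00.999Z because 'Z' sorts above '.', while the keys become ...00.000000000Z and ...00.999000000Z and compare correctly.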


Reviewers: 2 done | Synthesis: codex, 7s | Total: 21m33s


Labels: None yet


1 participant