Skip to content

Add event import functionality with validation and command support#183

Merged
asphaltbuffet merged 11 commits into
mainfrom
import-command
May 29, 2026
Merged

Add event import functionality with validation and command support#183
asphaltbuffet merged 11 commits into
mainfrom
import-command

Conversation

@asphaltbuffet
Copy link
Copy Markdown
Owner

This pull request introduces a new import command for the CLI, enabling users to restore inventory state by importing events from an NDJSON file. The implementation includes robust input handling, user safety checks for destructive operations, and a comprehensive suite of tests. It also updates documentation to clarify the import process and warning handling.

Major features and changes:

New Import Command Implementation

  • Added a new import command (cmd/import/import.go, cmd/import/db.go) that reads events from an NDJSON file or stdin and replays them into the database, with options for replacing existing data, continuing on errors, and detailed summary output. [1] [2]
  • Integrated the new import command into the CLI by registering it in cmd/root.go. [1] [2]

Robust Input and Error Handling

  • Enforced a per-line input size limit of 16 MiB to prevent garbage/binary files from causing issues, with documentation added to CONTEXT.md explaining the rationale. [1] [2]
  • Implemented safety checks requiring --yes confirmation for destructive --replace operations and preventing import if the database is non-empty unless explicitly replaced.

Comprehensive Testing

  • Added a thorough test suite for the import command (cmd/import/import_test.go), covering valid/invalid input, error handling, quiet mode, large event notes, warning surfaces, and destructive operation safeguards.

Documentation Updates

  • Clarified the import process, including the use of a shared writeEvent helper and the handling of warnings in ImportResult, in the ADR documentation. [1] [2]

References:
[1] [2] [3] [4] [5] [6] [7] [8]

…l timestamp

- Accepts fully-populated *inventory.Event; preserves ev.TimestampUTC
- Returns new DB-assigned autoincrement event_id
- EntityPathChangedEvent is a no-op (returns 0, nil; no row inserted)
- All other event types go through applyEventTx as normal

Closes #176
- HasEvents: returns true if events table is non-empty (EXISTS short-circuit)
- ClearAllData: deletes entities_current then events in FK-safe order; leaves schema_metadata intact

Closes #177
… validation

- ImportOptions{Replace, Continue} and ImportResult{Replayed, Failed, Warnings, Errors}
- Non-monotonic event_id input returns error with no DB writes
- EntityPathChangedEvent records buffered for post-reparent validation, not replayed
- Path-changed count/payload mismatches increment Warnings via structured logger
- --continue accumulates per-event errors in ImportResult.Errors instead of aborting
- importRunner struct extracts per-record logic to keep cognitive complexity under limit
- 10 tests covering all acceptance criteria

Closes #178
…tput

- cmd/import/db.go: importApp interface (ImportEvents + HasEvents)
- cmd/import/import.go: NewImportCmd/NewDefaultImportCmd, --file/--replace/--yes flags
- Reads NDJSON from stdin or --file, parses each line as ExportResult
- Prints 'Replayed: N, Failed: M, Warnings: W' to stderr; suppressed by --quiet
- app.HasEvents delegation added to *app.App to satisfy importApp interface
- Registered in cmd/root.go
- 5 tests covering all acceptance criteria

Closes #179
- Non-empty DB without --replace → error before any replay
- --replace without --yes → error, ClearAllData not called
- --replace --yes → ClearAllData called before ImportEvents
- --yes without --replace → silently accepted
- --continue → passes Continue:true to ImportOptions, exits 0 on partial failure
- app.ClearAllData delegation added to *app.App
- Extract checkReplaceFlags/openInput/parseNDJSON to keep runImport under complexity limit
- 5 new tests (10 total in cmd/import)

Closes #180
Move ClearAllData ownership from the import command into App.ImportEvents
via ImportOptions.Replace. The cmd layer now only decides whether to set
Replace=true (based on --replace --yes); the app layer is the single
owner of the destructive step.

Removes the dead HasEvents probe in importEvents that discarded its
boolean return — checking store responsiveness redundantly with the
ClearAllData call that immediately followed.

Updates TestRunImport_ReplaceAndYes_ClearsBeforeImport to assert the new
cmd-layer contract (Replace=true passed through), and adds an app-layer
test exercising opts.Replace=true end-to-end against a real SQLite app.

No behavior change for the CLI user.
…d-import failure

Add a Level 2 dry-validation pass to importEvents that runs after the
monotonic event_id check and before the opts.Replace ClearAllData call.
For every record:
- ParseEventType must succeed (rejects unknown event types)
- Payload must unmarshal into a JSON object (rejects malformed JSON,
  non-object payloads, truncated bytes)

Both checks happen before any DB write. A file that would have left the
database empty after ClearAllData + mid-replay failure now errors before
the destructive step, leaving the original database intact.

This is Level 2 of a four-level validation scale. Level 3 (per-event-type
payload schema validation) is tracked as deferred follow-up work — it
would require the app layer to import handler-specific payload schemas
from internal/eventbus, which is a layering decision that deserves its
own ADR.

The --continue test was updated: it previously used an unknown event
type to trigger per-event errors, which the new pre-flight pass now
rejects unconditionally. Replaced with a rename-of-nonexistent-entity
case that passes Level 2 validation but fails at replay time inside
applyEventTx, which is what --continue is actually for.
bufio.Scanner's default 64 KiB per-token limit is too tight for a single
NDJSON line that includes a long human-typed Note — import would fail
with bufio.ErrTooLong on otherwise-valid input.

Raise the per-line cap to 16 MiB. This is 4-5 orders of magnitude above
realistic event sizes (typical payloads + notes are a few hundred bytes
each) while still failing loudly on garbage files (e.g. binary blobs
masquerading as NDJSON). The 16 MiB ceiling is also well below SQLite's
default row-size limit, so the bottleneck stays at the parser with a
clear error rather than deep in the storage layer.

Also adds a CONTEXT.md glossary entry for Note making the canonical
'human-typed, used infrequently, not for blobs' constraint explicit —
the size cap is the operational expression of that constraint, so they
land together.
ImportResult.Warnings was an int counter. Callers could see 'how many
warnings' but not 'which ones' — diagnostic detail was only visible via
the structured logger, which the CLI summary did not surface.

Split into two fields kept in lockstep:
- WarningCount int  (the count, was previously named Warnings)
- Warnings     []error  (one descriptive error per increment)

Invariant: len(Warnings) == WarningCount.

A private importRunner.warn(err) helper bumps both fields together, so
call sites stay one line and the two fields cannot drift.

The CLI summary now prints the count on the existing summary line and
one indented 'warning: <message>' line per entry in Warnings, suppressed
by --quiet. This makes path-changed-mismatch diagnostics actionable
without requiring the user to inspect logs.

ADR 0005 updated to document the new warning surface.
…ReplayEvent

Dispatch and ReplayEvent shared ~95% of their bodies — same INSERT SQL,
same LastInsertId handling, same applyEventTx call — differing only in
how TimestampUTC and EntityID were sourced.

Extract a private Bus.writeEvent(ctx, *inventory.Event) (int64, error)
that owns the INSERT + LastInsertId + applyEventTx sequence. Dispatch
builds an *inventory.Event with a time.Now() timestamp and payload-parsed
entity_id, then delegates. ReplayEvent's EntityPathChangedEvent early
return is preserved at the public method, and every other event type
delegates to the shared helper.

The helper does a value-copy of its input (inserted := *ev) before
setting EventID and calling applyEventTx, so callers' input structs are
never mutated.

Net diff: ~95 LOC → ~75 LOC in bus.go; one canonical write path for
every future change.

ADR 0004 updated with an implementation note: the public separation
remains because the timestamp/entity_id sourcing genuinely differs, but
any further divergence should be expressed as a parameter to writeEvent
rather than re-splitting into two implementations.
ClearAllData hardcodes entities_current and events as the two tables it
deletes. Per CONTEXT.md, entities_current is currently the only projection
table — so the hardcoding is correct, not a missing abstraction.

Add an inline comment block making the invariant and the tripwire explicit:
adding a second projection table requires both updating ClearAllData and
recording an ADR (the second projection breaks the documented single-
projection invariant and that decision deserves to be captured).

This is the comment-only version of the more elaborate projection-registry
abstraction that was considered and rejected: with one projection, the
registry is just a one-element slice, and the comment-as-tripwire makes the
dependency discoverable without inventing the abstraction.
@asphaltbuffet asphaltbuffet merged commit 303fe58 into main May 29, 2026
6 checks passed
@asphaltbuffet asphaltbuffet deleted the import-command branch May 29, 2026 13:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant