`forgelm purge` — Subcommand Reference

Audience: ForgeLM operators honouring GDPR Article 17 erasure requests against training corpora and run-scoped artefacts, plus auditors verifying the resulting data.erasure_* audit chain. Mirror: purge_subcommand-tr.md

forgelm purge is the operator-facing implementation of the GDPR Article 17 right-to-erasure for ForgeLM training corpora and run artefacts (Phase 21). It atomically deletes a row, a run's staging directory, or a run's compliance bundle, and records every step in the tamper-evident audit_log.jsonl chain.

For the deployer-flow walkthrough (DSAR ticket → CLI invocation → verification), see ../guides/gdpr_erasure.md. This page is the per-flag, per-event reference.

Synopsis

forgelm purge --row-id ID --corpus PATH [--row-matches {one,all}]
              [--output-dir DIR] [--justification TEXT] [--dry-run]
              [--output-format {text,json}]

forgelm purge --run-id RUN_ID --kind {staging,artefacts}
              --output-dir DIR [--justification TEXT] [--dry-run]
              [--output-format {text,json}]

forgelm purge --check-policy --config PATH [--output-dir DIR]
              [--output-format {text,json}]

The three modes are mutually exclusive (argparse enforces a single mode group).

Modes

Corpus-row erasure (`--row-id`)

Erases a single row identified by its id (or row_id) field from a JSONL training corpus. Implemented in forgelm.cli.subcommands._purge._run_purge_row_id.

Flag	Required	Description
`--row-id ID`	yes	Identifier to erase. Hashed before audit emission via `forgelm.cli.subcommands._purge._hash_target_id`.
`--corpus PATH`	yes	Single JSONL file. Directory mode is rejected — operators loop in their own script.
`--row-matches {one,all}`	no (default `one`)	`one` refuses on >=2 matches; `all` deletes every match (operator confirms intent).
`--output-dir DIR`	no	Defaults to the parent of `--corpus`. The per-output-dir salt at `<output_dir>/.forgelm_audit_salt` is read here for `target_id` hashing. The implicit fallback emits a WARNING naming the resolved dir so the operator can pin `--output-dir` for cross-tool correlation with `forgelm reverse-pii`.
`--justification TEXT`	no	Operator-supplied reason recorded on every erasure event. Reference your internal ticket id; do not paste subject identifiers.
`--dry-run`	no	Preview the deletion + emit the audit chain (with `dry_run=true`) without modifying disk.

Atomic write contract. The corpus is rewritten via a sibling temp file + os.replace, so an interrupted purge leaves either the full pre-erasure file or the full post-erasure file — never a partial state.

Line-number fallback is rejected (design §4.2). ForgeLM does not currently provide an id-population helper (forgelm audit --add-row-ids is on the Phase 28 backlog); operators with id-less corpora must pre-populate ids with an operator-side script before invoking forgelm purge --row-id.

Run-scoped artefact erasure (`--run-id` + `--kind`)

Kind	Target
`staging`	`<output_dir>/final_model.staging.<run_id>/` (and the legacy `final_model.staging/` if present).
`artefacts`	`<output_dir>/compliance/*.json` files whose name embeds `<run_id>`.

--kind logs is intentionally absent: audit logs are append-only Article 17(3)(b) records and are not deleted by the tool.

Retention-policy report (`--check-policy`)

Read-only scan. Walks <output_dir>, compares each artefact's age (canonical: audit-log genesis timestamp; fallback: filesystem mtime with age_source=mtime) against the loaded config's retention: horizons, and emits a structured violation list.

Successful policy reports always exit 0 (report-not-gate semantic per design §10 Q5). Config-load failures exit EXIT_CONFIG_ERROR (1) — an explicit --config that is missing, unreadable, or fails Pydantic validation surfaces as a non-zero so the operator does not mistake "loader failed" for "no violations". Operators wiring a CI gate use --output-format json and pipe to jq '.violations | length' themselves.

Salt resolution

The per-output-dir salt at <output_dir>/.forgelm_audit_salt is created on first use (mode 0600, atomic O_EXCL write) by forgelm.cli.subcommands._purge._read_persistent_salt. forgelm.cli.subcommands._purge._resolve_salt returns (salt_bytes, salt_source):

salt_source = "per_dir" — FORGELM_AUDIT_SECRET is unset; the persistent salt is used verbatim.
salt_source = "env_var" — FORGELM_AUDIT_SECRET is set; the first 16 bytes are XOR'd with the persistent salt.

This XOR feeds identifier hashing only. The audit-chain HMAC key is derived independently as SHA-256(FORGELM_AUDIT_SECRET ‖ run_id) inside forgelm.compliance.AuditLogger.__init__. The two primitives are intentionally separate; rotating FORGELM_AUDIT_SECRET therefore rotates both, but inspecting either does not reveal the other.

Cross-tool digest correlation with `forgelm reverse-pii`

When forgelm purge --row-id <value> and forgelm reverse-pii --query <value> --salt-source per_dir run against the same <output_dir> (i.e. consume the same .forgelm_audit_salt), the target_id field on data.erasure_* events and the query_hash field on data.access_request_query events are byte-identical. A compliance reviewer can therefore correlate Article 17 erasures and Article 15 access requests for the same data subject without ever seeing the cleartext identifier. Pinned by tests/test_reverse_pii.py::test_purge_target_id_matches_reverse_pii_query_hash_on_same_output_dir.

Audit events emitted

All six events ride the common envelope from audit_event_catalog.md. The catalog rows are reproduced here for convenience.

Event	When emitted	Key payload
`data.erasure_requested`	First step of any `--row-id` / `--run-id` invocation, BEFORE any deletion. `--check-policy` is read-only and emits no events.	`target_kind` ∈ `{row, staging, artefacts}`, `target_id` (hashed in row mode), `salt_source` (row mode), `corpus_path` (row), `output_dir` (run), `justification`, `dry_run`
`data.erasure_completed`	Successful deletion finished.	All `requested` fields + `bytes_freed`, `files_modified`, `pre_erasure_line_number` (row mode), `match_count` (row mode)
`data.erasure_failed`	Disk operation raised, OR no matching row/run, OR multi-row policy refused on ambiguity.	All `requested` fields + `error_class`, `error_message`
`data.erasure_warning_memorisation`	Row erasure × `final_model/` exists for any run that consumed this corpus.	All `completed` fields + `affected_run_ids`
`data.erasure_warning_synthetic_data_present`	Row erasure × `synthetic_data*.jsonl` exists in `output_dir`.	All `completed` fields + `synthetic_files`
`data.erasure_warning_external_copies`	Loaded config has a non-empty `webhook` block; downstream consumers may have received notices.	All `completed` fields + `webhook_targets` (redacted URLs)

forgelm verify-audit <output_dir>/audit_log.jsonl continues to validate the chain after any number of erasure events — the tool appends new events, never rewrites old ones.

Exit codes

Code	Meaning
0	Success, or a successful `--check-policy` report (report-not-gate semantic).
1	Config error: unknown `--row-id`, missing `--corpus`, mutually-exclusive flag combination, conflicting `staging_ttl_days` values, or `--check-policy --config <path>` that is missing / unreadable / fails Pydantic validation.
2	Runtime error: I/O failure, permission denied, atomic-rename failure.

--check-policy never returns code 3 or 4. Codes 3 (EXIT_EVAL_FAILURE) and 4 (EXIT_AWAITING_APPROVAL) are reserved for the training pipeline and are not part of this subcommand's surface.

JSON output envelope

With --output-format json every invocation prints exactly one JSON object on stdout. The shape varies by mode — three envelopes in total.

Row mode (success) — emitted by _perform_row_erasure_and_audit:

{
  "mode": "row",
  "dry_run": false,
  "salt_source": "per_dir",
  "corpus_path": "/abs/path/data/train.jsonl",
  "matches": 1,
  "first_line": 142,
  "bytes_freed": 482,
  "warnings": []
}

For --dry-run invocations the same envelope appears with dry_run: true and without bytes_freed (the rewrite never happens).

Run mode (success) — emitted by _run_purge_run_id:

{
  "mode": "run",
  "kind": "staging",
  "dry_run": false,
  "run_id": "fg-abc123def456",
  "deleted": ["/abs/path/outputs/run42/final_model.staging.fg-abc123def456"],
  "bytes_freed": 102400000
}

deleted is a list of absolute paths actually removed (or, on --dry-run, the would_delete key carries the same shape and deleted is absent).

--check-policy mode — emitted by _run_purge_check_policy:

{
  "success": true,
  "violations": [
    {
      "artefact_kind": "ephemeral_artefact",
      "path": "/abs/path/outputs/run42/compliance/data_audit_report.json",
      "age_days": 121.4,
      "horizon_days": 90,
      "age_source": "audit_genesis"
    }
  ],
  "count": 1
}

age_source ∈ {audit_genesis, mtime} — audit_genesis is the canonical age (run's first audit event timestamp); mtime is the filesystem fallback flagged for the operator's awareness.

Failure envelope (any mode) — _output_error_and_exit:

{
  "success": false,
  "error": "Row id 'ali@example.com' not found in 'data/train.jsonl'."
}

Note the asymmetry: row/run success envelopes do not carry an explicit success field (the mode discriminator is the success signal); --check-policy and the failure envelope do. Operators wiring CI gates should branch on the mode key (success path) or success: false (failure path), not on the presence/absence of success: true.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`forgelm purge` — Subcommand Reference

Synopsis

Modes

Corpus-row erasure (`--row-id`)

Run-scoped artefact erasure (`--run-id` + `--kind`)

Retention-policy report (`--check-policy`)

Salt resolution

Cross-tool digest correlation with `forgelm reverse-pii`

Audit events emitted

Exit codes

JSON output envelope

See also

FilesExpand file tree

purge_subcommand.md

Latest commit

History

purge_subcommand.md

File metadata and controls

forgelm purge — Subcommand Reference

Synopsis

Modes

Corpus-row erasure (--row-id)

Run-scoped artefact erasure (--run-id + --kind)

Retention-policy report (--check-policy)

Salt resolution

Cross-tool digest correlation with forgelm reverse-pii

Audit events emitted

Exit codes

JSON output envelope

See also

`forgelm purge` — Subcommand Reference

Corpus-row erasure (`--row-id`)

Run-scoped artefact erasure (`--run-id` + `--kind`)

Retention-policy report (`--check-policy`)

Cross-tool digest correlation with `forgelm reverse-pii`