# Checkpoint and Resume
DeepDiff DB checkpoints progress during long-running operations so they can be safely resumed after interruption — network failure, process kill, machine restart, or timeout.
A checkpoint file (.deepdiffdb_checkpoint.json) is created in the output directory at the start of any gen-pack or apply operation. It is deleted on successful completion.
```json
{
"version": "1.0",
"operation": "hash_table",
"config_hash": "sha256:abc123...",
"output_dir": "./diff-output",
"created_at": "2024-01-15T10:00:00Z",
"last_updated": "2024-01-15T10:14:22Z",
"hash_table_state": {
"completed_tables": ["users", "products", "categories"],
"table_hashes": {
"users": { "1": "aaa...", "2": "bbb..." },
"products": { "10": "ccc..." },
"categories": { "1": "ddd..." }
}
}
}
```

The corresponding Go state types:

```go
type State struct {
Version string
Operation string // "hash_table", "generate_pack", "apply_pack"
ConfigHash string // SHA-256 of current config file
OutputDir string
CreatedAt time.Time
LastUpdated time.Time
HashTableState *HashTableState
GeneratePackState *GeneratePackState
ApplyPackState *ApplyPackState
}
type HashTableState struct {
CompletedTables []string
TableHashes map[string]map[string]string
}
type GeneratePackState struct {
CompletedTables []string
GeneratedStatements []string
}
type ApplyPackState struct {
ExecutedStatements int
}
```

Checkpoint files are written atomically to prevent corruption if the process is killed mid-write:
1. Serialize state to JSON
2. Write to `<outputDir>/.deepdiffdb_checkpoint.json.tmp`
3. `os.Rename(.tmp → .json)` ← atomic on POSIX filesystems
If the process dies during step 2, the .tmp file is left behind but the valid .json is unchanged. If it dies during step 3, the rename either completes or doesn't — the file is never partially written.
Before resuming, the current config file is hashed and compared against the checkpoint's stored `config_hash`. If they differ, the resume is rejected:

```text
Error: checkpoint config hash mismatch
  checkpoint: sha256:abc123...
  current:    sha256:def456...
Suggestion: Delete the checkpoint file and re-run without --resume
```
This prevents incorrect results from resuming with different databases, batch sizes, or ignore rules than the original run.
```bash
# Original run interrupted
deepdiff-db gen-pack --config deepdiffdb.yaml
# ... process killed after 8 of 20 tables ...

# Resume from where it stopped
deepdiff-db gen-pack --config deepdiffdb.yaml --resume
```

```bash
# Apply interrupted mid-way
deepdiff-db apply --pack ./diff-output/migration_pack.sql
# ... process killed after 450 of 1200 statements ...

# Resume
deepdiff-db apply --pack ./diff-output/migration_pack.sql --resume
```

| Operation | Resumed from |
|---|---|
| `gen-pack` (hash phase) | Last completed table — already-hashed tables are loaded from checkpoint |
| `gen-pack` (pack phase) | Last completed table — already-generated statements are loaded |
| `apply` | Statement count — first N statements are skipped |
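Because the `apply` checkpoint stores only a count of executed statements, resuming reduces to skipping the first N entries of the pack. A minimal sketch, where `remainingStatements` is a hypothetical helper:

```go
package main

import "fmt"

// remainingStatements returns the statements still to be executed,
// given the executed-statement count recorded in ApplyPackState.
func remainingStatements(statements []string, executed int) []string {
	if executed >= len(statements) {
		return nil // everything already applied
	}
	return statements[executed:]
}

func main() {
	pack := []string{"INSERT 1", "INSERT 2", "UPDATE 3", "DELETE 4"}
	// The checkpoint recorded 2 executed statements before the interruption.
	fmt.Println(remainingStatements(pack, 2)) // [UPDATE 3 DELETE 4]
}
```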
Checkpoints expire after 24 hours by default. An expired checkpoint will not be used on resume and must be deleted manually.
This prevents accidentally resuming from a very old checkpoint against a database that has changed significantly since the original run.
If you want to discard a checkpoint and start fresh:

```bash
rm ./diff-output/.deepdiffdb_checkpoint.json
```

Then re-run without `--resume`.
The checkpoint manager is passed through all operations via `context.Context`:

```go
ctx = checkpoint.ToContext(ctx, mgr)

// Later, in any nested function:
mgr := checkpoint.FromContext(ctx)
mgr.Update(func(s *State) {
	s.HashTableState.CompletedTables = append(s.HashTableState.CompletedTables, tableName)
})
```