
iamvirul edited this page Mar 21, 2026 · 1 revision

Checkpoint & Resume

DeepDiff DB checkpoints progress during long-running operations so they can be safely resumed after interruption — network failure, process kill, machine restart, or timeout.


How It Works

Checkpoint File

A checkpoint file (.deepdiffdb_checkpoint.json) is created in the output directory at the start of any gen-pack or apply operation. It is deleted on successful completion.

{
  "version": "1.0",
  "operation": "hash_table",
  "config_hash": "sha256:abc123...",
  "output_dir": "./diff-output",
  "created_at": "2024-01-15T10:00:00Z",
  "last_updated": "2024-01-15T10:14:22Z",
  "hash_table_state": {
    "completed_tables": ["users", "products", "categories"],
    "table_hashes": {
      "users":      { "1": "aaa...", "2": "bbb..." },
      "products":   { "10": "ccc..." },
      "categories": { "1": "ddd..." }
    }
  }
}

State Types

type State struct {
    Version     string
    Operation   string        // "hash_table", "generate_pack", "apply_pack"
    ConfigHash  string        // SHA-256 of current config file
    OutputDir   string
    CreatedAt   time.Time
    LastUpdated time.Time

    HashTableState     *HashTableState
    GeneratePackState  *GeneratePackState
    ApplyPackState     *ApplyPackState
}

type HashTableState struct {
    CompletedTables []string
    TableHashes     map[string]map[string]string
}

type GeneratePackState struct {
    CompletedTables     []string
    GeneratedStatements []string
}

type ApplyPackState struct {
    ExecutedStatements int
}

Atomic Writes

Checkpoint files are written atomically to prevent corruption if the process is killed mid-write:

1. Serialize state to JSON
2. Write to <outputDir>/.deepdiffdb_checkpoint.json.tmp
3. os.Rename(.tmp → .json)   ← atomic on POSIX filesystems

If the process dies during step 2, the .tmp file is left behind but the valid .json is unchanged. If it dies during step 3, the rename either completes or doesn't — the file is never partially written.


Config Hash Validation

Before resuming, the current config file is hashed and compared against the checkpoint's stored config_hash. If they differ, the resume is rejected:

Error: checkpoint config hash mismatch
  checkpoint: sha256:abc123...
  current:    sha256:def456...
Suggestion: Delete the checkpoint file and re-run without --resume

This prevents incorrect results from resuming with different databases, batch sizes, or ignore rules than the original run.


Resume Usage

# Original run interrupted
deepdiff-db gen-pack --config deepdiffdb.yaml
# ... process killed after 8 of 20 tables ...

# Resume from where it stopped
deepdiff-db gen-pack --config deepdiffdb.yaml --resume

# Apply interrupted mid-way
deepdiff-db apply --pack ./diff-output/migration_pack.sql
# ... process killed after 450 of 1200 statements ...

# Resume
deepdiff-db apply --pack ./diff-output/migration_pack.sql --resume

What Gets Skipped on Resume

| Operation | Resumed from |
|-----------|--------------|
| `gen-pack` (hash phase) | Last completed table; already-hashed tables are loaded from the checkpoint |
| `gen-pack` (pack phase) | Last completed table; already-generated statements are loaded |
| `apply` | Statement count; the first N statements are skipped |

Expiration

Checkpoints expire after 24 hours by default. An expired checkpoint will not be used on resume and must be deleted manually.

This prevents accidentally resuming from a very old checkpoint against a database that has changed significantly since the original run.


Manual Cleanup

If you want to discard a checkpoint and start fresh:

rm ./diff-output/.deepdiffdb_checkpoint.json

Then re-run without --resume.


Context Propagation

The checkpoint manager is passed through all operations via context.Context:

ctx = checkpoint.ToContext(ctx, mgr)

// Later, in any nested function:
mgr := checkpoint.FromContext(ctx)
mgr.Update(func(s *State) {
    s.HashTableState.CompletedTables = append(s.HashTableState.CompletedTables, tableName)
})
