
iamvirul edited this page Mar 21, 2026 · 1 revision

Checkpoint & Resume

DeepDiff DB checkpoints progress during long-running operations so they can be safely resumed after interruption — network failure, process kill, machine restart, or timeout.


How It Works

Checkpoint File

A checkpoint file (.deepdiffdb_checkpoint.json) is created in the output directory at the start of any gen-pack or apply operation. It is deleted on successful completion.

{
  "version": "1.0",
  "operation": "hash_table",
  "config_hash": "sha256:abc123...",
  "output_dir": "./diff-output",
  "created_at": "2024-01-15T10:00:00Z",
  "last_updated": "2024-01-15T10:14:22Z",
  "hash_table_state": {
    "completed_tables": ["users", "products", "categories"],
    "table_hashes": {
      "users":      { "1": "aaa...", "2": "bbb..." },
      "products":   { "10": "ccc..." },
      "categories": { "1": "ddd..." }
    }
  }
}

State Types

type State struct {
    Version     string
    Operation   string        // "hash_table", "generate_pack", "apply_pack"
    ConfigHash  string        // SHA-256 of current config file
    OutputDir   string
    CreatedAt   time.Time
    LastUpdated time.Time

    HashTableState     *HashTableState
    GeneratePackState  *GeneratePackState
    ApplyPackState     *ApplyPackState
}

type HashTableState struct {
    CompletedTables []string
    TableHashes     map[string]map[string]string
}

type GeneratePackState struct {
    CompletedTables     []string
    GeneratedStatements []string
}

type ApplyPackState struct {
    ExecutedStatements int
}

Atomic Writes

Checkpoint files are written atomically to prevent corruption if the process is killed mid-write:

1. Serialize state to JSON
2. Write to <outputDir>/.deepdiffdb_checkpoint.json.tmp
3. os.Rename(.tmp → .json)   ← atomic on POSIX filesystems

If the process dies during step 2, the .tmp file is left behind but the valid .json is unchanged. If it dies during step 3, the rename either completes or doesn't — the file is never partially written.


Config Hash Validation

Before resuming, the current config file is hashed and compared against the checkpoint's stored config_hash. If they differ, the resume is rejected:

Error: checkpoint config hash mismatch
  checkpoint: sha256:abc123...
  current:    sha256:def456...
Suggestion: Delete the checkpoint file and re-run without --resume

This prevents incorrect results from resuming with different databases, batch sizes, or ignore rules than the original run.


Resume Usage

# Original run interrupted
deepdiff-db gen-pack --config deepdiffdb.yaml
# ... process killed after 8 of 20 tables ...

# Resume from where it stopped
deepdiff-db gen-pack --config deepdiffdb.yaml --resume

# Apply interrupted mid-way
deepdiff-db apply --pack ./diff-output/migration_pack.sql
# ... process killed after 450 of 1200 statements ...

# Resume
deepdiff-db apply --pack ./diff-output/migration_pack.sql --resume

What Gets Skipped on Resume

| Operation | Resumed from |
|-----------|--------------|
| `gen-pack` (hash phase) | Last completed table; already-hashed tables are loaded from the checkpoint |
| `gen-pack` (pack phase) | Last completed table; already-generated statements are loaded |
| `apply` | Statement count; the first N statements are skipped |

Expiration

Checkpoints expire after 24 hours by default. An expired checkpoint will not be used on resume and must be deleted manually.

This prevents accidentally resuming from a very old checkpoint against a database that has changed significantly since the original run.


Manual Cleanup

If you want to discard a checkpoint and start fresh:

rm ./diff-output/.deepdiffdb_checkpoint.json

Then re-run without --resume.


Context Propagation

The checkpoint manager is passed through all operations via context.Context:

ctx = checkpoint.ToContext(ctx, mgr)

// Later, in any nested function:
mgr := checkpoint.FromContext(ctx)
mgr.Update(func(s *State) {
    s.HashTableState.CompletedTables = append(s.HashTableState.CompletedTables, tableName)
})
