Computes a compact delta between two files so the new file can be reconstructed from the old file plus the (small) delta. Supports in-place reconstruction — the new version can be rebuilt directly in the buffer holding the old version, with no scratch space.
Two implementations — Python and Rust — producing identical binary deltas. The Rust version is ~20x faster thanks to O(1) rolling hash updates and zero-copy mmap I/O.
Implements the algorithms from two papers:
- M. Ajtai, R. Burns, R. Fagin, D. D. E. Long, L. Stockmeyer. Compactly Encoding Unstructured Inputs with Differential Compression. Journal of the ACM, 49(3):318-367, May 2002.
- R. C. Burns, D. D. E. Long, L. Stockmeyer. In-Place Reconstruction of Version Differences. IEEE Transactions on Knowledge and Data Engineering, 15(4):973-984, Jul/Aug 2003.
Both papers are in pubs/.
```sh
cd src/python

# Encode
python3 delta.py encode onepass old.bin new.bin delta.bin

# Decode
python3 delta.py decode old.bin delta.bin recovered.bin
```

```sh
cd src/rust/delta
cargo build --release

# Encode
cargo run --release -- encode onepass old.bin new.bin delta.bin

# Decode
cargo run --release -- decode old.bin delta.bin recovered.bin
```

The two implementations produce byte-identical delta files. Encode with one, decode with the other.
| Algorithm | Time | Space | Best for |
|---|---|---|---|
| `onepass` | O(n) | O(1) | General use: fast, good compression |
| `correcting` | ~O(n) | O(q) | Data with rearranged/moved blocks |
| `greedy` | O(n^2) | O(n) | Smallest possible delta (optimal) |
All three use Karp-Rabin rolling hashes for fingerprinting, with a Mersenne prime modulus (2^61 - 1) and a polynomial base of 263 for good bit mixing. Hash tables are auto-sized from the input length (`--table-size` acts as a floor). Use `--verbose` to print hash table sizing and match statistics to stderr.
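A minimal sketch of that rolling-hash arithmetic, using the stated modulus and base; it is illustrative only, not the project's `hash.rs` or `delta.py` code, and the window size below is an arbitrary example:

```python
MOD = (1 << 61) - 1   # Mersenne prime 2^61 - 1
BASE = 263            # polynomial base

def initial_hash(window: bytes) -> int:
    """Hash a window from scratch: sum(window[i] * BASE^(w-1-i)) mod MOD."""
    h = 0
    for byte in window:
        h = (h * BASE + byte) % MOD
    return h

def roll(h: int, out_byte: int, in_byte: int, top: int) -> int:
    """O(1) update when the window slides one byte.
    `top` is BASE^(w-1) mod MOD, precomputed once per window size."""
    h = (h - out_byte * top) % MOD        # drop the outgoing byte's term
    return (h * BASE + in_byte) % MOD     # shift and append the incoming byte

# Fingerprint every window of length 16 in some sample data.
data = bytes(range(64))
w = 16
top = pow(BASE, w - 1, MOD)
h = initial_hash(data[:w])
fingerprints = [h]
for i in range(w, len(data)):
    h = roll(h, data[i - w], data[i], top)
    fingerprints.append(h)
```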
Standard deltas require a separate output buffer. In-place mode produces a delta that reconstructs the new version directly in the buffer holding the reference, with no scratch space. Useful for space-constrained devices (embedded systems, IoT) where you want to apply a software update without needing 2x the file size in memory.
```sh
# Produce an in-place delta
delta encode onepass old.bin new.bin patch.delta --inplace

# Decode works the same (auto-detects format)
delta decode old.bin patch.delta recovered.bin
```

The in-place converter builds a CRWI (Conflicting Read-Write Interval) digraph where an edge from command i to command j means "i must execute before j" (because i reads from a region j will overwrite). A topological sort gives a safe execution order. Cycles are broken by converting copy commands to literal add commands.
Cycle-breaking policies (see the sketch below):

- `localmin` (default): converts the smallest copy in each cycle. Minimizes compression loss.
- `constant`: converts any vertex in the cycle. Slightly faster, marginally worse compression.
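The following is a much-simplified Python sketch of that scheduling idea, not the project's converter: copy commands are ordered so none reads a region an earlier command has already overwritten, and when everything left is blocked (a cycle), the shortest blocking copy is demoted to a literal add, in the spirit of `localmin`. The real converter identifies an actual cycle before choosing a victim; this sketch only approximates that.

```python
def overlaps(a_start, a_len, b_start, b_len):
    """True if [a_start, a_start+a_len) intersects [b_start, b_start+b_len)."""
    return a_start < b_start + b_len and b_start < a_start + a_len

def schedule(copies):
    """Order in-place COPY commands (src, dst, length); return (order, demoted)
    where `demoted` holds indices that must be rewritten as literal ADDs."""
    n = len(copies)
    succ = [set() for _ in range(n)]               # succ[i]: i must run before these
    for i, (si, _, li) in enumerate(copies):
        for j, (_, dj, lj) in enumerate(copies):
            if i != j and overlaps(si, li, dj, lj):
                succ[i].add(j)                      # i reads what j will overwrite
    remaining, order, demoted = set(range(n)), [], set()
    while remaining:
        ready = [j for j in remaining
                 if not any(j in succ[i] for i in remaining)]
        if ready:
            order.extend(sorted(ready))
            remaining.difference_update(ready)
            continue
        # Everything left is blocked, so a cycle exists. Demote the shortest
        # copy that still blocks others: as an ADD it reads nothing, so its
        # outgoing edges vanish and the cycle is broken.
        blocking = [k for k in remaining if succ[k] & remaining]
        victim = min(blocking, key=lambda k: copies[k][2])
        demoted.add(victim)
        succ[victim] = set()
    return order, demoted

# Example: three copies forming a read/write cycle in the same buffer.
order, demoted = schedule([(0, 100, 50), (100, 200, 50), (200, 0, 50)])
```

Demoted commands keep their destination offsets but carry literal bytes instead of a source offset, which is exactly the copy-to-add conversion the policies above choose between.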
Unified format used by both implementations:
```
Header (9 bytes):
  DLT\x01         4-byte magic
  flags           1 byte (bit 0 = in-place)
  version_size    uint32, big-endian

Commands (repeated):
  type 0  END                                  1 byte
  type 1  COPY: src(u32) dst(u32) len(u32)     13 bytes
  type 2  ADD:  dst(u32) len(u32) data         9 + len bytes
```
All multi-byte integers are big-endian. Commands carry explicit
source and destination offsets. The flags byte distinguishes
standard deltas (flag 0x00) from in-place deltas (flag 0x01).
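For illustration, here is a small parser for this layout using Python's `struct` module; it assumes the whole delta fits in memory and is a sketch, not the project's own encoder/decoder:

```python
import struct

MAGIC = b"DLT\x01"

def parse_delta(blob: bytes):
    """Return (in_place, version_size, commands) decoded from a delta blob."""
    if blob[:4] != MAGIC:
        raise ValueError("not a delta file")
    flags, version_size = struct.unpack_from(">BI", blob, 4)   # bytes 4..8
    in_place = bool(flags & 0x01)
    commands, pos = [], 9
    while True:
        kind = blob[pos]; pos += 1
        if kind == 0:                                          # END
            break
        if kind == 1:                                          # COPY src dst len
            src, dst, length = struct.unpack_from(">III", blob, pos); pos += 12
            commands.append(("copy", src, dst, length))
        elif kind == 2:                                        # ADD dst len data
            dst, length = struct.unpack_from(">II", blob, pos); pos += 8
            commands.append(("add", dst, blob[pos:pos + length])); pos += length
        else:
            raise ValueError(f"unknown command type {kind}")
    return in_place, version_size, commands
```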
```sh
# Python: 154 tests
cd src/python
python3 -m unittest test_delta -v

# Rust: 48 tests (13 unit + 35 integration)
cd src/rust/delta
cargo test
```

Tests cover all three algorithms, binary round-trips, paper examples,
edge cases (empty/identical/completely different files), in-place
reconstruction with both cycle-breaking policies, variable-length block
transpositions (8 blocks, 200-5000 bytes each, permuted, reversed,
interleaved with junk, duplicated, halved and scrambled, plus 20
random subset trials), checkpointing correctness, and cross-language
compatibility. A kernel tarball benchmark (tests/kernel-delta-test.sh)
exercises onepass and correcting on ~871 MB inputs.
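As a sketch of what such a round-trip check looks like when driven through the documented Python CLI (the file names are placeholders, and it assumes it runs from `src/python` with `old.bin` and `new.bin` present):

```python
import filecmp
import subprocess

def round_trip(algorithm: str) -> bool:
    """Encode new.bin against old.bin, decode, and compare byte-for-byte."""
    subprocess.run(["python3", "delta.py", "encode", algorithm,
                    "old.bin", "new.bin", "delta.bin"], check=True)
    subprocess.run(["python3", "delta.py", "decode",
                    "old.bin", "delta.bin", "recovered.bin"], check=True)
    return filecmp.cmp("new.bin", "recovered.bin", shallow=False)

for algo in ("onepass", "correcting", "greedy"):
    assert round_trip(algo), f"{algo} round trip failed"
```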
```
src/
  python/
    delta.py               Library + CLI
    test_delta.py          Test suite (154 tests)
  rust/delta/
    src/
      lib.rs               Re-exports
      main.rs              CLI (clap)
      types.rs             Command, PlacedCommand, constants
      hash.rs              Karp-Rabin rolling hash (13 unit tests)
      encoding.rs          Unified binary format
      apply.rs             place_commands, apply_placed_to, apply_placed_inplace_to
      inplace.rs           CRWI digraph, topological sort, cycle breaking
      algorithm/
        mod.rs             Dispatch
        greedy.rs          O(n^2) optimal
        onepass.rs         O(n) linear
        correcting.rs      1.5-pass with tail correction
    tests/
      integration.rs       35 integration tests
    Cargo.toml
tests/
  kernel-delta-test.sh     Kernel tarball benchmark (onepass + correcting)
pubs/
  ajtai-et-al-jacm-2002-differential-compression.pdf
  burns-et-al-tkde-2003-inplace-reconstruction.pdf
README.md
HOWTO.md
```
```bibtex
@article{ajtai2002differential,
author = {Ajtai, Mikl\'{o}s and Burns, Randal and Fagin, Ronald
and Long, Darrell D. E. and Stockmeyer, Larry},
title = {Compactly Encoding Unstructured Inputs with
Differential Compression},
journal = {Journal of the ACM},
volume = {49},
number = {3},
pages = {318--367},
year = {2002},
month = may,
publisher = {ACM},
doi = {10.1145/567112.567114},
}
@article{burns2003inplace,
author = {Burns, Randal C. and Long, Darrell D. E.
and Stockmeyer, Larry},
title = {In-Place Reconstruction of Version Differences},
journal = {IEEE Transactions on Knowledge and Data Engineering},
volume = {15},
number = {4},
pages = {973--984},
year = {2003},
month = {jul/aug},
publisher = {IEEE},
doi = {10.1109/TKDE.2003.1209013},
}
@article{karp1987efficient,
author = {Karp, Richard M. and Rabin, Michael O.},
title = {Efficient Randomized Pattern-Matching Algorithms},
journal = {IBM Journal of Research and Development},
volume = {31},
number = {2},
pages = {249--260},
year = {1987},
month = mar,
publisher = {IBM},
doi = {10.1147/rd.312.0249},
}
@article{rabin1980probabilistic,
author = {Rabin, Michael O.},
title = {Probabilistic Algorithm for Testing Primality},
journal = {Journal of Number Theory},
volume = {12},
number = {1},
pages = {128--138},
year = {1980},
month = feb,
publisher = {Elsevier},
doi = {10.1016/0022-314X(80)90084-0},
}
@article{levenshtein1966binary,
author = {Levenshtein, Vladimir I.},
title = {Binary Codes Capable of Correcting Deletions,
Insertions, and Reversals},
journal = {Soviet Physics Doklady},
volume = {10},
number = {8},
pages = {707--710},
year = {1966},
}
@article{miller1985file,
author = {Miller, Webb and Myers, Eugene W.},
title = {A File Comparison Program},
journal = {Software---Practice and Experience},
volume = {15},
number = {11},
pages = {1025--1040},
year = {1985},
publisher = {Wiley},
doi = {10.1002/spe.4380151102},
}
@inproceedings{reichenberger1991delta,
author = {Reichenberger, Christoph},
title = {Delta Storage for Arbitrary Non-Text Files},
booktitle = {Proceedings of the 3rd International Workshop on
Software Configuration Management},
pages = {144--152},
year = {1991},
publisher = {ACM},
doi = {10.1145/111062.111080},
}
@article{tichy1984string,
author = {Tichy, Walter F.},
title = {The String-to-String Correction Problem with Block Moves},
journal = {ACM Transactions on Computer Systems},
volume = {2},
number = {4},
pages = {309--321},
year = {1984},
month = nov,
publisher = {ACM},
doi = {10.1145/357401.357404},
}
@article{wagner1974string,
author = {Wagner, Robert A. and Fischer, Michael J.},
title = {The String-to-String Correction Problem},
journal = {Journal of the ACM},
volume = {21},
number = {1},
pages = {168--173},
year = {1974},
month = jan,
publisher = {ACM},
doi = {10.1145/321796.321811},
}
```

The Karp-Rabin paper introduces the rolling hash / fingerprinting technique used by all three differencing algorithms. Wagner and Fischer formalized string-to-string correction (edit distance); Tichy extended it to block moves, the model solved by the algorithms here. Reichenberger and Miller-Myers gave the prior O(n^2) optimal algorithms that Ajtai et al. improve upon. Rabin's paper describes the probabilistic primality test (Miller-Rabin) used for hash table auto-sizing.