Synthetic dataset for analyzing temporal divergence in code clones.
This dataset is inspired by real-world patterns observed in large codebases such as the Linux kernel.
In subsystems like device drivers, it is common to copy existing implementations when supporting new hardware generations. Over time, these copies evolve independently, leading to inconsistencies, outdated logic, and subtle bugs.
This repository attempts to reproduce that evolution pattern in a controlled environment, where:
- Code is duplicated to simulate reuse across "generations"
- One version evolves while others remain unchanged
- Divergence accumulates over time
The commit history of this repository may be rewritten (e.g., using git commit --amend or rebases) to refine test scenarios.
Therefore, commit hashes should not be considered stable identifiers. For reproducibility, prefer tagged versions.