
arkanjo-tool/clone-drift-dataset

clone-drift-dataset

Synthetic dataset for analyzing temporal divergence in code clones.

This dataset is inspired by real-world patterns observed in large codebases such as the Linux kernel.

In subsystems like device drivers, it is common to copy existing implementations when supporting new hardware generations. Over time, these copies evolve independently, leading to inconsistencies, outdated logic, and subtle bugs.

This repository attempts to reproduce that evolution pattern in a controlled environment, where:

  • Code is duplicated to simulate reuse across "generations"
  • One copy evolves while the others remain unchanged
  • Divergence accumulates over time
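To make "divergence" concrete, here is a minimal sketch of the pattern above. It is not taken from this dataset: the C snippet `BASE`, the `evolve` edit, and the choice of line-based `difflib.SequenceMatcher.ratio()` as the divergence measure are all illustrative assumptions. It duplicates one routine into several generations, edits only the first copy at each step, and records how similar that copy stays to an untouched clone.

```python
import difflib

# Hypothetical generation-1 routine; it stands in for real driver code.
BASE = """\
int init_device(struct dev *d) {
    d->flags = 0;
    d->timeout = 100;
    return setup(d);
}
"""


def similarity(a: str, b: str) -> float:
    """Line-based similarity in [0, 1]; 1.0 means the two sources are identical."""
    return difflib.SequenceMatcher(None, a.splitlines(), b.splitlines()).ratio()


def evolve(src: str, step: int) -> str:
    """Hypothetical edit: add one new line just before the `return` statement."""
    lines = src.splitlines()
    lines.insert(len(lines) - 2, f"    d->quirk_{step} = 1;")
    return "\n".join(lines) + "\n"


def simulate(generations: int, steps: int) -> list[float]:
    """Copy BASE into every generation, evolve only clone 0, and track drift."""
    if generations < 2:
        raise ValueError("need at least two clones to measure divergence")
    clones = [BASE] * generations  # clone 0 evolves; the rest stay frozen
    history = []
    for step in range(1, steps + 1):
        clones[0] = evolve(clones[0], step)
        history.append(similarity(clones[0], clones[1]))
    return history


if __name__ == "__main__":
    for step, score in enumerate(simulate(generations=3, steps=5), start=1):
        print(f"step {step}: similarity to frozen clone = {score:.3f}")
```

The similarity of the evolving copy to a frozen clone falls at every step, while the frozen clones remain identical to each other. A line-based ratio is only one possible metric; token- or AST-based clone detectors would be a natural alternative.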

Notes on Git History

The commit history of this repository may be rewritten (e.g., with git commit --amend or git rebase) to refine test scenarios.

Therefore, commit hashes should not be considered stable identifiers. For reproducibility, prefer tagged versions.
