Duplicate Identification in the restaurants.tsv Data Set
The Python script reads the provided .tsv file, identifies and filters out duplicate records, compares the detected duplicates against the gold standard, and saves the cleaned data set to a new .tsv file.
To run the script, the "restaurants.tsv" and "restaurants_DPL.tsv" files must both be present in the same directory as the script. They can be downloaded from:
restaurants.tsv: https://hpi.de/fileadmin/user_upload/fachgebiete/naumann/projekte/repeatability/Restaurants/mdedup/restaurants.tsv
restaurants_DPL.tsv: https://hpi.de/fileadmin/user_upload/fachgebiete/naumann/projekte/repeatability/Restaurants/restaurants_DPL.tsv
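The workflow described above (detect duplicates, compare against a gold standard, write a cleaned file) could look roughly like the following minimal sketch. It is not the script's actual implementation: the column names ("id", "name", "city"), the exact-match-after-normalization strategy, the gold standard being a set of id pairs, and the sample rows are all assumptions made up for illustration. The real files may use a different layout, and the real script may use a different matching technique.

```python
import csv
import io
import re
from itertools import combinations

# Invented sample data; the real restaurants.tsv may have different columns.
SAMPLE = (
    "id\tname\tcity\n"
    "1\tBlue Cafe\tSpringfield\n"
    "2\tblue cafe.\tspringfield\n"
    "3\tGreen Diner\tShelbyville\n"
)


def normalize(text):
    """Lowercase the text and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]+", "", text.lower())


def find_duplicate_pairs(rows, key_fields):
    """Return all id pairs whose normalized key fields are identical."""
    groups = {}
    for row in rows:
        key = tuple(normalize(row[field]) for field in key_fields)
        groups.setdefault(key, []).append(row["id"])
    pairs = set()
    for ids in groups.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def precision_recall(found, gold):
    """Compare detected pairs with gold-standard pairs."""
    true_positives = len(found & gold)
    precision = true_positives / len(found) if found else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def drop_duplicates(rows, pairs):
    """Keep the first record of every duplicate pair and drop the second."""
    to_drop = {second for _, second in pairs}
    return [row for row in rows if row["id"] not in to_drop]


# Read the tab-separated input (an in-memory string stands in for the file).
rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"))

found = find_duplicate_pairs(rows, key_fields=("name", "city"))

# Hypothetical gold standard: one known duplicate pair.
gold = {("1", "2")}
precision, recall = precision_recall(found, gold)

# Write the cleaned data set as TSV (to a string here instead of a new file).
cleaned = drop_duplicates(rows, found)
out = io.StringIO()
writer = csv.DictWriter(
    out, fieldnames=list(rows[0].keys()), delimiter="\t", lineterminator="\n"
)
writer.writeheader()
writer.writerows(cleaned)
cleaned_tsv = out.getvalue()
```

To adapt the sketch to real files, the in-memory strings would be replaced with open("restaurants.tsv", newline="") and a similar call for the output file. Exact matching on normalized fields only catches near-identical records; a fuzzy similarity measure would be needed for typos or abbreviations.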