Data Wrangling

Duplicate Identification in restaurants.tsv data set

About

The Python script reads the provided restaurants.tsv file and identifies duplicate records. It then compares the detected duplicates against the gold standard (restaurants_DPL.tsv) and saves the cleaned data set to a new .tsv file.
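The script itself is not reproduced in this README. The following standard-library sketch roughly illustrates the three steps described above: detecting duplicates, scoring them against the gold standard, and writing a cleaned file. It matches records whose normalized key columns are identical. The column names (`id`, `name`, `phone`, `id1`, `id2`), the exact-key matching rule and all function names are assumptions, not taken from the actual script or dataset. A real solution would likely use fuzzier similarity measures.

```python
import csv
import itertools
import re
from collections import defaultdict

# Hypothetical column names; the real restaurants.tsv header may differ.
ID_COL = "id"
KEY_COLS = ("name", "phone")


def normalize(value: str) -> str:
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", value.lower())


def match_key(row: dict, key_cols=KEY_COLS) -> tuple:
    """Build the normalized key used to decide whether two rows match."""
    return tuple(normalize(row[col]) for col in key_cols)


def find_duplicate_pairs(rows, id_col=ID_COL, key_cols=KEY_COLS) -> set:
    """Return unordered id pairs whose normalized key columns are identical."""
    groups = defaultdict(list)
    for row in rows:
        groups[match_key(row, key_cols)].append(row[id_col])
    pairs = set()
    for ids in groups.values():
        for a, b in itertools.combinations(ids, 2):
            pairs.add(frozenset((a, b)))
    return pairs


def gold_pairs_from_rows(rows, col_a="id1", col_b="id2") -> set:
    """Turn gold-standard rows (assumed: one duplicate pair per row) into id pairs."""
    return {frozenset((row[col_a], row[col_b])) for row in rows}


def evaluate(predicted: set, gold: set) -> dict:
    """Precision, recall and F1 of predicted pairs against gold pairs."""
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


def deduplicate(rows, key_cols=KEY_COLS) -> list:
    """Keep only the first row seen for each normalized key."""
    seen = set()
    kept = []
    for row in rows:
        key = match_key(row, key_cols)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


def read_tsv(path) -> list:
    """Read a tab-separated file with a header row into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f, delimiter="\t"))


def write_tsv(path, rows) -> None:
    """Write a list of dicts to a tab-separated file, header taken from row 0."""
    if not rows:
        return
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]), delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)
```

A full run would chain `read_tsv`, `find_duplicate_pairs`, `gold_pairs_from_rows`, `evaluate`, `deduplicate` and `write_tsv`. Comparing unordered pairs (`frozenset`) means a gold entry listed as (2, 1) still counts as a match for a detected (1, 2).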

Usage

To run the script, the "restaurants.tsv" and "restaurants_DPL.tsv" files must be present in the same directory as the script. Both can be downloaded from:

restaurants.tsv: https://hpi.de/fileadmin/user_upload/fachgebiete/naumann/projekte/repeatability/Restaurants/mdedup/restaurants.tsv

restaurants_DPL.tsv: https://hpi.de/fileadmin/user_upload/fachgebiete/naumann/projekte/repeatability/Restaurants/restaurants_DPL.tsv
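The helper below is a hedged sketch for checking that both input files are in place and, optionally, fetching any that are missing from the URLs listed above. It assumes the script opens the files by relative path, so `directory` defaults to the current working directory. The function names are hypothetical and not part of the original script.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Download locations as listed in the Usage section.
REQUIRED_FILES = {
    "restaurants.tsv": "https://hpi.de/fileadmin/user_upload/fachgebiete/naumann/projekte/repeatability/Restaurants/mdedup/restaurants.tsv",
    "restaurants_DPL.tsv": "https://hpi.de/fileadmin/user_upload/fachgebiete/naumann/projekte/repeatability/Restaurants/restaurants_DPL.tsv",
}


def missing_files(directory=".") -> list:
    """Return the names of required input files not present in `directory`."""
    base = Path(directory)
    return [name for name in REQUIRED_FILES if not (base / name).is_file()]


def fetch_missing(directory=".") -> list:
    """Download every missing input file into `directory`; return the names fetched."""
    base = Path(directory)
    fetched = []
    for name in missing_files(directory):
        urlretrieve(REQUIRED_FILES[name], str(base / name))
        fetched.append(name)
    return fetched
```

Calling `missing_files()` before the main processing step gives a clear error message instead of a `FileNotFoundError` deep inside the script.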