https://guotong1988.github.io/core_research/2024/02/01/auto-re-label/
Step-1, Train the model on the original training dataset, train.py
Step-2, Predict the training/dev datasets, predict.py
Step-3, Prepare the candidate training datasets, get_dataset_list.py
Step-4, Find the best dataset by dev accuracy, explore_train.py
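The four steps above can be sketched end to end as follows. This is a minimal, self-contained sketch under stated assumptions. It is not the code in train.py, predict.py, get_dataset_list.py or explore_train.py. A toy nearest-centroid classifier (the hypothetical `NearestCentroid`) stands in for the transformers model. The rule for building candidate datasets is also an assumption chosen for illustration: a training label is replaced with the model's prediction when the two disagree and the prediction's confidence is at least a threshold, with one candidate per threshold plus the unchanged original. The helper `relabel_search` and its `thresholds` parameter are likewise hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Example = Tuple[Tuple[float, ...], int]  # (feature vector, label)


class NearestCentroid:
    """Toy stand-in for the real model (assumption, not the project's model)."""

    def fit(self, data: Sequence[Example]) -> "NearestCentroid":
        sums: Dict[int, List[float]] = {}
        counts: Dict[int, int] = defaultdict(int)
        for x, y in data:
            acc = sums.setdefault(y, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[y] += 1
        # Per-class mean of the feature vectors.
        self.centroids = {y: [s / counts[y] for s in acc] for y, acc in sums.items()}
        return self

    def predict_proba(self, x: Sequence[float]) -> Dict[int, float]:
        # Softmax over negative squared distances to each class centroid.
        scores = {y: -sum((a - b) ** 2 for a, b in zip(x, c))
                  for y, c in self.centroids.items()}
        m = max(scores.values())
        exp = {y: math.exp(s - m) for y, s in scores.items()}
        z = sum(exp.values())
        return {y: e / z for y, e in exp.items()}

    def predict(self, x: Sequence[float]) -> Tuple[int, float]:
        # Returns (predicted label, confidence of that label).
        probs = self.predict_proba(x)
        label = max(probs, key=probs.get)
        return label, probs[label]


def accuracy(model: NearestCentroid, data: Sequence[Example]) -> float:
    return sum(model.predict(x)[0] == y for x, y in data) / len(data)


def relabel_search(train: Sequence[Example], dev: Sequence[Example],
                   thresholds: Sequence[float] = (0.9, 0.7, 0.5)):
    # Step-1: train on the original training dataset.
    model = NearestCentroid().fit(train)
    # Step-2: predict the training dataset (dev is scored in Step-4).
    preds = [model.predict(x) for x, _ in train]
    # Step-3: candidate datasets. The original is kept as a baseline; each other
    # candidate takes the prediction where it disagrees with the label and its
    # confidence is at least the threshold (hypothetical relabeling rule).
    candidates = {"original": list(train)}
    for t in thresholds:
        candidates[f"relabel@{t}"] = [
            (x, p if p != y and conf >= t else y)
            for (x, y), (p, conf) in zip(train, preds)
        ]
    # Step-4: retrain on every candidate and keep the best by dev accuracy.
    scores = {name: accuracy(NearestCentroid().fit(d), dev)
              for name, d in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

Because the original dataset is inserted first and `max` returns the first maximal key, ties go to the original dataset, so a relabeled candidate is only chosen when it strictly improves dev accuracy.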
transformers 4.38.2 or 4.26.1
torch 2.2.1 or 1.11.0
scikit-learn 1.3.2
datasets 2.18.0
accelerate 0.27.2
Label Error Correction With Human Labor: The Re-Label Method For Data-Centric Machine Learning
Using LLMs To Re-Label: A Unified Framework for NLP Tasks by ReLabel Method
The methods proposed in this project (and its related works) can be applied to any machine learning / deep learning task whose dataset is annotated manually or by LLMs.
Beyond NLP, the methods can also be efficiently extended to CV (computer vision), ASR (automatic speech recognition), TTS (text-to-speech), and other tasks.

