This project processes large Discogs XML release dumps into structured JSON files.
- Extracts essential release data (title, artists, labels, genres, tracks, etc.)
- Handles malformed or incomplete records gracefully
- Logs rejected records with reasons
- Uses multiprocessing to speed up processing of large dumps
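
As a rough illustration of the extraction and rejection steps listed above, here is a minimal sketch. It is not the code in `prepare9D.py`: the function names `extract_release` and `iter_releases`, the rejection reasons, and the assumed element layout (`<release>` with `<title>`, `<artists>/<artist>/<name>`, `<genres>/<genre>`) are all assumptions made for this example.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def extract_release(elem):
    """Return (record, None) on success or (None, reason) on rejection.

    Hypothetical helper: the field set and rejection rules are assumptions.
    """
    title = (elem.findtext("title") or "").strip()
    if not title:
        return None, "missing title"
    names = [a.findtext("name") for a in elem.findall("artists/artist")]
    artists = [n.strip() for n in names if n and n.strip()]
    if not artists:
        return None, "missing artists"
    record = {
        "id": elem.get("id"),
        "title": title,
        "artists": artists,
        "genres": [g.text for g in elem.findall("genres/genre") if g.text],
    }
    return record, None


def iter_releases(stream):
    """Yield one (record, reason) pair per <release>, parsing incrementally."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "release":
            yield extract_release(elem)
            elem.clear()  # drop the children of the release just handled


# Made-up sample data standing in for a gzipped dump.
sample = b"""<releases>
<release id="1"><title>Example Title</title>
  <artists><artist><name>Example Artist</name></artist></artists>
  <genres><genre>Electronic</genre></genres></release>
<release id="2"><title></title></release>
</releases>"""

accepted, rejected = [], []
with gzip.open(io.BytesIO(gzip.compress(sample)), "rb") as fh:
    for record, reason in iter_releases(fh):
        if record is not None:
            accepted.append(record)
        else:
            rejected.append(reason)
```

Because `extract_release` depends only on the element it is given, the same per-record logic could in principle be handed to worker processes, though how the actual script divides work between processes is not shown here.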
Run the script with:

    python prepare9D.py

Ensure that the Discogs XML file (e.g., discogs_YYYYMMDD_releases.xml.gz) is present in the script directory.
The script produces:
- Processed JSON files containing essential release data.
- A log file, rejected_log.txt, capturing rejection reasons.
- A rejected_discogs_data folder with samples of rejected records.
This project is licensed under the MIT License.