etsabary/discogs_data_parser

Discogs Data Parser

This project processes large Discogs XML release dumps into structured JSON files.

Features

  • Extracts essential release data (title, artists, labels, genres, tracks, etc.)
  • Handles malformed or incomplete records gracefully
  • Logs rejected records with reasons
  • Uses multiprocessing to parallelize processing of large dumps
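The extraction and rejection behaviour listed above can be illustrated with a minimal sketch. This is not the code in prepare9D.py; it is one plausible approach that streams the dump with the standard library's `xml.etree.ElementTree.iterparse`. The function names `iter_releases` and `open_dump`, the choice of fields, and the two rejection rules (missing title, no artists) are assumptions made for illustration, as is the element layout (`release`, `title`, `artists/artist/name`, `genres/genre`).

```python
import gzip
import io
import xml.etree.ElementTree as ET


def iter_releases(stream):
    """Yield (record, reject_reason) pairs; exactly one of the two is None.

    Hypothetical helper: the field choice and rejection rules are
    assumptions for this sketch, not the project's actual rules.
    """
    # iterparse reports an element once its closing tag has been read,
    # so the dump is processed incrementally instead of loaded whole.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag != "release":
            continue
        release_id = elem.get("id")
        # findtext("title") only looks at direct children, so track
        # titles nested deeper in the release are not picked up here.
        title = (elem.findtext("title") or "").strip()
        artists = [
            (artist.findtext("name") or "").strip()
            for artist in elem.findall("artists/artist")
        ]
        artists = [name for name in artists if name]
        genres = [g.text.strip() for g in elem.findall("genres/genre") if g.text]
        if not title:
            yield None, f"release {release_id}: missing title"
        elif not artists:
            yield None, f"release {release_id}: no artists"
        else:
            yield {
                "id": release_id,
                "title": title,
                "artists": artists,
                "genres": genres,
            }, None
        # Drop the children of the finished release to limit memory use.
        elem.clear()


def open_dump(path):
    """Open a gzip-compressed dump as a binary stream (hypothetical helper)."""
    return gzip.open(path, "rb")
```

A caller could loop over `iter_releases(open_dump(path))`, collect the records for JSON output, and write each rejection reason to a log file.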

Usage

Run the script with:

python prepare9D.py

Before running, place the Discogs releases dump (e.g., discogs_YYYYMMDD_releases.xml.gz) in the same directory as the script.
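The README does not say how the script locates the dump, so the following is only a sketch of one way to check that a matching file is present before starting a long run. The pattern `discogs_*_releases.xml.gz` is inferred from the example file name above, and `find_release_dumps` is a hypothetical helper, not part of the project.

```python
from fnmatch import fnmatch
from pathlib import Path

# Assumed naming scheme, inferred from discogs_YYYYMMDD_releases.xml.gz.
DUMP_PATTERN = "discogs_*_releases.xml.gz"


def find_release_dumps(directory="."):
    """Return release dump files in `directory`, sorted by name."""
    return sorted(
        path
        for path in Path(directory).iterdir()
        if path.is_file() and fnmatch(path.name, DUMP_PATTERN)
    )


if __name__ == "__main__":
    dumps = find_release_dumps()
    if dumps:
        # With YYYYMMDD in the name, the last entry is the newest dump.
        print(f"Found dump: {dumps[-1].name}")
    else:
        print("No Discogs releases dump found in this directory.")
```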

Output

  • Processed JSON files containing essential release data.
  • A log file rejected_log.txt capturing rejection reasons.
  • A rejected_discogs_data folder with samples of rejected records.

License

This project is licensed under the MIT License.
