# Automated Pipeline for High-Purity Single-Event Audio Mining
The Single-Event Data Collection Pipeline is an automated framework for mining high-purity single-event audio segments from weakly-labeled datasets. It integrates multimodal AI models (Qwen3-Omni) and acoustic tagging (AudioTag) to eliminate event co-occurrence and weak label noise, producing semantically consistent training data for Universal Sound Separation.
## Key Features
- Coarse-to-Fine Labeling Strategy: Leverages ontology topology for precise label alignment - first predicting coarse-grained parent nodes via AudioTag, then refining to specific leaf nodes using Qwen3-Omni with restricted candidate subsets
- Multi-stage filtering for semantic-acoustic alignment
- Produces 44.1kHz high-fidelity audio outputs
## Project Structure

```
pipeline/
├── code/
│   ├── 01_audio_chunking.py            # Chunk long audio into 10 s segments with overlap
│   ├── 02_filter_single_label.py       # Filter samples with a single label
│   ├── 03_filter_single_event_qwen.py  # Qwen3-Omni-based single-event audio filtering
│   ├── 04_audioset_label_audiotag.py   # AudioSet ontology tagging using the AudioTag model
│   ├── 05_leaf_label_qwen.py           # Leaf-level label refinement with Qwen3-Omni
│   └── 06_superres_apollo.py           # Audio super-resolution to 44.1 kHz using Apollo
├── data/
│   ├── 01_source_data/
│   │   └── example.json                # Example input format for raw data
│   ├── 02_single_label_data/
│   ├── 03_single_event_data/
│   ├── 04_audiotagged_data/
│   ├── 05_leaf_label_data/
│   └── 06_super_res_data/
├── ontology/
│   ├── audioset_ontology.json          # Original AudioSet ontology
│   └── hive_ontology.json              # Modified ontology for the Hive dataset
├── icefall/                            # AudioTag model repository
├── Apollo/                             # Apollo model repository
├── requirements.txt                    # Pipeline dependencies
└── README.md                           # This file
```
## Data Preparation

Before running the pipeline, prepare your source audio data in JSON format and place it in the `data/01_source_data/` directory.
**Required JSON Format:**

```json
[
  {
    "text_label": ["Label1", "Label2"],
    "audio_path": "/path/to/your/audio.wav"
  },
  {
    "text_label": "SingleLabel",
    "audio_path": "/path/to/another/audio.wav"
  }
]
```

**Field Requirements:**

- `text_label`: a string (single label) or an array of strings (multiple labels)
- `audio_path`: absolute path to the audio file
**Example:** See `data/01_source_data/example.json` for the reference format.
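Before kicking off the pipeline, the manifest can be sanity-checked with a small loader that normalizes `text_label` to a list and rejects relative paths. This is an illustrative sketch, not one of the pipeline scripts; the function name is ours.

```python
import json

def load_source_manifest(path):
    """Load the source manifest and normalize `text_label` to a list.

    Accepts both allowed forms: a single string or a list of strings.
    Raises ValueError on malformed entries or non-absolute audio paths.
    """
    with open(path) as f:
        entries = json.load(f)
    for entry in entries:
        labels = entry["text_label"]
        if isinstance(labels, str):
            entry["text_label"] = [labels]
        elif not isinstance(labels, list):
            raise ValueError(f"bad text_label: {labels!r}")
        if not entry["audio_path"].startswith("/"):
            raise ValueError(f"audio_path must be absolute: {entry['audio_path']}")
    return entries
```

After loading, every entry exposes `text_label` as a list regardless of which input form was used.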
## Environment Setup

```bash
conda create -n hive_pipeline python=3.10
conda activate hive_pipeline
pip install -r requirements.txt
```

**Note:** Steps 4 and 6 require separate environments. Refer to their respective sections for setup instructions.
## Step 1: Audio Chunking

Chunk long audio files into 10-second segments with 50% overlap, filtering out low-energy segments.
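The selection logic can be sketched as follows. This is illustrative rather than the script itself (the real `01_audio_chunking.py` also writes the segment WAVs), and it assumes the energy threshold is compared against per-window RMS.

```python
import math

def chunk_segments(samples, sr, seg_sec=10.0, overlap=0.5, energy_threshold=0.0005):
    """Slide a 10 s window with 50% overlap over `samples` and keep the
    windows whose RMS energy meets the threshold.

    Returns (start, end) sample indices of the kept windows.
    Assumption: the threshold applies to RMS; the real script may differ.
    """
    seg_len = int(seg_sec * sr)
    if len(samples) < seg_len:
        return []
    hop = int(seg_len * (1.0 - overlap))  # 50% overlap -> 5 s hop
    kept = []
    for start in range(0, len(samples) - seg_len + 1, hop):
        window = samples[start:start + seg_len]
        rms = math.sqrt(sum(x * x for x in window) / len(window))
        if rms >= energy_threshold:
            kept.append((start, start + seg_len))
    return kept
```

With the default threshold of 0.0005, near-silent windows are dropped while anything with audible content survives.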
```bash
python code/01_audio_chunking.py \
    --input_path data/01_source_data/example.json \
    --output_path data/01_source_data/output.json \
    --seg_output_path data/01_source_data/segments \
    --energy_threshold 0.0005
```

**Input JSON Format** (see `data/01_source_data/example.json`):
```json
[
  {
    "text_label": ["Label1", "Label2"],
    "audio_path": "/path/to/audio.wav"
  }
]
```

## Step 2: Single-Label Filtering

Keep only the samples that carry a single label.
**Note:** Modify `input_dir` and `output_dir` in the script before running:

- `input_dir`: path to the Step 1 output directory (e.g., `data/01_source_data/`)
- `output_dir`: output directory (e.g., `data/02_single_label_data/`)
```bash
python code/02_filter_single_label.py
```

**Example Output Format** (see `data/02_single_label_data/example.json`):

```json
[
  {
    "text_label": "Label",
    "audio_path": "/path/to/segments/sample.wav"
  }
]
```

- Samples with multiple labels are filtered out
- `text_label` is now a single string
- Output file: `data/02_single_label_data/output.json`
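The filtering rule amounts to keeping entries with exactly one label and flattening that label to a string. A minimal sketch of what `02_filter_single_label.py` produces (illustrative; it assumes `text_label` may arrive as a string or a list):

```python
def keep_single_label(entries):
    """Keep entries with exactly one label; flatten the label to a string."""
    out = []
    for e in entries:
        labels = e["text_label"]
        if isinstance(labels, str):
            labels = [labels]  # a bare string already counts as one label
        if len(labels) == 1:
            out.append({"text_label": labels[0], "audio_path": e["audio_path"]})
    return out
```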
## Step 3: Single-Event Filtering with Qwen3-Omni

Use the Qwen3-Omni model to keep only audio that contains a single type of sound event.

**Model Download:** Download the Qwen3-Omni model from the Qwen3-Omni-7B Hugging Face repository.
```bash
python code/03_filter_single_event_qwen.py \
    --model_path /path/to/Qwen3-Omni-7B \
    --audios_path data/02_single_label_data/output.json \
    --output_path data/03_single_event_data/output.json \
    --batch_size 64
```

**Example Output Format** (see `data/03_single_event_data/example.json`):
- Same format as Step 2; only verified single-event samples remain
- Only acoustically pure single-event segments pass the filter
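The control flow of this step can be sketched with a stand-in judge. Here `judge` is a hypothetical placeholder for the Qwen3-Omni call (which in the real script prompts the model with the audio and asks whether exactly one event is heard); only the batching and filtering logic is shown.

```python
def filter_single_event(entries, judge, batch_size=64):
    """Run a yes/no single-event judge over entries in batches.

    `judge` takes a batch of entries and returns one bool per entry
    (True = exactly one sound event heard). Stand-in for the model call.
    """
    kept = []
    for i in range(0, len(entries), batch_size):
        batch = entries[i:i + batch_size]
        verdicts = judge(batch)
        kept.extend(e for e, ok in zip(batch, verdicts) if ok)
    return kept
```

Batching matters here because a 7B multimodal model dominates the runtime; the filter itself is a simple keep/drop per entry.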
## Step 4: AudioSet Tagging with AudioTag

Assign AudioSet ontology labels using the AudioTag acoustic tagging model (icefall implementation).

**Environment Setup:** Refer to the installation instructions in the icefall repository:

- Repository path: `icefall/egs/audioset/AT/zipformer/`
- Follow the environment setup guide provided by icefall
**Model Download:** Download the AudioTag model checkpoint and label dictionary from Hugging Face:

- Checkpoint: `exp/pretrained.pt`
- Label dictionary: `data/class_labels_indices.csv`
**Setup:**

```bash
cp code/04_audioset_label_audiotag.py icefall/egs/audioset/AT/
cd icefall/egs/audioset/AT/
```

**Run:**

```bash
python 04_audioset_label_audiotag.py \
    --checkpoint /path/to/audiotag_checkpoint.pt \
    --label-dict /path/to/class_labels_indices.csv \
    --input_path ../../../../../data/03_single_event_data/output.json \
    --output_path ../../../../../data/04_audiotagged_data/output.json \
    --sample-rate 48000
```

**Note:**

- Adjust `--sample-rate` to match your dataset's actual sample rate
- Audio with a mismatched sample rate is resampled automatically
**Example Output Format** (see `data/04_audiotagged_data/example.json`):

- `text_label` is updated to AudioSet ontology categories
- The JSON structure is otherwise unchanged
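For reference, `class_labels_indices.csv` follows AudioSet's standard `index,mid,display_name` layout, so mapping a predicted class index back to a readable tag is a small CSV lookup. A sketch (not one of the pipeline scripts):

```python
import csv
import io

def load_label_dict(csv_text):
    """Map class index -> human-readable tag from class_labels_indices.csv
    (AudioSet's `index,mid,display_name` layout; names may contain commas
    and are therefore quoted)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {int(row["index"]): row["display_name"] for row in reader}
```

In the real script this dictionary is what turns the tagger's top-scoring class index into the `text_label` string written to the output JSON.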
## Step 5: Leaf-Label Refinement with Qwen3-Omni

Refine labels to leaf nodes of the AudioSet ontology using Qwen3-Omni with confusion sets.
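The coarse-to-fine idea is that AudioTag supplies a parent node and Qwen3-Omni then chooses only among that node's leaves. Collecting the leaf candidates from the ontology JSON (a list of records with `id`, `name`, and `child_ids` fields) can be sketched as follows; the actual `05_leaf_label_qwen.py` may differ in detail:

```python
def leaf_candidates(ontology, parent_id):
    """Collect the names of all leaf nodes under `parent_id`.

    `ontology` follows the AudioSet ontology JSON layout: a list of
    {"id": ..., "name": ..., "child_ids": [...]} records. The returned
    names form the restricted candidate set handed to the LLM.
    """
    by_id = {node["id"]: node for node in ontology}
    leaves, stack = [], [parent_id]
    while stack:
        node = by_id[stack.pop()]
        children = node.get("child_ids", [])
        if children:
            stack.extend(children)  # descend into the subtree
        else:
            leaves.append(node["name"])  # no children -> leaf
    return leaves
```

Restricting the candidate set this way keeps the LLM from hallucinating labels outside the subtree the tagger already committed to.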
```bash
python code/05_leaf_label_qwen.py \
    --model_path /path/to/Qwen3-Omni-7B \
    --ontology_path ontology/audioset_ontology.json \
    --modified_ontology_path ontology/hive_ontology.json \
    --data_path data/04_audiotagged_data/output.json \
    --output_path data/05_leaf_label_data/output.json \
    --error_set_path data/05_leaf_label_data/error_set.json \
    --batch_size 32
```

**Example Output Format** (see `data/05_leaf_label_data/example.json`):
- `text_label` is updated to the refined leaf-node categories
- The JSON structure is otherwise unchanged
## Step 6: Audio Super-Resolution with Apollo

Upsample audio to 44.1 kHz with the Apollo super-resolution model for high-fidelity output.

**Environment Setup:** Refer to the installation instructions in the Apollo repository.

**Model Download:** Download the Apollo checkpoint from the Apollo Universal Model Release and place it in the `Apollo/` directory.
**Setup:**

```bash
cp code/06_superres_apollo.py Apollo/
cd Apollo/
```

**Run:**

```bash
python 06_superres_apollo.py \
    --input_json ../data/05_leaf_label_data/output.json \
    --output_json ../data/06_super_res_data/output.json \
    --output_audio_dir ../data/06_super_res_data/audio \
    --config_path /path/to/apollo_config.yaml \
    --checkpoint_path /path/to/apollo_checkpoint.pth \
    --batch_size 16
```

**Example Output Format** (see `data/06_super_res_data/example.json`):
- `audio_path` is updated to point at the super-resolved (44.1 kHz) audio files
- `text_label` remains unchanged
- Final high-purity dataset, ready for training
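As a final sanity check on the Step 6 outputs, each file's sample rate can be verified with the standard-library `wave` module. This is an illustrative helper, not part of the pipeline scripts:

```python
import wave

def check_sample_rate(path, expected_hz=44100):
    """Return True if the WAV file at `path` is sampled at `expected_hz`."""
    with wave.open(path, "rb") as wf:
        return wf.getframerate() == expected_hz
```

Running this over every `audio_path` in `data/06_super_res_data/output.json` confirms the super-resolution step actually produced 44.1 kHz files before training begins.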