This repository contains implementations of several architectures and training configs for the BSS (blind speech separation) problem.
Our best weights for ConvTasNet are available: here
Report on the completed work: here
Installation may depend on your task. The general steps are the following:
- (Optional) Create and activate a new environment using venv (+pyenv):

    # create env
    ~/.pyenv/versions/PYTHON_VERSION/bin/python3 -m venv project_env

    # alternatively, using the default python version
    python3 -m venv project_env

    # activate env
    source project_env/bin/activate

- Install all required packages:

    pip install -r requirements.txt

- Install pre-commit:

    pre-commit install
To train a model, run the following command:

    python3 train.py -cn=convtasnet HYDRA_CONFIG_ARGUMENTS

where HYDRA_CONFIG_ARGUMENTS are optional Hydra overrides of config values, given as key=value pairs.
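Hydra overrides use dotted key paths that patch the nested config before training starts. The sketch below is a simplified illustration of that idea only; it is not Hydra's parser (real Hydra also supports `+key`, `~key`, sweeps, and its own value grammar), and the helper names `parse_value` / `apply_overrides` and the config keys in the example are hypothetical.

```python
# Simplified illustration of dotted "key=value" overrides on a nested config.
# NOT Hydra's implementation: no +key/~key, sweeps, or typed value grammar.
import copy
from typing import Any


def parse_value(raw: str) -> Any:
    """Best-effort conversion of an override string to bool, int, or float."""
    lowered = raw.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw  # fall back to the plain string


def apply_overrides(config: dict, overrides: list[str]) -> dict:
    """Return a copy of `config` with dotted `key=value` overrides applied."""
    result = copy.deepcopy(config)  # never mutate the caller's config
    for item in overrides:
        key, sep, raw = item.partition("=")
        if not sep:
            raise ValueError(f"override must look like key=value: {item!r}")
        *parents, leaf = key.split(".")
        node = result
        for name in parents:
            node = node.setdefault(name, {})  # create missing sub-configs
        node[leaf] = parse_value(raw)
    return result


if __name__ == "__main__":
    # Hypothetical keys, for illustration only.
    base = {"dataloader": {"batch_size": 4}, "trainer": {"n_epochs": 50}}
    print(apply_overrides(base, ["dataloader.batch_size=8", "trainer.n_epochs=100"]))
    # prints {'dataloader': {'batch_size': 8}, 'trainer': {'n_epochs': 100}}
```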
To run inference (evaluate the model or save predictions):

    python3 inference.py -cn=inference.yaml

In inference.yaml you can specify:

- model - the name of the model config, which also selects the model itself
- datasets.test.dataset_path - path to a CustomDirDataset with the following format:
    NameOfTheDirectoryWithUtterances
    ├── audio
    │   ├── mix
    │   │   ├── FirstSpeakerID1_SecondSpeakerID1.wav # also may be flac or mp3
    │   │   ├── FirstSpeakerID2_SecondSpeakerID2.wav
    │   │   .
    │   │   .
    │   │   .
    │   │   └── FirstSpeakerIDn_SecondSpeakerIDn.wav
    │   ├── s1 # ground truth for the speaker s1, may not be given
    │   │   ├── FirstSpeakerID1_SecondSpeakerID1.wav # also may be flac or mp3
    │   │   ├── FirstSpeakerID2_SecondSpeakerID2.wav
    │   │   .
    │   │   .
    │   │   .
    │   │   └── FirstSpeakerIDn_SecondSpeakerIDn.wav
    │   └── s2 # ground truth for the speaker s2, may not be given
    │       ├── FirstSpeakerID1_SecondSpeakerID1.wav # also may be flac or mp3
    │       ├── FirstSpeakerID2_SecondSpeakerID2.wav
    │       .
    │       .
    │       .
    │       └── FirstSpeakerIDn_SecondSpeakerIDn.wav
    └── mouths # contains video information for all speakers
        ├── FirstOrSecondSpeakerID1.npz # npz mouth-crop
        ├── FirstOrSecondSpeakerID2.npz
        .
        .
        .
        └── FirstOrSecondSpeakerIDn.npz
- dataloader.batch_size - batch size
- inferencer.save_path - path to the directory where predictions are saved (in subfolders s1 and s2, as [name].wav files). If the path is not absolute, predictions are stored in the ./data/saved/[save_path] folder. By default, save_path=inference_result.
- inferencer.from_pretrained - path to the file with model weights
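The input layout and output locations described above can be sketched with `pathlib`. This is a minimal sketch, not the repository's CustomDirDataset or inferencer code; `find_audio`, `index_custom_dir`, `resolve_save_path`, and `prediction_paths` are hypothetical helpers. It assumes the first `_` in a mixture's stem separates the two speaker IDs (IDs containing `_` would be misparsed), that s1/s2 files share the mixture's stem, that mouth crops live at `mouths/<SpeakerID>.npz`, and that `./` means the current working directory, which may differ from the repository's own convention.

```python
# Hypothetical helpers sketching the CustomDirDataset layout and save_path rules.
# NOT the repository's code; naming conventions below are assumptions.
from pathlib import Path
from typing import Optional, Union

AUDIO_EXTS = (".wav", ".flac", ".mp3")


def find_audio(directory: Path, stem: str) -> Optional[Path]:
    """Return directory/<stem><ext> for the first existing audio extension, else None."""
    for ext in AUDIO_EXTS:
        candidate = directory / f"{stem}{ext}"
        if candidate.is_file():
            return candidate
    return None  # also covers a missing s1/s2 directory


def index_custom_dir(root: Union[str, Path]) -> list[dict]:
    """List mixtures with their optional ground truth and expected mouth crops."""
    root = Path(root)
    entries = []
    for mix in sorted((root / "audio" / "mix").iterdir()):
        if not mix.is_file() or mix.suffix.lower() not in AUDIO_EXTS:
            continue
        # Assumption: the first "_" separates the two speaker IDs.
        first_id, sep, second_id = mix.stem.partition("_")
        if not sep:
            raise ValueError(f"expected FirstID_SecondID naming, got {mix.name!r}")
        entries.append({
            "mix": mix,
            # Ground truth is optional, so a missing file maps to None.
            "s1": find_audio(root / "audio" / "s1", mix.stem),
            "s2": find_audio(root / "audio" / "s2", mix.stem),
            # Paths are built here, not checked for existence.
            "mouths": [root / "mouths" / f"{sid}.npz" for sid in (first_id, second_id)],
        })
    return entries


def resolve_save_path(save_path: str = "inference_result") -> Path:
    """Keep absolute paths; place relative ones under ./data/saved/."""
    path = Path(save_path)
    return path if path.is_absolute() else Path("data") / "saved" / path


def prediction_paths(save_dir: Path, name: str) -> tuple[Path, Path]:
    """Where the two separated sources for mixture `name` would be written."""
    return save_dir / "s1" / f"{name}.wav", save_dir / "s2" / f"{name}.wav"
```

Returning `None` for missing ground truth mirrors the "may not be given" note on s1 and s2: inference can still run and save predictions, while metrics that need references are skipped.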
To calculate metrics:

    python3 metrics_eval.py -cn=metrics_eval.yaml

In metrics_eval.yaml you can specify:

- metrics - metrics config name (e.g. audio_metrics: "SI-SNRi", "SDRi"). In defaults, metrics.inference._target can be PESQ, SDRi, SI-SNRi, or STOI.
- pred_path - path to the directory with predictions (in subfolders s1 and s2, as [name].wav files).
- true_path - path to the directory with true sources (in subfolders s1 and s2, as [name].wav files).
- show_all - if True, metrics are shown for each file; otherwise, only the mean value is shown.
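For reference, SI-SNRi is the gain in scale-invariant SNR of the separated signal over the unprocessed mixture. The pure-Python sketch below uses the standard definition (zero-mean signals, projection of the estimate onto the target); the names `si_snr`, `si_snri`, and `summarize` are hypothetical, and the repository's metric classes may differ in details such as the epsilon or batching. SDRi, PESQ, and STOI are not covered here.

```python
# Minimal SI-SNR / SI-SNRi sketch (standard definition, not the repo's classes).
import math
from statistics import fmean


def si_snr(estimate: list[float], target: list[float], eps: float = 1e-8) -> float:
    """Scale-invariant SNR in dB between two equal-length 1-D signals."""
    if len(estimate) != len(target):
        raise ValueError("signals must have the same length")
    # Remove the mean so a DC offset does not affect the score.
    est_mean, tgt_mean = fmean(estimate), fmean(target)
    est = [x - est_mean for x in estimate]
    tgt = [x - tgt_mean for x in target]
    # Project onto the target: s_target = (<est, tgt> / ||tgt||^2) * tgt.
    dot = sum(e * t for e, t in zip(est, tgt))
    scale = dot / (sum(t * t for t in tgt) + eps)
    s_target = [scale * t for t in tgt]
    noise = [e - s for e, s in zip(est, s_target)]
    ratio = (sum(s * s for s in s_target) + eps) / (sum(n * n for n in noise) + eps)
    return 10 * math.log10(ratio)


def si_snri(estimate: list[float], target: list[float], mixture: list[float]) -> float:
    """Improvement over using the unprocessed mixture as the estimate."""
    return si_snr(estimate, target) - si_snr(mixture, target)


def summarize(scores: dict[str, float], show_all: bool = False):
    """Per-file scores if show_all, else their mean (like the show_all option)."""
    return dict(scores) if show_all else fmean(scores.values())
```

Because the estimate is projected onto the target, rescaling a prediction does not change its score, so a separator is not penalized for outputting sources at a different loudness than the references.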
This repository is based on a heavily modified fork of the pytorch-template and asr_project_template repositories.