Eval/tts multilingual parakeet metrics#15826
Adds a reading-based CER for Japanese, computed on the Katakana reading (via pyopenjtalk g2p) of both reference and ASR hypothesis. Robust to kanji/kana spelling differences that inflate raw character CER.
- text_to_katakana(): lazy-imports pyopenjtalk, returns '' if unavailable (graceful no-op for non-ja or environments without the dep).
- katakana_cer / gt_katakana / pred_katakana computed only when language=='ja', saved per-utterance in filewise metrics.
- katakana_cer_filewise_avg + katakana_cer_cumulative aggregated globally (only emitted for ja datasets), added to the results CSV header/rows.
Signed-off-by: quanpham <youngkwan199@gmail.com>
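For readers who want to see the idea end to end, here is a minimal sketch of a reading-based CER, not the code from this PR. It assumes `pyopenjtalk.g2p(text, kana=True)` for the Katakana reading; `edit_distance`, `char_error_rate`, the `to_katakana` parameter, and the choice to return `None` when no reading is available are illustrative assumptions.

```python
from typing import Callable, Optional


def text_to_katakana(text: str) -> str:
    """Return the Katakana reading of `text`, or '' if pyopenjtalk is missing."""
    try:
        import pyopenjtalk  # lazy import so non-Japanese runs do not need the dependency
    except ImportError:
        return ""
    return pyopenjtalk.g2p(text, kana=True)


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance using a rolling DP row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def char_error_rate(ref: str, hyp: str) -> float:
    """Edits divided by reference length; an empty reference is handled explicitly."""
    if not ref:
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)


def katakana_cer(
    gt_text: str,
    pred_text: str,
    language: str,
    to_katakana: Callable[[str], str] = text_to_katakana,
) -> Optional[float]:
    """CER on Katakana readings, guarded to Japanese only."""
    if language != "ja":
        return None
    gt_kana = to_katakana(gt_text)
    pred_kana = to_katakana(pred_text)
    if not gt_kana:
        # Assumption: skip the metric when no reading could be produced.
        return None
    return char_error_rate(gt_kana, pred_kana)
```

Because both sides are converted to readings before scoring, a hypothesis that differs from the reference only in its kanji/kana spelling would be expected to score close to zero, while raw character CER would count every differing character.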
Add target language mapping for multilingual Parakeet prompt ASR checkpoints during MagpieTTS evaluation. Local .nemo ASR models can now be used for non-English evalsets, while Whisper remains the fallback when no NeMo ASR model is provided. Japanese Katakana CER remains guarded to Japanese datasets only. Signed-off-by: quanpham <youngkwan199@gmail.com>
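The routing described in this commit could look roughly like the sketch below. It is only an illustration under stated assumptions: `select_asr_backend`, the returned dictionary keys, and the contents of the mapping are hypothetical, and the actual mapping used for the Parakeet checkpoints is not shown in this conversation.

```python
from typing import Dict, Optional


def select_asr_backend(
    language: str,
    nemo_asr_model_path: Optional[str],
    target_lang_map: Dict[str, str],
) -> Dict[str, str]:
    """Pick the ASR backend for one evalset (hypothetical helper).

    Whisper is the fallback when no NeMo ASR model is given; otherwise the
    evalset language is mapped to the target_lang the prompt model expects.
    """
    if nemo_asr_model_path is None:
        return {"backend": "whisper", "language": language}
    if language not in target_lang_map:
        raise ValueError(f"No target_lang mapping for evalset language: {language}")
    return {
        "backend": "nemo",
        "model_path": nemo_asr_model_path,
        "target_lang": target_lang_map[language],
    }
```

Passing the mapping in explicitly keeps the supported languages visible at the call site and makes an unmapped language fail loudly instead of silently falling back.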
Add --root to MagpieTTS inference so evalset manifest_path and audio_dir entries can remain relative. Also use the evalset language field for evaluation, preserving whisper_language as a legacy fallback, which is required for multilingual Parakeet target_lang selection. Signed-off-by: quanpham <youngkwan199@gmail.com>
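A small sketch of the two behaviours in this commit, under assumptions: the helper names `resolve_under_root` and `resolve_eval_language` and the `"en"` default are hypothetical, while the `language` and `whisper_language` keys come from the commit message.

```python
from pathlib import Path
from typing import Optional


def resolve_under_root(path_str: str, root: Optional[Path]) -> Path:
    """Join a relative evalset path onto the root; leave absolute paths untouched."""
    path = Path(path_str)
    if root is not None and not path.is_absolute():
        return root / path
    return path


def resolve_eval_language(dataset_info: dict, default: str = "en") -> str:
    """Prefer the evalset 'language' field, then the legacy 'whisper_language'."""
    # Assumption: fall back to English when neither field is present.
    return dataset_info.get("language") or dataset_info.get("whisper_language") or default
```

With this shape, an evalset entry such as `{"manifest_path": "ja/manifest.json", "language": "ja"}` can stay relative in the JSON file and be resolved once at load time.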
| if gt_audio_text is not None: | ||
| gt_audio_text = gt_audio_text.replace(" ", "") | ||
| else: | ||
| pred_text = pred_text |
| gt_audio_text = gt_audio_text.replace(" ", "") | ||
| else: | ||
| pred_text = pred_text | ||
| gt_text = gt_text |
| # Remove Hindi-specific punctuation (danda, double danda) | ||
| input_text = re.sub(r'[।॥॰]', '', input_text) | ||
| # Remove Mandarin-specific punctuation | ||
| input_text = re.sub(r'[,。!?;:""''()【】《》〈〉「」『』、…·~—–\u3000]', '', input_text) |
| # Validate that all evaluation datasets exist | ||
| for dataset_name, info in dataset_meta_info.items(): | ||
| manifest_path = Path(info["manifest_path"]) | ||
| audio_dir = Path(info["audio_dir"]) | ||
| if dataset_base_path: | ||
| # Replace relative paths with absolute paths where appropriate | ||
| if not manifest_path.is_absolute(): | ||
| manifest_path = dataset_base_path / manifest_path | ||
| info["manifest_path"] = str(manifest_path) | ||
| if not audio_dir.is_absolute(): | ||
| audio_dir = dataset_base_path / audio_dir | ||
| info["audio_dir"] = str(audio_dir) | ||
| if not manifest_path.exists(): | ||
| raise ValueError(f"Manifest does not exist for dataset {dataset_name}: {manifest_path}") | ||
| if not audio_dir.exists(): | ||
| raise ValueError(f"Audio directory does not exist for dataset {dataset_name}: {audio_dir}") |
Why did we delete and rewrite this code? The existing implementation looks more readable and has better error handling.
| type=Path, | ||
| default=None, | ||
| help='Optional base path that paths in the "datasets_json_path" file are relative to', | ||
| '--root', |
'datasets_base_path' is a more specific name than 'root'. We could rename it 'dataset_root_path' if we think that is clearer.
| logging.info(f"Doing batched ASR transcription with batch size {asr_batch_size}...") | ||
| # Transcribe predicted audios | ||
| text_processor = get_text_processor(language) |
Please implement the new text processing within the get_text_processor(language) function, as new implementations of TextProcessor. https://github.com/NVIDIA-NeMo/NeMo/blob/main/nemo/collections/tts/parts/utils/tts_dataset_utils.py#L881
| gt_text = gt_texts_processed[ridx] | ||
| if language in ("zh", "zh-CN", "zh-TW"): |
We should not be representing both languages (like "zh") and locales (like "zh-CN") interchangeably. I think referring to locales as 'language' is a misnomer and it is going to be very confusing. The language and locale should either be passed around as separate arguments/variables, or we should refactor the code and configs to only use locales.
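One way to keep the two concepts apart is sketched below, as an illustration only. The `Locale` dataclass and `parse_locale` helper are hypothetical names, not existing NeMo APIs, and the parsing handles only simple `language-REGION` tags.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Locale:
    """A language code plus an optional region, kept as separate fields."""

    language: str
    region: Optional[str] = None


def parse_locale(tag: str) -> Locale:
    """Split a tag such as 'zh-CN' or 'zh' into language and region."""
    parts = tag.replace("_", "-").split("-")
    language = parts[0].lower()
    region = parts[1].upper() if len(parts) > 1 and parts[1] else None
    return Locale(language=language, region=region)
```

With such a type, a check like `language in ("zh", "zh-CN", "zh-TW")` could become `locale.language == "zh"`, and code that really depends on the region can read `locale.region` explicitly.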
Important
The Update branch button must only be pressed on very rare occasions. An outdated branch never blocks the merge of a PR.
Please reach out to the automation team before pressing that button.
What does this PR do ?
Adds multilingual MagpieTTS evaluation support by enabling Japanese Katakana CER and multilingual Parakeet ASR target-language routing.
Collection: [TTS]
Changelog
Add Japanese Katakana CER for MagpieTTS evaluation.
- Computes CER on Katakana readings of Japanese reference and ASR hypothesis.
- Keeps Katakana CER guarded to Japanese datasets only.
- Adds filewise and aggregate Katakana CER outputs.
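The difference between a filewise average and a cumulative CER can be sketched as follows. This is an assumption about the usual definitions (mean of per-utterance rates versus total edits over total reference characters); the PR's exact aggregation code is not shown here, and `aggregate_cer` and its input format are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def aggregate_cer(records: List[Tuple[int, int]]) -> Dict[str, Optional[float]]:
    """Aggregate per-utterance (edit_count, reference_length) pairs.

    filewise_avg weights every utterance equally; cumulative weights every
    reference character equally, so long utterances count for more.
    """
    per_file = [edits / ref_len for edits, ref_len in records if ref_len > 0]
    total_edits = sum(edits for edits, ref_len in records if ref_len > 0)
    total_ref = sum(ref_len for _, ref_len in records if ref_len > 0)
    return {
        "filewise_avg": sum(per_file) / len(per_file) if per_file else None,
        "cumulative": total_edits / total_ref if total_ref else None,
    }
```

For example, one short utterance with 1 edit over 2 characters and one long utterance with 0 edits over 8 characters give a filewise average of 0.25 but a cumulative value of 0.1, which is why reporting both can be informative.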
Usage
TESTSET_ROOT=/path/to/Magpietts_testset
# Add a code snippet demonstrating how to use this
GitHub Actions CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.
Additional Information