Eval/tts multilingual parakeet metrics by quapham · Pull Request #15826 · NVIDIA-NeMo/Speech

quapham · 2026-06-24T04:47:08Z

Important

The Update branch button must only be pressed on very rare occasions.
An outdated branch is never blocking the merge of a PR.
Please reach out to the automation team before pressing that button.

What does this PR do ?

Adds multilingual MagpieTTS evaluation support by enabling Japanese Katakana CER and multilingual Parakeet ASR target-language routing.

Collection: [TTS]

Changelog

- Add Japanese Katakana CER for MagpieTTS evaluation.
  - Computes CER on Katakana readings of Japanese reference and ASR hypothesis.
  - Keeps Katakana CER guarded to Japanese datasets only.
  - Adds filewise and aggregate Katakana CER outputs.
- Add multilingual Parakeet prompt ASR support for MagpieTTS evaluation.
  - Supports local .nemo ASR checkpoints for multilingual evaluation.
  - Maps eval language metadata to Parakeet target_lang.
  - Keeps Whisper / existing ASR behavior as fallback where applicable.

Usage

TESTSET_ROOT=/path/to/Magpietts_testset

python examples/tts/magpietts_inference.py \
  --hparams_files /path/to/hparams.yaml \
  --checkpoint_files /path/to/checkpoint.ckpt \
  --codecmodel_path /path/to/codec_model.nemo \
  --datasets_json_path "$TESTSET_ROOT/evalset.json" \
  --root "$TESTSET_ROOT" \
  --datasets ja_JP_jvs_jsut \
  --out_dir /path/to/eval_outputs/ja_JP_jvs_jsut \
  --run_evaluation \
  --use_local_transformer \
  --asr_model_name /path/to/multilingual_parakeet.nemo


GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

- [x] Make sure you read and followed Contributor guidelines
- [ ] Did you write any new necessary tests?
- [ ] Did you add or update any necessary documentation?
- [ ] Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
  - [ ] Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

- [x] New Feature
- [ ] Bugfix
- [ ] Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

Related to # (issue)

Adds a reading-based CER for Japanese, computed on the Katakana reading (via pyopenjtalk g2p) of both reference and ASR hypothesis. Robust to kanji/kana spelling differences that inflate raw character CER.
- text_to_katakana(): lazy-imports pyopenjtalk, returns '' if unavailable (graceful no-op for non-ja or environments without the dep).
- katakana_cer / gt_katakana / pred_katakana computed only when language=='ja', saved per-utterance in filewise metrics.
- katakana_cer_filewise_avg + katakana_cer_cumulative aggregated globally (only emitted for ja datasets), added to the results CSV header/rows.

Signed-off-by: quanpham <youngkwan199@gmail.com>
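
The reading-based metric described in this commit could be sketched roughly as below. This is an illustration under stated assumptions, not the PR's code: `edit_distance`, `char_error_rate`, and `aggregate_cer` are hypothetical helpers written here so the block is self-contained, returning `None` for non-Japanese input is an assumed convention, and "cumulative" is assumed to mean total edits divided by total reference characters. Only `pyopenjtalk.g2p(text, kana=True)` is an existing library call.

```python
def text_to_katakana(text: str) -> str:
    """Return the Katakana reading of `text`, or '' if pyopenjtalk is missing."""
    try:
        import pyopenjtalk  # lazy import: optional dependency
    except ImportError:
        return ""
    return pyopenjtalk.g2p(text, kana=True)


def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (hypothetical helper)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def char_error_rate(ref: str, hyp: str) -> float:
    """Edits divided by reference length; an empty reference is handled explicitly."""
    if not ref:
        return 0.0 if not hyp else 1.0
    return edit_distance(ref, hyp) / len(ref)


def katakana_cer(gt_text: str, pred_text: str, language: str):
    """Katakana CER for Japanese only; None otherwise (assumed convention)."""
    if language != "ja":
        return None
    gt_kana = text_to_katakana(gt_text)
    pred_kana = text_to_katakana(pred_text)
    if not gt_kana:  # pyopenjtalk unavailable or empty reading
        return None
    return char_error_rate(gt_kana, pred_kana)


def aggregate_cer(pairs):
    """Given (reference, hypothesis) pairs, return (filewise_avg, cumulative)."""
    if not pairs:
        return None, None
    per_file = [char_error_rate(r, h) for r, h in pairs]
    total_edits = sum(edit_distance(r, h) for r, h in pairs)
    total_chars = sum(len(r) for r, _ in pairs)
    filewise_avg = sum(per_file) / len(per_file)
    cumulative = total_edits / total_chars if total_chars else 0.0
    return filewise_avg, cumulative
```

The two aggregates can differ noticeably: the filewise average weights every utterance equally, while the cumulative figure weights longer utterances more heavily.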

Add target language mapping for multilingual Parakeet prompt ASR checkpoints during MagpieTTS evaluation. Local .nemo ASR models can now be used for non-English evalsets, while Whisper remains the fallback when no NeMo ASR model is provided. Japanese Katakana CER remains guarded to Japanese datasets only. Signed-off-by: quanpham <youngkwan199@gmail.com>
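
The routing described in this commit might look something like the following sketch. The function names, the mapping table, and its entries are all hypothetical; the languages a given Parakeet checkpoint actually accepts as `target_lang` depend on that checkpoint and are not specified in this PR description.

```python
from typing import Optional

# Hypothetical mapping from evalset language codes to Parakeet target_lang values.
LANGUAGE_TO_TARGET_LANG = {"en": "en", "ja": "ja", "zh": "zh", "hi": "hi"}


def choose_asr_backend(asr_model_name: Optional[str]) -> str:
    """Use NeMo ASR when a model is provided; otherwise fall back to Whisper."""
    return "nemo" if asr_model_name else "whisper"


def resolve_target_lang(language: str) -> str:
    """Translate eval language metadata into a target_lang, failing loudly if unknown."""
    try:
        return LANGUAGE_TO_TARGET_LANG[language]
    except KeyError:
        raise ValueError(f"No Parakeet target_lang mapping for language {language!r}") from None
```

Raising on an unmapped language, rather than silently defaulting to English, is a design choice in this sketch: a wrong `target_lang` would otherwise show up only as an unexplained rise in CER.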

Add --root to MagpieTTS inference so evalset manifest_path and audio_dir entries can remain relative. Also use the evalset language field for evaluation, preserving whisper_language as a legacy fallback, which is required for multilingual Parakeet target_lang selection. Signed-off-by: quanpham <youngkwan199@gmail.com>
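
As a rough illustration of the two behaviors in this commit, the sketch below resolves relative evalset paths against a root and picks the evaluation language with `whisper_language` as a legacy fallback. The helper names and the `"en"` default are assumptions made for the example; only the `manifest_path`, `audio_dir`, `language`, and `whisper_language` keys come from the PR text.

```python
from pathlib import Path
from typing import Optional


def resolve_dataset_paths(info: dict, root: Optional[Path]) -> dict:
    """Return a copy of `info` with relative paths joined onto `root`."""
    resolved = dict(info)
    for key in ("manifest_path", "audio_dir"):
        path = Path(info[key])
        if root is not None and not path.is_absolute():
            path = root / path
        resolved[key] = str(path)
    return resolved


def get_eval_language(info: dict, default: str = "en") -> str:
    """Prefer `language`, then legacy `whisper_language`, then an assumed default."""
    return info.get("language") or info.get("whisper_language") or default
```

Returning a copy instead of mutating the entry in place keeps the loaded evalset JSON unchanged, which makes the resolution step easier to test in isolation.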

copy-pr-bot · 2026-06-24T04:47:12Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

+            if gt_audio_text is not None:
+                gt_audio_text = gt_audio_text.replace(" ", "")
+        else:
+            pred_text = pred_text


+                gt_audio_text = gt_audio_text.replace(" ", "")
+        else:
+            pred_text = pred_text
+            gt_text = gt_text


+    # Remove Hindi-specific punctuation (danda, double danda)
+    input_text = re.sub(r'[।॥॰]', '', input_text)
+    # Remove Mandarin-specific punctuation
+    input_text = re.sub(r'[，。！？；：""''（）【】《》〈〉「」『』、…·～—–\u3000]', '', input_text)


rlangman · 2026-06-24T21:40:54Z

-    # Validate that all evaluation datasets exist
-    for dataset_name, info in dataset_meta_info.items():
-        manifest_path = Path(info["manifest_path"])
-        audio_dir = Path(info["audio_dir"])
-
-        if dataset_base_path:
-            # Replace relative paths with absolute paths where appropriate
-            if not manifest_path.is_absolute():
-                manifest_path = dataset_base_path / manifest_path
-                info["manifest_path"] = str(manifest_path)
-
-            if not audio_dir.is_absolute():
-                audio_dir = dataset_base_path / audio_dir
-                info["audio_dir"] = str(audio_dir)
-
-        if not manifest_path.exists():
-            raise ValueError(f"Manifest does not exist for dataset {dataset_name}: {manifest_path}")
-
-        if not audio_dir.exists():
-            raise ValueError(f"Audio directory does not exist for dataset {dataset_name}: {audio_dir}")


Why did we delete and rewrite this code? The existing implementation looks more readable and has better error handling.

rlangman · 2026-06-24T21:42:29Z

-        type=Path,
-        default=None,
-        help='Optional base path that paths in the "datasets_json_path" file are relative to',
+        '--root',


'datasets_base_path' is a more specific name than 'root'. We could rename it 'dataset_root_path' if we think that is clearer.

rlangman · 2026-06-24T21:48:42Z

    logging.info(f"Doing batched ASR transcription with batch size {asr_batch_size}...")
-
    # Transcribe predicted audios
-    text_processor = get_text_processor(language)


Please implement the new text processing within the get_text_processor(language) function, as new implementations of TextProcessor. https://github.com/NVIDIA-NeMo/NeMo/blob/main/nemo/collections/tts/parts/utils/tts_dataset_utils.py#L881
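
One way to read this suggestion is sketched below. The class hierarchy here is a hypothetical stand-in and does not reproduce NeMo's actual interface in `tts_dataset_utils.py`; the punctuation sets are adapted (and shortened) from the diff above, and stripping spaces for Mandarin is an assumption based on the `replace(" ", "")` calls in the diff.

```python
import re
from abc import ABC, abstractmethod


class TextProcessor(ABC):
    """Hypothetical base class; NeMo's real interface may differ."""

    @abstractmethod
    def process(self, text: str) -> str:
        ...


class DefaultTextProcessor(TextProcessor):
    def process(self, text: str) -> str:
        return text.strip()


class HindiTextProcessor(TextProcessor):
    # Danda, double danda, and the Devanagari abbreviation sign.
    _PUNCT = re.compile(r"[।॥॰]")

    def process(self, text: str) -> str:
        return self._PUNCT.sub("", text).strip()


class MandarinTextProcessor(TextProcessor):
    # Subset of the full-width punctuation listed in the diff.
    _PUNCT = re.compile(r"[，。！？；：（）【】《》「」『』、…\u3000]")

    def process(self, text: str) -> str:
        # Assumed: spaces carry no meaning for character-level scoring.
        return self._PUNCT.sub("", text).replace(" ", "")


def get_text_processor(language: str) -> TextProcessor:
    """Select a processor by language code, defaulting to a no-op strip."""
    processors = {"hi": HindiTextProcessor, "zh": MandarinTextProcessor}
    return processors.get(language, DefaultTextProcessor)()
```

With this shape, the transcription loop calls `get_text_processor(language).process(...)` once and contains no per-language branches of its own.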

rlangman · 2026-06-24T21:55:20Z


        gt_text = gt_texts_processed[ridx]

+        if language in ("zh", "zh-CN", "zh-TW"):


We should not be representing both languages (like "zh") and locales (like "zh-CN") interchangeably. I think referring to locales as 'language' is a misnomer and it is going to be very confusing. The language and locale should either be passed around as separate arguments/variables, or we should refactor the code and configs to only use locales.
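
The first option mentioned here, passing language and locale as separate values, could be sketched with a small value type. `Locale` and its `parse` method are hypothetical names introduced for this example and are not part of the PR.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Locale:
    """A language code plus an optional region, kept as separate fields."""

    language: str
    region: Optional[str] = None

    @classmethod
    def parse(cls, tag: str) -> "Locale":
        """Parse tags such as 'zh', 'zh-CN', or 'ja_JP'."""
        parts = tag.replace("_", "-").split("-")
        language = parts[0].lower()
        region = parts[1].upper() if len(parts) > 1 and parts[1] else None
        return cls(language, region)

    def __str__(self) -> str:
        return f"{self.language}-{self.region}" if self.region else self.language
```

A check such as `language in ("zh", "zh-CN", "zh-TW")` then reduces to `Locale.parse(tag).language == "zh"`, so the language test no longer needs to enumerate regions.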

quapham added 3 commits June 24, 2026 03:50

github-actions Bot added the TTS label Jun 24, 2026

github-advanced-security AI found potential problems Jun 24, 2026

rlangman reviewed Jun 24, 2026

quapham wants to merge 3 commits into NVIDIA-NeMo:main from quapham:eval/tts-multilingual-parakeet-metrics

3 participants

