🎬 NemoScribe

AI Subtitle Generator — Convert Video to SRT with NVIDIA NeMo Speech-to-Text

Fast, local, GPU-accelerated automatic transcription with word-level timestamps.
A free, offline alternative to cloud captioning services.

English | 繁體中文

Quick Start • Installation • Configuration • Models • Tuning Guide

NemoScribe is a command-line speech-to-text subtitle generator that converts video files (MP4, MKV, AVI, MOV, WebM) into accurately timed SRT subtitles — entirely on your own machine. Built on the NVIDIA NeMo ASR framework with Parakeet-TDT as the default model, it delivers state-of-the-art English automatic speech recognition (ASR) with word-level timestamps, and handles long audio (up to 3 hours) through chunked inference.

video.mp4 ─► FFmpeg ─► VAD speech detection ─► NeMo ASR (Parakeet-TDT) ─► ITN / LLM correction ─► video.srt

💡 Why NemoScribe?


🏆 State-of-the-art accuracy	Parakeet-TDT-0.6B-v2 is a top-ranked English model on the HuggingFace Open ASR Leaderboard, ahead of much larger models including Whisper-large-v3
⚡ Blazing fast	Up to ~240× realtime on a consumer GPU — transcribe a full TV episode in well under a minute
🔒 100% local & private	Your audio never leaves your machine. No cloud upload, no subscription, no per-minute fees
🎯 Accurate timestamps	Word-level and segment-level timestamps straight from the model — no forced alignment hacks
🎭 Tuned for real content	VAD presets and punctuation-based segmentation optimized for movies, TV drama, and dialogue-dense audio
🤖 Optional AI cleanup	LLM post-processing (OpenAI / Anthropic) fixes character names, proper nouns, and homophones

Use cases: subtitling movies and TV shows, transcribing lectures and tutorials, captioning YouTube videos, interview and podcast transcription, accessibility captions (SDH/CC).

✨ Features

Accurate Timestamps: Word-level and segment-level timestamps from NeMo ASR models
Long Audio Support: Process videos up to 3 hours with automatic chunking
Voice Activity Detection (VAD): Filter non-speech content to reduce hallucinations
Smart Segmentation: Split audio at silence boundaries, not mid-speech
Inverse Text Normalization (ITN): Convert spoken forms to written forms ("twenty five" → "25")
LLM Post-processing: Fix character names and transcription errors using AI (OpenAI/Anthropic)
CUDA Optimized: CUDA graphs enabled by default for faster inference
Batch Processing: Process entire directories of videos

📋 Requirements

Requirement	Details
OS	Windows 10/11, Linux
Python	3.10+ (3.12 recommended, avoid 3.13)
Package Manager	uv (recommended)
CUDA Toolkit	Default cu130 (13.0). PyTorch also supports 12.6/12.8.
FFmpeg	Required for audio extraction
Hardware	NVIDIA GPU with CUDA (recommended)

FFmpeg Installation

Windows: Download a build from gyan.dev, extract it, and add its bin folder to PATH
Linux: sudo apt install ffmpeg

📦 Installation

1. Install uv

# Windows
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

# Linux/macOS
curl -LsSf https://astral.sh/uv/install.sh | sh

2. Clone the Repository

git clone https://github.com/charles1018/NemoScribe.git
cd NemoScribe

3. Install Dependencies

uv sync --python 3.12

The lockfile currently resolves NeMo ASR to nemo-toolkit[asr] 2.7.3 with PyTorch 2.11 CUDA 13.0 wheels. The version constraints stay on the NeMo 2.7.x line (>=2.7.3,<2.8) for patch-level compatibility.

4. Configure CUDA (Strongly Recommended)

Without a CUDA wheel index configured, uv sync may install CPU-only PyTorch, and GPU acceleration is strongly recommended for reasonable transcription speed. The project is pre-configured to use the CUDA 13.0 index, so GPU users only need to run uv sync.

Note: PyTorch officially supports CUDA 12.6, 12.8, and 13.0. See PyTorch Get Started for details.

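If you need a different CUDA version, change the index URL in pyproject.toml. The snippets for CUDA 12.8 and 12.6 show only the index entry that changes; the [tool.uv.sources] entries listed under CUDA 13.0 appear to stay the same.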

CUDA 13.0 (Default, Recommended):

[[tool.uv.index]]
name = "pytorch"
url = "https://download.pytorch.org/whl/cu130"
explicit = true

[tool.uv.sources]
torch = { index = "pytorch" }
torchvision = { index = "pytorch" }
torchaudio = { index = "pytorch" }

CUDA 12.8:

[[tool.uv.index]]
name = "pytorch"
url = "https://download.pytorch.org/whl/cu128"
explicit = true

CUDA 12.6:

[[tool.uv.index]]
name = "pytorch"
url = "https://download.pytorch.org/whl/cu126"
explicit = true

Then re-sync:

uv sync

Optional: LLM Post-processing

To enable AI-powered subtitle correction (fixes character names, proper nouns):

uv sync --extra llm

Then create a .env file with your API key:

cp .env.example .env
# Edit .env: OPENAI_API_KEY=sk-... or ANTHROPIC_API_KEY=sk-ant-...

5. Verify Setup

uv run python scripts/check_cuda.py
# Expected output: CUDA available: True

🚀 Quick Start

# Basic usage
uv run nemoscribe video_path="video.mp4"

# With VAD (useful for noisy audio, but not always best)
uv run nemoscribe video_path="video.mp4" vad.enabled=true

# Generate both VAD and no-VAD candidates (recommended when unsure)
uv run nemoscribe video_path="video.mp4" ab_test.vad=true

# Batch processing
uv run nemoscribe video_dir=/path/to/videos/ output_dir=/path/to/subtitles/

📖 Advanced Tuning: For optimal parameter configurations for different scenarios (drama, news, technical tutorials), see TUNING_GUIDE.md.

🎯 Usage Examples

Subtitle Formatting

uv run nemoscribe video_path=video.mp4 \
  subtitle.max_chars_per_line=32 \
  subtitle.max_segment_duration=3.0 \
  subtitle.word_gap_threshold=0.5

# Disable word gap splitting
uv run nemoscribe video_path=video.mp4 subtitle.word_gap_threshold=null

Device and Precision

# Force CPU
uv run nemoscribe video_path=video.mp4 cuda=-1

# Specific GPU
uv run nemoscribe video_path=video.mp4 cuda=0

# Force float32 precision
uv run nemoscribe video_path=video.mp4 compute_dtype=float32
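
The cuda option follows the convention described in the Configuration Reference (unset = auto, negative = CPU). A minimal sketch of that rule is shown below. The function name resolve_device and the cuda_available argument are hypothetical; in a real run the availability flag would come from torch.cuda.is_available(), and picking device 0 for "auto" is an assumption.

```python
from typing import Optional


def resolve_device(cuda: Optional[int], cuda_available: bool) -> str:
    """Map the `cuda` option to a device string (illustrative only)."""
    # A negative ID forces CPU; so does a machine without CUDA.
    if not cuda_available or (cuda is not None and cuda < 0):
        return "cpu"
    # None means "auto": assume the first GPU.
    return f"cuda:{0 if cuda is None else cuda}"


print(resolve_device(None, cuda_available=True))
print(resolve_device(-1, cuda_available=True))
```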

VAD Configuration

# Enable VAD with smart segmentation
uv run nemoscribe video_path=video.mp4 \
  vad.enabled=true \
  audio.smart_segmentation=true

# Adjust VAD sensitivity (optimized for drama/movie)
uv run nemoscribe video_path=video.mp4 \
  compute_dtype=float32 \
  vad.enabled=true \
  vad.onset=0.2 \
  vad.offset=0.1 \
  vad.min_duration_off=0.05 \
  vad.pad_onset=0.1 \
  vad.pad_offset=0.1 \
  decoding.rnnt_fused_batch_size=0 \
  decoding.segment_gap_threshold=20

In a validation run on Chicago Fire S12E01 (2026-05-05, NeMo 2.7.3), compute_dtype=float32 and decoding.rnnt_fused_batch_size=0 remained the stable CUDA settings on an RTX 3070 Laptop GPU. In that sample, no-VAD output was slightly more complete than VAD output, so use ab_test.vad=true to generate both candidates and pick the better one without manually running two commands.

VAD A/B Test

uv run nemoscribe video_path=video.mp4 \
  output_path=tmp_outputs/video.srt \
  compute_dtype=float32 \
  decoding.rnnt_fused_batch_size=0 \
  ab_test.vad=true

This writes both video.vad.srt and video.no_vad.srt using the same ASR settings. Use this when you want two candidate subtitles without manually running the command twice. VAD can reduce hallucinations on noisy audio, while no-VAD may preserve more dialogue on clean drama/movie audio.
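
The two output names can be derived from output_path with pathlib. This is a sketch of the naming pattern described above, not NemoScribe's actual helper; ab_test_paths is a hypothetical name.

```python
from pathlib import Path
from typing import Tuple


def ab_test_paths(output_path: str) -> Tuple[Path, Path]:
    """Return (VAD path, no-VAD path) next to the requested output file."""
    p = Path(output_path)
    # Insert the variant tag between the stem and the .srt suffix.
    vad = p.with_name(f"{p.stem}.vad{p.suffix}")
    no_vad = p.with_name(f"{p.stem}.no_vad{p.suffix}")
    return vad, no_vad


vad_path, no_vad_path = ab_test_paths("tmp_outputs/video.srt")
print(vad_path.name, no_vad_path.name)
```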

ITN (Inverse Text Normalization)

# Enable ITN (requires nemo_text_processing)
uv run nemoscribe video_path=video.mp4 postprocessing.enable_itn=true

# For models with auto-capitalization
uv run nemoscribe video_path=video.mp4 \
  postprocessing.enable_itn=true \
  postprocessing.itn_input_case=cased

# Install ITN dependency
uv add nemo_text_processing

ITN Examples:

"twenty five dollars" → "$25"
"january first twenty twenty five" → "January 1, 2025"
"three point one four" → "3.14"
"the meeting is at ten thirty am" → "the meeting is at 10:30 a.m."

LLM Post-processing

Fix transcription errors (character names, proper nouns) using an LLM:

# Using OpenAI GPT-4o-mini (recommended: best cost/quality ratio, ~$0.06/episode)
uv run nemoscribe video_path=video.mp4 \
  vad.enabled=true \
  llm_postprocess.enabled=true \
  llm_postprocess.provider=openai \
  llm_postprocess.model=gpt-4o-mini

# Using Anthropic Claude 3.5 Sonnet (higher quality, ~$0.24/episode)
uv run nemoscribe video_path=video.mp4 \
  vad.enabled=true \
  llm_postprocess.enabled=true \
  llm_postprocess.provider=anthropic \
  llm_postprocess.model=claude-3-5-sonnet-20241022

What it fixes:

Character names: "Alias of us" → "Kylie Estevez", "Herman" → "Herrmann"
Proper nouns and technical terms
Homophones: their/there, to/too

Known limitations:

May over-correct ~10% of segments (mostly minor changes)
Semantic errors remain challenging
Requires API key and internet connection
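
One way to guard against over-correction is to compare each corrected line with its original and reject the whole batch when a line drifts too far. The sketch below illustrates that idea with difflib from the standard library. It is an assumption about how such a check could work, not NemoScribe's implementation; validate_batch and the 0.6 threshold are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List


def validate_batch(original: List[str], corrected: List[str],
                   min_similarity: float = 0.6) -> List[str]:
    """Return the corrected lines if they pass validation, else the originals."""
    # The LLM must return exactly one line per input segment.
    if len(corrected) != len(original):
        return original
    for before, after in zip(original, corrected):
        ratio = SequenceMatcher(None, before.lower(), after.lower()).ratio()
        if ratio < min_similarity:
            # One suspicious line rejects the whole batch.
            return original
    return corrected


lines = ["Herman, grab the hose.", "We are on our way."]
fixed = ["Herrmann, grab the hose.", "We are on our way."]
print(validate_batch(lines, fixed))
```

A strict threshold would also reject large but legitimate fixes, so a real check needs tuning against sample episodes.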

Performance Measurement

uv run nemoscribe video_path=video.mp4 performance.calculate_rtfx=true
# Example output: RTFx=15.2x realtime (transcribed 600s in 39.5s)
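
RTFx is the audio duration divided by the processing time. The sketch below reproduces the arithmetic from the example output; rtfx is a hypothetical helper name.

```python
def rtfx(audio_seconds: float, processing_seconds: float) -> float:
    """Seconds of audio transcribed per second of wall-clock time."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


# Figures from the example output: 600 s of audio in 39.5 s.
print(f"RTFx={rtfx(600, 39.5):.1f}x realtime")
```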

🔧 Configuration Reference

Main Options

Option	Default	Description
`video_path`	-	Path to input video file
`video_dir`	-	Path to directory containing videos
`output_path`	auto	Output SRT file path
`output_dir`	auto	Output directory for batch processing
`pretrained_name`	`nvidia/parakeet-tdt-0.6b-v2`	Pretrained ASR model
`model_path`	-	Path to local .nemo checkpoint
`cuda`	auto	CUDA device ID (None=auto, negative=CPU)
`compute_dtype`	auto	`float32`, `bfloat16`, or `float16`
`overwrite`	true	Overwrite existing SRT files

Subtitle Formatting (`subtitle.*`)

Option	Default	Description
`max_chars_per_line`	42	Maximum characters per subtitle line
`max_segment_duration`	5.0	Maximum seconds per subtitle segment
`word_gap_threshold`	0.8	New segment if word gap >= this (seconds)

Audio Processing (`audio.*`)

Option	Default	Description
`sample_rate`	16000	Audio sample rate for ASR
`max_chunk_duration`	300.0	Max chunk size (5 min, safe for 8GB GPU)
`chunk_overlap`	2.0	Overlap between chunks (seconds)
`smart_segmentation`	true	Use VAD-based optimal split points
`min_silence_for_split`	0.3	Minimum silence duration for a split point (seconds)
`prefer_longer_silence`	true	Prefer splitting at longer silences
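
To illustrate how min_silence_for_split and prefer_longer_silence could interact, the sketch below picks a split point from a list of detected silences. It is a simplified assumption, not the project's algorithm: pick_split_point and the search window are hypothetical.

```python
from typing import List, Tuple


def pick_split_point(silences: List[Tuple[float, float]], target: float,
                     min_silence: float = 0.3, prefer_longer: bool = True,
                     window: float = 30.0) -> float:
    """Choose a split time at or before `target` (seconds)."""
    # Keep silences that are long enough and centred inside the search window.
    candidates = [
        (s, e) for s, e in silences
        if e - s >= min_silence and target - window <= (s + e) / 2 <= target
    ]
    if not candidates:
        return target  # no usable silence: fall back to a hard cut
    if prefer_longer:
        best = max(candidates, key=lambda se: se[1] - se[0])
    else:
        best = max(candidates, key=lambda se: se[0])  # latest silence
    return (best[0] + best[1]) / 2  # cut in the middle of the silence


print(pick_split_point([(285.0, 287.0), (295.0, 295.5)], target=300.0))
```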

VAD Configuration (`vad.*`)

Option	Default	Description
`enabled`	false	Enable Voice Activity Detection
`model`	`vad_multilingual_frame_marblenet`	VAD model name
`onset`	0.3	Speech detection onset threshold (0-1)
`offset`	0.3	Speech detection offset threshold (0-1)
`pad_onset`	0.2	Padding before speech segments (seconds)
`pad_offset`	0.2	Padding after speech segments (seconds)
`min_duration_on`	0.2	Minimum speech segment duration (seconds)
`min_duration_off`	0.2	Non-speech gaps shorter than this are merged into the surrounding speech (seconds)
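
The sketch below shows how these thresholds typically turn per-frame speech probabilities into speech segments. It is a simplified illustration under stated assumptions, not the NeMo post-processing code: probs_to_segments, the 20 ms frame length, and the order of the filtering steps are hypothetical.

```python
from typing import List, Sequence, Tuple


def probs_to_segments(probs: Sequence[float], frame_dur: float = 0.02,
                      onset: float = 0.3, offset: float = 0.3,
                      pad_onset: float = 0.2, pad_offset: float = 0.2,
                      min_duration_on: float = 0.2,
                      min_duration_off: float = 0.2) -> List[Tuple[float, float]]:
    segments, start = [], None
    for i, p in enumerate(probs):
        t = i * frame_dur
        if start is None and p >= onset:       # speech begins
            start = t
        elif start is not None and p < offset:  # speech ends
            segments.append([start, t])
            start = None
    if start is not None:                       # audio ended during speech
        segments.append([start, len(probs) * frame_dur])

    # Drop speech bursts shorter than min_duration_on.
    segments = [s for s in segments if s[1] - s[0] >= min_duration_on]
    # Pad both sides so word edges are not clipped.
    segments = [[max(0.0, s - pad_onset), e + pad_offset] for s, e in segments]
    # Merge segments separated by a gap shorter than min_duration_off.
    merged: List[List[float]] = []
    for s, e in segments:
        if merged and s - merged[-1][1] < min_duration_off:
            merged[-1][1] = max(merged[-1][1], e)
        else:
            merged.append([s, e])
    return [(round(s, 3), round(e, 3)) for s, e in merged]


frames = [0.0] * 10 + [0.9] * 20 + [0.0] * 50 + [0.9] * 20 + [0.0] * 10
print(probs_to_segments(frames))
```

Lower onset and offset values, as in the drama/movie example, make the detector keep quieter speech at the cost of more false positives.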

Decoding Optimization (`decoding.*`)

Option	Default	Description
`rnnt_fused_batch_size`	-1	CUDA graphs: -1=enabled, 0=disabled
`rnnt_timestamp_type`	"all"	Timestamp type: "char", "word", "segment", "all"
`ctc_timestamp_type`	"all"	CTC timestamp type
`segment_separators`	`[".", "?", "!"]`	Split segments at punctuation marks
`segment_gap_threshold`	None	Positive integer in frames; splits on large inter-word gaps and remains compatible with `segment_separators`

Internally, NemoScribe maps segment_separators to whichever NeMo decoding config field is available (segment_separators or the historical segment_seperators spelling).

Post-processing (`postprocessing.*`)

Option	Default	Description
`enable_itn`	false	Enable Inverse Text Normalization
`itn_lang`	"en"	Language for ITN
`itn_input_case`	"lower_cased"	Input case: "lower_cased" or "cased"

A/B Test (`ab_test.*`)

Option	Default	Description
`vad`	false	Generate both `.vad.srt` and `.no_vad.srt` candidates

LLM Post-processing (`llm_postprocess.*`)

Option	Default	Description
`enabled`	false	Enable LLM-based subtitle correction
`provider`	"anthropic"	LLM provider: "anthropic" or "openai"
`model`	"claude-3-5-sonnet-20241022"	Model name (provider-specific)
`api_key`	None	API key (None = read from environment)
`batch_size`	20	Segments per LLM request
`max_retries`	3	Max validation/retry attempts per batch
`timeout`	30	API request timeout (seconds)

Performance (`performance.*`)

Option	Default	Description
`calculate_rtfx`	false	Calculate Real-Time Factor (RTFx)
`warmup_steps`	1	Warmup iterations before timing

Logging (`logging.*`)

Option	Default	Description
`verbose`	false	Show all NeMo internal logs (useful for debugging)
`suppress_repetitive_logs`	true	Suppress repetitive NeMo logs during chunk processing

🤖 Recommended Models

Model	Speed	Accuracy	Features
`nvidia/parakeet-tdt-0.6b-v2`	Fast	Best (EN)	Default. 1.69% WER, auto-punctuation
`nvidia/parakeet-tdt-0.6b-v3`	Fast	Excellent	Multilingual (25 languages), auto language detection
`nvidia/parakeet-tdt-1.1b`	Medium	Best	Highest accuracy, no auto-punctuation
`nvidia/parakeet-ctc-1.1b`	Fastest	Good	Fastest inference
`nvidia/canary-1b-v2`	Medium	Good	Multilingual, supports translation

Model Selection Guide

English subtitles: parakeet-tdt-0.6b-v2 (default, best out-of-box experience)
Multilingual: parakeet-tdt-0.6b-v3 (25 languages, auto-detection)
Highest accuracy: parakeet-tdt-1.1b (lowest WER, but no punctuation)
Fastest speed: parakeet-ctc-1.1b
Translation: canary-1b-v2 (25 languages, transcription + translation)

Note: parakeet-tdt-1.1b produces lowercase output without punctuation. The script automatically uses word-level timestamps to generate fine-grained subtitles.

Known model limitation: parakeet-tdt-0.6b-v2 may drop repeated words and false starts (disfluencies) — a known regression vs the 1.1b model (discussion #8). If verbatim disfluencies matter (e.g. stuttered or repeated lines), consider parakeet-tdt-1.1b.

Tip: You can try the default model without a GPU via the free hosted API on build.nvidia.com.

🎵 Long Audio Support

The script uses audio chunking to handle long videos (up to 3 hours):

Automatically splits long audio into smaller chunks (default: 5 minutes)
Chunks overlap (default: 2 seconds) to ensure accurate boundaries
Merges subtitles from all chunks, handling duplicates automatically
Long-audio attention tweaks are gated by audio.long_audio_threshold. The default leaves them disabled; lower the threshold to enable them.
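
The chunking scheme above can be sketched as fixed-length windows that overlap by chunk_overlap seconds. This is a minimal illustration that ignores smart segmentation; chunk_boundaries is a hypothetical name, and treating max_chunk_duration=0 as "no chunking" follows the GPU memory table.

```python
from typing import List, Tuple


def chunk_boundaries(total: float, max_chunk: float = 300.0,
                     overlap: float = 2.0) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds covering `total` seconds of audio."""
    # 0 disables chunking; short audio needs only one chunk.
    if max_chunk <= 0 or total <= max_chunk:
        return [(0.0, total)]
    if overlap >= max_chunk:
        raise ValueError("overlap must be smaller than max_chunk")
    chunks = []
    start = 0.0
    step = max_chunk - overlap  # each chunk starts `overlap` seconds early
    while start < total:
        end = min(start + max_chunk, total)
        chunks.append((start, end))
        if end >= total:
            break
        start += step
    return chunks


print(chunk_boundaries(700.0))
```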

GPU Memory Recommendations:

GPU VRAM	`max_chunk_duration`
8GB	120–300 (default 300)
16GB	600
24GB+	0 (no chunking)

Note: On 8GB GPUs with compute_dtype=float32, dialogue-dense content can OOM at the default 300s because smart segmentation may produce longer continuous-speech chunks (verified on Yellowstone S03E01 with an RTX 3070 8GB). If you hit CUDA out of memory, set audio.max_chunk_duration=120.
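
The table and note above can be condensed into a small lookup. The function below is only a sketch of those recommendations; the name recommended_max_chunk_duration is hypothetical, and applying the 8GB row to smaller GPUs is an assumption.

```python
def recommended_max_chunk_duration(vram_gb: float,
                                   float32_dense_dialogue: bool = False) -> float:
    """Suggest audio.max_chunk_duration (seconds) from GPU memory."""
    if vram_gb >= 24:
        return 0.0    # 0 = no chunking
    if vram_gb >= 16:
        return 600.0
    # 8GB class: drop to 120 s for float32 on dialogue-dense content.
    return 120.0 if float32_dense_dialogue else 300.0


print(recommended_max_chunk_duration(8, float32_dense_dialogue=True))
```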

🕐 Timestamp Priority

The script obtains timestamps in this priority order:

Segment-level: Direct segment timestamps from model (most accurate)
Word-level: Word timestamps grouped by line length/duration/gaps
Fallback: Estimated by speech rate (~150 words/min) when no timestamps available

Auto Fallback: If average segment length exceeds max_segment_duration * 2 (e.g., models without punctuation), the script automatically switches to word-level timestamps.
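
The word-level path and the auto-fallback check can be sketched as follows. This is an illustration under stated assumptions rather than the project's code: group_words and needs_word_level are hypothetical names, the word dicts follow the 'start'/'end'/'word' layout of the timestamp structure in the References section, and max_chars is applied to the whole segment for simplicity.

```python
def _close(words):
    """Collapse a run of words into one segment dict."""
    return {"start": words[0]["start"], "end": words[-1]["end"],
            "segment": " ".join(w["word"] for w in words)}


def group_words(words, max_chars=42, max_duration=5.0, gap_threshold=0.8):
    """Group word timestamps by length, duration, and inter-word gaps."""
    segments, current = [], []
    for w in words:
        if current:
            text_len = len(" ".join(x["word"] for x in current))
            too_long = text_len + 1 + len(w["word"]) > max_chars
            too_slow = w["end"] - current[0]["start"] > max_duration
            big_gap = (gap_threshold is not None
                       and w["start"] - current[-1]["end"] >= gap_threshold)
            if too_long or too_slow or big_gap:
                segments.append(_close(current))
                current = []
        current.append(w)
    if current:
        segments.append(_close(current))
    return segments


def needs_word_level(segments, max_duration=5.0):
    """Auto-fallback test: average segment longer than 2x the limit."""
    if not segments:
        return False
    avg = sum(s["end"] - s["start"] for s in segments) / len(segments)
    return avg > max_duration * 2


words = [
    {"start": 0.0, "end": 0.4, "word": "Welcome"},
    {"start": 0.5, "end": 0.7, "word": "back."},
    {"start": 2.0, "end": 2.3, "word": "Ready?"},
]
print(group_words(words))
```

Passing gap_threshold=None mirrors subtitle.word_gap_threshold=null and disables gap-based splitting.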

📁 Project Structure

nemoscribe/
├── __init__.py        # Package entry, version info
├── __main__.py        # python -m nemoscribe support
├── cli.py             # CLI parsing and entry point
├── config.py          # All dataclass configurations
├── audio.py           # Audio processing with ffmpeg
├── vad.py             # Voice Activity Detection
├── transcriber.py     # ASR model and transcription
├── srt.py             # SRT formatting and output
├── postprocess.py     # ITN, segment merging
├── llm_postprocess.py # LLM-based subtitle correction
└── log_utils.py       # Log filtering

📹 Supported Video Formats

.mp4, .mkv, .avi, .mov, .webm, .m4v
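
For batch processing, a directory scan can filter on these extensions. The sketch below is illustrative only; is_supported_video, find_videos, and default_srt_path are hypothetical names, and the non-recursive scan and case-insensitive match are assumptions.

```python
from pathlib import Path
from typing import List, Optional

VIDEO_EXTENSIONS = {".mp4", ".mkv", ".avi", ".mov", ".webm", ".m4v"}


def is_supported_video(name: str) -> bool:
    """True if the file name ends in a supported video extension."""
    return Path(name).suffix.lower() in VIDEO_EXTENSIONS


def find_videos(video_dir: str) -> List[Path]:
    """List supported videos directly inside video_dir, sorted by name."""
    return sorted(p for p in Path(video_dir).iterdir()
                  if p.is_file() and is_supported_video(p.name))


def default_srt_path(video: Path, output_dir: Optional[str] = None) -> Path:
    """Place the .srt next to the video unless output_dir is given."""
    target = Path(output_dir) if output_dir else video.parent
    return target / f"{video.stem}.srt"


print(default_srt_path(Path("/videos/ep1.mp4"), "/subs"))
```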

📝 Example Output

1
00:00:00,120 --> 00:00:03,450
Welcome to our show today.

2
00:00:03,680 --> 00:00:07,200
We have an exciting episode planned for you.

3
00:00:07,450 --> 00:00:11,800
Let's get started with our first topic.
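
The SRT layout above (index, HH:MM:SS,mmm time range, text, blank line) can be produced with a few lines of Python. This is a minimal sketch, not the contents of srt.py; srt_timestamp and to_srt are hypothetical names, and the segment dicts use the 'start'/'end'/'segment' keys from the timestamp structure in the References section.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm (SRT uses a comma before milliseconds)."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(segments) -> str:
    """Render segments as numbered SRT blocks separated by blank lines."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(seg['start'])} --> "
            f"{srt_timestamp(seg['end'])}\n{seg['segment']}\n"
        )
    return "\n".join(blocks)


print(to_srt([{"start": 0.12, "end": 3.45,
               "segment": "Welcome to our show today."}]))
```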

🧪 Testing

# Run all tests
uv run python tests/test_improvements.py

# Run specific test
uv run python tests/test_improvements.py --test vad
uv run python tests/test_improvements.py --test itn
uv run python tests/test_improvements.py --test segmentation
uv run python tests/test_improvements.py --test metrics

# Available tests: baseline, vad, itn, decoding, nemo_api, segmentation, merging, performance, ab_test, metrics, srt, srt_edge, path, cli, cli_list, llm, llm_cli, llm_validation, llm_parsing, llm_fallback, llm_validation_fallback, full

Test Coverage

baseline_config: Default configuration backward compatibility
vad_config: VAD configuration correctness
itn_functions: ITN normalization functionality
decoding_config: Decoding configuration (CUDA graphs)
nemo_api_compatibility: NeMo decoding config alias compatibility
smart_segmentation: Smart segmentation logic
segment_merging: Overlapping segment merging
performance_config: Performance configuration
ab_test_config: VAD A/B test configuration and output path helpers
quality_metrics: WER/CER calculation
srt_formatting: SRT formatting
srt_edge_cases: SRT edge case handling (empty segments, special characters)
path_validation: Path validation and security checks
cli_config_override: CLI configuration override functionality
llm_config: LLM post-processing configuration defaults
llm_cli_override: LLM CLI parameter overrides
llm_validation: Batch result similarity validation
llm_parsing: JSON response parsing and prompt building
llm_fallback: Graceful fallback when disabled or no API key
llm_validation_fallback: Invalid LLM corrections fall back to the original batch
full_config: Complete configuration combination

📊 Quality Metrics

Calculate transcription quality using NeMo's official tools:

from tests.test_improvements import calculate_transcription_quality

result = calculate_transcription_quality(
    hypothesis="transcribed text",
    reference="ground truth text"
)
print(f"WER: {result['wer']:.2%}")
print(f"CER: {result['cer']:.2%}")

Output includes: wer, cer, insertion_rate, deletion_rate, substitution_rate
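
For readers without the test helpers at hand, WER can be approximated in plain Python as the word-level edit distance divided by the reference length. The sketch below is a stand-in for illustration, not the NeMo-based calculation used by the tests; word_error_rate is a hypothetical name and no text normalization is applied.

```python
def word_error_rate(hypothesis: str, reference: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edits needed to turn the first j hypothesis words
    # into the reference prefix processed so far.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


print(f"WER: {word_error_rate('the cat sit', 'the cat sat'):.2%}")
```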

🆘 Troubleshooting

CUDA Out of Memory

Reduce chunk size:

uv run nemoscribe video_path=video.mp4 audio.max_chunk_duration=180.0

Timestamps Not Accurate

Use a model with timestamp support (parakeet-tdt-* recommended) and adjust segmentation parameters:

uv run nemoscribe video_path=video.mp4 \
  subtitle.max_segment_duration=3.0 \
  subtitle.word_gap_threshold=0.5

Model Download Slow

Models are automatically downloaded from HuggingFace/NGC on first use. For slow connections:

# Use HuggingFace mirror (China mainland)
export HF_ENDPOINT=https://hf-mirror.com

❓ FAQ

How is NemoScribe different from OpenAI Whisper? NemoScribe uses NVIDIA NeMo's Parakeet-TDT models, which rank above Whisper-large-v3 in English word error rate on the HuggingFace Open ASR Leaderboard while being far smaller and faster (up to ~240× realtime on GPU). Whisper supports more languages out of the box; for English video subtitling, Parakeet-TDT typically gives better accuracy per second of compute, plus native word-level timestamps.

Does it work offline? Yes. After the first model download, transcription runs fully offline on your machine. Only the optional LLM post-processing step requires an internet connection.

Can it generate subtitles for languages other than English? Yes — switch to nvidia/parakeet-tdt-0.6b-v3 (25 languages with auto-detection) or nvidia/canary-1b-v2 (transcription + translation) via pretrained_name=....

Do I need a GPU? A CUDA-capable NVIDIA GPU is strongly recommended (8GB VRAM is enough). CPU mode works (cuda=-1) but is much slower.

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Fork the repository at github.com/charles1018/NemoScribe
Create your feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add some amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

For bug reports and feature requests, please open an issue.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

NemoScribe is built upon the following open-source projects:

NVIDIA NeMo - Neural Modules toolkit for conversational AI (Apache 2.0 License)
Parakeet-TDT - NVIDIA's state-of-the-art ASR model (CC-BY-4.0 License)

We thank NVIDIA for making these excellent tools and models available to the community.

📚 References

Model Resources

Resource	Description
nvidia/parakeet-tdt-0.6b-v2	Default model, architecture and best practices
nvidia/parakeet-tdt-0.6b-v3	Multilingual version, 25 languages
nvidia/canary-1b-v2	Multilingual with translation support
HuggingFace Space Demo	Official demo with long audio handling

NeMo Framework References

File Path	Description
`examples/asr/transcribe_speech.py`	Main architecture reference
`nemo/collections/asr/parts/utils/transcribe_utils.py`	Core utilities: `get_inference_device()`, `get_inference_dtype()`
`nemo/collections/asr/parts/utils/rnnt_utils.py`	`Hypothesis` class, timestamp data structure

Key Implementation Details

Long Audio Optimization (from HuggingFace Space):

# Switch to local attention for memory efficiency on audio >8 minutes
model.change_attention_model("rel_pos_local_attn", [256, 256])
model.change_subsampling_conv_chunking_factor(1)  # 1 = auto select

Timestamp Data Structure (from Hypothesis):

{
    'segment': [{'start': float, 'end': float, 'segment': str}, ...],
    'word': [{'start': float, 'end': float, 'word': str}, ...],
    'char': [...]  # character-level timestamps
}

If NemoScribe saves you time, consider giving it a ⭐ — it helps others find the project!

NemoScribe — automatic subtitle generation, video transcription, and speech-to-text for everyone.
