A small Python tool that builds a trilingual subtitle track for a Chinese-language film, intended for language learning. Each subtitle cue shows three lines:
汉字
hànzì
English
— Simplified Chinese characters on top, pinyin in the middle, English on the bottom — sharing a single timestamp so all three appear together in any standard player (VLC, MPV, Plex, Subtitle Edit). If your source .srt is Traditional, PinSub converts it to Simplified before merging.
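To make the cue layout concrete, here is a minimal sketch of how one `.srt` cue with three text lines under a single timestamp could be assembled. The `Cue` class and both helper functions are hypothetical names chosen for illustration; they are not taken from PinSub's source.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One merged cue (hypothetical structure, for illustration only)."""
    start_ms: int
    end_ms: int
    hanzi: str
    pinyin: str
    english: str


def format_timestamp(ms: int) -> str:
    """Render milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def format_cue(index: int, cue: Cue) -> str:
    """Build one SRT cue whose three text lines share a single timestamp."""
    timing = f"{format_timestamp(cue.start_ms)} --> {format_timestamp(cue.end_ms)}"
    return "\n".join([str(index), timing, cue.hanzi, cue.pinyin, cue.english]) + "\n"


cue = Cue(61_500, 64_250, "你好", "nǐ hǎo", "Hello")
print(format_cue(1, cue))
```

Because all three rows live in one cue, a player only has to honor ordinary line breaks for them to appear and disappear together.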
If you're learning Mandarin and watching wuxia, dramas, or animation, you usually have to choose between English subs (you understand the plot) and Chinese subs (you practice reading). Even with both, you can't always tell how a character should be pronounced. PinSub merges everything into one track so you can read characters, check pinyin when you don't recognize them, and fall back to English when you're lost.
- Pulls the English subtitle out of an `.mkv` (Bluray rips usually carry an English track).
- Reads an existing Chinese `.srt` (Traditional or Simplified) and converts to Simplified if needed.
- Generates pinyin for the Chinese line: lowercase tone-marked vowels by default (`nǐ hǎo`), with a fallback to numeric tones (`ni3 hao3`) if your player's font drops the diacritics. Capitals only at sentence start and on known proper nouns (`Lǐ Mùbái`).
- Aligns the two tracks if their timing differs (Bluray English vs. fan-source Chinese rarely line up exactly).
- Writes a single trilingual file you can drop next to the video. Both `.srt` and `.ass` are supported (`.ass` lets PinSub size each row independently and manage word wrap).
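The alignment step can be sketched as a nearest-start-time match with a tolerance. This is only one plausible reading of "aligns the two tracks"; the function name `align`, the tuple layout, and the matching rule are assumptions for illustration, not PinSub's actual algorithm. The default tolerance mirrors the documented `--window-ms` value of 1500.

```python
from bisect import bisect_left


def align(zh_cues, en_cues, window_ms=1500):
    """Pair each Chinese cue with the nearest English cue by start time.

    Both inputs are lists of (start_ms, text) tuples sorted by start_ms.
    Returns a list of (start_ms, zh_text, en_text_or_None).
    """
    en_starts = [start for start, _ in en_cues]
    merged = []
    for start, zh_text in zh_cues:
        pos = bisect_left(en_starts, start)
        # Only the English cues just before and just after can be nearest.
        candidates = [i for i in (pos - 1, pos) if 0 <= i < len(en_cues)]
        best = min(candidates, key=lambda i: abs(en_starts[i] - start), default=None)
        if best is not None and abs(en_starts[best] - start) <= window_ms:
            merged.append((start, zh_text, en_cues[best][1]))
        else:
            # No English cue inside the window: keep the Chinese cue alone.
            merged.append((start, zh_text, None))
    return merged


zh = [(1000, "你好"), (5000, "再见"), (20000, "谢谢")]
en = [(1400, "Hello"), (4200, "Goodbye")]
result = align(zh, en)
print(result)
```

Note that this sketch lets one English cue match several Chinese cues; a real implementation would probably need to handle split or merged lines more carefully.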
- Python 3.10+
- ffmpeg and `ffprobe` on `PATH` (for inspecting and extracting subtitle streams).
- Optional: Subtitle Edit if your Bluray English subtitle is image-based (PGS) and needs OCR before PinSub can read it.
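A quick way to confirm these requirements before a first run is a small preflight check. The helper names below are hypothetical and not part of PinSub; the sketch only relies on the standard library.

```python
import shutil
import sys


def missing_tools(names=("ffmpeg", "ffprobe")):
    """Return the required executables that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


def python_ok(minimum=(3, 10)):
    """True if the running interpreter meets the minimum version."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    if not python_ok():
        print("Python 3.10 or newer is required")
    for tool in missing_tools():
        print(f"{tool} was not found on PATH")
```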
Install the two Python deps:
pip install pypinyin opencc-python-reimplemented
Developed on Windows 11; Linux/macOS should also work.
Inspect the subtitle streams in a video:
python PinSub.py --inspect --mkv "Movie.mkv"
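For reference, the kind of inspection described here can be approximated directly with `ffprobe`. The sketch below is an assumption about how such a listing could be produced, not PinSub's own code; `probe_subtitles`, `summarize`, and the sample data are illustrative. The codec names are the ones ffprobe reports for bitmap subtitle formats.

```python
import json
import subprocess

# Codec names ffprobe reports for image-based subtitle formats that need OCR.
IMAGE_CODECS = {"hdmv_pgs_subtitle", "dvd_subtitle", "dvb_subtitle"}


def probe_subtitles(mkv_path):
    """Run ffprobe and return its JSON description of the subtitle streams."""
    cmd = [
        "ffprobe", "-v", "error",
        "-select_streams", "s",
        "-show_entries", "stream=index,codec_name:stream_tags=language,title",
        "-of", "json",
        mkv_path,
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return json.loads(out)


def summarize(probe):
    """Reduce ffprobe's JSON to index, codec, language, and an OCR-needed flag."""
    rows = []
    for stream in probe.get("streams", []):
        tags = stream.get("tags", {})
        codec = stream.get("codec_name", "unknown")
        rows.append({
            "index": stream["index"],
            "codec": codec,
            "language": tags.get("language", "und"),
            "image_based": codec in IMAGE_CODECS,
        })
    return rows


# Illustrative ffprobe output (made up for this example, not from a real file).
sample = {
    "streams": [
        {"index": 2, "codec_name": "hdmv_pgs_subtitle", "tags": {"language": "eng"}},
        {"index": 3, "codec_name": "subrip", "tags": {"language": "chi"}},
    ]
}
rows = summarize(sample)
for row in rows:
    print(row)
```

A stream flagged `image_based` is the case where OCR is needed before the English text can be merged.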
Generate trilingual subtitles:
python PinSub.py \
--mkv "Movie.mkv" \
--zh "Movie.zh.srt" \
--out "Movie.zh-en-pinyin.ass"
If the English subtitle stream is image-based (PGS, common on Bluray rips), PinSub will write a `.sup` file and stop with a message asking you to OCR it in Subtitle Edit. After OCR, re-run with `--english`:
python PinSub.py \
--mkv "Movie.mkv" \
--zh "Movie.zh.srt" \
--english "Movie.english.srt" \
--out "Movie.zh-en-pinyin.ass"
| Flag | Default | What it does |
|---|---|---|
| `--mkv <path>` | required | source video |
| `--zh <path>` | required | Chinese `.srt` (Traditional or Simplified) |
| `--out <path>` | required | output trilingual file (`.ass` recommended; `.srt` also supported) |
| `--english <path>` | - | pre-existing English `.srt`; skip ffmpeg extraction |
| `--names <path>` | auto | per-film proper-noun JSON (see below) |
| `--inspect` | - | list subtitle streams in `--mkv` and exit |
| `--pinyin-style tone\|number` | tone | tone marks (`Lǐ`) or numeric (`Li3`) |
| `--no-simplify` | - | skip OpenCC Trad→Simp (use if `--zh` is already Simplified) |
| `--window-ms <int>` | 1500 | per-cue alignment tolerance |
| `--no-bom` | - | write output without UTF-8 BOM |
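The flag table maps naturally onto an `argparse` parser. The following is a sketch reconstructed from the table alone, not PinSub's real source. One assumption worth flagging: since `--inspect` is documented as working with only `--mkv`, the sketch enforces `--zh` and `--out` after parsing rather than marking them `required=True`.

```python
import argparse


def build_parser():
    """Parser mirroring the documented flags (a sketch, not PinSub's code)."""
    parser = argparse.ArgumentParser(prog="PinSub.py")
    parser.add_argument("--mkv", required=True, help="source video")
    parser.add_argument("--zh", help="Chinese .srt (Traditional or Simplified)")
    parser.add_argument("--out", help="output trilingual file (.ass or .srt)")
    parser.add_argument("--english", help="pre-existing English .srt")
    parser.add_argument("--names", help="per-film proper-noun JSON")
    parser.add_argument("--inspect", action="store_true",
                        help="list subtitle streams in --mkv and exit")
    parser.add_argument("--pinyin-style", choices=("tone", "number"), default="tone")
    parser.add_argument("--no-simplify", action="store_true")
    parser.add_argument("--window-ms", type=int, default=1500)
    parser.add_argument("--no-bom", action="store_true")
    return parser


def parse(argv):
    """Parse argv and enforce that --zh/--out are present unless inspecting."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if not args.inspect and not (args.zh and args.out):
        parser.error("--zh and --out are required unless --inspect is given")
    return args


args = parse(["--mkv", "Movie.mkv", "--zh", "Movie.zh.srt",
              "--out", "Movie.zh-en-pinyin.ass"])
# Python's "utf-8-sig" codec writes a BOM; plain "utf-8" does not.
encoding = "utf-8" if args.no_bom else "utf-8-sig"
print(args.window_ms, args.pinyin_style, encoding)
```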
Different films have different proper nouns, and PinSub leaves it to the user to supply them per film. The format is one JSON file per film:
{
  "imdb_id": "tt0190332",
  "title": "Crouching Tiger, Hidden Dragon",
  "name_map": {
    "李慕白": "Lǐ Mùbái",
    "俞秀莲": "Yú Xiùlián"
  },
  "english_name_map": {
    "Shu Lien": "Xiu Lian"
  }
}

- `name_map` maps Hanzi (Simplified, as it appears in the cue) to the capitalized pinyin output. Tone marks live on the lowercase vowel; the capital sits on the consonant: `Lǐ`, not `LǏ`. PinSub processes longest keys first, so include both full names and given-name fragments if the dialogue uses both (`李慕白` AND `慕白`).
- `english_name_map` maps old or Wade-Giles English spellings to the Hanyu Pinyin form used in the pinyin row, so the English line matches (`Shu Lien` → `Xiu Lian`). Word-boundary regex, case-sensitive.
- Top-level fields like `imdb_id` and `title` are for your reference only; PinSub ignores them. Underscore-prefixed keys inside the maps (e.g. `_help`) are also ignored, useful for in-file comments.
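The rules above (longest keys first, word-boundary and case-sensitive matching, underscore-prefixed keys ignored) can be sketched as follows. The function names are hypothetical and the code is an illustration of the described behavior, not PinSub's implementation; the `慕白` entry and the `_help` key are added here only to exercise those rules.

```python
import json
import re


def load_maps(raw_json):
    """Return (name_map, english_name_map) with underscore-prefixed keys dropped."""
    data = json.loads(raw_json)

    def clean(mapping):
        return {k: v for k, v in mapping.items() if not k.startswith("_")}

    return clean(data.get("name_map", {})), clean(data.get("english_name_map", {}))


def split_names(hanzi, name_map):
    """Split a Hanzi line into (text, pinyin_or_None) chunks, longest names first."""
    if not name_map:
        return [(hanzi, None)]
    keys = sorted(name_map, key=len, reverse=True)
    pattern = re.compile("(" + "|".join(re.escape(k) for k in keys) + ")")
    # re.split with a capturing group keeps the matched names in the result.
    return [(part, name_map.get(part)) for part in pattern.split(hanzi) if part]


def fix_english(line, english_name_map):
    """Replace old spellings on word boundaries, case-sensitively."""
    for old in sorted(english_name_map, key=len, reverse=True):
        new = english_name_map[old]
        line = re.sub(r"\b" + re.escape(old) + r"\b", lambda _m, new=new: new, line)
    return line


raw = """
{
  "name_map": {"_help": "comment", "李慕白": "Lǐ Mùbái", "慕白": "Mùbái"},
  "english_name_map": {"Shu Lien": "Xiu Lian"}
}
"""
name_map, english_name_map = load_maps(raw)
print(split_names("李慕白来了", name_map))
print(fix_english("Shu Lien is here.", english_name_map))
```

Sorting keys by length keeps `李慕白` from being consumed as the shorter `慕白` with a stray `李` left in front of it.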
PinSub looks for the names file in this order:
1. `--names <path>`: explicit, always wins.
2. Auto-detect: if your `.mkv` filename contains an IMDb tag like `{imdb-tt0190332}` (the Plex / TRaSH-Guides convention), PinSub looks for `names/tt0190332.json` next to `PinSub.py`.
3. No file found: the pipeline still runs; pinyin won't get name capitalization and English won't get Wade-Giles fixes, but output is still readable.
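The lookup order can be sketched in a few lines. The regex and function names are assumptions for illustration; in particular, the sketch assumes the IMDb tag always has the exact form `{imdb-tt<digits>}`.

```python
import re
from pathlib import Path

# Matches a Plex / TRaSH-Guides style tag such as {imdb-tt0190332}.
IMDB_TAG = re.compile(r"\{imdb-(tt\d+)\}")


def imdb_id_from_filename(filename):
    """Return the IMDb ID embedded in a filename, or None if there is no tag."""
    match = IMDB_TAG.search(filename)
    return match.group(1) if match else None


def find_names_file(mkv_path, explicit=None, script_dir=Path(".")):
    """Resolve the names file: explicit path, then auto-detect, then None."""
    if explicit is not None:
        return Path(explicit)  # an explicit --names path always wins
    imdb_id = imdb_id_from_filename(Path(mkv_path).name)
    if imdb_id:
        candidate = Path(script_dir) / "names" / f"{imdb_id}.json"
        if candidate.is_file():
            return candidate
    return None  # caller continues without name handling


print(imdb_id_from_filename("Crouching Tiger, Hidden Dragon (2000) {imdb-tt0190332}.mkv"))
```

Returning `None` rather than raising matches the documented behavior that the pipeline still runs without a names file.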
A starter file is at `names/example.json`. Copy it, rename to `names/<your-imdb-id>.json`, and fill in the maps for your film.
- MT-Chinese for English-only films. Today PinSub needs an existing Chinese `.srt`. Generating a Chinese row by machine-translating the English would let PinSub work on any film. Engine choice (API vs local model) and trilingual layout are still being designed.
MIT.