PinSub

A small Python tool that turns a Chinese-language film into a trilingual subtitle for language learning. Each subtitle cue shows three lines:

汉字
hànzì
English

Simplified Chinese characters sit on top, pinyin in the middle, and English on the bottom. The three lines share a single timestamp, so they appear together in any standard player (VLC, MPV, Plex, Subtitle Edit). If your source .srt is Traditional, PinSub converts it to Simplified before merging.
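
To make the layout concrete, here is a minimal sketch of how one three-line cue could be written in .srt form. The helper names `srt_timestamp` and `trilingual_cue` are hypothetical and used only for illustration; they are not taken from PinSub.py.

```python
from datetime import timedelta


def srt_timestamp(t: timedelta) -> str:
    """Format a timedelta as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(t.total_seconds() * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def trilingual_cue(index, start, end, hanzi, pinyin, english):
    """Build one SRT cue whose text is three stacked lines under one timestamp."""
    return (
        f"{index}\n"
        f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
        f"{hanzi}\n{pinyin}\n{english}\n"
    )


cue = trilingual_cue(
    1,
    timedelta(seconds=61, milliseconds=500),
    timedelta(seconds=64),
    "你好",
    "Nǐ hǎo",
    "Hello",
)
print(cue)
```

Because all three rows live in the same cue, a player that can render multi-line .srt text needs no special support to show them together.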

Why

If you're learning Mandarin and watching wuxia, dramas, or animation, you usually have to choose between English subs (you understand the plot) and Chinese subs (you practice reading). Even with both, you can't always tell how a character is pronounced. PinSub merges everything into one track so you can read the characters, check the pinyin when you don't recognize one, and fall back to English when you're lost.

What it does

Pulls the English subtitle out of an .mkv (Bluray rips usually carry an English track).
Reads an existing Chinese .srt (Traditional or Simplified) and converts to Simplified if needed.
Generates pinyin for the Chinese line — lowercase tone-marked vowels by default (nǐ hǎo), with a fallback to numeric tones (ni3 hao3) if your player's font drops the diacritics. Capitals only at sentence start and on known proper nouns (Lǐ Mùbái).
Aligns the two tracks if their timing differs (Bluray English vs. fan-source Chinese rarely line up exactly).
Writes a single trilingual file you can drop next to the video — both .srt and .ass are supported (.ass lets PinSub size each row independently and manage word wrap).
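
The alignment step can be pictured as nearest-neighbour matching on cue start times. The sketch below is an assumption about how such matching might work, not PinSub's actual algorithm: `align_cues` is a hypothetical helper, start times are in milliseconds, and the 1500 ms tolerance mirrors the documented `--window-ms` default.

```python
from bisect import bisect_left


def align_cues(zh_starts, en_starts, window_ms=1500):
    """For each Chinese cue start (ms), return the index of the nearest
    English cue start within window_ms, or None if nothing is close enough.
    en_starts must be sorted ascending."""
    matches = []
    for start in zh_starts:
        pos = bisect_left(en_starts, start)
        # Only the neighbours on either side of the insertion point can be nearest.
        candidates = [i for i in (pos - 1, pos) if 0 <= i < len(en_starts)]
        best = min(candidates, key=lambda i: abs(en_starts[i] - start), default=None)
        if best is not None and abs(en_starts[best] - start) <= window_ms:
            matches.append(best)
        else:
            matches.append(None)
    return matches


zh = [1000, 5200, 12000]   # Chinese cue start times in ms
en = [1300, 5000, 20000]   # English cue start times in ms
print(align_cues(zh, en))  # third Chinese cue has no English cue within 1500 ms
```

A cue that finds no partner inside the window would simply be left without an English row in this sketch; how PinSub handles that case is not specified here.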

Requirements

Python 3.10+
ffmpeg and ffprobe on PATH (for inspecting and extracting subtitle streams).
Optional: Subtitle Edit if your Bluray English subtitle is image-based (PGS) and needs OCR before PinSub can read it.

Install the two Python deps:

pip install pypinyin opencc-python-reimplemented

Developed on Windows 11; Linux/macOS should also work.

Usage

Inspect the subtitle streams in a video:

python PinSub.py --inspect --mkv "Movie.mkv"
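
For readers curious what an inspection like this involves, the following is a rough sketch of listing subtitle streams with ffprobe's JSON output. It is not PinSub's code: `parse_subtitle_streams`, `probe_subtitles`, and `IMAGE_CODECS` are hypothetical names, and the sample JSON is hand-written to resemble ffprobe output rather than captured from a real file.

```python
import json
import subprocess

# Codec names ffprobe reports for image-based subtitle streams (these need OCR).
IMAGE_CODECS = {"hdmv_pgs_subtitle", "dvd_subtitle"}


def parse_subtitle_streams(ffprobe_json: str):
    """Turn ffprobe's JSON output into (index, codec, language, is_image) tuples."""
    streams = json.loads(ffprobe_json).get("streams", [])
    rows = []
    for s in streams:
        codec = s.get("codec_name", "unknown")
        lang = s.get("tags", {}).get("language", "und")
        rows.append((s["index"], codec, lang, codec in IMAGE_CODECS))
    return rows


def probe_subtitles(mkv_path: str):
    """Run ffprobe on a video and list its subtitle streams."""
    cmd = [
        "ffprobe", "-v", "error",
        "-select_streams", "s",
        "-show_entries", "stream=index,codec_name:stream_tags=language",
        "-of", "json",
        mkv_path,
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_subtitle_streams(result.stdout)


# Hand-written sample in the shape ffprobe emits; no video file is needed to try it.
sample = (
    '{"streams": ['
    '{"index": 2, "codec_name": "hdmv_pgs_subtitle", "tags": {"language": "eng"}},'
    '{"index": 3, "codec_name": "subrip", "tags": {"language": "chi"}}]}'
)
for row in parse_subtitle_streams(sample):
    print(row)
```

A stream flagged as image-based here corresponds to the PGS case described in this section, where OCR is required before the text can be merged.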

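Generate trilingual subtitles (the trailing backslashes are POSIX-shell line continuations; in PowerShell use a backtick instead, or put the command on one line):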

python PinSub.py \
  --mkv "Movie.mkv" \
  --zh  "Movie.zh.srt" \
  --out "Movie.zh-en-pinyin.ass"

If the English subtitle stream is image-based (PGS, common on Bluray rips), PinSub will write a .sup file and stop with a message asking you to OCR it in Subtitle Edit. After OCR, re-run with --english:

python PinSub.py \
  --mkv "Movie.mkv" \
  --zh  "Movie.zh.srt" \
  --english "Movie.english.srt" \
  --out "Movie.zh-en-pinyin.ass"

CLI flags

| Flag | Default | What it does |
| --- | --- | --- |
| `--mkv <path>` | required | source video |
| `--zh <path>` | required | Chinese .srt (Traditional or Simplified) |
| `--out <path>` | required | output trilingual file (`.ass` recommended; `.srt` also supported) |
| `--english <path>` | — | pre-existing English .srt; skip ffmpeg extraction |
| `--names <path>` | auto | per-film proper-noun JSON (see below) |
| `--inspect` | — | list subtitle streams in `--mkv` and exit |
| `--pinyin-style tone\|number` | tone | tone marks (`Lǐ`) or numeric (`Li3`) |
| `--no-simplify` | — | skip OpenCC Trad→Simp (use if `--zh` is already Simplified) |
| `--window-ms <int>` | 1500 | per-cue alignment tolerance |
| `--no-bom` | — | write output without UTF-8 BOM |
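
To show how the two `--pinyin-style` outputs relate, here is a standard-library sketch that rewrites tone-marked syllables in numeric form. It is an illustration only: PinSub lists pypinyin as a dependency and presumably generates each style directly, and the helpers `syllable_to_numeric` and `line_to_numeric` are hypothetical. The sketch assumes one syllable per whitespace-separated token and no punctuation.

```python
import unicodedata

# Combining marks for the four Mandarin tones, mapped to their tone numbers.
TONE_MARKS = {"\u0304": "1", "\u0301": "2", "\u030c": "3", "\u0300": "4"}


def syllable_to_numeric(syllable: str) -> str:
    """Convert one tone-marked syllable (e.g. 'hǎo') to numeric form ('hao3')."""
    tone = ""
    letters = []
    # NFD splits each accented vowel into a base letter plus combining marks.
    for ch in unicodedata.normalize("NFD", syllable):
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        else:
            letters.append(ch)
    # Recompose so that ü (u + diaeresis) survives as a single character.
    return unicodedata.normalize("NFC", "".join(letters)) + tone


def line_to_numeric(line: str) -> str:
    """Convert a space-separated pinyin line, one syllable per token."""
    return " ".join(syllable_to_numeric(tok) for tok in line.split())


print(line_to_numeric("nǐ hǎo"))
```

Neutral-tone syllables carry no mark, so this sketch leaves them without a trailing digit.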

Names files

Different films have different proper nouns, and PinSub leaves it to the user to supply them per film. The format is one JSON file per film:

{
  "imdb_id": "tt0190332",
  "title": "Crouching Tiger, Hidden Dragon",
  "name_map": {
    "李慕白": "Lǐ Mùbái",
    "俞秀莲": "Yú Xiùlián"
  },
  "english_name_map": {
    "Shu Lien": "Xiu Lian"
  }
}

name_map maps Hanzi (Simplified, as it appears in the cue) to the capitalized pinyin output. Tone marks live on the lowercase vowel; the capital sits on the consonant — Lǐ, not LǏ. PinSub processes longest keys first, so include both full names and given-name fragments if the dialogue uses both (李慕白 AND 慕白).
english_name_map maps old or Wade-Giles English spellings to the Hanyu Pinyin form used in the pinyin row, so the English line matches (Shu Lien → Xiu Lian). Word-boundary regex, case-sensitive.
Top-level fields like imdb_id and title are for your reference only — PinSub ignores them. Underscore-prefixed keys inside the maps (e.g. _help) are also ignored, useful for in-file comments.
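
The matching rules above can be sketched as follows. This is one possible reading of them, not PinSub's implementation: `load_map`, `find_names`, and `fix_english` are hypothetical helpers, and the 慕白 entry is added to the sample map only to show why longest-first ordering matters.

```python
import re


def load_map(raw: dict) -> dict:
    """Drop underscore-prefixed keys, which serve as in-file comments."""
    return {k: v for k, v in raw.items() if not k.startswith("_")}


def find_names(hanzi_line: str, name_map: dict):
    """Return (start, end, pinyin) spans for known names, longest keys first."""
    spans = []
    taken = [False] * len(hanzi_line)
    for key in sorted(name_map, key=len, reverse=True):
        for m in re.finditer(re.escape(key), hanzi_line):
            # Skip a shorter key if a longer name already claimed these characters.
            if not any(taken[m.start():m.end()]):
                spans.append((m.start(), m.end(), name_map[key]))
                for i in range(m.start(), m.end()):
                    taken[i] = True
    return sorted(spans)


def fix_english(line: str, english_name_map: dict) -> str:
    """Replace older romanizations using case-sensitive word-boundary matches."""
    for old in sorted(english_name_map, key=len, reverse=True):
        new = english_name_map[old]
        # A function replacement avoids backslash interpretation in `new`.
        line = re.sub(rf"\b{re.escape(old)}\b", lambda m, new=new: new, line)
    return line


name_map = load_map({"_help": "comment", "李慕白": "Lǐ Mùbái", "慕白": "Mùbái"})
print(find_names("李慕白来了，慕白！", name_map))
print(fix_english("Shu Lien is here.", {"Shu Lien": "Xiu Lian"}))
```

Processing 李慕白 before 慕白 keeps the given-name entry from claiming the last two characters of the full name.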

PinSub looks for the names file in this order:

1. --names <path>: explicit, always wins.
2. Auto-detect: if your .mkv filename contains an IMDb tag like {imdb-tt0190332} (the Plex / TRaSH-Guides convention), PinSub looks for names/tt0190332.json next to PinSub.py.
3. No file found: the pipeline still runs; pinyin won't get name capitalization and English won't get Wade-Giles fixes, but output is still readable.
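
The lookup order above could be expressed roughly like this. The function names, the `script_dir` parameter, and the exact regular expression are assumptions made for the sketch; only the `{imdb-tt…}` tag format and the `names/<id>.json` location come from the description.

```python
import re
from pathlib import Path

# Matches an IMDb tag such as {imdb-tt0190332} anywhere in a filename.
IMDB_TAG = re.compile(r"\{imdb-(tt\d+)\}")


def imdb_id_from_filename(mkv_name: str):
    """Return the IMDb id embedded in the filename, or None if there is no tag."""
    m = IMDB_TAG.search(mkv_name)
    return m.group(1) if m else None


def resolve_names_file(mkv_path, explicit=None, script_dir="."):
    """Pick the names file: explicit path first, then auto-detect, else None."""
    if explicit:
        return Path(explicit)
    imdb_id = imdb_id_from_filename(Path(mkv_path).name)
    if imdb_id:
        candidate = Path(script_dir) / "names" / f"{imdb_id}.json"
        if candidate.is_file():
            return candidate
    # No names file: the caller carries on without name handling.
    return None


print(imdb_id_from_filename("Crouching Tiger, Hidden Dragon (2000) {imdb-tt0190332}.mkv"))
```

Returning None rather than raising matches the documented behaviour that a missing names file is not an error.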

A starter file is at names/example.json. Copy it, rename to names/<your-imdb-id>.json, and fill in the maps for your film.

Roadmap

MT-Chinese for English-only films. Today PinSub needs an existing Chinese .srt. Generating a Chinese row by machine-translating the English would let PinSub work on any film. Engine choice (API vs local model) and trilingual layout are still being designed.

License

MIT.

Author

boladi
