This repository packages hama inference runtimes. It ships:

- a Python package built with uv, powered by ONNX Runtime
- a Bun/TypeScript package that runs under Node.js/Bun and in browsers
- shared tokenizer + Hangul jamo helpers
- a waveform-input phoneme ASR runtime over exported ONNX
- reproducible tests for both runtimes
Package assets (under `python/src/hama/assets` and `ts/src/assets`) contain:

- `encoder.onnx` + `decoder_step.onnx` (split runtime, recommended)
- `asr_waveform_fp16.onnx` (canonical ASR waveform model)
- `g2p_vocab.json`

Both runtimes use split assets by default. A legacy single-file ONNX is still
supported only when you explicitly provide `model_path`/`modelPath`.
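The default-versus-fallback choice described above can be sketched as a tiny selection rule. This is a hypothetical helper for illustration only; the real constructors resolve asset paths internally:

```python
def resolve_runtime_mode(model_path=None):
    """Sketch of the assumed selection rule: an explicit single-file model
    path opts into legacy mode; otherwise the packaged split assets
    (encoder.onnx + decoder_step.onnx) are used."""
    return "single-file" if model_path is not None else "split"
```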
Requirements: `uv>=0.3`, Python 3.9+.

```bash
cd python
uv sync --extra test
uv run pytest
uv run pytest tests/test_split_assets.py -q
uv run pytest tests/test_asr.py -q
```

Quick demo script (`python/example.py`):
```python
from hama import G2PModel


def main() -> None:
    model = G2PModel()
    result = model.predict("Really? What's the orbital velocity of the moon?", preserve_literals="punct")
    print("IPA:", result.ipa)
    print("Display IPA:", result.display_ipa)
    print("Alignments:", result.alignments)


if __name__ == "__main__":
    main()
```

Run it with:
```bash
uv run python python/example.py
```

ASR demo script (`examples/python_asr.py`):
```bash
cd python
uv sync --extra test
uv run python ../examples/python_asr.py --wav /path/to/audio.wav
```

You can omit `--wav` to run a synthetic smoke input.
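If you want to build a smoke input of your own, a short sine tone is enough. This snippet is not part of the repo; the 16 kHz sample rate is an assumption (it is a common ASR default):

```python
import math


def synthetic_waveform(sample_rate: int = 16000, seconds: float = 1.0, freq: float = 440.0) -> list[float]:
    # One channel of float samples in [-1.0, 1.0], usable as a smoke input.
    n = int(sample_rate * seconds)
    return [math.sin(2 * math.pi * freq * i / sample_rate) for i in range(n)]
```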
Live mic ASR with Silero VAD 6.2 (`examples/python_live_asr_silero_vad.py`):

```bash
cd python
uv sync --extra test
uv run pip install sounddevice "silero-vad==6.2.0"
uv run python ../examples/python_live_asr_silero_vad.py
# if nothing is detected, list/select your input device:
uv run python ../examples/python_live_asr_silero_vad.py --list-devices
uv run python ../examples/python_live_asr_silero_vad.py --input-device "MacBook Air Microphone"
# reduce <unk> noise and heartbeat logs:
uv run python ../examples/python_live_asr_silero_vad.py --unk-bias -2.0 --listening-log-interval-sec 8
# default VAD matches iOS: threshold=0.6 and dynamic silence duration
# (<3s: 1000ms, 5s: 500ms, 12s: 200ms, 17s: 100ms, >17s: immediate)
```

This is example-only; no new runtime dependencies were added to hama.
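The dynamic silence-duration schedule in the comment above can be read as a piecewise function of how long speech has been running. This is a sketch under that reading; the exact threshold boundaries are my interpretation, not the script's code:

```python
def silence_duration_ms(speech_sec: float) -> int:
    # Longer utterances end on shorter trailing silence; past 17 s the
    # segment closes immediately (0 ms of required silence).
    if speech_sec < 3:
        return 1000
    if speech_sec < 5:
        return 500
    if speech_sec < 12:
        return 200
    if speech_sec < 17:
        return 100
    return 0
```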
The public API lives in `hama.__init__`:

- `split_text_to_jamo` / `join_jamo_tokens`: reversible Hangul disassembly
- `G2PModel.predict(text)`: returns canonical IPA, a display-friendly IPA string, and `phoneme -> char_index` alignments derived from attention weights
- `predict(..., split_delimiter=r"\s+", output_delimiter=" ", preserve_literals="none" | "punct")` can segment input before inference, join segment IPA outputs with a delimiter, and optionally preserve punctuation in `result.display_ipa` without changing canonical `result.ipa`
- `char_index` is `-1` only for whitespace-only input
- `ASRModel.transcribe_file(path)` / `ASRModel.transcribe_waveform(waveform, sample_rate)`: return collapsed phoneme output from `asr_waveform_fp16.onnx`
- `ASRResult` includes `phonemes`, `phoneme_text`, `word_phoneme_text`, `token_ids`, and frame-level `frame_token_ids`
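The reversible disassembly behind `split_text_to_jamo` / `join_jamo_tokens` can be illustrated with standard Unicode Hangul-syllable arithmetic. This is a self-contained sketch of the technique, not the package's actual token format:

```python
# Unicode Hangul syllable block: S = 0xAC00 + (L*21 + V)*28 + T
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28


def split_syllable(ch: str) -> list[str]:
    code = ord(ch) - S_BASE
    if not 0 <= code < 11172:
        return [ch]  # pass non-Hangul characters through unchanged
    l, rem = divmod(code, V_COUNT * T_COUNT)
    v, t = divmod(rem, T_COUNT)
    jamo = [chr(L_BASE + l), chr(V_BASE + v)]
    if t:
        jamo.append(chr(T_BASE + t))  # trailing consonant, if any
    return jamo


def join_syllable(jamo: list[str]) -> str:
    l = ord(jamo[0]) - L_BASE
    v = ord(jamo[1]) - V_BASE
    t = ord(jamo[2]) - T_BASE if len(jamo) == 3 else 0
    return chr(S_BASE + (l * V_COUNT + v) * T_COUNT + t)
```

Round-tripping a syllable such as "한" through `split_syllable` and `join_syllable` returns the original character, which is the property the character-index alignments rely on.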
Pass `encoder_model_path` + `decoder_step_model_path` (recommended split mode),
or `model_path` (single-file fallback), plus an optional `vocab_path` for custom assets.
For ASR, pass `model_path` if you want a non-default ONNX file.
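The attention-derived alignments mentioned above reduce to an argmax per output step. A minimal sketch, assuming an attention matrix of shape (num_phonemes, num_chars):

```python
def align_from_attention(attention: list[list[float]]) -> list[int]:
    # char_index for each phoneme = input position with the highest
    # attention weight in that phoneme's row.
    return [max(range(len(row)), key=row.__getitem__) for row in attention]
```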
Requirements: `bun>=1.1`.

```bash
cd ts
bun install
bun run build
bun test
bun test tests/asr.test.ts
bun run validate:model:split
bun run validate:asr
bun run validate:browser
bun run validate:browser:asr
# Install published package (instead of local dist/)
bun add hama-js
# or
npm install hama-js
```

Live mic ASR with Silero VAD in TS/Node (`ts/scripts/live-asr-silero.ts`):
```bash
cd ts
bun add -d @ricky0123/vad-node node-record-lpcm16
# macOS recorder dependency:
brew install sox
bun run live:asr:silero
# default VAD matches iOS: threshold=0.6 and dynamic silence duration
# (<3s: 1000ms, 5s: 500ms, 12s: 200ms, 17s: 100ms, >17s: immediate)
# optionally:
bun run live:asr:silero --input-device "default" --unk-bias -2.0
bun run live:asr:silero --record-program sox
```

This TS live script is example-only and uses optional dev dependencies.
Node/Bun demo (`ts/example.js`):

```js
import { G2PNodeModel } from "./dist/node/index.js";

const run = async () => {
  const model = await G2PNodeModel.create();
  const result = await model.predict("Really? What's the orbital velocity of the moon?", {
    preserveLiterals: "punct",
  });
  console.log("IPA:", result.ipa);
  console.log("Display IPA:", result.displayIpa);
  console.log("Alignments:", result.alignments);
};

run().catch((err) => {
  console.error(err);
  process.exit(1);
});
```

Execute it after building:
```bash
node ts/example.js
```

Using the published package instead of the local dist:

```js
import { G2PNodeModel } from "hama-js/g2p";
```

API overview:
- `G2PNodeModel.create({ modelPath?, encoderModelPath?, decoderStepModelPath?, maxInputLen?, maxOutputLen? })`
- `model.predict(text, { splitDelimiter?, outputDelimiter?, preserveLiterals? })` → `{ ipa, displayIpa, alignments }`; `splitDelimiter` defaults to `/\s+/u`, `outputDelimiter` to `" "`, and `preserveLiterals` is `"none" | "punct"`
- `displayIpa` preserves punctuation only when requested; canonical `ipa` stays punctuation-free
- `alignments[].charIndex` is `-1` only for whitespace-only input
- `ASRNodeModel.create({ modelPath?, vocabPath?, sampleRate?, blankToken?, unkToken?, wordBoundaryToken?, blankBias?, unkBias? })`
- `model.transcribeWavFile(path)` and `model.transcribeWaveform(samples, sampleRate)` for zero-dependency WAV/waveform inference
- waveform-input ASR is the only supported public path in both runtimes
- `ASRNodeModel.inputFormat` is always `"waveform"`
- `ASRResult` → `{ phonemes, phonemeText, wordPhonemeText, tokenIds, frameTokenIds, numFrames }`
- `decodeCtcTokens(...)` is exported for deterministic CTC post-processing tests
- Browser bundle:
  - `import { G2PBrowserModel } from "hama-js/g2p/browser";`
  - `import { ASRBrowserModel } from "hama-js/asr/browser";`
  - `import { G2PBrowserModel, ASRBrowserModel } from "hama-js/browser";`
  - `G2PBrowserModel.create({ modelUrl?, encoderUrl?, decoderStepUrl?, ... })`
  - `ASRBrowserModel.create({ modelUrl?, vocabUrl?, sampleRate?, blankToken?, unkToken?, wordBoundaryToken?, blankBias?, unkBias?, collapseRepeats? })`
The package copies `assets/*.onnx` + `g2p_vocab.json` into `dist` so Node/Bun
resolves them via `import.meta.url`. For browser deployments, host the ONNX
assets next to the bundle (default URLs resolve relative to the built module),
and pass `vocabUrl` when you want a browser-specific decoder vocab JSON.
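On the Python side, the analogous pattern is resolving packaged assets relative to the module file. A hedged sketch of that layout (the real package may resolve assets differently):

```python
from pathlib import Path


def default_asset_path(name: str) -> Path:
    # Resolve a packaged asset next to this module, analogous to the
    # import.meta.url resolution used by the TS build (assumed layout).
    return Path(__file__).resolve().parent / "assets" / name
```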
Release notes live in CHANGELOG.md.
- Both runtimes use identical Hangul jamo logic, so character indices map back to the original graphemes even after jamo expansion.
- ASR uses the same decoder vocabulary base (`g2p_vocab.json` decoder + `<wb>` + `<blank>`).
- ASR expects a waveform-input ONNX named `asr_waveform_fp16.onnx`.
- TS file input currently supports WAV only (PCM 8/16/24/32-bit int, and 32-bit float WAV).
- Inputs are case-normalized (lowercased in both Python and TS) and whitespace is ignored during tokenization.
- Input length defaults to 128 time steps to accommodate Korean + mixed tokens.
- `maxOutputLen` controls host-side greedy decoding in split mode and remains a compatibility option for single-file mode.
- Output alignment is derived from attention argmax, mirroring the training scripts.
- For whitespace-only inputs, alignments use the `char_index = -1` sentinel.
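The "collapsed phoneme output" and exported `decodeCtcTokens` follow standard greedy CTC post-processing. A minimal sketch of that technique (the blank token id of 0 is an assumption):

```python
def collapse_ctc(frame_token_ids: list[int], blank_id: int = 0) -> list[int]:
    # Standard greedy CTC collapse: merge consecutive repeats, then drop blanks.
    out: list[int] = []
    prev = None
    for tok in frame_token_ids:
        if tok != prev and tok != blank_id:
            out.append(tok)
        prev = tok
    return out
```

Note that a blank between two identical tokens keeps both copies, which is why frame-level `frame_token_ids` carry more information than the collapsed `token_ids`.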
```
assets/           # Shared vocab
python/src/hama/  # Python runtime
python/tests/     # pytest suite
ts/src/           # TypeScript runtime (Node + browser)
ts/tests/         # bun test suite
examples/         # root-level usage examples
```
- Publish `python/` via `uv publish` / PyPI, and `ts/` as `hama-js`.
- Run local split smoke checks: `cd python && uv run pytest tests/test_split_assets.py -q` and `cd ../ts && bun run validate:model:split`.
- Wire up docs/examples + simple CLI wrappers if needed.