A toolkit for Japanese learners — dictionary, OCR, translator, and Anki right in your browser. No extensions. No servers. No data sent anywhere.
Yomikomi is a web app that solves the typical pain of reading Japanese:
- Take a photo of a manga or book page
- Run it through OCR directly in the browser (no Google OCR)
- Tap an unknown word — get the translation from your own dictionary or kanji readings
- Add the word to favorites
- Repeat
The idea is similar to the Yomitan browser extension, but works without any extension — which means it works on mobile too. On iPhone you can add it to your Home Screen and use it like a native app.
- Search JMdict (English, Russian, Spanish, Dutch) and kanji dictionaries
- Custom SQL queries — full control over how your dictionary is searched
- Custom meaning parsers (JSON, plain string, custom JS function)
- Powered by sql.js (SQLite compiled to WASM) — no backend required
- Client-side OCR — images never leave your device
  - PaddleOcr (ONNX, recommended)
  - Tesseract.js
- Server-side OCR — connect your own Docker container with PaddleOCR or YomiToku
- Albums: upload a batch of pages, process them all, browse with dictionary lookup
- Area selection on images for precise OCR of specific regions
- Local machine translation models running in the browser (@xenova/transformers)
  - Japanese → English (opus-mt-ja-en)
  - Japanese → Russian (chained: ja→en→ru)
  - Models are downloaded once, cached, and work offline
- Parse .apkg files directly in the browser
- Browse decks without Anki Desktop
- Kuromoji for morphological analysis
- Tokens are enriched from loaded dictionaries via n-gram lookup
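The n-gram enrichment mentioned above could look roughly like the sketch below. It is an assumption about the approach, not the project's actual code: at each position it joins up to `maxN` adjacent tokens and keeps the longest surface form found in the dictionary. The names `enrichTokens`, `EnrichedToken`, and `maxN` are hypothetical, and a plain `Map` stands in for a loaded dictionary.

```typescript
// Hypothetical result shape: a surface form plus an optional dictionary meaning.
interface EnrichedToken {
  surface: string;
  meaning?: string;
}

// Greedy longest-match lookup over adjacent tokens (an n-gram window of up to maxN).
function enrichTokens(
  tokens: string[],
  dict: Map<string, string>,
  maxN = 3,
): EnrichedToken[] {
  const out: EnrichedToken[] = [];
  let i = 0;
  while (i < tokens.length) {
    let matched = false;
    // Try the widest window first so compound words win over their parts.
    for (let n = Math.min(maxN, tokens.length - i); n >= 1; n--) {
      const surface = tokens.slice(i, i + n).join("");
      const meaning = dict.get(surface);
      if (meaning !== undefined) {
        out.push({ surface, meaning });
        i += n;
        matched = true;
        break;
      }
    }
    if (!matched) {
      // No dictionary hit: keep the token as-is, without a meaning.
      out.push({ surface: tokens[i] });
      i += 1;
    }
  }
  return out;
}
```

With a dictionary containing both 日本 and 日本語, the tokens 日本 and 語 are merged into the longer entry rather than matched separately.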
- Node.js 20+
- pnpm 10+
git clone https://github.com/sieugene/yomikomi
cd yomikomi
pnpm install
pnpm dev

pnpm dev # Start dev server (Turbopack)
pnpm build # Production build
pnpm start # Start production server
pnpm lint # ESLint check

The app works with the dictionary format used by Yomichan/Yomitan. Built-in templates:
| Template | Language | Format |
|---|---|---|
| jmdict_en | English | JMdict structured-content |
| jmdict_ru | Russian | JMdict plain |
| jmdict_es | Spanish | JMdict plain |
| jmdict_nl | Dutch | JMdict plain |
| nyars | Russian | Nyars structured-content |
| kanji_dict | n/a | Kanji + onyomi/kunyomi |
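The Format column separates plain glossaries from structured-content ones. Below is a minimal sketch of how a meaning parser might flatten both into display strings. It assumes a simplified Yomitan-style shape: a glossary item is either a string or an object whose `content` is a string, an array, or a node with its own `content`. The names `collectText` and `parseMeaning` are hypothetical and not taken from the project.

```typescript
// Assumed, simplified view of Yomitan-style structured content.
type ContentNode = string | ContentNode[] | { tag?: string; content?: ContentNode };

// Walk a structured-content tree and collect its non-empty text leaves.
function collectText(node: ContentNode | undefined, out: string[] = []): string[] {
  if (node === undefined) return out;
  if (typeof node === "string") {
    const text = node.trim();
    if (text !== "") out.push(text);
    return out;
  }
  if (Array.isArray(node)) {
    for (const child of node) collectText(child, out);
    return out;
  }
  return collectText(node.content, out);
}

// Parse one raw glossary cell: JSON when it parses, plain string otherwise.
function parseMeaning(raw: string): string[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return [raw]; // not JSON: treat the cell as a plain-string glossary
  }
  const items: unknown[] = Array.isArray(parsed) ? parsed : [parsed];
  const result: string[] = [];
  for (const item of items) {
    if (typeof item === "string") {
      result.push(item);
    } else if (item !== null && typeof item === "object") {
      // Structured-content object: flatten its content tree.
      result.push(...collectText((item as { content?: ContentNode }).content));
    } else {
      result.push(String(item));
    }
  }
  return result;
}
```

A custom JS parser for an unusual dictionary would replace `parseMeaning` with a function of the same signature.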
- Go to Dictionary → dictionary management
- Upload a .db dictionary file
- Select a template or configure a custom SQL query
Full control over search logic. Basic example:
SELECT DISTINCT *
FROM terms
WHERE "0" = ? OR "0" LIKE ? || '%'
ORDER BY CASE WHEN "0" = ? THEN 1 ELSE 2 END
LIMIT ?;

You can write any search logic: by reading, by kanji, partial match, etc.
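The basic query has four `?` placeholders, in this order: exact match, prefix for `LIKE`, exact match again for ranking, and the row limit. Here is a sketch of how they might be bound. The `StatementLike` and `DatabaseLike` interfaces are local stand-ins that mirror the `prepare`, `bind`, `step`, `getAsObject`, and `free` methods of sql.js. The names `buildSearchParams` and `searchTerms` and the default limit of 20 are assumptions for illustration.

```typescript
type SqlValue = string | number | null | Uint8Array;

// Minimal stand-ins shaped like a sql.js statement and database.
interface StatementLike {
  bind(values: SqlValue[]): boolean;
  step(): boolean;
  getAsObject(): Record<string, SqlValue>;
  free(): boolean;
}
interface DatabaseLike {
  prepare(sql: string): StatementLike;
}

const SEARCH_SQL = `
SELECT DISTINCT *
FROM terms
WHERE "0" = ? OR "0" LIKE ? || '%'
ORDER BY CASE WHEN "0" = ? THEN 1 ELSE 2 END
LIMIT ?;`;

// One value per placeholder: exact match, LIKE prefix, ranking match, limit.
function buildSearchParams(term: string, limit: number): SqlValue[] {
  return [term, term, term, limit];
}

function searchTerms(
  db: DatabaseLike,
  term: string,
  limit = 20,
): Record<string, SqlValue>[] {
  const stmt = db.prepare(SEARCH_SQL);
  const rows: Record<string, SqlValue>[] = [];
  try {
    stmt.bind(buildSearchParams(term, limit));
    while (stmt.step()) rows.push(stmt.getAsObject());
  } finally {
    stmt.free(); // release the statement even if stepping throws
  }
  return rows;
}
```

Exact matches sort first because of the `CASE` expression. A search term containing `%` or `_` acts as a wildcard inside `LIKE`, so it may need escaping.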
By default, OCR runs entirely in the browser. Images are never sent anywhere.
PaddleOcr (recommended) — fast and accurate for Japanese:
- Models: ONNX files in /public/ocr/
- Loaded automatically on first use
Tesseract.js — classic option, slower but stable:
- Dictionary files in /public/kuromoji/
OCR settings: Settings → OCR Settings
- Text orientation (horizontal / vertical / auto)
- Japanese vertical mode
- Engine selection
For higher quality recognition of complex pages, you can run a local OCR server.
cd server
docker-compose --env-file .env.paddle up --build

cd server
docker-compose --env-file .env.yomitoku up --build

The server runs on http://localhost:8000. In app settings:
- Disable "Client side OCR"
- Set API endpoint: http://localhost:8000
POST /ocr/ # Extract text from image
POST /ocr/with-positions/ # Extract text with block coordinates
GET /health # Server status
Both engines return the same response format — the frontend doesn't know the difference.
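A client for these endpoints might be sketched as follows. The request body format (for example, the multipart field name) is not stated here, so the sketch takes the body and the HTTP call as parameters instead of guessing. `HttpPost`, `ocrUrl`, `isOcrResponse`, and `requestOcr` are hypothetical names, and only the three top-level `OCRResponse` fields are checked.

```typescript
// Top-level response fields; the shape of each text block is not assumed here.
interface OCRResponse {
  full_text: string;
  text_blocks: unknown[];
  image_info: unknown;
}

// Injected HTTP layer, so the sketch does not depend on a particular fetch setup.
interface HttpResponse {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}
type HttpPost = (url: string, body: unknown) => Promise<HttpResponse>;

// Join the configured API endpoint with the OCR route, tolerating trailing slashes.
function ocrUrl(baseUrl: string, withPositions: boolean): string {
  const base = baseUrl.replace(/\/+$/, "");
  return base + (withPositions ? "/ocr/with-positions/" : "/ocr/");
}

// Runtime check that a parsed JSON value has the expected top-level fields.
function isOcrResponse(value: unknown): value is OCRResponse {
  if (value === null || typeof value !== "object") return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.full_text === "string" &&
    Array.isArray(v.text_blocks) &&
    "image_info" in v
  );
}

async function requestOcr(
  post: HttpPost,
  baseUrl: string,
  body: unknown,
  withPositions = true,
): Promise<OCRResponse> {
  const res = await post(ocrUrl(baseUrl, withPositions), body);
  if (!res.ok) throw new Error(`OCR server responded with HTTP ${res.status}`);
  const data = await res.json();
  if (!isOcrResponse(data)) throw new Error("Unexpected OCR response shape");
  return data;
}
```

In the browser, `post` could wrap `fetch` with `method: "POST"` and a `FormData` body. The field name the server expects should be taken from the server source.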
For reading manga or books:
- Albums → New Album
- Upload pages (up to 500 images, sorted by filename)
- Click Process — OCR runs in batches
- Browse pages, tap words to look them up in the dictionary
- Use Select Area to run OCR on a specific region of the image
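The upload and batch steps above can be sketched as below. Two things are assumptions: that filename sorting is "natural" (so `page2` comes before `page10`), and that each batch runs its pages concurrently before the next batch starts. The names `sortByFilename`, `toBatches`, and `processInBatches` are hypothetical, and `run` stands in for whichever OCR call is configured.

```typescript
// Natural filename ordering via numeric collation (an assumption about the sort).
function sortByFilename(names: string[]): string[] {
  const collator = new Intl.Collator(undefined, { numeric: true, sensitivity: "base" });
  return names.slice().sort(collator.compare);
}

// Split a list into consecutive chunks of at most `size` items.
function toBatches<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new RangeError("batch size must be at least 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Run batches one after another; items inside a batch run concurrently.
async function processInBatches<T, R>(
  items: T[],
  size: number,
  run: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = [];
  for (const batch of toBatches(items, size)) {
    const batchResults = await Promise.all(batch.map(run));
    results.push(...batchResults);
  }
  return results;
}
```

A small batch size keeps peak memory lower, which matters on mobile, at the cost of longer total processing time.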
Models run locally via WebAssembly (@xenova/transformers).
Activation:
- Go to Translator
- Click "Activate Models"
- First load: 30–60 seconds (models are cached after that)
Available pairs:
- 日本語 → English (opus-mt-ja-en)
- 日本語 → Russian (opus-mt-ja-en + opus-mt-en-ru)
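The Russian pair is a chain of two models, with English as the pivot. Composing translators this way might look like the sketch below. It does not call Transformers.js: each step is any async function from text to text, and `Translate` and `chain` are hypothetical names.

```typescript
// Any async text-to-text step, e.g. a wrapper around a loaded translation model.
type Translate = (text: string) => Promise<string>;

// Compose steps left to right: the output of one step is the input of the next.
function chain(...steps: Translate[]): Translate {
  return async (text: string) => {
    let current = text;
    for (const step of steps) {
      current = await step(current);
    }
    return current;
  };
}
```

With `jaToEn` and `enToRu` wrappers in place, `chain(jaToEn, enToRu)` would behave as a single Japanese → Russian translator. Pivot translation can compound errors from the first step, so results may be rougher than the direct English pair.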
iOS has browser memory limits and WebGPU support issues. Crashes may occur when using heavy translation models. Client-side OCR (PaddleOcr) is generally stable but can also run into memory limits.
Tip: Add the site to your Home Screen via Safari → Share → Add to Home Screen. The app opens fullscreen without the browser UI.
Translation models are 75–150 MB. On memory-constrained devices, crashes are possible; this is a known iOS Safari issue with large WebAssembly/ONNX models.
- Next.js 16 (App Router) + TypeScript
- Tailwind CSS v4 + Radix UI + shadcn/ui
- sql.js — SQLite compiled to WASM, runs in the browser
- kuromoji.js — Japanese morphological analyzer (WASM)
- @xenova/transformers — Hugging Face Transformers.js (ONNX Runtime Web)
- SWR for data fetching and state synchronization
- IndexedDB for storing albums, images, and dictionaries
- protobufjs for parsing Anki .apkg files
Working with WASM in Next.js is non-trivial due to SSR. Here's how each dependency is handled:
sql.js is loaded via dynamic import inside a context provider (SqlJsProvider) to prevent SSR crashes. The WASM binary is served from /public/.
Kuromoji dictionary files are placed in /public/kuromoji/ and loaded via standard fetch — Kuromoji supports URL-based loading natively.
@xenova/transformers is connected through a custom adapter (/public/transformers/transformers-adapter.js) and injected as a <Script> tag with strategy="afterInteractive". This sidesteps SSR issues and makes window.__transformers available only on the client.
PaddleOcr (ONNX Runtime Web) follows the same pattern — loaded via Script tag, with model files in /public/ocr/.
General principle: all heavy WASM dependencies are isolated behind context providers and loaded lazily on the client only. Nothing runs during SSR.
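The general principle can be sketched as a small memoizing loader. This is an illustration under assumptions, not the project's provider code: `createClientLoader` is a hypothetical helper that refuses to load outside the browser, starts the load on first use, and reuses the same promise afterwards.

```typescript
// Wrap an expensive async load so it runs at most once, and only on the client.
function createClientLoader<T>(
  load: () => Promise<T>,
  // Injected so the check can be overridden; defaults to "is there a window?".
  isClient: () => boolean = () => "window" in globalThis,
): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (!isClient()) {
      return Promise.reject(new Error("client-only module requested during SSR"));
    }
    if (!cached) {
      cached = load().catch((err) => {
        cached = undefined; // allow a retry after a failed load
        throw err;
      });
    }
    return cached;
  };
}
```

A context provider could call such a loader inside an effect, for example wrapping a dynamic import of sql.js, so that nothing heavy is evaluated while the page renders on the server.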
src/
├── app/ # Next.js App Router pages
│ └── app/
│   ├── albums/ # Album list
│   ├── album/[id]/ # Album viewer
│   ├── dict/ # Dictionary page
│   ├── translator/ # Translator
│   ├── ocr-capture/ # OCR from photo
│   ├── favorites/ # Saved words
│   ├── anki-import/ # Anki import
│   └── settings/ # Settings
│
├── features/ # Business logic (Feature-Sliced Design)
│ ├── dictionary/ # Dictionary management, SQL search
│ ├── ocr/ # OCR logic and adapters
│ ├── ocr-album/ # Albums + batch processing
│ ├── ocr-capture/ # Area selection + OCR
│ ├── ocr-client/ # Client OCR engines (PaddleOcr, Tesseract)
│ ├── ocr-settings/ # OCR settings
│ ├── translation/ # Translator (Transformers.js)
│ ├── tokenizer/ # Kuromoji + dictionary enrichment
│ ├── favorite-words/ # Favorite words
│ └── storage/ # BaseStoreManager (IndexedDB abstraction)
│
├── entities/ # Business-aware components
│ ├── OcrViewer/ # OCR result viewer
│ ├── DictionaryLookup/ # Dictionary search UI
│ └── OcrCompactDictionary/
│
├── views/ # Page compositions
└── shared/ # Utilities, UI kit, routes
server/ # Python FastAPI OCR server
├── src/
│ ├── main.py
│ ├── ocr_paddle.py
│ ├── ocr_yomitoku.py
│ └── schemas/
├── Dockerfile-paddle
├── Dockerfile-yomitoku
└── docker-compose.yml
Everything is stored locally in the browser:
| Data | Storage |
|---|---|
| Albums and images | IndexedDB (OCRAlbumDB) |
| Dictionary files | IndexedDB (DictionaryManagerDB) |
| Custom dictionary templates | localStorage |
| OCR settings | localStorage |
| Translation settings | localStorage |
| Favorite words | localStorage |
| Translation models | Browser Cache API |
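For the localStorage rows, a typed read/write helper with defaults might look like the sketch below. The key name, the `OcrSettings` fields, and the default values are assumptions built from the settings described in this document. `StorageLike` covers the two `localStorage` methods used, so the helper also works with an in-memory stand-in.

```typescript
// The subset of the Web Storage API this sketch relies on.
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical settings shape based on the options listed in this document.
type TextOrientation = "horizontal" | "vertical" | "auto";
interface OcrSettings {
  isClientSide: boolean;
  orientation: TextOrientation;
  apiEndpoint: string;
}

const DEFAULT_OCR_SETTINGS: OcrSettings = {
  isClientSide: true, // OCR runs in the browser by default
  orientation: "auto", // assumed default
  apiEndpoint: "http://localhost:8000",
};

// Read settings, falling back to defaults for missing keys or corrupt JSON.
function loadSettings<T extends object>(storage: StorageLike, key: string, defaults: T): T {
  const raw = storage.getItem(key);
  if (raw === null) return { ...defaults };
  try {
    const parsed: unknown = JSON.parse(raw);
    if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
      return { ...defaults };
    }
    return { ...defaults, ...(parsed as Partial<T>) };
  } catch {
    return { ...defaults };
  }
}

function saveSettings<T extends object>(storage: StorageLike, key: string, value: T): void {
  storage.setItem(key, JSON.stringify(value));
}
```

Merging over defaults means a setting added in a later version still gets a value when older stored data is read.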
Image file
│
├─ isClientSide=true ──► PaddleOcr (ONNX) ──► adaptPaddleOCR()
│                    └─► Tesseract.js     ──► adaptTesseractResult()
│
└─ isClientSide=false ─► fetch → Docker API ──► OCRResponse (native format)
                                      │
                            PaddleOCR / YomiToku

OCRResponse { full_text, text_blocks[], image_info }
All sources return the same OCRResponse format — the UI layer doesn't know or care where the result came from.
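An adapter in this design maps an engine's native result onto the shared shape. In the sketch below only the three top-level field names come from this document. The `TextBlock` and `image_info` fields, the `EngineLine` input, and the `adaptEngineLines` name are hypothetical and do not describe the real adapters.

```typescript
// Hypothetical block shape: text plus an [x0, y0, x1, y1] bounding box.
interface TextBlock {
  text: string;
  bbox: [number, number, number, number];
}

// Top-level field names from the document; nested fields are assumptions.
interface OCRResponse {
  full_text: string;
  text_blocks: TextBlock[];
  image_info: { width: number; height: number };
}

// Hypothetical native output of a client-side engine: one entry per text line.
interface EngineLine {
  text: string;
  x0: number;
  y0: number;
  x1: number;
  y1: number;
}

// Normalize engine lines into the shared response, dropping empty lines.
function adaptEngineLines(lines: EngineLine[], width: number, height: number): OCRResponse {
  const text_blocks: TextBlock[] = [];
  for (const line of lines) {
    const text = line.text.trim();
    if (text === "") continue;
    text_blocks.push({ text, bbox: [line.x0, line.y0, line.x1, line.y1] });
  }
  return {
    full_text: text_blocks.map((block) => block.text).join("\n"),
    text_blocks,
    image_info: { width, height },
  };
}
```

Keeping the conversion in one function per engine means an engine can be added or swapped without touching the UI.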
PRs and issues are welcome. If you found a bug or have an idea, open an Issue.
pnpm dev # Turbopack, hot reload
pnpm lint # Run before committing

MIT
Copyright (c) 2026 sieugene