読み込み Yomikomi

A toolkit for Japanese learners — dictionary, OCR, translator, and Anki right in your browser. No extensions. No servers. No data sent anywhere.

🇯🇵 Japanese version of this README: README-ja.md

What is this?

Yomikomi is a web app that solves the typical pain of reading Japanese:

Take a photo of a manga or book page
Run it through OCR directly in the browser (no Google OCR)
Tap an unknown word — get the translation from your own dictionary or kanji readings
Add the word to favorites
Repeat

The idea is similar to the Yomitan browser extension, but works without any extension — which means it works on mobile too. On iPhone you can add it to your Home Screen and use it like a native app.

Features

📚 Dictionary

Search JMdict (English, Russian, Spanish, Dutch) and kanji dictionaries
Custom SQL queries — full control over how your dictionary is searched
Custom meaning parsers (JSON, plain string, custom JS function)
Powered by sql.js (SQLite compiled to WASM) — no backend required

🔍 OCR

Client-side OCR — images never leave your device
- PaddleOcr (ONNX, recommended)
- Tesseract.js
Server-side OCR — connect your own Docker container with PaddleOCR or YomiToku
Albums: upload a batch of pages, process them all, browse with dictionary lookup
Area selection on images for precise OCR of specific regions

🌐 Translator

Local machine translation models running in the browser (@xenova/transformers)
Japanese → English (opus-mt-ja-en)
Japanese → Russian (chained: ja→en→ru)
Models are downloaded once, cached, and work offline

📦 Anki

Parse .apkg files directly in the browser
Browse decks without Anki Desktop

🔤 Tokenization

Kuromoji for morphological analysis
Tokens are enriched from loaded dictionaries via n-gram lookup
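
The README does not spell out how the n-gram lookup works. One plausible reading is a greedy longest match over adjacent tokens: join up to N neighbouring token surfaces and keep the longest string the dictionary knows. The sketch below assumes that reading. `enrichWithNgrams`, `hasEntry`, and `maxN` are illustrative names, not the project's API, and tokens are plain strings rather than real Kuromoji tokens.

```typescript
interface EnrichedSpan {
  text: string;          // joined surface of the covered tokens
  start: number;         // index of the first covered token
  end: number;           // index one past the last covered token
  inDictionary: boolean; // true if the joined text has a dictionary entry
}

// Greedy longest-match: at each position, try the longest n-gram first.
function enrichWithNgrams(
  tokens: string[],
  hasEntry: (term: string) => boolean, // hypothetical dictionary lookup
  maxN = 4,                            // hypothetical upper bound on n-gram size
): EnrichedSpan[] {
  const spans: EnrichedSpan[] = [];
  let i = 0;
  while (i < tokens.length) {
    let matched = false;
    const longest = Math.min(maxN, tokens.length - i);
    for (let n = longest; n >= 1; n--) {
      const text = tokens.slice(i, i + n).join("");
      if (hasEntry(text)) {
        spans.push({ text, start: i, end: i + n, inDictionary: true });
        i += n;
        matched = true;
        break;
      }
    }
    if (!matched) {
      // No entry at any length: keep the single token as an unknown span.
      const text = tokens.slice(i, i + 1).join("");
      spans.push({ text, start: i, end: i + 1, inDictionary: false });
      i += 1;
    }
  }
  return spans;
}
```

With a dictionary containing 日本語, を, and 読み込む, the token list 日本 / 語 / を / 読み / 込む would collapse into three spans, so a word split by the tokenizer can still resolve to one dictionary entry.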

Quick Start

Requirements

Node.js 20+
pnpm 10+

Setup

git clone https://github.com/sieugene/yomikomi
cd yomikomi
pnpm install
pnpm dev

Open http://localhost:3000

Commands

pnpm dev        # Start dev server (Turbopack)
pnpm build      # Production build
pnpm start      # Start production server
pnpm lint       # ESLint check

Dictionaries

The app works with the dictionary format used by Yomichan/Yomitan. Built-in templates:

| Template | Language | Format |
| --- | --- | --- |
| `jmdict_en` | English | JMdict structured-content |
| `jmdict_ru` | Russian | JMdict plain |
| `jmdict_es` | Spanish | JMdict plain |
| `jmdict_nl` | Dutch | JMdict plain |
| `nyars` | Russian | Nyars structured-content |
| `kanji_dict` | n/a | Kanji + onyomi/kunyomi |

Adding a dictionary

Go to Dictionary → dictionary management
Upload a .db dictionary file
Select a template or configure a custom SQL query
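
The features list mentions three kinds of meaning parsers (JSON, plain string, custom JS function). The README does not show their shape, so the following is only a sketch of how such a parser setting could be modelled. The `MeaningParser` type and `parseMeanings` function are hypothetical names, and the assumption is that each dictionary row stores its meaning as a single string.

```typescript
// Hypothetical parser configuration: one variant per parser kind from the README.
type MeaningParser =
  | { kind: "plain" }
  | { kind: "json" }
  | { kind: "custom"; parse: (raw: string) => string[] };

// Turn the raw meaning column of a dictionary row into a list of glosses.
function parseMeanings(raw: string, parser: MeaningParser): string[] {
  switch (parser.kind) {
    case "plain": {
      const trimmed = raw.trim();
      return trimmed === "" ? [] : [trimmed];
    }
    case "json": {
      let value: unknown;
      try {
        value = JSON.parse(raw);
      } catch {
        return [raw]; // not valid JSON: fall back to the raw string
      }
      const items = Array.isArray(value) ? value : [value];
      // Non-string items (e.g. structured content) are kept as JSON text.
      return items.map((item) => (typeof item === "string" ? item : JSON.stringify(item)));
    }
    case "custom":
      return parser.parse(raw);
  }
}
```

For example, a custom parser such as `{ kind: "custom", parse: (raw) => raw.split(";") }` would split a semicolon-separated meaning column into separate glosses.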

Custom SQL queries

Full control over search logic. Basic example:

SELECT DISTINCT *
FROM terms
WHERE "0" = ? OR "0" LIKE ? || '%'
ORDER BY CASE WHEN "0" = ? THEN 1 ELSE 2 END
LIMIT ?;

You can write any search logic — by reading, by kanji, partial match, etc.
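
The basic query has four positional `?` placeholders. A reasonable guess, not confirmed by the README, is that they are bound in order as: the search term (exact match), the search term again (prefix match), the search term once more (ranking exact matches first), and the result limit. The sketch below builds that parameter list. `buildSearchParams`, `countPlaceholders`, and the default limit of 20 are illustrative assumptions.

```typescript
// The basic query from this section, unchanged.
const SEARCH_SQL = `
SELECT DISTINCT *
FROM terms
WHERE "0" = ? OR "0" LIKE ? || '%'
ORDER BY CASE WHEN "0" = ? THEN 1 ELSE 2 END
LIMIT ?;
`;

type SqlParam = string | number;

// Assumed binding order: exact match, prefix match, ranking, limit.
function buildSearchParams(term: string, limit = 20): SqlParam[] {
  return [term, term, term, limit];
}

// Sanity check helper: the parameter list must match the number of "?" marks.
function countPlaceholders(sql: string): number {
  return (sql.match(/\?/g) ?? []).length;
}
```

An array like this could then be passed as the bind parameters of a sql.js statement. Note that `%` and `_` inside the term act as wildcards in the `LIKE` branch, so a custom query may want to escape them.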

OCR: Client-side mode

By default, OCR runs entirely in the browser. Images are never sent anywhere.

PaddleOcr (recommended) — fast and accurate for Japanese:

Models: ONNX files in /public/ocr/
Loaded automatically on first use

Tesseract.js — classic option, slower but stable:

Dictionary files in /public/kuromoji/

OCR settings: Settings → OCR Settings

Text orientation (horizontal / vertical / auto)
Japanese vertical mode
Engine selection

OCR: Server mode (Docker)

For higher-quality recognition of complex pages, you can run a local OCR server.

PaddleOCR (lightweight, ~2GB)

cd server
docker-compose --env-file .env.paddle up --build

YomiToku (heavier, ~4GB, better for complex layouts)

cd server
docker-compose --env-file .env.yomitoku up --build

The server runs on http://localhost:8000. In app settings:

Disable "Client side OCR"
Set API endpoint: http://localhost:8000

Server API endpoints

POST /ocr/                 # Extract text from image
POST /ocr/with-positions/  # Extract text with block coordinates
GET  /health               # Server status

Both engines return the same response format — the frontend doesn't know the difference.
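
As a rough illustration of how a client might address these endpoints and check the shared response shape, here is a small sketch. The README does not document the request body (for example, the multipart field name) or the fields inside `text_blocks` and `image_info`, so the sketch covers only URL construction and a shallow shape check. `ocrRequest` and `isOcrResponse` are hypothetical helper names.

```typescript
// Top-level response shape as listed in the OCR Pipeline section.
interface OCRResponse {
  full_text: string;
  text_blocks: unknown[]; // block fields are not documented here
  image_info: unknown;    // likewise undocumented
}

type OcrRoute = "text" | "with-positions" | "health";

const ROUTES: Record<OcrRoute, { method: "GET" | "POST"; path: string }> = {
  text: { method: "POST", path: "/ocr/" },
  "with-positions": { method: "POST", path: "/ocr/with-positions/" },
  health: { method: "GET", path: "/health" },
};

// Build the method and URL for a route, tolerating a trailing slash in baseUrl.
function ocrRequest(baseUrl: string, route: OcrRoute): { method: "GET" | "POST"; url: string } {
  const { method, path } = ROUTES[route];
  return { method, url: baseUrl.replace(/\/+$/, "") + path };
}

// Shallow runtime check that a parsed JSON body looks like an OCRResponse.
function isOcrResponse(value: unknown): value is OCRResponse {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v["full_text"] === "string" &&
    Array.isArray(v["text_blocks"]) &&
    "image_info" in v
  );
}
```

Validating the body at the boundary is one way to keep the rest of the frontend unaware of which engine produced the result.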

Albums (batch OCR)

For reading manga or books:

Albums → New Album
Upload pages (up to 500 images, sorted by filename)
Click Process — OCR runs in batches
Browse pages, tap words to look them up in the dictionary
Use Select Area to run OCR on a specific region of the image
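
The upload and processing steps above can be sketched as two small helpers: one that orders pages by filename and one that splits them into batches. Whether the app sorts numerically (so that `page2` comes before `page10`) and what batch size it uses are not stated in this README. Both are assumptions here, as are the names `sortPages` and `toBatches`.

```typescript
const MAX_PAGES = 500; // album limit stated above

// Numeric-aware comparison so "page2.png" sorts before "page10.png" (assumed behaviour).
const byFilename = new Intl.Collator(undefined, { numeric: true, sensitivity: "base" });

function sortPages(filenames: string[]): string[] {
  if (filenames.length > MAX_PAGES) {
    throw new RangeError(`an album holds at most ${MAX_PAGES} images`);
  }
  return [...filenames].sort(byFilename.compare);
}

// Split items into consecutive batches of at most batchSize elements.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}
```

Processing one batch at a time keeps only a few decoded images in memory at once, which matters on the memory-constrained devices discussed under Known Limitations.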

Translator

Models run locally via WebAssembly (@xenova/transformers).

Activation:

Go to Translator
Click "Activate Models"
First load: 30–60 seconds (models are cached after that)

Available pairs:

Japanese → English (opus-mt-ja-en)
Japanese → Russian (opus-mt-ja-en + opus-mt-en-ru)
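
The Japanese → Russian pair pivots through English by running two models in sequence. A minimal sketch of that chaining, assuming each model is wrapped as an async `text => text` function. The `Translate` type, `chain`, and the stand-in translators are illustrative and do not call Transformers.js.

```typescript
type Translate = (text: string) => Promise<string>;

// Compose translators left to right: the output of one is the input of the next.
function chain(...steps: Translate[]): Translate {
  return async (text) => {
    let current = text;
    for (const step of steps) {
      current = await step(current);
    }
    return current;
  };
}

// Stand-in translators so the sketch runs without downloading any model.
const fakeJaToEn: Translate = async (text) => `en(${text})`;
const fakeEnToRu: Translate = async (text) => `ru(${text})`;

// ja → en → ru, mirroring opus-mt-ja-en followed by opus-mt-en-ru.
const jaToRu = chain(fakeJaToEn, fakeEnToRu);
```

Because the second model only sees the English output, any mistake in the first step carries over into the Russian result, which is the usual trade-off of pivot translation.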

Known Limitations

iOS / iPhone

iOS restricts browser memory and has WebGPU support issues, so heavy translation models may crash the page. Client-side OCR (PaddleOcr) is generally stable, but it can also hit the memory limit.

Tip: Add the site to your Home Screen via Safari → Share → Add to Home Screen. The app opens fullscreen without the browser UI.

Translator on mobile

Translation models are 75–150MB. On memory-constrained devices crashes are possible — this is a known iOS Safari issue with large WebAssembly/ONNX models.

Architecture (for developers)

Tech Stack

Next.js 16 (App Router) + TypeScript
Tailwind CSS v4 + Radix UI + shadcn/ui
sql.js — SQLite compiled to WASM, runs in the browser
kuromoji.js for Japanese morphological analysis (pure JavaScript, not WASM)
@xenova/transformers — Hugging Face Transformers.js (ONNX Runtime Web)
SWR for data fetching and state synchronization
IndexedDB for storing albums, images, and dictionaries
protobufjs — parsing Anki .apkg files

WASM in Next.js

Working with WASM in Next.js is non-trivial due to SSR. Here's how each dependency is handled:

sql.js is loaded via dynamic import inside a context provider (SqlJsProvider) to prevent SSR crashes. The WASM binary is served from /public/.

Kuromoji dictionary files are placed in /public/kuromoji/ and loaded via standard fetch — Kuromoji supports URL-based loading natively.

@xenova/transformers is connected through a custom adapter (/public/transformers/transformers-adapter.js) and injected as a <Script> tag with strategy="afterInteractive". This sidesteps SSR issues and makes window.__transformers available only on the client.

PaddleOcr (ONNX Runtime Web) follows the same pattern — loaded via Script tag, with model files in /public/ocr/.

General principle: all heavy WASM dependencies are isolated behind context providers and loaded lazily on the client only. Nothing runs during SSR.
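
The general principle can be illustrated with a small memoized loader that refuses to run on the server and starts the real load only once. This is a sketch under assumptions, not the project's `SqlJsProvider`: the `createClientOnlyLoader` name and the injectable `isClient` check are hypothetical.

```typescript
// Wrap an expensive async load so it runs at most once, and only in the browser.
function createClientOnlyLoader<T>(
  load: () => Promise<T>,
  // Injectable for testing; by default, treat "window exists" as "running on the client".
  isClient: () => boolean = () => "window" in globalThis,
): () => Promise<T> {
  let pending: Promise<T> | undefined;
  return () => {
    if (!isClient()) {
      return Promise.reject(new Error("client-only module requested during SSR"));
    }
    if (!pending) {
      pending = load().catch((err) => {
        pending = undefined; // forget the failure so a later call can retry
        throw err;
      });
    }
    return pending;
  };
}
```

In a setup like the one described above, `load` would typically be a dynamic import such as `() => import("sql.js")`, called from an effect inside a context provider so that it never executes during server rendering.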

Project Structure

src/
├── app/                    # Next.js App Router pages
│   └── app/
│       ├── albums/         # Album list
│       ├── album/[id]/     # Album viewer
│       ├── dict/           # Dictionary page
│       ├── translator/     # Translator
│       ├── ocr-capture/    # OCR from photo
│       ├── favorites/      # Saved words
│       ├── anki-import/    # Anki import
│       └── settings/       # Settings
│
├── features/               # Business logic (Feature-Sliced Design)
│   ├── dictionary/         # Dictionary management, SQL search
│   ├── ocr/                # OCR logic and adapters
│   ├── ocr-album/          # Albums + batch processing
│   ├── ocr-capture/        # Area selection + OCR
│   ├── ocr-client/         # Client OCR engines (PaddleOcr, Tesseract)
│   ├── ocr-settings/       # OCR settings
│   ├── translation/        # Translator (Transformers.js)
│   ├── tokenizer/          # Kuromoji + dictionary enrichment
│   ├── favorite-words/     # Favorite words
│   └── storage/            # BaseStoreManager (IndexedDB abstraction)
│
├── entities/               # Business-aware components
│   ├── OcrViewer/          # OCR result viewer
│   ├── DictionaryLookup/   # Dictionary search UI
│   └── OcrCompactDictionary/
│
├── views/                  # Page compositions
└── shared/                 # Utilities, UI kit, routes

server/                     # Python FastAPI OCR server
├── src/
│   ├── main.py
│   ├── ocr_paddle.py
│   ├── ocr_yomitoku.py
│   └── schemas/
├── Dockerfile-paddle
├── Dockerfile-yomitoku
└── docker-compose.yml

Data Storage

Everything is stored locally in the browser:

| Data | Storage |
| --- | --- |
| Albums and images | IndexedDB (`OCRAlbumDB`) |
| Dictionary files | IndexedDB (`DictionaryManagerDB`) |
| Custom dictionary templates | localStorage |
| OCR settings | localStorage |
| Translation settings | localStorage |
| Favorite words | localStorage |
| Translation models | Browser Cache API |
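
For the localStorage-backed rows, one common pattern is to store settings as JSON and merge them over defaults when reading, so that missing or corrupted entries do not break the app. The sketch below assumes that pattern. The `OcrSettings` fields, the storage key, and the helper names are hypothetical, and the store is passed in so the code also runs outside a browser.

```typescript
// Minimal subset of the Web Storage interface that the helpers need.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Hypothetical shape, loosely based on the options listed under OCR settings.
interface OcrSettings {
  engine: "paddle" | "tesseract";
  orientation: "horizontal" | "vertical" | "auto";
  clientSide: boolean;
}

const OCR_SETTINGS_KEY = "ocr-settings"; // hypothetical key name
const DEFAULT_OCR_SETTINGS: OcrSettings = {
  engine: "paddle",
  orientation: "auto",
  clientSide: true,
};

// Read settings, falling back to defaults for missing keys or unreadable data.
function loadSettings<T extends object>(store: KeyValueStore, key: string, defaults: T): T {
  const raw = store.getItem(key);
  if (raw === null) return { ...defaults };
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
      return { ...defaults };
    }
    return { ...defaults, ...(parsed as Partial<T>) };
  } catch {
    return { ...defaults }; // corrupted JSON: ignore it
  }
}

function saveSettings<T extends object>(store: KeyValueStore, key: string, value: T): void {
  store.setItem(key, JSON.stringify(value));
}
```

In the browser, `window.localStorage` satisfies `KeyValueStore` directly, so it could be passed as the `store` argument.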

OCR Pipeline

Image file
    │
    ├─ isClientSide=true ──► PaddleOcr (ONNX) ──► adaptPaddleOCR()
    │                    └─► Tesseract.js        ──► adaptTesseractResult()
    │
    └─ isClientSide=false ─► fetch → Docker API  ──► OCRResponse (native format)
                                          │
                               PaddleOCR / YomiToku

OCRResponse { full_text, text_blocks[], image_info }

All sources return the same OCRResponse format — the UI layer doesn't know or care where the result came from.
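
To make the adapter idea concrete, here is a sketch of normalizing one engine's output into the shared shape. It is not the project's `adaptPaddleOCR()` or `adaptTesseractResult()`: the `EngineLine` input format, the `bbox` layout, and the `image_info` fields are all assumptions, since this README only names the three top-level fields.

```typescript
// Assumed block and image fields; only the top-level names come from the README.
interface TextBlock {
  text: string;
  bbox: [number, number, number, number]; // [left, top, right, bottom] in pixels
}

interface OCRResponse {
  full_text: string;
  text_blocks: TextBlock[];
  image_info: { width: number; height: number };
}

// Hypothetical raw output of some client-side engine.
interface EngineLine {
  text: string;
  box: { x: number; y: number; w: number; h: number };
}

// Convert engine-specific lines into the shared OCRResponse shape.
function adaptEngineResult(
  lines: EngineLine[],
  image: { width: number; height: number },
): OCRResponse {
  const text_blocks: TextBlock[] = lines
    .filter((line) => line.text.trim() !== "") // drop empty detections
    .map((line) => ({
      text: line.text.trim(),
      bbox: [line.box.x, line.box.y, line.box.x + line.box.w, line.box.y + line.box.h],
    }));
  return {
    full_text: text_blocks.map((block) => block.text).join("\n"),
    text_blocks,
    image_info: { width: image.width, height: image.height },
  };
}
```

Keeping one such function per engine means a new OCR backend only needs its own adapter, while viewers and dictionary lookup keep consuming `OCRResponse`.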

Contributing

PRs and issues are welcome. If you found a bug or have an idea, open an Issue.

pnpm dev    # Turbopack, hot reload
pnpm lint   # Run before committing
