Feature/issue 136 wasm ocr #261

Open
Subhra-Nandi wants to merge 5 commits into sahoo-tech:main from Subhra-Nandi:feature/issue-136-wasm-ocr
@Subhra-Nandi
🔗 Related Issue

Closes #136


📝 Summary of Changes

Implements a complete WebAssembly-based OCR pipeline that runs entirely
in the browser using tesseract.js@5, with no backend call required.

  • frontend/workers/ocr_worker.js — Web Worker that loads Tesseract WASM,
    caches English language data in IndexedDB, and processes ImageData frames.
    Message protocol: recognize -> result | error. Sends ready (or init_error) on init.

  • frontend/utils/ocr_client.js: OCRClient class wrapping the worker
    with a Promise-based API. recognize(imageData) returns a Promise matched
    by UUID. isReady(), waitUntilReady(), and terminate() manage lifecycle.

  • frontend/renderer/app.js — Connects to backend WebSocket. On disconnect,
    falls back to local OCR every 2s. Overlay shows "OCR: Backend (online)" or
    "OCR: Local (offline)".

  • docs/browser_ocr_compatibility.md — Full compatibility matrix,
    performance benchmarks, IndexedDB cache behaviour, and fallback strategy.
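
For reviewers, here is a minimal sketch of how a Promise-based wrapper can match worker replies to requests by ID, as the ocr_client.js bullet describes. It is not the code in this PR. The class name SketchOCRClient, the helper makeRequestId, and the message fields (type, id, imageData, result, message) are assumptions made for illustration; only the recognize -> result | error flow, the ready signal, the UUID fallback, and the terminate behaviour come from the description above.

```javascript
// Sketch only: the message fields (type, id, imageData, result, message)
// are assumptions, not the actual protocol used by ocr_client.js.

// Prefer crypto.randomUUID(); fall back to a timestamp/random/counter id.
let fallbackCounter = 0;
function makeRequestId() {
  const c = globalThis.crypto;
  if (c && typeof c.randomUUID === 'function') {
    return c.randomUUID();
  }
  fallbackCounter += 1;
  const rand = Math.random().toString(36).slice(2);
  return `${Date.now().toString(36)}-${rand}-${fallbackCounter}`;
}

class SketchOCRClient {
  // `worker` is anything with postMessage(), terminate() and an onmessage slot.
  constructor(worker) {
    this.worker = worker;
    this.ready = false;
    this.pending = new Map(); // request id -> { resolve, reject }
    this.worker.onmessage = (event) => this.handleMessage(event.data);
  }

  isReady() {
    return this.ready;
  }

  // Post a recognize request and return a Promise settled by the matching reply.
  recognize(imageData) {
    const id = makeRequestId();
    return new Promise((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
      this.worker.postMessage({ type: 'recognize', id, imageData });
    });
  }

  handleMessage(msg) {
    if (msg.type === 'ready') {
      this.ready = true;
      return;
    }
    const entry = this.pending.get(msg.id);
    if (!entry) return; // unknown or already settled request
    this.pending.delete(msg.id);
    if (msg.type === 'result') {
      entry.resolve(msg.result);
    } else {
      entry.reject(new Error(msg.message));
    }
  }

  // Stop the worker and reject everything still in flight.
  terminate() {
    this.worker.terminate();
    for (const { reject } of this.pending.values()) {
      reject(new Error('OCR worker terminated'));
    }
    this.pending.clear();
  }
}
```

Because each reply carries the ID of its request, concurrent recognize() calls can settle in any order without being mixed up.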


🔍 Type of Change

  • ✨ New feature (non-breaking change that adds functionality)
  • 📖 Documentation update
  • 🧪 Test addition or improvement

🧪 How Was This Tested?

cd frontend
npm test

Test environment:

  • OS: Windows 11
  • Python version: 3.12.5
  • Node version: (run node --version and fill in)

📸 Screenshots / Recordings (if applicable)

Screenshot 2026-05-28 121954

…-tech#136)

- Load tesseract.js inside Web Worker (off main thread)
- Cache English language data in IndexedDB on first load
- Message protocol: recognize -> result | error
- Sends ready/init_error on startup
- Handles ImageData input; returns text, confidence, per-word bbox
- recognize(imageData) -> Promise<OCRResult> matched by UUID
- isReady() / waitUntilReady() track worker init state
- terminate() cleans up worker and rejects all pending promises
- UUID fallback for browsers without crypto.randomUUID()
- Connect to backend WebSocket ws://localhost:8000/ws/guidance
- On disconnect: switch to local OCR polling every 2s
- Overlay status: OCR Backend online / OCR Local offline
- Auto-reconnect with 3s delay
- Mock Worker via FakeWorker class
- Tests: isReady, waitUntilReady, recognize success/error/concurrent
- Tests: terminate worker.terminate called, pending rejected
- No real WASM or network calls
- 12 passed, 0 failed
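
The disconnect handling listed above (local OCR polling every 2s, reconnect after 3s, overlay status) could be wired roughly as in the sketch below. This is an illustration under assumptions, not the code in app.js: createFallbackController and its parameters (connect, runLocalOcr, setStatus, timers) are hypothetical names, and the timers are injected only so the logic can be exercised without a real WebSocket.

```javascript
// Sketch only: createFallbackController and its parameters are hypothetical.
// The intervals and status strings are taken from the PR description.
const LOCAL_OCR_INTERVAL_MS = 2000; // local OCR polling period
const RECONNECT_DELAY_MS = 3000;    // delay before retrying the backend

function createFallbackController({ connect, runLocalOcr, setStatus, timers = globalThis }) {
  let pollHandle = null; // non-null while local OCR polling is active

  // Backend is reachable again: stop local polling.
  function onOpen() {
    if (pollHandle !== null) {
      timers.clearInterval(pollHandle);
      pollHandle = null;
    }
    setStatus('OCR: Backend (online)');
  }

  // Backend dropped: start local polling once and schedule a reconnect.
  function onClose() {
    setStatus('OCR: Local (offline)');
    if (pollHandle === null) {
      pollHandle = timers.setInterval(runLocalOcr, LOCAL_OCR_INTERVAL_MS);
    }
    timers.setTimeout(connect, RECONNECT_DELAY_MS);
  }

  return { onOpen, onClose };
}
```

In a browser, connect would typically create a new WebSocket for ws://localhost:8000/ws/guidance and assign onOpen and onClose to its onopen and onclose handlers. The guard on pollHandle keeps repeated close events from stacking up multiple polling intervals.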
…hoo-tech#136)

- Chrome 88+, Firefox 79+, Edge 88+, Safari 15.2+ supported
- Required APIs table: Worker, WASM, IndexedDB, ImageData, crypto
- Performance: warm cache target 800ms on 1920x1080
- IndexedDB cache behaviour and fallback strategy documented
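
As a companion to the required-APIs table, a page could check for those APIs before starting the worker. The sketch below is an assumption about how such a check might look, not something taken from the docs or code in this PR. The function name missingOcrApis is hypothetical, and crypto is checked only for presence because the client already falls back when crypto.randomUUID() is missing.

```javascript
// Sketch only: missingOcrApis is a hypothetical helper, not part of this PR.
// It reports which of the APIs named in the compatibility doc are absent.
function missingOcrApis(scope = globalThis) {
  const checks = {
    Worker: typeof scope.Worker === 'function',
    WebAssembly: typeof scope.WebAssembly === 'object' && scope.WebAssembly !== null,
    IndexedDB: scope.indexedDB != null,
    ImageData: typeof scope.ImageData === 'function',
    crypto: scope.crypto != null,
  };
  // Keep only the names whose check failed, in the order listed above.
  return Object.entries(checks)
    .filter(([, supported]) => !supported)
    .map(([name]) => name);
}
```

An empty array means every listed API is present; otherwise the returned names could be shown in the overlay so the user knows why local OCR is unavailable.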
Successfully merging this pull request may close these issues.

Build a WebAssembly-Based OCR Worker for Browser Compatibility