Real-time on-device glasses detection for the browser and Node.js
FrameFind detects whether someone is wearing glasses in real time, running entirely on-device. No frames are sent to any server — the ONNX model runs locally in the browser via WASM or in Node.js via the native runtime.
Most vision APIs need a round-trip to a server: your frame leaves the device, gets processed, and comes back. That adds latency, costs money per call, exposes biometric data, and breaks offline.
FrameFind runs the model on the device itself:
- No network latency — inference happens on the same machine that captured the frame
- Privacy by default — camera data never leaves the device
- No usage costs — once the model is cached (~6.2 MB), every inference is free
- Works offline — no connection required after first load
Measured on Chrome 124 / MacBook M2 with WASM backend, 200+ frames:
| Metric | Value |
|---|---|
| Median inference | ~27 ms |
| p95 inference | ~35 ms |
| Model size | 6.2 MB |
| Load time (model cached) | <50 ms |
| Input resolution | 112 × 112 |

Browser support:

| Browser | WASM | WebGPU |
|---|---|---|
| Chrome 112+ | ✅ | ✅ |
| Firefox 110+ | ✅ | 🚧 |
| Safari 16.4+ | ✅ | ✅ |
| Edge 112+ | ✅ | ✅ |
WASM works everywhere. WebGPU accelerates inference where supported.

| | WebGPU | WASM |
|---|---|---|
| Inference speed | ~8 ms | ~27 ms |
| Compatibility | Chrome/Safari TP | All modern browsers |
| GPU required | Yes | No |
| Fallback | → WASM | — |
FrameFind runs on WASM by default (via onnxruntime-web). To use WebGPU, select the WebGPU execution provider; where it's unavailable, inference falls back to WASM.
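How FrameFind itself exposes this switch isn't shown above; at the onnxruntime-web layer, backend selection is the `executionProviders` session option, and listing several providers makes the runtime try them in order. A minimal sketch:

```ts
import * as ort from "onnxruntime-web";
// Note: some onnxruntime-web versions ship WebGPU support in a separate
// bundle (e.g. "onnxruntime-web/webgpu"); check the version you install.

const modelUrl = "https://cdn.framefind.moraxh.dev/glasses/v1/glasses.onnx";

// Try WebGPU first, fall back to WASM where WebGPU is unavailable.
const session = await ort.InferenceSession.create(modelUrl, {
  executionProviders: ["webgpu", "wasm"],
});
```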
The detection pipeline:

```
Frame / Image
      │
      ▼
Face Landmarker (MediaPipe)
      │
      ├─ landmarks found → crop eye region (34 keypoints, 112×112)
      │
      └─ no landmarks   → centered crop fallback
      │
      ▼
ONNX Model (6.2 MB)
  logit → sigmoid → probability
      │
      ▼
Temporal smoothing (N frames)
      │
      ▼
{ glasses, probability, faceDetected }
```
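The last two stages are simple enough to sketch. The moving-average window and the 0.5 threshold below are assumptions; the actual smoothing strategy and window size N may differ:

```ts
const sigmoid = (logit: number): number => 1 / (1 + Math.exp(-logit));

// Assumed smoothing: average the last N probabilities to suppress
// frame-to-frame flicker, then threshold at 0.5.
class TemporalSmoother {
  private history: number[] = [];
  constructor(private readonly windowSize: number = 5) {}

  push(logit: number): { glasses: boolean; probability: number } {
    this.history.push(sigmoid(logit));
    if (this.history.length > this.windowSize) this.history.shift();
    const probability =
      this.history.reduce((sum, p) => sum + p, 0) / this.history.length;
    return { glasses: probability > 0.5, probability };
  }
}
```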
Package layout:

```
packages/
  core/   → GlassesDetector (browser) and GlassesDetectorNode (Node.js)
  react/  → useGlassesDetector hook
  utils/  → shared types, constants, and helpers
```
Install:

```bash
# Browser / React
npm install @framefind/core onnxruntime-web
npm install @framefind/react onnxruntime-web react

# Node.js
npm install @framefind/core onnxruntime-node
```

Browser:

```ts
import { GlassesDetector } from "@framefind/core";
const detector = new GlassesDetector({
modelUrl: "https://cdn.framefind.moraxh.dev/glasses/v1/glasses.onnx",
});
await detector.load();
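// `landmarks` can come from MediaPipe's Face Landmarker, which the pipeline
// uses for the eye-region crop. Assumption: the detector accepts the
// faceLandmarks array from @mediapipe/tasks-vision as-is; check FrameFind's
// docs for the expected shape. For example:
//   const res = landmarker.detectForVideo(video, performance.now());
//   const landmarks = res.faceLandmarks[0];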
const result = await detector.detectFromCanvas(canvas, landmarks);
console.log(result.glasses, result.probability);
```

React:

```tsx
import { useGlassesDetector } from "@framefind/react";

function Camera() {
const { result, loading, detect } = useGlassesDetector({
modelUrl: "https://cdn.framefind.moraxh.dev/glasses/v1/glasses.onnx",
});
// call detect() each frame via requestAnimationFrame
return <p>{result?.glasses ? "Wearing glasses" : "No glasses"}</p>;
}
```
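A sketch of the per-frame loop the comment above refers to. The call signature of `detect` is an assumption (mirroring `detectFromCanvas`); adapt it to the hook's actual API:

```tsx
import { useEffect } from "react";
import { useGlassesDetector } from "@framefind/react";

function CameraLoop({ canvas }: { canvas: HTMLCanvasElement }) {
  const { result, detect } = useGlassesDetector({
    modelUrl: "https://cdn.framefind.moraxh.dev/glasses/v1/glasses.onnx",
  });

  useEffect(() => {
    let raf = 0;
    const loop = async () => {
      await detect(canvas);              // assumed: classify the latest frame
      raf = requestAnimationFrame(loop); // then schedule the next one
    };
    raf = requestAnimationFrame(loop);
    return () => cancelAnimationFrame(raf); // stop when the component unmounts
  }, [canvas, detect]);

  return <p>{result?.glasses ? "Wearing glasses" : "No glasses"}</p>;
}
```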
Node.js:

```ts
import { GlassesDetectorNode } from "@framefind/core/node";

const detector = new GlassesDetectorNode({
modelPath: "./glasses.onnx",
});
await detector.load();
const result = await detector.detectFromImagePath("./photo.jpg"); // requires sharp
```

What happens on each call:

- Receives a video frame, canvas, or image buffer
- If MediaPipe landmarks are provided, extracts the eye region using 34 keypoints
- Resizes to 112×112 and normalizes with ImageNet mean/std (sketched after this list)
- Runs the ONNX model, gets a logit → sigmoid → probability
- Smooths the last N predictions to avoid flickering
- Returns `{ glasses, probability, faceDetected }`
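A minimal sketch of the normalization step, assuming the eye crop has already been drawn to a 112×112 canvas, the common NCHW float32 layout, and the standard ImageNet statistics (the model's exact input spec isn't documented here):

```ts
// Standard ImageNet per-channel statistics.
const MEAN = [0.485, 0.456, 0.406];
const STD = [0.229, 0.224, 0.225];

// Convert 112×112 RGBA pixels into a normalized NCHW float32 tensor.
// (NCHW layout is an assumption; check the model's actual input spec.)
function toInputTensor(img: ImageData): Float32Array {
  const { width, height, data } = img; // RGBA bytes from a 112×112 canvas
  const plane = width * height;
  const out = new Float32Array(3 * plane);
  for (let i = 0; i < plane; i++) {
    for (let c = 0; c < 3; c++) {
      // scale byte to [0, 1], then normalize per channel
      out[c * plane + i] = (data[i * 4 + c] / 255 - MEAN[c]) / STD[c];
    }
  }
  return out;
}
```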
Glasses detection is the starting point, not the ceiling. The name isn't tied to any single task — FrameFind is about understanding what's on and around a face, frame by frame.
Planned detectors:
- Mask — is the person wearing a face mask?
- Eyes open/closed — blink detection, drowsiness
- Face attributes — age range, expression, skin tone-agnostic attributes
- Head pose — yaw, pitch, roll estimation
- Attention — is the person looking at the screen?
- Iris tracking — gaze direction without eye-tracking hardware
Same architecture, same on-device approach. Each detector ships as its own model and package so you only pull in what you need.
