Prerequisites:
Core Development:
# Install dependencies
bun install
# Run in development mode
bun run tauri dev
# Build for production
bun run tauri build
# Frontend only development
bun run dev # Start Vite dev server
# Check for issues (Rust and TypeScript linting, auto-formatting, and build check; replaces tscx, cargo check, cargo fmt, prettier)
bun check
Use bun check instead of tscx for linting and formatting. It runs both Rust and TypeScript checks in one command.
Handy is a cross-platform desktop speech-to-text application built with Tauri (Rust backend + React/TypeScript frontend).
Backend (Rust - src-tauri/src/):
- lib.rs - Main application entry point with Tauri setup, tray menu, and managers
- managers/ - Core business logic managers:
  - audio.rs - Audio recording and device management
  - model.rs - Whisper model downloading and management
  - transcription.rs - Speech-to-text processing pipeline
- audio_toolkit/ - Low-level audio processing:
  - audio/ - Device enumeration, recording, resampling
  - vad/ - Voice Activity Detection using Silero VAD
- command.rs - Tauri command handlers for frontend communication
- commands/ - More Tauri command handlers for frontend communication, divided by feature
- shortcut.rs - Global keyboard shortcut handling
- settings.rs - Application settings management
Frontend (React/TypeScript - src/):
- App.tsx - Main application component with onboarding flow
- components/settings/ - Settings UI components
- components/model-selector/ - Model management interface
- hooks/ - React hooks for settings and model management
- lib/types.ts - Shared TypeScript type definitions
Manager Pattern: Core functionality is organized into managers (Audio, Model, Transcription) that are initialized at startup and managed by Tauri's state system.
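To make the manager pattern concrete, here is a minimal std-only sketch. The `AudioManager`, `ModelManager`, and `AppState` types and their fields are hypothetical stand-ins, not the actual definitions in `managers/`. In the real app the managers are handed to Tauri's state system rather than bundled in a hand-rolled struct.

```rust
use std::sync::{Arc, Mutex};

// Hypothetical stand-in for the audio manager; fields are illustrative only.
#[derive(Default)]
struct AudioManager {
    recording: bool,
}

impl AudioManager {
    fn start_recording(&mut self) {
        self.recording = true;
    }

    fn is_recording(&self) -> bool {
        self.recording
    }
}

// Hypothetical stand-in for the model manager.
#[derive(Default)]
struct ModelManager {
    selected: Option<String>,
}

impl ModelManager {
    fn select(&mut self, name: &str) {
        self.selected = Some(name.to_string());
    }
}

// Built once at startup. Cloning is cheap: every clone shares the same
// managers through the Arc pointers, so any handler can reach them.
#[derive(Clone, Default)]
struct AppState {
    audio: Arc<Mutex<AudioManager>>,
    model: Arc<Mutex<ModelManager>>,
}
```

Each manager owns one area of responsibility, and the shared handles let command handlers on different threads reach the same instance without global variables.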
Command-Event Architecture: Frontend communicates with backend via Tauri commands, backend sends updates via events.
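The shape of this two-way flow can be sketched with plain channels. This is an analogy under stated assumptions, not Tauri's API: the `Command` and `Event` enums and `spawn_backend` are hypothetical names, and the transcription text is a placeholder.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical requests sent from the frontend to the backend.
#[derive(Debug, PartialEq)]
enum Command {
    StartRecording,
    StopRecording,
}

// Hypothetical updates pushed from the backend to the frontend.
#[derive(Debug, PartialEq)]
enum Event {
    RecordingStarted,
    TranscriptionReady(String),
}

// Runs a toy backend on its own thread: it consumes commands until the
// command channel closes and emits one event per command.
fn spawn_backend(
    commands: mpsc::Receiver<Command>,
    events: mpsc::Sender<Event>,
) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        for command in commands {
            let event = match command {
                Command::StartRecording => Event::RecordingStarted,
                // Placeholder text; the real backend would run the model here.
                Command::StopRecording => Event::TranscriptionReady("hello world".to_string()),
            };
            if events.send(event).is_err() {
                break; // The listener went away, so stop working.
            }
        }
    })
}
```

Commands are requests the caller initiates, while events flow the other way whenever the backend has something to report, so the frontend never has to poll.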
Pipeline Processing: Audio → VAD → Whisper → Text output with configurable components at each stage.
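The staged design can be illustrated with a small sketch in which each stage is swappable. Everything here is an assumption made for illustration: `energy_gate` is a naive amplitude threshold standing in for Silero VAD, and the `Transcriber` trait with its `SampleCounter` implementation stands in for Whisper inference.

```rust
// VAD stand-in: keep only frames whose mean absolute amplitude exceeds the
// threshold. This is a simple energy gate, not the Silero VAD the app uses.
fn energy_gate(frames: &[Vec<f32>], threshold: f32) -> Vec<Vec<f32>> {
    frames
        .iter()
        .filter(|frame| {
            if frame.is_empty() {
                return false;
            }
            let total: f32 = frame.iter().map(|sample| sample.abs()).sum();
            let mean = total / frame.len() as f32;
            mean > threshold
        })
        .cloned()
        .collect()
}

// Hypothetical interface for the speech-to-text stage.
trait Transcriber {
    fn transcribe(&self, samples: &[f32]) -> String;
}

// Dummy transcriber that only reports how many samples it received.
struct SampleCounter;

impl Transcriber for SampleCounter {
    fn transcribe(&self, samples: &[f32]) -> String {
        format!("<{} voiced samples>", samples.len())
    }
}

// Audio -> VAD -> transcriber -> text, with each stage passed in or replaceable.
fn run_pipeline(frames: &[Vec<f32>], threshold: f32, transcriber: &dyn Transcriber) -> String {
    let voiced: Vec<f32> = energy_gate(frames, threshold)
        .into_iter()
        .flatten()
        .collect();
    transcriber.transcribe(&voiced)
}
```

Filtering silence before transcription means the expensive stage only sees frames that are likely to contain speech.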
Core Libraries:
- whisper-rs - Local Whisper inference with GPU acceleration
- cpal - Cross-platform audio I/O
- vad-rs - Voice Activity Detection
- rdev - Global keyboard shortcuts
- rubato - Audio resampling
- rodio - Audio playback for feedback sounds
Platform-Specific Features:
- macOS: Metal acceleration for Whisper, accessibility permissions
- Windows: Vulkan acceleration, code signing
- Linux: OpenBLAS + Vulkan acceleration
Application Flow:
- Initialization: App starts minimized to tray, loads settings, initializes managers
- Model Setup: First-run downloads preferred Whisper model (Small/Medium/Turbo/Large)
- Recording: Global shortcut triggers audio recording with VAD filtering
- Processing: Audio sent to Whisper model for transcription
- Output: Text pasted to active application via system clipboard
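The recording, processing, and output steps above can be read as a small state machine. The sketch below assumes push-to-talk behavior. The `Stage` and `Trigger` names and the exact transitions are hypothetical and are not taken from the codebase.

```rust
// Hypothetical runtime stages once initialization and model setup are done.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Stage {
    Idle,
    Recording,
    Processing,
}

// Hypothetical triggers that move the app between stages.
#[derive(Debug, Clone, Copy)]
enum Trigger {
    ShortcutPressed,
    ShortcutReleased,
    TranscriptionDone,
}

// Pure transition function: any trigger that does not apply to the current
// stage leaves the stage unchanged.
fn next_stage(stage: Stage, trigger: Trigger) -> Stage {
    match (stage, trigger) {
        (Stage::Idle, Trigger::ShortcutPressed) => Stage::Recording,
        (Stage::Recording, Trigger::ShortcutReleased) => Stage::Processing,
        // Text is pasted at this point and the app waits for the next shortcut.
        (Stage::Processing, Trigger::TranscriptionDone) => Stage::Idle,
        (current, _) => current,
    }
}
```

Keeping the transitions in one pure function makes stray inputs harmless: a second shortcut press while audio is still being processed is simply ignored.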
Settings are stored using Tauri's store plugin with reactive updates:
- Keyboard shortcuts (configurable, supports push-to-talk)
- Audio devices (microphone/output selection)
- Model preferences (Small/Medium/Turbo/Large Whisper variants)
- Audio feedback and translation options
- Post-processing options (LLM provider, model, prompt, API key)
The app enforces single-instance behavior: launching it while it is already running brings the settings window to the front rather than creating a new process.