AGENTS.md

Development Commands

Prerequisites:

Rust (latest stable)
Bun package manager

Core Development:

# Install dependencies
bun install

# Run in development mode
bun run tauri dev

# Build for production
bun run tauri build

# Frontend only development
bun run dev        # Start Vite dev server

# Check for issues (Rust and TypeScript linting, auto-formatting, and build check; replaces tscx, cargo check, cargo fmt, prettier)
bun check

Use bun check instead of tscx for linting and formatting. It runs both Rust and TypeScript checks in one command.

Architecture Overview

Handy is a cross-platform desktop speech-to-text application built with Tauri (Rust backend + React/TypeScript frontend).

Core Components

Backend (Rust - src-tauri/src/):

lib.rs - Main application entry point with Tauri setup, tray menu, and managers
managers/ - Core business logic managers:
- audio.rs - Audio recording and device management
- model.rs - Whisper model downloading and management
- transcription.rs - Speech-to-text processing pipeline
audio_toolkit/ - Low-level audio processing:
- audio/ - Device enumeration, recording, resampling
- vad/ - Voice Activity Detection using Silero VAD
command.rs - Tauri command handlers for frontend communication
commands/ - Additional Tauri command handlers for frontend communication, organized by feature
shortcut.rs - Global keyboard shortcut handling
settings.rs - Application settings management

Frontend (React/TypeScript - src/):

App.tsx - Main application component with onboarding flow
components/settings/ - Settings UI components
components/model-selector/ - Model management interface
hooks/ - React hooks for settings and model management
lib/types.ts - Shared TypeScript type definitions

Key Architecture Patterns

Manager Pattern: Core functionality is organized into managers (Audio, Model, Transcription) that are initialized at startup and managed by Tauri's state system.
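
As a rough illustration of the manager pattern, the sketch below uses only the Rust standard library. The names AudioManager, ModelManager, AppState, and toggle_recording are hypothetical stand-ins, not the actual types in managers/. In the real app Tauri's state system would own the managers; here a plain struct of Arc<Mutex<...>> handles plays that role.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the audio manager (not the real managers/audio.rs type).
#[derive(Default)]
pub struct AudioManager {
    recording: bool,
}

impl AudioManager {
    pub fn start(&mut self) {
        self.recording = true;
    }
    pub fn stop(&mut self) {
        self.recording = false;
    }
    pub fn is_recording(&self) -> bool {
        self.recording
    }
}

/// Hypothetical stand-in for the model manager.
#[derive(Default)]
pub struct ModelManager {
    selected: Option<String>,
}

impl ModelManager {
    pub fn select(&mut self, name: &str) {
        self.selected = Some(name.to_string());
    }
    pub fn selected(&self) -> Option<&str> {
        self.selected.as_deref()
    }
}

/// Shared state built once at startup; clones share the same managers.
#[derive(Clone, Default)]
pub struct AppState {
    pub audio: Arc<Mutex<AudioManager>>,
    pub model: Arc<Mutex<ModelManager>>,
}

/// A handler locks the manager it needs, much as a command handler would.
/// Returns the recording flag after the toggle.
pub fn toggle_recording(state: &AppState) -> bool {
    let mut audio = state.audio.lock().unwrap();
    if audio.is_recording() {
        audio.stop();
    } else {
        audio.start();
    }
    audio.is_recording()
}
```

Because each manager sits behind its own lock, a handler that only needs the model manager does not block one that is toggling audio.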

Command-Event Architecture: The frontend communicates with the backend via Tauri commands; the backend sends updates back via events.
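
The shape of this exchange can be sketched with a standard-library channel in place of Tauri's command and event APIs. Command, Event, and Backend are hypothetical names invented for the example, and the transcription text is a hard-coded placeholder.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical requests the frontend might issue.
#[derive(Debug, PartialEq)]
pub enum Command {
    StartRecording,
    StopRecording,
}

/// Hypothetical updates the backend pushes back.
#[derive(Debug, PartialEq)]
pub enum Event {
    RecordingStarted,
    TranscriptionReady(String),
}

/// Backend half: handles commands and emits events on a channel.
pub struct Backend {
    events: Sender<Event>,
}

impl Backend {
    /// Returns the backend plus the receiver a frontend would listen on.
    pub fn new() -> (Self, Receiver<Event>) {
        let (tx, rx) = channel();
        (Backend { events: tx }, rx)
    }

    /// Handles one command and emits the matching event.
    pub fn handle(&self, command: Command) -> Result<(), String> {
        let event = match command {
            Command::StartRecording => Event::RecordingStarted,
            // Placeholder text; a real backend would run transcription here.
            Command::StopRecording => Event::TranscriptionReady("hello world".to_string()),
        };
        self.events.send(event).map_err(|e| e.to_string())
    }
}
```

The command's return value only reports whether the request was accepted; results arrive separately as events, so the caller never blocks on long-running work.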

Pipeline Processing: Audio → VAD → Whisper → Text output with configurable components at each stage.
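
A minimal sketch of the pipeline idea, assuming each stage sits behind a trait so it can be swapped. VoiceDetector, Transcriber, EnergyVad, SampleCounter, and run_pipeline are hypothetical names. The energy-threshold detector is a simple placeholder rather than Silero VAD, and the transcriber only counts samples instead of running Whisper.

```rust
/// Stage that decides whether a frame contains speech.
pub trait VoiceDetector {
    fn is_speech(&self, frame: &[f32]) -> bool;
}

/// Stage that turns speech samples into text.
pub trait Transcriber {
    fn transcribe(&self, samples: &[f32]) -> String;
}

/// Placeholder detector based on mean frame energy (NOT Silero VAD).
pub struct EnergyVad {
    pub threshold: f32,
}

impl VoiceDetector for EnergyVad {
    fn is_speech(&self, frame: &[f32]) -> bool {
        if frame.is_empty() {
            return false;
        }
        let energy = frame.iter().map(|s| s * s).sum::<f32>() / frame.len() as f32;
        energy > self.threshold
    }
}

/// Placeholder transcriber that reports how many samples it received.
pub struct SampleCounter;

impl Transcriber for SampleCounter {
    fn transcribe(&self, samples: &[f32]) -> String {
        format!("{} samples", samples.len())
    }
}

/// Audio -> VAD -> transcriber -> text, with each stage injected by the caller.
pub fn run_pipeline(
    audio: &[f32],
    frame_len: usize,
    vad: &dyn VoiceDetector,
    stt: &dyn Transcriber,
) -> String {
    // Keep only frames the detector marks as speech; guard against frame_len == 0.
    let speech: Vec<f32> = audio
        .chunks(frame_len.max(1))
        .filter(|frame| vad.is_speech(frame))
        .flatten()
        .copied()
        .collect();
    stt.transcribe(&speech)
}
```

Passing the stages as trait objects is one way to make each step configurable; the actual code may wire the components differently.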

Technology Stack

Core Libraries:

whisper-rs - Local Whisper inference with GPU acceleration
cpal - Cross-platform audio I/O
vad-rs - Voice Activity Detection
rdev - Global keyboard shortcuts
rubato - Audio resampling
rodio - Audio playback for feedback sounds

Platform-Specific Features:

macOS: Metal acceleration for Whisper, accessibility permissions
Windows: Vulkan acceleration, code signing
Linux: OpenBLAS + Vulkan acceleration

Application Flow

Initialization: App starts minimized to tray, loads settings, initializes managers
Model Setup: First-run downloads preferred Whisper model (Small/Medium/Turbo/Large)
Recording: Global shortcut triggers audio recording with VAD filtering
Processing: Audio sent to Whisper model for transcription
Output: Text pasted to active application via system clipboard
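
The five steps above can be read as a small state machine. The sketch below is one hypothetical way to model it: the Stage enum and its next method are invented for illustration. It assumes that model setup is skipped when a model is already present, and that the app returns to waiting for the shortcut after output.

```rust
/// Hypothetical stages mirroring the flow described above.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Stage {
    Initialization,
    ModelSetup,
    Recording,
    Processing,
    Output,
}

impl Stage {
    /// Returns the stage that follows `self`.
    /// `model_ready` lets initialization skip the first-run download (an assumption).
    pub fn next(self, model_ready: bool) -> Stage {
        match self {
            Stage::Initialization if model_ready => Stage::Recording,
            Stage::Initialization => Stage::ModelSetup,
            Stage::ModelSetup => Stage::Recording,
            Stage::Recording => Stage::Processing,
            Stage::Processing => Stage::Output,
            // After pasting, the next shortcut press starts a new recording.
            Stage::Output => Stage::Recording,
        }
    }
}
```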

Settings System

Settings are stored using Tauri's store plugin with reactive updates:

Keyboard shortcuts (configurable, supports push-to-talk)
Audio devices (microphone/output selection)
Model preferences (Small/Medium/Turbo/Large Whisper variants)
Audio feedback and translation options
Post-processing options (LLM provider, model, prompt, API key)

Single Instance Architecture

The app enforces single-instance behavior: launching it while it is already running brings the settings window to the front rather than creating a new process.
