Skip to content

Latest commit

 

History

History
105 lines (72 loc) · 3.69 KB

File metadata and controls

105 lines (72 loc) · 3.69 KB

AGENTS.md

Development Commands

Prerequisites:

  • Rust (latest stable)
  • Bun package manager

Core Development:

# Install dependencies
bun install

# Run in development mode
bun run tauri dev

# Build for production
bun run tauri build

# Frontend only development
bun run dev        # Start Vite dev server

# Checck for issues (rust and typescript linting and auto formatting and build check, replaces tscx, cargo check, cargo fmt, prettier)
bun check

Use bun check instead of tscx for linting and formatting. It runs both Rust and TypeScript checks in one command.

Architecture Overview

Handy is a cross-platform desktop speech-to-text application built with Tauri (Rust backend + React/TypeScript frontend).

Core Components

Backend (Rust - src-tauri/src/):

  • lib.rs - Main application entry point with Tauri setup, tray menu, and managers
  • managers/ - Core business logic managers:
    • audio.rs - Audio recording and device management
    • model.rs - Whisper model downloading and management
    • transcription.rs - Speech-to-text processing pipeline
  • audio_toolkit/ - Low-level audio processing:
    • audio/ - Device enumeration, recording, resampling
    • vad/ - Voice Activity Detection using Silero VAD
  • command.rs - Tauri command handlers for frontend communication
  • commands/ - More tauri command handlers for frontend communication divided by feature
  • shortcut.rs - Global keyboard shortcut handling
  • settings.rs - Application settings management

Frontend (React/TypeScript - src/):

  • App.tsx - Main application component with onboarding flow
  • components/settings/ - Settings UI components
  • components/model-selector/ - Model management interface
  • hooks/ - React hooks for settings and model management
  • lib/types.ts - Shared TypeScript type definitions

Key Architecture Patterns

Manager Pattern: Core functionality is organized into managers (Audio, Model, Transcription) that are initialized at startup and managed by Tauri's state system.

Command-Event Architecture: Frontend communicates with backend via Tauri commands, backend sends updates via events.

Pipeline Processing: Audio → VAD → Whisper → Text output with configurable components at each stage.

Technology Stack

Core Libraries:

  • whisper-rs - Local Whisper inference with GPU acceleration
  • cpal - Cross-platform audio I/O
  • vad-rs - Voice Activity Detection
  • rdev - Global keyboard shortcuts
  • rubato - Audio resampling
  • rodio - Audio playback for feedback sounds

Platform-Specific Features:

  • macOS: Metal acceleration for Whisper, accessibility permissions
  • Windows: Vulkan acceleration, code signing
  • Linux: OpenBLAS + Vulkan acceleration

Application Flow

  1. Initialization: App starts minimized to tray, loads settings, initializes managers
  2. Model Setup: First-run downloads preferred Whisper model (Small/Medium/Turbo/Large)
  3. Recording: Global shortcut triggers audio recording with VAD filtering
  4. Processing: Audio sent to Whisper model for transcription
  5. Output: Text pasted to active application via system clipboard

Settings System

Settings are stored using Tauri's store plugin with reactive updates:

  • Keyboard shortcuts (configurable, supports push-to-talk)
  • Audio devices (microphone/output selection)
  • Model preferences (Small/Medium/Turbo/Large Whisper variants)
  • Audio feedback and translation options
  • Post-processing options (LLM provider, model, prompt, API key)

Single Instance Architecture

The app enforces single instance behavior - launching when already running brings the settings window to front rather than creating a new process.