Voice-to-text for Linux and macOS. Speak -> transcribe -> clipboard.
Homebrew:
brew tap tindotdev/tap
brew install tindotdev/tap/dictate-cli
From source:
git clone https://github.com/tindotdev/dictate.git && cd dictate
just install
dictate # record -> clipboard
dictate --stdout # record -> stdout (+ clipboard)
dictate --no-clipboard # record -> stdout only
dictate --stop-after 30s # auto-stop after 30 seconds, then transcribe
dictate --language en # language hint for accuracy
dictate --device <query> # select device by name or index
dictate --save-last-audio # save audio locally for retry
dictate retry # rerun Whisper + post-process on saved audio
dictate --transcription-provider fireworks
dictate -p --post-process-provider fireworks
dictate devices # list audio input devices
Recording control:
- Press Enter to stop recording and continue to transcription
- For headless/scripted use, pass --stop-after <duration> (for example 30s, 2m, 500ms)
- Press Ctrl+C to cancel the current session
- Cancelled dictate and dictate retry runs exit with status 130
- After cancellation is observed, dictate does not print transcript output or write to the clipboard
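A minimal Rust sketch of how a --stop-after value could be parsed, assuming only the ms, s, and m suffixes from the examples above are accepted. The function name parse_stop_after is hypothetical; this is not dictate's actual parser, which may accept other forms.

```rust
use std::time::Duration;

/// Hypothetical parser for `--stop-after` values such as "30s", "2m", or "500ms".
/// Only the three suffixes from the README examples are handled (an assumption).
fn parse_stop_after(input: &str) -> Option<Duration> {
    let s = input.trim();
    // Test "ms" first so "500ms" is not treated as a seconds value.
    let (digits, unit_ms) = if let Some(n) = s.strip_suffix("ms") {
        (n, 1u64)
    } else if let Some(n) = s.strip_suffix('s') {
        (n, 1_000)
    } else if let Some(n) = s.strip_suffix('m') {
        (n, 60_000)
    } else {
        return None; // missing or unknown unit
    };
    let value: u64 = digits.parse().ok()?;
    // checked_mul rejects values too large to express in milliseconds.
    value.checked_mul(unit_ms).map(Duration::from_millis)
}
```

Exit status 130 follows the shell convention of 128 plus the signal number; SIGINT is signal 2.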
If a long dictation did not come out the way you wanted, save the audio once and rerun transcription without speaking again:
dictate --save-last-audio -p
dictate retry
dictate retry --transcription-model large-v3
dictate retry --transcription-provider fireworks --post-process-provider fireworks
dictate retry --prompt "Keep the wording literal" -p
dictate retry --no-post-process
Notes:
- dictate retry reuses the last audio saved with --save-last-audio
- Retry inherits the saved recording's transcription and post-process settings by default
- Any flags passed to dictate retry override the saved settings for that run
- Retry replays the saved provider, endpoint, and raw model choices unless you override them
- Direct dictate recordings use shorter network timeouts and fewer retries so interactive/hotkey use stays bounded; dictate retry keeps longer, more persistent budgets
- Use dictate retry --no-post-process to compare raw Whisper output against a previously cleaned-up run
- The saved audio stays available until it is replaced by a later --save-last-audio recording
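The override rule for dictate retry can be pictured as a field-by-field merge: each flag passed on the command line replaces the saved value, and everything else is replayed. This Rust sketch uses hypothetical RunSettings and RetryFlags types with only a few fields; dictate's real saved-settings format is not shown here.

```rust
/// Hypothetical subset of the settings saved alongside the last recording.
#[derive(Clone, Debug, PartialEq)]
struct RunSettings {
    transcription_provider: String,
    transcription_model: String,
    post_process: bool,
    prompt: Option<String>,
}

/// Hypothetical flags given to `dictate retry`; `None` means "not passed on this run".
#[derive(Default)]
struct RetryFlags {
    transcription_provider: Option<String>,
    transcription_model: Option<String>,
    post_process: Option<bool>,
    prompt: Option<String>,
}

/// Start from the saved settings and let each explicitly passed flag win.
fn effective_settings(saved: &RunSettings, flags: RetryFlags) -> RunSettings {
    RunSettings {
        transcription_provider: flags
            .transcription_provider
            .unwrap_or_else(|| saved.transcription_provider.clone()),
        transcription_model: flags
            .transcription_model
            .unwrap_or_else(|| saved.transcription_model.clone()),
        post_process: flags.post_process.unwrap_or(saved.post_process),
        prompt: flags.prompt.or_else(|| saved.prompt.clone()),
    }
}
```

In this sketch, -p would map to post_process: Some(true) and --no-post-process to Some(false).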
Transcription and post-processing can be resolved independently:
dictate --transcription-provider groq
dictate --transcription-provider fireworks
dictate --transcription-provider openai-compatible --base-url https://host/v1/audio/transcriptions --transcription-model-id my-whisper-model
dictate -p --post-process-provider openai-compatible --post-process-base-url https://host/v1/chat/completions --post-process-model my-chat-model
Notes:
- --transcription-model stays semantic and provider-aware: large-v3-turbo or large-v3
- --transcription-model-id sets the raw provider ASR model id directly
- openai-compatible transcription requires both a raw model id and an explicit endpoint
- openai-compatible post-processing requires both a raw chat model and an explicit endpoint
- --base-url and --post-process-base-url still override the transcription and post-process endpoints for the current run
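The openai-compatible requirements amount to a validation step before any request is sent. A Rust sketch under that reading; the TranscriptionConfig type, the validate function, and the error strings are hypothetical, and the other providers are assumed to fall back to built-in endpoints and models.

```rust
/// Hypothetical resolved transcription config for one run.
struct TranscriptionConfig<'a> {
    provider: &'a str,         // "groq", "fireworks", or "openai-compatible"
    base_url: Option<&'a str>, // e.g. from --base-url
    model_id: Option<&'a str>, // e.g. from --transcription-model-id
}

/// openai-compatible has no built-in endpoint or model, so both must be given.
/// Other providers are assumed to have defaults (illustrative error strings).
fn validate(cfg: &TranscriptionConfig<'_>) -> Result<(), String> {
    if cfg.provider == "openai-compatible" {
        if cfg.base_url.is_none() {
            return Err("openai-compatible transcription requires an explicit endpoint".into());
        }
        if cfg.model_id.is_none() {
            return Err("openai-compatible transcription requires a raw model id".into());
        }
    }
    Ok(())
}
```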
dictate --format verbose_json # structured JSON
dictate --timestamps word # word-level timestamps (requires verbose_json)
When --post-process is enabled with --format json or --format verbose_json, output includes:
- post_processed (boolean)
- post_process_status (applied, failed_fallback, skipped_verbose_json, skipped_empty_text, not_configured)
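One way to arrive at these two fields is sketched below in Rust. The enum mirrors the five documented status strings; the order of the checks, the post_process function, and the cleanup closure standing in for the chat-model call are assumptions. post_processed is assumed to be true only when the status is applied.

```rust
/// The five status values listed for JSON output.
#[derive(Debug, Clone, Copy, PartialEq)]
enum PostProcessStatus {
    Applied,
    FailedFallback,
    SkippedVerboseJson,
    SkippedEmptyText,
    NotConfigured,
}

impl PostProcessStatus {
    fn as_str(self) -> &'static str {
        match self {
            Self::Applied => "applied",
            Self::FailedFallback => "failed_fallback",
            Self::SkippedVerboseJson => "skipped_verbose_json",
            Self::SkippedEmptyText => "skipped_empty_text",
            Self::NotConfigured => "not_configured",
        }
    }
}

/// Hypothetical decision logic returning (text, post_processed, status).
/// The check order is an assumption; `cleanup` is only called when needed.
fn post_process(
    raw: &str,
    verbose_json: bool,
    configured: bool,
    cleanup: impl FnOnce(&str) -> Result<String, String>,
) -> (String, bool, PostProcessStatus) {
    // Every non-applied outcome keeps the raw transcription text.
    let skip = |status: PostProcessStatus| (raw.to_string(), false, status);
    if !configured {
        return skip(PostProcessStatus::NotConfigured);
    }
    if verbose_json {
        return skip(PostProcessStatus::SkippedVerboseJson);
    }
    if raw.trim().is_empty() {
        return skip(PostProcessStatus::SkippedEmptyText);
    }
    match cleanup(raw) {
        Ok(text) => (text, true, PostProcessStatus::Applied),
        // Fail-safe: a failed cleanup still returns the raw transcription.
        Err(_) => skip(PostProcessStatus::FailedFallback),
    }
}
```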
Optional post-processing cleans raw Whisper output (filler words, punctuation, capitalization).
dictate -p
dictate -p --post-process-model openai/gpt-oss-120b
dictate -p --post-process-provider fireworks
Notes:
- Groq default post-processing model: openai/gpt-oss-20b
- Fireworks default post-processing model: accounts/fireworks/models/gpt-oss-120b
- Fail-safe behavior: if post-processing fails, raw transcription text is still returned
- --format verbose_json skips post-processing to avoid mismatches between top-level text and timestamped segments/words
- --post-process-base-url is available for OpenAI-compatible chat endpoints
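Model selection for post-processing reduces to a lookup with an override. A Rust sketch using the documented defaults; the function names are hypothetical, and returning None for openai-compatible reflects the rule that it needs an explicit chat model.

```rust
/// Documented defaults; openai-compatible deliberately has none.
fn default_post_process_model(provider: &str) -> Option<&'static str> {
    match provider {
        "groq" => Some("openai/gpt-oss-20b"),
        "fireworks" => Some("accounts/fireworks/models/gpt-oss-120b"),
        _ => None,
    }
}

/// An explicit --post-process-model wins; otherwise use the provider default.
fn resolve_post_process_model(provider: &str, flag: Option<&str>) -> Option<String> {
    flag.or_else(|| default_post_process_model(provider))
        .map(str::to_string)
}
```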
Quality is tracked with golden-case evaluations (just eval-prompt, just eval-matrix):
- 14 golden scenarios (filler removal, technical terms, punctuation, mixed and edge cases)
- Best tested matrix configuration: openai/gpt-oss-20b + cleanup_v2.txt = 14/14 (100%)
- Current built-in runtime configuration: openai/gpt-oss-20b + cleanup.txt = 13/14 (93%)
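The percentages are pass counts over the 14 golden scenarios, rounded to a whole percent: 13/14 is about 92.9%, reported as 93%. A small Rust helper (hypothetical, not part of the eval tooling) that reproduces the figures:

```rust
/// Percentage of golden scenarios passed, rounded to the nearest whole percent.
fn pass_rate_percent(passed: u32, total: u32) -> u32 {
    assert!(total > 0 && passed <= total, "invalid counts");
    ((passed as f64 / total as f64) * 100.0).round() as u32
}
```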
Detailed methodology and latest results:
- crates/dictate-core/src/postprocess/prompts/README.md
- crates/dictate-core/src/postprocess/prompts/RESULTS-latest.md
Custom terms improve transcription accuracy for technical jargon, names, and abbreviations.
dictate vocab add AWS OpenAI
dictate vocab remove AWS
dictate vocab list
dictate vocab edit
Vocabulary hints are injected into Whisper's prompt parameter and stored at ~/.config/dictate/.
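A sketch of turning the stored vocabulary into a value for Whisper's prompt parameter. The comma-separated format, trimming, and case-insensitive de-duplication are assumptions for illustration; the README does not specify how dictate formats the prompt, and build_whisper_prompt is a hypothetical name.

```rust
use std::collections::HashSet;

/// Hypothetical: join vocabulary terms into one prompt string, or None if empty.
/// Trimming and case-insensitive de-duplication are illustrative choices.
fn build_whisper_prompt(vocab: &[&str]) -> Option<String> {
    let mut seen = HashSet::new();
    let terms: Vec<&str> = vocab
        .iter()
        .map(|t| t.trim())
        // Keep the first spelling of each term and drop blank entries.
        .filter(|t| !t.is_empty() && seen.insert(t.to_lowercase()))
        .collect();
    if terms.is_empty() {
        None
    } else {
        Some(terms.join(", "))
    }
}
```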
Required:
export GROQ_API_KEY="your-api-key" # console.groq.com/keys
Optional:
export GROQ_BASE_URL="..." # override transcription endpoint
export GROQ_CHAT_BASE_URL="..." # override post-process chat endpoint
export FIREWORKS_API_KEY="..." # fireworks.ai account key
export FIREWORKS_BASE_URL="..." # override Fireworks transcription endpoint
export FIREWORKS_CHAT_BASE_URL="..." # override Fireworks chat endpoint
export OPENAI_COMPATIBLE_API_KEY="..."
export OPENAI_COMPATIBLE_BASE_URL="..."
export OPENAI_COMPATIBLE_CHAT_BASE_URL="..."
Add these to your shell profile for persistence. For source installs, just add-secret can help.
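The endpoint overrides layer on top of each other. A Rust sketch of one plausible precedence, where a CLI flag such as --base-url wins over an environment variable such as GROQ_BASE_URL, which wins over a built-in default. The resolve_endpoint function, the default argument, and treating an empty variable as unset are assumptions.

```rust
use std::env;

/// Hypothetical precedence: CLI flag, then environment variable, then default.
/// Treating an empty or whitespace-only variable as unset is an assumption.
fn resolve_endpoint(cli_flag: Option<&str>, env_var: &str, default: &str) -> String {
    if let Some(url) = cli_flag {
        return url.to_string();
    }
    match env::var(env_var) {
        Ok(url) if !url.trim().is_empty() => url,
        _ => default.to_string(),
    }
}
```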
Generate and install completions for your shell (fish, bash, zsh):
dictate completions fish > ~/.config/fish/completions/dictate.fish
dictate completions bash > ~/.local/share/bash-completion/completions/dictate
dictate completions zsh > ~/.zfunc/_dictate # then add ~/.zfunc to fpath
dictate.nvim controls dictate-cli directly from Neovim. It starts dictate record,
stops capture with SIGUSR1, and cancels in-flight transcription with SIGINT.
Transcript text is inserted into the buffer instead of relying on the clipboard.
Install with lazy.nvim using opts, cmd, and keys:
{
"tindotdev/dictate",
opts = {},
cmd = { "DictateStart", "DictateStop", "DictateToggle" },
keys = {
{
"<F9>",
"<cmd>DictateToggle<cr>",
desc = "Dictate Toggle",
},
},
}
Default plugin options:
require("dictate").setup({
cmd = { "dictate" },
args = {},
clipboard = false,
insert_trailing_space = true,
disabled_filetypes = { "help", "lazy", "mason", "TelescopePrompt" },
disabled_buftypes = { "nofile", "prompt", "quickfix", "terminal" },
})
Commands (available after the plugin is loaded):
- :DictateStart
- :DictateStop
- :DictateToggle
Health checks:
:checkhealth dictate
For local plugin work, use the in-repo minimal Neovim profile instead of your full
editor config. It prepends this repo to runtimepath, maps <F9> to
:DictateToggle, and keeps the test loop isolated from unrelated plugins.
Fixture-backed smoke test (no microphone or API key required):
just nvim-dev-fake
Useful variants:
just nvim-dev-fake scenario=post_process
just nvim-dev-fake transcript="custom fixture text"
Real end-to-end test against the local debug binary:
cargo build -p dictate-cli
GROQ_API_KEY=... just nvim-dev-real
just nvim-dev-real now isolates dictate-cli from your normal user config by
pointing XDG_CONFIG_HOME and XDG_DATA_HOME at repo-local directories under
tmp/nvim-dev-real/. It seeds empty vocabulary.json and dictionary.json
from tests/manual/fixtures/config/dictate/vocabulary.json
and tests/manual/fixtures/config/dictate/dictionary.json,
so malformed files in ~/.config/dictate do not affect plugin testing.
Inside the minimal profile:
- Run :checkhealth dictate
- Use <F9> or :DictateToggle to start and stop
- Run :DictateDevInfo to confirm which command/profile is active
Automated Neovim regression coverage remains available with:
just test-nvim
Linux:
- Audio: PipeWire or PulseAudio
- Clipboard: wl-clipboard (Wayland) or xclip/xsel (X11)
- Launcher notifications: libnotify (notify-send) and glib2 (gdbus)
macOS:
- Grant microphone access to your terminal app
- Clipboard uses built-in pbcopy (no extra clipboard package required)
Install the canonical launcher files from this repo, then bind them to keyboard shortcuts:
just install-launchers
The desktop launcher is a toggle: press once to start recording, press again to stop. It runs headlessly (no terminal window) and uses desktop notifications for feedback:
- Recording — persistent notification stays visible while recording
- Transcribing — replaces the recording notification after the launcher sends a dedicated stop signal
- Cancelling — pressing the shortcut again during transcription requests cancellation
- Done — shows result for 3 seconds, then auto-dismisses
Auto-stops after 5 minutes by default. Override with DICTATE_TIMEOUT=120 (seconds).
If transcription still hangs after stop, the launcher escalates after
DICTATE_TRANSCRIBE_TIMEOUT=45 seconds by default.
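Both launcher timeouts are whole seconds with defaults (300 for recording, 45 for transcription after stop). The launcher itself is a shell script; this Rust sketch only models the parse-with-default step, and the timeout_from name and ignoring non-numeric or zero values are assumptions.

```rust
use std::time::Duration;

/// Defaults described above: 5 minutes of recording, 45 s for transcription after stop.
const DEFAULT_RECORD_TIMEOUT_SECS: u64 = 5 * 60;
const DEFAULT_TRANSCRIBE_TIMEOUT_SECS: u64 = 45;

/// Hypothetical: read a whole-seconds value such as DICTATE_TIMEOUT=120, falling
/// back to the default when unset, non-numeric, or zero (an assumption).
fn timeout_from(value: Option<&str>, default_secs: u64) -> Duration {
    let secs = value
        .and_then(|v| v.trim().parse::<u64>().ok())
        .filter(|&s| s > 0)
        .unwrap_or(default_secs);
    Duration::from_secs(secs)
}
```

For example, timeout_from(std::env::var("DICTATE_TIMEOUT").ok().as_deref(), DEFAULT_RECORD_TIMEOUT_SECS) yields 300 seconds when the variable is unset.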
Linux compositor examples:
- Sway: bindsym $mod+d exec dictate-launch -p
- Hyprland: bind = SUPER, D, exec, dictate-launch -p
- COSMIC: super + semicolon -> dictate-launch -p
All flags after dictate-launch are passed through to dictate (e.g. -p for post-processing,
--language en, --device "USB Mic").
Launcher control mapping on Linux/macOS:
- First press starts recording
- Second press during recording sends SIGUSR1 to stop recording and continue to transcription
- Press again during transcription to send SIGINT cancellation
- Terminal Ctrl+C still means cancel, not stop
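The press-to-toggle mapping is a small state machine. A Rust sketch of it; the LauncherState and Action names are hypothetical, and ignoring presses while a cancellation is already pending is an assumption. Returning to idle when the dictate process exits (the Done notification) is not modeled.

```rust
/// Hypothetical launcher states implied by the toggle behavior.
#[derive(Debug, Clone, Copy, PartialEq)]
enum LauncherState {
    Idle,
    Recording,
    Transcribing,
    Cancelling,
}

/// What one shortcut press does.
#[derive(Debug, PartialEq)]
enum Action {
    StartRecording, // spawn dictate with the pass-through flags
    SendSigusr1,    // stop capture, continue to transcription
    SendSigint,     // cancel in-flight transcription
    Ignore,         // cancellation already requested (assumption)
}

/// Map the current state to the action taken and the next state.
fn on_press(state: LauncherState) -> (Action, LauncherState) {
    match state {
        LauncherState::Idle => (Action::StartRecording, LauncherState::Recording),
        LauncherState::Recording => (Action::SendSigusr1, LauncherState::Transcribing),
        LauncherState::Transcribing => (Action::SendSigint, LauncherState::Cancelling),
        LauncherState::Cancelling => (Action::Ignore, LauncherState::Cancelling),
    }
}
```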
Kitty adapter:
- Install with just install-launcher-kitty or just install-launchers
- Bind in Kitty with:
map kitty_mod+d launch --type=background dictate-kitty
map kitty_mod+shift+d launch --type=background dictate-kitty retry
The canonical launcher sources are:
- contrib/launchers/dictate-launch
- contrib/launchers/dictate-kitty
- contrib/launchers/dictate-launch-common.sh
Repo-side launcher validation:
- just test-launchers runs start/stop/retry smoke tests against fake dictate, notification, and Kitty binaries
- just lint-launchers runs shellcheck
Compatibility wrappers remain at:
Debugging helpers:
- DICTATE_STATE_DIR=/tmp/dictate-test overrides where pid/state/output files are written
- DICTATE_LAUNCH_LOG=/tmp/dictate-launch.log appends launcher events and child exec lines
- DICTATE_LAUNCH_TRACE=1 enables Bash xtrace; if DICTATE_LAUNCH_LOG is set, trace output goes there
- DICTATE_BIN=/path/to/dictate and DICTATE_TRANSCRIPTION_MODEL=... let you test alternate binaries/settings
On macOS outside Kitty, create a system shortcut (Shortcuts or Automator) that runs dictate in your terminal.
microphone -> cpal -> resample (16kHz mono) -> chunking -> OpenAI-compatible ASR -> optional OpenAI-compatible LLM cleanup -> clipboard/stdout
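The resample (16kHz mono) stage converts whatever the input device delivers into the mono 16 kHz stream that the rest of the pipeline consumes. A simplified Rust sketch that averages interleaved channels and resamples by linear interpolation; dictate's real-time resampler is not shown, and linear interpolation without a low-pass filter is a swapped-in simplification that can alias when downsampling.

```rust
/// Average interleaved frames (L, R, L, R, ...) into a single mono channel.
fn downmix_to_mono(interleaved: &[f32], channels: usize) -> Vec<f32> {
    assert!(channels > 0, "need at least one channel");
    interleaved
        .chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}

/// Naive linear-interpolation resampler (no anti-aliasing filter).
fn resample_linear(input: &[f32], from_hz: u32, to_hz: u32) -> Vec<f32> {
    assert!(from_hz > 0 && to_hz > 0, "sample rates must be positive");
    if input.is_empty() || from_hz == to_hz {
        return input.to_vec();
    }
    // Integer math for the output length avoids float rounding (e.g. 44.1 kHz -> 16 kHz).
    let out_len = (input.len() as u64 * to_hz as u64 / from_hz as u64) as usize;
    let step = from_hz as f64 / to_hz as f64; // input samples per output sample
    let last = input.len() - 1;
    (0..out_len)
        .map(|i| {
            let pos = i as f64 * step;
            let idx = (pos.floor() as usize).min(last);
            let frac = (pos - idx as f64) as f32;
            let a = input[idx];
            let b = input[(idx + 1).min(last)];
            a + (b - a) * frac // interpolate between neighboring input samples
        })
        .collect()
}
```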
- Audio capture: cpal with real-time resampling
- Ring buffer: lock-free SPSC for zero-allocation transfer
- Progressive chunking: overlapping chunks for long recordings
- Transcription: Groq, Fireworks, or another OpenAI-compatible ASR endpoint
- Post-processing: optional Groq, Fireworks, or OpenAI-compatible chat cleanup with fail-safe fallback
- Clipboard: platform-aware with fallback to stderr
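Progressive chunking with overlap can be sketched as a sliding window whose step is the chunk length minus the overlap, so speech near a boundary lands in two consecutive chunks. In this Rust sketch the function name and sizes are hypothetical; it does not show how dictate sizes chunks, schedules them while recording continues, or merges the overlapping transcripts.

```rust
/// Hypothetical: split mono samples into chunks of `chunk_len`, each starting
/// `chunk_len - overlap` samples after the previous one.
fn overlapping_chunks(samples: &[f32], chunk_len: usize, overlap: usize) -> Vec<&[f32]> {
    assert!(overlap < chunk_len, "overlap must be smaller than the chunk");
    let step = chunk_len - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < samples.len() {
        let end = (start + chunk_len).min(samples.len());
        chunks.push(&samples[start..end]);
        if end == samples.len() {
            break; // the final (possibly shorter) chunk reaches the end
        }
        start += step;
    }
    chunks
}
```

At 16 kHz, one second of audio is 16,000 samples, so chunk_len = 16_000 * 30 would cover 30 seconds.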
Audio:
- Linux: check systemctl --user status pipewire, then run dictate devices
- Linux: if needed, add your user to the audio group: sudo usermod -aG audio $USER (re-login required)
- macOS: verify microphone permission in System Settings -> Privacy & Security -> Microphone
Clipboard:
- Linux (Wayland): echo "test" | wl-copy && wl-paste
- Linux (X11): verify xclip or xsel is installed
- macOS: echo "test" | pbcopy && pbpaste
API errors:
- 401: invalid API key
- 429: rate limited (retries automatically)
- 413: recording too long
- dictate retry with no saved recording: first create a reusable recording with dictate --save-last-audio
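A Rust sketch of how the listed statuses might be classified, paired with a capped exponential backoff for the retryable case. The function names, the backoff policy, and treating only 429 as retryable are assumptions; the README only states that rate-limited requests are retried automatically.

```rust
use std::time::Duration;

/// Hypothetical mapping from HTTP status to (hint, retryable).
fn describe_api_error(status: u16) -> (&'static str, bool) {
    match status {
        401 => ("invalid API key", false),
        413 => ("recording too long", false),
        429 => ("rate limited", true),
        _ => ("unexpected API error", false),
    }
}

/// Hypothetical capped exponential backoff: base * 2^attempt, never above `cap`.
fn backoff_delay(attempt: u32, base: Duration, cap: Duration) -> Duration {
    // Bound the shift so the multiplier cannot overflow.
    let factor = 1u32 << attempt.min(16);
    base.saturating_mul(factor).min(cap)
}
```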
Audio is sent to the provider you configure for transcription and optional
post-processing. By default, audio is not stored locally. If you pass
--save-last-audio, dictate stores one reusable local recording until it is
replaced by a later saved recording. Review the privacy policy and terms for
whichever provider you use, such as Groq or
Fireworks.
Audio pipeline design inspired by whis.
MIT
