dictate

Voice-to-text for Linux and macOS. Speak -> transcribe -> clipboard.

Installation

Homebrew:

brew tap tindotdev/tap
brew install tindotdev/tap/dictate-cli

From source:

git clone https://github.com/tindotdev/dictate.git && cd dictate
just install

Usage

dictate                        # record -> clipboard
dictate --stdout               # record -> stdout (+ clipboard)
dictate --no-clipboard         # record -> stdout only
dictate --stop-after 30s       # auto-stop after 30 seconds, then transcribe
dictate --language en          # language hint for accuracy
dictate --device <query>       # select device by name or index
dictate --save-last-audio      # save audio locally for retry
dictate retry                  # rerun Whisper + post-process on saved audio
dictate --transcription-provider fireworks  # transcribe with Fireworks
dictate -p --post-process-provider fireworks  # post-process with Fireworks
dictate devices                # list audio input devices

Recording control:

Press Enter to stop recording and continue to transcription
For headless/scripted use, pass --stop-after <duration> (for example 30s, 2m, 500ms)
Press Ctrl+C to cancel the current session
Cancelled dictate and dictate retry runs exit with status 130
After cancellation is observed, dictate does not print transcript output or write to the clipboard
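For scripted use, the exit status tells you what happened. The sketch below is a hypothetical Rust wrapper, not part of dictate. It relies only on the documented behavior that a cancelled run exits with 130, the shell convention of 128 + 2 for SIGINT. The Outcome type and the classify and record_to_stdout names are illustrative. Treating every other non-zero status as a plain failure is an assumption.

```rust
use std::process::Command;

/// Hypothetical classification of a finished `dictate` run.
#[derive(Debug, PartialEq)]
pub enum Outcome {
    Success,
    Cancelled,
    Failed(Option<i32>),
}

/// SIGINT is signal 2; shells report "ended by signal N" as 128 + N.
pub const SIGINT_EXIT: i32 = 128 + 2;

/// Map an exit code to an outcome (`None` = killed by a signal, no code available).
pub fn classify(code: Option<i32>) -> Outcome {
    match code {
        Some(0) => Outcome::Success,
        Some(SIGINT_EXIT) => Outcome::Cancelled,
        other => Outcome::Failed(other),
    }
}

/// Run `dictate --no-clipboard --stop-after 30s` and capture the transcript from stdout.
/// Requires `dictate` on PATH; a cancelled run prints no transcript, so the text is empty.
pub fn record_to_stdout() -> std::io::Result<(Outcome, String)> {
    let output = Command::new("dictate")
        .args(["--no-clipboard", "--stop-after", "30s"])
        .output()?;
    let outcome = classify(output.status.code());
    let text = String::from_utf8_lossy(&output.stdout).trim().to_string();
    Ok((outcome, text))
}
```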

Retry the last recording

If a long dictation did not come out the way you wanted, save the audio once and rerun transcription without speaking again:

dictate --save-last-audio -p
dictate retry
dictate retry --transcription-model large-v3
dictate retry --transcription-provider fireworks --post-process-provider fireworks
dictate retry --prompt "Keep the wording literal" -p
dictate retry --no-post-process

Notes:

dictate retry reuses the last audio saved with --save-last-audio
Retry inherits the saved recording's transcription and post-process settings by default
Any flags passed to dictate retry override the saved settings for that run
Retry replays the saved provider, endpoint, and raw model choices unless you override them
Direct dictate recordings use shorter network timeouts and fewer retries so interactive and hotkey sessions finish in bounded time; dictate retry uses longer timeouts and more retries
Use dictate retry --no-post-process to compare raw Whisper output against a previously cleaned-up run
The saved audio stays available until it is replaced by a later --save-last-audio recording
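The override rule in these notes works as a field-by-field merge: an explicit flag wins, and anything omitted falls back to the saved value. The sketch below assumes a hypothetical Settings/RetryFlags shape. dictate's real saved-settings format is not documented here.

```rust
/// Hypothetical subset of the settings saved alongside the last recording.
#[derive(Debug, Clone, PartialEq)]
pub struct Settings {
    pub transcription_provider: String,
    pub transcription_model: String,
    pub post_process: bool,
    pub prompt: Option<String>,
}

/// Flags passed to `dictate retry`; `None` means "not given on the command line".
#[derive(Debug, Default)]
pub struct RetryFlags {
    pub transcription_provider: Option<String>,
    pub transcription_model: Option<String>,
    /// Some(true) for -p, Some(false) for --no-post-process.
    pub post_process: Option<bool>,
    pub prompt: Option<String>,
}

/// Start from the saved settings and let each explicit flag win for this run only.
/// The saved settings are borrowed, not modified.
pub fn effective_settings(saved: &Settings, flags: RetryFlags) -> Settings {
    Settings {
        transcription_provider: flags
            .transcription_provider
            .unwrap_or_else(|| saved.transcription_provider.clone()),
        transcription_model: flags
            .transcription_model
            .unwrap_or_else(|| saved.transcription_model.clone()),
        post_process: flags.post_process.unwrap_or(saved.post_process),
        prompt: flags.prompt.or_else(|| saved.prompt.clone()),
    }
}
```

The merge builds a new value instead of editing the saved one, so an override applies only to that retry, as the notes describe.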

Providers

Transcription and post-processing can be resolved independently:

dictate --transcription-provider groq
dictate --transcription-provider fireworks
dictate --transcription-provider openai-compatible --base-url https://host/v1/audio/transcriptions --transcription-model-id my-whisper-model
dictate -p --post-process-provider openai-compatible --post-process-base-url https://host/v1/chat/completions --post-process-model my-chat-model

Notes:

--transcription-model takes a provider-neutral name (large-v3-turbo or large-v3) that dictate maps to the selected provider's model
--transcription-model-id sets the raw provider ASR model id directly
openai-compatible transcription requires both a raw model id and an explicit endpoint
openai-compatible post-processing requires both a raw chat model and an explicit endpoint
--base-url and --post-process-base-url override the transcription and post-process endpoints, respectively, for the current run
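The two openai-compatible requirements can be read as a pre-flight check on the options for a run. This is a sketch under assumptions. The Provider and Resolved types are hypothetical, and the README does not say whether OPENAI_COMPATIBLE_BASE_URL satisfies the endpoint requirement in the same way --base-url does.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Provider {
    Groq,
    Fireworks,
    OpenAiCompatible,
}

#[derive(Debug, PartialEq)]
pub struct Resolved {
    /// `None` means "use the provider's built-in default".
    pub endpoint: Option<String>,
    pub model_id: Option<String>,
}

/// Validate transcription options for one run. `endpoint` stands for --base-url
/// (or, presumably, the provider's *_BASE_URL variable); `model_id` for
/// --transcription-model-id.
pub fn resolve_transcription(
    provider: Provider,
    endpoint: Option<&str>,
    model_id: Option<&str>,
) -> Result<Resolved, String> {
    if provider == Provider::OpenAiCompatible {
        // No built-in defaults exist for a generic endpoint, so both must be explicit.
        if endpoint.is_none() {
            return Err("openai-compatible transcription needs an explicit endpoint".into());
        }
        if model_id.is_none() {
            return Err("openai-compatible transcription needs --transcription-model-id".into());
        }
    }
    Ok(Resolved {
        endpoint: endpoint.map(str::to_owned),
        model_id: model_id.map(str::to_owned),
    })
}
```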

Output formats

dictate --format verbose_json  # structured JSON
dictate --timestamps word      # word-level timestamps (requires verbose_json)

When --post-process is enabled with --format json or --format verbose_json, output includes:

post_processed (boolean)
post_process_status (applied, failed_fallback, skipped_verbose_json, skipped_empty_text, not_configured)
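Programs that consume the JSON output can map post_process_status onto a closed set of values. The enum below mirrors the five documented strings. The type itself is illustrative. The README does not state that post_processed is true only for applied; that rule is an assumption.

```rust
/// The five documented `post_process_status` values (illustrative type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PostProcessStatus {
    Applied,
    FailedFallback,
    SkippedVerboseJson,
    SkippedEmptyText,
    NotConfigured,
}

impl PostProcessStatus {
    /// String as it appears in JSON output.
    pub fn as_str(self) -> &'static str {
        match self {
            Self::Applied => "applied",
            Self::FailedFallback => "failed_fallback",
            Self::SkippedVerboseJson => "skipped_verbose_json",
            Self::SkippedEmptyText => "skipped_empty_text",
            Self::NotConfigured => "not_configured",
        }
    }

    /// Parse the JSON string back; unknown values return `None`.
    pub fn parse(s: &str) -> Option<Self> {
        Some(match s {
            "applied" => Self::Applied,
            "failed_fallback" => Self::FailedFallback,
            "skipped_verbose_json" => Self::SkippedVerboseJson,
            "skipped_empty_text" => Self::SkippedEmptyText,
            "not_configured" => Self::NotConfigured,
            _ => return None,
        })
    }

    /// Assumption: `post_processed` is true exactly when cleanup was applied.
    pub fn post_processed(self) -> bool {
        matches!(self, Self::Applied)
    }
}
```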

Post-processing (LLM cleanup)

Optional post-processing cleans raw Whisper output (filler words, punctuation, capitalization).

dictate -p
dictate -p --post-process-model openai/gpt-oss-120b
dictate -p --post-process-provider fireworks

Notes:

Groq default post-processing model: openai/gpt-oss-20b
Fireworks default post-processing model: accounts/fireworks/models/gpt-oss-120b
Fail-safe behavior: if post-processing fails, raw transcription text is still returned
--format verbose_json skips post-processing to avoid mismatches between top-level text and timestamped segments/words
--post-process-base-url is available for OpenAI-compatible chat endpoints
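The fail-safe rule and the verbose_json skip combine into a small decision function that always returns usable text. This is a hedged sketch. The cleanup closure stands in for the chat-completion request, None models a missing post-process configuration, and the order of the checks is an assumption. The returned strings are the documented post_process_status values.

```rust
/// Decide the final text after transcription without ever losing the raw transcript.
/// Returns the output text and the matching `post_process_status` string.
pub fn finalize<F>(raw: &str, verbose_json: bool, cleanup: Option<F>) -> (String, &'static str)
where
    F: FnOnce(&str) -> Result<String, String>,
{
    if verbose_json {
        // Keep top-level text consistent with the timestamped segments/words.
        return (raw.to_string(), "skipped_verbose_json");
    }
    if raw.trim().is_empty() {
        return (raw.to_string(), "skipped_empty_text");
    }
    match cleanup {
        None => (raw.to_string(), "not_configured"),
        Some(f) => match f(raw) {
            Ok(cleaned) => (cleaned, "applied"),
            // Fail-safe: any cleanup error falls back to the raw transcription.
            Err(_) => (raw.to_string(), "failed_fallback"),
        },
    }
}
```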

Quality is tracked with golden-case evaluations (just eval-prompt, just eval-matrix):

14 golden scenarios (filler removal, technical terms, punctuation, mixed and edge cases)
Best tested matrix configuration: openai/gpt-oss-20b + cleanup_v2.txt = 14/14 (100%)
Current built-in runtime configuration: openai/gpt-oss-20b + cleanup.txt = 13/14 (93%)

Detailed methodology and latest results:

crates/dictate-core/src/postprocess/prompts/README.md
crates/dictate-core/src/postprocess/prompts/RESULTS-latest.md

Vocabulary

Custom terms improve transcription accuracy for technical jargon, names, and abbreviations.

dictate vocab add AWS OpenAI
dictate vocab remove AWS
dictate vocab list
dictate vocab edit

Vocabulary hints are injected into Whisper's prompt parameter and stored at ~/.config/dictate/.
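One plausible way to turn the stored terms into a Whisper prompt string is shown below. The comma separator, case-insensitive de-duplication, and max_bytes budget are assumptions for illustration. Whisper-style prompts are length-limited, so the sketch includes a cap. The vocabulary_prompt name is hypothetical.

```rust
/// Build a prompt string from custom vocabulary terms (hypothetical helper).
/// Terms are trimmed, de-duplicated ignoring ASCII case, and joined with ", ";
/// the result never exceeds `max_bytes`, and no term is cut in half.
pub fn vocabulary_prompt(terms: &[&str], max_bytes: usize) -> String {
    let mut seen: Vec<String> = Vec::new();
    let mut prompt = String::new();
    for term in terms {
        let term = term.trim();
        if term.is_empty() || seen.iter().any(|s| s.eq_ignore_ascii_case(term)) {
            continue;
        }
        let sep = if prompt.is_empty() { "" } else { ", " };
        // Stop before exceeding the budget rather than truncating mid-term.
        if prompt.len() + sep.len() + term.len() > max_bytes {
            break;
        }
        prompt.push_str(sep);
        prompt.push_str(term);
        seen.push(term.to_string());
    }
    prompt
}
```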

Configuration

Required:

export GROQ_API_KEY="your-api-key"  # console.groq.com/keys

Optional:

export GROQ_BASE_URL="..."       # override Groq transcription endpoint
export GROQ_CHAT_BASE_URL="..."  # override Groq post-process chat endpoint
export FIREWORKS_API_KEY="..."   # fireworks.ai account key
export FIREWORKS_BASE_URL="..."  # override Fireworks transcription endpoint
export FIREWORKS_CHAT_BASE_URL="..."  # override Fireworks chat endpoint
export OPENAI_COMPATIBLE_API_KEY="..."  # key for the openai-compatible provider
export OPENAI_COMPATIBLE_BASE_URL="..."  # openai-compatible transcription endpoint
export OPENAI_COMPATIBLE_CHAT_BASE_URL="..."  # openai-compatible chat endpoint

Add these to your shell profile so they persist. For source installs, the just add-secret recipe can help.
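The variable names follow a regular PROVIDER_API_KEY, PROVIDER_BASE_URL, PROVIDER_CHAT_BASE_URL pattern, so a program can derive them from the provider name. The sketch below shows that lookup. It assumes hypothetical Provider and api_key names, and treating an empty value as unset is also an assumption.

```rust
use std::env;

#[derive(Debug, Clone, Copy)]
pub enum Provider {
    Groq,
    Fireworks,
    OpenAiCompatible,
}

impl Provider {
    fn prefix(self) -> &'static str {
        match self {
            Provider::Groq => "GROQ",
            Provider::Fireworks => "FIREWORKS",
            Provider::OpenAiCompatible => "OPENAI_COMPATIBLE",
        }
    }
    pub fn api_key_var(self) -> String {
        format!("{}_API_KEY", self.prefix())
    }
    pub fn transcription_url_var(self) -> String {
        format!("{}_BASE_URL", self.prefix())
    }
    pub fn chat_url_var(self) -> String {
        format!("{}_CHAT_BASE_URL", self.prefix())
    }
}

/// Read the provider's API key from the environment; empty or unset is an error.
pub fn api_key(provider: Provider) -> Result<String, String> {
    let var = provider.api_key_var();
    match env::var(&var) {
        Ok(v) if !v.trim().is_empty() => Ok(v),
        _ => Err(format!("{var} is not set")),
    }
}
```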

Shell completions

Generate and install completions for your shell (fish, bash, zsh):

dictate completions fish > ~/.config/fish/completions/dictate.fish
dictate completions bash > ~/.local/share/bash-completion/completions/dictate
dictate completions zsh > ~/.zfunc/_dictate  # then add ~/.zfunc to fpath

Neovim

dictate.nvim controls dictate-cli directly from Neovim. It starts dictate record, stops capture with SIGUSR1, and cancels in-flight transcription with SIGINT. Transcript text is inserted into the buffer instead of relying on the clipboard.

Install with lazy.nvim using opts, cmd, and keys:

{
  "tindotdev/dictate",
  opts = {},
  cmd = { "DictateStart", "DictateStop", "DictateToggle" },
  keys = {
    {
      "<F9>",
      "<cmd>DictateToggle<cr>",
      desc = "Dictate Toggle",
    },
  },
}

Default plugin options:

require("dictate").setup({
  cmd = { "dictate" },
  args = {},
  clipboard = false,
  insert_trailing_space = true,
  disabled_filetypes = { "help", "lazy", "mason", "TelescopePrompt" },
  disabled_buftypes = { "nofile", "prompt", "quickfix", "terminal" },
})

Commands (available after the plugin is loaded):

:DictateStart
:DictateStop
:DictateToggle

Health checks:

:checkhealth dictate

Local plugin development

For local plugin work, use the in-repo minimal Neovim profile instead of your full editor config. It prepends this repo to runtimepath, maps <F9> to :DictateToggle, and keeps the test loop isolated from unrelated plugins.

Fixture-backed smoke test (no microphone or API key required):

just nvim-dev-fake

Useful variants:

just nvim-dev-fake scenario=post_process
just nvim-dev-fake transcript="custom fixture text"

Real end-to-end test against the local debug binary:

cargo build -p dictate-cli
GROQ_API_KEY=... just nvim-dev-real

just nvim-dev-real isolates dictate-cli from your normal user config by pointing XDG_CONFIG_HOME and XDG_DATA_HOME at repo-local directories under tmp/nvim-dev-real/. It seeds empty vocabulary.json and dictionary.json from tests/manual/fixtures/config/dictate/vocabulary.json and tests/manual/fixtures/config/dictate/dictionary.json, so malformed files in ~/.config/dictate do not affect plugin testing.
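Redirecting XDG_CONFIG_HOME isolates the run because the config location is resolved from the environment. The sketch below assumes dictate follows the XDG Base Directory convention on Linux: $XDG_CONFIG_HOME/dictate if set, else ~/.config/dictate. The recipe above implies this, but the README does not spell it out. The config_dir name is hypothetical. Environment values are passed in as parameters so the logic is easy to test.

```rust
use std::path::PathBuf;

/// Resolve the config directory per the XDG Base Directory spec:
/// an absolute $XDG_CONFIG_HOME wins; unset, empty, or relative values are ignored
/// and $HOME/.config is used instead.
pub fn config_dir(xdg_config_home: Option<&str>, home: Option<&str>) -> Option<PathBuf> {
    match xdg_config_home {
        Some(x) if PathBuf::from(x).is_absolute() => Some(PathBuf::from(x).join("dictate")),
        _ => home.map(|h| PathBuf::from(h).join(".config").join("dictate")),
    }
}
```

In a real program the arguments would come from std::env::var("XDG_CONFIG_HOME").ok() and std::env::var("HOME").ok().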

Inside the minimal profile:

Run :checkhealth dictate
Use <F9> or :DictateToggle to start and stop
Run :DictateDevInfo to confirm which command/profile is active

Automated Neovim regression coverage remains available with:

just test-nvim

Platform requirements

Linux:

Audio: PipeWire or PulseAudio
Clipboard: wl-clipboard (Wayland) or xclip/xsel (X11)
Launcher notifications: libnotify (notify-send) and glib2 (gdbus)

macOS:

Grant microphone access to your terminal app
Clipboard uses built-in pbcopy (no extra clipboard package required)

Global shortcut

Install the canonical launcher files from this repo, then bind them to keyboard shortcuts:

just install-launchers

The desktop launcher is a toggle — press once to start recording, press again to stop. It runs headlessly (no terminal window) and uses desktop notifications for feedback:

Recording — persistent notification stays visible while recording
Transcribing — replaces the recording notification after the launcher sends a dedicated stop signal
Cancelling — pressing the shortcut again during transcription requests cancellation
Done — shows result for 3 seconds, then auto-dismisses

Auto-stops after 5 minutes by default. Override with DICTATE_TIMEOUT=120 (seconds). If transcription still hangs after stop, the launcher escalates after DICTATE_TRANSCRIBE_TIMEOUT=45 seconds by default.
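Both timeout variables follow the same rule: parse whole seconds, otherwise use the default. The launcher itself is a Bash script, so this Rust version only models the documented defaults of 300 seconds for recording and 45 seconds for transcription after stop. Falling back on empty, non-numeric, or zero values is an assumption.

```rust
use std::time::Duration;

/// Parse whole seconds from an optional env value, falling back to `default_secs`.
/// Assumption: empty, non-numeric, or zero values fall back instead of disabling the timeout.
pub fn seconds_or(value: Option<&str>, default_secs: u64) -> Duration {
    let secs = value
        .and_then(|v| v.trim().parse::<u64>().ok())
        .filter(|&s| s > 0)
        .unwrap_or(default_secs);
    Duration::from_secs(secs)
}

/// (recording limit, post-stop transcription limit) from DICTATE_TIMEOUT and
/// DICTATE_TRANSCRIBE_TIMEOUT, using the documented defaults of 5 minutes and 45 seconds.
pub fn launcher_timeouts(dictate_timeout: Option<&str>, transcribe_timeout: Option<&str>) -> (Duration, Duration) {
    (seconds_or(dictate_timeout, 5 * 60), seconds_or(transcribe_timeout, 45))
}
```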

Linux compositor examples:

Sway: bindsym $mod+d exec dictate-launch -p
Hyprland: bind = SUPER, D, exec, dictate-launch -p
COSMIC: super + semicolon -> dictate-launch -p

All flags after dictate-launch are passed through to dictate (e.g. -p for post-processing, --language en, --device "USB Mic").

Launcher control mapping on Linux/macOS:

First press starts recording
Second press during recording sends SIGUSR1 to stop recording and continue to transcription
Press again during transcription to send SIGINT cancellation
Terminal Ctrl+C still means cancel, not stop
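The press-to-signal mapping above is a small state machine. The model below is illustrative only; the real launcher keeps its state in files under DICTATE_STATE_DIR. The State and Action names are hypothetical, and ignoring presses while a cancellation is in flight is an assumption.

```rust
/// Launcher state between shortcut presses (illustrative model).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum State {
    Idle,
    Recording,
    Transcribing,
    Cancelling,
}

/// What the launcher does in response to one press.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Action {
    Start,
    SendSigusr1,
    SendSigint,
    Ignore,
}

/// Handle one shortcut press: returns the action to take and the next state.
pub fn on_press(state: State) -> (Action, State) {
    match state {
        State::Idle => (Action::Start, State::Recording),
        // SIGUSR1 = stop capture and continue to transcription.
        State::Recording => (Action::SendSigusr1, State::Transcribing),
        // SIGINT = cancel; dictate exits 130 without writing to the clipboard.
        State::Transcribing => (Action::SendSigint, State::Cancelling),
        // Assumption: extra presses during cancellation do nothing.
        State::Cancelling => (Action::Ignore, State::Cancelling),
    }
}
```

In this model the launcher would return to Idle whenever the dictate child exits, whether it succeeded, failed, or was cancelled.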

Kitty adapter:

Install with just install-launcher-kitty or just install-launchers
Bind in Kitty with:

map kitty_mod+d launch --type=background dictate-kitty
map kitty_mod+shift+d launch --type=background dictate-kitty retry

The canonical launcher sources are:

Repo-side launcher validation:

just test-launchers runs start/stop/retry smoke tests against fake dictate, notification, and Kitty binaries
just lint-launchers runs shellcheck

Compatibility wrappers remain at:

Debugging helpers:

DICTATE_STATE_DIR=/tmp/dictate-test overrides where pid/state/output files are written
DICTATE_LAUNCH_LOG=/tmp/dictate-launch.log appends launcher events and child exec lines
DICTATE_LAUNCH_TRACE=1 enables Bash xtrace; if DICTATE_LAUNCH_LOG is set, trace output goes there
DICTATE_BIN=/path/to/dictate and DICTATE_TRANSCRIPTION_MODEL=... let you test alternate binaries/settings

On macOS outside Kitty, create a system shortcut (Shortcuts or Automator) that runs dictate in your terminal.

Architecture

microphone -> cpal -> resample (16kHz mono) -> chunking -> OpenAI-compatible ASR -> optional OpenAI-compatible LLM cleanup -> clipboard/stdout
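The resample (16kHz mono) stage has two steps: fold the device's channels into one, then change the sample rate. The sketch assumes device-rate, interleaved f32 samples, as a cpal input callback delivers them. Linear interpolation is used here for brevity and is not necessarily what dictate uses. A production downsampler should low-pass filter before decimating to avoid aliasing.

```rust
/// Average each interleaved frame down to one mono sample.
pub fn downmix(interleaved: &[f32], channels: usize) -> Vec<f32> {
    assert!(channels > 0, "channel count must be positive");
    interleaved
        .chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}

/// Resample mono audio from `from_hz` to `to_hz` by linear interpolation
/// (no anti-aliasing filter; illustrative only).
pub fn resample_linear(input: &[f32], from_hz: u32, to_hz: u32) -> Vec<f32> {
    if input.is_empty() || from_hz == to_hz {
        return input.to_vec();
    }
    // Input samples advanced per output sample.
    let ratio = from_hz as f64 / to_hz as f64;
    let out_len = (input.len() as f64 / ratio).floor() as usize;
    (0..out_len)
        .map(|i| {
            let pos = i as f64 * ratio;
            let idx = pos.floor() as usize;
            let frac = (pos - idx as f64) as f32;
            let a = input[idx];
            // Hold the last sample at the end of the buffer.
            let b = *input.get(idx + 1).unwrap_or(&a);
            a + (b - a) * frac
        })
        .collect()
}
```

For a 48 kHz stereo device, the pipeline step would be resample_linear(&downmix(&buf, 2), 48_000, 16_000).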

Audio capture: cpal with real-time resampling
Ring buffer: lock-free SPSC for zero-allocation transfer
Progressive chunking: overlapping chunks for long recordings
Transcription: Groq, Fireworks, or another OpenAI-compatible ASR endpoint
Post-processing: optional Groq, Fireworks, or OpenAI-compatible chat cleanup with fail-safe fallback
Clipboard: platform-aware with fallback to stderr
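Progressive chunking can be expressed as fixed-size windows that advance by the chunk length minus the overlap. A word cut at one boundary then still appears whole in the neighboring chunk. This is a sketch with hypothetical parameters. The README documents neither dictate's real chunk and overlap lengths nor how it merges the duplicated text in overlaps.

```rust
use std::ops::Range;

/// Split `total` samples into windows of `chunk` samples, where consecutive
/// windows share `overlap` samples. The last window may be shorter.
pub fn chunk_ranges(total: usize, chunk: usize, overlap: usize) -> Vec<Range<usize>> {
    assert!(chunk > overlap, "chunk must be longer than the overlap");
    let step = chunk - overlap;
    let mut ranges = Vec::new();
    let mut start = 0;
    while start < total {
        let end = (start + chunk).min(total);
        ranges.push(start..end);
        if end == total {
            break;
        }
        start += step;
    }
    ranges
}
```

At 16 kHz, for example, 30-second chunks with 2 seconds of overlap would be chunk_ranges(n, 480_000, 32_000).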

Troubleshooting

Audio:

Linux: check systemctl --user status pipewire, then run dictate devices
Linux: if needed, add your user to the audio group: sudo usermod -aG audio $USER (re-login required)
macOS: verify microphone permission in System Settings -> Privacy & Security -> Microphone

Clipboard:

Linux (Wayland): echo "test" | wl-copy && wl-paste
Linux (X11): verify xclip or xsel is installed
macOS: echo "test" | pbcopy && pbpaste

API errors:

401: invalid API key
429: rate limited (retries automatically)
413: recording too long
dictate retry: first create a reusable recording with dictate --save-last-audio
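The 429 row, together with the timeout note under Retry the last recording, describes bounded retries with two budgets. This sketch uses placeholder numbers because the README does not list dictate's real timeouts or retry counts. Exponential backoff with a cap is an assumed strategy. Honoring a server's Retry-After header would be a natural refinement.

```rust
use std::time::Duration;

/// Hypothetical retry budget; the numbers below are placeholders, not dictate's values.
#[derive(Debug, Clone, Copy)]
pub struct RetryPolicy {
    pub max_retries: u32,
    pub base_delay: Duration,
    pub max_delay: Duration,
}

/// Tighter budget for direct/hotkey recordings.
pub const INTERACTIVE: RetryPolicy = RetryPolicy {
    max_retries: 2,
    base_delay: Duration::from_millis(500),
    max_delay: Duration::from_secs(2),
};

/// More persistent budget for `dictate retry`.
pub const PERSISTENT: RetryPolicy = RetryPolicy {
    max_retries: 6,
    base_delay: Duration::from_secs(1),
    max_delay: Duration::from_secs(30),
};

/// Only 429 is documented as retried; 401 and 413 will not succeed on a retry.
pub fn is_retryable(status: u16) -> bool {
    status == 429
}

/// Delay before retry number `attempt` (0-based): doubles each time, capped at
/// `max_delay`; `None` once the budget is spent.
pub fn backoff(policy: &RetryPolicy, attempt: u32) -> Option<Duration> {
    if attempt >= policy.max_retries {
        return None;
    }
    let factor = 1u32.checked_shl(attempt).unwrap_or(u32::MAX);
    Some(policy.base_delay.saturating_mul(factor).min(policy.max_delay))
}
```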

Privacy

Audio is sent to the provider you configure for transcription and optional post-processing. By default, audio is not stored locally. If you pass --save-last-audio, dictate stores one reusable local recording until it is replaced by a later saved recording. Review the privacy policy and terms for whichever provider you use, such as Groq or Fireworks.

Acknowledgments

Audio pipeline design inspired by whis.

License

MIT
