VoiceToText

Instant dictation for macOS and Linux.
One hotkey. No window. Text appears where you type.

why

Typing is slow

Speaking is 3–4× faster than typing.
Most dictation tools are heavyweight, cloud-locked, or tied to a specific app.

VoiceToText is a background utility: press a hotkey, speak, and the text appears in whatever you're already doing.

how it works

Four steps

⌥ Space
record
transcribe
paste

Press once to start recording. Press again to stop.
Text is on your clipboard and pasted automatically.
No window. No clicks. Zero friction.
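The toggle behaviour described above can be pictured as a small state machine. This is only a sketch: the `State`, `Event`, and `next_state` names are hypothetical and not taken from the project, and it assumes that a hotkey press during transcription is simply ignored.

```rust
/// The states a dictation session moves through (illustrative names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Idle,
    Recording,
    Transcribing,
}

/// Events that drive the state machine (illustrative names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    HotkeyPressed,
    TranscriptReady,
    TranscriptionFailed,
}

/// Returns the state that follows `state` when `event` occurs.
pub fn next_state(state: State, event: Event) -> State {
    match (state, event) {
        // First press starts recording, second press stops it.
        (State::Idle, Event::HotkeyPressed) => State::Recording,
        (State::Recording, Event::HotkeyPressed) => State::Transcribing,
        // Success or failure both return to idle.
        (State::Transcribing, Event::TranscriptReady) => State::Idle,
        (State::Transcribing, Event::TranscriptionFailed) => State::Idle,
        // Assumption: any other combination leaves the state unchanged.
        (current, _) => current,
    }
}
```

Keeping the transition function pure makes the hotkey logic easy to test without a microphone or a network connection.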

platforms

macOS + Linux

macOS

  • Swift + AppKit
  • Menu bar app
  • Option + Space hotkey
  • AVFoundation recording
  • CGEvent auto-paste
  • macOS 14+

Linux

  • Rust
  • System tray (SNI)
  • Alt + Space hotkey
  • PipeWire + ffmpeg
  • XDG portal auto-paste
  • GNOME Wayland

transcription

Groq Whisper

Both platforms use whisper-large-v3-turbo via Groq's API.

~8× faster inference than the full Whisper model with minimal accuracy loss. Audio is captured at 16 kHz mono — Whisper's native rate — to keep uploads small and latency low.

POST api.groq.com/openai/v1/audio/transcriptions
model: whisper-large-v3-turbo
format: multipart/form-data (.m4a / .ogg)
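As a rough illustration of the request shape above, the body of such a multipart upload could be assembled like this. The function names and the boundary handling are hypothetical; the sketch assumes the OpenAI-compatible form fields `model` and `file`, and it leaves out the HTTP call itself, which would also need an `Authorization: Bearer` header carrying the API key.

```rust
/// Builds a multipart/form-data body with a `model` text field and a
/// `file` part holding the recorded audio. Hypothetical helper, std only.
pub fn build_multipart_body(
    boundary: &str,
    model: &str,
    filename: &str,
    content_type: &str,
    audio: &[u8],
) -> Vec<u8> {
    let mut body = Vec::new();

    // Text field: the model name.
    body.extend_from_slice(format!("--{boundary}\r\n").as_bytes());
    body.extend_from_slice(b"Content-Disposition: form-data; name=\"model\"\r\n\r\n");
    body.extend_from_slice(model.as_bytes());
    body.extend_from_slice(b"\r\n");

    // File part: raw audio bytes with their own Content-Type.
    body.extend_from_slice(format!("--{boundary}\r\n").as_bytes());
    body.extend_from_slice(
        format!("Content-Disposition: form-data; name=\"file\"; filename=\"{filename}\"\r\n")
            .as_bytes(),
    );
    body.extend_from_slice(format!("Content-Type: {content_type}\r\n\r\n").as_bytes());
    body.extend_from_slice(audio);
    body.extend_from_slice(b"\r\n");

    // Closing delimiter ends the form.
    body.extend_from_slice(format!("--{boundary}--\r\n").as_bytes());
    body
}

/// Value for the request's Content-Type header.
pub fn content_type_header(boundary: &str) -> String {
    format!("multipart/form-data; boundary={boundary}")
}
```

The boundary must not occur inside the audio bytes, so a real client would normally generate a long random one per request.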

macOS internals

macOS architecture

Component             Role
HotkeyManager         NSEvent global monitor for Option+Space
RecordingManager      AVAudioRecorder → .m4a at 16 kHz mono
GroqTranscriber       URLSession multipart POST, 30s timeout
AutoPaster            NSPasteboard + CGEvent Cmd+V synthesis
TranscriptionJournal  Daily .md log in ~/Documents/VoiceToText/
StatusItemController  Menu bar icon with state badges

linux internals

Linux architecture

Module         Role
hotkey_daemon  Async event loop, XDG GlobalShortcuts portal
recorder       pw-record | ffmpeg → Opus/OGG at 16 kbps
groq           Pure request builder + response parser
transport      reqwest blocking HTTP execution
autopaste      XDG RemoteDesktop portal → Ctrl+V keysyms
tray_app       ksni StatusNotifierItem tray

global hotkeys

Platform-specific approach

macOS

  • NSEvent global monitor
  • Option + Space (keyCode 49)
  • Requires Accessibility permission
  • Works in all apps system-wide

Linux

  • XDG GlobalShortcuts portal
  • Alt + Space (preferred)
  • Sandboxed — no special perms
  • Requires xdg-desktop-portal-gnome

auto-paste

Platform-specific approach

macOS

  • NSPasteboard + CGEvent
  • Synthesises Cmd+V
  • Works in all apps
  • Requires Accessibility

Linux

  • XDG RemoteDesktop portal
  • Injects Ctrl+V keysyms
  • GNOME Wayland only
  • Restore token cached on disk

On unsupported Linux desktops, auto-paste is skipped gracefully — the transcript is still on the clipboard.
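One way to picture this graceful fallback is a small decision function. The sketch below is an assumption, not the project's actual logic: it guesses support from the `XDG_CURRENT_DESKTOP` and `XDG_SESSION_TYPE` environment values, whereas the real daemon may instead probe the portal directly. `PasteStrategy` and `choose_paste_strategy` are hypothetical names.

```rust
/// What to do with a finished transcript (illustrative names).
#[derive(Debug, PartialEq, Eq)]
pub enum PasteStrategy {
    /// Copy to the clipboard, then inject the paste shortcut via the portal.
    ClipboardAndPortal,
    /// Copy to the clipboard only; the user pastes manually.
    ClipboardOnly,
}

/// Picks a strategy from the values of XDG_CURRENT_DESKTOP and
/// XDG_SESSION_TYPE, passed in so the function stays pure and testable.
pub fn choose_paste_strategy(
    current_desktop: Option<&str>,
    session_type: Option<&str>,
) -> PasteStrategy {
    // XDG_CURRENT_DESKTOP is a colon-separated list, e.g. "ubuntu:GNOME".
    let is_gnome = current_desktop
        .map(|d| d.split(':').any(|part| part.eq_ignore_ascii_case("gnome")))
        .unwrap_or(false);
    let is_wayland = session_type
        .map(|s| s.eq_ignore_ascii_case("wayland"))
        .unwrap_or(false);

    if is_gnome && is_wayland {
        PasteStrategy::ClipboardAndPortal
    } else {
        PasteStrategy::ClipboardOnly
    }
}
```

A caller could feed it `std::env::var("XDG_CURRENT_DESKTOP").ok().as_deref()` and the matching session-type lookup. Either way the transcript reaches the clipboard first, so the fallback loses only the automatic keystroke.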

journal

Daily Markdown log

Every transcription is appended to a daily file in ~/Documents/VoiceToText/. Both platforms produce identical Markdown.

# Transcriptions - May 31, 2026

## 2:34 PM
The text you dictated goes here.

---
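The layout above is simple enough to produce with plain string formatting. The following sketch uses hypothetical helper names and takes the date and time as plain numbers to stay free of external crates; how the project actually obtains local time is not stated here.

```rust
const MONTHS: [&str; 12] = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
];

/// Formats a date as "May 31, 2026". `month` is 1-based (1..=12);
/// values outside that range panic.
pub fn format_date(year: i32, month: u32, day: u32) -> String {
    format!("{} {}, {}", MONTHS[(month - 1) as usize], day, year)
}

/// Converts a 24-hour time to the "2:34 PM" style used in the log.
pub fn format_time_12h(hour: u32, minute: u32) -> String {
    let suffix = if hour < 12 { "AM" } else { "PM" };
    let hour12 = match hour % 12 {
        0 => 12,
        h => h,
    };
    format!("{hour12}:{minute:02} {suffix}")
}

/// Header written once, when the daily file is first created.
pub fn file_header(year: i32, month: u32, day: u32) -> String {
    format!("# Transcriptions - {}\n\n", format_date(year, month, day))
}

/// One entry, appended after every transcription.
pub fn journal_entry(hour: u32, minute: u32, text: &str) -> String {
    format!("## {}\n{}\n\n---\n", format_time_12h(hour, minute), text.trim())
}
```

Writing the header followed by one entry reproduces the sample shown above, and later entries are appended to the same file.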

configuration

Just an API key

One Groq API key is the only required configuration. Three ways to provide it:

# Environment variable (any platform)
export GROQ_API_KEY="your-key"

# macOS: interactive setup
make setup

# Linux: config file
~/.config/voicetotext/config.json
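When more than one source is present, something has to decide which key wins. The sketch below assumes the environment variable takes precedence over a stored value, which is a common convention but is not confirmed for this project. `resolve_api_key` and `linux_config_path` are hypothetical names, and reading the key out of the JSON file is left out because its field name is not documented here.

```rust
use std::path::PathBuf;

/// Returns the first non-empty candidate, checking the environment value
/// before the stored one (assumed precedence).
pub fn resolve_api_key(env_value: Option<&str>, stored_value: Option<&str>) -> Option<String> {
    env_value
        .into_iter()
        .chain(stored_value)
        .map(str::trim)
        .find(|v| !v.is_empty())
        .map(str::to_owned)
}

/// Location of the Linux config file relative to the user's home directory.
pub fn linux_config_path(home: &str) -> PathBuf {
    PathBuf::from(home)
        .join(".config")
        .join("voicetotext")
        .join("config.json")
}
```

A caller would pass `std::env::var("GROQ_API_KEY").ok().as_deref()` as the first argument and whatever it read from the config file as the second. Treating an empty string as missing avoids sending a blank key to the API.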

design

Zero UI

No window ever opens during normal use.
The tray icon and its badge are the entire interface.

Idle          Icon in tray, no badge
Recording     Red dot badge (macOS) / attention status (Linux)
Transcribing  Grey pill badge
Done          Completion sound + text pasted

get started · macOS

macOS quick start

# 1. Get a Groq API key
# console.groq.com/keys

# 2. Store it
make setup

# 3. Build and install
make build

# 4. Grant Accessibility permission
# System Settings → Privacy & Security → Accessibility

# Press Option+Space to record

get started · linux

Linux quick start

# 1. Install runtime deps
# pipewire ffmpeg wl-clipboard

# 2. Set your Groq API key
export GROQ_API_KEY="your-key"

# 3. Build
cd linux && cargo build --release

# 4. Run the daemon
./target/release/voicetotext daemon

# Press Alt+Space to record