VoiceToText

Instant dictation for macOS and Linux.
One hotkey. No window. Text appears where you type.

why

Typing is slow

Speaking is 3–4× faster than typing.
Most dictation tools are heavyweight, cloud-locked, or tied to a specific app.

VoiceToText is a background utility: press a hotkey, speak, and the text appears in whatever you're already doing.

how it works

Four steps

⌥ Space

→

record

→

transcribe

→

paste

Press once to start recording. Press again to stop.
Text is on your clipboard and pasted automatically.
No window. No clicks. Zero friction.
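The press-to-start, press-to-stop flow above can be pictured as a small state machine. This is a minimal sketch, not the project's actual code: the `Phase` and `Event` names are hypothetical, and the choice to ignore a hotkey press while a transcription is in flight is an assumption.

```rust
/// Phases of one dictation cycle (illustrative names, not from the source).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Phase {
    Idle,
    Recording,
    Transcribing,
}

/// Events that can move the cycle forward.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    HotkeyPressed,
    TranscriptReady,
}

/// Returns the next phase for a given event.
/// Assumption: a hotkey press while transcribing is ignored.
pub fn next_phase(current: Phase, event: Event) -> Phase {
    match (current, event) {
        // First press starts recording.
        (Phase::Idle, Event::HotkeyPressed) => Phase::Recording,
        // Second press stops recording and hands the audio to transcription.
        (Phase::Recording, Event::HotkeyPressed) => Phase::Transcribing,
        // Once the transcript is back (and pasted), return to idle.
        (Phase::Transcribing, Event::TranscriptReady) => Phase::Idle,
        // Any other combination leaves the phase unchanged.
        (phase, _) => phase,
    }
}
```

Keeping the transition logic in a pure function like this makes the toggle behaviour easy to test without a microphone or a network connection.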

platforms

macOS + Linux

macOS

Swift + AppKit
Menu bar app
Option + Space hotkey
AVFoundation recording
CGEvent auto-paste
macOS 14+

Linux

Rust
System tray (SNI)
Alt + Space hotkey
PipeWire + ffmpeg
XDG portal auto-paste
GNOME Wayland

transcription

Groq Whisper

Both platforms use whisper-large-v3-turbo via Groq's API.

The turbo model runs inference roughly 8× faster than the full large-v3 model with minimal accuracy loss. Audio is captured at 16 kHz mono, Whisper's native rate, to keep uploads small and latency low.

POST api.groq.com/openai/v1/audio/transcriptions
model: whisper-large-v3-turbo
format: multipart/form-data (.m4a / .ogg)
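To make the request shape above concrete, here is a hedged sketch of how a multipart/form-data body for this endpoint could be assembled with only the Rust standard library. The function names, the fixed boundary, and the file name are illustrative assumptions. The `model` and `file` field names follow the OpenAI-compatible transcription API, and the request would also need an `Authorization: Bearer <key>` header.

```rust
/// Value for the Content-Type header of the request.
pub fn content_type_header(boundary: &str) -> String {
    format!("multipart/form-data; boundary={}", boundary)
}

/// Builds a multipart/form-data body with a `model` field and a `file` part.
/// `boundary` must not occur inside `audio`; real code should generate it randomly.
pub fn build_multipart_body(
    boundary: &str,
    model: &str,
    filename: &str,
    content_type: &str,
    audio: &[u8],
) -> Vec<u8> {
    let mut body = Vec::new();

    // Text field carrying the model name.
    body.extend_from_slice(
        format!(
            "--{}\r\nContent-Disposition: form-data; name=\"model\"\r\n\r\n{}\r\n",
            boundary, model
        )
        .as_bytes(),
    );

    // File part: headers, a blank line, then the raw audio bytes.
    body.extend_from_slice(
        format!(
            "--{}\r\nContent-Disposition: form-data; name=\"file\"; filename=\"{}\"\r\nContent-Type: {}\r\n\r\n",
            boundary, filename, content_type
        )
        .as_bytes(),
    );
    body.extend_from_slice(audio);

    // Closing delimiter ends the body.
    body.extend_from_slice(format!("\r\n--{}--\r\n", boundary).as_bytes());
    body
}
```

Because the builder only produces bytes, it can be tested in isolation and handed to whatever HTTP client performs the POST.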

macOS internals

macOS architecture

Component	Role
`HotkeyManager`	NSEvent global monitor for Option+Space
`RecordingManager`	AVAudioRecorder → .m4a at 16 kHz mono
`GroqTranscriber`	URLSession multipart POST, 30s timeout
`AutoPaster`	NSPasteboard + CGEvent Cmd+V synthesis
`TranscriptionJournal`	Daily .md log in ~/Documents/VoiceToText/
`StatusItemController`	Menu bar icon with state badges

linux internals

Linux architecture

Module	Role
`hotkey_daemon`	Async event loop, XDG GlobalShortcuts portal
`recorder`	pw-record | ffmpeg → Opus/OGG at 16 kbps
`groq`	Pure request builder + response parser
`transport`	reqwest blocking HTTP execution
`autopaste`	XDG RemoteDesktop portal → Ctrl+V keysyms
`tray_app`	ksni StatusNotifierItem tray

global hotkeys

Platform-specific approach

macOS

NSEvent global monitor
Option + Space (keyCode 49)
Requires Accessibility permission
Works in all apps system-wide

Linux

XDG GlobalShortcuts portal
Alt + Space (preferred)
Sandboxed — no special perms
Requires xdg-desktop-portal-gnome

auto-paste

Platform-specific approach

macOS

NSPasteboard + CGEvent
Synthesises Cmd+V
Works in all apps
Requires Accessibility

Linux

XDG RemoteDesktop portal
Injects Ctrl+V keysym
GNOME Wayland only
Restore token cached on disk

On unsupported Linux desktops, auto-paste is skipped gracefully — the transcript is still on the clipboard.
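The graceful fallback described above might look like the following sketch. It assumes support is detected from the values of the `XDG_CURRENT_DESKTOP` and `XDG_SESSION_TYPE` environment variables. That detection strategy and the `Delivery` type are hypothetical, and the real tool may instead try the portal and fall back when the request fails.

```rust
/// What to do with a finished transcript.
#[derive(Debug, PartialEq, Eq)]
pub enum Delivery {
    /// Copy to the clipboard, then inject the paste shortcut.
    ClipboardAndPaste,
    /// Copy to the clipboard and leave pasting to the user.
    ClipboardOnly,
}

/// Decides delivery from the desktop environment strings.
/// Assumption: auto-paste is attempted only on GNOME under Wayland.
pub fn choose_delivery(current_desktop: &str, session_type: &str) -> Delivery {
    // XDG_CURRENT_DESKTOP is a colon-separated list, e.g. "ubuntu:GNOME".
    let is_gnome = current_desktop
        .split(':')
        .any(|name| name.eq_ignore_ascii_case("gnome"));
    let is_wayland = session_type.eq_ignore_ascii_case("wayland");

    if is_gnome && is_wayland {
        Delivery::ClipboardAndPaste
    } else {
        Delivery::ClipboardOnly
    }
}
```

A caller would pass in `std::env::var("XDG_CURRENT_DESKTOP")` and `std::env::var("XDG_SESSION_TYPE")`, defaulting to empty strings when either is unset, so that an unknown desktop always lands on the clipboard-only path.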

journal

Daily Markdown log

Every transcription is appended to a daily file in ~/Documents/VoiceToText/. Both platforms produce identical Markdown.

# Transcriptions - May 31, 2026

## 2:34 PM
The text you dictated goes here.

---
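The journal layout shown above could be produced by something like this sketch. The function names are hypothetical, and the date and time are passed in as preformatted strings because the standard library has no calendar formatting. The real implementations may build the file differently.

```rust
/// Header written once, when the daily file is first created.
pub fn journal_header(date: &str) -> String {
    format!("# Transcriptions - {}\n\n", date)
}

/// One entry: a time heading, the transcript, then a horizontal rule.
pub fn journal_entry(time: &str, text: &str) -> String {
    format!("## {}\n{}\n\n---\n\n", time, text.trim())
}

/// Returns the new file contents after appending an entry.
/// An empty `existing` string is treated as a file that does not exist yet.
pub fn append_entry(existing: &str, date: &str, time: &str, text: &str) -> String {
    let mut out = if existing.is_empty() {
        journal_header(date)
    } else {
        existing.to_string()
    };
    out.push_str(&journal_entry(time, text));
    out
}
```

Calling `append_entry("", "May 31, 2026", "2:34 PM", "The text you dictated goes here.")` yields the sample file shown above, and later calls with the existing contents add further entries beneath it.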

configuration

Just an API key

One Groq API key is the only required configuration. Three ways to provide it:

# Environment variable (any platform)
export GROQ_API_KEY="your-key"

# macOS: interactive setup
make setup

# Linux: config file
~/.config/voicetotext/config.json
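When more than one source is present, some precedence has to apply. The source does not state the order, so the sketch below assumes the environment variable wins over the config file and that blank values are skipped. The function name is hypothetical, and reading the variable and parsing the config file are left to the caller.

```rust
/// Picks the API key from the available sources.
/// Assumption: the environment variable takes precedence over the config file.
pub fn resolve_api_key(env_value: Option<&str>, config_value: Option<&str>) -> Option<String> {
    env_value
        .into_iter()
        // Fall through to the config file value if the env var is absent.
        .chain(config_value)
        .map(str::trim)
        // Skip values that are empty or whitespace only.
        .find(|key| !key.is_empty())
        .map(String::from)
}
```

Returning `None` when every source is missing or blank lets the caller show a single "no API key configured" message instead of failing later at request time.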

design

Zero UI

No window ever opens during normal use.
The menu bar or tray icon and its badge are the entire interface.

Idle	Icon in tray — no badge
Recording	Red dot badge (macOS) / attention status (Linux)
Transcribing	Grey pill badge
Done	Completion sound + text pasted

get started · macOS

macOS quick start

# 1. Get a Groq API key
# console.groq.com/keys

# 2. Store it
make setup

# 3. Build and install
make build

# 4. Grant Accessibility permission
# System Settings → Privacy & Security → Accessibility

# Press Option+Space to record

get started · linux

Linux quick start

# 1. Install runtime deps
# pipewire ffmpeg wl-clipboard

# 2. Set your Groq API key
export GROQ_API_KEY="your-key"

# 3. Build
cd linux && cargo build --release

# 4. Run the daemon
./target/release/voicetotext daemon

# Press Alt+Space to record