Instant dictation for macOS and Linux.
One hotkey. No window. Text appears where you type.
Speaking is 3–4× faster than typing.
Most dictation tools are heavyweight, cloud-locked, or tied to a specific app.
VoiceToText is a background utility: press a hotkey, speak, and the text appears in whatever you're already doing.
Press once to start recording. Press again to stop.
Text is on your clipboard and pasted automatically.
No window. No clicks. Zero friction.
Both platforms use whisper-large-v3-turbo via Groq's API.
The turbo model runs inference roughly 8× faster than the full Whisper model with minimal accuracy loss. Audio is captured at 16 kHz mono, Whisper's native rate, to keep uploads small and latency low.
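As a rough illustration of what a transcription upload involves, the sketch below builds a multipart request by hand and parses the JSON reply. It is not the project's code. The endpoint URL, the `model` and `file` field names, and the `text` response key are assumed from Groq's OpenAI-compatible API, and `build_request` and `parse_response` are hypothetical helper names.

```python
import json
import uuid

# Assumed endpoint (Groq's OpenAI-compatible transcription route).
GROQ_URL = "https://api.groq.com/openai/v1/audio/transcriptions"
MODEL = "whisper-large-v3-turbo"


def build_request(api_key, audio, filename, boundary=None):
    """Return (url, headers, body) for a multipart transcription upload."""
    boundary = boundary or uuid.uuid4().hex
    delimiter = b"--" + boundary.encode()
    parts = [
        delimiter,
        b'Content-Disposition: form-data; name="model"',
        b"",
        MODEL.encode(),
        delimiter,
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode(),
        b"Content-Type: application/octet-stream",
        b"",
        audio,
        delimiter + b"--",  # closing delimiter
        b"",
    ]
    body = b"\r\n".join(parts)
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    }
    return GROQ_URL, headers, body


def parse_response(payload):
    """Extract the transcript from a JSON response body (assumed key: "text")."""
    return json.loads(payload)["text"].strip()
```

Keeping the builder and parser free of network calls makes them easy to unit test. The result could then be sent with `urllib.request.Request(url, data=body, headers=headers)` and `urlopen(req, timeout=30)`.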
| Component | Role |
|---|---|
| HotkeyManager | NSEvent global monitor for Option+Space |
| RecordingManager | AVAudioRecorder → .m4a at 16 kHz mono |
| GroqTranscriber | URLSession multipart POST, 30s timeout |
| AutoPaster | NSPasteboard + CGEvent Cmd+V synthesis |
| TranscriptionJournal | Daily .md log in ~/Documents/VoiceToText/ |
| StatusItemController | Menu bar icon with state badges |
| Module | Role |
|---|---|
| hotkey_daemon | Async event loop, XDG GlobalShortcuts portal |
| recorder | pw-record \| ffmpeg → Opus/OGG at 16 kbps |
| groq | Pure request builder + response parser |
| transport | reqwest blocking HTTP execution |
| autopaste | XDG RemoteDesktop portal → Ctrl+V keysyms |
| tray_app | ksni StatusNotifierItem tray |
On unsupported Linux desktops, auto-paste is skipped gracefully — the transcript is still on the clipboard.
Every transcription is appended to a daily file in ~/Documents/VoiceToText/. Both platforms produce identical Markdown.
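A minimal sketch of the append-to-daily-file idea follows. The `YYYY-MM-DD.md` file name and the timestamp-heading entry layout are assumptions made for illustration, since the actual journal format is not specified here. `journal_path` and `append_entry` are hypothetical names.

```python
from datetime import datetime
from pathlib import Path


def journal_path(root, now):
    """One file per calendar day (assumed naming: YYYY-MM-DD.md)."""
    return Path(root) / f"{now:%Y-%m-%d}.md"


def append_entry(root, text, now=None):
    """Append one transcript under a time heading and return the file path."""
    now = now or datetime.now()
    path = journal_path(root, now)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode creates the file on first use and never rewrites old entries.
    with path.open("a", encoding="utf-8") as f:
        f.write(f"## {now:%H:%M:%S}\n\n{text.strip()}\n\n")
    return path
```

In practice the root would be something like `Path.home() / "Documents" / "VoiceToText"`. Writing every entry through a single function is one way to keep the output byte-identical across platforms.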
One Groq API key is the only required configuration. Three ways to provide it:
No window ever opens during normal use.
The tray icon and its badge are the entire interface.
| State | Indicator |
|---|---|
| Idle | Icon in tray, no badge |
| Recording | Red dot badge (macOS) / attention status (Linux) |
| Transcribing | Grey pill badge |
| Done | Completion sound + text pasted |
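The states above can be read as a small state machine driven by the hotkey and by the transcription result. The sketch below is one possible model, not the project's implementation. The event names (`hotkey`, `transcript`, `reset`) are hypothetical, and ignoring the hotkey while transcribing is an assumption.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"
    RECORDING = "recording"
    TRANSCRIBING = "transcribing"
    DONE = "done"


# (current state, event) -> next state. Event names are illustrative.
TRANSITIONS = {
    (State.IDLE, "hotkey"): State.RECORDING,          # first press starts recording
    (State.RECORDING, "hotkey"): State.TRANSCRIBING,  # second press stops and uploads
    (State.TRANSCRIBING, "transcript"): State.DONE,   # text arrives, gets pasted
    (State.DONE, "reset"): State.IDLE,                # badge cleared
    (State.DONE, "hotkey"): State.RECORDING,          # start again right away
}


def step(state, event):
    """Return the next state; unknown (state, event) pairs leave it unchanged."""
    return TRANSITIONS.get((state, event), state)
```

With a table like this, the tray badge can be derived from the current state alone, and a stray hotkey press during transcription is a no-op.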