Voice Pilot Brainstorm
Date: 2026-03-07
Status: Ready for planning
What We’re Building
A voice control daemon for managing multiple Claude Code CLI sessions. When a Claude session finishes and needs input, Voice Pilot:
- Plays a notification sound
- Announces which project is ready via TTS (macOS `say`)
- Activates the microphone to listen for a response
- User says “More” → reads back a summary of the last output
- User speaks a response → transcribed and sent to the correct tmux session via `tmux send-keys`
- User says “Skip” / “Ignore” → dismisses, re-notifies later
Works both at-desk (multi-session coding) and away-from-desk (overnight agents).
Why This Approach
Architecture: Claude Code Hooks + Listener daemon
- Event-driven: Claude Code’s `Notification` hook triggers a script when Claude finishes, writing to a queue. No polling, no heuristic prompt detection.
- Listener daemon picks up queued notifications, handles TTS, mic, and response routing.
- Leverages existing tmux infrastructure from the overnight agent.
- More reliable than tmux polling (which can misfire on `❯` appearing in output).
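The hook side of this design can be sketched as a small script. This is a minimal sketch, not a settled implementation: it assumes Claude Code hands the hook a JSON payload on stdin that includes `cwd` and `message` fields, and the queue directory, the file naming scheme, the `VOICE_PILOT_QUEUE` variable, and the `enqueue` / `main` helpers are all hypothetical.

```python
"""Hypothetical hook script: append one notification event to a file queue."""
import json
import os
import sys
import time
import uuid
from pathlib import Path

# Assumed queue location, taken from the example path in this document.
QUEUE_DIR = Path(os.environ.get("VOICE_PILOT_QUEUE", "/tmp/voice-pilot-queue"))


def enqueue(payload: dict, queue_dir: Path = QUEUE_DIR) -> Path:
    """Write one event as its own JSON file and return the final path."""
    queue_dir.mkdir(parents=True, exist_ok=True)
    event = {
        # Assumption: the payload carries the session's working directory.
        "project": Path(payload.get("cwd") or os.getcwd()).name,
        "message": payload.get("message", ""),
        # Assumption: the hook inherits TMUX_PANE from the pane running Claude.
        "tmux_pane": os.environ.get("TMUX_PANE", ""),
        "created_at": time.time(),
    }
    # A zero-padded timestamp prefix makes file names sort in arrival order.
    name = f"{time.time_ns():020d}-{uuid.uuid4().hex[:8]}.json"
    tmp = queue_dir / f".{name}.tmp"
    tmp.write_text(json.dumps(event))
    final = queue_dir / name
    os.replace(tmp, final)  # atomic rename: readers never see a partial file
    return final


def main(stream=sys.stdin) -> None:
    """Read the hook payload from the given stream and queue it."""
    raw = stream.read()
    enqueue(json.loads(raw) if raw.strip() else {})
```

Installed as a hook command, the script would end with a call to `main()`. Writing to a temporary name and renaming it means the listener daemon only ever sees complete event files.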
Speech-to-text: Cloud API (OpenAI Whisper API or Google) — accurate, zero local model setup.
Mic activation: After notification only — mic is not always-on. Only activates when a session needs input and the notification plays.
Scope: Monitor only — user starts their own tmux/Claude sessions. Voice Pilot just watches and responds.
Key Decisions
- Hooks over polling: Claude Code `Notification` hook writes to a queue file; daemon reads from it. No tmux content scraping.
- Cloud STT: OpenAI Whisper API for transcription. User already has OpenAI access (Codex/ChatGPT).
- Mic activates post-notification only: no wake word, no always-on listening. Simplest and most privacy-friendly.
- macOS native TTS: `say` command, `afplay` for notification sound. Zero dependencies for output.
- tmux send-keys for routing: responses go to the correct pane via tmux, same pattern as overnight agent.
- Monitor-only scope: doesn’t launch sessions, just watches existing ones.
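The output and routing decisions above reduce to a handful of subprocess calls. The sketch below is illustrative only: the helper names, the announcement wording, and the sound file path are assumptions. It builds argument lists separately from running them, so the commands can be inspected without a Mac or a running tmux server.

```python
import subprocess

# Assumed sound file; any file that afplay can play would do.
DEFAULT_SOUND = "/System/Library/Sounds/Glass.aiff"


def notify_cmds(project: str, sound: str = DEFAULT_SOUND) -> list[list[str]]:
    """Commands that play the chime, then announce the project via TTS."""
    return [["afplay", sound], ["say", f"{project} is ready"]]


def send_keys_cmds(target: str, text: str) -> list[list[str]]:
    """Commands that type `text` into a tmux pane and then press Enter.

    -l sends the text literally, so a spoken word such as "Enter" or "Up"
    inside the response is typed rather than interpreted as a key name.
    """
    return [
        ["tmux", "send-keys", "-t", target, "-l", text],
        ["tmux", "send-keys", "-t", target, "Enter"],
    ]


def run_all(cmds: list[list[str]]) -> None:
    """Run each command in order, raising if any of them fails."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)
```

For example, `run_all(notify_cmds("api-server"))` followed by `run_all(send_keys_cmds("%3", "yes, run the tests"))` would announce the project and then type the reply into pane `%3` (a made-up pane id).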
Voice Commands
| Command | Action |
|---|---|
| “More” / “Details” | Read back last ~15 lines of output via TTS |
| “Skip” / “Ignore” | Dismiss notification, re-notify on next idle |
| Any other speech | Transcribed and sent as input to the session |
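The table maps onto a small classifier over the transcribed text. This is a sketch under assumptions: the `Action` names and the normalization rule (lowercase, strip surrounding punctuation and whitespace, match the whole utterance) are hypothetical choices, not decided behavior.

```python
import string
from enum import Enum


class Action(Enum):
    MORE = "more"  # read back recent output
    SKIP = "skip"  # dismiss and re-notify later
    SEND = "send"  # forward the text to the session


_MORE = {"more", "details"}
_SKIP = {"skip", "ignore"}


def classify(transcript: str) -> tuple[Action, str]:
    """Return the action for an utterance plus the text to send, if any."""
    text = transcript.strip()
    # STT output may arrive as "More." or "Skip!", so compare a normalized key.
    key = text.strip(string.punctuation + string.whitespace).lower()
    if key in _MORE:
        return Action.MORE, ""
    if key in _SKIP:
        return Action.SKIP, ""
    return Action.SEND, text
```

Matching the whole utterance rather than a prefix means a reply like "skip the migration step" is sent to the session instead of being treated as a dismissal.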
Components
- Hook script: installed per-project in `.claude/settings.json`; writes notification events to a shared queue (e.g., `/tmp/voice-pilot-queue/`)
- Listener daemon (`voice_pilot.py`): reads the queue, plays the notification, speaks via TTS, captures the mic, transcribes, and routes the response
- Requirements: Python with `speech_recognition`, `pyaudio`; brew `portaudio`; macOS `say` / `afplay`
Open Questions
- Should the queue be a file, a Unix socket, or a named pipe? (File is simplest)
- How to handle multiple notifications arriving close together? (FIFO queue, announce each in order)
- Should there be a web UI or status dashboard? (Probably not for v1 — terminal output is fine)
- Integration with overnight agent’s existing notification plans (ntfy.sh)?
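On the first two questions, a file-per-event queue already gives FIFO behavior if file names sort by arrival time. A minimal sketch, assuming one JSON file per event with a sortable, zero-padded prefix in its name (a hypothetical naming scheme) and a `handle` callback standing in for the announce-and-listen step:

```python
import json
from pathlib import Path
from typing import Callable


def drain(queue_dir: Path, handle: Callable[[dict], None]) -> int:
    """Process every queued event oldest-first; return how many were handled."""
    count = 0
    # Sorting by name gives arrival order when names start with a
    # zero-padded timestamp or counter (assumed naming scheme).
    for path in sorted(queue_dir.glob("*.json")):
        try:
            event = json.loads(path.read_text())
        except (OSError, json.JSONDecodeError):
            # Unreadable or malformed entry: leave it in place for inspection.
            continue
        handle(event)
        path.unlink()  # remove only after the event was handled
        count += 1
    return count
```

The daemon would call `drain` each time it wakes. How it is woken (a short sleep loop, a filesystem watcher, or a socket instead of files) is left open here, as is the transport question itself.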
Next
Run `/workflows:plan` to create the implementation plan.