
Voice Pilot Brainstorm

Date: 2026-03-07
Status: Ready for planning

What We’re Building

A voice control daemon for managing multiple Claude Code CLI sessions. When a Claude session finishes and needs input, Voice Pilot:

  1. Plays a notification sound
  2. Announces which project is ready via TTS (macOS say)
  3. Activates the microphone to listen for a response
  4. User says “More” → reads back a summary of the last output
  5. User speaks a response → transcribed and sent to the correct tmux session via tmux send-keys
  6. User says “Skip” / “Ignore” → dismisses, re-notifies later
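
Steps 1–2 map directly onto two built-in macOS commands. A minimal sketch that builds the argv lists (the sound path and announcement phrasing are placeholder choices, not settled decisions):

```python
def notification_commands(project: str,
                          sound: str = "/System/Library/Sounds/Glass.aiff"):
    """Build the argv lists for steps 1-2: play a notification sound,
    then announce the ready project via the built-in `say` TTS.
    Kept pure (run each with subprocess.run) so phrasing is easy to test."""
    return [
        ["afplay", sound],                          # step 1: notification sound
        ["say", f"{project} is ready for input"],   # step 2: TTS announcement
    ]
```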

Works both at-desk (multi-session coding) and away-from-desk (overnight agents).

Why This Approach

Architecture: Claude Code Hooks + Listener daemon

  • Event-driven — Claude Code’s Notification hook triggers a script when Claude finishes, writing to a queue. No polling, no heuristic prompt detection.
  • Listener daemon picks up queued notifications, handles TTS, mic, and response routing.
  • Leverages existing tmux infrastructure from the overnight agent.
  • More reliable than tmux polling (which can misfire when prompt-like text appears in output).

Speech-to-text: Cloud API (OpenAI Whisper API or Google) — accurate, zero local model setup.

Mic activation: After notification only — mic is not always-on. Only activates when a session needs input and the notification plays.

Scope: Monitor only — user starts their own tmux/Claude sessions. Voice Pilot just watches and responds.

Key Decisions

  1. Hooks over polling — Claude Code Notification hook writes to a queue file; daemon reads from it. No tmux content scraping.
  2. Cloud STT — OpenAI Whisper API for transcription. User already has OpenAI access (Codex/ChatGPT).
  3. Mic activates post-notification only — no wake word, no always-on listening. Simplest and most privacy-friendly.
  4. macOS native TTS — say command for speech, afplay for the notification sound. Zero dependencies for output.
  5. tmux send-keys for routing — responses go to the correct pane via tmux, same pattern as overnight agent.
  6. Monitor-only scope — doesn’t launch sessions, just watches existing ones.
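
Decision 1 amounts to a small hook entry point. A sketch assuming Claude Code pipes the notification event to the hook as JSON on stdin (the exact payload fields are an assumption to verify against the hooks docs):

```python
import json
import time
from pathlib import Path

QUEUE_DIR = Path("/tmp/voice-pilot-queue")  # shared queue location from decision 1

def enqueue(event: dict, queue_dir: Path = QUEUE_DIR) -> Path:
    """Append one notification event to the file queue. Each event gets
    its own timestamp-named JSON file, so concurrent hooks never collide."""
    queue_dir.mkdir(parents=True, exist_ok=True)
    path = queue_dir / f"{time.time_ns()}.json"
    path.write_text(json.dumps(event))
    return path

# In the hook entry point, something like:
#     enqueue(json.load(sys.stdin))
```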

Voice Commands

Command               Action
“More” / “Details”    Read back last ~15 lines of output via TTS
“Skip” / “Ignore”     Dismiss notification, re-notify on next idle
Any other speech      Transcribed and sent as input to the session
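
A minimal sketch of how the daemon might map a transcribed utterance to one of these actions (classify_utterance is a hypothetical helper name; matching is deliberately loose to tolerate STT punctuation):

```python
def classify_utterance(text: str) -> str:
    """Map a transcribed utterance to an action: "more", "skip", or "reply".
    Lowercases and strips trailing punctuation so "Skip." still matches."""
    normalized = text.strip().lower().rstrip(".!?")
    if normalized in ("more", "details"):
        return "more"
    if normalized in ("skip", "ignore"):
        return "skip"
    return "reply"  # anything else is forwarded to the session verbatim
```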

Components

  1. Hook script — installed per-project in .claude/settings.json, writes notification events to a shared queue (e.g., /tmp/voice-pilot-queue/)
  2. Listener daemon (voice_pilot.py) — reads queue, plays notification, TTS, captures mic, transcribes, routes response
  3. Requirements — Python with speech_recognition and pyaudio; portaudio via Homebrew (brew install portaudio); macOS say/afplay
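
The routing step of component 2 can be sketched with subprocess; send_to_session is a hypothetical helper, and the target format follows tmux's session:window.pane convention:

```python
import subprocess

def send_to_session(target: str, text: str, dry_run: bool = False):
    """Route a transcribed reply to a tmux pane via send-keys (decision 5).
    `target` is a tmux target-pane like "myproject:0.1". Returns the argv
    list; dry_run=True skips execution so the command itself is testable."""
    argv = ["tmux", "send-keys", "-t", target, text, "Enter"]
    if not dry_run:
        subprocess.run(argv, check=True)
    return argv
```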

Open Questions

  • Should the queue be a file, a Unix socket, or a named pipe? (File is simplest)
  • How to handle multiple notifications arriving close together? (FIFO queue, announce each in order)
  • Should there be a web UI or status dashboard? (Probably not for v1 — terminal output is fine)
  • Integration with overnight agent’s existing notification plans (ntfy.sh)?
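
If the queue ends up as a directory of timestamp-named JSON files, FIFO consumption reduces to a sort-and-pop. A sketch (pop_oldest is a hypothetical name; it assumes filenames sort chronologically, which equal-width nanosecond timestamps do in practice):

```python
import json
from pathlib import Path

def pop_oldest(queue_dir: Path):
    """Consume the oldest queued event (FIFO by timestamped filename).
    Returns the parsed event, or None when the queue is empty."""
    pending = sorted(queue_dir.glob("*.json"))
    if not pending:
        return None
    path = pending[0]
    event = json.loads(path.read_text())
    path.unlink()  # dequeue; announce-in-order falls out of the sort
    return event
```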

Next

Run /workflows:plan to create the implementation plan.