
Voice Pilot Brainstorm

Date: 2026-03-07
Status: Ready for planning

What We’re Building

A voice control daemon for managing multiple Claude Code CLI sessions. When a Claude session finishes and needs input, Voice Pilot:

  1. Plays a notification sound
  2. Announces which project is ready via TTS (macOS say)
  3. Activates the microphone to listen for a response
  4. User says “More” → reads back a summary of the last output
  5. User speaks a response → transcribed and sent to the correct tmux session via tmux send-keys
  6. User says “Skip” / “Ignore” → dismisses, re-notifies later
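
Steps 1–2 map directly onto two built-in macOS commands. A minimal sketch that builds the argv lists (the sound path and announcement phrasing are placeholder choices, not settled decisions):

```python
def notification_commands(project: str,
                          sound: str = "/System/Library/Sounds/Glass.aiff"):
    """Build the argv lists for steps 1-2: play a notification sound,
    then announce the ready project via the built-in `say` TTS.
    Kept pure (run each with subprocess.run) so phrasing is easy to test."""
    return [
        ["afplay", sound],                          # step 1: notification sound
        ["say", f"{project} is ready for input"],   # step 2: TTS announcement
    ]
```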

Works both at-desk (multi-session coding) and away-from-desk (overnight agents).

Why This Approach

Architecture: Claude Code Hooks + Listener daemon

  • Event-driven — Claude Code’s Notification hook triggers a script when Claude finishes, writing to a queue. No polling, no heuristic prompt detection.
  • Listener daemon picks up queued notifications, handles TTS, mic, and response routing.
  • Leverages existing tmux infrastructure from the overnight agent.
  • More reliable than tmux polling (which can misfire when prompt-like text appears in output).

Speech-to-text: Cloud API (OpenAI Whisper API or Google) — accurate, zero local model setup.

Mic activation: After notification only — mic is not always-on. Only activates when a session needs input and the notification plays.

Scope: Monitor only — user starts their own tmux/Claude sessions. Voice Pilot just watches and responds.

Key Decisions

  1. Hooks over polling — Claude Code Notification hook writes to a queue file; daemon reads from it. No tmux content scraping.
  2. Cloud STT — OpenAI Whisper API for transcription. User already has OpenAI access (Codex/ChatGPT).
  3. Mic activates post-notification only — no wake word, no always-on listening. Simplest and most privacy-friendly.
  4. macOS native TTS — say command for speech, afplay for the notification sound. Zero dependencies for output.
  5. tmux send-keys for routing — responses go to the correct pane via tmux, same pattern as overnight agent.
  6. Monitor-only scope — doesn’t launch sessions, just watches existing ones.
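
Decision 1 amounts to a small hook entry point. A sketch assuming Claude Code pipes the notification event to the hook as JSON on stdin (the exact payload fields are an assumption to verify against the hooks docs):

```python
import json
import time
from pathlib import Path

QUEUE_DIR = Path("/tmp/voice-pilot-queue")  # shared queue location from decision 1

def enqueue(event: dict, queue_dir: Path = QUEUE_DIR) -> Path:
    """Append one notification event to the file queue. Each event gets
    its own timestamp-named JSON file, so concurrent hooks never collide."""
    queue_dir.mkdir(parents=True, exist_ok=True)
    path = queue_dir / f"{time.time_ns()}.json"
    path.write_text(json.dumps(event))
    return path

# In the hook entry point, something like:
#     enqueue(json.load(sys.stdin))
```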

Voice Commands

Command               Action
“More” / “Details”    Read back last ~15 lines of output via TTS
“Skip” / “Ignore”     Dismiss notification, re-notify on next idle
Any other speech      Transcribed and sent as input to the session
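
A minimal sketch of how the daemon might map a transcribed utterance to one of these actions (classify_utterance is a hypothetical helper name; matching is deliberately loose to tolerate STT punctuation):

```python
def classify_utterance(text: str) -> str:
    """Map a transcribed utterance to an action: "more", "skip", or "reply".
    Lowercases and strips trailing punctuation so "Skip." still matches."""
    normalized = text.strip().lower().rstrip(".!?")
    if normalized in ("more", "details"):
        return "more"
    if normalized in ("skip", "ignore"):
        return "skip"
    return "reply"  # anything else is forwarded to the session verbatim
```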

Components

  1. Hook script — installed per-project in .claude/settings.json, writes notification events to a shared queue (e.g., /tmp/voice-pilot-queue/)
  2. Listener daemon (voice_pilot.py) — reads queue, plays notification, TTS, captures mic, transcribes, routes response
  3. Requirements — Python with speech_recognition and pyaudio; portaudio via Homebrew (brew install portaudio); macOS say/afplay
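
The routing step of component 2 can be sketched with subprocess; send_to_session is a hypothetical helper, and the target format follows tmux's session:window.pane convention:

```python
import subprocess

def send_to_session(target: str, text: str, dry_run: bool = False):
    """Route a transcribed reply to a tmux pane via send-keys (decision 5).
    `target` is a tmux target-pane like "myproject:0.1". Returns the argv
    list; dry_run=True skips execution so the command itself is testable."""
    argv = ["tmux", "send-keys", "-t", target, text, "Enter"]
    if not dry_run:
        subprocess.run(argv, check=True)
    return argv
```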

Open Questions

  • Should the queue be a file, a Unix socket, or a named pipe? (File is simplest)
  • How to handle multiple notifications arriving close together? (FIFO queue, announce each in order)
  • Should there be a web UI or status dashboard? (Probably not for v1 — terminal output is fine)
  • Integration with overnight agent’s existing notification plans (ntfy.sh)?
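
If the queue ends up as a directory of timestamp-named JSON files, FIFO consumption reduces to a sort-and-pop. A sketch (pop_oldest is a hypothetical name; it assumes filenames sort chronologically, which equal-width nanosecond timestamps do in practice):

```python
import json
from pathlib import Path

def pop_oldest(queue_dir: Path):
    """Consume the oldest queued event (FIFO by timestamped filename).
    Returns the parsed event, or None when the queue is empty."""
    pending = sorted(queue_dir.glob("*.json"))
    if not pending:
        return None
    path = pending[0]
    event = json.loads(path.read_text())
    path.unlink()  # dequeue; announce-in-order falls out of the sort
    return event
```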

Next

Run /workflows:plan to create the implementation plan.