feat: Voice Pilot — voice control for Claude Code CLI sessions
Voice Pilot — Voice Control for Claude Code CLI Sessions
Overview
A Python daemon that monitors tmux sessions running Claude Code, announces when they need input via macOS TTS, listens for voice responses, and routes transcribed text back to the correct session via tmux send-keys.
Problem Statement
When running multiple Claude Code sessions across projects (or overnight agents), the user must constantly switch between terminal tabs to check which sessions are waiting for input. This breaks flow and makes it impossible to manage sessions while away from the desk.
Proposed Solution
Event-driven architecture using Claude Code’s Notification hook:
Claude Code finishes → Notification hook fires → writes event to queue
                      ↓
              Voice Pilot daemon
                      ↓
    afplay /System/Library/Sounds/Ping.aiff
                      ↓
            say "{project} is ready"
                      ↓
           Mic activates (30s timeout)
                      ↓
┌──────────────┬──────────────┬──────────────┐
│    "more"    │    "skip"    │  Any other   │
│              │              │    speech    │
└──────┬───────┴──────┬───────┴──────┬───────┘
       ↓              ↓              ↓
 Read last 15     Dismiss,    Transcribe via
 lines via TTS  move to next    Whisper API
       ↓                             ↓
Mic reactivates              tmux send-keys -l
(re-check cmds)
Technical Approach
Design Principles (from review)
- Zero pip dependencies: use urllib.request instead of httpx for the Whisper API call. No venv, no requirements.txt. Just install sox and jq, then run.
- Safe JSON construction: use jq in the hook script to avoid shell injection.
- Atomic file operations: write to a temp file, then mv it into the queue (prevents partial reads). Delete events after processing (crash resilience).
- Per-user temp directory: use $TMPDIR (macOS per-user) instead of /tmp (world-writable).
- Fail fast: validate OPENAI_API_KEY at startup, not on first API call.
- Subprocess error handling: wrap all external calls in try/except with logging.
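The atomic-write principle can be illustrated in Python as well. The following is a minimal sketch, not part of the plan's three files: the helper name write_event_atomically and its name parameter are hypothetical, and it assumes the temp file and the final file live on the same filesystem so the rename is atomic.

```python
import json
import os
import tempfile
from pathlib import Path


def write_event_atomically(queue_dir: Path, name: str, event: dict) -> Path:
    """Write `event` as JSON so readers never observe a partial file."""
    queue_dir.mkdir(parents=True, exist_ok=True)
    # Create the temp file inside the queue directory so the rename stays on
    # one filesystem; it has no ".json" suffix, so a "*.json" glob skips it.
    fd, tmp_path = tempfile.mkstemp(dir=queue_dir, prefix=".tmp.")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(event, tmp)
        final_path = queue_dir / f"{name}.json"
        os.replace(tmp_path, final_path)  # atomic rename on POSIX
        return final_path
    except BaseException:
        # Clean up the temp file if anything failed before the rename
        Path(tmp_path).unlink(missing_ok=True)
        raise
```

A reader that globs for *.json sees either no file or the complete file, never a half-written one.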
Components (3 files)
1. Hook script: scripts/voice-pilot/notify-hook.sh
Shell script invoked by Claude Code’s Notification hook. Writes a JSON event file atomically to the queue directory using jq.
#!/usr/bin/env bash
# Called by Claude Code Notification hook
# Writes event atomically to $TMPDIR/voice-pilot-queue/
set -euo pipefail
QUEUE_DIR="${TMPDIR:-/tmp}/voice-pilot-queue"
mkdir -p "$QUEUE_DIR"
# BSD date on macOS may not expand %N (nanoseconds), so use epoch seconds
# plus the shell PID to keep filenames unique within the same second
EVENT_FILE="$QUEUE_DIR/$(date +%s)-$$.json"
TMPFILE=$(mktemp "$QUEUE_DIR/.tmp.XXXXXX")
jq -n \
  --arg pane_id "${TMUX_PANE:-unknown}" \
  --arg project "$(basename "$PWD")" \
  --argjson timestamp "$(date +%s)" \
  '{pane_id: $pane_id, project: $project, timestamp: $timestamp}' \
  > "$TMPFILE"
# mv within one directory is an atomic rename, so readers never see partial JSON
mv "$TMPFILE" "$EVENT_FILE"
Event schema (3 fields only):
- pane_id: tmux pane ID for routing responses back (e.g., %3)
- project: directory basename as human-readable project name
- timestamp: Unix epoch for stale event detection
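Because the daemon trusts these three fields when routing text into a terminal, it may be worth validating them on read. The sketch below is an assumption, not part of the plan: parse_event is a hypothetical helper, and rejecting the hook's "unknown" fallback pane ID (written when the hook runs outside tmux) is a design choice made here for illustration.

```python
import json
from typing import Optional


def parse_event(raw: str) -> Optional[dict]:
    """Return the event dict if it matches the three-field schema, else None."""
    try:
        event = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(event, dict):
        return None
    pane_id = event.get("pane_id")
    project = event.get("project")
    timestamp = event.get("timestamp")
    # tmux pane IDs look like "%3": a percent sign followed by digits
    if not (isinstance(pane_id, str) and pane_id.startswith("%")
            and pane_id[1:].isdigit()):
        return None
    if not (isinstance(project, str) and project):
        return None
    # bool is a subclass of int, so exclude it explicitly
    if isinstance(timestamp, bool) or not isinstance(timestamp, (int, float)):
        return None
    return {"pane_id": pane_id, "project": project, "timestamp": timestamp}
```

Events that fail validation can be logged and deleted in the same way as unparseable files.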
2. Main daemon: scripts/voice-pilot/voice_pilot.py
Python script (stdlib only, no pip deps) that watches the queue directory, processes events FIFO, and manages the notification → listen → route cycle.
Constants:
QUEUE_DIR = Path(os.environ.get("TMPDIR", "/tmp")) / "voice-pilot-queue"
POLL_INTERVAL_S = 1
LISTEN_TIMEOUT_S = 30
STALE_THRESHOLD_S = 600 # 10 minutes
SILENCE_THRESHOLD_PCT = "3%" # sox silence detection sensitivity
SILENCE_DURATION_S = "2.0" # seconds of silence before stop
MAX_TTS_CHARS = 500 # truncate "more" output for TTS
NOTIFICATION_SOUND = "/System/Library/Sounds/Ping.aiff"
Core loop:
while running:
    # Filenames begin with the epoch timestamp, so name order is arrival order
    events = sorted(QUEUE_DIR.glob("*.json"), key=lambda f: f.name)
    for event_file in events:
        try:
            event = json.loads(event_file.read_text())
        except (json.JSONDecodeError, OSError) as e:
            log(f"Bad event file {event_file}: {e}")
            event_file.unlink(missing_ok=True)
            continue
        # Skip stale events (older than 10 minutes)
        if time.time() - event["timestamp"] > STALE_THRESHOLD_S:
            log(f"Skipping stale event for {event['project']}")
            event_file.unlink(missing_ok=True)
            continue
        # Verify tmux pane still exists
        if not tmux_pane_exists(event["pane_id"]):
            log(f"Pane {event['pane_id']} gone, skipping")
            event_file.unlink(missing_ok=True)
            continue
        handle_notification(event)
        event_file.unlink(missing_ok=True)  # consume AFTER processing
    time.sleep(POLL_INTERVAL_S)
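The per-event decisions in the loop (stale, dead pane, or handle) can be factored into a pure function, which makes them testable without tmux or a microphone. This is a sketch under that assumption; the name triage and its return values are hypothetical and do not appear in the plan.

```python
STALE_THRESHOLD_S = 600  # same value as the constant above


def triage(event: dict, now: float, pane_exists: bool) -> str:
    """Return "stale", "dead_pane", or "handle" for one queued event."""
    # Strictly older than the threshold counts as stale, as in the core loop
    if now - event["timestamp"] > STALE_THRESHOLD_S:
        return "stale"
    if not pane_exists:
        return "dead_pane"
    return "handle"
```

The loop would then delete the event file for the first two outcomes and call handle_notification() only for the third.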
Key functions:
def handle_notification(event):
    """Full notification cycle: sound → TTS → listen → route."""
    play_sound()
    say(f"{event['project']} is ready")
    # say is synchronous, so the mic opens only after TTS completes
    response = listen()
    response = dispatch_command(response, event)
    if response:
        send_to_pane(event["pane_id"], response)
def dispatch_command(response, event):
    """Handle voice commands. Returns text to send, or None."""
    if not response:
        log("No speech detected")
        return None
    # Transcripts may arrive capitalized and punctuated (e.g. "More."),
    # so strip surrounding whitespace and punctuation before comparing
    word = response.lower().strip(" \t\r\n.,!?")
    if word == "more":
        output = get_pane_output(event["pane_id"], lines=15)
        cleaned = strip_ansi(output)[:MAX_TTS_CHARS]
        say(cleaned)
        response = listen()
        return dispatch_command(response, event)  # re-check for skip/more
    if word == "skip":
        log(f"Skipped {event['project']}")
        return None
    return response
def send_to_pane(pane_id, text):
    """Send text to tmux pane using literal mode for safety."""
    try:
        # -l sends the text literally instead of interpreting key names
        subprocess.run(["tmux", "send-keys", "-t", pane_id, "-l", text], check=True)
        subprocess.run(["tmux", "send-keys", "-t", pane_id, "Enter"], check=True)
        log(f"Sent to pane {pane_id}")
    except subprocess.CalledProcessError as e:
        log(f"Failed to send to pane {pane_id}: {e}")
def play_sound():
    """Play notification sound."""
    try:
        # check=True is needed for CalledProcessError to be raised at all
        subprocess.run(["afplay", NOTIFICATION_SOUND], check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except (subprocess.CalledProcessError, OSError):
        pass  # non-critical
def say(text):
    """Speak text using macOS TTS. Blocks until complete."""
    try:
        subprocess.run(["say", text], check=True)
    except subprocess.CalledProcessError as e:
        log(f"TTS failed: {e}")
def listen():
    """Record audio via sox and transcribe via Whisper API."""
    audio_path = None
    try:
        with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
            audio_path = f.name
        subprocess.run([
            "rec", audio_path,
            "rate", "16k",
            "channels", "1",
            "silence", "1", "0.1", SILENCE_THRESHOLD_PCT,
            "1", SILENCE_DURATION_S, SILENCE_THRESHOLD_PCT,
            "trim", "0", str(LISTEN_TIMEOUT_S),
        ], timeout=LISTEN_TIMEOUT_S + 5, check=True,
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        text = whisper_transcribe(audio_path)
        log(f'Heard: "{text}"')
        return text
    except subprocess.TimeoutExpired:
        log("Listen timed out")
        return None
    except subprocess.CalledProcessError as e:
        log(f"Recording failed: {e}")
        return None
    except (OSError, KeyError, ValueError) as e:
        # Network errors (urllib.error.URLError is an OSError) or an
        # unexpected API response must not crash the daemon
        log(f"Transcription failed: {e}")
        return None
    finally:
        if audio_path:
            Path(audio_path).unlink(missing_ok=True)
def whisper_transcribe(audio_path):
    """Send audio to OpenAI Whisper API using urllib (no pip deps)."""
    import urllib.request
    boundary = "----VoicePilotBoundary"
    with open(audio_path, "rb") as f:
        audio_data = f.read()
    # Hand-built multipart/form-data body: a "model" field, then the audio file
    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="model"\r\n\r\nwhisper-1\r\n'
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="audio.wav"\r\n'
        f"Content-Type: audio/wav\r\n\r\n"
    ).encode() + audio_data + f"\r\n--{boundary}--\r\n".encode()
    req = urllib.request.Request(
        "https://api.openai.com/v1/audio/transcriptions",
        data=body,
        headers={
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())["text"]
def strip_ansi(text):
    """Remove ANSI escape codes, OSC sequences, and terminal artifacts for TTS."""
    text = re.sub(r'\x1b\[[0-9;?]*[a-zA-Z]', '', text)  # CSI sequences, incl. private modes
    text = re.sub(r'\x1b\].*?(?:\x07|\x1b\\)', '', text)  # OSC sequences (BEL or ST terminated)
    text = re.sub(r'[╭╮╰╯─│┤├┬┴┼▐▛▜▌▝▘█▙▟⏺⏵]', '', text)
    text = re.sub(r'\s+', ' ', text).strip()
    return text
def get_pane_output(pane_id, lines=15):
    """Capture last N lines from a tmux pane."""
    result = subprocess.run(
        ["tmux", "capture-pane", "-t", pane_id, "-p", "-S", f"-{lines}"],
        capture_output=True, text=True
    )
    if result.returncode != 0:
        return ""
    # -S -N starts N lines into the scrollback and runs to the bottom of the
    # visible pane, so keep only the last N non-empty lines
    captured = [line for line in result.stdout.splitlines() if line.strip()]
    return "\n".join(captured[-lines:])
def tmux_pane_exists(pane_id):
    """Check if a tmux pane still exists."""
    return subprocess.run(
        ["tmux", "has-session", "-t", pane_id],
        capture_output=True
    ).returncode == 0
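The multipart body in whisper_transcribe() uses a fixed boundary string. A slightly more defensive variant generates a random boundary per request, which makes a collision with the audio bytes very unlikely. The sketch below is an illustration only: build_multipart and its parameters are hypothetical names, and it still relies on the standard library alone.

```python
import uuid


def build_multipart(fields: dict, file_field: str, filename: str,
                    file_bytes: bytes, content_type: str):
    """Return (body, content_type_header) for a multipart/form-data POST."""
    boundary = uuid.uuid4().hex  # random per request
    parts = []
    # Plain text fields first, one part each
    for name, value in fields.items():
        parts.append((
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        ).encode())
    # File part: headers, blank line, raw bytes
    parts.append((
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode())
    parts.append(file_bytes)
    # Closing delimiter ends the body
    parts.append(f"\r\n--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

The returned header value would replace the hard-coded Content-Type in the request, and the body would be passed as data= unchanged.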
Startup validation:
def main():
    """Validate dependencies, install signal handlers, then run the queue loop."""
    if not os.environ.get("OPENAI_API_KEY"):
        print("ERROR: OPENAI_API_KEY environment variable is required")
        sys.exit(1)
    if subprocess.run(["which", "rec"], capture_output=True).returncode != 0:
        print("ERROR: sox is required (brew install sox)")
        sys.exit(1)
    if subprocess.run(["which", "jq"], capture_output=True).returncode != 0:
        print("ERROR: jq is required (brew install jq)")
        sys.exit(1)
    QUEUE_DIR.mkdir(parents=True, exist_ok=True)
    # Signal handling for clean shutdown
    running = True

    def shutdown(sig, frame):
        nonlocal running
        print("\nShutting down...")
        running = False

    signal.signal(signal.SIGINT, shutdown)
    signal.signal(signal.SIGTERM, shutdown)
    print(f"Voice Pilot listening on {QUEUE_DIR}")
    print("Waiting for Claude Code notifications...")
    # main loop here (uses `running` flag)
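The same startup checks can be written without spawning which, using shutil.which from the standard library, and can report every missing requirement at once. This is a sketch with assumed names: missing_requirements is hypothetical, the env and which parameters exist only to make it testable, and the tmux check is an addition that the plan's own validation does not include.

```python
import os
import shutil


def missing_requirements(env=os.environ, which=shutil.which):
    """Return human-readable problems; an empty list means startup can proceed."""
    problems = []
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY environment variable is required")
    # (executable to look for, install hint); tmux is an assumed extra check
    for exe, hint in (("rec", "brew install sox"),
                      ("jq", "brew install jq"),
                      ("tmux", "brew install tmux")):
        if which(exe) is None:
            problems.append(f"{exe} not found on PATH ({hint})")
    return problems
```

In main(), a non-empty result would be printed line by line before calling sys.exit(1).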
3. Hook configuration: .claude/settings.json
{
  "hooks": {
    "Notification": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "bash scripts/voice-pilot/notify-hook.sh"
          }
        ]
      }
    ]
  }
}
Note: this is a one-time config change, not a code artifact to maintain.
Microphone Lifecycle
- say completes (synchronous; blocks until audio finishes)
- Mic opens via rec with silence detection:
  - Starts recording when voice detected (above 3% threshold)
  - Stops recording after 2 seconds of silence
  - Hard timeout at 30 seconds
- Audio sent to Whisper API (30s HTTP timeout)
- Mic is off at all other times
No artificial delay needed — say is synchronous and returns only after audio playback completes.
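The recording behaviour described above maps onto the arguments of the sox silence effect. The annotated sketch below shows that mapping; build_rec_command is a hypothetical helper, and the defaults mirror the constants defined for the daemon. Note that trim only caps the audio kept after speech starts, so the wall-clock limit while waiting for speech comes from the subprocess timeout in listen().

```python
def build_rec_command(audio_path, threshold="3%", silence_s="2.0", timeout_s=30):
    """Return the argv for sox's `rec` with voice-activated start and stop."""
    return [
        "rec", audio_path,
        "rate", "16k",      # resample to 16 kHz
        "channels", "1",    # mono
        # silence <above-periods> <duration> <threshold>: start keeping audio
        # once the input stays above the threshold for 0.1 s
        "silence", "1", "0.1", threshold,
        # <below-periods> <duration> <threshold>: stop after silence_s of quiet
        "1", silence_s, threshold,
        # cap the kept audio at timeout_s seconds
        "trim", "0", str(timeout_s),
    ]
```

Tuning for a noisy room would then be a matter of passing a higher threshold, for example build_rec_command(path, threshold="5%").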
Voice Commands
Only two single-word commands. Whisper transcription is unpredictable with phrases, so keep it minimal:
| Command | Action |
|---|---|
| "more" | Read back last 15 lines (ANSI-stripped, truncated to 500 chars), then re-listen. Can be chained. |
| "skip" | Dismiss notification, move to next event |
| Anything else | Transcribed text sent to session via tmux send-keys -l |
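Since speech-to-text output may come back capitalized or with trailing punctuation (for example "More."), command matching is more reliable after normalization. The following is a minimal sketch of the table above as a pure function; classify_response and its tuple return shape are hypothetical and not part of the plan.

```python
import string

COMMANDS = {"more", "skip"}


def classify_response(text):
    """Map a transcript to ("more" | "skip" | "send" | "none", payload)."""
    if not text or not text.strip():
        return ("none", None)
    # Drop surrounding whitespace and punctuation, then compare case-insensitively
    word = text.strip().strip(string.punctuation).strip().lower()
    if word in COMMANDS:
        return (word, None)
    # Anything else is passed through unchanged for tmux send-keys -l
    return ("send", text.strip())
```

Only an utterance that consists of the single command word is treated as a command, so a reply such as "skip the tests" is still sent to the session.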
Key Safety Measures
- JSON injection prevention: jq constructs JSON safely in hook script
- Atomic queue writes: temp file + mv prevents partial reads
- Crash-safe consumption: events deleted after processing, not before
- tmux literal mode: send-keys -l prevents special character interpretation
- Per-user queue directory: $TMPDIR instead of world-writable /tmp
- Startup validation: fail fast if OPENAI_API_KEY, sox, or jq missing
- Subprocess error handling: all external calls wrapped in try/except
Known Limitations (v1)
- No re-notification after skip (session stays idle until next Claude notification)
- No TTS interruption during “More” readback
- No daemon supervision (manual start/stop)
- No protection against keyboard + voice double-input
- No visual mic indicator
- Sox silence threshold (3%) may need tuning for different mics/environments
Acceptance Criteria
- Hook script writes valid JSON events atomically to $TMPDIR/voice-pilot-queue/
- Daemon processes events in FIFO order, deleting after successful handling
- Notification sound plays via afplay (Ping.aiff)
- Project name announced via say
- Mic activates only after TTS completes (no echo issues)
- “more” reads back ANSI-stripped, truncated pane output via TTS, then re-listens
- “skip” dismisses the notification
- Freeform speech transcribed via Whisper API and sent to correct pane via tmux send-keys -l
- Stale events (>10 min) discarded
- Dead tmux panes detected and skipped
- Daemon validates deps at startup and fails fast with clear errors
- Daemon exits cleanly on SIGINT/SIGTERM
- Zero pip dependencies: runs with system Python + sox + jq
Implementation Checklist
Phase 1: Infrastructure
- Create scripts/voice-pilot/ directory
- brew install sox jq (if not already installed)
- Verify OPENAI_API_KEY is set in shell environment
- Add .claude/settings.json with Notification hook config
Phase 2: Hook Script
- Write scripts/voice-pilot/notify-hook.sh (jq-based, atomic writes)
- chmod +x scripts/voice-pilot/notify-hook.sh
- Test: run hook manually inside a tmux pane, verify JSON in $TMPDIR/voice-pilot-queue/
Phase 3: Core Daemon
- Write scripts/voice-pilot/voice_pilot.py:
  - Startup validation (API key, sox, jq)
  - Signal handling (SIGINT/SIGTERM)
  - Queue watcher (poll every 1s, consume after processing)
  - handle_notification() → play_sound() → say() → listen() → dispatch_command()
  - dispatch_command(): recursive for “more”, handles “skip”, passes through freeform
  - listen(): sox recording with silence detection + Whisper API via urllib
  - strip_ansi(): CSI + OSC + box-drawing removal + truncation
  - send_to_pane(): tmux send-keys -l with error handling
  - Logging to stdout with timestamps
Phase 4: Integration Test
- Start Claude Code in a tmux session
- Start python3 scripts/voice-pilot/voice_pilot.py in another terminal
- Trigger a notification (let Claude finish a task)
- Verify: sound plays, project announced, mic activates
- Test “more” — verify cleaned output is read aloud
- Test “skip” — verify dismissed
- Test freeform response — verify it arrives in correct pane
- Test with 2+ simultaneous Claude sessions — verify FIFO and correct routing
- Test stale event discard (create old event file, verify skipped)
- Test dead pane handling (kill a pane, verify event skipped)
File Tree
scripts/voice-pilot/
├── notify-hook.sh # Claude Code Notification hook (jq + atomic writes)
└── voice_pilot.py # Main daemon (stdlib only, no pip deps)
.claude/settings.json # Hook configuration (one-time setup)
Dependencies
| Dependency | Install | Purpose |
|---|---|---|
| sox | brew install sox | Audio recording via rec command |
| jq | brew install jq | Safe JSON construction in hook script |
| OPENAI_API_KEY | env var | Whisper API authentication |
| macOS say | built-in | Text-to-speech |
| macOS afplay | built-in | Notification sound |
| tmux | already installed | Session management |
| Python 3 | built-in (macOS) | Daemon runtime (stdlib only) |
References
- Brainstorm: docs/brainstorms/2026-03-07-voice-pilot-brainstorm.md
- Overnight agent architecture: docs/solutions/architecture-patterns/overnight-autonomous-agent.md
- Overnight launch script (tmux pattern): scripts/overnight-launch.sh
- Claude Code hooks docs: https://code.claude.com/docs/en/hooks