feat: Voice Pilot — voice control for Claude Code CLI sessions

Voice Pilot — Voice Control for Claude Code CLI Sessions

Overview

A Python daemon that monitors tmux sessions running Claude Code, announces when they need input via macOS TTS, listens for voice responses, and routes transcribed text back to the correct session via tmux send-keys.

Problem Statement

When running multiple Claude Code sessions across projects (or overnight agents), the user must constantly switch between terminal tabs to check which sessions are waiting for input. This breaks flow and makes it impossible to manage sessions while away from the desk.

Proposed Solution

Event-driven architecture using Claude Code’s Notification hook:

Claude Code needs input → Notification hook fires → writes event to queue
                                                        ↓
                                              Voice Pilot daemon
                                                        ↓
                                          afplay /System/Library/Sounds/Ping.aiff
                                                        ↓
                                          say "{project} is ready"
                                                        ↓
                                          Mic activates (30s timeout)
                                                        ↓
                                          Transcribe via Whisper API
                                                        ↓
                              ┌──────────────┬──────────────┬──────────────┐
                              │   "more"     │   "skip"     │  Any other   │
                              │              │              │   speech     │
                              ↓              ↓              ↓
                        Read last 15    Dismiss,       tmux send-keys -l
                        lines via TTS   move to next
                              ↓
                        Mic reactivates,
                        transcribe, re-check cmds

Technical Approach

Design Principles (from review)

Zero pip dependencies — use urllib.request instead of httpx for the Whisper API call. No venv, no requirements.txt. Just install sox and jq, then run.
Safe JSON construction — use jq in the hook script to avoid shell injection.
Atomic file operations — write to temp file, mv to queue (prevents partial reads). Delete events after processing (crash resilience).
Per-user temp directory — use $TMPDIR (macOS per-user) instead of /tmp (world-writable).
Fail fast — validate OPENAI_API_KEY at startup, not on first API call.
Subprocess error handling — wrap all external calls in try/except with logging.
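The hook's temp-file-then-`mv` pattern has a direct Python equivalent, which is handy for writing test events or for any future producer written in Python. A minimal sketch, assuming the temp file is created in the queue directory itself (a rename is only atomic within one filesystem); `write_event_atomically` is a hypothetical helper, not part of the plan.

```python
import json
import os
import tempfile
from pathlib import Path


def write_event_atomically(queue_dir: Path, name: str, event: dict) -> Path:
    """Write JSON to a hidden temp file, then rename it into place.

    A reader globbing for *.json never sees a half-written file: until
    os.replace() runs, the data lives under a '.tmp.*' name that does not
    match the pattern. Hypothetical helper, not part of Voice Pilot.
    """
    queue_dir.mkdir(parents=True, exist_ok=True)
    # Create the temp file in the SAME directory so the rename stays on
    # one filesystem, which is what makes it atomic.
    fd, tmp_path = tempfile.mkstemp(prefix=".tmp.", dir=str(queue_dir))
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(event, f)
        final_path = queue_dir / name
        os.replace(tmp_path, final_path)
        return final_path
    except BaseException:
        # Do not leave orphaned temp files behind on failure.
        Path(tmp_path).unlink(missing_ok=True)
        raise
```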

Components (3 files)

1. Hook script: `scripts/voice-pilot/notify-hook.sh`

Shell script invoked by Claude Code’s Notification hook. Writes a JSON event file atomically to the queue directory using jq.

#!/usr/bin/env bash
# Called by Claude Code Notification hook
# Writes event atomically to $TMPDIR/voice-pilot-queue/

set -euo pipefail

QUEUE_DIR="${TMPDIR:-/tmp}/voice-pilot-queue"
mkdir -p "$QUEUE_DIR"

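# BSD date on macOS has no %N (it prints a literal "N"), so use the PID
# to keep names unique within one second; names still sort by time.
EVENT_FILE="$QUEUE_DIR/$(date +%s)-$$.json"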
TMPFILE=$(mktemp "$QUEUE_DIR/.tmp.XXXXXX")

jq -n \
  --arg pane_id "${TMUX_PANE:-unknown}" \
  --arg project "$(basename "$PWD")" \
  --argjson timestamp "$(date +%s)" \
  '{pane_id: $pane_id, project: $project, timestamp: $timestamp}' \
  > "$TMPFILE"

mv "$TMPFILE" "$EVENT_FILE"

Event schema (3 fields only):

pane_id — tmux pane ID for routing responses back (e.g., %3)
project — directory basename as human-readable project name
timestamp — Unix epoch for stale event detection
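Event files are written by a separate process, so the daemon should not assume every file carries all three fields with sensible types. A minimal validation sketch, assuming the 10-minute `STALE_THRESHOLD_S` used by the daemon; `parse_event` is a hypothetical helper name, and the `%` prefix check relies on tmux pane IDs having that form.

```python
import json
import time
from typing import Optional

STALE_THRESHOLD_S = 600  # mirrors the daemon constant (10 minutes)


def parse_event(raw: str, now: Optional[float] = None) -> Optional[dict]:
    """Return the event dict if it is well-formed and fresh, else None.

    Hypothetical helper: checks the three schema fields (pane_id, project,
    timestamp) and their types, then applies the stale-event cutoff.
    """
    try:
        event = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(event, dict):
        return None
    pane_id = event.get("pane_id")
    project = event.get("project")
    timestamp = event.get("timestamp")
    # tmux pane IDs look like "%3"; "unknown" (written by the hook when
    # TMUX_PANE is unset) is rejected here.
    if not (isinstance(pane_id, str) and pane_id.startswith("%")):
        return None
    if not isinstance(project, str) or not project:
        return None
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(timestamp, bool) or not isinstance(timestamp, (int, float)):
        return None
    now = time.time() if now is None else now
    if now - timestamp > STALE_THRESHOLD_S:
        return None
    return event
```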

2. Main daemon: `scripts/voice-pilot/voice_pilot.py`

Python script (stdlib only, no pip deps) that watches the queue directory, processes events FIFO, and manages the notification → listen → route cycle.

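Imports and constants:

import json
import os
import re
import signal
import subprocess
import sys
import tempfile
import time
from pathlib import Path

QUEUE_DIR = Path(os.environ.get("TMPDIR", "/tmp")) / "voice-pilot-queue"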
POLL_INTERVAL_S = 1
LISTEN_TIMEOUT_S = 30
STALE_THRESHOLD_S = 600        # 10 minutes
SILENCE_THRESHOLD_PCT = "3%"   # sox silence detection sensitivity
SILENCE_DURATION_S = "2.0"     # seconds of silence before stop
MAX_TTS_CHARS = 500            # truncate "more" output for TTS
NOTIFICATION_SOUND = "/System/Library/Sounds/Ping.aiff"

Core loop:

while running:
    events = sorted(QUEUE_DIR.glob("*.json"), key=lambda f: f.name)
    for event_file in events:
        if not running:
            break  # stop between events after SIGINT/SIGTERM
        try:
            event = json.loads(event_file.read_text())
            age = time.time() - event["timestamp"]
            pane_id, project = event["pane_id"], event["project"]
        except (json.JSONDecodeError, OSError, KeyError, TypeError) as e:
            # Unreadable, not a JSON object, or missing a schema field: drop it
            log(f"Bad event file {event_file}: {e}")
            event_file.unlink(missing_ok=True)
            continue

        # Skip stale events (older than 10 minutes)
        if age > STALE_THRESHOLD_S:
            log(f"Skipping stale event for {project}")
            event_file.unlink(missing_ok=True)
            continue

        # Verify tmux pane still exists
        if not tmux_pane_exists(pane_id):
            log(f"Pane {pane_id} gone, skipping")
            event_file.unlink(missing_ok=True)
            continue

        handle_notification(event)
        event_file.unlink(missing_ok=True)  # consume AFTER processing

    time.sleep(POLL_INTERVAL_S)
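The core loop mixes queue bookkeeping with audio and tmux I/O, which makes it hard to test. One option is to pull a single pass over the queue into a function that receives the handler as a parameter. A sketch under that assumption; `drain_queue`, `handle`, and `is_valid` are hypothetical names, and the staleness and pane checks are collapsed into the `is_valid` callable.

```python
import json
from pathlib import Path
from typing import Callable


def drain_queue(queue_dir: Path,
                handle: Callable[[dict], None],
                is_valid: Callable[[dict], bool] = lambda e: True) -> list:
    """Process every *.json event once, oldest file name first.

    Files are deleted only AFTER handle() returns, so an event that was
    being handled when the process crashed is retried on the next start.
    Returns the names of the files that were handled (useful in tests).
    """
    handled = []
    for event_file in sorted(queue_dir.glob("*.json"), key=lambda f: f.name):
        try:
            event = json.loads(event_file.read_text())
        except (json.JSONDecodeError, OSError):
            event_file.unlink(missing_ok=True)  # unreadable: drop it
            continue
        if not isinstance(event, dict) or not is_valid(event):
            event_file.unlink(missing_ok=True)  # stale or malformed: drop it
            continue
        handle(event)                       # may block for a long time
        event_file.unlink(missing_ok=True)  # consume AFTER processing
        handled.append(event_file.name)
    return handled
```

In the daemon, `handle` would be `handle_notification` and `is_valid` would wrap the stale-event and `tmux_pane_exists()` checks; a test can pass a list's `append` instead.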

Key functions:

def handle_notification(event):
    """Full notification cycle: sound → TTS → listen → route."""
    play_sound()
    say(f"{event['project']} is ready")
    # say is synchronous — mic opens only after TTS completes

    response = listen()
    response = dispatch_command(response, event)
    if response:
        send_to_pane(event["pane_id"], response)


def dispatch_command(response, event):
    """Handle voice commands. Returns text to send, or None."""
    if not response:
        log("No speech detected")
        return None

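    # Whisper tends to return "More." rather than "more": normalize case
    # and punctuation before matching the single-word commands.
    word = re.sub(r"[^\w\s]", "", response).strip().lower()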

    if word == "more":
        output = get_pane_output(event["pane_id"], lines=15)
        cleaned = strip_ansi(output)[:MAX_TTS_CHARS]
        say(cleaned)
        response = listen()
        return dispatch_command(response, event)  # re-check for skip/more

    if word == "skip":
        log(f"Skipped {event['project']}")
        return None

    return response


def send_to_pane(pane_id, text):
    """Send text to tmux pane using literal mode for safety."""
    try:
        subprocess.run(["tmux", "send-keys", "-t", pane_id, "-l", text], check=True)
        subprocess.run(["tmux", "send-keys", "-t", pane_id, "Enter"], check=True)
        log(f"Sent to pane {pane_id}")
    except subprocess.CalledProcessError as e:
        log(f"Failed to send to pane {pane_id}: {e}")


def play_sound():
    """Play notification sound."""
    try:
        subprocess.run(["afplay", NOTIFICATION_SOUND], check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except (subprocess.CalledProcessError, OSError):
        pass  # non-critical: a missing sound must not block the prompt


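def say(text):
    """Speak text using macOS TTS. Blocks until complete."""
    try:
        # Pass text on stdin ("-f -") so pane output that starts with "-"
        # (e.g. "- Added tests") is not parsed as a say option.
        subprocess.run(["say", "-f", "-"], input=text, text=True, check=True)
    except (subprocess.CalledProcessError, OSError) as e:
        log(f"TTS failed: {e}")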


def listen():
    """Record audio via sox and transcribe via Whisper API."""
    audio_path = None
    try:
        with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
            audio_path = f.name

        subprocess.run([
            "rec", audio_path,
            "rate", "16k",
            "channels", "1",
            "silence", "1", "0.1", SILENCE_THRESHOLD_PCT,
            "1", SILENCE_DURATION_S, SILENCE_THRESHOLD_PCT,
            "trim", "0", str(LISTEN_TIMEOUT_S),
        ], timeout=LISTEN_TIMEOUT_S + 5, check=True,
           stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)

        text = whisper_transcribe(audio_path)
        log(f"Heard: \"{text}\"")
        return text

    except subprocess.TimeoutExpired:
        log("Listen timed out")
        return None
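    except subprocess.CalledProcessError as e:
        log(f"Recording failed: {e}")
        return None
    except (OSError, ValueError, KeyError) as e:
        # urllib's URLError/HTTPError and socket timeouts subclass OSError;
        # a malformed response body raises ValueError or KeyError.
        log(f"Recording or transcription failed: {e}")
        return None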
    finally:
        if audio_path:
            Path(audio_path).unlink(missing_ok=True)


def whisper_transcribe(audio_path):
    """Send audio to OpenAI Whisper API using urllib (no pip deps)."""
    import urllib.request

    boundary = "----VoicePilotBoundary"
    with open(audio_path, "rb") as f:
        audio_data = f.read()

    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="model"\r\n\r\nwhisper-1\r\n'
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="audio.wav"\r\n'
        f"Content-Type: audio/wav\r\n\r\n"
    ).encode() + audio_data + f"\r\n--{boundary}--\r\n".encode()

    req = urllib.request.Request(
        "https://api.openai.com/v1/audio/transcriptions",
        data=body,
        headers={
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )

    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())["text"]
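`whisper_transcribe()` uses the fixed boundary `----VoicePilotBoundary`. RFC 2046 requires that a multipart boundary not occur inside any part, and a per-request random boundary makes a collision with binary WAV bytes vanishingly unlikely. A sketch of a small builder for the same two-field request; `build_multipart` is a hypothetical helper, not an existing stdlib API.

```python
import uuid


def build_multipart(fields: dict, file_field: str, filename: str,
                    content_type: str, data: bytes) -> tuple:
    """Return (body, content_type_header) for a multipart/form-data POST.

    Hypothetical helper. A random hex boundary is generated per request so
    it is vanishingly unlikely to occur inside the binary payload.
    """
    boundary = f"----VoicePilot{uuid.uuid4().hex}"
    parts = []
    for name, value in fields.items():
        # Plain text field, e.g. model=whisper-1
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode()
        )
    # File field: headers, blank line, raw bytes, CRLF
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n".encode()
        + data
        + b"\r\n"
    )
    body = b"".join(parts) + f"--{boundary}--\r\n".encode()
    return body, f"multipart/form-data; boundary={boundary}"
```

The returned content type would go straight into the `Content-Type` header of the `urllib.request.Request`, for example after `body, ctype = build_multipart({"model": "whisper-1"}, "file", "audio.wav", "audio/wav", audio_data)`.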


def strip_ansi(text):
    """Remove ANSI escape codes, OSC sequences, and terminal artifacts for TTS."""
    text = re.sub(r'\x1b\[[0-?]*[ -/]*[@-~]', '', text)             # CSI sequences (incl. private modes like ESC[?25l)
    text = re.sub(r'\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)', '', text)   # OSC sequences (BEL- or ST-terminated)
    text = re.sub(r'[╭╮╰╯─│┤├┬┴┼▐▛▜▌▝▘█▙▟⏺⏵]', '', text)
    text = re.sub(r'\s+', ' ', text).strip()
    return text


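def get_pane_output(pane_id, lines=15):
    """Capture the last N non-blank lines from a tmux pane."""
    # "-S -N" starts N lines up in the scrollback, so the capture is N history
    # lines PLUS the whole visible screen; keep only the last N in Python.
    result = subprocess.run(
        ["tmux", "capture-pane", "-t", pane_id, "-p", "-S", f"-{lines}"],
        capture_output=True, text=True
    )
    if result.returncode != 0:
        return ""
    non_blank = [line for line in result.stdout.splitlines() if line.strip()]
    return "\n".join(non_blank[-lines:])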


def tmux_pane_exists(pane_id):
    """Check if a tmux pane still exists."""
    return subprocess.run(
        ["tmux", "has-session", "-t", pane_id],
        capture_output=True
    ).returncode == 0
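Every function above calls `log()`, which the plan never defines; the checklist only asks for logging to stdout with timestamps. A minimal version, assuming plain `print` is sufficient for a daemon that runs in the foreground:

```python
import time


def log(message: str) -> None:
    """Print a timestamped line to stdout and flush immediately.

    Flushing matters when stdout is redirected to a file or pipe
    (e.g. `python3 voice_pilot.py > voice-pilot.log`), where Python
    would otherwise block-buffer the output.
    """
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    print(f"[{stamp}] {message}", flush=True)
```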

Startup validation:

def main():
    if not os.environ.get("OPENAI_API_KEY"):
        print("ERROR: OPENAI_API_KEY environment variable is required", file=sys.stderr)
        sys.exit(1)

    if subprocess.run(["which", "rec"], capture_output=True).returncode != 0:
        print("ERROR: sox is required (brew install sox)", file=sys.stderr)
        sys.exit(1)

    if subprocess.run(["which", "jq"], capture_output=True).returncode != 0:
        print("ERROR: jq is required (brew install jq)", file=sys.stderr)
        sys.exit(1)

    QUEUE_DIR.mkdir(parents=True, exist_ok=True)

    # Signal handling for clean shutdown
    running = True
    def shutdown(sig, frame):
        nonlocal running
        print("\nShutting down...")
        running = False
    signal.signal(signal.SIGINT, shutdown)
    signal.signal(signal.SIGTERM, shutdown)

    print(f"Voice Pilot listening on {QUEUE_DIR}")
    print("Waiting for Claude Code notifications...")

    # main loop here (uses `running` flag)
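`main()` stops at the first missing prerequisite, so a fresh machine may need several restarts to discover everything that is missing. A sketch of an alternative that checks the tools listed under Dependencies in one pass using `shutil.which` (the stdlib equivalent of running `which`); `REQUIRED_TOOLS`, `missing_tools`, and `startup_problems` are hypothetical names, and the install hints are assumptions.

```python
import os
import shutil

# Tools from the Dependencies list, each mapped to an (assumed) install hint.
REQUIRED_TOOLS = {
    "rec": "brew install sox",
    "jq": "brew install jq",
    "tmux": "brew install tmux",
    "say": "built into macOS",
    "afplay": "built into macOS",
}


def missing_tools(tools=REQUIRED_TOOLS) -> list:
    """Return the names of tools that shutil.which() cannot find on PATH."""
    return [name for name in tools if shutil.which(name) is None]


def startup_problems(env=None) -> list:
    """Collect every startup problem instead of stopping at the first one."""
    env = os.environ if env is None else env
    missing = set(missing_tools())
    problems = [f"{name} not found ({hint})"
                for name, hint in REQUIRED_TOOLS.items() if name in missing]
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems
```

In `main()`, a non-empty result would be printed to stderr, one problem per line, before `sys.exit(1)`.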

3. Hook configuration: `.claude/settings.json`

{
  "hooks": {
    "Notification": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "bash scripts/voice-pilot/notify-hook.sh"
          }
        ]
      }
    ]
  }
}

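Note: this is a one-time config change, not a code artifact to maintain. A project-level .claude/settings.json only applies to Claude Code sessions started in this repository; covering sessions in other projects would require the same entry in the user-level ~/.claude/settings.json, with an absolute path to the script.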

Microphone Lifecycle

say completes (synchronous — blocks until audio finishes)
Mic opens via rec with silence detection:
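- Starts recording once sound stays above the 3% threshold for 0.1 seconds
- Stops recording after 2 seconds of silence
- Recorded audio is capped at 30 seconds (trim 0 30); separately, rec is killed after 35 seconds of wall-clock time (LISTEN_TIMEOUT_S + 5), for example when nobody speaks, and any partial audio is discarded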
Audio sent to Whisper API (30s HTTP timeout)
Mic is off at all other times

No artificial delay needed — say is synchronous and returns only after audio playback completes.

Voice Commands

Only two single-word commands. Whisper transcription is unpredictable with phrases, so keep it minimal:

Command	Action
`"more"`	Read back last 15 lines (ANSI-stripped, truncated to 500 chars), then re-listen. Can be chained.
`"skip"`	Dismiss notification, move to next event
Anything else	Transcribed text sent to session via `tmux send-keys -l`
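Because commands are matched as single words, the transcript needs normalizing first: Whisper output frequently carries capitalization and trailing punctuation (for example "More."), which an exact `== "more"` comparison would miss. A sketch of a pure classifier that can be unit-tested without a microphone; `classify` and its tuple return shape are hypothetical.

```python
import re
from typing import Optional, Tuple


def classify(transcript: Optional[str]) -> Tuple[str, Optional[str]]:
    """Map a Whisper transcript to ("none" | "more" | "skip" | "send", payload).

    Commands are matched after lowercasing and dropping punctuation, so
    "More." and " skip! " still count. Freeform text is returned unchanged
    apart from surrounding whitespace, so it reaches the pane as spoken.
    """
    if transcript is None or not transcript.strip():
        return ("none", None)
    word = re.sub(r"[^\w\s]", "", transcript).strip().lower()
    if word in ("more", "skip"):
        return (word, None)
    return ("send", transcript.strip())
```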

Key Safety Measures

JSON injection prevention — jq constructs JSON safely in hook script
Atomic queue writes — temp file + mv prevents partial reads
Crash-safe consumption — events deleted after processing, not before
tmux literal mode — send-keys -l prevents special character interpretation
Per-user queue directory — $TMPDIR instead of world-writable /tmp
Startup validation — fail fast if OPENAI_API_KEY, sox, or jq missing
Subprocess error handling — all external calls wrapped in try/except

Known Limitations (v1)

No re-notification after skip (session stays idle until next Claude notification)
No TTS interruption during "more" readback
No daemon supervision (manual start/stop)
No protection against keyboard + voice double-input
No visual mic indicator
sox silence threshold (3%) may need tuning for different mics/environments

Acceptance Criteria

Implementation Checklist

Phase 1: Infrastructure

Create scripts/voice-pilot/ directory
brew install sox jq (if not already installed)
Verify OPENAI_API_KEY is set in shell environment
Add .claude/settings.json with Notification hook config

Phase 2: Hook Script

Write scripts/voice-pilot/notify-hook.sh (jq-based, atomic writes)
chmod +x scripts/voice-pilot/notify-hook.sh
Test: run hook manually inside a tmux pane, verify JSON in $TMPDIR/voice-pilot-queue/

Phase 3: Core Daemon

Write scripts/voice-pilot/voice_pilot.py:
- Startup validation (API key, sox, jq)
- Signal handling (SIGINT/SIGTERM)
- Queue watcher (poll every 1s, consume after processing)
- handle_notification() → play_sound() → say() → listen() → dispatch_command()
- dispatch_command() — recursive for “more”, handles “skip”, passes through freeform
- listen() — sox recording with silence detection + Whisper API via urllib
- strip_ansi() — CSI + OSC + box-drawing removal (truncation to MAX_TTS_CHARS happens in dispatch_command())
- send_to_pane() — tmux send-keys -l with error handling
- Logging to stdout with timestamps

Phase 4: Integration Test

File Tree

scripts/voice-pilot/
├── notify-hook.sh        # Claude Code Notification hook (jq + atomic writes)
└── voice_pilot.py        # Main daemon (stdlib only, no pip deps)

.claude/settings.json     # Hook configuration (one-time setup)

Dependencies

Dependency	Install	Purpose
`sox`	`brew install sox`	Audio recording via `rec` command
`jq`	`brew install jq`	Safe JSON construction in hook script
`OPENAI_API_KEY`	env var	Whisper API authentication
macOS `say`	built-in	Text-to-speech
macOS `afplay`	built-in	Notification sound
`tmux`	already installed	Session management
Python 3.8+	Xcode Command Line Tools (`xcode-select --install`) or Homebrew	Daemon runtime (stdlib only; `Path.unlink(missing_ok=True)` needs 3.8)

References

Brainstorm: docs/brainstorms/2026-03-07-voice-pilot-brainstorm.md
Overnight agent architecture: docs/solutions/architecture-patterns/overnight-autonomous-agent.md
Overnight launch script (tmux pattern): scripts/overnight-launch.sh
Claude Code hooks docs: https://code.claude.com/docs/en/hooks