feat: Voice Pilot — voice control for Claude Code CLI sessions

Voice Pilot — Voice Control for Claude Code CLI Sessions

Overview

A Python daemon that monitors tmux sessions running Claude Code, announces when they need input via macOS TTS, listens for voice responses, and routes transcribed text back to the correct session via tmux send-keys.

Problem Statement

When running multiple Claude Code sessions across projects (or overnight agents), the user must constantly switch between terminal tabs to check which sessions are waiting for input. This breaks flow and makes it impossible to manage sessions while away from the desk.

Proposed Solution

Event-driven architecture using Claude Code’s Notification hook, which fires when Claude Code needs attention (a tool-permission request, or the prompt sitting idle for about 60 seconds after a response):

Claude Code needs input → Notification hook fires → writes event to queue
                                     ↓
                            Voice Pilot daemon
                                     ↓
                 afplay /System/Library/Sounds/Ping.aiff
                                     ↓
                         say "{project} is ready"
                                     ↓
                       Mic activates (30s timeout)
                                     ↓
                        Transcribe via Whisper API
                                     ↓
              ┌──────────────────────┼──────────────────────┐
              ↓                      ↓                      ↓
           "more"                 "skip"              any other text
              ↓                      ↓                      ↓
     Read last 15 lines       Dismiss, move to     tmux send-keys -l
     via TTS                  next event
              ↓
     Mic reactivates, transcribe,
     re-check commands (loop)

Technical Approach

Design Principles (from review)

  1. Zero pip dependencies — use urllib.request instead of httpx for the Whisper API call. No venv, no requirements.txt. Just install sox and jq, then run.
  2. Safe JSON construction — use jq in the hook script to avoid shell injection.
  3. Atomic file operations — write to temp file, mv to queue (prevents partial reads). Delete events after processing (crash resilience).
  4. Per-user temp directory — use $TMPDIR (macOS per-user) instead of /tmp (world-writable).
  5. Fail fast — validate OPENAI_API_KEY at startup, not on first API call.
  6. Subprocess error handling — wrap all external calls in try/except with logging.
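Principle 3 (atomic writes, delete only after processing) can be sketched in Python, for example to write test fixtures into the queue for the Phase 4 stale-event test. This is a minimal sketch under stated assumptions: enqueue() and drain() are hypothetical helper names, not part of the two planned scripts, and the real hook does the same thing with mktemp + mv.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable


def enqueue(queue_dir: Path, name: str, event: dict) -> Path:
    """Write an event atomically: temp file in the same directory, then rename."""
    queue_dir.mkdir(parents=True, exist_ok=True)
    # Same directory means same filesystem, so os.replace() is an atomic rename.
    # The ".tmp." prefix and missing .json suffix keep readers from globbing it.
    fd, tmp_path = tempfile.mkstemp(dir=str(queue_dir), prefix=".tmp.")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(event, f)
        final = queue_dir / f"{name}.json"
        os.replace(tmp_path, final)
        return final
    except BaseException:
        Path(tmp_path).unlink(missing_ok=True)
        raise


def drain(queue_dir: Path, handler: Callable[[dict], None]) -> int:
    """Process *.json events in name order; delete each only after handler returns."""
    processed = 0
    for event_file in sorted(queue_dir.glob("*.json"), key=lambda p: p.name):
        event = json.loads(event_file.read_text())
        handler(event)  # if this raises, the file stays queued for the next run
        event_file.unlink(missing_ok=True)
        processed += 1
    return processed
```

Because the unlink comes after handler(), a crash mid-notification replays that event on restart instead of losing it; the stale-event check bounds how old a replayed event can be.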

Components (3 files)

1. Hook script: scripts/voice-pilot/notify-hook.sh

Shell script invoked by Claude Code’s Notification hook. Writes a JSON event file atomically to the queue directory using jq.

#!/usr/bin/env bash
# Called by Claude Code Notification hook
# Writes event atomically to $TMPDIR/voice-pilot-queue/

set -euo pipefail

QUEUE_DIR="${TMPDIR:-/tmp}/voice-pilot-queue"
mkdir -p "$QUEUE_DIR"

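# macOS/BSD date has no %N (it prints a literal "N"), so append the PID to keep
# names unique within one second; names still sort by epoch second.
EVENT_FILE="$QUEUE_DIR/$(date +%s)-$$.json"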
TMPFILE=$(mktemp "$QUEUE_DIR/.tmp.XXXXXX")

jq -n \
  --arg pane_id "${TMUX_PANE:-unknown}" \
  --arg project "$(basename "$PWD")" \
  --argjson timestamp "$(date +%s)" \
  '{pane_id: $pane_id, project: $project, timestamp: $timestamp}' \
  > "$TMPFILE"

mv "$TMPFILE" "$EVENT_FILE"

Event schema (3 fields only):

  • pane_id — tmux pane ID for routing responses back (e.g., %3)
  • project — directory basename as human-readable project name
  • timestamp — Unix epoch for stale event detection
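The core loop below indexes event["timestamp"] and event["pane_id"] directly, so a file that is valid JSON but missing a field would raise KeyError and stop the daemon. A minimal validation sketch for the 3-field schema; parse_event() and REQUIRED_FIELDS are hypothetical names, and the types assume the jq output shown above (timestamp as a JSON integer).

```python
import json
from typing import Optional

REQUIRED_FIELDS = {"pane_id": str, "project": str, "timestamp": int}


def parse_event(raw: str) -> Optional[dict]:
    """Return the event dict if it matches the 3-field schema, else None."""
    try:
        event = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(event, dict):
        return None
    for field, expected_type in REQUIRED_FIELDS.items():
        value = event.get(field)
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, expected_type) or isinstance(value, bool):
            return None
    return event
```

A None result would be treated like the existing "Bad event file" case: log it and unlink the file.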

2. Main daemon: scripts/voice-pilot/voice_pilot.py

Python script (stdlib only, no pip deps) that watches the queue directory, processes events FIFO, and manages the notification → listen → route cycle.

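Imports and constants:

import json, os, re, signal, subprocess, sys, tempfile, time, urllib.request
from pathlib import Path

QUEUE_DIR = Path(os.environ.get("TMPDIR", "/tmp")) / "voice-pilot-queue"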
POLL_INTERVAL_S = 1
LISTEN_TIMEOUT_S = 30
STALE_THRESHOLD_S = 600        # 10 minutes
SILENCE_THRESHOLD_PCT = "3%"   # sox silence detection sensitivity
SILENCE_DURATION_S = "2.0"     # seconds of silence before stop
MAX_TTS_CHARS = 500            # truncate "more" output for TTS
NOTIFICATION_SOUND = "/System/Library/Sounds/Ping.aiff"

Core loop:

while running:
    events = sorted(QUEUE_DIR.glob("*.json"), key=lambda f: f.name)
    for event_file in events:
        try:
            event = json.loads(event_file.read_text())
        except (json.JSONDecodeError, OSError) as e:
            log(f"Bad event file {event_file}: {e}")
            event_file.unlink(missing_ok=True)
            continue

        # Skip stale events (older than 10 minutes)
        if time.time() - event["timestamp"] > STALE_THRESHOLD_S:
            log(f"Skipping stale event for {event['project']}")
            event_file.unlink(missing_ok=True)
            continue

        # Verify tmux pane still exists
        if not tmux_pane_exists(event["pane_id"]):
            log(f"Pane {event['pane_id']} gone, skipping")
            event_file.unlink(missing_ok=True)
            continue

        handle_notification(event)
        event_file.unlink(missing_ok=True)  # consume AFTER processing

    time.sleep(POLL_INTERVAL_S)
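The two skip checks in the loop can be pulled into a pure function so they are testable without tmux or a real queue. A sketch under that assumption; triage() is a hypothetical name, and pane_exists stands in for tmux_pane_exists().

```python
import time
from typing import Callable, Optional

STALE_THRESHOLD_S = 600  # same value as the daemon constant


def triage(event: dict, pane_exists: Callable[[str], bool],
           now: Optional[float] = None) -> str:
    """Decide what the loop does with one event: 'stale', 'gone', or 'handle'."""
    now = time.time() if now is None else now
    if now - event["timestamp"] > STALE_THRESHOLD_S:
        return "stale"
    if not pane_exists(event["pane_id"]):
        return "gone"
    return "handle"
```

Injecting now and pane_exists lets the Phase 4 stale and dead-pane cases run as plain unit tests.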

Key functions:

def handle_notification(event):
    """Full notification cycle: sound → TTS → listen → route."""
    play_sound()
    say(f"{event['project']} is ready")
    # say is synchronous — mic opens only after TTS completes

    response = listen()
    response = dispatch_command(response, event)
    if response:
        send_to_pane(event["pane_id"], response)


def dispatch_command(response, event):
    """Handle voice commands. Returns text to send, or None."""
    if not response:
        log("No speech detected")
        return None

    # Whisper usually returns "More." or "Skip!", so drop punctuation and case
    word = re.sub(r"[^\w\s]", "", response).strip().lower()

    if word == "more":
        output = get_pane_output(event["pane_id"], lines=15)
        cleaned = strip_ansi(output)[:MAX_TTS_CHARS]
        say(cleaned)
        response = listen()
        return dispatch_command(response, event)  # re-check for skip/more

    if word == "skip":
        log(f"Skipped {event['project']}")
        return None

    return response


def send_to_pane(pane_id, text):
    """Send text to tmux pane using literal mode for safety."""
    try:
        subprocess.run(["tmux", "send-keys", "-t", pane_id, "-l", text], check=True)
        subprocess.run(["tmux", "send-keys", "-t", pane_id, "Enter"], check=True)
        log(f"Sent to pane {pane_id}")
    except subprocess.CalledProcessError as e:
        log(f"Failed to send to pane {pane_id}: {e}")


def play_sound():
    """Play notification sound (best effort)."""
    try:
        subprocess.run(["afplay", NOTIFICATION_SOUND], check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except (subprocess.CalledProcessError, OSError):
        pass  # non-critical; the spoken announcement still follows


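def say(text):
    """Speak text using macOS TTS. Blocks until complete."""
    try:
        # Text goes in on stdin ("-f -"), so pane output starting with "-"
        # can never be parsed as a say option.
        subprocess.run(["say", "-f", "-"], input=text, text=True, check=True)
    except (subprocess.CalledProcessError, OSError) as e:
        log(f"TTS failed: {e}")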


def listen():
    """Record audio via sox and transcribe via Whisper API."""
    audio_path = None
    try:
        with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
            audio_path = f.name

        subprocess.run([
            "rec", audio_path,
            "rate", "16k",
            "channels", "1",
            "silence", "1", "0.1", SILENCE_THRESHOLD_PCT,
            "1", SILENCE_DURATION_S, SILENCE_THRESHOLD_PCT,
            "trim", "0", str(LISTEN_TIMEOUT_S),
        ], timeout=LISTEN_TIMEOUT_S + 5, check=True,
           stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)

        text = whisper_transcribe(audio_path)
        log(f"Heard: \"{text}\"")
        return text

    except subprocess.TimeoutExpired:
        log("Listen timed out")
        return None
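    except subprocess.CalledProcessError as e:
        log(f"Recording failed: {e}")
        return None
    except (OSError, ValueError, KeyError) as e:
        # urllib's URLError/HTTPError and socket timeouts are OSError subclasses;
        # a malformed response raises ValueError (JSONDecodeError) or KeyError
        log(f"Transcription failed: {e}")
        return None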
    finally:
        if audio_path:
            Path(audio_path).unlink(missing_ok=True)


def whisper_transcribe(audio_path):
    """Send audio to OpenAI Whisper API using urllib (no pip deps)."""
    import urllib.request

    boundary = "----VoicePilotBoundary"
    with open(audio_path, "rb") as f:
        audio_data = f.read()

    body = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="model"\r\n\r\nwhisper-1\r\n'
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="audio.wav"\r\n'
        f"Content-Type: audio/wav\r\n\r\n"
    ).encode() + audio_data + f"\r\n--{boundary}--\r\n".encode()

    req = urllib.request.Request(
        "https://api.openai.com/v1/audio/transcriptions",
        data=body,
        headers={
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )

    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.loads(resp.read())["text"]


def strip_ansi(text):
    """Remove ANSI escape codes, OSC sequences, and terminal artifacts for TTS."""
    text = re.sub(r'\x1b\[[0-?]*[ -/]*[@-~]', '', text)    # CSI, incl. private modes like ESC[?25l
    text = re.sub(r'\x1b\].*?(?:\x07|\x1b\\)', '', text)   # OSC, BEL- or ST-terminated
    text = re.sub(r'[╭╮╰╯─│┤├┬┴┼▐▛▜▌▝▘█▙▟⏺⏵]', '', text)
    text = re.sub(r'\s+', ' ', text).strip()
    return text


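def get_pane_output(pane_id, lines=15):
    """Capture the last N lines from a tmux pane."""
    # -S -N starts N lines into scrollback but still ends at the bottom of the
    # visible pane, so capture that range and keep only the final N lines.
    result = subprocess.run(
        ["tmux", "capture-pane", "-t", pane_id, "-p", "-S", f"-{lines}"],
        capture_output=True, text=True
    )
    if result.returncode != 0:
        return ""
    return "\n".join(result.stdout.rstrip().splitlines()[-lines:])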


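def tmux_pane_exists(pane_id):
    """Check if a tmux pane (e.g. %3) still exists anywhere on the server."""
    # has-session checks sessions, not panes; list every pane ID instead
    result = subprocess.run(
        ["tmux", "list-panes", "-a", "-F", "#{pane_id}"],
        capture_output=True, text=True
    )
    return result.returncode == 0 and pane_id in result.stdout.split()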
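whisper_transcribe() above builds the multipart body inline with a fixed boundary string, which could in principle occur inside the audio bytes. A sketch of a reusable builder that generates a random boundary per request; build_multipart() is a hypothetical helper, and the field names ("model", "file") are the ones the plan already sends to the transcription endpoint.

```python
import uuid
from typing import Dict, Tuple


def build_multipart(fields: Dict[str, str], file_field: str, filename: str,
                    file_bytes: bytes, file_type: str) -> Tuple[bytes, str]:
    """Return (body, content_type) for a multipart/form-data request."""
    # A random boundary makes a collision with bytes inside the audio negligible.
    boundary = f"----VoicePilot{uuid.uuid4().hex}"
    parts = []
    for name, value in fields.items():
        header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        parts.append(header.encode() + str(value).encode() + b"\r\n")
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; filename="{filename}"\r\n'
        f"Content-Type: {file_type}\r\n\r\n"
    )
    parts.append(file_header.encode() + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())  # closing delimiter
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"
```

Usage inside whisper_transcribe() would be `body, ctype = build_multipart({"model": "whisper-1"}, "file", "audio.wav", audio_data, "audio/wav")`, passing ctype as the Content-Type header.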

Startup validation:

def main():
    if not os.environ.get("OPENAI_API_KEY"):
        print("ERROR: OPENAI_API_KEY environment variable is required")
        sys.exit(1)

    if subprocess.run(["which", "rec"], capture_output=True).returncode != 0:
        print("ERROR: sox is required — brew install sox")
        sys.exit(1)

    if subprocess.run(["which", "jq"], capture_output=True).returncode != 0:
        print("ERROR: jq is required — brew install jq")
        sys.exit(1)

    QUEUE_DIR.mkdir(parents=True, exist_ok=True)

    # Signal handling for clean shutdown
    running = True
    def shutdown(sig, frame):
        nonlocal running
        print("\nShutting down...")
        running = False
    signal.signal(signal.SIGINT, shutdown)
    signal.signal(signal.SIGTERM, shutdown)

    print(f"Voice Pilot listening on {QUEUE_DIR}")
    print("Waiting for Claude Code notifications...")

    # main loop here (uses `running` flag)
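log() is called throughout the functions above but never defined, and the checklist asks for logging to stdout with timestamps. A minimal sketch; the timestamp format is an assumption.

```python
import sys
import time


def log(message: str) -> str:
    """Print a timestamped line to stdout and return it (handy in tests)."""
    line = f"[{time.strftime('%Y-%m-%d %H:%M:%S')}] {message}"
    # flush so lines appear immediately even when stdout is piped to a file
    print(line, file=sys.stdout, flush=True)
    return line
```

For example, `log("Sent to pane %3")` prints a line such as `[2026-03-07 21:14:03] Sent to pane %3`.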

3. Hook configuration: .claude/settings.json

{
  "hooks": {
    "Notification": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "bash \"$CLAUDE_PROJECT_DIR\"/scripts/voice-pilot/notify-hook.sh"
          }
        ]
      }
    ]
  }
}

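Note: this is a one-time config change, not a code artifact to maintain. Because .claude/settings.json applies only to sessions started in this repository, covering sessions in other projects means adding the same entry, with an absolute path to notify-hook.sh, to the user-level ~/.claude/settings.json.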

Microphone Lifecycle

  1. say completes (synchronous — blocks until audio finishes)
  2. Mic opens via rec with silence detection:
    • Starts recording when voice detected (above 3% threshold)
    • Stops recording after 2 seconds of silence
    • Recording is capped at 30 seconds of audio (trim); if speech never starts, the rec process is killed after 35 seconds
  3. Audio sent to Whisper API (30s HTTP timeout)
  4. Mic is off at all other times

No artificial delay needed — say is synchronous and returns only after audio playback completes.
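The rec arguments used by listen() map onto the lifecycle above as follows. This sketch just rebuilds that argument list with comments; rec_argv() is a hypothetical helper. As I understand sox's effect chain, effects run in order, so trim only counts audio that silence has already passed through, which is why the subprocess timeout is what actually bounds waiting for speech.

```python
from typing import List

SILENCE_THRESHOLD_PCT = "3%"
SILENCE_DURATION_S = "2.0"
LISTEN_TIMEOUT_S = 30


def rec_argv(audio_path: str) -> List[str]:
    """Build the sox `rec` command used by listen(), with each effect explained."""
    return [
        "rec", audio_path,
        "rate", "16k",      # resample to 16 kHz, enough for speech
        "channels", "1",    # mono
        # silence <above-periods> <duration> <threshold>: start keeping audio
        # once 1 period of at least 0.1 s is above the threshold (speech began)
        "silence", "1", "0.1", SILENCE_THRESHOLD_PCT,
        # <below-periods> <duration> <threshold>: stop after 1 period of
        # SILENCE_DURATION_S below the threshold (speaker paused)
        "1", SILENCE_DURATION_S, SILENCE_THRESHOLD_PCT,
        # trim 0 <n>: cap kept audio at n seconds; leading silence never
        # reaches trim, so this does not limit how long rec waits for speech
        "trim", "0", str(LISTEN_TIMEOUT_S),
    ]
```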

Voice Commands

Only two single-word commands. Whisper transcription is unpredictable with phrases, so keep it minimal:

Command         Action
"more"          Read back the last 15 lines (ANSI-stripped, truncated to 500 chars), then re-listen. Can be chained.
"skip"          Dismiss the notification and move to the next event.
Anything else   Send the transcribed text to the session via tmux send-keys -l.
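Two details of the command table are easy to get wrong: Whisper output typically carries capitalization and trailing punctuation ("More."), and chained "more" is implemented recursively in dispatch_command(). A loop-based sketch that normalizes the transcript and caps the chain; dispatch_loop(), normalize(), and max_more are hypothetical, and listen/say/read_pane are injected so the logic can be tested without a microphone.

```python
import re
from typing import Callable, Optional

MAX_TTS_CHARS = 500


def normalize(text: str) -> str:
    """Lowercase and drop punctuation, so 'More.' and 'Skip!' match commands."""
    return re.sub(r"[^\w\s]", "", text).strip().lower()


def dispatch_loop(first: Optional[str],
                  listen: Callable[[], Optional[str]],
                  say: Callable[[str], None],
                  read_pane: Callable[[], str],
                  max_more: int = 5) -> Optional[str]:
    """Loop-based variant of dispatch_command(): returns text to send, or None."""
    response, reads = first, 0
    # Each "more" reads the pane aloud and listens again, up to max_more times.
    while response and normalize(response) == "more" and reads < max_more:
        say(read_pane()[:MAX_TTS_CHARS])
        response = listen()
        reads += 1
    if not response or normalize(response) in ("skip", "more"):
        return None  # silence, explicit skip, or the "more" cap was hit
    return response  # freeform speech is sent to the pane unchanged
```

Freeform text is returned as transcribed, punctuation included; only the command comparison uses the normalized form.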

Key Safety Measures

  • JSON injection prevention: jq constructs JSON safely in the hook script
  • Atomic queue writes: temp file + mv prevents partial reads
  • Crash-safe consumption: events deleted after processing, not before
  • tmux literal mode: send-keys -l prevents special-character interpretation
  • Per-user queue directory: $TMPDIR instead of world-writable /tmp
  • Startup validation: fail fast if OPENAI_API_KEY, sox, or jq is missing
  • Subprocess error handling: external calls wrapped in try/except or checked via return code

Known Limitations (v1)

  • No re-notification after skip (session stays idle until next Claude notification)
  • No TTS interruption during “more” readback
  • No daemon supervision (manual start/stop)
  • No protection against keyboard + voice double-input
  • No visual mic indicator
  • Sox silence threshold (3%) may need tuning for different mics/environments

Acceptance Criteria

  • Hook script writes valid JSON events atomically to $TMPDIR/voice-pilot-queue/
  • Daemon processes events in FIFO order, deleting after successful handling
  • Notification sound plays via afplay (Ping.aiff)
  • Project name announced via say
  • Mic activates only after TTS completes (no echo issues)
  • “more” reads back ANSI-stripped, truncated pane output via TTS, then re-listens
  • “skip” dismisses the notification
  • Freeform speech transcribed via Whisper API and sent to correct pane via tmux send-keys -l
  • Stale events (>10 min) discarded
  • Dead tmux panes detected and skipped
  • Daemon validates deps at startup and fails fast with clear errors
  • Daemon exits cleanly on SIGINT/SIGTERM
  • Zero pip dependencies — runs with system Python + sox + jq

Implementation Checklist

Phase 1: Infrastructure

  • Create scripts/voice-pilot/ directory
  • brew install sox jq (if not already installed)
  • Verify OPENAI_API_KEY is set in shell environment
  • Add .claude/settings.json with Notification hook config

Phase 2: Hook Script

  • Write scripts/voice-pilot/notify-hook.sh (jq-based, atomic writes)
  • chmod +x scripts/voice-pilot/notify-hook.sh
  • Test: run hook manually inside a tmux pane, verify JSON in $TMPDIR/voice-pilot-queue/

Phase 3: Core Daemon

  • Write scripts/voice-pilot/voice_pilot.py:
    • Startup validation (API key, sox, jq)
    • Signal handling (SIGINT/SIGTERM)
    • Queue watcher (poll every 1s, consume after processing)
    • handle_notification(): play_sound() → say() → listen() → dispatch_command()
    • dispatch_command() — recursive for “more”, handles “skip”, passes through freeform
    • listen() — sox recording with silence detection + Whisper API via urllib
    • strip_ansi() — CSI + OSC + box-drawing removal (truncation to MAX_TTS_CHARS happens in dispatch_command())
    • send_to_pane() — tmux send-keys -l with error handling
    • Logging to stdout with timestamps

Phase 4: Integration Test

  • Start Claude Code in a tmux session
  • Start python3 scripts/voice-pilot/voice_pilot.py in another terminal
  • Trigger a notification (let Claude finish a task and leave the prompt idle for about a minute, or trigger a tool-permission prompt)
  • Verify: sound plays, project announced, mic activates
  • Test “more” — verify cleaned output is read aloud
  • Test “skip” — verify dismissed
  • Test freeform response — verify it arrives in correct pane
  • Test with 2+ simultaneous Claude sessions — verify FIFO and correct routing
  • Test stale event discard (create old event file, verify skipped)
  • Test dead pane handling (kill a pane, verify event skipped)

File Tree

scripts/voice-pilot/
├── notify-hook.sh        # Claude Code Notification hook (jq + atomic writes)
└── voice_pilot.py        # Main daemon (stdlib only, no pip deps)

.claude/settings.json     # Hook configuration (one-time setup)

Dependencies

Dependency        Install                                  Purpose
sox               brew install sox                         Audio recording via the rec command
jq                brew install jq                          Safe JSON construction in hook script
OPENAI_API_KEY    env var                                  Whisper API authentication
macOS say         built-in                                 Text-to-speech
macOS afplay      built-in                                 Notification sound
tmux              already installed                        Session management
Python 3          Xcode Command Line Tools or Homebrew     Daemon runtime (stdlib only)

References

  • Brainstorm: docs/brainstorms/2026-03-07-voice-pilot-brainstorm.md
  • Overnight agent architecture: docs/solutions/architecture-patterns/overnight-autonomous-agent.md
  • Overnight launch script (tmux pattern): scripts/overnight-launch.sh
  • Claude Code hooks docs: https://code.claude.com/docs/en/hooks