Architecture
A deep dive into the system architecture, from the browser to the phone call and back. Six diagrams covering every layer of the stack.
System Overview
A complete voice AI pipeline, from your browser to the phone and back. The frontend creates a task, the backend orchestrates the call through Twilio, processes speech with Deepgram, generates responses via the LLM, and streams everything back in real time.
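A rough sketch of that entry point, assuming the Twilio Node SDK and a plain async handler: the frontend submits a task, the backend places the outbound call, and TwiML points the call's audio at the orchestrator. Route shape, the wss:// URL, and the helper names are placeholders, not kiru's actual code.

```ts
import twilio from "twilio";

const client = twilio(process.env.TWILIO_ACCOUNT_SID, process.env.TWILIO_AUTH_TOKEN);

interface CreateTaskRequest {
  phoneNumber: string; // who the agent should call
  objective: string;   // what to negotiate for
}

async function startNegotiation(req: CreateTaskRequest): Promise<string> {
  // 1. Persist the task so the browser can subscribe to its updates (omitted).

  // 2. Place the outbound call; <Connect><Stream> tells Twilio to fork the
  //    call's audio to the orchestrator over a WebSocket.
  const call = await client.calls.create({
    to: req.phoneNumber,
    from: process.env.TWILIO_FROM_NUMBER!,
    twiml: `<Response><Connect><Stream url="wss://example.invalid/twilio/media"/></Connect></Response>`,
  });

  // 3. From here the orchestrator handles STT (Deepgram), the LLM, and TTS,
  //    streaming every intermediate update back to the browser.
  return call.sid;
}
```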
LLM Providers
Switch between OpenAI, Anthropic, and a fully local setup with a single environment variable. In production, kiru runs on a local Asus ROG GX10 workstation with Ollama serving qwen3:30b-a3b for zero API costs and full privacy.
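As a rough illustration, provider selection could hang off one variable like this. The variable name LLM_PROVIDER and the default model IDs are assumptions, not kiru's actual configuration.

```ts
interface LLMConfig {
  provider: "openai" | "anthropic" | "ollama";
  baseUrl: string;
  model: string;
  apiKey?: string;
}

function resolveLLMConfig(env: NodeJS.ProcessEnv = process.env): LLMConfig {
  switch (env.LLM_PROVIDER ?? "ollama") {
    case "openai":
      return {
        provider: "openai",
        baseUrl: "https://api.openai.com/v1",
        model: env.OPENAI_MODEL ?? "gpt-4o-mini",
        apiKey: env.OPENAI_API_KEY,
      };
    case "anthropic":
      return {
        provider: "anthropic",
        baseUrl: "https://api.anthropic.com",
        model: env.ANTHROPIC_MODEL ?? "claude-sonnet-4-20250514",
        apiKey: env.ANTHROPIC_API_KEY,
      };
    default:
      // Fully local path: Ollama serving qwen3:30b-a3b, no API key needed.
      return {
        provider: "ollama",
        baseUrl: env.OLLAMA_URL ?? "http://localhost:11434",
        model: env.OLLAMA_MODEL ?? "qwen3:30b-a3b",
      };
  }
}
```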
Call Lifecycle
Every call moves through well-defined states. After the call ends, automatic post-call analysis scores the negotiation, extracts tactics used, and generates a summary, all persisted for the history view.
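A minimal sketch of what that state machine might look like. The exact state names are assumptions; only the post-call analysis and history steps come from the description above.

```ts
type CallState =
  | "queued"       // task created from the frontend
  | "dialing"      // Twilio placing the outbound call
  | "in_progress"  // live audio and negotiation turns
  | "completed"    // call ended
  | "analyzing"    // post-call scoring, tactic extraction, summary
  | "archived";    // results persisted for the history view

// Allowed transitions; anything else is rejected.
const transitions: Record<CallState, CallState[]> = {
  queued: ["dialing"],
  dialing: ["in_progress", "completed"],
  in_progress: ["completed"],
  completed: ["analyzing"],
  analyzing: ["archived"],
  archived: [],
};

function advance(current: CallState, next: CallState): CallState {
  if (!transitions[current].includes(next)) {
    throw new Error(`Invalid transition: ${current} -> ${next}`);
  }
  return next;
}
```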
Audio Pipeline
Twilio streams raw mulaw audio over a WebSocket. The orchestrator pipes it to Deepgram for speech-to-text, feeds transcripts to the LLM, converts responses to speech, and sends audio back, all in under a second of latency.
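A simplified sketch of the Twilio side of that loop. Twilio's "media" frames do carry base64-encoded mulaw audio, but the sttStream and synthesize hooks here are stand-ins for the Deepgram and text-to-speech integrations, not kiru's actual interfaces.

```ts
import WebSocket from "ws";

interface TwilioMediaFrame {
  event: "connected" | "start" | "media" | "stop";
  streamSid?: string;
  media?: { payload: string }; // base64 mulaw audio from the caller
}

function handleTwilioStream(
  socket: WebSocket,
  sttStream: { write(chunk: Buffer): void },     // feeds Deepgram STT
  synthesize: (text: string) => Promise<Buffer>, // TTS -> mulaw bytes
) {
  let streamSid: string | undefined;

  socket.on("message", (data: Buffer) => {
    const frame = JSON.parse(data.toString()) as TwilioMediaFrame;

    if (frame.event === "start") streamSid = frame.streamSid;

    if (frame.event === "media" && frame.media) {
      // Decode the caller's audio and push it toward speech-to-text.
      sttStream.write(Buffer.from(frame.media.payload, "base64"));
    }
  });

  // Called once the LLM reply is ready: convert it to speech and send it
  // back into the call as a Twilio "media" frame.
  return async function speak(text: string) {
    const audio = await synthesize(text);
    socket.send(JSON.stringify({
      event: "media",
      streamSid,
      media: { payload: audio.toString("base64") },
    }));
  };
}
```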
Negotiation Engine
The negotiation engine selects a phase based on turn count: opening, discovery, proposal, or closing. Each phase uses different tactics and tone. Post-call, the full conversation is analyzed and scored.
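Roughly, phase selection might look like the sketch below. The turn thresholds and prompt guidance are illustrative assumptions; only the four phases come from the engine itself.

```ts
type Phase = "opening" | "discovery" | "proposal" | "closing";

function selectPhase(turn: number): Phase {
  if (turn <= 2) return "opening";    // build rapport, state why we're calling
  if (turn <= 6) return "discovery";  // ask questions, surface constraints
  if (turn <= 10) return "proposal";  // anchor and trade concessions
  return "closing";                   // confirm terms and wrap up
}

// Each phase contributes different tactics and tone to the system prompt.
const phaseGuidance: Record<Phase, string> = {
  opening: "Be warm and brief. Establish context for the call.",
  discovery: "Ask open-ended questions. Do not make offers yet.",
  proposal: "Anchor high, concede slowly, justify every number.",
  closing: "Summarize agreed terms and confirm next steps.",
};
```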
Frontend Updates
The browser subscribes to a WebSocket channel for the active call. Every transcript update, agent thinking state, status change, and analysis result is pushed in real time. No polling, no delays.
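On the client, the subscription can stay small, as in this sketch; the URL shape and event fields are assumptions rather than kiru's actual protocol.

```ts
type CallEvent =
  | { type: "transcript"; speaker: "agent" | "counterpart"; text: string }
  | { type: "agent_thinking"; thinking: boolean }
  | { type: "status"; state: string }
  | { type: "analysis"; score: number; summary: string };

function subscribeToCall(callId: string, onEvent: (e: CallEvent) => void) {
  // Same origin as the page, upgraded to ws:// or wss://.
  const base = location.origin.replace(/^http/, "ws");
  const ws = new WebSocket(`${base}/ws/calls/${callId}`);

  ws.onmessage = (msg) => {
    // Every update is pushed by the server; the client never polls.
    onEvent(JSON.parse(msg.data) as CallEvent);
  };

  return () => ws.close();
}

// Usage: keep the unsubscribe handle and call it when the view unmounts.
const unsubscribe = subscribeToCall("call_123", (event) => {
  console.log(event.type, event);
});
```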
Start a negotiation and see the full pipeline in action.