Architecture

How kiru works.

A deep dive into the system architecture, from the browser to the phone call and back. Six diagrams covering every layer of the stack.

System Overview

End-to-end architecture.

A complete voice AI pipeline, from your browser to the phone and back. The frontend creates a task, the backend orchestrates the call through Twilio, processes speech with Deepgram, generates responses via the LLM, and streams everything back in real time.
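To make the flow concrete, here is a minimal sketch of that loop, assuming the orchestrator sees each layer as a small service interface. The interface and function names below (`Services`, `runCall`, `dial`, `pushToBrowser`) are illustrative stand-ins for Twilio, Deepgram, the LLM, and the TTS engine, not kiru's actual API.

```ts
// Illustrative sketch only; these interfaces stand in for the real integrations.
interface Services {
  dial(phoneNumber: string): AsyncIterable<Buffer>; // inbound audio from Twilio
  sendAudio(audio: Buffer): Promise<void>;          // outbound audio to Twilio
  transcribe(audio: Buffer): Promise<string>;       // Deepgram speech-to-text
  reply(transcript: string): Promise<string>;       // LLM response
  speak(text: string): Promise<Buffer>;             // text-to-speech
  pushToBrowser(event: unknown): void;              // WebSocket update to the frontend
}

async function runCall(svc: Services, phoneNumber: string): Promise<void> {
  for await (const chunk of svc.dial(phoneNumber)) {
    const transcript = await svc.transcribe(chunk);
    if (!transcript) continue;                      // nothing recognized yet
    const answer = await svc.reply(transcript);
    await svc.sendAudio(await svc.speak(answer));   // audio back onto the call
    svc.pushToBrowser({ transcript, answer });      // live update for the browser
  }
}
```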

LLM Providers

Multi-provider LLM support.

Switch between OpenAI, Anthropic, or a fully local setup with a single environment variable. In production, kiru runs on a local Asus ROG GX10 workstation with Ollama serving qwen3:30b-a3b for zero API costs and full privacy.
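A plausible shape for that single-variable switch is sketched below, assuming an `LLM_PROVIDER` environment variable and Ollama's default local port with its OpenAI-compatible endpoint. The variable names and defaults are guesses, not kiru's documented configuration.

```ts
// Sketch of env-var provider selection; variable names are assumptions.
type LlmProvider = "openai" | "anthropic" | "ollama";

interface LlmConfig {
  provider: LlmProvider;
  model: string;
  baseUrl?: string;
}

function loadLlmConfig(): LlmConfig {
  const provider = (process.env.LLM_PROVIDER ?? "ollama") as LlmProvider;
  if (provider === "ollama") {
    // Fully local: Ollama's OpenAI-compatible endpoint, zero API cost, full privacy.
    return { provider, model: "qwen3:30b-a3b", baseUrl: "http://localhost:11434/v1" };
  }
  // Hosted providers (OpenAI, Anthropic) read the model name from the environment.
  const model = process.env.LLM_MODEL;
  if (!model) throw new Error("LLM_MODEL must be set for hosted providers");
  return { provider, model };
}
```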

Call Lifecycle

State machine call flow.

Every call moves through well-defined states. After the call ends, automatic post-call analysis scores the negotiation, extracts tactics used, and generates a summary, all persisted for the history view.
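One way to picture those well-defined states is a transition table like the sketch below. The state names are inferred from the description (including a post-call analysis stage), not taken from kiru's code.

```ts
// Sketch of the call state machine; state names are inferred, not kiru's actual enum.
type CallState =
  | "queued"
  | "dialing"
  | "in_progress"
  | "completed"
  | "analyzing"   // automatic post-call analysis: scoring, tactics, summary
  | "analyzed"    // results persisted for the history view
  | "failed";

// Allowed transitions; anything else is rejected.
const transitions: Record<CallState, CallState[]> = {
  queued: ["dialing", "failed"],
  dialing: ["in_progress", "failed"],
  in_progress: ["completed", "failed"],
  completed: ["analyzing"],
  analyzing: ["analyzed", "failed"],
  analyzed: [],
  failed: [],
};

function advance(current: CallState, next: CallState): CallState {
  if (!transitions[current].includes(next)) {
    throw new Error(`Invalid transition: ${current} -> ${next}`);
  }
  return next;
}
```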

Audio Pipeline

Real-time audio processing.

Twilio streams raw mu-law audio over a WebSocket. The orchestrator pipes it to Deepgram for speech-to-text, feeds transcripts to the LLM, converts responses to speech, and streams audio back, all with sub-second end-to-end latency.
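The inbound half of that pipeline can be sketched as a small WebSocket handler. Twilio Media Streams deliver JSON frames whose "media" events carry base64-encoded mu-law audio; the `SpeechToText` interface below is a hypothetical stand-in for the Deepgram live connection, and the handler shape is an assumption rather than kiru's implementation.

```ts
import { WebSocketServer } from "ws";

// Hypothetical sink for the STT stage; in kiru this would be a Deepgram live connection.
interface SpeechToText {
  sendAudio(mulaw: Buffer): void;
}

export function handleTwilioStream(port: number, stt: SpeechToText): void {
  const wss = new WebSocketServer({ port });
  wss.on("connection", (socket) => {
    socket.on("message", (raw) => {
      const frame = JSON.parse(raw.toString());
      // "media" frames carry base64-encoded mu-law audio from the phone call.
      if (frame.event === "media") {
        stt.sendAudio(Buffer.from(frame.media.payload, "base64"));
      }
    });
  });
}
```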

Negotiation Engine

Adaptive strategy.

The negotiation engine selects a phase based on turn count: opening, discovery, proposal, or closing. Each phase uses different tactics and tone. Post-call, the full conversation is analyzed and scored.
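A minimal sketch of that turn-count selection is below; the thresholds and tactic lists are illustrative guesses, not the engine's actual values.

```ts
// Sketch of turn-count-based phase selection; thresholds are illustrative only.
type Phase = "opening" | "discovery" | "proposal" | "closing";

function selectPhase(turn: number): Phase {
  if (turn < 2) return "opening";     // establish who is calling and why
  if (turn < 6) return "discovery";   // ask questions, surface constraints
  if (turn < 10) return "proposal";   // anchor and trade concessions
  return "closing";                   // confirm terms and wrap up
}

// Each phase maps to a different tone and tactic set fed into the LLM prompt.
const phaseTactics: Record<Phase, string[]> = {
  opening: ["friendly greeting", "establish context"],
  discovery: ["open-ended questions", "mirror key phrases"],
  proposal: ["anchor first", "concede sparingly"],
  closing: ["summarize agreed terms", "confirm next steps"],
};
```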

Frontend Updates

Live WebSocket events.

The browser subscribes to a WebSocket channel for the active call. Every transcript update, agent thinking state, status change, and analysis result is pushed in real time. No polling, no delays.
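On the browser side, the subscription might look like the sketch below. The event shapes and the `/ws/calls/:id` path are assumptions made for illustration, not kiru's real wire format.

```ts
// Sketch of the browser-side subscription; event shape and URL are assumptions.
type CallEvent =
  | { type: "transcript"; speaker: "agent" | "counterpart"; text: string }
  | { type: "thinking"; active: boolean }
  | { type: "status"; state: string }
  | { type: "analysis"; score: number; summary: string };

function subscribeToCall(callId: string, onEvent: (e: CallEvent) => void): () => void {
  const ws = new WebSocket(`wss://${location.host}/ws/calls/${callId}`);
  ws.onmessage = (msg) => onEvent(JSON.parse(msg.data) as CallEvent);
  // Return an unsubscribe function so the UI can clean up when the view unmounts.
  return () => ws.close();
}
```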

Ready to try it?

Start a negotiation and see the full pipeline in action.
