Voicebot Platform

A self-hosted AI voice
platform you actually own.

Multi-tenant, real-time voice agents that answer phone calls, understand intent, and execute business actions — without sending your data through a third-party telephony cloud. Sub-second response times. One operator, many bots. Bring your own AI providers.

~1.3s
caller→bot response latency
N
bots per single deployment
100%
self-hosted, your infra
3 clicks
to launch a new bot

01Architecture at a glance

Four loosely-coupled tiers connected by a real-time event stream. New tenants don't require redeploys — bot definitions land in the admin database, the platform picks them up and starts answering calls within seconds.

Admin & Control Plane
Admin Console
Web UI for defining bots — scenario, voice, prompts, telephony details.
Configuration API
REST endpoints behind the console; source of truth for bot definitions.
Configuration Store
Durable storage for bot definitions, call records, and tenant data.
Real-time Backbone
Event Stream
Internal pub/sub fabric. Carries live transcripts, intents, and call lifecycle signals so any consumer can react in real time.
Voice Plane
SIP Carriers & Providers
Where real traffic comes from — production calls arrive via your telephony carrier, PSTN gateway, or upstream SIP provider over standard SIP/TLS.
Telephony Gateway
SIP signalling, media bridging, NAT traversal, secure transports.
PBX Core
Call routing, recording, supervised transfer, IVR fallback.
Voice Adapter
Bridges live RTP audio into the AI pipeline per call.
Conversation Engine
Streaming Speech-to-Text → LLM with function calling → Text-to-Speech, tuned for phone-quality input and low end-to-end latency.
Browser Dialer · demo only
A built-in WebRTC softphone for sales demos, QA, and internal testing — not the production traffic path.
Business Logic
Orchestrator
Listens to the event stream, executes tools (lookups, transactions, CRM writes), answers grounded questions, and feeds responses back to the caller.
Admin / Control Event Stream Voice Plane Business Logic

02How voice actually moves

Behind the scenes, a phone call is two parallel streams talking to each other — signalling (who's calling whom, what codecs to use, when to hang up) and media (the actual audio packets). Here's the short version of what's in flight.

Caller SIP carrier · browser Telephony Gateway SIP proxy + media bridge PBX Core routing · dialplan Bot Client SIP UA · audio adapter Bot Server Conversation Engine no SIP / no RTP here ▍ SIGNALLING — SIP INVITE (SIP+SDP) INVITE (routed) INVITE (bridged) 200 OK (SDP answer) 200 OK 200 OK ACK — call established ▍ MEDIA — RTP / SRTP ▍ AI BRIDGE — WebSocket RTP audio · 20 ms packets RTP audio · bot speech back raw audio frames synthesized speech WebSocket — not SIP/RTP ▍ TEARDOWN BYE → 200 OK · RTP stops · WS closes
Two networks, one call. The SIP/RTP world (cyan + green) carries the call from the caller through the gateway, PBX, and finally into the Bot Client — a standard SIP user that the PBX bridges to. Past that, the Bot Server speaks a different language: raw audio frames over a WebSocket (purple). Browser callers use the same SIP/RTP dressed up as WebRTC (SIP over WSS, DTLS-SRTP); the gateway translates transparently.

03How a call flows

A caller dialing one of the bot's extensions, end to end.

Caller SIP carrier · PSTN (or demo dialer) INVITE Telephony Gateway signalling · media NAT · TLS route PBX Core match → bot extension recording · IVR bridge Voice Adapter RTP ↔ AI pipeline opens per-call session audio Conversation Engine streaming STT · LLM · TTS ~1.3s end-to-end Admin Console define bots Config Store bot definitions picks bot per-call bot config publish transcripts intents · events Event Stream real-time fan-out — call lifecycle · transcripts · intents Orchestrator executes business logic CRM · DB · external APIs subscribe tool results Solid arrows: live call media & signalling · Dashed: configuration & tool results
1

Inbound call lands

A real call arrives from your SIP carrier or PSTN provider (or the built-in browser dialer for demos). The telephony gateway terminates signalling and bridges media.

2

Routed to the right bot

The PBX core matches the dialed number to one of the registered bots — picked from the admin store.

3

Conversation engine wakes up

A per-call AI pipeline is assembled from that bot's definition — voice, prompts, available tools.

4

Real-time conversation

Caller speech is transcribed live. The LLM responds, calls tools as needed, and the answer is spoken back — typically in ~1.3 seconds.

5

Events fan out

Every transcript line, detected intent, and lifecycle event is broadcast on the platform's event stream. Downstream consumers (CRM, analytics, custom orchestrator) react in real time.

04What you get

Production-grade voicebot capability without the third-party telephony bill or the data-exit problem.

Sub-second responsiveness

Streaming STT, streaming TTS, smart turn-taking. Callers don't notice a robot on the line.

🏢

True multi-tenant

One deployment hosts unlimited bots — each with its own voice, prompts, scenario, and phone number.

🔌

Bring your own AI

Pluggable Speech-to-Text, LLM, and Text-to-Speech providers. Pin per bot, swap without redeploy.

🛠

Scenarios as code

Drop in a new business scenario — pizza ordering, hotel booking, support triage — as a small module. The platform exposes its tools to the LLM automatically.

📡

Live event stream

Every transcript line, intent, and call event is broadcast in real time. Build dashboards, CRM hooks, or analytics on top.

🔒

Your infra, your data

Runs on any Linux box or VPS. Calls, transcripts, and customer data never leave your perimeter.

🌐

Carrier-grade SIP, demo-ready browser

Real traffic flows in over standard SIP from your carrier or PSTN provider. A built-in WebRTC dialer ships alongside it for sales demos and internal QA — no client install needed.

📞

Live observability

Per-call logs, intent traces, and metrics. See what the AI heard, what it decided, and what it said — in real time.

05Where we are

Shipping in phases — each shippable on its own, none blocking the next.

PHASE 1
Voice loop with scenario-based function calling
Done
PHASE 2
Multi-tenant admin console + live config sync
Done
PHASE 3
Real-time event stream — transcripts, intents, call lifecycle
Ongoing
PHASE 5
Knowledge base + retrieval-augmented answers
Upcoming