Providers

Providers are the services behind every live voice agent. They work together in real time to hear the caller, reason about the next response, speak naturally, and connect through the phone network.


Why provider choice matters

Lower latency

Choose fast speech-to-text and text-to-speech providers so callers experience natural turn-taking and fewer awkward pauses.

Better language coverage

Match transcribers, voices, and models to your caller languages, accents, and regional expectations.

Enterprise control

Use approved provider accounts, regional deployment preferences, SLAs, and compliance controls where required.

Cost and quality balance

Tune provider selection for testing, production campaigns, high-value calls, or multilingual workflows.

Provider configuration should be tested with your actual prompt, tools, knowledge base, telephony route, and caller scenarios. Voice quality is a system property, not a single setting.
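
One way to see why voice quality is a system property is to add up the latency of every stage in a single turn. The sketch below is only an illustration: the stage timings, the 1000 ms target, and all names are assumptions made up for this example, not figures from any provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageLatency:
    """Measured or estimated latency for one stage of a turn, in milliseconds."""
    name: str
    ms: int


def turn_latency_ms(stages: list[StageLatency]) -> int:
    """Total delay between the caller finishing and the agent starting to speak."""
    return sum(stage.ms for stage in stages)


def slowest_stage(stages: list[StageLatency]) -> StageLatency:
    """The stage that contributes the most delay, i.e. the first one to tune."""
    return max(stages, key=lambda stage: stage.ms)


# Illustrative numbers only; measure real values on your own calls.
example_turn = [
    StageLatency("speech_to_text", 200),
    StageLatency("llm", 450),
    StageLatency("text_to_speech", 180),
    StageLatency("telephony", 120),
]

TARGET_MS = 1000  # hypothetical budget for one turn

total = turn_latency_ms(example_turn)
print(f"total={total} ms, within budget={total <= TARGET_MS}")
print(f"slowest stage: {slowest_stage(example_turn).name}")
```

With these made-up numbers the turn fits the budget, but swapping any single provider for a slower one can push the total over it, which is why the stages should be tested together.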


How providers work together

  1. Speech-to-text converts caller audio into text.
  2. LLM understands intent, plans the next response, and decides when to use tools.
  3. Tools and knowledge provide business context or trigger real actions.
  4. Text-to-speech turns the response into natural audio.
  5. Telephony carries the call inbound or outbound.
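
The five steps above can be sketched as a single turn handler. This is a minimal illustration, not the platform's implementation: `Providers`, `LLMDecision`, `handle_turn`, and the stub functions are hypothetical names, and each provider is reduced to a plain Python callable.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LLMDecision:
    """What the LLM wants to do next: reply directly or call a tool first."""
    reply: str
    tool_name: Optional[str] = None
    tool_args: Optional[dict] = None


@dataclass
class Providers:
    """One callable per provider category (all hypothetical stand-ins)."""
    transcribe: Callable[[bytes], str]                       # speech-to-text
    decide: Callable[[str, Optional[str]], LLMDecision]      # LLM
    run_tool: Callable[[str, dict], str]                     # tools and knowledge
    synthesize: Callable[[str], bytes]                       # text-to-speech


def handle_turn(caller_audio: bytes, providers: Providers) -> bytes:
    text = providers.transcribe(caller_audio)                # step 1
    decision = providers.decide(text, None)                  # step 2
    if decision.tool_name is not None:                       # step 3
        result = providers.run_tool(decision.tool_name, decision.tool_args or {})
        decision = providers.decide(text, result)
    # Step 4; the telephony provider (step 5) would carry this audio to the caller.
    return providers.synthesize(decision.reply)


def fake_decide(text: str, tool_result: Optional[str]) -> LLMDecision:
    """Stub LLM: asks for a tool once, then answers using the tool result."""
    if tool_result is None:
        return LLMDecision(reply="", tool_name="lookup_order", tool_args={"query": text})
    return LLMDecision(reply=f"Your order is {tool_result}.")


# Stub providers so the sketch runs without any external service.
stub = Providers(
    transcribe=lambda audio: audio.decode("utf-8"),
    decide=fake_decide,
    run_tool=lambda name, args: "out for delivery",
    synthesize=lambda reply: reply.encode("utf-8"),
)

audio_out = handle_turn(b"where is my order", stub)
print(audio_out)
```

Because each stage is just a callable here, swapping one provider means replacing one field, which mirrors how the categories below can be chosen independently.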

Supported provider categories

LLMs

The reasoning layer of the voice agent. It interprets caller intent, follows the prompt, selects tools, and produces the next response.

OpenAI-compatible

Use OpenAI-style model APIs for fast setup and broad model compatibility.

Anthropic

Use Claude-family models for strong instruction following and complex reasoning.

Google Vertex

Use Vertex-hosted models for Google Cloud-centered deployments.

Azure-hosted models

Use enterprise Azure deployment patterns where required by your organization.


Transcribers

Speech-to-text quality directly affects agent behavior. Accuracy, latency, endpointing, and language support all matter for production calls.

Deepgram

Fast streaming transcription for phone conversations.

Azure Speech

Enterprise speech recognition with regional cloud options.

Google Speech

Speech recognition for Google Cloud-aligned stacks.

Sarvam

Useful for Indian language and regional voice workflows.

ElevenLabs

Speech-to-text for voice workflows that use ElevenLabs infrastructure.

WhisperRay

Alternative transcription option for supported deployments.
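
Endpointing, mentioned above, is the decision that the caller has finished speaking. Each transcriber handles it in its own way; the sketch below is a generic silence-timeout approach, assuming audio has already been split into fixed-size frames and labeled speech or silence. The function name, frame size, and thresholds are hypothetical.

```python
from typing import Optional


def find_endpoint(
    frame_is_speech: list[bool],
    frame_ms: int = 20,
    silence_ms: int = 500,
) -> Optional[int]:
    """Return the index of the frame where the utterance is considered finished.

    The endpoint fires once `silence_ms` of continuous silence follows some
    speech. Silence before any speech is ignored. Returns None if no endpoint
    is reached.
    """
    frames_needed = silence_ms // frame_ms
    heard_speech = False
    silent_run = 0
    for index, is_speech in enumerate(frame_is_speech):
        if is_speech:
            heard_speech = True
            silent_run = 0
        elif heard_speech:
            silent_run += 1
            if silent_run >= frames_needed:
                return index
    return None


# 100 ms of leading silence, 200 ms of speech, then 600 ms of silence.
frames = [False] * 5 + [True] * 10 + [False] * 30

print(find_endpoint(frames))                  # patient: waits for 500 ms of silence
print(find_endpoint(frames, silence_ms=200))  # eager: waits for 200 ms of silence
```

The trade-off is the one the section describes: a shorter silence window makes the agent respond sooner but risks cutting off callers who pause mid-sentence.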


Synthesizers

Text-to-speech controls how the agent sounds. Choose voices that match the brand, region, call type, and latency target.

ElevenLabs

Natural, expressive voices for customer-facing agents.

Azure TTS

Enterprise text-to-speech with broad language coverage.

Google TTS

Cloud-hosted voices for multilingual workflows.

Sarvam

Indian language voice support for regional deployments.

Smallest

Low-latency voice synthesis for responsive conversations.


Telephony providers

Telephony providers connect agents to phone numbers, inbound routes, outbound calls, and recordings.

Twilio

Global telephony coverage for inbound and outbound calls.

Exotel

India-focused telephony routing for supported workspaces.

Tata Tele

Enterprise telecom connectivity for supported deployments.

LiveKit SIP

SIP and real-time media routing patterns for advanced setups.


| Use case | Recommended direction |
| --- | --- |
| Quick testing | Use the workspace defaults and focus on prompt quality first. |
| Production outreach | Tune transcriber, model, voice, telephony route, webhooks, and retry rules together. |
| Indian languages | Prioritize regional STT/TTS options and test real caller accents. |
| Lowest latency | Use low-latency STT and TTS, keep prompts tight, and reduce unnecessary tool calls. |
| Highest quality | Use stronger LLMs, expressive voices, high-quality knowledge sources, and robust QA calls. |
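
As a rough illustration of turning these recommendations into a shortlist, the sketch below tags each transcriber and synthesizer with keywords paraphrased from the descriptions on this page, then filters by tag. The tag names, `CATALOG`, and `shortlist` are assumptions for this example; they are not platform settings or an API.

```python
# Tags are paraphrased from the one-line descriptions on this page.
CATALOG: dict[str, dict[str, set[str]]] = {
    "transcriber": {
        "Deepgram": {"low-latency", "streaming"},
        "Azure Speech": {"enterprise", "regional-cloud"},
        "Google Speech": {"google-cloud"},
        "Sarvam": {"indian-languages"},
        "ElevenLabs": {"elevenlabs-stack"},
        "WhisperRay": {"alternative"},
    },
    "synthesizer": {
        "ElevenLabs": {"expressive"},
        "Azure TTS": {"enterprise", "broad-languages"},
        "Google TTS": {"multilingual"},
        "Sarvam": {"indian-languages"},
        "Smallest": {"low-latency"},
    },
}


def shortlist(category: str, tag: str) -> list[str]:
    """Providers in a category whose description matches the tag, sorted by name."""
    return sorted(
        name for name, tags in CATALOG[category].items() if tag in tags
    )


# Candidates to test first for two of the use cases in the table.
print("Lowest latency STT:", shortlist("transcriber", "low-latency"))
print("Lowest latency TTS:", shortlist("synthesizer", "low-latency"))
print("Indian languages STT:", shortlist("transcriber", "indian-languages"))
print("Indian languages TTS:", shortlist("synthesizer", "indian-languages"))
```

A shortlist like this is only a starting point; the final choice should still come from test calls with real prompts and callers.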

Next steps