Providers
Providers are the services behind every live voice agent. They work together in real time to hear the caller, reason about the next response, speak naturally, and connect through the phone network.
Why provider choice matters
- Choose low-latency speech-to-text and text-to-speech providers so callers experience natural turn-taking with fewer awkward pauses.
- Match transcribers, voices, and models to your callers' languages, accents, and regional expectations.
- Where your organization requires it, use approved provider accounts and apply regional deployment preferences, SLAs, and compliance controls.
- Tune provider selection separately for testing, production campaigns, high-value calls, and multilingual workflows.
Provider configuration should be tested with your actual prompt, tools, knowledge base, telephony route, and caller scenarios. Voice quality is a system property, not a single setting.
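Because voice quality is a system property, one rough way to reason about turn-taking is to add up per-stage latencies for a single caller turn and compare the total against a target. The sketch below does only that arithmetic. The stage names, millisecond figures, and the 800 ms budget are illustrative assumptions, not measurements of or recommendations for any specific provider.

```python
from dataclasses import dataclass


@dataclass
class StageLatency:
    """Estimated latency contributed by one provider stage, in milliseconds."""
    name: str
    ms: float


def turn_latency_ms(stages: list[StageLatency]) -> float:
    """Total time from end of caller speech to first agent audio.

    Assumes the stages run one after another; stacks that stream between
    stages can overlap some of this work.
    """
    return sum(stage.ms for stage in stages)


def over_budget(stages: list[StageLatency], budget_ms: float) -> list[str]:
    """If the total exceeds the budget, return stage names, largest first."""
    if turn_latency_ms(stages) <= budget_ms:
        return []
    return [s.name for s in sorted(stages, key=lambda s: s.ms, reverse=True)]


# Illustrative numbers only (assumed); measure your own stack with real calls.
stages = [
    StageLatency("endpointing", 300),
    StageLatency("stt_final", 100),
    StageLatency("llm_first_token", 350),
    StageLatency("tts_first_audio", 150),
    StageLatency("telephony", 100),
]
print(turn_latency_ms(stages))  # 1000
print(over_budget(stages, budget_ms=800))
```

Listing the largest contributors first points you at the stage where a provider change is most likely to shorten pauses.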
How providers work together
- Speech-to-text converts caller audio into text.
- LLM understands intent, plans the next response, and decides when to use tools.
- Tools and knowledge provide business context or trigger real actions.
- Text-to-speech turns the response into natural audio.
- Telephony carries the call inbound or outbound.
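The flow above can be sketched as one conversational turn passing through each stage. Everything here is hypothetical: the Transcriber, LanguageModel, and Synthesizer protocols and the stub classes stand in for real provider SDKs, which have their own (usually streaming) APIs.

```python
from typing import Protocol


class Transcriber(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class LanguageModel(Protocol):
    def respond(self, transcript: str) -> str: ...


class Synthesizer(Protocol):
    def synthesize(self, text: str) -> bytes: ...


def handle_turn(audio: bytes, stt: Transcriber, llm: LanguageModel, tts: Synthesizer) -> bytes:
    """Run one caller turn: audio in -> transcript -> reply text -> audio out.

    Telephony (not modeled here) would deliver `audio` from the caller and
    play the returned bytes back on the call.
    """
    transcript = stt.transcribe(audio)
    reply = llm.respond(transcript)
    return tts.synthesize(reply)


# Hypothetical stub providers so the sketch runs without external services.
class FakeSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class FakeLLM:
    def respond(self, transcript: str) -> str:
        return f"You said: {transcript}"


class FakeTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


print(handle_turn(b"hello", FakeSTT(), FakeLLM(), FakeTTS()))  # b'You said: hello'
```

Tool calls and knowledge lookups would sit inside the LLM step; they are left out here to keep the sketch short.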
Supported provider categories
LLMs
The reasoning layer of the voice agent. It interprets caller intent, follows the prompt, selects tools, and produces the next response.
- Use OpenAI-style model APIs for fast setup and broad model compatibility.
- Use Claude-family models for strong instruction following and complex reasoning.
- Use Vertex-hosted models for Google Cloud-centered deployments.
- Use enterprise Azure deployment patterns where required by your organization.
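Since the LLM decides when to use tools, the voice runtime has to tell a tool request apart from a spoken reply. This is a minimal sketch, assuming the model returns a JSON object with either a "tool" and "args" pair or a "reply" field. That message shape and the lookup_order tool are hypothetical; OpenAI-style, Claude, Vertex, and Azure model APIs each define their own tool-calling formats.

```python
import json
from typing import Any, Callable


# Hypothetical tool: returns stubbed business data instead of calling a real API.
def lookup_order(order_id: str) -> str:
    return f"Order {order_id} is out for delivery."


# Hypothetical registry mapping tool names the model may request to functions.
TOOLS: dict[str, Callable[..., str]] = {"lookup_order": lookup_order}


def next_utterance(model_output: str) -> str:
    """Turn one model output (assumed JSON shape) into text for the synthesizer.

    If the model requested a tool, run it and speak the result directly;
    a production agent would usually pass the result back to the model first.
    """
    message: dict[str, Any] = json.loads(model_output)
    if "tool" in message:
        tool = TOOLS.get(message["tool"])
        if tool is None:
            return "Sorry, I can't do that right now."
        return tool(**message.get("args", {}))
    return message["reply"]


print(next_utterance('{"reply": "How can I help?"}'))
print(next_utterance('{"tool": "lookup_order", "args": {"order_id": "A12"}}'))
```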
Transcribers
Speech-to-text quality directly affects agent behavior. Accuracy, latency, endpointing, and language support all matter for production calls.
- Fast streaming transcription for phone conversations.
- Enterprise speech recognition with regional cloud options.
- Speech recognition for Google Cloud-aligned stacks.
- Useful for Indian-language and regional voice workflows.
- Speech services for voice workflows using ElevenLabs infrastructure.
- Alternate transcription option for supported deployments.
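Endpointing decides when the caller has finished speaking, which affects both how quickly the agent replies and how often it cuts the caller off. Streaming transcribers typically handle this internally; the sketch below only illustrates the basic idea with an energy threshold over fixed-size audio frames. The threshold, 20 ms frame size, and 500 ms silence window are assumed values for illustration, not defaults of any listed provider.

```python
from typing import Optional


def end_of_turn_frame(
    frame_energies: list[float],
    silence_threshold: float = 0.01,  # assumed energy level treated as silence
    frame_ms: int = 20,               # assumed frame duration
    min_silence_ms: int = 500,        # assumed pause that ends a turn
) -> Optional[int]:
    """Return the index of the frame where the caller's turn is judged to end.

    A turn ends once speech has been heard and then `min_silence_ms` of
    consecutive frames fall below `silence_threshold`. Returns None if the
    turn has not ended yet.
    """
    needed = min_silence_ms // frame_ms  # consecutive quiet frames required
    heard_speech = False
    quiet_run = 0
    for i, energy in enumerate(frame_energies):
        if energy >= silence_threshold:
            heard_speech = True
            quiet_run = 0
        elif heard_speech:
            quiet_run += 1
            if quiet_run >= needed:
                return i
    return None


# 200 ms of speech followed by 600 ms of silence.
frames = [0.2] * 10 + [0.0] * 30
print(end_of_turn_frame(frames))  # 34
```

A shorter silence window makes the agent feel snappier but raises the risk of interrupting callers who pause mid-sentence, which is why endpointing is worth testing with real caller scenarios.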
Synthesizers
Text-to-speech controls how the agent sounds. Choose voices that match the brand, region, call type, and latency target.
- Natural, expressive voices for customer-facing agents.
- Enterprise text-to-speech with broad language coverage.
- Cloud-hosted voices for multilingual workflows.
- Indian-language voice support for regional deployments.
- Low-latency voice synthesis for responsive conversations.
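Matching a voice to the caller's language and a latency target can be framed as filtering a voice catalog. This is a minimal sketch, assuming a hypothetical catalog where each entry records a BCP 47 language tag and a time to first audio measured in your own tests; the voice IDs and numbers are placeholders, not real provider voices.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Voice:
    voice_id: str        # hypothetical identifier
    language: str        # BCP 47 tag, e.g. "en-IN", "hi-IN"
    first_audio_ms: int  # measured time to first audio in your tests


def pick_voice(catalog: list[Voice], language: str, max_first_audio_ms: int) -> Optional[Voice]:
    """Return the fastest voice for the caller's language within the latency target."""
    candidates = [
        v for v in catalog
        if v.language == language and v.first_audio_ms <= max_first_audio_ms
    ]
    return min(candidates, key=lambda v: v.first_audio_ms, default=None)


catalog = [
    Voice("voice_a", "en-US", 180),
    Voice("voice_b", "hi-IN", 260),
    Voice("voice_c", "hi-IN", 140),
]
print(pick_voice(catalog, "hi-IN", max_first_audio_ms=200))
```

Speed is only one tie-breaker here; brand fit, region, and call type would normally narrow the catalog before latency is compared.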
Telephony providers
Telephony providers connect agents to phone numbers, inbound routes, outbound calls, and recordings.
- Global telephony coverage for inbound and outbound calls.
- India-focused telephony routing for supported workspaces.
- Enterprise telecom connectivity for supported deployments.
- SIP and real-time media routing patterns for advanced setups.
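When a workspace has both global and India-focused telephony, outbound calls can be routed by the destination number's country code. This is a minimal sketch, assuming E.164-formatted numbers and two hypothetical route labels; which provider actually serves a given country depends on your workspace setup.

```python
# Hypothetical route table: E.164 country-code prefix -> route label.
ROUTES: dict[str, str] = {
    "+91": "india_route",  # India-focused provider (assumed)
}
DEFAULT_ROUTE = "global_route"  # global coverage provider (assumed)


def choose_route(e164_number: str) -> str:
    """Pick a telephony route from the destination's country-code prefix."""
    if not e164_number.startswith("+"):
        raise ValueError(f"expected an E.164 number, got {e164_number!r}")
    # Check longer prefixes first so a more specific prefix wins.
    for prefix in sorted(ROUTES, key=len, reverse=True):
        if e164_number.startswith(prefix):
            return ROUTES[prefix]
    return DEFAULT_ROUTE


print(choose_route("+919876543210"))  # india_route
print(choose_route("+14155550100"))   # global_route
```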
Recommended setups
Next steps
- Combine providers with prompt, tools, knowledge, webhooks, and phone settings.
- Understand phone numbers, inbound routing, outbound calls, and recording behavior.
- Connect external APIs and workflows during live conversations.
- Build, test, and prepare a voice agent for production calls.