# Providers

> Understand the AI, speech, voice, and telephony providers that power FormantAI agents.

Providers are the services behind every live voice agent. They work together in real time to hear the caller, reason about the next response, speak naturally, and connect through the phone network.

***

## Why provider choice matters

- Choose fast speech and voice providers so callers experience natural turn-taking and fewer awkward pauses.
- Match transcribers, voices, and models to your caller languages, accents, and regional expectations.
- Use approved provider accounts, regional deployment preferences, SLAs, and compliance controls where required.
- Tune provider selection for testing, production campaigns, high-value calls, or multilingual workflows.

Test your provider configuration with your actual prompt, tools, knowledge base, telephony route, and caller scenarios. Voice quality is a property of the whole system, not of any single setting.
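To illustrate why quality depends on the whole chain, here is a minimal sketch that adds up per-stage latency for one conversational turn and finds the slowest stage. The stage names and millisecond values are placeholders chosen for illustration, not measurements of any provider or of FormantAI itself.

```python
# Illustrative per-stage latencies in milliseconds.
# These values are made-up placeholders, not provider benchmarks.
stage_latency_ms = {
    "speech_to_text": 200,
    "llm": 500,
    "tools": 300,
    "text_to_speech": 150,
    "telephony": 100,
}


def turn_latency_ms(stages: dict[str, int]) -> int:
    """Total delay the caller perceives when stages run one after another."""
    return sum(stages.values())


def slowest_stage(stages: dict[str, int]) -> str:
    """Name of the stage that contributes the most latency."""
    return max(stages, key=lambda name: stages[name])


total = turn_latency_ms(stage_latency_ms)
bottleneck = slowest_stage(stage_latency_ms)
print(f"total={total} ms, slowest stage={bottleneck}")
```

Replacing the placeholders with timings measured on your own test calls shows which provider to tune first. The sketch assumes the stages run strictly in sequence; streaming setups may overlap stages, so treat the sum as an upper bound.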

***

## How providers work together

```mermaid
flowchart LR
    A[Caller speaks] --> B[Speech-to-text]
    B --> C[LLM]
    C --> D[Tools and knowledge]
    D --> E[Text-to-speech]
    E --> F[Agent responds]
```

1. **Speech-to-text** converts caller audio into text.
2. **LLM** understands intent, plans the next response, and decides when to use tools.
3. **Tools and knowledge** provide business context or trigger real actions.
4. **Text-to-speech** turns the response into natural audio.
5. **Telephony** carries the call, inbound or outbound, and connects every stage above to the phone network.
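The steps above can be sketched as a single conversational turn. This is a simplified illustration, not FormantAI's implementation: `Decision`, `run_turn`, and the `stub_*` functions are hypothetical names, and each stage is a plain function standing in for a real provider.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Decision:
    """Hypothetical LLM output: a reply, or a request to call a tool."""
    reply: str
    tool_name: Optional[str] = None


def run_turn(
    caller_audio: bytes,
    transcribe: Callable[[bytes], str],
    reason: Callable[[str, Optional[str]], Decision],
    call_tool: Callable[[str], str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """Run one turn through the provider stages in order."""
    text = transcribe(caller_audio)            # 1. speech-to-text
    decision = reason(text, None)              # 2. LLM plans the response
    if decision.tool_name is not None:         # 3. tools and knowledge
        tool_result = call_tool(decision.tool_name)
        decision = reason(text, tool_result)   # LLM answers using the result
    return synthesize(decision.reply)          # 4. text-to-speech
    # 5. telephony would carry the returned audio back to the caller


# Stub providers so the sketch runs without any external service.
def stub_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def stub_reason(text: str, tool_result: Optional[str]) -> Decision:
    if tool_result is not None:
        return Decision(reply=f"Your order is {tool_result}.")
    if "order" in text:
        return Decision(reply="", tool_name="lookup_order")
    return Decision(reply="How can I help?")


def stub_call_tool(name: str) -> str:
    return "shipped"


def stub_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


audio_out = run_turn(
    b"where is my order",
    stub_transcribe, stub_reason, stub_call_tool, stub_synthesize,
)
print(audio_out)
```

Passing each stage in as a function mirrors the idea that providers are swappable: changing the transcriber or the voice should not require changing the turn logic.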

***

## Supported provider categories

### LLMs

The reasoning layer of the voice agent. It interprets caller intent, follows the prompt, selects tools, and produces the next response.

- Use OpenAI-style model APIs for fast setup and broad model compatibility.
- Use Claude-family models for strong instruction following and complex reasoning.
- Use Vertex-hosted models for Google Cloud-centered deployments.
- Use enterprise Azure deployments where your organization requires them.

***

### Transcribers

Speech-to-text quality directly affects agent behavior. Accuracy, latency, endpointing, and language support all matter for production calls.

Supported options include:

- Fast streaming transcription for phone conversations.
- Enterprise speech recognition with regional cloud options.
- Speech recognition for Google Cloud-aligned stacks.
- Speech recognition for Indian language and regional voice workflows.
- Speech services for voice workflows using ElevenLabs infrastructure.
- An alternate transcription option for supported deployments.

***

### Synthesizers

Text-to-speech controls how the agent sounds. Choose voices that match the brand, region, call type, and latency target.

Supported options include:

- Natural, expressive voices for customer-facing agents.
- Enterprise text-to-speech with broad language coverage.
- Cloud-hosted voices for multilingual workflows.
- Indian language voice support for regional deployments.
- Low-latency voice synthesis for responsive conversations.

***

### Telephony providers

Telephony providers connect agents to phone numbers, inbound routes, outbound calls, and recordings.

Supported options include:

- Global telephony coverage for inbound and outbound calls.
- India-focused telephony routing for supported workspaces.
- Enterprise telecom connectivity for supported deployments.
- SIP and real-time media routing patterns for advanced setups.

***

## Recommended setups

| Use case                | Recommended direction                                                                      |
| ----------------------- | ------------------------------------------------------------------------------------------ |
| **Quick testing**       | Use the workspace defaults and focus on prompt quality first.                              |
| **Production outreach** | Tune transcriber, model, voice, telephony route, webhooks, and retry rules together.       |
| **Indian languages**    | Prioritize regional STT/TTS options and test real caller accents.                          |
| **Lowest latency**      | Use low-latency STT and TTS, keep prompts tight, and reduce unnecessary tool calls.        |
| **Highest quality**     | Use stronger LLMs, expressive voices, high-quality knowledge sources, and robust QA calls. |
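As a rough illustration, the table above could be encoded as data so a team can look up which components to tune for a given use case. Everything here is hypothetical: `SetupProfile`, `PROFILES`, the keys, and the component names are assumptions made for this sketch and do not correspond to FormantAI settings or APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SetupProfile:
    """Hypothetical record mirroring one row of the table above."""
    use_case: str
    tune: tuple[str, ...]  # components to review together


# Keys and component names are illustrative, not real configuration fields.
PROFILES = {
    "quick_testing": SetupProfile("Quick testing", ("prompt",)),
    "production_outreach": SetupProfile(
        "Production outreach",
        ("transcriber", "model", "voice", "telephony_route",
         "webhooks", "retry_rules"),
    ),
    "indian_languages": SetupProfile(
        "Indian languages", ("transcriber", "voice"),
    ),
    "lowest_latency": SetupProfile(
        "Lowest latency", ("transcriber", "voice", "prompt", "tools"),
    ),
    "highest_quality": SetupProfile(
        "Highest quality", ("model", "voice", "knowledge", "qa_calls"),
    ),
}


def components_to_tune(use_case_key: str) -> tuple[str, ...]:
    """Return the components to review for a use case, or raise ValueError."""
    try:
        return PROFILES[use_case_key].tune
    except KeyError:
        known = ", ".join(sorted(PROFILES))
        raise ValueError(
            f"Unknown use case {use_case_key!r}; expected one of: {known}"
        ) from None


print(components_to_tune("lowest_latency"))
```

A lookup like this is only a checklist aid. The actual provider choices still need to be validated with test calls.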

***

## Next steps

- Combine providers with prompt, tools, knowledge, webhooks, and phone settings.
- Understand phone numbers, inbound routing, outbound calls, and recording behavior.
- Connect external APIs and workflows during live conversations.
- Build, test, and prepare a voice agent for production calls.