Create voice agent
The Create Voice Agent API allows you to create and configure AI voice agents with comprehensive settings including voice configuration, speech-to-text, LLM selection, and advanced call handling features.
API Endpoint
POST /create-agent
Content-Type: application/json
Authentication: Required (Token parameter)
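As a minimal sketch, the call might be assembled in Python as shown below. The base URL `https://api.example.com` is a placeholder, and sending the token in an `Authorization` header is an assumption (this page also mentions a `token` parameter). The function only builds the request; pass the result to `urllib.request.urlopen` to send it.

```python
import json
import urllib.request

# Hypothetical base URL: replace it with the host for your account.
BASE_URL = "https://api.example.com"


def build_create_agent_request(token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /create-agent request."""
    return urllib.request.Request(
        url=f"{BASE_URL}/create-agent",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: the token is accepted in the Authorization header.
            "Authorization": token,
        },
        method="POST",
    )


request = build_create_agent_request(
    "YOUR_API_TOKEN",
    {"agent_name": "Support Agent", "prompt": "You are a helpful support agent."},
)
print(request.get_method(), request.full_url)
```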
Request Body
Required Fields
| Field | Type | Description |
|---|---|---|
| `agent_name` | string | Name of the voice agent (required) |
| `prompt` | string | System prompt/instructions that define the agent’s behavior and personality (required) |
The `voice` object is optional. Include it to configure the TTS provider, voice ID, and voice settings.
Optional Fields
Basic Information
| Field | Type | Description | Default |
|---|---|---|---|
| `description` | string | Description of the agent’s purpose | Empty string |
| `timezone` | string | Timezone for the agent (e.g., `America/New_York`, `Europe/London`) | UTC |
| `greeting` | string | The agent’s first message when the call starts | None |
| `session_data_webhook` | string | Webhook URL to receive end-of-session data | None |
Voice Configuration
The `voice` object is optional and, if provided, contains the following properties:
| Property | Type | Required | Description |
|---|---|---|---|
| `provider` | string | No | Voice provider: `elevenlabs`, `openai`, `deepgram`, `sarvam` |
| `voice_id` | string | Yes | Unique identifier for the voice |
| `model` | string | No | TTS model to use (see Voice Models below) |
| `settings` | object | No | Voice settings configuration (see Voice Settings below) |
Voice Providers
| Provider | Value | Description |
|---|---|---|
| ElevenLabs | elevenlabs | High-quality AI voice synthesis with natural-sounding voices and emotional range |
| OpenAI | openai | Advanced text-to-speech with multiple voice options |
| Deepgram | deepgram | Real-time speech recognition and voice synthesis |
| Sarvam | sarvam | Multilingual voice synthesis optimized for Indian languages |
Voice Models
ElevenLabs Models
| Model | Value | Description |
|---|---|---|
| Turbo v2.5 | eleven_turbo_v2_5 | Latest high-speed model with low latency (Recommended) |
| Multilingual v2 | eleven_multilingual_v2 | High-quality multilingual voice synthesis |
| Monolingual v1 | eleven_monolingual_v1 | English-only optimized model |
OpenAI Models
| Model | Value | Description |
|---|---|---|
| TTS 1 | tts-1 | Standard quality, faster generation |
| TTS 1 HD | tts-1-hd | High definition, better quality |
Voice Settings
The `settings` object contains fine-tuning parameters for voice output:
| Property | Type | Range | Description | Default |
|---|---|---|---|---|
| `stability` | number | 0.0 - 1.0 | Controls voice consistency. Higher = more stable, lower = more expressive | 0.5 |
| `voice_style` | number | 0 - 100 | Style intensity for the voice | 0 |
| `speed` | number | 0.5 - 2.0 | Speech speed multiplier | 1.0 |
| `speaker_boost` | boolean | true/false | Enhances speaker characteristics | true |
| `similarity_boost` | number | 0.0 - 1.0 | How closely to match the original voice | 0.75 |
| `tone` | string | - | Voice tone: `professional`, `friendly`, `neutral`, `enthusiastic` | None |
| `style` | string | - | Speaking style: `classic`, `conversational`, `narrative` | classic |
| `instruction_sensitivity` | string | - | How strictly to follow instructions: `low`, `medium`, `high` | medium |
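The ranges and allowed values in the table above can be checked on the client before a request is sent. The sketch below is one way to do that; `check_voice_settings` is a hypothetical helper, not part of the API, and the server remains the authority on what it accepts.

```python
# Numeric ranges copied from the Voice Settings table: (min, max).
NUMERIC_RANGES = {
    "stability": (0.0, 1.0),
    "voice_style": (0, 100),
    "speed": (0.5, 2.0),
    "similarity_boost": (0.0, 1.0),
}

# Allowed string values copied from the same table.
ALLOWED_VALUES = {
    "tone": {"professional", "friendly", "neutral", "enthusiastic"},
    "style": {"classic", "conversational", "narrative"},
    "instruction_sensitivity": {"low", "medium", "high"},
}


def check_voice_settings(settings: dict) -> list[str]:
    """Return a list of problems found in a voice settings object."""
    problems = []
    for key, value in settings.items():
        if key in NUMERIC_RANGES:
            low, high = NUMERIC_RANGES[key]
            # bool is a subclass of int, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append(f"{key}: expected a number")
            elif not low <= value <= high:
                problems.append(f"{key}: {value} is outside {low} to {high}")
        elif key in ALLOWED_VALUES:
            if not isinstance(value, str) or value not in ALLOWED_VALUES[key]:
                problems.append(f"{key}: unsupported value {value!r}")
        elif key == "speaker_boost":
            if not isinstance(value, bool):
                problems.append("speaker_boost: expected true or false")
        else:
            problems.append(f"{key}: unknown setting")
    return problems


print(check_voice_settings({"stability": 0.5, "speed": 2.5, "tone": "friendly"}))
```

An empty list means every setting passed these local checks.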
Speech-to-Text Configuration
The `speech_to_text` object configures the transcription service. Use full language names (not codes) for the `language` field, for example `english`, `hindi`, `multi`, or `spanish`. Supported values:
english, hindi, multi, albanian, arabic, armenian, azerbaijani, belarusian, bengali, bosnian, bulgarian, catalan, chinese, croatian, czech, danish, dutch, english_australia, english_india, english_new_zealand, english_uk, english_us, english_spanish, estonian, finnish, french, galician, georgian, german, german_switzerland, greek, gujarati, haitian_creole, hausa, hebrew, afrikaans, hungarian, icelandic, indonesian, italian, japanese, javanese, kannada, kazakh, khmer, korean, latvian, lithuanian, macedonian, malay, malayalam, maori, marathi, nepali, norwegian, persian, polish, portuguese, portuguese_brazil, punjabi, romanian, russian, serbian, shona, slovak, slovenian, somali, spanish, spanish_latin_america, sundanese, swahili, swedish, tagalog, tamil, tajik, telugu, thai, tswana, turkish, ukrainian, urdu, vietnamese, welsh.
The object has the following properties:
| Property | Type | Required | Description |
|---|---|---|---|
| `provider` | string | Yes | STT provider (see providers below) |
| `model` | string | Yes | Model to use (see models below) |
| `language` | string | Yes | Language name (see languages above) |
STT Providers and Models
Deepgram (Provider: deepgram)
| Model | Value | Description | Use Case |
|---|---|---|---|
| Nova 2 | nova-2 | General purpose model | Default choice for most use cases |
| Nova 2 General | nova-2-general | General purpose transcription | Versatile transcription |
| Nova 2 Meeting | nova-2-meeting | Optimized for meetings | Conference calls, meetings |
| Nova 2 Phone Call | nova-2-phonecall | Optimized for phone calls | Phone conversations (Recommended) |
| Nova 2 Finance | nova-2-finance | Optimized for finance | Banking, financial services |
| Nova 2 Conversational AI | nova-2-conversationalai | Optimized for conversational AI | AI assistants, chatbots |
| Nova 2 Video | nova-2-video | Optimized for video | Video content transcription |
| Nova 2 Medical | nova-2-medical | Optimized for medical | Healthcare conversations |
| Nova 2 Drivethru | nova-2-drivethru | Optimized for drive-thru | Drive-thru scenarios |
| Nova 2 Automotive | nova-2-automotive | Optimized for automotive | Car environments |
| Nova 2 Legal | nova-2-legal | Optimized for legal | Legal conversations |
| Nova 2 Government | nova-2-government | Optimized for government | Government services |
| Nova 2 Enterprise | nova-2-enterprise | Optimized for enterprise | Enterprise applications |
| Nova 3 | nova-3 | Latest general purpose model | Most accurate, latest technology |
Gladia (Provider: gladia)
| Model | Value | Description |
|---|---|---|
| Gladia | gladia | High-accuracy multilingual transcription |
Sarvam (Provider: sarvam)
| Model | Value | Description |
|---|---|---|
| Sarvam | sarvam | Optimized for Indian languages |
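Because each STT model belongs to one provider, it can help to check the pairing before building the `speech_to_text` object. The sketch below copies the provider and model values from the tables above; `make_speech_to_text` is a hypothetical helper, and it does not check the `language` value.

```python
# Provider -> model values, copied from the STT tables above.
STT_MODELS = {
    "deepgram": {
        "nova-2", "nova-2-general", "nova-2-meeting", "nova-2-phonecall",
        "nova-2-finance", "nova-2-conversationalai", "nova-2-video",
        "nova-2-medical", "nova-2-drivethru", "nova-2-automotive",
        "nova-2-legal", "nova-2-government", "nova-2-enterprise", "nova-3",
    },
    "gladia": {"gladia"},
    "sarvam": {"sarvam"},
}


def make_speech_to_text(provider: str, model: str, language: str) -> dict:
    """Build a speech_to_text object after checking the provider/model pair."""
    if provider not in STT_MODELS:
        raise ValueError(f"unknown STT provider: {provider!r}")
    if model not in STT_MODELS[provider]:
        raise ValueError(f"model {model!r} is not listed for provider {provider!r}")
    return {"provider": provider, "model": model, "language": language}


print(make_speech_to_text("deepgram", "nova-2-phonecall", "english"))
```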
LLM Configuration
The `llm` object configures the language model:
| Property | Type | Required | Description |
|---|---|---|---|
| `llm` | string | Yes | LLM provider and model (see options below) |
| `model` | string | Yes | Model name (typically the same as `llm`) |
Available LLM Models
OpenAI Models
| Model | Value | Description | Use Case |
|---|---|---|---|
| GPT-4o | gpt-4o | Most capable model, multimodal | Complex reasoning, best quality (Recommended) |
| GPT-4o Mini | gpt-4o-mini | Smaller, faster, cost-effective | Fast responses, simpler tasks |
| GPT-4 Turbo | gpt-4-turbo | High performance GPT-4 | Advanced reasoning |
| GPT-4.1 | gpt-4.1 | Latest GPT-4 variant | Enhanced capabilities |
| GPT-4.1 Mini | gpt-4.1-mini | Compact GPT-4.1 | Efficient processing |
| GPT-4.1 Nano | gpt-4.1-nano | Ultra-fast GPT-4.1 | Ultra-low latency |
| GPT-3.5 Turbo | gpt-3.5-turbo | Fast and cost-effective | Simple conversations |
OpenAI Realtime Models
| Model | Value | Description |
|---|---|---|
| GPT-4o Realtime | gpt-4o-realtime-preview | Real-time audio processing |
| GPT-4o Mini Realtime | gpt-4o-mini-realtime-preview | Faster real-time processing |
Meta LLaMA Models
| Model | Value | Description | Use Case |
|---|---|---|---|
| LLaMA 3.1 405B | llama-3-1-405b | Largest, most capable | Complex tasks, high accuracy |
| LLaMA 3.1 70B | llama-3-1-70b | Balanced performance | Good quality, reasonable speed |
| LLaMA 3.1 8B | llama-3-1-8b | Fast and efficient | Quick responses |
| LLaMA 3 70B | llama-3-70b | Previous generation | Reliable performance |
Mistral Models
| Model | Value | Description |
|---|---|---|
| Mistral Large 2407 | mistral-large-2407 | High-performance European model |
Other Models
| Model | Value | Description |
|---|---|---|
| L3.1 70B Euryale v2.2 | l3.1-70b-euryale-v2.2 | Fine-tuned LLaMA variant |
| DeepSeek v3 | deepseek-v3 | Advanced reasoning model |
Configurations
The `configurations` object contains advanced call handling settings:
Confidence Threshold
| Property | Type | Range | Description | Default |
|---|---|---|---|---|
| `confidence_threshold` | number | 0.0 - 1.0 | Minimum confidence for speech recognition | 0.8 |
Do Not Call Detection
| Property | Type | Description | Default |
|---|---|---|---|
| `do_not_call_detection` | boolean | Detect and respect “do not call” indicators | false |
Agent Terminate Call
Configuration for when the agent can end calls autonomously:
| Property | Type | Description | Default |
|---|---|---|---|
| `enabled` | boolean | Allow agent to terminate calls | false |
| `instruction` | string | Instructions for when to end calls | None |
| `message` | string | Message to say before ending call | None |
Inactivity Handling
Configuration for handling user inactivity:
| Property | Type | Description | Default |
|---|---|---|---|
| `enabled` | boolean | Enable inactivity detection | false |
| `idle_time` | number | Seconds of silence before prompting (5-120) | 30 |
| `message` | string | Message to say after idle time | None |
Interruption Settings
Configuration for handling user interruptions:
| Property | Type | Description | Default |
|---|---|---|---|
| `enabled` | boolean | Allow users to interrupt the agent | true |
| `value` | number | Interruption sensitivity (1-5, higher = more sensitive) | 3 |
Sensitivity levels:
- 1: Very low (agent rarely gets interrupted)
- 2: Low
- 3: Medium (Recommended)
- 4: High
- 5: Very high (agent easily interrupted)
Voicemail Handling
Configuration for voicemail detection and handling:
| Property | Type | Description | Default |
|---|---|---|---|
| `enabled` | boolean | Enable voicemail detection | false |
| `message` | string | Message to leave if voicemail detected | None |
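The sketch below assembles a `configurations` object and checks the numeric ranges documented above. This page does not show the key names of the four nested groups, so `agent_terminate_call`, `inactivity_handling`, `interruption`, and `voicemail` are assumptions; confirm them against a working request before relying on them. `build_configurations` is a hypothetical helper.

```python
def build_configurations(
    confidence_threshold: float = 0.8,
    idle_time: int = 30,
    interruption_value: int = 3,
) -> dict:
    """Assemble a configurations object, checking the documented ranges."""
    if not 0.0 <= confidence_threshold <= 1.0:
        raise ValueError("confidence_threshold must be between 0.0 and 1.0")
    if not 5 <= idle_time <= 120:
        raise ValueError("idle_time must be between 5 and 120 seconds")
    if not 1 <= interruption_value <= 5:
        raise ValueError("interruption value must be between 1 and 5")
    return {
        "confidence_threshold": confidence_threshold,
        "do_not_call_detection": False,
        # ASSUMPTION: the four group key names below are not given on this page.
        "agent_terminate_call": {
            "enabled": True,
            "instruction": "End the call once the caller has no more questions.",
            "message": "Thanks for calling. Goodbye!",
        },
        "inactivity_handling": {
            "enabled": True,
            "idle_time": idle_time,
            "message": "Are you still there?",
        },
        "interruption": {"enabled": True, "value": interruption_value},
        "voicemail": {
            "enabled": True,
            "message": "Sorry we missed you. We will call back later.",
        },
    }


configurations = build_configurations(idle_time=20)
print(configurations["inactivity_handling"])
```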
Response
Success Response
Status Code: 200 OK
Error Responses
400 - Bad Request
- Missing required fields (`agent_name` or `prompt`)
- Invalid data types
- Invalid provider or model values
401 - Unauthorized
- Missing `authorization` header or `token` parameter
- Invalid or expired API key
- Insufficient permissions
422 - Validation Error
- Invalid enum values (provider, model names)
- Out of range values (stability, speed, confidence_threshold)
- Invalid format (timezone, language name)
500 - Internal Server Error
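A client can map the status codes above to short diagnostic messages. The sketch below uses only the Python standard library; `explain_status` and `send` are hypothetical helpers, and the response body is not parsed because its shape is not shown on this page.

```python
import urllib.error
import urllib.request

# Short hints that summarize the error lists above.
ERROR_HINTS = {
    400: "Bad request: check required fields, data types, and provider or model values.",
    401: "Unauthorized: check the token and its permissions.",
    422: "Validation error: check enum values, numeric ranges, and formats.",
    500: "Internal server error: the request failed on the server side.",
}


def explain_status(status: int) -> str:
    """Turn an HTTP status code into a short diagnostic message."""
    if status == 200:
        return "Agent created."
    return ERROR_HINTS.get(status, f"Unexpected status {status}.")


def send(request: urllib.request.Request) -> str:
    """Send a prepared request and describe the outcome."""
    try:
        with urllib.request.urlopen(request) as response:
            return explain_status(response.status)
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx and 5xx responses.
        return explain_status(err.code)


print(explain_status(422))
```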
Example Requests
Minimal Request
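The smallest valid body contains only the two required fields. The sketch below builds it as a Python dictionary and serializes it to JSON; the agent name and prompt text are illustrative values.

```python
import json

# Only agent_name and prompt are required; every other field is optional.
minimal_payload = {
    "agent_name": "Support Agent",
    "prompt": "You are a helpful support agent. Keep answers short and friendly.",
}

minimal_body = json.dumps(minimal_payload, indent=2)
print(minimal_body)
```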
Complete Request with All Features
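The sketch below combines the documented objects into one request body, assuming each object sits at the top level of the JSON. The voice ID and webhook URL are placeholders. The nested call handling groups (agent terminate call, inactivity, interruption, voicemail) are left out because their key names are not shown on this page.

```python
import json

complete_payload = {
    # Required fields
    "agent_name": "Appointment Assistant",
    "prompt": "You are a polite assistant that books appointments.",
    # Basic information
    "description": "Books and reschedules appointments by phone",
    "timezone": "America/New_York",
    "greeting": "Hello! How can I help you today?",
    "session_data_webhook": "https://example.com/webhooks/session",  # placeholder
    # Voice configuration
    "voice": {
        "provider": "elevenlabs",
        "voice_id": "YOUR_VOICE_ID",  # placeholder: use an ID from the List Voices API
        "model": "eleven_turbo_v2_5",
        "settings": {
            "stability": 0.5,
            "similarity_boost": 0.75,
            "speed": 1.0,
            "speaker_boost": True,
        },
    },
    # Speech-to-text configuration
    "speech_to_text": {
        "provider": "deepgram",
        "model": "nova-2-phonecall",
        "language": "english",
    },
    # LLM configuration: model is typically the same value as llm
    "llm": {"llm": "gpt-4o", "model": "gpt-4o"},
    # Advanced settings with documented property names only
    "configurations": {
        "confidence_threshold": 0.8,
        "do_not_call_detection": True,
    },
}

complete_body = json.dumps(complete_payload, indent=2)
print(complete_body)
```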
Important Notes
- Required Fields: Only `agent_name` and `prompt` are required. The `voice` object is optional; include `voice` (with `provider` and `voice_id`) when you want to configure TTS for the agent. All other fields are optional.
- Voice IDs: Get available voice IDs from the List Voices API.
- Webhooks: If you provide a `session_data_webhook`, ensure your endpoint can handle POST requests with session data.
- Timezones: Use standard timezone strings (e.g., `America/New_York`, `Europe/London`, `Asia/Tokyo`).
- Language Names: Use full language names (e.g., `english`, `hindi`, `spanish`) or region-specific variants (e.g., `english_us`, `english_uk`) as shown in the Speech-to-Text section above.
- Model Compatibility: Ensure the voice model is compatible with your chosen provider. For example, `eleven_turbo_v2_5` only works with ElevenLabs.
- Rate Limits: API calls are subject to rate limiting based on your plan. See pricing documentation for details.
- Testing: After creating an agent, test it thoroughly before using it in production. Use the Make Call API to test your agent.
- Phone Numbers: A phone number must be attached to an agent before it can place calls.
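For the webhook note above, a receiving endpoint could look like the following sketch built on Python's standard `http.server`. It assumes the session data arrives as a JSON object in the POST body and that replying with 200 is sufficient; the payload fields are not documented on this page, so the handler only logs the top-level keys. All names here are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_session_data(body: bytes) -> dict:
    """Decode a webhook body; assumes a JSON object (fields are undocumented)."""
    data = json.loads(body.decode("utf-8"))
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data


class SessionWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            session = parse_session_data(self.rfile.read(length))
        except ValueError:
            # Covers invalid JSON, invalid UTF-8, and non-object payloads.
            self.send_response(400)
            self.end_headers()
            return
        print("session data keys:", sorted(session))
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Run the receiver until interrupted."""
    HTTPServer(("", port), SessionWebhookHandler).serve_forever()
```

Call `serve()` to start the receiver locally, then expose it at the URL you pass as `session_data_webhook`.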
Related Endpoints
- List Voices - Get available voice IDs
- Update Voice Agent - Modify agent settings
- List Voice Agents - View all agents
- Delete Voice Agent - Remove an agent
- Make Call - Test your agent with a call

