API Endpoint
POST /create-agent
Content-Type: application/json
Authentication: Required (Token parameter)
Request Body
Required Fields
| Field | Type | Description |
|---|---|---|
| agent_name | string | Name of the voice agent |
| prompt | string | System prompt/instructions that define the agent’s behavior and personality |

The voice object is optional. Include it to configure the TTS provider, voice ID, and voice settings.
Optional Fields
Basic Information
| Field | Type | Description | Default |
|---|---|---|---|
| description | string | Description of the agent’s purpose | Empty string |
| timezone | string | Timezone for the agent (e.g., “America/New_York”, “Europe/London”) | UTC |
| greeting | string | The agent’s first message when the call starts | None |
| session_data_webhook | string | Webhook URL to receive end-of-session data | None |
Voice Configuration
The voice object is optional and, if provided, contains the following properties:
| Property | Type | Required | Description |
|---|---|---|---|
| provider | string | No | Voice provider: elevenlabs, openai, deepgram, sarvam |
| voice_id | string | Yes | Unique identifier for the voice |
| model | string | No | TTS model to use (see Voice Models below) |
| settings | object | No | Voice settings configuration (see Voice Settings below) |
Voice Providers
| Provider | Value | Description |
|---|---|---|
| ElevenLabs | elevenlabs | High-quality AI voice synthesis with natural-sounding voices and emotional range |
| OpenAI | openai | Advanced text-to-speech with multiple voice options |
| Deepgram | deepgram | Real-time speech recognition and voice synthesis |
| Sarvam | sarvam | Multilingual voice synthesis optimized for Indian languages |
Voice Models
ElevenLabs Models
| Model | Value | Description |
|---|---|---|
| Turbo v2.5 | eleven_turbo_v2_5 | Latest high-speed model with low latency (Recommended) |
| Multilingual v2 | eleven_multilingual_v2 | High-quality multilingual voice synthesis |
| Monolingual v1 | eleven_monolingual_v1 | English-only optimized model |
OpenAI Models
| Model | Value | Description |
|---|---|---|
| TTS 1 | tts-1 | Standard quality, faster generation |
| TTS 1 HD | tts-1-hd | High definition, better quality |
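The model tables above can be encoded as a lookup so that a mismatched provider/model pair is caught before the request is sent. This is a minimal sketch: `TTS_MODELS` and `is_compatible` are hypothetical names, and only ElevenLabs and OpenAI are covered because they are the only providers with models listed here.

```python
# Model values copied from the ElevenLabs and OpenAI tables above.
# Deepgram and Sarvam are omitted because no TTS models are listed for them.
TTS_MODELS = {
    "elevenlabs": {
        "eleven_turbo_v2_5",
        "eleven_multilingual_v2",
        "eleven_monolingual_v1",
    },
    "openai": {"tts-1", "tts-1-hd"},
}


def is_compatible(provider: str, model: str) -> bool:
    """Return True if `model` is listed for `provider` in the tables above."""
    return model in TTS_MODELS.get(provider, set())
```

For example, `is_compatible("openai", "eleven_turbo_v2_5")` returns `False`, because that model value appears only in the ElevenLabs table.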
Voice Settings
The settings object contains fine-tuning parameters for voice output:
| Property | Type | Range | Description | Default |
|---|---|---|---|---|
| stability | number | 0.0 - 1.0 | Controls voice consistency. Higher = more stable, lower = more expressive | 0.5 |
| voice_style | number | 0 - 100 | Style intensity for the voice | 0 |
| speed | number | 0.5 - 2.0 | Speech speed multiplier | 1.0 |
| speaker_boost | boolean | true/false | Enhances speaker characteristics | true |
| similarity_boost | number | 0.0 - 1.0 | How closely to match the original voice | 0.75 |
| tone | string | - | Voice tone: professional, friendly, neutral, enthusiastic | None |
| style | string | - | Speaking style: classic, conversational, narrative | classic |
| instruction_sensitivity | string | - | How strictly to follow instructions: low, medium, high | medium |
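As an illustration of how the voice object and its settings fit together, the sketch below builds a voice object and checks the numeric settings against the ranges in the table above. The helper name `check_voice_settings` and the `YOUR_VOICE_ID` placeholder are assumptions for illustration. The API performs its own validation, so this is only a client-side convenience.

```python
# (low, high) bounds copied from the Range column of the settings table.
NUMERIC_RANGES = {
    "stability": (0.0, 1.0),
    "voice_style": (0, 100),
    "speed": (0.5, 2.0),
    "similarity_boost": (0.0, 1.0),
}


def check_voice_settings(settings: dict) -> list[str]:
    """Return a list of range problems; an empty list means all checks pass."""
    problems = []
    for name, (low, high) in NUMERIC_RANGES.items():
        if name in settings and not (low <= settings[name] <= high):
            problems.append(f"{name}={settings[name]} is outside {low} - {high}")
    return problems


voice = {
    "provider": "elevenlabs",
    "voice_id": "YOUR_VOICE_ID",  # placeholder; real IDs come from the List Voices API
    "model": "eleven_turbo_v2_5",
    "settings": {
        "stability": 0.5,
        "speed": 1.1,
        "similarity_boost": 0.75,
        "speaker_boost": True,
    },
}

problems = check_voice_settings(voice["settings"])
```

Here `problems` is an empty list because every numeric value falls inside its documented range.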
Speech-to-Text Configuration
The speech_to_text object configures the transcription service. Use full language names (not codes) for the language field, for example english, hindi, multi, or spanish. Supported values include:
english, hindi, multi, albanian, arabic, armenian, azerbaijani, belarusian, bengali, bosnian, bulgarian, catalan, chinese, croatian, czech, danish, dutch, english_australia, english_india, english_new_zealand, english_uk, english_us, english_spanish, estonian, finnish, french, galician, georgian, german, german_switzerland, greek, gujarati, haitian_creole, hausa, hebrew, afrikaans, hungarian, icelandic, indonesian, italian, japanese, javanese, kannada, kazakh, khmer, korean, latvian, lithuanian, macedonian, malay, malayalam, maori, marathi, nepali, norwegian, persian, polish, portuguese, portuguese_brazil, punjabi, romanian, russian, serbian, shona, slovak, slovenian, somali, spanish, spanish_latin_america, sundanese, swahili, swedish, tagalog, tamil, tajik, telugu, thai, tswana, turkish, ukrainian, urdu, vietnamese, welsh.
The object has the following properties:
| Property | Type | Required | Description |
|---|---|---|---|
| provider | string | Yes | STT provider (see providers below) |
| model | string | Yes | Model to use (see models below) |
| language | string | Yes | Language name (see languages above) |
STT Providers and Models
Deepgram (Provider: deepgram)
| Model | Value | Description | Use Case |
|---|---|---|---|
| Nova 2 | nova-2 | General purpose model | Default choice for most use cases |
| Nova 2 General | nova-2-general | General purpose transcription | Versatile transcription |
| Nova 2 Meeting | nova-2-meeting | Optimized for meetings | Conference calls, meetings |
| Nova 2 Phone Call | nova-2-phonecall | Optimized for phone calls | Phone conversations (Recommended) |
| Nova 2 Finance | nova-2-finance | Optimized for finance | Banking, financial services |
| Nova 2 Conversational AI | nova-2-conversationalai | Optimized for conversational AI | AI assistants, chatbots |
| Nova 2 Video | nova-2-video | Optimized for video | Video content transcription |
| Nova 2 Medical | nova-2-medical | Optimized for medical | Healthcare conversations |
| Nova 2 Drivethru | nova-2-drivethru | Optimized for drive-thru | Drive-thru scenarios |
| Nova 2 Automotive | nova-2-automotive | Optimized for automotive | Car environments |
| Nova 2 Legal | nova-2-legal | Optimized for legal | Legal conversations |
| Nova 2 Government | nova-2-government | Optimized for government | Government services |
| Nova 2 Enterprise | nova-2-enterprise | Optimized for enterprise | Enterprise applications |
| Nova 3 | nova-3 | Latest general purpose model | Most accurate, latest technology |
Gladia (Provider: gladia)
| Model | Value | Description |
|---|---|---|
| Gladia | gladia | High-accuracy multilingual transcription |
Sarvam (Provider: sarvam)
| Model | Value | Description |
|---|---|---|
| Sarvam | sarvam | Optimized for Indian languages |
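The provider and model tables above can be combined into a small builder for the speech_to_text object. This is a sketch under assumptions: `STT_MODELS` and `build_speech_to_text` are hypothetical names, and the check only mirrors the tables on this page rather than anything the server guarantees.

```python
# Provider -> model values, copied from the STT tables above.
STT_MODELS = {
    "deepgram": {
        "nova-2", "nova-2-general", "nova-2-meeting", "nova-2-phonecall",
        "nova-2-finance", "nova-2-conversationalai", "nova-2-video",
        "nova-2-medical", "nova-2-drivethru", "nova-2-automotive",
        "nova-2-legal", "nova-2-government", "nova-2-enterprise", "nova-3",
    },
    "gladia": {"gladia"},
    "sarvam": {"sarvam"},
}


def build_speech_to_text(provider: str, model: str, language: str) -> dict:
    """Build the speech_to_text object, rejecting unlisted provider/model pairs."""
    if model not in STT_MODELS.get(provider, set()):
        raise ValueError(f"model {model!r} is not listed for provider {provider!r}")
    # All three properties are marked as required in the table above.
    return {"provider": provider, "model": model, "language": language}


stt = build_speech_to_text("deepgram", "nova-2-phonecall", "english_us")
```

The language value is passed through unchecked here. A stricter version could also compare it against the supported language names.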
LLM Configuration
The llm object configures the language model:
| Property | Type | Required | Description |
|---|---|---|---|
| llm | string | Yes | LLM provider and model (see options below) |
| model | string | Yes | Model name (typically same as llm) |
Available LLM Models
OpenAI Models
| Model | Value | Description | Use Case |
|---|---|---|---|
| GPT-4o | gpt-4o | Most capable model, multimodal | Complex reasoning, best quality (Recommended) |
| GPT-4o Mini | gpt-4o-mini | Smaller, faster, cost-effective | Fast responses, simpler tasks |
| GPT-4 Turbo | gpt-4-turbo | High performance GPT-4 | Advanced reasoning |
| GPT-4.1 | gpt-4.1 | Latest GPT-4 variant | Enhanced capabilities |
| GPT-4.1 Mini | gpt-4.1-mini | Compact GPT-4.1 | Efficient processing |
| GPT-4.1 Nano | gpt-4.1-nano | Ultra-fast GPT-4.1 | Ultra-low latency |
| GPT-3.5 Turbo | gpt-3.5-turbo | Fast and cost-effective | Simple conversations |
OpenAI Realtime Models
| Model | Value | Description |
|---|---|---|
| GPT-4o Realtime | gpt-4o-realtime-preview | Real-time audio processing |
| GPT-4o Mini Realtime | gpt-4o-mini-realtime-preview | Faster real-time processing |
Meta LLaMA Models
| Model | Value | Description | Use Case |
|---|---|---|---|
| LLaMA 3.1 405B | llama-3-1-405b | Largest, most capable | Complex tasks, high accuracy |
| LLaMA 3.1 70B | llama-3-1-70b | Balanced performance | Good quality, reasonable speed |
| LLaMA 3.1 8B | llama-3-1-8b | Fast and efficient | Quick responses |
| LLaMA 3 70B | llama-3-70b | Previous generation | Reliable performance |
Mistral Models
| Model | Value | Description |
|---|---|---|
| Mistral Large 2407 | mistral-large-2407 | High-performance European model |
Other Models
| Model | Value | Description |
|---|---|---|
| L3.1 70B Euryale v2.2 | l3.1-70b-euryale-v2.2 | Fine-tuned LLaMA variant |
| DeepSeek v3 | deepseek-v3 | Advanced reasoning model |
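Because the llm object carries both an llm and a model property that typically hold the same value, a tiny helper can keep them in sync. The helper name `build_llm` is hypothetical, and the sketch assumes the common case where both fields are identical.

```python
def build_llm(model_value: str) -> dict:
    """Build the llm object, using the same Value string for both properties."""
    return {"llm": model_value, "model": model_value}


llm = build_llm("gpt-4o")
```

Any string from the Value columns above can be passed in, for example `build_llm("llama-3-1-70b")`.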
Configurations
The configurations object contains advanced call handling settings:
Confidence Threshold
| Property | Type | Range | Description | Default |
|---|---|---|---|---|
| confidence_threshold | number | 0.0 - 1.0 | Minimum confidence for speech recognition | 0.8 |
Do Not Call Detection
| Property | Type | Description | Default |
|---|---|---|---|
| do_not_call_detection | boolean | Detect and respect “do not call” indicators | false |
Agent Terminate Call
Configuration for when the agent can end calls autonomously:

| Property | Type | Description | Default |
|---|---|---|---|
| enabled | boolean | Allow agent to terminate calls | false |
| instruction | string | Instructions for when to end calls | None |
| message | string | Message to say before ending call | None |
Inactivity Handling
Configuration for handling user inactivity:

| Property | Type | Description | Default |
|---|---|---|---|
| enabled | boolean | Enable inactivity detection | false |
| idle_time | number | Seconds of silence before prompting (5-120) | 30 |
| message | string | Message to say after idle time | None |
Interruption Settings
Configuration for handling user interruptions:

| Property | Type | Description | Default |
|---|---|---|---|
| enabled | boolean | Allow users to interrupt the agent | true |
| value | number | Interruption sensitivity (1-5, higher = more sensitive) | 3 |

Interruption sensitivity levels:
- 1: Very low (agent rarely gets interrupted)
- 2: Low
- 3: Medium (Recommended)
- 4: High
- 5: Very high (agent easily interrupted)
Voicemail Handling
Configuration for voicemail detection and handling:

| Property | Type | Description | Default |
|---|---|---|---|
| enabled | boolean | Enable voicemail detection | false |
| message | string | Message to leave if voicemail detected | None |
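The sketch below shows one way the configurations object might be assembled and range-checked on the client. Note the assumption: this page names the properties inside each group but not the keys of the groups themselves, so `agent_terminate_call`, `inactivity_handling`, `interruption_settings`, and `voicemail_handling` are guesses derived from the section headings and should be verified against a real request. `check_configurations` is likewise a hypothetical helper.

```python
def check_configurations(config: dict) -> list[str]:
    """Check the numeric ranges documented above; return a list of problems."""
    problems = []
    threshold = config.get("confidence_threshold", 0.8)
    if not 0.0 <= threshold <= 1.0:
        problems.append("confidence_threshold must be between 0.0 and 1.0")
    # Group key names below are assumptions, not documented identifiers.
    idle_time = config.get("inactivity_handling", {}).get("idle_time", 30)
    if not 5 <= idle_time <= 120:
        problems.append("idle_time must be between 5 and 120 seconds")
    sensitivity = config.get("interruption_settings", {}).get("value", 3)
    if not 1 <= sensitivity <= 5:
        problems.append("interruption value must be between 1 and 5")
    return problems


configurations = {
    "confidence_threshold": 0.8,
    "do_not_call_detection": True,
    # Hypothetical group keys, derived from the section headings above.
    "agent_terminate_call": {
        "enabled": True,
        "instruction": "End the call once the caller says goodbye.",
        "message": "Thanks for calling. Goodbye!",
    },
    "inactivity_handling": {"enabled": True, "idle_time": 30, "message": "Are you still there?"},
    "interruption_settings": {"enabled": True, "value": 3},
    "voicemail_handling": {"enabled": True, "message": "Sorry we missed you. We will call back."},
}

config_problems = check_configurations(configurations)
```

With the values shown, `config_problems` is empty. Setting `idle_time` to 2, for instance, would add one entry to the list.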
Response
Success Response
Status Code: 200 OK
Error Responses
400 - Bad Request
- Missing required fields (agent_name, prompt, or voice)
- Invalid data types
- Invalid provider or model values
401 - Unauthorized
- Missing authorization header or token parameter
- Invalid or expired API key
- Insufficient permissions
422 - Validation Error
- Invalid enum values (provider, model names)
- Out of range values (stability, speed, confidence_threshold)
- Invalid format (timezone, language codes)
500 - Internal Server Error
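A client can map these status codes to a coarse next step. The status codes below come from the list above. The suggested reactions, such as retrying after a 500, are assumptions rather than documented behavior, and `next_action` is a hypothetical helper name.

```python
# Status codes are taken from the list above; the actions are illustrative.
ACTIONS = {
    200: "ok",
    400: "fix_request",      # missing fields, wrong types, bad provider/model
    401: "fix_credentials",  # missing or invalid token, insufficient permissions
    422: "fix_request",      # bad enum value, out-of-range number, bad format
    500: "retry_later",      # server-side failure
}


def next_action(status_code: int) -> str:
    """Map an HTTP status code to a coarse client-side action."""
    return ACTIONS.get(status_code, "unexpected_status")
```

Grouping 400 and 422 together reflects that both require changing the request body before sending it again.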
Example Requests
Minimal Request
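A minimal request contains only the two required fields, agent_name and prompt. The sketch below builds such a request with Python's standard library without sending it. Several details are assumptions: the host `https://api.example.com` is a placeholder, `YOUR_API_TOKEN` stands in for a real token, and the token is passed as a `token` query parameter (an authorization header may be the right choice for your account).

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.example.com"  # hypothetical host; replace with your API host
API_TOKEN = "YOUR_API_TOKEN"          # placeholder token


def build_create_agent_request(payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /create-agent request."""
    # Assumption: the token travels as a `token` query parameter.
    query = urllib.parse.urlencode({"token": API_TOKEN})
    return urllib.request.Request(
        f"{BASE_URL}/create-agent?{query}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


minimal_payload = {
    "agent_name": "Support Agent",
    "prompt": "You are a friendly support agent. Keep answers short.",
}
request = build_create_agent_request(minimal_payload)
```

Passing `request` to `urllib.request.urlopen` would send it. That call is left out here so the example has no network side effects.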
Complete Request with All Features
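The payload below is an illustrative sketch of a request body that uses every documented section. Values such as the voice ID and webhook URL are placeholders. The keys for the groups inside configurations (`agent_terminate_call`, `inactivity_handling`, `interruption_settings`, `voicemail_handling`) are assumptions derived from the section headings, since this page does not state them.

```python
import json

complete_payload = {
    "agent_name": "Appointment Assistant",
    "prompt": "You schedule appointments for a dental clinic. Be concise and polite.",
    "description": "Books and reschedules appointments",
    "timezone": "America/New_York",
    "greeting": "Hello! How can I help you today?",
    "session_data_webhook": "https://example.com/webhooks/session",  # placeholder URL
    "voice": {
        "provider": "elevenlabs",
        "voice_id": "YOUR_VOICE_ID",  # placeholder; see the List Voices API
        "model": "eleven_turbo_v2_5",
        "settings": {"stability": 0.5, "speed": 1.0, "similarity_boost": 0.75},
    },
    "speech_to_text": {
        "provider": "deepgram",
        "model": "nova-2-phonecall",
        "language": "english_us",
    },
    "llm": {"llm": "gpt-4o", "model": "gpt-4o"},
    "configurations": {
        "confidence_threshold": 0.8,
        "do_not_call_detection": True,
        # Hypothetical group keys; verify the exact names before relying on them.
        "agent_terminate_call": {
            "enabled": True,
            "instruction": "End the call once the appointment is confirmed.",
            "message": "Your appointment is booked. Goodbye!",
        },
        "inactivity_handling": {"enabled": True, "idle_time": 30, "message": "Are you still there?"},
        "interruption_settings": {"enabled": True, "value": 3},
        "voicemail_handling": {"enabled": True, "message": "Please call us back to book."},
    },
}

# Serialize to the JSON body expected with Content-Type: application/json.
body = json.dumps(complete_payload, indent=2)
```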
Important Notes
- Required Fields: Only agent_name and prompt are required. The voice object is optional; include voice (with provider and voice_id) when you want to configure TTS for the agent. All other fields are optional.
- Voice IDs: Get available voice IDs from the List Voices API.
- Webhooks: If you provide a session_data_webhook, ensure your endpoint can handle POST requests with session data.
- Timezones: Use standard timezone strings (e.g., “America/New_York”, “Europe/London”, “Asia/Tokyo”).
- Language Names: Use full language names (e.g., english, hindi, spanish) or region-specific variants (e.g., english_us, english_uk) as shown in the Speech-to-Text section above.
- Model Compatibility: Ensure the voice model is compatible with your chosen provider. For example, eleven_turbo_v2_5 only works with ElevenLabs.
- Rate Limits: API calls are subject to rate limiting based on your plan. See pricing documentation for details.
- Testing: After creating an agent, test it thoroughly before using it in production. Use the Make Call API to test your agent.
- Phone Numbers: A phone number must be attached to an agent before it can place calls.
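For the webhook note above, the following is a minimal sketch of an endpoint that accepts POST requests, built on Python's standard `http.server`. The schema of the session data is not described on this page, so the handler only assumes the body is a JSON object and logs its top-level keys. The names `parse_session_body`, `SessionWebhookHandler`, and `serve`, and the port 8000, are illustrative choices.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_session_body(raw: bytes) -> dict:
    """Decode a webhook body as a JSON object; raise ValueError otherwise."""
    data = json.loads(raw.decode("utf-8"))
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    return data


class SessionWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            session = parse_session_body(self.rfile.read(length))
        except ValueError:  # also covers bad UTF-8 and malformed JSON
            self.send_response(400)
            self.end_headers()
            return
        print("received session data with keys:", sorted(session))
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Run the webhook receiver until interrupted."""
    HTTPServer(("0.0.0.0", port), SessionWebhookHandler).serve_forever()
```

Calling `serve()` starts the receiver. A production endpoint would also need HTTPS and some way to verify that requests come from the expected sender.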
Related Endpoints
- List Voices - Get available voice IDs
- Update Voice Agent - Modify agent settings
- List Voice Agents - View all agents
- Delete Voice Agent - Remove an agent
- Make Call - Test your agent with a call

