The Create Voice Agent API allows you to create and configure AI voice agents with comprehensive settings including voice configuration, speech-to-text, LLM selection, and advanced call handling features.

API Endpoint

POST /create-agent
Content-Type: application/json
Authentication: Required (Token parameter)

Request Body

{
  "agent_name": "Customer Support Agent",
  "description": "Handles customer inquiries and support requests",
  "prompt": "You are a helpful customer support agent. Answer questions politely and professionally.",
  "timezone": "America/New_York",
  "greeting": "Hello! Thank you for calling. How can I assist you today?",
  "session_data_webhook": "https://www.tryunleashx.com/webhooks/session-data",
  "voice": {
    "provider": "elevenlabs",
    "voice_id": "RXe6OFmxoC0nlSWpuCDy",
    "model": "eleven_turbo_v2_5",
    "settings": {
      "stability": 0.5,
      "voice_style": 1,
      "speed": 1.0,
      "speaker_boost": true,
      "similarity_boost": 0.75,
      "tone": "professional",
      "style": "classic",
      "instruction_sensitivity": "medium"
    }
  },
  "speech_to_text": {
    "provider": "deepgram",
    "model": "nova-2",
    "language": "english"
  },
  "llm": {
    "llm": "gpt-4o",
    "model": "gpt-4o"
  },
  "configurations": {
    "confidence_threshold": 0.8,
    "do_not_call_detection": true,
    "agent_terminate_call": {
      "enabled": true,
      "instruction": "End the call politely when the conversation is complete",
      "message": "Thank you for calling. Have a great day!"
    },
    "inactivity_handling": {
      "enabled": true,
      "idle_time": 30,
      "message": "Are you still there? Let me know if you need any help."
    },
    "interruption": {
      "enabled": true,
      "value": 3
    },
    "voicemail": {
      "enabled": true,
      "message": "Hello, this is a message from Customer Support. Please call us back at your convenience."
    }
  }
}
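
As a rough illustration of sending this body from code, the sketch below builds the POST request with Python's standard library. The host `https://api.yourdomain.com` and the `token` header are taken from the curl examples in this document; the function names `build_create_agent_request` and `create_agent` are hypothetical helpers, not part of the API.

```python
import json
import urllib.request

# Placeholder host from the curl examples in this document; replace with your real base URL.
BASE_URL = "https://api.yourdomain.com"


def build_create_agent_request(token: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /create-agent request."""
    # agent_name and prompt are the only required fields per this document.
    missing = [key for key in ("agent_name", "prompt") if not payload.get(key)]
    if missing:
        raise ValueError(f"missing required field(s): {', '.join(missing)}")
    return urllib.request.Request(
        url=f"{BASE_URL}/create-agent",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "token": token},
        method="POST",
    )


def create_agent(token: str, payload: dict, timeout: float = 30.0) -> dict:
    """Send the request and return the decoded JSON response body."""
    request = build_create_agent_request(token, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test without touching the network.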

Required Fields

| Field | Type | Description |
| --- | --- | --- |
| agent_name | string | Name of the voice agent (required) |
| prompt | string | System prompt/instructions that define the agent’s behavior and personality (required) |

The voice object is optional; include it to configure the TTS provider, voice ID, and voice settings.

Optional Fields

Basic Information

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| description | string | Description of the agent’s purpose | Empty string |
| timezone | string | Timezone for the agent (e.g., “America/New_York”, “Europe/London”) | UTC |
| greeting | string | The agent’s first message when the call starts | None |
| session_data_webhook | string | Webhook URL to receive end-of-session data | None |

Voice Configuration

The voice object is optional and, if provided, contains the following properties:

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| provider | string | No | Voice provider: elevenlabs, openai, deepgram, sarvam |
| voice_id | string | Yes | Unique identifier for the voice |
| model | string | No | TTS model to use (see Voice Models below) |
| settings | object | No | Voice settings configuration (see Voice Settings below) |

Voice Providers

| Provider | Value | Description |
| --- | --- | --- |
| ElevenLabs | elevenlabs | High-quality AI voice synthesis with natural-sounding voices and emotional range |
| OpenAI | openai | Advanced text-to-speech with multiple voice options |
| Deepgram | deepgram | Real-time speech recognition and voice synthesis |
| Sarvam | sarvam | Multilingual voice synthesis optimized for Indian languages |

Voice Models

ElevenLabs Models

| Model | Value | Description |
| --- | --- | --- |
| Turbo v2.5 | eleven_turbo_v2_5 | Latest high-speed model with low latency (Recommended) |
| Multilingual v2 | eleven_multilingual_v2 | High-quality multilingual voice synthesis |
| Monolingual v1 | eleven_monolingual_v1 | English-only optimized model |

OpenAI Models

| Model | Value | Description |
| --- | --- | --- |
| TTS 1 | tts-1 | Standard quality, faster generation |
| TTS 1 HD | tts-1-hd | High definition, better quality |

Voice Settings

The settings object contains fine-tuning parameters for voice output:

| Property | Type | Range | Description | Default |
| --- | --- | --- | --- | --- |
| stability | number | 0.0 - 1.0 | Controls voice consistency. Higher = more stable, lower = more expressive | 0.5 |
| voice_style | number | 0 - 100 | Style intensity for the voice | 0 |
| speed | number | 0.5 - 2.0 | Speech speed multiplier | 1.0 |
| speaker_boost | boolean | true/false | Enhances speaker characteristics | true |
| similarity_boost | number | 0.0 - 1.0 | How closely to match original voice | 0.75 |
| tone | string | - | Voice tone: professional, friendly, neutral, enthusiastic | None |
| style | string | - | Speaking style: classic, conversational, narrative | classic |
| instruction_sensitivity | string | - | How strictly to follow instructions: low, medium, high | medium |
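
The ranges and allowed values in the Voice Settings table can be checked on the client before a request is sent. The following is a minimal sketch of such a check; `validate_voice_settings` is a hypothetical helper, and the server's own validation may differ.

```python
# Ranges and allowed values copied from the Voice Settings table in this document.
NUMERIC_RANGES = {
    "stability": (0.0, 1.0),
    "voice_style": (0, 100),
    "speed": (0.5, 2.0),
    "similarity_boost": (0.0, 1.0),
}
ALLOWED_VALUES = {
    "tone": {"professional", "friendly", "neutral", "enthusiastic"},
    "style": {"classic", "conversational", "narrative"},
    "instruction_sensitivity": {"low", "medium", "high"},
}


def validate_voice_settings(settings: dict) -> list[str]:
    """Return a list of problems; an empty list means the settings look valid."""
    problems = []
    for key, value in settings.items():
        if key in NUMERIC_RANGES:
            low, high = NUMERIC_RANGES[key]
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append(f"{key}: expected a number")
            elif not low <= value <= high:
                problems.append(f"{key}: {value} is outside {low} - {high}")
        elif key in ALLOWED_VALUES:
            if not isinstance(value, str) or value not in ALLOWED_VALUES[key]:
                problems.append(f"{key}: {value!r} is not an allowed value")
        elif key == "speaker_boost":
            if not isinstance(value, bool):
                problems.append("speaker_boost: expected true or false")
        else:
            problems.append(f"{key}: unknown setting")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a single pass.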

Speech-to-Text Configuration

The speech_to_text object configures the transcription service. For the language field, use full language names rather than codes (for example english, hindi, multi, or spanish). Supported values include:
  • english, hindi, multi, albanian, arabic, armenian, azerbaijani, belarusian, bengali, bosnian, bulgarian, catalan, chinese, croatian, czech, danish, dutch, english_australia, english_india, english_new_zealand, english_uk, english_us, english_spanish, estonian, finnish, french, galician, georgian, german, german_switzerland, greek, gujarati, haitian_creole, hausa, hebrew, afrikaans, hungarian, icelandic, indonesian, italian, japanese, javanese, kannada, kazakh, khmer, korean, latvian, lithuanian, macedonian, malay, malayalam, maori, marathi, nepali, norwegian, persian, polish, portuguese, portuguese_brazil, punjabi, romanian, russian, serbian, shona, slovak, slovenian, somali, spanish, spanish_latin_america, sundanese, swahili, swedish, tagalog, tamil, tajik, telugu, thai, tswana, turkish, ukrainian, urdu, vietnamese, welsh.
The speech_to_text object has the following properties:

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| provider | string | Yes | STT provider (see providers below) |
| model | string | Yes | Model to use (see models below) |
| language | string | Yes | Language name (see languages above) |

STT Providers and Models

Deepgram (Provider: deepgram)

| Model | Value | Description | Use Case |
| --- | --- | --- | --- |
| Nova 2 | nova-2 | General purpose model | Default choice for most use cases |
| Nova 2 General | nova-2-general | General purpose transcription | Versatile transcription |
| Nova 2 Meeting | nova-2-meeting | Optimized for meetings | Conference calls, meetings |
| Nova 2 Phone Call | nova-2-phonecall | Optimized for phone calls | Phone conversations (Recommended) |
| Nova 2 Finance | nova-2-finance | Optimized for finance | Banking, financial services |
| Nova 2 Conversational AI | nova-2-conversationalai | Optimized for conversational AI | AI assistants, chatbots |
| Nova 2 Video | nova-2-video | Optimized for video | Video content transcription |
| Nova 2 Medical | nova-2-medical | Optimized for medical | Healthcare conversations |
| Nova 2 Drivethru | nova-2-drivethru | Optimized for drive-thru | Drive-thru scenarios |
| Nova 2 Automotive | nova-2-automotive | Optimized for automotive | Car environments |
| Nova 2 Legal | nova-2-legal | Optimized for legal | Legal conversations |
| Nova 2 Government | nova-2-government | Optimized for government | Government services |
| Nova 2 Enterprise | nova-2-enterprise | Optimized for enterprise | Enterprise applications |
| Nova 3 | nova-3 | Latest general purpose model | Most accurate, latest technology |

Gladia (Provider: gladia)

| Model | Value | Description |
| --- | --- | --- |
| Gladia | gladia | High-accuracy multilingual transcription |

Sarvam (Provider: sarvam)

| Model | Value | Description |
| --- | --- | --- |
| Sarvam | sarvam | Optimized for Indian languages |
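
Because each STT model belongs to one provider, a mismatched pair is an easy mistake to make. The sketch below checks a speech_to_text object against the provider and model values listed in this section. `check_speech_to_text` is a hypothetical helper, and the language check is only a heuristic that rejects code-like values such as `en` or `en-US`; it does not compare against the full supported list.

```python
import re

# Provider -> model values, copied from the STT tables in this document.
STT_MODELS = {
    "deepgram": {
        "nova-2", "nova-2-general", "nova-2-meeting", "nova-2-phonecall",
        "nova-2-finance", "nova-2-conversationalai", "nova-2-video",
        "nova-2-medical", "nova-2-drivethru", "nova-2-automotive",
        "nova-2-legal", "nova-2-government", "nova-2-enterprise", "nova-3",
    },
    "gladia": {"gladia"},
    "sarvam": {"sarvam"},
}

# Heuristic: full names such as "english" or "english_us", not codes like "en".
LANGUAGE_NAME = re.compile(r"[a-z]{4,}(_[a-z]+)*")


def check_speech_to_text(config: dict) -> list[str]:
    """Return a list of problems; an empty list means the object looks valid."""
    problems = []
    for field in ("provider", "model", "language"):
        if not isinstance(config.get(field), str):
            problems.append(f"{field}: required string is missing")
    if problems:
        return problems
    provider, model = config["provider"], config["model"]
    if provider not in STT_MODELS:
        problems.append(f"provider: unknown STT provider {provider!r}")
    elif model not in STT_MODELS[provider]:
        problems.append(f"model: {model!r} is not a {provider} model")
    if not LANGUAGE_NAME.fullmatch(config["language"]):
        problems.append("language: use a full language name, not a code")
    return problems
```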

LLM Configuration

The llm object configures the language model:

| Property | Type | Required | Description |
| --- | --- | --- | --- |
| llm | string | Yes | LLM provider and model (see options below) |
| model | string | Yes | Model name (typically same as llm) |

Available LLM Models

OpenAI Models

| Model | Value | Description | Use Case |
| --- | --- | --- | --- |
| GPT-4o | gpt-4o | Most capable model, multimodal | Complex reasoning, best quality (Recommended) |
| GPT-4o Mini | gpt-4o-mini | Smaller, faster, cost-effective | Fast responses, simpler tasks |
| GPT-4 Turbo | gpt-4-turbo | High performance GPT-4 | Advanced reasoning |
| GPT-4.1 | gpt-4.1 | Latest GPT-4 variant | Enhanced capabilities |
| GPT-4.1 Mini | gpt-4.1-mini | Compact GPT-4.1 | Efficient processing |
| GPT-4.1 Nano | gpt-4.1-nano | Ultra-fast GPT-4.1 | Ultra-low latency |
| GPT-3.5 Turbo | gpt-3.5-turbo | Fast and cost-effective | Simple conversations |

OpenAI Realtime Models

| Model | Value | Description |
| --- | --- | --- |
| GPT-4o Realtime | gpt-4o-realtime-preview | Real-time audio processing |
| GPT-4o Mini Realtime | gpt-4o-mini-realtime-preview | Faster real-time processing |

Meta LLaMA Models

| Model | Value | Description | Use Case |
| --- | --- | --- | --- |
| LLaMA 3.1 405B | llama-3-1-405b | Largest, most capable | Complex tasks, high accuracy |
| LLaMA 3.1 70B | llama-3-1-70b | Balanced performance | Good quality, reasonable speed |
| LLaMA 3.1 8B | llama-3-1-8b | Fast and efficient | Quick responses |
| LLaMA 3 70B | llama-3-70b | Previous generation | Reliable performance |

Mistral Models

| Model | Value | Description |
| --- | --- | --- |
| Mistral Large 2407 | mistral-large-2407 | High-performance European model |

Other Models

| Model | Value | Description |
| --- | --- | --- |
| L3.1 70B Euryale v2.2 | l3.1-70b-euryale-v2.2 | Fine-tuned LLaMA variant |
| DeepSeek v3 | deepseek-v3 | Advanced reasoning model |
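
Since the llm object takes the same value in both fields in the typical case, a small helper can build it from one model name. This is a sketch only: `build_llm_config` and the family labels used as dictionary keys are hypothetical, while the model values are the ones listed in this section.

```python
# Model values copied from the LLM tables in this document; keys are informal labels.
LLM_MODELS = {
    "openai": [
        "gpt-4o", "gpt-4o-mini", "gpt-4-turbo", "gpt-4.1",
        "gpt-4.1-mini", "gpt-4.1-nano", "gpt-3.5-turbo",
    ],
    "openai_realtime": ["gpt-4o-realtime-preview", "gpt-4o-mini-realtime-preview"],
    "meta_llama": ["llama-3-1-405b", "llama-3-1-70b", "llama-3-1-8b", "llama-3-70b"],
    "mistral": ["mistral-large-2407"],
    "other": ["l3.1-70b-euryale-v2.2", "deepseek-v3"],
}


def build_llm_config(model: str) -> dict:
    """Return an llm object for a documented model value, or raise ValueError."""
    known = {value for values in LLM_MODELS.values() for value in values}
    if model not in known:
        raise ValueError(f"unknown LLM model: {model!r}")
    # The properties table notes that "model" is typically the same as "llm".
    return {"llm": model, "model": model}
```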

Configurations

The configurations object contains advanced call handling settings:

Confidence Threshold

| Property | Type | Range | Description | Default |
| --- | --- | --- | --- | --- |
| confidence_threshold | number | 0.0 - 1.0 | Minimum confidence for speech recognition | 0.8 |

Do Not Call Detection

| Property | Type | Description | Default |
| --- | --- | --- | --- |
| do_not_call_detection | boolean | Detect and respect “do not call” indicators | false |

Agent Terminate Call

Configuration for when the agent can end calls autonomously:

| Property | Type | Description | Default |
| --- | --- | --- | --- |
| enabled | boolean | Allow agent to terminate calls | false |
| instruction | string | Instructions for when to end calls | None |
| message | string | Message to say before ending call | None |

Example:
{
  "enabled": true,
  "instruction": "End the call when the customer says goodbye or has no more questions",
  "message": "Thank you for calling. Have a great day!"
}

Inactivity Handling

Configuration for handling user inactivity:

| Property | Type | Description | Default |
| --- | --- | --- | --- |
| enabled | boolean | Enable inactivity detection | false |
| idle_time | number | Seconds of silence before prompting (5-120) | 30 |
| message | string | Message to say after idle time | None |

Example:
{
  "enabled": true,
  "idle_time": 30,
  "message": "Are you still there? Let me know if you need any help."
}

Interruption Settings

Configuration for handling user interruptions:

| Property | Type | Description | Default |
| --- | --- | --- | --- |
| enabled | boolean | Allow users to interrupt the agent | true |
| value | number | Interruption sensitivity (1-5, higher = more sensitive) | 3 |

Sensitivity Levels:
  • 1 - Very low (agent rarely gets interrupted)
  • 2 - Low
  • 3 - Medium (Recommended)
  • 4 - High
  • 5 - Very high (agent easily interrupted)

Voicemail Handling

Configuration for voicemail detection and handling:

| Property | Type | Description | Default |
| --- | --- | --- | --- |
| enabled | boolean | Enable voicemail detection | false |
| message | string | Message to leave if voicemail detected | None |

Example:
{
  "enabled": true,
  "message": "Hello, this is Customer Support calling. Please call us back at 1-800-123-4567. Thank you!"
}
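
The numeric limits described for the configurations object (confidence_threshold 0.0 - 1.0, idle_time 5-120 seconds, interruption value 1-5) can be checked before submitting a request. The sketch below is one way to do that; `validate_configurations` is a hypothetical helper, it assumes the numeric fields are numbers when present, and the API's own validation rules may be stricter or looser.

```python
def validate_configurations(config: dict) -> list[str]:
    """Return a list of range problems found in a configurations object."""
    problems = []

    # Defaults below mirror the documented defaults for omitted fields.
    threshold = config.get("confidence_threshold", 0.8)
    if not 0.0 <= threshold <= 1.0:
        problems.append(f"confidence_threshold: {threshold} is outside 0.0 - 1.0")

    idle_time = config.get("inactivity_handling", {}).get("idle_time", 30)
    if not 5 <= idle_time <= 120:
        problems.append(f"inactivity_handling.idle_time: {idle_time} is outside 5-120")

    sensitivity = config.get("interruption", {}).get("value", 3)
    if not 1 <= sensitivity <= 5:
        problems.append(f"interruption.value: {sensitivity} is outside 1-5")

    return problems
```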

Response

Success Response

Status Code: 200 OK
{
  "id": "agent_abc123xyz",
  "agent_name": "Customer Support Agent",
  "config": {
    "prompt": "You are a helpful customer support agent...",
    "voice": {
      "provider": "elevenlabs",
      "voice_id": "RXe6OFmxoC0nlSWpuCDy",
      "model": "eleven_turbo_v2_5"
    },
    "speech_to_text": {
      "provider": "deepgram",
      "model": "nova-2",
      "language": "english"
    },
    "llm": {
      "llm": "gpt-4o",
      "model": "gpt-4o"
    }
  },
  "created_at": 1706745600
}
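
A client will usually want the new agent's id and creation time from this response. The sketch below parses the body into a small dataclass. It assumes `created_at` is a Unix timestamp in seconds (UTC), which the example value suggests but this document does not state; `CreatedAgent` and `parse_create_agent_response` are hypothetical names.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class CreatedAgent:
    id: str
    agent_name: str
    created_at: datetime
    config: dict


def parse_create_agent_response(body: str) -> CreatedAgent:
    """Parse the JSON body of a successful create-agent response."""
    data = json.loads(body)
    return CreatedAgent(
        id=data["id"],
        agent_name=data["agent_name"],
        # Assumption: created_at is a Unix timestamp in seconds, interpreted as UTC.
        created_at=datetime.fromtimestamp(data["created_at"], tz=timezone.utc),
        config=data.get("config", {}),
    )
```

Store the returned `id`; it is the value most likely to be needed when referring to the agent in later API calls.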

Error Responses

400 - Bad Request

{
  "detail": "Invalid request body. Missing required field: agent_name"
}
Common causes:
  • Missing required fields (agent_name or prompt)
  • Invalid data types
  • Invalid provider or model values

401 - Unauthorized

{
  "detail": "Invalid authentication credentials"
}
Common causes:
  • Missing authorization header or token parameter
  • Invalid or expired API key
  • Insufficient permissions

422 - Validation Error

{
  "detail": [
    {
      "loc": ["body", "voice", "provider"],
      "msg": "Invalid voice provider. Must be one of: elevenlabs, openai, deepgram, sarvam",
      "type": "value_error"
    }
  ]
}
Common causes:
  • Invalid enum values (provider, model names)
  • Out of range values (stability, speed, confidence_threshold)
  • Invalid format (timezone, language names)

500 - Internal Server Error

{
  "detail": "Internal server error"
}
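
In the examples above, `detail` is a plain string for 400, 401, and 500 responses and a list of objects for 422. A client can normalize both shapes into one message, as in this sketch; `describe_error` is a hypothetical helper, and real responses may carry fields not shown in this document.

```python
def describe_error(status: int, body: dict) -> str:
    """Turn an error response body into a single human-readable line."""
    detail = body.get("detail")
    if isinstance(detail, str):
        # 400, 401, and 500 examples: detail is a plain message.
        return f"{status}: {detail}"
    if isinstance(detail, list):
        # 422 example: detail is a list of {"loc", "msg", "type"} objects.
        parts = []
        for item in detail:
            location = ".".join(str(part) for part in item.get("loc", []))
            parts.append(f"{location}: {item.get('msg', '')}")
        return f"{status}: " + "; ".join(parts)
    return f"{status}: unknown error"
```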

Example Requests

Minimal Request

curl -X POST https://api.yourdomain.com/create-agent \
  -H "Content-Type: application/json" \
  -H "token: your_api_key_here" \
  -d '{
    "agent_name": "Simple Agent",
    "prompt": "You are a helpful assistant.",
    "voice": {
      "provider": "elevenlabs",
      "voice_id": "RXe6OFmxoC0nlSWpuCDy"
    }
  }'

Complete Request with All Features

curl -X POST https://api.yourdomain.com/create-agent \
  -H "Content-Type: application/json" \
  -H "token: your_api_key_here" \
  -d '{
    "agent_name": "Advanced Support Agent",
    "description": "Full-featured customer support agent",
    "prompt": "You are an experienced customer support agent. Be helpful, professional, and empathetic.",
    "timezone": "America/New_York",
    "greeting": "Hello! Thank you for calling. How can I help you today?",
    "session_data_webhook": "https://www.tryunleashx.com/webhooks/session-data",
    "voice": {
      "provider": "elevenlabs",
      "voice_id": "RXe6OFmxoC0nlSWpuCDy",
      "model": "eleven_turbo_v2_5",
      "settings": {
        "stability": 0.5,
        "voice_style": 1,
        "speed": 1.0,
        "speaker_boost": true,
        "similarity_boost": 0.75,
        "tone": "professional",
        "style": "conversational",
        "instruction_sensitivity": "medium"
      }
    },
    "speech_to_text": {
      "provider": "deepgram",
      "model": "nova-2-phonecall",
      "language": "english"
    },
    "llm": {
      "llm": "gpt-4o",
      "model": "gpt-4o"
    },
    "configurations": {
      "confidence_threshold": 0.8,
      "do_not_call_detection": true,
      "agent_terminate_call": {
        "enabled": true,
        "instruction": "End call politely when conversation is complete",
        "message": "Thank you for calling. Have a great day!"
      },
      "inactivity_handling": {
        "enabled": true,
        "idle_time": 30,
        "message": "Are you still there? Let me know if you need help."
      },
      "interruption": {
        "enabled": true,
        "value": 3
      },
      "voicemail": {
        "enabled": true,
        "message": "Hello, this is Customer Support. Please call us back. Thank you!"
      }
    }
  }'

Important Notes

  1. Required Fields: Only agent_name and prompt are required. The voice object is optional — include voice (with provider and voice_id) when you want to configure TTS for the agent. All other fields are optional.
  2. Voice IDs: Get available voice IDs from the List Voices API.
  3. Webhooks: If you provide a session_data_webhook, ensure your endpoint can handle POST requests with session data.
  4. Timezones: Use standard timezone strings (e.g., “America/New_York”, “Europe/London”, “Asia/Tokyo”).
  5. Language Names: Use full language names (e.g., english, hindi, spanish) or region-specific variants (e.g., english_us, english_uk) as shown in the Speech-to-Text section above.
  6. Model Compatibility: Ensure the voice model is compatible with your chosen provider. For example, eleven_turbo_v2_5 only works with ElevenLabs.
  7. Rate Limits: API calls are subject to rate limiting based on your plan. See pricing documentation for details.
  8. Testing: After creating an agent, test it thoroughly before using in production. Use the Make Call API to test your agent.
  9. Phone Numbers: A phone number must be attached to an agent before it can place calls.
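
For the webhook note above, an endpoint that accepts POST requests can be sketched with Python's standard library. The session data schema is not described in this document, so the handler only assumes the body is a JSON object and logs its top-level keys; `decode_session_payload`, `SessionDataHandler`, `serve`, and port 8000 are illustrative choices, not part of the API.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def decode_session_payload(raw: bytes) -> dict:
    """Decode a webhook body, assuming it is a UTF-8 encoded JSON object."""
    payload = json.loads(raw.decode("utf-8"))
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    return payload


class SessionDataHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = decode_session_payload(self.rfile.read(length))
        except ValueError:
            # Covers invalid UTF-8, invalid JSON, and non-object bodies.
            self.send_response(400)
            self.end_headers()
            return
        print("received session data with keys:", sorted(payload))
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Run the receiver until interrupted (call this explicitly to start it)."""
    HTTPServer(("", port), SessionDataHandler).serve_forever()
```

A production endpoint would also need HTTPS and some way to verify that requests come from the platform; neither is covered here.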