Cartesia
Cartesia is purpose-built for real-time voice AI, offering ultra-low-latency streaming TTS with natural-sounding output. Its Sonic model supports multilingual synthesis and voice mixing. It is an excellent choice when end-to-end latency is critical, such as in interactive voice agents where every millisecond counts.
Usage
To use Cartesia as the TTS engine, pass the following JSON in the TTSConfig field of the StartAIConversation API:

    // json: TTSConfig
    {
        "TTSType": "cartesia",
        "Model": "sonic-3-2026-01-12",
        "APIKey": "<your_cartesia_api_key>",
        "VoiceId": "eda5bbff-1ff1-4886-8ef1-4e69a77640a0"
    }
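The configuration above can also be assembled in code rather than written by hand. The following is a minimal sketch, assuming the TTSConfig field accepts a JSON-encoded string. The helper name build_cartesia_tts_config and the CARTESIA_API_KEY environment variable are hypothetical and not part of the StartAIConversation API.

```python
import json
import os


def build_cartesia_tts_config(api_key: str, voice_id: str,
                              model: str = "sonic-3-2026-01-12") -> str:
    """Return the Cartesia TTSConfig as a JSON string (hypothetical helper)."""
    config = {
        "TTSType": "cartesia",  # fixed value that selects the Cartesia engine
        "Model": model,
        "APIKey": api_key,
        "VoiceId": voice_id,
    }
    return json.dumps(config)


# CARTESIA_API_KEY is an assumed environment variable name; the placeholder
# is used when it is not set, so replace it before making a real request.
tts_config = build_cartesia_tts_config(
    api_key=os.environ.get("CARTESIA_API_KEY", "<your_cartesia_api_key>"),
    voice_id="eda5bbff-1ff1-4886-8ef1-4e69a77640a0",
)
print(tts_config)
```

Reading the key from the environment keeps the credential out of source control. The resulting string would then be supplied as the TTSConfig value in the StartAIConversation request.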
Parameter reference
Field | Type | Required | Description |
TTSType | String | Yes | Must be "cartesia". |
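Model | String | Yes | Cartesia model identifier, for example "sonic-3-2026-01-12". |
APIKey | String | Yes | Your Cartesia API key. |
VoiceId | String | Yes | ID of the Cartesia voice to use, for example "eda5bbff-1ff1-4886-8ef1-4e69a77640a0". |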
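Because every field in the table is a required string, a configuration can be checked locally before it is sent. The sketch below is one possible client-side check based only on the table above. The function name validate_cartesia_tts_config is hypothetical, and the service may apply additional rules that are not modeled here.

```python
REQUIRED_FIELDS = ("TTSType", "Model", "APIKey", "VoiceId")


def validate_cartesia_tts_config(config: dict) -> list:
    """Return a list of problems; an empty list means no issues were found."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = config.get(field)
        # Each field must be present and be a non-empty string.
        if not isinstance(value, str) or not value:
            problems.append(f"{field} is required and must be a non-empty string")
    tts_type = config.get("TTSType")
    # TTSType has a single allowed value for this engine.
    if isinstance(tts_type, str) and tts_type and tts_type != "cartesia":
        problems.append('TTSType must be "cartesia"')
    return problems


example = {
    "TTSType": "cartesia",
    "Model": "sonic-3-2026-01-12",
    "APIKey": "<your_cartesia_api_key>",
    "VoiceId": "eda5bbff-1ff1-4886-8ef1-4e69a77640a0",
}
print(validate_cartesia_tts_config(example))
```

Returning a list of messages instead of raising on the first error lets a caller report every missing field at once.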