Cartesia

Cartesia is purpose-built for real-time voice AI, offering ultra-low-latency streaming TTS with natural-sounding output. Its Sonic model supports multilingual synthesis and voice mixing. It is a strong choice when end-to-end latency is critical, such as in interactive voice agents where every millisecond counts.
Usage
To use Cartesia as the TTS engine, pass the following JSON in the TTSConfig field of the StartAIConversation API:
// json — TTSConfig
{
  "TTSType": "cartesia",
  "Model": "sonic-3-2026-01-12",
  "APIKey": "<your_cartesia_api_key>",
  "VoiceId": "eda5bbff-1ff1-4886-8ef1-4e69a77640a0"
}
For the complete TTSConfig parameter reference, see the Text-to-Speech Configuration.
Parameter reference
Field	Type	Required	Description
`TTSType`	String	Yes	Must be `"cartesia"`.
`Model`	String	Yes	Cartesia model name (e.g., `sonic-3-2026-01-12`). See Cartesia Models.
`APIKey`	String	Yes	Your Cartesia API key. Obtain from Cartesia Console.
`VoiceId`	String	Yes	Voice ID. Browse voices at Cartesia Voice Library.
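The four required fields can be assembled and checked before the request is sent. The following is a minimal sketch, not an official client: `build_cartesia_tts_config` is a hypothetical helper name, and it assumes that StartAIConversation expects TTSConfig as a JSON-encoded string. If your SDK accepts a structured object instead, pass the dictionary directly.

```python
import json

# Field names taken from the parameter reference; all four are required.
REQUIRED_FIELDS = ("TTSType", "Model", "APIKey", "VoiceId")


def build_cartesia_tts_config(api_key: str, voice_id: str, model: str) -> str:
    """Return a TTSConfig JSON string for Cartesia.

    Hypothetical helper for illustration; it only validates that every
    required field is a non-empty string and serializes the result.
    """
    config = {
        "TTSType": "cartesia",  # must be exactly "cartesia"
        "Model": model,
        "APIKey": api_key,
        "VoiceId": voice_id,
    }
    missing = [
        name
        for name in REQUIRED_FIELDS
        if not isinstance(config[name], str) or not config[name].strip()
    ]
    if missing:
        raise ValueError(
            "missing required TTSConfig fields: " + ", ".join(missing)
        )
    return json.dumps(config)


if __name__ == "__main__":
    # Placeholder key; the model and voice ID are the ones from the example.
    tts_config = build_cartesia_tts_config(
        api_key="<your_cartesia_api_key>",
        voice_id="eda5bbff-1ff1-4886-8ef1-4e69a77640a0",
        model="sonic-3-2026-01-12",
    )
    print(tts_config)
```

Validating locally catches an empty key or voice ID before the API call, which is usually easier to debug than a rejected request.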
For more details on Cartesia, see the Cartesia documentation.
Next step: StartAIConversation API Reference
