Text-To-Speech Configuration

This document describes how to configure the TTSConfig parameter of the StartAIConversation API.
Supported TTSConfig Configurations
Use your third-party account to configure TTS parameters.
Azure TTS
{
    "TTSType": "azure", // Required. TTS type in string format.
    "SubscriptionKey": "xxxxxxxx", // Required. Subscription key in string format.
    "Region": "southeastasia",  // Required. Subscription region in string format.
    "VoiceName": "en-US-AmandaMultilingualNeural", // Required. Timbre name in string format.
    "Language": "en-US", // Required. Language for TTS in string format. 
    "Rate": 1 // Optional. Speech speed in float format. Value range: 0.5–2. Default value: 1.
}
See Azure language and voice support.
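As a minimal sketch, the Azure configuration above could be assembled and range-checked in Python before it is sent. The helper name build_azure_tts_config is hypothetical, and passing TTSConfig as a JSON-encoded string is an assumption; check the StartAIConversation API reference for the exact request shape.

```python
import json


def build_azure_tts_config(subscription_key, region, voice_name, language, rate=1.0):
    """Return a TTSConfig dict for Azure TTS (hypothetical helper).

    Rate is checked against the documented range of 0.5 to 2.
    """
    if not 0.5 <= rate <= 2:
        raise ValueError("Rate must be between 0.5 and 2")
    return {
        "TTSType": "azure",
        "SubscriptionKey": subscription_key,
        "Region": region,
        "VoiceName": voice_name,
        "Language": language,
        "Rate": rate,
    }


config = build_azure_tts_config(
    "xxxxxxxx", "southeastasia", "en-US-AmandaMultilingualNeural", "en-US"
)
# Assumption: the request carries TTSConfig as a JSON string.
tts_config_json = json.dumps(config)
print(tts_config_json)
```

Note that json.dumps produces plain JSON, so the explanatory // comments shown in the samples are not part of the value that is sent.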
Cartesia TTS
{
    "TTSType": "cartesia", // Required. TTS type in string format. 
    "Model": "sonic-multilingual", // Required. Model.
    "APIKey": "eyxxxx", // Required. Obtained API key.
    "VoiceId": "eda5bbff-1ff1-4886-8ef1-4e69a77640a0" // Required. Timbre ID. Visit https://play.cartesia.ai/ for details.
}
See Cartesia TTS.
ElevenLabs TTS
{
    "TTSType": "elevenlabs", // Required. TTS type in string format. 
    "Model": "eleven_turbo_v2_5", // Required. Model.
    "APIKey": "eyxxxx",
    "VoiceId": "eda5bbff-1ff1-4886-8ef1-4e69a77640a0" // Timbre ID. Visit https://elevenlabs.io/docs/api-reference/get-voices for details.
}
See ElevenLabs TTS.
Tencent TTS
{
    "TTSType": "tencent", // TTS type in string format. Valid values: "tencent" and "minimax". Other vendors will be supported in future versions.
    "AppId": "Your application ID", // Required. The value is in string format.
    "SecretId": "Your key ID", // Required. The value is in string format.
    "SecretKey": "Your key", // Required. The value is in string format.
    "VoiceType": 101001, // Required. Timbre ID in integer format. Standard and premium timbres are supported. Premium timbres sound more natural and are priced differently from standard timbres. See the TTS billing overview for details. For the complete list of timbre IDs, see the TTS timbre list.
    "Speed": 1.25, // Optional. Speech speed as a number. Value range: [-2, 6]. -2: 0.6 times; -1: 0.8 times; 0: 1.0 times (default value); 1: 1.2 times; 2: 1.5 times; 6: 2.5 times. For finer control, the value can have up to 2 decimal places, such as 0.5, 1.25, and 2.81. For the conversion between the parameter value and actual speech speed, see Speech Speed Conversion.
    "Volume": 5, // Optional. Volume level in integer format. Value range: [0, 10], corresponding to 11 volume levels. The default value is 0, representing the normal volume.
    "PrimaryLanguage": 1, // Optional. Primary language in integer format. 1: Chinese (default value); 2: English; 3: Japanese.
    "FastVoiceType": "xxxx"   // Optional. Parameter for fast voice cloning.
}
See the TTS timbre list in the Tencent Cloud Document Center.
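The Speed rules in the Tencent TTS sample can be checked before a request is built. The following is a sketch only: the function names are hypothetical, and the lookup table contains just the six value-to-multiplier pairs listed in the sample. It does not attempt to compute multipliers for in-between values, since that conversion is defined in Speech Speed Conversion.

```python
# Speed values and playback multipliers listed in the Tencent TTS sample.
DOCUMENTED_SPEEDS = {-2: 0.6, -1: 0.8, 0: 1.0, 1: 1.2, 2: 1.5, 6: 2.5}


def check_tencent_speed(speed):
    """Validate Speed against the range [-2, 6] and keep at most 2 decimals."""
    if not -2 <= speed <= 6:
        raise ValueError("Speed must be within [-2, 6]")
    return round(speed, 2)


def documented_multiplier(speed):
    """Return the listed multiplier for a Speed value, or None if not listed."""
    return DOCUMENTED_SPEEDS.get(speed)


print(check_tencent_speed(1.25))
print(documented_multiplier(2))
```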
MiniMax TTS
{
    "TTSType": "minimax", // TTS type in string format. 
    "Model": "speech-01-turbo",
    "APIUrl": "https://api.minimax.chat/v1/t2a_v2",
    "APIKey": "eyxxxx",
    "GroupId": "181000000000000",
    "VoiceType":"female-tianmei",
    "Speed": 1.2
}
See MiniMax.
For rate limits, see MiniMax. Rate limits may cause response lag.
API	T2A V2 (Speech generation)	T2A Pro (Speech generation)	T2A (Speech generation)	T2A Stream (Streaming speech generation)	T2A Stream (Streaming speech generation)
Model	speech-01-turbo, speech-01-240228, speech-01-turbo-240228	speech-01, speech-02	speech-01, speech-02	speech-01	speech-01
Customer type/Limit type	RPM	RPM	RPM	RPM	CONN (maximum number of parallel tasks)
Users using a free account	3	3	3	3	1
Users using a paid account	20	20	20	20	3
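For rough capacity planning against the limits above, a small calculation can help. This is a sketch under a loud assumption: each conversation is taken to issue a steady number of TTS requests per minute, a figure this document does not specify and that would need to be measured. The function name and the example rate of 4 requests per minute are hypothetical; the RPM values are the T2A V2 limits, the API used by the APIUrl in the MiniMax sample.

```python
import math

# RPM limits for the T2A V2 API, copied from the rate limit table.
T2A_V2_RPM = {"free": 3, "paid": 20}


def max_conversations(account_type, requests_per_conversation_per_minute):
    """Estimate how many simultaneous conversations fit within the RPM limit.

    Assumption: every conversation sends TTS requests at the same steady rate.
    """
    if requests_per_conversation_per_minute <= 0:
        raise ValueError("request rate must be positive")
    limit = T2A_V2_RPM[account_type]
    return math.floor(limit / requests_per_conversation_per_minute)


print(max_conversations("paid", 4))
print(max_conversations("free", 4))
```

A result of 0 means that, under the assumed request rate, even a single conversation could hit the limit and experience response lag.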
Custom TTS
{
  "TTSType": "custom", // Required. The value is in string format.
  "APIKey": "ApiKey", // Required. API key in string format for authentication.
  "APIUrl": "http://0.0.0.0:8080/stream-audio", // Required. TTS API URL in string format.
  "AudioFormat": "wav", // Optional. Expected output audio format in string format. Valid values: pcm and wav. Other formats, such as mp3 and ogg_opus, are not currently supported. Default value: wav.
  "SampleRate": 16000,  // Optional. Audio sampling rate in integer format. Default value: 16000 (16 kHz). Recommended value: 16000.
  "AudioChannel": 1     // Optional. Number of audio channels in integer format. Valid values: 1 and 2. Default value: 1.
}
For specific protocol specifications, see Customize TTS protocol.
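The defaults and valid values of the custom TTS parameters can be captured in a small builder. This is an illustrative sketch, not part of the API: build_custom_tts_config is a hypothetical helper, and it covers only the configuration object, not the server side described in Customize TTS protocol.

```python
# Formats the custom TTS sample lists as currently supported.
SUPPORTED_FORMATS = {"pcm", "wav"}


def build_custom_tts_config(api_key, api_url, audio_format="wav",
                            sample_rate=16000, audio_channel=1):
    """Return a custom TTSConfig dict, applying the documented defaults."""
    if audio_format not in SUPPORTED_FORMATS:
        raise ValueError("AudioFormat must be pcm or wav")
    if audio_channel not in (1, 2):
        raise ValueError("AudioChannel must be 1 or 2")
    return {
        "TTSType": "custom",
        "APIKey": api_key,
        "APIUrl": api_url,
        "AudioFormat": audio_format,
        "SampleRate": sample_rate,
        "AudioChannel": audio_channel,
    }


custom_config = build_custom_tts_config("ApiKey", "http://0.0.0.0:8080/stream-audio")
print(custom_config)
```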