Text-To-Speech Configuration

This article describes how to configure the TTSConfig parameter of the StartAIConversation API.

Supported Configurations

Fill in the TTS parameters with credentials from your own account with the relevant TTS vendor.

Tencent TTS

{
"TTSType": "tencent", // String, TTS type. Currently supports "tencent" and "minimax"; support for other vendors is in progress.
"AppId": "Your Application ID", // String, required
"SecretId": "Your Secret ID", // String, required
"SecretKey": "Your Secret Key", // String, required
"VoiceType": 101001, // Integer, required, voice ID. Voices come in standard and premium timbres; premium timbres sound more realistic and are priced differently from standard ones. See the Text To Speech billing overview for details. For the complete list of voice IDs, see the Text To Speech timbre list.
"Speed": 1.25, // Float, optional, speech speed, range: [-2, 6]. Reference values: -2 is 0.6x, -1 is 0.8x, 0 is 1.0x (default), 1 is 1.2x, 2 is 1.5x, 6 is 2.5x. For finer control, the value may have up to 2 decimal places, such as 0.5, 1.25, or 2.81. For the conversion between parameter values and actual speech speed, see Speech Speed Conversion.
"Volume": 5, // Integer, optional, volume level, range: [0, 10], corresponding to 11 volume levels. Default is 0, which represents normal volume.
"PrimaryLanguage": 1, // Integer, optional, primary language: 1 - Chinese (default), 2 - English, 3 - Japanese
"FastVoiceType": "xxxx" // Optional, parameter for Voice Reproduce
}
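The `//` comments in the sample above are annotations only; strict JSON does not allow comments, so they need to be removed before the configuration is sent. The following is a minimal sketch that checks the documented ranges and serializes the result. It assumes TTSConfig is submitted as a JSON string, and the function name `build_tencent_tts_config` is hypothetical, not part of any SDK.

```python
import json


def build_tencent_tts_config(app_id, secret_id, secret_key, voice_type,
                             speed=0, volume=0, primary_language=1):
    """Validate the documented ranges and return a comment-free JSON string.

    Hypothetical helper: the ranges come from the field comments above;
    passing TTSConfig as a JSON string is an assumption.
    """
    if not -2 <= speed <= 6:
        raise ValueError("Speed must be within [-2, 6]")
    if not 0 <= volume <= 10:
        raise ValueError("Volume must be within [0, 10]")
    if primary_language not in (1, 2, 3):
        raise ValueError("PrimaryLanguage must be 1, 2, or 3")

    config = {
        "TTSType": "tencent",
        "AppId": app_id,
        "SecretId": secret_id,
        "SecretKey": secret_key,
        "VoiceType": voice_type,
        "Speed": round(speed, 2),  # at most 2 decimal places
        "Volume": volume,
        "PrimaryLanguage": primary_language,
    }
    return json.dumps(config)
```

For example, `build_tencent_tts_config("app", "id", "key", 101001, speed=1.25, volume=5)` returns a JSON string, while `speed=7` raises `ValueError` before any request is made.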

Minimax TTS

{
"TTSType": "minimax", // String, TTS type
"Model": "speech-01-turbo",
"APIUrl": "https://api.minimax.chat/v1/t2a_v2",
"APIKey": "eyxxxx",
"GroupId": "181000000000000",
"VoiceType":"female-tianmei",
"Speed": 1.2
}
For parameter details, see: MiniMax.
For rate limits, see: MiniMax. Hitting a rate limit may cause responses to lag.
| API | Model | Limit Type | Free plan | Paid plan |
| --- | --- | --- | --- | --- |
| T2A V2 (Speech Generation) | speech-01-turbo, speech-01-240228, speech-01-turbo-240228 | RPM | 3 | 20 |
| T2A Pro (Speech Generation) | speech-01, speech-02 | RPM | 3 | 20 |
| T2A (Speech Generation) | speech-01, speech-02 | RPM | 3 | 20 |
| T2A Stream (Streaming Speech Generation) | speech-01 | RPM | 3 | 20 |
| T2A Stream (Streaming Speech Generation) | speech-01 | CONN (Maximum Number of Parallel Running Tasks) | 1 | 3 |
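
The limits listed above can be checked against an expected load before choosing a plan. The following is a minimal sketch: the numbers are copied from the list above, while the names `MINIMAX_LIMITS` and `within_limit` and the short API keys are hypothetical. How many MiniMax requests one conversation actually generates is not stated here, so the expected load is a caller-supplied estimate.

```python
# Limits copied from the list above: (API, limit type) -> {plan: limit}.
# RPM = requests per minute, CONN = maximum number of parallel running tasks.
MINIMAX_LIMITS = {
    ("T2A V2", "RPM"): {"free": 3, "paid": 20},
    ("T2A Pro", "RPM"): {"free": 3, "paid": 20},
    ("T2A", "RPM"): {"free": 3, "paid": 20},
    ("T2A Stream", "RPM"): {"free": 3, "paid": 20},
    ("T2A Stream", "CONN"): {"free": 1, "paid": 3},
}


def within_limit(api, limit_type, plan, expected):
    """Return True if the estimated load does not exceed the listed limit."""
    return expected <= MINIMAX_LIMITS[(api, limit_type)][plan]
```

The sample configuration points APIUrl at a path ending in `t2a_v2`, so the T2A V2 entry is presumably the relevant one: `within_limit("T2A V2", "RPM", "free", 4)` is `False`, which suggests that four requests per minute would already run into the free-plan limit.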

Azure TTS

{
"TTSType": "azure", // required: String, TTS type
"SubscriptionKey": "xxxxxxxx", // required: String, subscription key
"Region": "chinanorth3", // required: String, region of the subscription
"VoiceName": "zh-CN-XiaoxiaoNeural", // required: String, voice name
"Language": "zh-CN", // required: String, language for synthesis
"Rate": 1 // optional: Float, speech speed, range 0.5–2, default is 1
}

Cartesia TTS

{
"TTSType": "cartesia", // required: String, TTS type
"Model": "sonic-multilingual", // required: model
"APIKey": "eyxxxx", // required: your API key
"VoiceId": "eda5bbff-1ff1-4886-8ef1-4e69a77640a0" // required: voice ID, see https://play.cartesia.ai/
}
See: Cartesia TTS

ElevenLabs TTS

{
"TTSType": "elevenlabs", // required: String, TTS type
"Model": "eleven_turbo_v2_5", // required: model type
"APIKey": "eyxxxx",
"VoiceId": "eda5bbff-1ff1-4886-8ef1-4e69a77640a0" // Voice ID, see https://elevenlabs.io/docs/api-reference/get-voices
}
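
Several of the samples above mark fields as required. The following is a minimal sketch of a pre-flight check that reports which marked fields are absent from a configuration. The names `REQUIRED_FIELDS` and `missing_fields` are hypothetical, and the table only lists fields that the samples explicitly mark as required, so unmarked fields (for example, all Minimax fields) are not checked.

```python
# Fields explicitly marked "required" in the samples above (besides TTSType).
REQUIRED_FIELDS = {
    "tencent": ["AppId", "SecretId", "SecretKey", "VoiceType"],
    "azure": ["SubscriptionKey", "Region", "VoiceName", "Language"],
    "cartesia": ["Model", "APIKey", "VoiceId"],
    "elevenlabs": ["Model"],
}


def missing_fields(config):
    """Return the required keys that are absent or empty in `config`."""
    tts_type = config.get("TTSType")
    if tts_type not in REQUIRED_FIELDS:
        raise ValueError(f"no required-field list for TTSType: {tts_type!r}")
    return [key for key in REQUIRED_FIELDS[tts_type] if not config.get(key)]
```

An empty list means every marked field is present; it does not confirm that the values themselves are valid for the vendor.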

Custom TTS

{
"TTSType": "custom", // required: String
"APIKey": "ApiKey", // required: String, used for authentication
"APIUrl": "http://0.0.0.0:8080/stream-audio", // required: String, TTS API URL
"AudioFormat": "wav", // String, optional, expected output audio format. Currently only pcm and wav are supported; default is wav
"SampleRate": 16000, // Integer, optional, audio sample rate, default is 16000 (16kHz), recommended value is 16000
"AudioChannel": 1 // Integer, optional, number of audio channels, value: 1 or 2, default is 1
}
For specific protocol specifications, see Customize TTS protocol.
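
The optional Custom TTS fields and their defaults can be made explicit before the configuration is submitted. The following is a minimal sketch under the defaults and allowed values stated in the field comments above. The names `CUSTOM_DEFAULTS` and `with_custom_defaults` are hypothetical, and the URL in the usage note is a placeholder.

```python
# Defaults taken from the Custom TTS field comments above.
CUSTOM_DEFAULTS = {"AudioFormat": "wav", "SampleRate": 16000, "AudioChannel": 1}


def with_custom_defaults(config):
    """Fill in the documented defaults, then check the allowed values."""
    merged = {**CUSTOM_DEFAULTS, **config}  # caller-supplied values win
    if merged["AudioFormat"] not in ("pcm", "wav"):
        raise ValueError("AudioFormat currently supports only pcm and wav")
    if merged["AudioChannel"] not in (1, 2):
        raise ValueError("AudioChannel must be 1 or 2")
    return merged
```

Calling `with_custom_defaults({"TTSType": "custom", "APIKey": "k", "APIUrl": "http://example.invalid/tts"})` returns the same configuration with `AudioFormat`, `SampleRate`, and `AudioChannel` set to their defaults, while `"AudioFormat": "mp3"` raises `ValueError`.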