Text-to-Speech Configuration

This document describes how to configure the TTSConfig parameter of the StartAIConversation API.

Supported TTSConfig Configurations

Use your third-party account to configure TTS parameters.

Azure TTS

{
"TTSType": "azure", // Required. TTS type in string format.
"SubscriptionKey": "xxxxxxxx", // Required. Subscription key in string format.
"Region": "southeastasia", // Required. Subscription region in string format.
"VoiceName": "en-US-AmandaMultilingualNeural", // Required. Timbre name in string format.
"Language": "en-US", // Required. Language for TTS in string format.
"Rate": 1 // Optional. Speech speed in float format. Value range: 0.5–2. Default value: 1.
}
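
As a minimal sketch, the Azure configuration above could be assembled and range-checked in code before it is passed to StartAIConversation. The helper name `build_azure_tts_config` is hypothetical, and serializing the configuration to a JSON string is an assumption about how TTSConfig is submitted; check the StartAIConversation reference for the exact type.

```python
import json


def build_azure_tts_config(subscription_key, region, voice_name, language, rate=1.0):
    """Return an Azure TTSConfig as a JSON string (hypothetical helper)."""
    # The documented value range for Rate is 0.5 to 2.
    if not 0.5 <= rate <= 2:
        raise ValueError(f"Rate must be between 0.5 and 2, got {rate}")
    config = {
        "TTSType": "azure",
        "SubscriptionKey": subscription_key,
        "Region": region,
        "VoiceName": voice_name,
        "Language": language,
        "Rate": rate,
    }
    # json.dumps emits plain JSON. The // comments in the sample above are
    # documentation only and are not included in the output.
    return json.dumps(config)
```

Rejecting an out-of-range Rate locally gives a clearer error than waiting for the request to fail.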

Cartesia TTS

{
"TTSType": "cartesia", // Required. TTS type in string format.
"Model": "sonic-multilingual", // Required. Model.
"APIKey": "eyxxxx", // Required. Obtained API key.
"VoiceId": "eda5bbff-1ff1-4886-8ef1-4e69a77640a0" // Required. Timbre ID. Visit https://play.cartesia.ai/ for details.
}

ElevenLabs TTS

{
"TTSType": "elevenlabs", // Required. TTS type in string format.
"Model": "eleven_turbo_v2_5", // Required. Model.
"APIKey": "eyxxxx", // Obtained API key.
"VoiceId": "eda5bbff-1ff1-4886-8ef1-4e69a77640a0" // Timbre ID. Visit https://elevenlabs.io/docs/api-reference/get-voices for details.
}

Tencent TTS

{
"TTSType": "tencent", // TTS type in string format. Valid values: "tencent" and "minimax". Other vendors will be supported in future versions.
"AppId": "Your application ID", // Required. The value is in string format.
"SecretId": "Your key ID", // Required. The value is in string format.
"SecretKey": "Your key", // Required. The value is in string format.
"VoiceType": 101001, // Required. Timbre ID in integer format. Standard and premium timbres are supported. Premium timbres sound more realistic and are priced differently from standard timbres. See the TTS billing overview for details. For the complete list of timbre IDs, see the TTS timbre list.
"Speed": 1.25, // Optional. Speech speed as a number. Value range: [-2, 6]. Integer values map to speech speeds as follows: -2: 0.6 times; -1: 0.8 times; 0: 1.0 times (default value); 1: 1.2 times; 2: 1.5 times; 6: 2.5 times. For finer control, the value can have up to 2 decimal places, such as 0.5, 1.25, and 2.81. For the conversion between the parameter value and actual speech speed, see Speech Speed Conversion.
"Volume": 5, // Optional. Volume level in integer format. Value range: [0, 10], corresponding to 11 volume levels. The default value is 0, representing the normal volume.
"PrimaryLanguage": 1, // Optional. Primary language in integer format. 1: Chinese (default value); 2: English; 3: Japanese.
"FastVoiceType": "xxxx" // Optional. Parameter for fast voice cloning.
}
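
The Speed rules above can be sketched as a small check. The function name `check_tencent_speed` is hypothetical. The lookup table contains only the six values listed in the Speed comment; the sketch does not guess the speed for fractional values, because that conversion is defined in Speech Speed Conversion.

```python
# Speed parameter values and the speech-speed multipliers listed in the
# Speed comment above. Only these six points are documented here.
DOCUMENTED_SPEEDS = {-2: 0.6, -1: 0.8, 0: 1.0, 1: 1.2, 2: 1.5, 6: 2.5}


def check_tencent_speed(speed):
    """Validate a Speed value (hypothetical helper).

    Returns the documented multiplier for a listed value, or None for any
    other valid value (for example 1.25), whose multiplier is not listed.
    """
    if not -2 <= speed <= 6:
        raise ValueError(f"Speed must be within [-2, 6], got {speed}")
    if round(speed, 2) != speed:
        raise ValueError(f"Speed allows at most 2 decimal places, got {speed}")
    return DOCUMENTED_SPEEDS.get(speed)
```

For example, `check_tencent_speed(2)` returns the listed multiplier 1.5, while `check_tencent_speed(1.25)` passes validation but returns None.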

MiniMax TTS

{
"TTSType": "minimax", // TTS type in string format.
"Model": "speech-01-turbo",
"APIUrl": "https://api.minimax.chat/v1/t2a_v2",
"APIKey": "eyxxxx",
"GroupId": "181000000000000",
"VoiceType": "female-tianmei",
"Speed": 1.2
}
See MiniMax.
For rate limits, see MiniMax. Rate limits may cause response lag.

| API | Model | Limit type | Free account users | Paid account users |
|---|---|---|---|---|
| T2A V2 (Speech generation) | speech-01-turbo, speech-01-240228, speech-01-turbo-240228 | RPM | 3 | 20 |
| T2A Pro (Speech generation) | speech-01, speech-02 | RPM | 3 | 20 |
| T2A (Speech generation) | speech-01, speech-02 | RPM | 3 | 20 |
| T2A Stream (Streaming speech generation) | speech-01 | RPM | 3 | 20 |
| T2A Stream (Streaming speech generation) | speech-01 | CONN (maximum number of parallel tasks) | 1 | 3 |
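
To illustrate what an RPM (requests per minute) limit means in practice, the following is a minimal sliding-window sketch. The class name `RpmLimiter` is hypothetical, and counting requests over a rolling 60-second window is an assumption; MiniMax may count requests differently, so treat this as an illustration rather than a description of its behavior.

```python
import time
from collections import deque


class RpmLimiter:
    """Sliding-window limiter: at most `rpm` requests in any 60 seconds."""

    def __init__(self, rpm, clock=time.monotonic):
        self.rpm = rpm
        self.clock = clock      # injectable so the limiter can be tested
        self.calls = deque()    # timestamps of recorded requests

    def wait_time(self):
        """Return the seconds to wait before another request is allowed."""
        now = self.clock()
        # Forget requests that have left the 60-second window.
        while self.calls and now - self.calls[0] >= 60:
            self.calls.popleft()
        if len(self.calls) < self.rpm:
            return 0.0
        # The window is full: wait until the oldest request expires.
        return 60 - (now - self.calls[0])

    def record(self):
        """Record that a request was sent now."""
        self.calls.append(self.clock())
```

With a paid-account limit of 20 RPM, a caller would create `RpmLimiter(20)`, sleep for `wait_time()` seconds when it is nonzero, and call `record()` after each request. The waiting this introduces is the kind of response lag the note above refers to.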

Custom TTS

{
"TTSType": "custom", // Required. The value is in string format.
"APIKey": "ApiKey", // Required. API key in string format for authentication.
"APIUrl": "http://0.0.0.0:8080/stream-audio", // Required. TTS API URL in string format.
"AudioFormat": "wav", // Optional. Expected output audio format in string format. For example, mp3, ogg_opus, pcm, and wav. Default value: wav. Currently, only pcm and wav are supported.
"SampleRate": 16000, // Optional. Audio sampling rate in integer format. Default value: 16000 (16 kHz). Recommended value: 16000.
"AudioChannel": 1 // Optional. Number of audio channels in integer format. Valid values: 1 and 2. Default value: 1.
}
For the protocol specification, see Customize TTS protocol.
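
The constraints on the custom TTS fields can be sketched as a builder that applies the documented defaults and rejects unsupported values. The helper name `build_custom_tts_config` is hypothetical, and returning a JSON string is an assumption about how TTSConfig is submitted.

```python
import json

# Per the AudioFormat comment above, only pcm and wav are currently supported.
SUPPORTED_FORMATS = {"pcm", "wav"}


def build_custom_tts_config(api_key, api_url, audio_format="wav",
                            sample_rate=16000, audio_channel=1):
    """Return a custom TTSConfig as a JSON string (hypothetical helper)."""
    if audio_format not in SUPPORTED_FORMATS:
        raise ValueError(f"AudioFormat must be pcm or wav, got {audio_format!r}")
    if audio_channel not in (1, 2):
        raise ValueError(f"AudioChannel must be 1 or 2, got {audio_channel}")
    return json.dumps({
        "TTSType": "custom",
        "APIKey": api_key,
        "APIUrl": api_url,
        "AudioFormat": audio_format,
        "SampleRate": sample_rate,      # 16000 is the recommended value
        "AudioChannel": audio_channel,
    })
```

Because the optional fields default to the documented values (wav, 16000, 1), a call only needs the API key and URL in the common case.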