
Custom

Bring your own TTS engine by implementing TRTC's custom streaming protocol. This option gives you full control over the synthesis pipeline — use your proprietary model, on-premise deployment, or any third-party service not natively supported. Choose this if you have specialized voice requirements or need to integrate an in-house TTS solution.

Usage

Your TTS service must implement the TRTC custom TTS streaming protocol. To use a custom TTS engine, pass the following JSON in the TTSConfig field of the StartAIConversation API:
// json — TTSConfig
{
  "TTSType": "custom",
  "APIKey": "<your_api_key>",
  "APIUrl": "http://0.0.0.0:8080/stream-audio",
  "AudioFormat": "wav",
  "SampleRate": 16000,
  "AudioChannel": 1
}
For the complete TTSConfig parameter reference, see the Text-to-Speech Configuration.

Parameter reference

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| TTSType | String | Yes | Fixed value: "custom". |
| APIKey | String | Yes | API key for authentication with your TTS service. |
| APIUrl | String | Yes | Your TTS service endpoint URL. |
| AudioFormat | String | No | Output audio format. Currently supports: pcm, wav. Default: wav. |
| SampleRate | Integer | No | Audio sample rate. Default: 16000 (16 kHz). Recommended: 16000. |
| AudioChannel | Integer | No | Number of audio channels. 1 (mono) or 2 (stereo). Default: 1. |
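
As a rough illustration of the parameters above, the following sketch builds and validates a TTSConfig payload before it is sent. The helper name `build_custom_tts_config` is hypothetical, and serializing the config to a JSON string is an assumption about how the TTSConfig field is passed; check the StartAIConversation reference for the exact type.

```python
import json

# Values taken from the parameter reference above.
VALID_FORMATS = {"pcm", "wav"}
VALID_CHANNELS = {1, 2}


def build_custom_tts_config(api_key, api_url, audio_format="wav",
                            sample_rate=16000, audio_channel=1):
    """Return a custom TTSConfig as a JSON string (hypothetical helper)."""
    if not api_key or not api_url:
        raise ValueError("APIKey and APIUrl are required")
    if audio_format not in VALID_FORMATS:
        raise ValueError(f"AudioFormat must be one of {sorted(VALID_FORMATS)}")
    if audio_channel not in VALID_CHANNELS:
        raise ValueError("AudioChannel must be 1 (mono) or 2 (stereo)")
    config = {
        "TTSType": "custom",  # fixed value for a custom engine
        "APIKey": api_key,
        "APIUrl": api_url,
        "AudioFormat": audio_format,
        "SampleRate": sample_rate,
        "AudioChannel": audio_channel,
    }
    return json.dumps(config)
```

Calling the helper with only a key and a URL fills in the documented defaults (wav, 16000, 1), which keeps the optional fields explicit in the payload.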

Custom TTS streaming protocol

Your TTS service must implement the following HTTP streaming protocol.

Calling Method

POST: http://xxxxxxxxxxxx/api/v1/tts/stream

Example

HTTP Request Header

Content-Type: application/json;charset=UTF-8
X-Task-Id: task_id_value
X-Rquest-Id: request_id
X-Sdk-App-Id: SdkAppId
X-User-Id: UserId
X-Room-Id: RoomId
X-Room-Id-Type: "0"
Authorization: Bearer "API-KEY"

Request

{
  "Text": "Hello, world! This is a test for the streaming TTS API.",
  "Format": "wav",
  "SampleRate": 16000,
  "Channel": 1
}
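
To exercise your endpoint before connecting it to TRTC, it can help to send a request shaped like the example above. The sketch below only builds the headers and body; the function name `build_probe_request`, the use of UUIDs for the task and request IDs, and sending the API key without surrounding quotes are all assumptions made for illustration.

```python
import json
import uuid


def build_probe_request(text, api_key, sdk_app_id, user_id, room_id,
                        fmt="wav", sample_rate=16000, channel=1):
    """Return (headers, body) that mimic the example request (hypothetical helper)."""
    headers = {
        "Content-Type": "application/json;charset=UTF-8",
        "X-Task-Id": str(uuid.uuid4()),    # assumption: any unique string
        "X-Rquest-Id": str(uuid.uuid4()),  # header name spelled as in the example above
        "X-Sdk-App-Id": str(sdk_app_id),
        "X-User-Id": str(user_id),
        "X-Room-Id": str(room_id),
        # 0: numeric room number, 1: string room number
        "X-Room-Id-Type": "0" if isinstance(room_id, int) else "1",
        "Authorization": f"Bearer {api_key}",  # assumption: key sent unquoted
    }
    body = json.dumps({
        "Text": text,
        "Format": fmt,
        "SampleRate": sample_rate,
        "Channel": channel,
    }).encode("utf-8")
    return headers, body
```

The returned pair can be passed to any HTTP client, for example `http.client.HTTPConnection(host, port).request("POST", path, body, headers)`, after which the response can be read incrementally to confirm that audio arrives in chunks.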

HTTP Request Header

| Field | Description |
| --- | --- |
| Content-Type | application/json |
| charset | UTF-8 |
| X-Task-Id | ID of the conversation task. |
| X-Rquest-Id | ID of the request. Retries carry the same request ID. |
| X-Sdk-App-Id | SDK AppId. |
| X-User-Id | User ID. |
| X-Room-Id | Room ID. |
| X-Room-Id-Type | Room ID type. 0: numeric room number, 1: string room number. |
| Authorization | Authentication credential, in the format Bearer "API-KEY". |
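
On the service side, the Authorization header is the natural place to reject unauthenticated calls. The following is a minimal sketch of such a check; the function names are hypothetical, and stripping optional double quotes around the key is an assumption, since the format above does not make clear whether the quotes are sent literally.

```python
import hmac


def extract_bearer_token(authorization):
    """Return the token from a 'Bearer <token>' header value, or None."""
    if not authorization:
        return None
    scheme, _, token = authorization.strip().partition(" ")
    if scheme.lower() != "bearer":
        return None
    # Assumption: tolerate a key wrapped in double quotes.
    token = token.strip().strip('"')
    return token or None


def is_authorized(headers, expected_key):
    """Compare the presented key with the expected one in constant time."""
    token = extract_bearer_token(headers.get("Authorization"))
    if token is None:
        return False
    return hmac.compare_digest(token.encode("utf-8"), expected_key.encode("utf-8"))
```

`hmac.compare_digest` is used instead of `==` so that the comparison time does not depend on how many leading characters match.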

Request

| Parameter | Required | Type | Description |
| --- | --- | --- | --- |
| Text | Yes | String | Text to synthesize. |
| Format | No | String | Requested output audio format. The protocol defines mp3, ogg_opus, pcm, and wav, but only pcm and wav are currently supported. Default: wav. |
| SampleRate | No | Integer | Audio sampling rate. Default: 16000 (16 kHz). Recommended: 16000. |
| Channel | No | Integer | Number of audio channels. Valid values: 1, 2. Default: 1. |
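
The request parameters above can be parsed and defaulted roughly as follows. This is a sketch under stated assumptions: `TTSRequest` and `parse_tts_request` are hypothetical names, and rejecting invalid values with `ValueError` is one possible policy rather than something the protocol prescribes.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TTSRequest:
    text: str
    format: str = "wav"       # default per the table above
    sample_rate: int = 16000  # default per the table above
    channel: int = 1          # default per the table above


def parse_tts_request(raw: bytes) -> TTSRequest:
    """Decode a request body and apply the documented defaults."""
    data = json.loads(raw.decode("utf-8"))
    if not isinstance(data, dict):
        raise ValueError("request body must be a JSON object")
    text = data.get("Text")
    if not isinstance(text, str) or not text:
        raise ValueError("Text is required")
    req = TTSRequest(
        text=text,
        format=data.get("Format", "wav"),
        sample_rate=data.get("SampleRate", 16000),
        channel=data.get("Channel", 1),
    )
    if req.format not in ("pcm", "wav"):  # only these are currently supported
        raise ValueError(f"unsupported Format: {req.format!r}")
    if not isinstance(req.sample_rate, int) or req.sample_rate <= 0:
        raise ValueError("SampleRate must be a positive integer")
    if req.channel not in (1, 2):
        raise ValueError("Channel must be 1 or 2")
    return req
```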

Response

The caller uses the Content-Type of the response to determine whether the TTS request succeeded.
On success, return the synthesized audio as a binary stream and set Transfer-Encoding: chunked in the HTTP response header. The Content-Type for each audio format is as follows:
| Audio Format | Content-Type |
| --- | --- |
| mp3 | audio/mpeg |
| ogg_opus | audio/ogg |
| pcm | audio/L16 |
| wav | audio/wav |
On failure, return a JSON body with the header Content-Type: application/json. The response body is:
{
  "error": {
    "code": "ERROR_CODE",
    "message": "A description of the error"
  }
}
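
Putting the response rules together, a service could look roughly like the sketch below, written with Python's standard `http.server`. Several parts are assumptions for illustration: `synthesize` is a hypothetical stand-in that emits silence instead of real speech, the error codes and the HTTP 400 status are invented placeholders, and a real wav stream would also need a RIFF header before the samples.

```python
import json
from http.server import BaseHTTPRequestHandler

# Content-Type per audio format, taken from the table above.
CONTENT_TYPES = {"mp3": "audio/mpeg", "ogg_opus": "audio/ogg",
                 "pcm": "audio/L16", "wav": "audio/wav"}
SUPPORTED_FORMATS = {"pcm", "wav"}  # only these are currently supported


def synthesize(text, sample_rate, channels):
    """Hypothetical engine: yields 100 ms of 16-bit silence per word."""
    frame = b"\x00\x00" * (sample_rate // 10) * channels
    for _ in text.split():
        yield frame


def frame_chunk(data: bytes) -> bytes:
    """Apply HTTP/1.1 chunked framing: hex length, CRLF, payload, CRLF."""
    return f"{len(data):X}\r\n".encode("ascii") + data + b"\r\n"


class TTSHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # chunked transfer requires HTTP/1.1

    def do_POST(self):
        try:
            length = int(self.headers.get("Content-Length", 0))
            req = json.loads(self.rfile.read(length))
            text = req["Text"]
            if not isinstance(text, str):
                raise TypeError("Text must be a string")
        except (ValueError, KeyError, TypeError):
            return self.send_error_json(400, "INVALID_REQUEST", "Text is required")
        fmt = req.get("Format", "wav")
        if fmt not in SUPPORTED_FORMATS:
            return self.send_error_json(400, "UNSUPPORTED_FORMAT",
                                        f"Format {fmt!r} is not supported")
        # Success: audio Content-Type plus chunked transfer encoding.
        self.send_response(200)
        self.send_header("Content-Type", CONTENT_TYPES[fmt])
        self.send_header("Transfer-Encoding", "chunked")
        self.end_headers()
        for audio in synthesize(text, req.get("SampleRate", 16000), req.get("Channel", 1)):
            self.wfile.write(frame_chunk(audio))
            self.wfile.flush()  # push each chunk out as soon as it is ready
        self.wfile.write(b"0\r\n\r\n")  # zero-length chunk terminates the stream

    def send_error_json(self, status, code, message):
        # Failure: JSON body in the error shape shown above.
        body = json.dumps({"error": {"code": code, "message": message}}).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
```

To try it locally, the handler can be served with `http.server.ThreadingHTTPServer(("127.0.0.1", 8080), TTSHandler).serve_forever()`. Writing and flushing each chunk as it is produced is what lets the caller start playback before synthesis of the full text has finished.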