Custom
Bring your own TTS engine by implementing TRTC's custom streaming protocol. This option gives you full control over the synthesis pipeline — use your proprietary model, on-premise deployment, or any third-party service not natively supported. Choose this if you have specialized voice requirements or need to integrate an in-house TTS solution.
Usage
To use a custom TTS engine, pass the following JSON in the TTSConfig field of the StartAIConversation API. Your TTS service must implement the TRTC custom TTS streaming protocol:
{
  "TTSType": "custom",
  "APIKey": "<your_api_key>",
  "APIUrl": "http://0.0.0.0:8080/stream-audio",
  "AudioFormat": "wav",
  "SampleRate": 16000,
  "AudioChannel": 1
}
Parameter reference
Field | Type | Required | Description |
TTSType | String | Yes | Fixed value: "custom". |
APIKey | String | Yes | API key for authentication with your TTS service. |
APIUrl | String | Yes | Your TTS service endpoint URL. |
AudioFormat | String | No | Output audio format. Currently supports: pcm, wav. Default: wav. |
SampleRate | Integer | No | Audio sample rate. Default: 16000 (16 kHz). Recommended: 16000. |
AudioChannel | Integer | No | Number of audio channels. 1 (mono) or 2 (stereo). Default: 1. |
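The fields above can be assembled and checked before calling StartAIConversation. The following is a minimal sketch, assuming the configuration is submitted as JSON text; the helper name build_tts_config and the example URL are hypothetical and not part of the TRTC API.

```python
import json

# Values taken from the parameter table above.
ALLOWED_FORMATS = {"pcm", "wav"}
ALLOWED_CHANNELS = {1, 2}


def build_tts_config(api_key, api_url, audio_format="wav",
                     sample_rate=16000, audio_channel=1):
    """Return TTSConfig as JSON text for a custom TTS engine (hypothetical helper)."""
    if not api_key or not api_url:
        raise ValueError("APIKey and APIUrl are required")
    if audio_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported AudioFormat: {audio_format!r}")
    if audio_channel not in ALLOWED_CHANNELS:
        raise ValueError(f"AudioChannel must be 1 or 2, got {audio_channel!r}")
    config = {
        "TTSType": "custom",  # fixed value for a custom engine
        "APIKey": api_key,
        "APIUrl": api_url,
        "AudioFormat": audio_format,
        "SampleRate": sample_rate,
        "AudioChannel": audio_channel,
    }
    return json.dumps(config)


if __name__ == "__main__":
    # The URL below is a placeholder, not a real endpoint.
    print(build_tts_config("<your_api_key>", "http://tts.example.internal/stream-audio"))
```

Rejecting unsupported values locally gives an earlier and clearer failure than a rejected API call.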
Custom TTS streaming protocol
Your TTS service must implement the following HTTP streaming protocol.
Calling Method
POST: http://xxxxxxxxxxxx/api/v1/tts/stream
Example
HTTP Request Header
Content-Type: application/json;charset=UTF-8
X-Task-Id: task_id_value
X-Rquest-Id: request_id
X-Sdk-App-Id: SdkAppId
X-User-Id: UserId
X-Room-Id: RoomId
X-Room-Id-Type: "0"
Authorization: Bearer "API-KEY"
Request
{
  "Text": "Hello, world! This is a test for the streaming TTS API.",
  "Format": "wav",
  "SampleRate": 16000,
  "Channel": 1
}
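To smoke-test your own service, you can reproduce the example request yourself. The sketch below builds it with Python's standard urllib; the function name build_tts_request, the localhost URL, and all ID values are hypothetical placeholders. Note that urllib normalizes header capitalization (for example X-task-id), which is harmless because HTTP header names are case-insensitive.

```python
import json
import urllib.request


def build_tts_request(url, api_key, text, task_id, request_id,
                      sdk_app_id, user_id, room_id, room_id_type="0"):
    """Build (but do not send) a POST request shaped like the example above."""
    payload = {"Text": text, "Format": "wav", "SampleRate": 16000, "Channel": 1}
    headers = {
        "Content-Type": "application/json;charset=UTF-8",
        "X-Task-Id": task_id,
        "X-Rquest-Id": request_id,  # header name spelled as in this document
        "X-Sdk-App-Id": str(sdk_app_id),
        "X-User-Id": user_id,
        "X-Room-Id": str(room_id),
        "X-Room-Id-Type": room_id_type,
        "Authorization": f"Bearer {api_key}",
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder endpoint and IDs; replace with your own service and values.
    req = build_tts_request(
        "http://localhost:8080/api/v1/tts/stream", "my-api-key",
        "Hello, world!", "task-1", "req-1", 1400000000, "user-1", 1001,
    )
    print(req.get_method(), req.full_url)
```

Passing the returned object to urllib.request.urlopen sends the request; reading the response in fixed-size blocks lets you observe the audio as it streams.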
HTTP Request Header
Field | Description |
Content-Type | application/json, with the charset parameter set to UTF-8 (sent as application/json;charset=UTF-8). |
X-Task-Id | ID of a conversation task. |
X-Rquest-Id | ID of the request. A retried request carries the same ID. |
X-Sdk-App-Id | AppId for the SDK. |
X-User-Id | User ID. |
X-Room-Id | Room ID. |
X-Room-Id-Type | Room ID type. 0: numeric room number, 1: string room number. |
Authorization | Authentication, in the format of Bearer "API-KEY". |
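On the service side, these headers can be read and the Authorization value checked before any synthesis starts. The following is a minimal sketch with hypothetical function names; it assumes the headers arrive as a plain dictionary and, because the example shows the key in quotes, it tolerates an optionally quoted key.

```python
import hmac


def _normalize(headers):
    # HTTP header names are case-insensitive, so compare in lowercase.
    return {name.lower(): value for name, value in headers.items()}


def check_bearer(headers, expected_key):
    """Return True if Authorization carries the expected API key."""
    value = _normalize(headers).get("authorization", "")
    scheme, _, token = value.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # Assumption: accept the key with or without surrounding double quotes.
    token = token.strip().strip('"')
    # Constant-time comparison avoids leaking key contents through timing.
    return hmac.compare_digest(token.encode("utf-8"), expected_key.encode("utf-8"))


def request_context(headers):
    """Collect the TRTC context headers into a dictionary for logging or routing."""
    h = _normalize(headers)
    return {
        "task_id": h.get("x-task-id"),
        "request_id": h.get("x-rquest-id"),  # spelled as in this document
        "sdk_app_id": h.get("x-sdk-app-id"),
        "user_id": h.get("x-user-id"),
        "room_id": h.get("x-room-id"),
        # "1" means a string room number, "0" a numeric one.
        "room_id_is_string": h.get("x-room-id-type", "0").strip('"') == "1",
    }
```

Because retries reuse the same request ID, logging request_id from this dictionary makes repeated attempts easy to correlate.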
Request
Parameter | Required | Type | Description |
Text | Yes | String | Speech text. |
Format | No | String | Output audio format. The protocol defines mp3, ogg_opus, pcm, and wav, but only pcm and wav are currently supported. Default: wav. |
SampleRate | No | Integer | Audio sample rate in Hz. Default: 16000 (16 kHz). Recommended: 16000. |
Channel | No | Integer | Audio channel. Valid values: 1, 2. The default value is 1. |
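A service can apply the defaults and restrictions from this table while parsing the request body. The sketch below is one way to do so; TTSRequest and parse_tts_request are hypothetical names, and raising ValueError on bad input is an assumed convention rather than something the protocol prescribes.

```python
import json
from dataclasses import dataclass

SUPPORTED_FORMATS = ("pcm", "wav")  # only these two are currently supported


@dataclass
class TTSRequest:
    text: str
    format: str = "wav"
    sample_rate: int = 16000
    channel: int = 1


def parse_tts_request(raw: bytes) -> TTSRequest:
    """Decode the JSON body, fill in defaults, and validate each field."""
    body = json.loads(raw.decode("utf-8"))  # malformed JSON raises ValueError
    if not isinstance(body, dict):
        raise ValueError("request body must be a JSON object")
    text = body.get("Text")
    if not isinstance(text, str) or not text:
        raise ValueError("Text is required")
    req = TTSRequest(
        text=text,
        format=body.get("Format", "wav"),
        sample_rate=body.get("SampleRate", 16000),
        channel=body.get("Channel", 1),
    )
    if req.format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported Format: {req.format!r}")
    if not isinstance(req.sample_rate, int) or req.sample_rate <= 0:
        raise ValueError("SampleRate must be a positive integer")
    if req.channel not in (1, 2):
        raise ValueError("Channel must be 1 or 2")
    return req
```

For example, a body containing only Text yields a request with Format wav, SampleRate 16000, and Channel 1.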
Response
The caller uses the Content-Type of the response to determine whether synthesis succeeded.
On success, return the audio as binary data and set Transfer-Encoding: chunked in the HTTP response header. Use the Content-Type that matches the audio format:
Audio Format | Content-Type |
mp3 | audio/mpeg |
ogg_opus | audio/ogg |
pcm | audio/L16 |
wav | audio/wav |
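The mapping above and the chunked framing can be sketched as follows. Most web frameworks apply chunked encoding automatically when a response is streamed, so the manual encode_chunked generator is only needed when writing to a raw socket; both function names are hypothetical.

```python
# Content-Type per audio format, copied from the table above.
CONTENT_TYPES = {
    "mp3": "audio/mpeg",
    "ogg_opus": "audio/ogg",
    "pcm": "audio/L16",
    "wav": "audio/wav",
}


def success_headers(audio_format):
    """Response headers for a successful streaming synthesis."""
    return {
        "Content-Type": CONTENT_TYPES[audio_format],
        "Transfer-Encoding": "chunked",
    }


def encode_chunked(chunks):
    """Frame an iterable of byte strings using HTTP/1.1 chunked transfer coding."""
    for chunk in chunks:
        if chunk:  # a zero-length chunk would end the stream prematurely
            # Each chunk is: size in hexadecimal, CRLF, data, CRLF.
            yield b"%X\r\n" % len(chunk) + chunk + b"\r\n"
    # A zero-size chunk followed by an empty line terminates the body.
    yield b"0\r\n\r\n"
```

Emitting each audio block as soon as the engine produces it, rather than buffering the whole utterance, is what lets playback begin before synthesis finishes.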
On failure, return a JSON body with the response header Content-Type: application/json. The body has the following structure:
{
  "error": {
    "code": "ERROR_CODE",
    "message": "A description of the error"
  }
}
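Both sides of this rule can be sketched in a few lines: the service builds the error body, and a caller separates audio from errors by inspecting Content-Type. The function names and the SYNTHESIS_FAILED code are hypothetical, since the protocol description does not list specific error codes.

```python
import json


def error_response(code, message):
    """Build the headers and body for a failed synthesis."""
    body = json.dumps({"error": {"code": code, "message": message}}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Content-Length": str(len(body)),
    }
    return headers, body


def classify_response(content_type, body):
    """Decide from Content-Type whether a response is audio or an error."""
    # Ignore parameters such as ";charset=UTF-8" and compare case-insensitively.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type.startswith("audio/"):
        return "audio", body
    if media_type == "application/json":
        error = json.loads(body.decode("utf-8")).get("error", {})
        return "error", (error.get("code"), error.get("message"))
    return "unknown", body


if __name__ == "__main__":
    headers, body = error_response("SYNTHESIS_FAILED", "The engine timed out")
    print(classify_response(headers["Content-Type"], body))
```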