AI Conversation Real-Time Captions
Tencent Real-Time Communication (TRTC) AI conversation can display real-time captions. Captions are delivered as TRTC custom messages, which keeps them synchronized with the audio conversation at millisecond-level latency.
Characteristics
1. Real-time: Captions are synchronized with the audio conversation at millisecond-level latency.
2. Flexible: Captions use a custom message format, which makes them easy to integrate and extend.
Message Format
Real-time caption messages are JSON objects with the following fields:
| Field | Type | Description |
| --- | --- | --- |
| type | Number | Message type. 10000 indicates real-time captions. |
| sender | String | The userID of the speaker. |
| receiver | Array | List of receiver userIDs. In practice, the message is broadcast to everyone in the room. |
| payload | Object | Message payload, containing the caption details. |
The payload object contains the following fields:
| Field | Type | Description |
| --- | --- | --- |
| text | String | Original text from Automatic Speech Recognition (ASR). |
| start_time | String | Start time of this sentence. Format: "HH:MM:SS". |
| end_time | String | End time of this sentence. Format: "HH:MM:SS". |
| roundid | String | Unique ID for a round of conversation. |
| end | Boolean | true indicates that the sentence is complete; false indicates an intermediate result. |
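Because start_time and end_time arrive as "HH:MM:SS" strings, a client that wants to compare or subtract them usually converts them to seconds first. The following is a minimal sketch of that conversion. The helper names `hmsToSeconds` and `captionDuration` are hypothetical and not part of the TRTC SDK, and the source does not state what reference point the timestamps are measured from, so the sketch only computes values relative to each other.

```javascript
// Hypothetical helper (not part of the TRTC SDK): convert an "HH:MM:SS"
// caption timestamp to a number of seconds. Returns null if the string
// does not match the documented format.
function hmsToSeconds(hms) {
  const match = /^(\d{2,}):([0-5]\d):([0-5]\d)$/.exec(hms);
  if (!match) return null;
  const [, hours, minutes, seconds] = match;
  return Number(hours) * 3600 + Number(minutes) * 60 + Number(seconds);
}

// Hypothetical helper: duration of one caption sentence in seconds,
// or null if either timestamp is malformed.
function captionDuration(payload) {
  const start = hmsToSeconds(payload.start_time);
  const end = hmsToSeconds(payload.end_time);
  if (start === null || end === null) return null;
  return end - start;
}
```

Returning null for malformed input, rather than throwing, lets the caller skip timing logic for a single bad message without interrupting caption display.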
Sample Message
{
  "type": 10000,
  "sender": "user_a",
  "receiver": [],
  "payload": {
    "text": "Hello. Nice to meet you.",
    "start_time": "00:00:01",
    "end_time": "00:00:03",
    "roundid": "conversation_123456",
    "end": true
  }
}
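A receiver may get custom messages that are not captions, or that are not valid JSON at all, so it can help to validate the message before using it. The sketch below decodes the raw bytes of a custom message and checks it against the fields documented above. The function name `parseCaptionMessage` and the constant `CAPTION_MESSAGE_TYPE` are hypothetical, and the choice to require `text` and `end` is an assumption about which fields a renderer needs, not a rule stated by TRTC.

```javascript
// 10000 is the documented type value for real-time caption messages.
const CAPTION_MESSAGE_TYPE = 10000;

// Hypothetical helper (not part of the TRTC SDK): decode the raw bytes of a
// custom message and return the parsed object if it looks like a real-time
// caption message, otherwise return null.
function parseCaptionMessage(bytes) {
  let message;
  try {
    message = JSON.parse(new TextDecoder().decode(bytes));
  } catch (err) {
    return null; // payload is not valid JSON
  }
  if (message === null || typeof message !== "object") return null;
  if (message.type !== CAPTION_MESSAGE_TYPE) return null;

  const payload = message.payload;
  if (payload === null || typeof payload !== "object") return null;
  // Assumption: a caption is only usable if it has text and an end flag.
  if (typeof payload.text !== "string" || typeof payload.end !== "boolean") {
    return null;
  }
  return message;
}
```

With this helper, a message handler can pass the received data to `parseCaptionMessage` and ignore any message for which it returns null.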
Implementation Notes
1. Message processing: The receiver needs to correctly parse the JSON message and identify real-time caption messages based on the type field.
2. Time synchronization: start_time and end_time are used to ensure the captions align correctly with the audio.
3. Conversation segmentation: The end field is used to determine whether a sentence has ended. It can be used for interface updates or for storing complete conversations.
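The segmentation note above can be sketched as a small in-memory store that shows the latest intermediate text while a sentence is in progress and moves it to a transcript when `end` is true. The class name `CaptionTracker` and its methods are hypothetical. The sketch also assumes that each intermediate message carries the full text recognized so far, so a newer message replaces the older one instead of being appended; verify this against the messages you actually receive.

```javascript
// Hypothetical caption store (not part of the TRTC SDK). It keeps the latest
// intermediate text per speaker and round, and appends completed sentences
// to a transcript.
class CaptionTracker {
  constructor() {
    this.pending = new Map(); // "sender/roundid" -> latest intermediate text
    this.transcript = [];     // completed sentences, in arrival order
  }

  // Feed one already-parsed caption message (type 10000) into the tracker.
  handle(message) {
    const { sender, payload } = message;
    const key = `${sender}/${payload.roundid}`;
    if (payload.end) {
      // Sentence finished: clear the in-progress text and store the final one.
      this.pending.delete(key);
      this.transcript.push({ sender, roundid: payload.roundid, text: payload.text });
    } else {
      // Assumption: intermediate text replaces the previous intermediate text.
      this.pending.set(key, payload.text);
    }
  }

  // Text currently in progress for a speaker and round, or "" if none.
  liveText(sender, roundid) {
    return this.pending.get(`${sender}/${roundid}`) || "";
  }
}
```

A UI could render `liveText` in a highlighted "speaking" row and render `transcript` as the conversation history; keying by sender and roundid keeps captions from different speakers from overwriting each other.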
Parsing Web SDK Custom Messages
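trtcClient.on(TRTC.EVENT.CUSTOM_MESSAGE, (event) => {
  // event.data carries the raw message bytes; decode them to a string first.
  const data = new TextDecoder().decode(event.data);
  console.log(`receive custom msg from ${event.userId} cmdId: ${event.cmdId} seq: ${event.seq} data: ${data}`);

  let jsonData;
  try {
    jsonData = JSON.parse(data);
  } catch (err) {
    return; // Not JSON, so not a caption message.
  }

  // Only type 10000 messages with a payload are real-time captions.
  if (!jsonData || jsonData.type !== 10000 || !jsonData.payload) return;

  if (jsonData.payload.end === false) {
    // Intermediate state of captions
  } else if (jsonData.payload.end === true) {
    // End of a sentence
  }
});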