Real-time captions

Tencent Real-Time Communication (TRTC) AI conversation can display real-time captions. Captions are delivered as TRTC custom messages, which keeps them synchronized with the audio conversation at millisecond-level latency.
Characteristics
1. Real-time: Captions are synchronized with the audio conversation at millisecond-level latency.
2. Flexible: Captions use a custom message format, which makes them easy to integrate and extend.
Message Format
Real-time caption messages are JSON objects with the following fields:
Field	Type	Description
type	Number	Message type. 10000 indicates real-time captions.
sender	String	The userID of the speaker.
receiver	Array	The list of receiver userIDs. This message is actually broadcast within a room.
payload	Object	Message payload, containing detailed caption information.
The payload object contains the following fields:
Field	Type	Description
text	String	Original text from Automatic Speech Recognition (ASR).
start_time	String	Start time of this sentence. Format: "HH:MM:SS".
end_time	String	End time of this sentence. Format: "HH:MM:SS".
roundid	String	Unique ID for a round of conversation.
end	Boolean	true indicates that this is a complete sentence; false indicates an intermediate result.
Sample Message
{
  "type": 10000,
  "sender": "user_a",
  "receiver": [],
  "payload": {
    "text": "Hello. Nice to meet you.",
    "start_time": "00:00:01",
    "end_time": "00:00:03",
    "roundid": "conversation_123456",
    "end": true
  }
}
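
Before acting on a message, a receiver may want to confirm that it has the shape described above. The following is a minimal sketch in JavaScript. The names isCaptionMessage and CAPTION_MESSAGE_TYPE are illustrative and not part of the TRTC SDK, and the set of fields checked is an assumption about what a caption renderer typically needs.

```javascript
// Hypothetical helper, not part of the TRTC SDK.
const CAPTION_MESSAGE_TYPE = 10000;

// Returns true if msg looks like a real-time caption message:
// type 10000, a string sender, and a payload with text, roundid, and end.
function isCaptionMessage(msg) {
    if (msg === null || typeof msg !== "object") return false;
    if (msg.type !== CAPTION_MESSAGE_TYPE) return false;
    if (typeof msg.sender !== "string") return false;

    const payload = msg.payload;
    if (payload === null || typeof payload !== "object") return false;

    return typeof payload.text === "string"
        && typeof payload.roundid === "string"
        && typeof payload.end === "boolean";
}
```

Applied to the sample message above, isCaptionMessage returns true. A message with a different type value or without a payload object is rejected.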

Implementation Notes
1. Message processing: The receiver needs to correctly parse the JSON message and identify real-time caption messages based on the type field.
2. Time synchronization: start_time and end_time are used to ensure the captions align correctly with the audio.
3. Conversation segmentation: The end field is used to determine whether a sentence has ended. It can be used for interface updates or for storing complete conversations.
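
To compare start_time and end_time with other timing values, it can help to convert the "HH:MM:SS" strings to seconds. The following is a minimal sketch that assumes the strings always consist of three colon-separated numeric fields, as in the sample message. The function names timeToSeconds and captionDuration are illustrative.

```javascript
// Hypothetical helper: converts an "HH:MM:SS" string to a number of seconds.
// Returns NaN if the string does not match the expected format.
function timeToSeconds(hms) {
    const match = /^(\d+):([0-5]\d):([0-5]\d)$/.exec(hms);
    if (!match) return NaN;
    const [, hours, minutes, seconds] = match;
    return Number(hours) * 3600 + Number(minutes) * 60 + Number(seconds);
}

// Duration of a caption sentence in seconds, computed from its payload.
function captionDuration(payload) {
    return timeToSeconds(payload.end_time) - timeToSeconds(payload.start_time);
}
```

For the sample message, captionDuration(payload) evaluates to 2, since the sentence runs from 00:00:01 to 00:00:03.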
Parsing Web SDK Custom Messages
    trtcClient.on(TRTC.EVENT.CUSTOM_MESSAGE, (event) => {
        // event.data is binary; decode it to a string before parsing.
        const data = new TextDecoder().decode(event.data);
        console.log(`receive custom msg from ${event.userId} cmdId: ${event.cmdId} seq: ${event.seq} data: ${data}`);

        let jsonData;
        try {
            jsonData = JSON.parse(data);
        } catch (err) {
            return; // Not valid JSON, so not a caption message.
        }
        // Ignore anything that is not a real-time caption message.
        if (!jsonData || jsonData.type !== 10000 || !jsonData.payload) {
            return;
        }

        if (jsonData.payload.end) {
            // End of a sentence
        } else {
            // Intermediate state of captions
        }
    });
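
One way to use the end flag for interface updates is to keep the latest interim text per speaker and move it into a transcript once the sentence is complete. The sketch below assumes that each message carries the full text of the sentence so far, so a newer interim message replaces the previous one; the source does not state this, so verify it against real traffic. createCaptionStore and its methods are illustrative names, not part of the TRTC SDK.

```javascript
// Hypothetical caption store, not part of the TRTC SDK.
// Assumption: each message carries the full text of the sentence so far,
// so a newer interim message simply replaces the previous one.
function createCaptionStore() {
    const interim = new Map();  // sender -> text of the sentence in progress
    const transcript = [];      // completed sentences, in arrival order

    return {
        // Feed one parsed caption message (type 10000) into the store.
        handle(msg) {
            const { text, roundid, end } = msg.payload;
            if (end) {
                // Sentence finished: clear the live line and archive the text.
                interim.delete(msg.sender);
                transcript.push({ sender: msg.sender, roundid, text });
            } else {
                interim.set(msg.sender, text);
            }
        },
        // Text currently in progress for a sender, or "" if there is none.
        current(sender) {
            return interim.get(sender) ?? "";
        },
        // Copy of all completed sentences.
        completed() {
            return transcript.slice();
        },
    };
}
```

In the CUSTOM_MESSAGE handler, store.handle(jsonData) would be called after the type check. The interface can then render store.current(userId) as the live caption line and store.completed() as the conversation history.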