Real-time captions

Tencent Real-Time Communication (TRTC) AI conversation can display real-time captions. Captions are delivered as TRTC custom messages, which keeps them synchronized with the conversation audio at millisecond-level latency.

Characteristics

1. Real-time: Captions stay synchronized with the conversation audio, with millisecond-level latency.
2. Flexible: Captions use a custom message format, which makes them easy to integrate and extend.

Message Format

Real-time caption messages are JSON objects with the following fields:

| Field | Type | Description |
| --- | --- | --- |
| type | Number | Message type. 10000 indicates real-time captions. |
| sender | String | The userID of the speaker. |
| receiver | Array | The list of receiver userIDs. In practice, the message is broadcast to everyone in the room. |
| payload | Object | Message payload, containing detailed caption information. |

The payload object contains the following fields:

| Field | Type | Description |
| --- | --- | --- |
| text | String | Original text from Automatic Speech Recognition (ASR). |
| start_time | String | Start time of this sentence. Format: "HH:MM:SS". |
| end_time | String | End time of this sentence. Format: "HH:MM:SS". |
| roundid | String | Unique ID for a round of conversation. |
| end | Boolean | true indicates that this is a complete sentence. |

Sample Message

{
  "type": 10000,
  "sender": "user_a",
  "receiver": [],
  "payload": {
    "text": "Hello. Nice to meet you.",
    "start_time": "00:00:01",
    "end_time": "00:00:03",
    "roundid": "conversation_123456",
    "end": true
  }
}
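
As a rough illustration of how a receiver might check an incoming message against the format above, the sketch below parses a JSON string and returns the message only if it looks like a caption message. The helper name `parseCaptionMessage` and the constant `CAPTION_MESSAGE_TYPE` are hypothetical and not part of any TRTC SDK. The set of fields validated here is an assumption; adjust it to what your application needs.

```javascript
// Message type value documented above for real-time captions.
const CAPTION_MESSAGE_TYPE = 10000;

// Hypothetical helper (not an SDK API): returns the parsed caption message,
// or null if the input is not a well-formed caption message.
function parseCaptionMessage(raw) {
  let msg;
  try {
    msg = JSON.parse(raw);
  } catch (err) {
    return null; // Not valid JSON.
  }
  if (msg === null || typeof msg !== "object") return null;
  if (msg.type !== CAPTION_MESSAGE_TYPE) return null;

  const payload = msg.payload;
  if (payload === null || typeof payload !== "object") return null;
  // Assumption: text and end are the minimum needed to render a caption.
  if (typeof payload.text !== "string" || typeof payload.end !== "boolean") {
    return null;
  }
  return msg;
}

// Usage with the sample message shown above.
const sample = JSON.stringify({
  type: 10000,
  sender: "user_a",
  receiver: [],
  payload: {
    text: "Hello. Nice to meet you.",
    start_time: "00:00:01",
    end_time: "00:00:03",
    roundid: "conversation_123456",
    end: true,
  },
});
const parsed = parseCaptionMessage(sample);
console.log(parsed.sender, parsed.payload.text, parsed.payload.end);
```

Returning null instead of throwing lets the caller silently skip custom messages that belong to other features sharing the same channel.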

Implementation Notes

1. Message processing: The receiver must parse each JSON message and use the type field to identify real-time caption messages.
2. Time synchronization: Use start_time and end_time to align the captions with the audio.
3. Conversation segmentation: The end field indicates whether a sentence has ended. Use it to decide when to update the interface or store a complete sentence.
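
The notes above can be sketched as a small caption tracker. This is a minimal illustration, not an SDK feature: `timeToSeconds` and `CaptionTracker` are hypothetical names. It assumes that each intermediate message carries the full text of the sentence so far (so the latest message replaces the previous one) and that a sentence in progress can be keyed by sender plus roundid. Verify both assumptions against the messages your application actually receives.

```javascript
// Convert a timestamp in the documented "HH:MM:SS" format to seconds.
function timeToSeconds(hms) {
  const m = /^(\d+):(\d{2}):(\d{2})$/.exec(hms);
  if (!m) {
    throw new Error(`unexpected time format: ${hms}`);
  }
  return Number(m[1]) * 3600 + Number(m[2]) * 60 + Number(m[3]);
}

// Hypothetical helper that separates in-progress captions from finished ones.
class CaptionTracker {
  constructor() {
    this.pending = new Map(); // key -> latest intermediate text
    this.completed = [];      // finished sentences, in arrival order
  }

  // msg is an already-parsed caption message (type 10000).
  handle(msg) {
    const { sender, payload } = msg;
    // Assumption: sender + roundid identifies the sentence being spoken.
    const key = `${sender}:${payload.roundid}`;
    if (payload.end) {
      // Complete sentence: drop the intermediate state and store the result.
      this.pending.delete(key);
      this.completed.push({
        sender,
        text: payload.text,
        startSec: timeToSeconds(payload.start_time),
        endSec: timeToSeconds(payload.end_time),
      });
    } else {
      // Intermediate state: keep only the latest text for display.
      this.pending.set(key, payload.text);
    }
  }
}

// Usage: one intermediate update followed by the final sentence.
const tracker = new CaptionTracker();
tracker.handle({
  type: 10000, sender: "user_a", receiver: [],
  payload: { text: "Hello.", start_time: "00:00:01", end_time: "00:00:02",
             roundid: "conversation_123456", end: false },
});
tracker.handle({
  type: 10000, sender: "user_a", receiver: [],
  payload: { text: "Hello. Nice to meet you.", start_time: "00:00:01",
             end_time: "00:00:03", roundid: "conversation_123456", end: true },
});
console.log(tracker.completed);
```

Keeping intermediate text separate from completed sentences lets the interface redraw the current line on every update while appending to the transcript only once per sentence.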

Parsing Web SDK Custom Messages

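trtcClient.on(TRTC.EVENT.CUSTOM_MESSAGE, (event) => {
  // event.data is binary; decode it to text before parsing.
  const data = new TextDecoder().decode(event.data);
  console.log(`receive custom msg from ${event.userId} cmdId: ${event.cmdId} seq: ${event.seq} data: ${data}`);

  let jsonData;
  try {
    jsonData = JSON.parse(data);
  } catch (err) {
    return; // Not JSON, so not a caption message.
  }
  // Ignore anything that is not a real-time caption message.
  if (!jsonData || jsonData.type !== 10000 || !jsonData.payload) return;

  if (jsonData.payload.end === false) {
    // Intermediate state of captions
  } else if (jsonData.payload.end === true) {
    // End of a sentence
  }
});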