Quick Run-Through

This article describes how to implement a Conversational AI solution based on the Tencent RTC SDK.

Introduction

The solution uses the Tencent RTC SDK to call the Tencent RTC service. By calling the Conversational AI API, it delivers a very low-latency service. The integration is flexible: you can plug in third-party LLM, TTS, and STT services according to the actual needs of your business. Across the solution, we have made many technical optimizations for real-time voice noise reduction, intelligent AI interruption, and context management, and we continue to improve the user experience.

Architecture Diagram



Business Process Diagram



Integration Guide

Prerequisites

Note:
1. Contact us to activate the Conversational AI service.
2. Create a TTS application and an LLM application (third-party services can be used).

I. Integrate the Tencent RTC SDK

Step One: Import the Tencent RTC SDK Into the Project

Step Two: Enter the Tencent RTC Room

Step Three: Publish the Audio Stream

Android, iOS, and Flutter
Call startLocalAudio to enable microphone capture. The quality parameter selects the capture mode. Despite its name, a higher quality value is not always better: each business scenario has a best-fitting setting, so the parameter is more accurately understood as a scene selector.
SPEECH mode is recommended for Conversational AI scenarios. In this mode, the SDK audio module focuses on the voice signal, filters out as much ambient noise as possible, and makes the audio data more resilient to poor network quality. This mode is therefore well suited to scenarios that focus on vocal communication, such as "video calls" and "online meetings".
Android
// Enable microphone capture and set the current scenario to voice mode (strong noise suppression, resilient to poor network conditions)
mCloud.startLocalAudio(TRTCCloudDef.TRTC_AUDIO_QUALITY_SPEECH);
iOS&Mac
self.trtcCloud = [TRTCCloud sharedInstance];
// Enable microphone capture and set the current scenario to voice mode (strong noise suppression, resilient to poor network conditions)
[self.trtcCloud startLocalAudio:TRTCAudioQualitySpeech];
Flutter
// Enable microphone capture and set the current scenario to voice mode (strong noise suppression, resilient to poor network conditions)
trtcCloud.startLocalAudio(TRTCAudioQuality.speech);
Web&H5
Use the trtc.startLocalAudio() method to enable the microphone and publish to the room.
await trtc.startLocalAudio();
Note:
Conversational AI places high demands on noise reduction at the audio capture end. For a better experience, it is recommended to enable the AI Denoiser. We also have a noise reduction model trained specifically for Conversational AI. To use it, contact your business representative or submit a ticket.

II. Initiate Conversational AI

Start AI Conversation

Call the StartAIConversation API from your backend to bring the robot into the room and start a real-time AI conversation.
Note:
RoomId must be consistent with the RoomId used by the client to enter the room, and the type of room number (digit room ID/string room number) must also be the same (i.e., the robot and user need to be in the same room).
LLMConfig and TTSConfig are both in JSON format and must be configured correctly for the real-time AI conversation to start.
Currently supported LLMConfig and TTSConfig configuration description:
It is advisable to validate the parameters of LLMConfig and TTSConfig through the following page before making your initial call to the StartAIConversation API, as detailed below:
Note:
If all the above steps are performed correctly, you can now have a conversation with AI!
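The two notes above (matching room ID type, and JSON-formatted LLMConfig and TTSConfig) can be checked before the backend call is made. The following is a minimal sketch, not the official request builder: the helper name buildStartAIConversationParams, the RoomIdType field with its 0/1 values, and the placeholder config keys are all assumptions for illustration and should be verified against the StartAIConversation API reference.

```javascript
// Sketch only. Assumptions: the helper name, the RoomIdType field and its
// 0 (numeric) / 1 (string) values, and the placeholder config contents.
function buildStartAIConversationParams({ roomId, llmConfig, ttsConfig }) {
  const isNumericRoom = typeof roomId === "number";
  if (!isNumericRoom && typeof roomId !== "string") {
    throw new TypeError("roomId must be a number or a string");
  }
  // Both configs must be plain objects so they serialize to JSON objects.
  for (const [name, cfg] of Object.entries({ llmConfig, ttsConfig })) {
    if (cfg === null || typeof cfg !== "object" || Array.isArray(cfg)) {
      throw new TypeError(`${name} must be a plain object`);
    }
  }
  return {
    RoomId: String(roomId),
    // Must match the room ID type the client used when entering the room.
    RoomIdType: isNumericRoom ? 0 : 1,
    LLMConfig: JSON.stringify(llmConfig),
    TTSConfig: JSON.stringify(ttsConfig),
  };
}

// Usage with placeholder configs (real keys depend on your LLM/TTS provider).
const startParams = buildStartAIConversationParams({
  roomId: 12345,
  llmConfig: { placeholderKey: "placeholder value" },
  ttsConfig: { placeholderKey: "placeholder value" },
});
console.log(startParams);
```

Deriving the room ID type from the same value the client uses keeps the robot and the user in the same room by construction.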

III. Receive Conversational AI Captions and AI Status

Use the Receive Custom Messages feature of the Tencent RTC SDK to listen for callbacks on the client and receive real-time subtitles, AI status, and other data. cmdID is fixed at 1.

Receive Real-Time Subtitles

Message Format
{
    "type": 10000, // 10000 indicates the delivery of real-time subtitles.
    "sender": "user_a", // The user ID of the speaker.
    "receiver": [], // List of receiver user IDs. This message is actually broadcast within the room.
    "payload": {
        "text": "", // The text recognized by Automatic Speech Recognition (ASR).
        "translation_text": "", // The translated text.
        "start_time": "00:00:01", // The start time of this sentence.
        "end_time": "00:00:02", // The end time of this sentence.
        "roundid": "xxxxx", // A unique identifier for a single conversation round.
        "end": true // If true, it indicates this is a complete sentence.
    }
}
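One way to turn these messages into a running transcript is sketched below. It assumes that each intermediate message (end is false) carries the full text of the sentence recognized so far, so the newest message replaces the previous one rather than being appended; the source does not state this, so verify it against real traffic. The SubtitleStore class is a hypothetical helper, not part of the SDK.

```javascript
// Sketch only. SubtitleStore is a hypothetical helper. Assumption: interim
// messages contain the whole sentence so far, so they replace earlier ones.
class SubtitleStore {
  constructor() {
    this.finished = [];       // completed sentences, in arrival order
    this.pending = new Map(); // sender -> latest interim text
  }

  // Returns the stored line, or null when the message is not a subtitle.
  apply(cmdId, message) {
    if (cmdId !== 1 || message.type !== 10000) return null;
    const { text, roundid, end } = message.payload;
    const line = { sender: message.sender, roundId: roundid, text };
    if (end === true) {
      this.finished.push(line);
      this.pending.delete(message.sender);
    } else {
      this.pending.set(message.sender, text);
    }
    return line;
  }

  // Finished sentences first, then whatever is still being recognized.
  transcript() {
    const done = this.finished.map((l) => `${l.sender}: ${l.text}`);
    const live = [...this.pending].map(([sender, text]) => `${sender}: ${text}...`);
    return done.concat(live);
  }
}

const store = new SubtitleStore();
const subtitle = (text, end) => ({
  type: 10000, sender: "user_a", receiver: [],
  payload: { text, roundid: "round-1", end },
});
store.apply(1, subtitle("hello", false));
store.apply(1, subtitle("hello there", true));
console.log(store.transcript());
```

Keeping interim text separate from finished sentences lets the UI redraw the live line without touching sentences that are already complete.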

Receive Chatbot Status

Message Format
{
    "type": 10001, // Chatbot status.
    "sender": "user_a", // The user ID of the sender, which represents the chatbot's ID in this case.
    "receiver": [], // List of receiver user IDs. This message is actually broadcast within the room.
    "payload": {
        "roundid": "xxx", // A unique identifier for a single conversation round.
        "timestamp": 123,
        "state": 1 // 1 Listening 2 Thinking 3 Speaking 4 Interrupted
    }
}
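The numeric state values can be mapped to readable labels before they reach the UI. The sketch below uses only the four values documented in the message format; the describeAiStatus helper and the shape of its return value are hypothetical.

```javascript
// Sketch only. describeAiStatus is a hypothetical helper; the four state
// values come from the message format above.
const AI_STATE = Object.freeze({
  1: "Listening",
  2: "Thinking",
  3: "Speaking",
  4: "Interrupted",
});

// Returns a readable status object, or null for non-status messages.
function describeAiStatus(message) {
  if (message.type !== 10001) return null;
  const { roundid, timestamp, state } = message.payload;
  return {
    botId: message.sender,
    roundId: roundid,
    timestamp,
    state: AI_STATE[state] ?? `Unknown(${state})`,
  };
}

const status = describeAiStatus({
  type: 10001,
  sender: "user_bot",
  receiver: [],
  payload: { roundid: "round-1", timestamp: 123, state: 3 },
});
console.log(status.state);
```

Falling back to an "Unknown" label keeps the client from breaking if a state outside the documented range ever arrives.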


Sample Code

Android
@Override
public void onRecvCustomCmdMsg(String userId, int cmdID, int seq, byte[] message) {
    String data = new String(message, StandardCharsets.UTF_8);
    try {
        JSONObject jsonData = new JSONObject(data);
        Log.i(TAG, String.format("receive custom msg from %s cmdId: %d seq: %d data: %s", userId, cmdID, seq, data));
    } catch (JSONException e) {
        // Log malformed messages instead of crashing inside the SDK callback.
        Log.e(TAG, "onRecvCustomCmdMsg err", e);
    }
}
iOS
func onRecvCustomCmdMsgUserId(_ userId: String, cmdID: Int, seq: UInt32, message: Data) {
    if cmdID == 1 {
        do {
            if let jsonObject = try JSONSerialization.jsonObject(with: message, options: []) as? [String: Any] {
                print("Dictionary: \(jsonObject)")
                // handleMessage(jsonObject)
            } else {
                print("The data is not a dictionary.")
            }
        } catch {
            print("Error parsing JSON: \(error)")
        }
    }
}
Web&H5
trtcClient.on(TRTC.EVENT.CUSTOM_MESSAGE, (event) => {
    const data = new TextDecoder().decode(event.data);
    const jsonData = JSON.parse(data);
    console.log(`receive custom msg from ${event.userId} cmdId: ${event.cmdId} seq: ${event.seq} data: ${data}`);
    if (jsonData.type === 10000 && jsonData.payload.end === false) {
        // Intermediate subtitle: the sentence is still being recognized.
    } else if (jsonData.type === 10000 && jsonData.payload.end === true) {
        // The sentence is complete.
    }
});

IV. Send Custom Messages

Tencent RTC custom messages are sent from the client, with cmdID fixed at 2.
You can skip the ASR process by sending custom text, which lets you communicate with the AI service directly through text.
{
    "type": 20000, // Send custom text message on the client side
    "sender": "user_a", // The user ID of the sender. The server will check if this user ID is valid.
    "receiver": ["user_bot"], // List of receiver user IDs. Only the chatbot user ID needs to be specified. The server will validate if this user ID is valid.
    "payload": {
        "id": "uuid", // Message ID, can use UUID; used for troubleshooting.
        "message": "xxx", // Message content.
        "timestamp": 123 // Timestamp, used for troubleshooting.
    }
}
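A custom text message in this format could be assembled and encoded as in the sketch below. The buildTextMessage helper is hypothetical, and two details are assumptions the source does not specify: the timestamp is taken in milliseconds, and the fallback message ID is derived from the sender and timestamp when no UUID is supplied.

```javascript
// Sketch only. buildTextMessage is a hypothetical helper. Assumptions:
// timestamp in milliseconds, and a sender-plus-timestamp fallback ID.
function buildTextMessage({ sender, botId, text, id, now = Date.now() }) {
  if (typeof text !== "string" || text.length === 0) {
    throw new TypeError("text must be a non-empty string");
  }
  const message = {
    type: 20000,
    sender,
    receiver: [botId], // only the chatbot user ID is needed
    payload: {
      id: id ?? `${sender}_${now}`,
      message: text,
      timestamp: now,
    },
  };
  // Custom messages travel as bytes, so encode the JSON as UTF-8.
  return new TextEncoder().encode(JSON.stringify(message));
}

const bytes = buildTextMessage({
  sender: "user_a",
  botId: "user_bot",
  text: "What is the weather today?",
});
console.log(new TextDecoder().decode(bytes));
// On Web, the bytes could then be passed to the SDK, for example:
// trtc.sendCustomMessage({ cmdId: 2, data: bytes.buffer });
```

Building the bytes in a separate function makes the payload easy to inspect and test without a live room connection.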
Interruption can be achieved by sending an interruption signal.
{
    "type": 20001, // Send interruption signal on the client side
    "sender": "user_a", // The user ID of the sender. The server will check if this user ID is valid.
    "receiver": ["user_bot"], // List of receiver user IDs. Only the chatbot user ID needs to be specified. The server will validate if this user ID is valid.
    "payload": {
        "id": "uuid", // Message ID, can use UUID; used for troubleshooting.
        "timestamp": 123 // Timestamp, used for troubleshooting.
    }
}

Sample Code

Android
public void sendInterruptCode() {
    try {
        int cmdID = 0x2;

        // Timestamp in seconds, sent as a number to match the message format above.
        long timeStamp = System.currentTimeMillis() / 1000;
        JSONObject payLoadContent = new JSONObject();
        payLoadContent.put("timestamp", timeStamp);
        payLoadContent.put("id", String.valueOf(GenerateTestUserSig.SDKAPPID) + "_" + mRoomId);

        String[] receivers = new String[]{robotUserId};

        JSONObject interruptContent = new JSONObject();
        interruptContent.put("type", AICustomMsgType.AICustomMsgType_Send_Interrupt_CMD);
        interruptContent.put("sender", mUserId);
        interruptContent.put("receiver", new JSONArray(receivers));
        interruptContent.put("payload", payLoadContent);

        String interruptString = interruptContent.toString();
        byte[] data = interruptString.getBytes(StandardCharsets.UTF_8);

        Log.i(TAG, "sendInterruptCode :" + interruptString);

        // reliable = true, ordered = true
        mTRTCCloud.sendCustomCmdMsg(cmdID, data, true, true);
    } catch (JSONException e) {
        throw new RuntimeException(e);
    }
}
iOS
@objc func interruptAi() {
    print("interruptAi")
    let cmdId = 0x2
    let timestamp = Int(Date().timeIntervalSince1970 * 1000)
    let payload = [
        "id": userId + "_\(roomId)" + "_\(timestamp)", // Message ID, can use UUID; used for troubleshooting.
        "timestamp": timestamp // Timestamp, used for troubleshooting.
    ] as [String : Any]
    let dict = [
        "type": 20001,
        "sender": userId,
        "receiver": [botId],
        "payload": payload
    ] as [String : Any]
    do {
        let jsonData = try JSONSerialization.data(withJSONObject: dict, options: [])
        self.trtcCloud.sendCustomCmdMsg(cmdId, data: jsonData, reliable: true, ordered: true)
    } catch {
        print("Error serializing dictionary to JSON: \(error)")
    }
}
Web&H5
const message = {
    "type": 20001,
    "sender": "user_a",
    "receiver": ["user_bot"],
    "payload": {
        "id": "uuid",
        "timestamp": 123
    }
};

trtc.sendCustomMessage({
    cmdId: 2,
    data: new TextEncoder().encode(JSON.stringify(message)).buffer
});

V. Stop the Conversational AI and Exit the Tencent RTC Room

1. The server stops the conversational AI task.
Call the StopAIConversation API from your backend to terminate the conversation.
2. The client exits the Tencent RTC room. For details, refer to Exit the Room (Android, iOS, Windows, and Mac).

VI. Other Features

Other Server-Side APIs

Query conversational AI task status: DescribeAIConversation
You can query the status of a conversational AI task. There are four values:
1.1 Idle indicates that the task has not started.
1.2 Preparing indicates that the task is in preparation.
1.3 InProgress indicates that the task is running.
1.4 Stopped indicates that the task has been stopped and resources are being cleaned up.
Update conversational AI startup parameters: UpdateAIConversation
During the conversation, the TTS voice (timbre) can be updated dynamically.
Control conversational AI task: ControlAIConversation
Use this API when you want the chatbot to actively broadcast text.
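The four task states returned by DescribeAIConversation lend themselves to a simple polling loop on the backend. The sketch below is one possible approach, not an official client: waitUntilInProgress is a hypothetical helper, and describeStatus stands in for whatever function your backend uses to call DescribeAIConversation and extract the status string.

```javascript
// Sketch only. waitUntilInProgress and describeStatus are hypothetical;
// describeStatus should resolve to "Idle", "Preparing", "InProgress", or "Stopped".
async function waitUntilInProgress(describeStatus, { intervalMs = 500, maxAttempts = 20 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await describeStatus();
    if (status === "InProgress") return attempt; // task is running
    if (status === "Stopped") {
      throw new Error("task stopped before it started running");
    }
    if (status !== "Idle" && status !== "Preparing") {
      throw new Error(`unexpected status: ${status}`);
    }
    // Still Idle or Preparing: wait, then query again.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`task not running after ${maxAttempts} attempts`);
}

// Usage with a fake status source that becomes ready on the third query.
const fakeStatuses = ["Idle", "Preparing", "InProgress"];
waitUntilInProgress(async () => fakeStatuses.shift(), { intervalMs: 1 })
  .then((attempts) => console.log(`running after ${attempts} queries`))
  .catch((err) => console.error(err.message));
```

Bounding the number of attempts avoids polling forever if the task never leaves the Preparing state.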

Enable Server-Side Callback

Note:
Set the callback address for conversational AI callbacks in the Tencent RTC console.
These callbacks can be used together with Tencent RTC room and media callbacks to enrich features.

VII. Introduction to Other Advanced Features

Feature
Operation Guide
Intelligent interruption
Implement context management
Call the function