Quick Run-Through
This article describes how to build a Conversational AI solution on top of the Tencent RTC SDK.
Introduction
The solution uses the Tencent RTC SDK to connect to the Tencent RTC service. By calling the Conversational AI APIs, you get a conversational service with extremely low latency. The integration is flexible: you can plug in third-party LLM, TTS, and STT services to match your business needs. Across the solution, we have made extensive technical optimizations for real-time voice noise reduction, intelligent AI interruption, and context management to keep improving the user experience.
Architecture Diagram

Business Process Diagram

Integration Guide
Prerequisites
Note:
2. Create a TTS application and an LLM application (third-party services can be used).
I. Integrate the Tencent RTC SDK
Step 1: Import the Tencent RTC SDK into the Project
Step 2: Enter the Tencent RTC Room
Step 3: Publish the Audio Stream
Call startLocalAudio to enable microphone capture. The quality parameter selects the capture mode. Despite its name, a higher quality setting is not always better: each business scenario has its own most suitable value, so the parameter is more accurately understood as a scene selector.
SPEECH mode is recommended for Conversational AI scenarios. In this mode, the SDK audio module focuses on refining the voice signal and filters out ambient noise as much as possible, and the audio data is also more resilient to poor network quality. This mode is therefore particularly suitable for voice-centric scenarios such as "video calls" and "online meetings".
Android (Java):
// Enable mic capture and set the scenario to SPEECH mode (strong noise suppression, resilient to poor network conditions)
mCloud.startLocalAudio(TRTCCloudDef.TRTC_AUDIO_QUALITY_SPEECH);

iOS (Objective-C):
self.trtcCloud = [TRTCCloud sharedInstance];
// Enable mic capture and set the scenario to SPEECH mode (strong noise suppression, resilient to poor network conditions)
[self.trtcCloud startLocalAudio:TRTCAudioQualitySpeech];

Swift:
// Enable mic capture and set the scenario to SPEECH mode (strong noise suppression, resilient to poor network conditions)
trtcCloud.startLocalAudio(TRTCAudioQuality.speech)

Web (JavaScript):
await trtc.startLocalAudio();
Note:
Conversational AI places high demands on noise reduction at the audio capture end. For a better experience, we recommend enabling the AI Denoiser. We also offer a noise reduction model trained specifically for Conversational AI; to use it, contact your business representative or submit a ticket.
II. Initiate Conversational AI
Start AI Conversation
Call the StartAIConversation API from your backend to bring the chatbot into the room and start a real-time AI conversation.
Note:
RoomId must match the RoomId the client used to enter the room, and the room ID type (numeric room ID or string room ID) must also be the same; that is, the chatbot and the user must be in the same room.
LLMConfig and TTSConfig are both in JSON format and must be configured correctly for the real-time AI conversation to start.
Currently supported LLMConfig and TTSConfig configuration description:
Before your first call to the StartAIConversation API, it is advisable to validate the LLMConfig and TTSConfig parameters on the following page:
Note:
If all the above steps were performed correctly, you can now have a conversation with the AI!
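The two requirements in the note above (same room, valid JSON configs) can also be checked in code before the first StartAIConversation call. The following is a minimal JavaScript sketch, written in JavaScript to match the Web SDK samples in this article. The function name `checkStartParams` and the room descriptor shape `{ id, kind }` are hypothetical, and it assumes LLMConfig and TTSConfig are passed to the API as JSON-encoded strings. It does not replace validating the parameters on the page mentioned above.

```javascript
// Pre-flight check before calling StartAIConversation (a sketch, not an official helper).
// Assumptions: room descriptors are hypothetical objects { id: "1234", kind: "numeric" | "string" },
// and LLMConfig / TTSConfig are sent as JSON-encoded strings.
function checkStartParams(clientRoom, botRoom, llmConfig, ttsConfig) {
  const errors = [];

  // The chatbot must join exactly the room the user joined: same ID and same ID type.
  if (clientRoom.id !== botRoom.id) {
    errors.push(`RoomId mismatch: client=${clientRoom.id} bot=${botRoom.id}`);
  }
  if (clientRoom.kind !== botRoom.kind) {
    errors.push(`Room ID type mismatch: client=${clientRoom.kind} bot=${botRoom.kind}`);
  }

  // Both configs must be JSON objects serialized as strings.
  const configs = { LLMConfig: llmConfig, TTSConfig: ttsConfig };
  for (const [name, text] of Object.entries(configs)) {
    let parsed;
    try {
      parsed = JSON.parse(text);
    } catch (e) {
      errors.push(`${name} is not valid JSON`);
      continue;
    }
    if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
      errors.push(`${name} must be a JSON object`);
    }
  }
  return errors; // an empty array means every check passed
}

// Usage with placeholder values only (not real configuration):
const problems = checkStartParams(
  { id: "8888", kind: "string" },
  { id: "8888", kind: "numeric" },
  '{"Model": "placeholder"}',
  "not json"
);
console.log(problems); // reports the room ID type mismatch and the invalid TTSConfig
```

Returning a list of problems instead of throwing on the first one lets a backend log every misconfiguration at once.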
III. Receive Conversational AI Captions and AI Status
On the client, use the Tencent RTC SDK's Receive Custom Messages feature to listen for callbacks that deliver real-time subtitles, AI status data, and similar information. The cmdID for these messages is fixed at 1.
Receive Real-Time Subtitles
Message Format
{
  "type": 10000, // 10000 indicates a real-time subtitle
  "sender": "user_a", // User ID of the speaker
  "receiver": [], // List of receiver user IDs; the message is actually broadcast within the room
  "payload": {
    "text": "", // Text recognized by Automatic Speech Recognition (ASR)
    "translation_text": "", // Translated text
    "start_time": "00:00:01", // Start time of this sentence
    "end_time": "00:00:02", // End time of this sentence
    "roundid": "xxxxx", // Unique identifier of a single conversation round
    "end": true // true means this is a complete sentence
  }
}
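A client typically turns these messages into a running transcript: interim results (`end: false`) are shown as provisional text and replaced until the final result (`end: true`) arrives. The JavaScript sketch below shows one way to do that. The `SubtitleView` class is hypothetical, and it assumes each interim message carries the whole sentence recognized so far, so it replaces, rather than appends to, the previous interim text from the same sender; check this against the messages you actually receive.

```javascript
// Client-side transcript built from type-10000 subtitle messages (a sketch).
// Assumption: an interim message (payload.end === false) carries the whole sentence
// recognized so far, so it replaces the previous interim text from the same sender.
class SubtitleView {
  constructor() {
    this.finished = []; // completed sentences, in arrival order
    this.pending = new Map(); // sender -> interim text of the sentence in progress
  }

  apply(msg) {
    if (msg.type !== 10000) return; // only subtitle messages are handled here
    const { text, end, roundid } = msg.payload;
    if (end === true) {
      // Sentence complete: move it to the finished list and drop the interim text.
      this.finished.push({ sender: msg.sender, roundid, text });
      this.pending.delete(msg.sender);
    } else {
      this.pending.set(msg.sender, text);
    }
  }

  // Finished sentences first, then the sentences still being recognized.
  lines() {
    const done = this.finished.map((s) => `${s.sender}: ${s.text}`);
    const live = [...this.pending].map(([sender, text]) => `${sender}: ${text}...`);
    return [...done, ...live];
  }
}

// Usage with hand-written messages in the documented format:
const view = new SubtitleView();
view.apply({ type: 10000, sender: "user_a", receiver: [], payload: { text: "Hello", end: false, roundid: "r1" } });
view.apply({ type: 10000, sender: "user_a", receiver: [], payload: { text: "Hello there", end: true, roundid: "r1" } });
view.apply({ type: 10000, sender: "user_bot", receiver: [], payload: { text: "Hi", end: false, roundid: "r1" } });
console.log(view.lines().join("\n"));
// user_a: Hello there
// user_bot: Hi...
```

Keeping one pending entry per sender, rather than per roundid, lets the view handle several sentences from the same speaker within one round.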
Receive Chatbot Status
Message Format
{
  "type": 10001, // Chatbot status
  "sender": "user_a", // User ID of the sender; in this case, the chatbot's ID
  "receiver": [], // List of receiver user IDs; the message is actually broadcast within the room
  "payload": {
    "roundid": "xxx", // Unique identifier of a single conversation round
    "timestamp": 123,
    "state": 1 // 1: Listening, 2: Thinking, 3: Speaking, 4: Interrupted
  }
}
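Because `state` is a small integer, a client usually maps it to a label and keeps only the latest value, for example to drive a listening/thinking/speaking indicator. This JavaScript sketch does that. `BOT_STATES` and `createBotStateTracker` are hypothetical names, and discarding an update whose `timestamp` is older than the last applied one is our own defensive choice, not documented behavior.

```javascript
// Labels for the numeric "state" codes listed in the message format above.
const BOT_STATES = Object.freeze({
  1: "Listening",
  2: "Thinking",
  3: "Speaking",
  4: "Interrupted",
});

// Tracks the latest chatbot state from type-10001 messages (a sketch).
// Updates with a timestamp older than the current one are treated as stale and ignored.
function createBotStateTracker() {
  let latest = null;
  return {
    apply(msg) {
      if (msg.type !== 10001) return latest; // not a status message
      const { state, timestamp, roundid } = msg.payload;
      if (latest && timestamp < latest.timestamp) return latest; // stale update
      latest = { state, label: BOT_STATES[state] ?? `Unknown(${state})`, timestamp, roundid };
      return latest;
    },
    current() {
      return latest;
    },
  };
}

// Usage: the second message is older, so the tracker keeps "Speaking".
const tracker = createBotStateTracker();
tracker.apply({ type: 10001, sender: "user_bot", receiver: [], payload: { roundid: "r1", timestamp: 200, state: 3 } });
tracker.apply({ type: 10001, sender: "user_bot", receiver: [], payload: { roundid: "r1", timestamp: 100, state: 2 } });
console.log(tracker.current().label); // Speaking
```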
Sample Code
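Android (Java):
@Override
public void onRecvCustomCmdMsg(String userId, int cmdID, int seq, byte[] message) {
    // Conversational AI subtitles and status use cmdID 1; ignore other custom messages
    if (cmdID != 1) {
        return;
    }
    String data = new String(message, StandardCharsets.UTF_8);
    try {
        JSONObject jsonData = new JSONObject(data);
        Log.i(TAG, String.format("receive custom msg from %s cmdId: %d seq: %d data: %s", userId, cmdID, seq, data));
        // Dispatch on jsonData.optInt("type"): 10000 = real-time subtitle, 10001 = chatbot status
    } catch (JSONException e) {
        // Log and drop malformed messages instead of crashing inside the SDK callback
        Log.e(TAG, "onRecvCustomCmdMsg: invalid JSON", e);
    }
}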
iOS (Swift):
func onRecvCustomCmdMsgUserId(_ userId: String, cmdID: Int, seq: UInt32, message: Data) {
    // Conversational AI subtitles and status use cmdID 1
    guard cmdID == 1 else { return }
    do {
        if let jsonObject = try JSONSerialization.jsonObject(with: message, options: []) as? [String: Any] {
            print("Dictionary: \(jsonObject)")
            // handleMessage(jsonObject)
        } else {
            print("The data is not a dictionary.")
        }
    } catch {
        print("Error parsing JSON: \(error)")
    }
}
Web (JavaScript):
trtc.on(TRTC.EVENT.CUSTOM_MESSAGE, (event) => {
  // Conversational AI subtitles and status use cmdId 1
  if (event.cmdId !== 1) return;
  const data = new TextDecoder().decode(event.data);
  const jsonData = JSON.parse(data);
  console.log(`receive custom msg from ${event.userId} cmdId: ${event.cmdId} seq: ${event.seq} data: ${data}`);
  if (jsonData.type === 10000 && jsonData.payload.end === false) {
    // Subtitle: intermediate result
  } else if (jsonData.type === 10000 && jsonData.payload.end === true) {
    // Subtitle: this sentence is complete
  } else if (jsonData.type === 10001) {
    // Chatbot status: jsonData.payload.state
  }
});
Note:
More Conversational AI callbacks are available on the client. For details, see: Conversational AI Status Callback, Conversational AI Subtitle Callback, Conversational AI Metrics Callback, and Conversational AI Error Callback.
IV. Send Custom Messages
Tencent RTC custom messages for Conversational AI are sent from the client, with cmdID fixed at 2.
By sending custom text, you can skip the ASR step and communicate with the AI service directly through text:
{
  "type": 20000, // Custom text message sent from the client
  "sender": "user_a", // User ID of the sender; the server checks that this user ID is valid
  "receiver": ["user_bot"], // List of receiver user IDs; only the chatbot user ID is needed, and the server checks that it is valid
  "payload": {
    "id": "uuid", // Message ID, e.g. a UUID; used for troubleshooting
    "message": "xxx", // Message content
    "timestamp": 123 // Timestamp, used for troubleshooting
  }
}
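This article has no sample code for the text message above, so here is a minimal JavaScript sketch that builds its payload as bytes ready to send as a custom message with cmdId 2. `buildTextMessage` and its ID scheme (sender, time, and a counter) are hypothetical; per the format above, any unique string such as a UUID should work as the message ID.

```javascript
// Builds the bytes of a type-20000 custom text message (a sketch).
// Hypothetical ID scheme: sender + timestamp + in-process counter, unique per message.
let messageCounter = 0;

function buildTextMessage(sender, botId, text, now = Date.now()) {
  const message = {
    type: 20000, // custom text message sent from the client
    sender,
    receiver: [botId], // only the chatbot user ID is needed
    payload: {
      id: `${sender}_${now}_${messageCounter++}`, // used for troubleshooting
      message: text,
      timestamp: now, // used for troubleshooting
    },
  };
  // TextEncoder is a global in browsers and in Node.js 11+.
  return new TextEncoder().encode(JSON.stringify(message));
}

// Usage with the Web SDK (`trtc` is the instance that entered the room):
// trtc.sendCustomMessage({ cmdId: 2, data: buildTextMessage("user_a", "user_bot", "Hello").buffer });
```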
To interrupt the chatbot, send an interruption signal:
{
  "type": 20001, // Interruption signal sent from the client
  "sender": "user_a", // User ID of the sender; the server checks that this user ID is valid
  "receiver": ["user_bot"], // List of receiver user IDs; only the chatbot user ID is needed, and the server checks that it is valid
  "payload": {
    "id": "uuid", // Message ID, e.g. a UUID; used for troubleshooting
    "timestamp": 123 // Timestamp, used for troubleshooting
  }
}
Sample Code
Android (Java):
public void sendInterruptCode() {
    try {
        int cmdID = 0x2; // Custom messages sent to the chatbot use cmdID 2
        long timestamp = System.currentTimeMillis();
        JSONObject payLoadContent = new JSONObject();
        payLoadContent.put("timestamp", timestamp);
        // Message ID: any unique string works; it is only used for troubleshooting
        payLoadContent.put("id", mUserId + "_" + mRoomId + "_" + timestamp);
        String[] receivers = new String[]{robotUserId};
        JSONObject interruptContent = new JSONObject();
        interruptContent.put("type", 20001); // 20001: interruption signal
        interruptContent.put("sender", mUserId);
        interruptContent.put("receiver", new JSONArray(receivers));
        interruptContent.put("payload", payLoadContent);
        String interruptString = interruptContent.toString();
        byte[] data = interruptString.getBytes(StandardCharsets.UTF_8);
        Log.i(TAG, "sendInterruptCode :" + interruptString);
        // reliable = true, ordered = true
        mTRTCCloud.sendCustomCmdMsg(cmdID, data, true, true);
    } catch (JSONException e) {
        Log.e(TAG, "sendInterruptCode: failed to build message", e);
    }
}
iOS (Swift):
@objc func interruptAi() {
    print("interruptAi")
    let cmdId = 0x2 // Custom messages sent to the chatbot use cmdID 2
    let timestamp = Int(Date().timeIntervalSince1970 * 1000)
    let payload = [
        "id": userId + "_\(roomId)" + "_\(timestamp)", // Message ID, can use UUID; used for troubleshooting
        "timestamp": timestamp // Timestamp, used for troubleshooting
    ] as [String: Any]
    let dict = [
        "type": 20001,
        "sender": userId,
        "receiver": [botId],
        "payload": payload
    ] as [String: Any]
    do {
        let jsonData = try JSONSerialization.data(withJSONObject: dict, options: [])
        self.trtcCloud.sendCustomCmdMsg(cmdId, data: jsonData, reliable: true, ordered: true)
    } catch {
        print("Error serializing dictionary to JSON: \(error)")
    }
}
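Web (JavaScript):
const message = {
  type: 20001, // Interruption signal
  sender: "user_a",
  receiver: ["user_bot"],
  payload: {
    id: "uuid", // Replace with a unique message ID, e.g. a UUID
    timestamp: 123 // Replace with the current time, e.g. Date.now()
  }
};
trtc.sendCustomMessage({
  cmdId: 2,
  data: new TextEncoder().encode(JSON.stringify(message)).buffer
});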
V. Stop the Conversational AI and Exit the Tencent RTC Room
1. The server stops the conversational AI task by calling the StopAIConversation API.
2. The client exits the Tencent RTC room. For details, see Exit the Room (Android, iOS, Windows, and Mac).
VI. Other Features
Other Server-Side APIs
Query conversational AI task status: DescribeAIConversation
You can query the status of a conversational AI task. There are four possible values:
1. Idle: the task has not started.
2. Preparing: the task is being prepared.
3. InProgress: the task is running.
4. Stopped: the task has stopped and its resources are being cleaned up.
Update conversational AI startup parameters: UpdateAIConversation
During a conversation, you can use this API to update the TTS voice (timbre) dynamically.
Control conversational AI task: ControlAIConversation
Use this API when you want the chatbot to proactively speak a given text.
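Since a task can report Preparing before it is running, a backend may want to wait until DescribeAIConversation reports InProgress before relying on the chatbot. The JavaScript sketch below polls a status function until a target value is reached. `describeStatus` is a hypothetical function you would implement around DescribeAIConversation (resolving to one of the four status strings), and treating Stopped as terminal is an assumption based on the status descriptions above.

```javascript
// The four task status values listed above.
const TASK_STATUSES = ["Idle", "Preparing", "InProgress", "Stopped"];

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Polls `describeStatus` (hypothetical wrapper around DescribeAIConversation)
// until it resolves to `target`, or gives up after `maxAttempts` calls.
async function waitForTaskStatus(describeStatus, target, { intervalMs = 1000, maxAttempts = 10 } = {}) {
  if (!TASK_STATUSES.includes(target)) {
    throw new Error(`Unknown target status: ${target}`);
  }
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = await describeStatus();
    if (status === target) return status;
    // Assumption: a Stopped task does not move back to InProgress, so stop waiting early.
    if (status === "Stopped") {
      throw new Error(`Task stopped before reaching ${target}`);
    }
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  throw new Error(`Status did not reach ${target} after ${maxAttempts} attempts`);
}

// Usage (describeTaskStatus and taskId are hypothetical):
// const status = await waitForTaskStatus(() => describeTaskStatus(taskId), "InProgress");
```

Injecting the status function keeps the polling logic independent of how your backend calls the Tencent Cloud API.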
Enable Server-Side Callback
Note:
Set the callback URL for conversational AI callbacks in the Tencent RTC console.
These callbacks can be combined with Tencent RTC room and media callbacks to build richer features.
VII. Introduction to Other Advanced Features
Feature | Operation Guide
Intelligent interruption |
Context management |
Function calling |