
Tencent RTC Voice AI Agent

10 min read

Oct 28, 2025


A realtime integrated framework for production-grade multimodal and voice AI agents.

Introduction

The Tencent RTC (TRTC) framework provides a rich set of SDKs and cloud service APIs, allowing you to add any AI program into a TRTC room as a full realtime participant.

It combines SDKs and cloud service APIs into a complete toolset that makes it easy to feed realtime media and data streams into an AI processing pipeline. This pipeline can work with Tencent Cloud's AI services (such as real-time STT and TTS) and with a third-party Large Language Model provider, then publish realtime results back to the room.

While the solution excels at AI-powered voice agents, it is designed to support any type of programmatic participant. You can deploy custom logic to process realtime audio, video, and data streams, making it suitable for a wide range of applications beyond traditional AI agents.

If you want to get your hands on the code for building an agent right away, follow the Voice AI quickstart guide. It takes just a few minutes to build your first voice agent.

Build and deploy a simple voice assistant with Node.js in minutes.
Use Tencent RTC's low-code platform to quickly build, deploy, and manage your AI Agents.
Run your agent on Tencent Cloud's global infrastructure.
Source code and examples for TRTC server-side SDKs and AI integration.
Reference documentation for TRTC SDK, STT API, TTS API, and LLM API.

AI Agent Applications & Use Cases

Multimodal Assistance: Enable agents to assist via voice, text, or screen sharing for various scenarios, including telehealth, customer service, and robotics.

Telehealth & Medical Triage: AI can assist with real-time telemedicine consultations, either with or without human involvement. Additionally, agents can triage patients based on symptoms and medical history.

Customer Service & Call Centers: AI agents can be deployed for inbound and outbound calls, enhancing customer service across industries by handling queries, directing calls, and offering self-service options.

Real-Time Translation: Use AI to translate conversations or content in real-time, improving global communication in diverse sectors.

NPCs & Robotics: Replace static scripts with language models to create lifelike NPCs for gaming or simulation, and empower robots with cloud-based AI to enhance their capabilities.

Business Operations: AI can automate tasks like taking restaurant orders, managing a company directory, and processing data in pipelines (e.g., translation, task delegation).

Framework Overview

Your agent code (running as a stateful server-side application) operates as a realtime bridge between powerful AI models and your users. AI models run in data centers, while users often connect from mobile networks with varying quality.

Tencent RTC ensures ultra-low latency communication (< 300ms global end-to-end latency) between the frontend and the agent (server).

The agent (server) then communicates with backend AI services (STT, LLM, TTS) using HTTP/WebSocket APIs. This setup provides the realtime benefits of TRTC while keeping the complexity of AI processing in cloud services.
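To make the flow above concrete, here is a minimal sketch of one conversational turn passing through an STT, LLM, and TTS stage on the agent server. The interfaces `SpeechToText`, `LanguageModel`, and `TextToSpeech`, the `VoicePipeline` class, and the stub services are all hypothetical names invented for illustration. They are not part of the TRTC SDK or any Tencent Cloud API, and a real agent would replace the stubs with HTTP/WebSocket calls to its chosen providers.

```typescript
// Hypothetical types for illustration only; not TRTC or Tencent Cloud APIs.
type ChatMessage = { role: "user" | "assistant"; content: string };

interface SpeechToText {
  transcribe(audio: Uint8Array): Promise<string>;
}
interface LanguageModel {
  reply(history: ChatMessage[]): Promise<string>;
}
interface TextToSpeech {
  synthesize(text: string): Promise<Uint8Array>;
}

class VoicePipeline {
  private history: ChatMessage[] = [];
  private stt: SpeechToText;
  private llm: LanguageModel;
  private tts: TextToSpeech;

  constructor(stt: SpeechToText, llm: LanguageModel, tts: TextToSpeech) {
    this.stt = stt;
    this.llm = llm;
    this.tts = tts;
  }

  // One turn: user audio in, agent audio out, with the transcript kept as context.
  async handleTurn(userAudio: Uint8Array): Promise<Uint8Array> {
    const userText = await this.stt.transcribe(userAudio);
    this.history.push({ role: "user", content: userText });

    const answer = await this.llm.reply(this.history);
    this.history.push({ role: "assistant", content: answer });

    return this.tts.synthesize(answer);
  }

  getHistory(): ChatMessage[] {
    return this.history.slice();
  }
}

// Stub services so the sketch runs without any network access.
const stubStt: SpeechToText = { transcribe: async () => "hello" };
const stubLlm: LanguageModel = {
  reply: async (history) => "You said: " + history[history.length - 1].content,
};
const stubTts: TextToSpeech = {
  // Pretend each character becomes one byte of audio.
  synthesize: async (text) => new Uint8Array(text.length),
};

const demoPipeline = new VoicePipeline(stubStt, stubLlm, stubTts);
demoPipeline.handleTurn(new Uint8Array(320)).then((audio) => {
  console.log("agent audio bytes:", audio.length);
});
```

Keeping each stage behind a small interface is one way to swap providers without touching the turn logic. A production pipeline would usually stream partial results between stages rather than wait for each stage to finish.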

The solution includes components for handling the core challenges of realtime voice AI, such as streaming audio through an STT-LLM-TTS pipeline, reliable intelligent interruption via VAD (Voice Activity Detection), and LLM orchestration. It supports flexible integration with Tencent RTC Cloud and major global AI providers.
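The interruption behavior mentioned above can be sketched as a small state machine: when a speech-start event arrives while the agent is speaking, the agent drops whatever audio it has not yet played and goes back to listening. The `InterruptibleSpeaker` class and its method names are hypothetical and exist only for this illustration. The sketch assumes the speech-start signal comes from an external VAD and does not implement voice activity detection itself.

```typescript
// Hypothetical barge-in handler; not part of the TRTC SDK.
type AgentState = "listening" | "speaking";

class InterruptibleSpeaker {
  private state: AgentState = "listening";
  private pending: string[] = []; // TTS chunks not yet played

  getState(): AgentState {
    return this.state;
  }

  // Queue a full agent response, split into chunks, and start speaking.
  startSpeaking(chunks: string[]): void {
    this.pending = chunks.slice();
    this.state = chunks.length > 0 ? "speaking" : "listening";
  }

  // Take the next chunk to play; returns undefined once the response is finished.
  playNext(): string | undefined {
    const chunk = this.pending.shift();
    if (chunk === undefined) {
      this.state = "listening";
    }
    return chunk;
  }

  // Call this when the VAD reports that the user started talking.
  // Returns how many unplayed chunks were discarded.
  onUserSpeechStart(): number {
    if (this.state !== "speaking") {
      return 0;
    }
    const dropped = this.pending.length;
    this.pending = [];
    this.state = "listening";
    return dropped;
  }
}

const speaker = new InterruptibleSpeaker();
speaker.startSpeaking(["Sure,", "your order", "ships on", "Monday."]);
speaker.playNext(); // plays "Sure,"
const discarded = speaker.onUserSpeechStart(); // user barges in
console.log("discarded chunks:", discarded);
```

In a real agent the same event would typically also cancel any in-flight LLM or TTS request, so that stale output is never generated in the first place.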

Other framework features include:

Voice, Video, and Multimodal: Build agents that can process realtime input and produce output in any modality (text, voice, video, digital human).

Tool Use: Define tools compatible with the LLM, and even forward tool calls to your frontend.

Multi-Agent Collaboration: Break down complex workflows into simpler tasks and orchestrate them via the Tencent Cloud Agent Development Platform (TCADP).

Extensive AI Integrations: Flexibly integrate Tencent Cloud AI services or your own, as well as third-party STT, TTS, and LLM models.

Advanced Endpoint Detection: Use the VAD model built into the cloud STT service for natural and fluid conversation flow.

Made for Developers: Build your agents in code, not configuration, and leverage rich APIs for deep customization.

Enterprise-Grade Cloud Service: Runs on Tencent Cloud's global infrastructure for high availability and reliability.
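As an illustration of the Tool Use item in the list above, the following sketch shows one way an agent could decide whether an LLM tool call runs on the server or is forwarded to the frontend. `ToolRegistry`, `ToolDefinition`, the `runsOn` field, and the JSON message shape are assumptions made for this example, not a documented TRTC interface. The `sendToFrontend` callback stands in for whatever data channel the agent uses to reach the client, and handlers are synchronous here only for brevity.

```typescript
// Hypothetical tool routing; names and message format are illustrative only.
type ToolArgs = Record<string, unknown>;
type ToolCall = { name: string; args: ToolArgs };

interface ToolDefinition {
  description: string;
  runsOn: "server" | "frontend";
  handler?: (args: ToolArgs) => unknown; // required for server-side tools
}

type DispatchResult = { forwarded: boolean; result?: unknown };

class ToolRegistry {
  private tools = new Map<string, ToolDefinition>();
  private sendToFrontend: (payload: string) => void;

  constructor(sendToFrontend: (payload: string) => void) {
    this.sendToFrontend = sendToFrontend;
  }

  register(name: string, tool: ToolDefinition): void {
    this.tools.set(name, tool);
  }

  // Route a tool call requested by the LLM.
  dispatch(call: ToolCall): DispatchResult {
    const tool = this.tools.get(call.name);
    if (tool === undefined) {
      throw new Error("Unknown tool: " + call.name);
    }
    if (tool.runsOn === "frontend") {
      // Serialize the call and let the client app execute it.
      this.sendToFrontend(
        JSON.stringify({ type: "tool_call", name: call.name, args: call.args })
      );
      return { forwarded: true };
    }
    if (tool.handler === undefined) {
      throw new Error("Server tool has no handler: " + call.name);
    }
    return { forwarded: false, result: tool.handler(call.args) };
  }
}

const outbox: string[] = []; // stands in for a data channel to the client
const registry = new ToolRegistry((payload) => outbox.push(payload));

registry.register("add", {
  description: "Add two numbers on the server",
  runsOn: "server",
  handler: (args) => Number(args.a) + Number(args.b),
});
registry.register("open_map", {
  description: "Open a map view in the user's app",
  runsOn: "frontend",
});

console.log(registry.dispatch({ name: "add", args: { a: 2, b: 3 } }));
console.log(registry.dispatch({ name: "open_map", args: { city: "Shenzhen" } }));
```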

How Agents Connect to Tencent RTC

When your agent code starts, it first runs on your server as a persistent or on-demand 'Worker' process.

This 'Worker' uses TRTC's server-side SDK (or simulates a client using the client SDK) to receive dispatch requests and join the specified TRTC room, acting as an invisible participant. By default, you need to implement the business logic that dispatches a 'Worker' to each new room or call.

After your agent and the user have both joined a room, the agent and your frontend app communicate over TRTC. This keeps realtime communication fast and reliable across varying network conditions.
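The dispatch flow described above can be sketched as a Worker that accepts dispatch requests from your own business logic and keeps one agent session per room. Everything named here (`AgentWorker`, `DispatchRequest`, `RoomSession`, and the `JoinRoom` callback) is hypothetical. The actual room join would go through the TRTC server-side or client SDK, which this sketch deliberately hides behind a callback, and a real join would be asynchronous.

```typescript
// Hypothetical Worker dispatch logic; the real room join is abstracted away.
interface DispatchRequest {
  roomId: string;
  agentUserId: string;
}

interface RoomSession {
  leave(): void;
}

// Stand-in for the SDK call that actually joins a TRTC room.
type JoinRoom = (roomId: string, agentUserId: string) => RoomSession;

class AgentWorker {
  private sessions = new Map<string, RoomSession>();
  private joinRoom: JoinRoom;

  constructor(joinRoom: JoinRoom) {
    this.joinRoom = joinRoom;
  }

  // Returns true if the agent joined, false if it was already in that room.
  handleDispatch(request: DispatchRequest): boolean {
    if (this.sessions.has(request.roomId)) {
      return false;
    }
    const session = this.joinRoom(request.roomId, request.agentUserId);
    this.sessions.set(request.roomId, session);
    return true;
  }

  // Leave the room and free the slot when the call ends.
  handleRoomClosed(roomId: string): void {
    const session = this.sessions.get(roomId);
    if (session !== undefined) {
      session.leave();
      this.sessions.delete(roomId);
    }
  }

  activeRooms(): string[] {
    return Array.from(this.sessions.keys());
  }
}

// Fake join function that only logs, so the sketch runs locally.
const fakeJoin: JoinRoom = (roomId, agentUserId) => {
  console.log(agentUserId + " joined room " + roomId);
  return { leave: () => console.log(agentUserId + " left room " + roomId) };
};

const worker = new AgentWorker(fakeJoin);
worker.handleDispatch({ roomId: "room-1", agentUserId: "agent-bot" });
worker.handleDispatch({ roomId: "room-1", agentUserId: "agent-bot" }); // ignored duplicate
worker.handleRoomClosed("room-1");
```

Tracking sessions by room ID keeps a duplicate dispatch from placing two agents in the same call, a common failure mode when dispatch requests are retried.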

Getting Started

Follow these guides to learn more and get started with TRTC AI Agents.

Solutions: A comprehensive collection of examples, guides, and recipes for TRTC AI solutions.

Intro to TRTC: An overview of TRTC Conversational AI.

Web and Mobile Integration: Call the Conversational AI API from a custom web or mobile app.

Voice AI Quickstart: Build a simple voice assistant in minutes.

AI Models: Explore the STT, LLM, and TTS configuration options available for TRTC integration.

Building Voice Agents: Contact Us for more information and support to build advanced voice AI apps with TRTC.