Tencent ASR

Tencent's in-house ASR engine is natively integrated into the TRTC platform and reads directly from TRTC's real-time audio pipeline, which keeps recognition latency low. Built-in audio processing, including AI noise suppression, echo cancellation, and customizable conversation modes, helps keep transcription clear in noisy environments. The engine framework supports a broad range of models covering Chinese, English, Cantonese, and mixed-language scenarios, all configurable through STTConfig fields with no additional service accounts required. It suits teams that want the fastest integration path with no external dependencies.

Usage

To use Tencent ASR as the STT engine, pass the following JSON in the STTConfig field of the StartAIConversation API:
    {
      "Language": "zh",
      "VadSilenceTime": 1000
    }
For the complete Tencent ASR parameter reference, see the ASR parameter configuration guide.
Built-in provider:
Tencent ASR is TRTC's built-in speech recognition engine. Unlike third-party providers (Azure, Deepgram, Soniox), it does not require the CustomParam field; configure only the top-level STTConfig fields listed under Parameter reference.
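As a rough illustration of the usage above, the following Python sketch assembles the STTConfig object and serializes it to JSON. The helper name build_stt_config is hypothetical, and the non-negative check on VadSilenceTime is an assumption rather than a documented constraint. The other StartAIConversation request fields are deliberately omitted; consult the API reference for them.

```python
import json
from typing import Any, Dict, Optional


def build_stt_config(
    language: Optional[str] = None,
    vad_silence_time_ms: Optional[int] = None,
) -> Dict[str, Any]:
    """Build an STTConfig dict for Tencent ASR (hypothetical helper).

    Both fields are optional, so unset values are left out of the result.
    """
    config: Dict[str, Any] = {}
    if language is not None:
        config["Language"] = language
    if vad_silence_time_ms is not None:
        # Assumption: the value is a non-negative integer number of milliseconds.
        is_int = isinstance(vad_silence_time_ms, int) and not isinstance(
            vad_silence_time_ms, bool
        )
        if not is_int or vad_silence_time_ms < 0:
            raise ValueError("VadSilenceTime must be a non-negative integer (ms)")
        config["VadSilenceTime"] = vad_silence_time_ms
    return config


# Only the STTConfig part of the request is shown here.
stt_config = build_stt_config(language="zh", vad_silence_time_ms=1000)
request_fragment = {"STTConfig": stt_config}
print(json.dumps(request_fragment, indent=2))
```

No CustomParam key is produced, which matches the built-in provider note: Tencent ASR is configured through the top-level STTConfig fields only.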

Parameter reference

The following fields are part of STTConfig. For the full definition, see STTConfig.
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| Language | String | No | Primary language code for recognition (e.g., "zh", "en"). See STTConfig. |
| VadSilenceTime | Integer | No | VAD silence duration in milliseconds. When silence exceeds this value, the current speech segment ends. See STTConfig. |
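To make the VadSilenceTime semantics concrete, here is a simplified Python sketch of how a silence threshold can split audio into speech segments. It is not Tencent's implementation. The function name split_segments, the fixed frame length, and the per-frame speech flags are all assumptions made for illustration.

```python
from typing import List, Tuple


def split_segments(
    frames: List[bool], frame_ms: int, vad_silence_ms: int
) -> List[Tuple[int, int]]:
    """Split fixed-length frames into (start_ms, end_ms) speech segments.

    frames[i] is True when frame i contains speech. A segment is closed once
    the silence following its last speech frame exceeds vad_silence_ms.
    """
    segments: List[Tuple[int, int]] = []
    start = None          # start time of the open segment, in ms
    last_voice_end = 0    # end time of the most recent speech frame, in ms
    for i, is_speech in enumerate(frames):
        frame_start = i * frame_ms
        frame_end = frame_start + frame_ms
        if is_speech:
            if start is None:
                start = frame_start
            last_voice_end = frame_end
        elif start is not None and frame_end - last_voice_end > vad_silence_ms:
            # Trailing silence is longer than the threshold: end the segment.
            segments.append((start, last_voice_end))
            start = None
    if start is not None:
        # Audio ended while a segment was still open.
        segments.append((start, last_voice_end))
    return segments


# Ten 100 ms frames: speech, a short pause, speech, a long pause, speech.
frames = [True, True, False, False, True, False, False, False, False, True]
print(split_segments(frames, frame_ms=100, vad_silence_ms=300))
print(split_segments(frames, frame_ms=100, vad_silence_ms=100))
```

With the larger threshold the short pause stays inside one segment; with the smaller one the same audio is cut into more segments. This is the trade-off to weigh when choosing VadSilenceTime: lower values end a turn sooner but may split a speaker's sentence at natural pauses.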