Overview

In a Conversational AI session, the STT module captures the user's voice stream in real time and converts it into text, which is then forwarded to the LLM for processing. The module runs on TRTC's ultra-low latency audio pipeline (end-to-end audio under 300 ms, conversation latency under 1 s globally) and uses advanced audio processing, including AI noise suppression, echo cancellation, and customizable chat modes, to deliver clear, accurate transcription even in noisy environments. You can plug in TRTC's built-in Tencent ASR or a third-party STT provider through the STTConfig object.
Available Providers
Provider	Models	Integration	Best for
Tencent	default	Built-in	Ultra-low latency, advanced audio processing, flexible engine framework
Azure	Azure Speech (default)	Third-party	100+ languages, enterprise SLAs
Deepgram	nova-3, nova-2, etc.	Third-party	Speed & accuracy, cost-efficient English
Soniox	stt-rt-v4, etc.	Third-party	Multilingual, code-switching
Built-in vs. third-party:
Tencent ASR is built into TRTC, while Azure, Deepgram, and Soniox are third-party services. All providers share the same STTConfig structure: top-level fields (Language, VadSilenceTime) plus provider-specific settings in CustomParam. See each provider's sub-page for the complete configuration.
For the complete STT parameter reference, see the Full STT configuration guide.
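As a rough illustration of the shared structure described above, the sketch below assembles an STTConfig-shaped payload in Python. Only the field names Language, VadSilenceTime, and CustomParam come from this page. The helper build_stt_config, the keys inside CustomParam ("provider", "model"), the value formats, and the choice to carry CustomParam as a JSON-encoded string are assumptions made for illustration, so check the provider sub-page for the real schema.

```python
import json


def build_stt_config(language, vad_silence_time, custom_param=None):
    """Assemble an STTConfig-shaped dict (hypothetical helper, not a TRTC SDK call)."""
    config = {
        "Language": language,                # top-level field shared by all providers
        "VadSilenceTime": vad_silence_time,  # top-level field; unit and range are assumptions here
    }
    if custom_param is not None:
        # Assumption: provider-specific settings are passed as a JSON-encoded string.
        config["CustomParam"] = json.dumps(custom_param)
    return config


# Built-in provider: only the shared top-level fields are set.
builtin = build_stt_config("en", 1000)

# Third-party provider: the keys inside CustomParam are hypothetical placeholders.
third_party = build_stt_config(
    "en",
    1000,
    custom_param={"provider": "deepgram", "model": "nova-3"},
)

print(json.dumps(third_party, indent=2))
```

Keeping the shared fields at the top level and isolating everything provider-specific in CustomParam means that switching providers should only require changing the CustomParam contents.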