Speech-To-Text Configuration

In a Conversational AI session, the STT module captures the user's voice stream in real time, converts it to text, and forwards that text to the LLM for processing. It runs on TRTC's ultra-low-latency audio pipeline (end-to-end audio latency under 300 ms, conversation latency under 1 s globally) and applies advanced audio processing, including AI noise suppression, echo cancellation, and customizable chat modes, so transcription stays clear and accurate even in noisy environments. You can use TRTC's built-in Tencent ASR or a third-party STT provider, configured through the STTConfig object.
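To make the "voice stream in, text out to the LLM" flow concrete, here is a toy model of it: recognized text is buffered while the user speaks and handed to the LLM once a silence window has elapsed. This is a hypothetical sketch, not TRTC's implementation. `TranscriptSegment`, `segment_utterances`, and the `forward` callback are illustrative names, and using a fixed silence threshold as the end-of-utterance signal is an assumption made in the spirit of the `VadSilenceTime` setting.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class TranscriptSegment:
    """One chunk of (hypothetical) streaming STT output."""
    text: str        # text recognized in this chunk ("" for silence)
    end_ms: int      # stream timestamp at which the chunk ends, in ms
    is_speech: bool  # True if voice activity was detected in the chunk


def segment_utterances(
    segments: Iterable[TranscriptSegment],
    silence_ms: int,
    forward: Callable[[str], None],
) -> None:
    """Buffer recognized text and hand each finished utterance to `forward`.

    Assumed rule for illustration: an utterance ends once at least
    `silence_ms` of non-speech follows the last speech chunk.
    """
    buffer: List[str] = []
    last_speech_end: Optional[int] = None
    for seg in segments:
        if seg.is_speech:
            buffer.append(seg.text)
            last_speech_end = seg.end_ms
        elif last_speech_end is not None and seg.end_ms - last_speech_end >= silence_ms:
            forward(" ".join(buffer))  # e.g. send the utterance to the LLM
            buffer = []
            last_speech_end = None
    if buffer:
        forward(" ".join(buffer))  # flush whatever is left when the stream ends


# Usage: collect forwarded utterances in a list instead of calling an LLM.
sent: List[str] = []
segment_utterances(
    [
        TranscriptSegment("hello", 100, True),
        TranscriptSegment("world", 300, True),
        TranscriptSegment("", 600, False),   # 300 ms of silence: keep waiting
        TranscriptSegment("", 900, False),   # 600 ms of silence: utterance ends
        TranscriptSegment("next", 1000, True),
    ],
    silence_ms=500,
    forward=sent.append,
)
print(sent)  # ['hello world', 'next']
```

A shorter silence window makes the agent respond sooner but risks cutting users off mid-sentence; a longer one does the opposite.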

Available Providers

| Provider | Models | Integration | Best for |
| --- | --- | --- | --- |
| Tencent ASR | 16k_zh_large, 16k_zh, 16k_en, etc. | Built-in | Ultra-low latency, advanced audio processing, flexible engine framework |
| Azure STT | Azure Speech (default) | Third-party | 100+ languages, enterprise SLAs |
| Deepgram | nova-3, nova-2, etc. | Third-party | Speed and accuracy, cost-efficient English |
| Soniox | stt-rt-v4, etc. | Third-party | Multilingual, code-switching |
Built-in vs. third-party: all providers share the same STTConfig structure, with top-level fields (Language, VadSilenceTime) and provider-specific settings in CustomParam. See each provider's sub-page for its complete configuration.
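Because every provider uses the same shape, switching providers mostly means changing CustomParam. The Python sketch below models that shape under stated assumptions: the `STTConfig` dataclass and its `to_request` helper are illustrative and not part of any TRTC SDK; serializing CustomParam as a JSON string, the millisecond unit for VadSilenceTime, the Language values, and the CustomParam keys (`STTType`, `Model`, `ApiKey`) are all placeholders. Take the real values from each provider's sub-page.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class STTConfig:
    """Hypothetical mirror of STTConfig: shared top-level fields plus CustomParam."""
    Language: str                          # shared top-level field
    VadSilenceTime: Optional[int] = None   # shared top-level field (unit assumed: ms)
    CustomParam: Dict[str, Any] = field(default_factory=dict)  # provider-specific settings

    def to_request(self) -> Dict[str, Any]:
        """Build the STTConfig object for a request body, omitting unset fields."""
        body: Dict[str, Any] = {"Language": self.Language}
        if self.VadSilenceTime is not None:
            body["VadSilenceTime"] = self.VadSilenceTime
        if self.CustomParam:
            # Assumption: CustomParam travels as a JSON-encoded string.
            body["CustomParam"] = json.dumps(self.CustomParam)
        return body


# Built-in provider: only the shared fields (values are placeholders).
builtin = STTConfig(Language="en", VadSilenceTime=1000)

# Third-party provider: same top-level fields, provider settings in CustomParam.
# The keys "STTType", "Model" and "ApiKey" are hypothetical placeholders.
deepgram = STTConfig(
    Language="en",
    VadSilenceTime=1000,
    CustomParam={"STTType": "deepgram", "Model": "nova-3", "ApiKey": "<your-api-key>"},
)

print(builtin.to_request())  # {'Language': 'en', 'VadSilenceTime': 1000}
print(deepgram.to_request())
```

The capitalized attribute names deliberately match the field names used in this guide, so the dictionary returned by `to_request` maps one-to-one onto the documented structure.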
For the complete STT parameter reference, see the Full STT configuration guide.