
Speech-To-Text Configuration

In a Conversational AI session, the STT module captures the user's voice stream in real time and converts it to text, which is then forwarded to the LLM for processing. It runs on TRTC's ultra-low-latency audio pipeline (end-to-end audio latency under 300 ms, conversation latency under 1 s globally) and benefits from its audio processing features, including AI noise suppression and echo cancellation, along with customizable chat modes, so transcription stays clear and accurate even in noisy environments. You can use TRTC's built-in Tencent ASR or plug in a third-party STT provider through the STTConfig object.

Available Providers

| Provider | Models | Integration | Best for |
| --- | --- | --- | --- |
| Tencent ASR | 16k_zh_large, 16k_zh, 16k_en, etc. | Built-in | Ultra-low latency, advanced audio processing, flexible engine framework |
| Azure STT | Azure Speech (default) | Third-party | 100+ languages, enterprise SLAs |
| Deepgram | nova-3, nova-2, etc. | Third-party | Speed and accuracy, cost-efficient English |
| Soniox | stt-rt-v4, etc. | Third-party | Multilingual, code-switching |

Built-in vs Third-party:
All providers share the same STTConfig structure with top-level fields (Language, VadSilenceTime) and provider-specific settings in CustomParam. See each provider's sub-page for the complete configuration.
For the complete STT parameter reference, see the Full STT configuration guide.
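
To illustrate the shared STTConfig shape described above, here is a minimal Python sketch that assembles the top-level fields and nests provider-specific settings under CustomParam. Only the field names Language, VadSilenceTime, and CustomParam come from this page. The helper name `build_stt_config`, the example values, the keys inside CustomParam, and the choice to serialize CustomParam as a JSON string are all assumptions made for illustration; check each provider's sub-page for the real keys and encoding.

```python
import json


def build_stt_config(language, vad_silence_ms, custom_param=None):
    """Build an STTConfig-shaped dict (hypothetical helper, not part of any SDK).

    language       -> top-level "Language" field
    vad_silence_ms -> top-level "VadSilenceTime" field (assumed to be milliseconds)
    custom_param   -> provider-specific settings, stored under "CustomParam"
    """
    config = {
        "Language": language,
        "VadSilenceTime": vad_silence_ms,
    }
    if custom_param:
        # Assumption: CustomParam carries provider settings as a JSON string.
        config["CustomParam"] = json.dumps(custom_param, sort_keys=True)
    return config


# Built-in provider: only the shared top-level fields (values are placeholders).
builtin = build_stt_config("en", 1000)

# Third-party provider: same top-level fields plus provider-specific settings.
# The keys below are placeholders, not real provider parameters.
third_party = build_stt_config(
    "en",
    1000,
    {"example_provider_key": "example-value", "example_model_key": "example-model"},
)

print(json.dumps(builtin, indent=2))
print(json.dumps(third_party, indent=2))
```

Keeping the top-level fields identical across providers means that switching providers should only require changing the CustomParam payload, assuming the target provider accepts the same Language value.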