Text-to-Speech Configuration
The TTS module converts the LLM-generated text response into natural-sounding speech and delivers it back to the user in real time. With TRTC's ultra-low latency pipeline (end-to-end audio under 300 ms), synthesized speech reaches users with minimal delay. The flexible framework lets you choose from TRTC's built-in real-time TTS with a curated voice library, or bring your own third-party TTS provider through the
TTSConfig object.
Available Providers
| Provider | Models | Integration | Best for |
| TRTC Built-in TTS | flow_01_turbo, etc. | Built-in | Lowest latency, no external account needed |
| | speech-2.8-turbo, speech-2.8-hd, etc. | BYO | Emotionally expressive Chinese voices |
| | Neural voices (400+) | BYO | 400+ voices, 140+ languages |
| | sonic-3-2026-01-12, sonic-multilingual, etc. | BYO | Ultra-low latency, real-time streaming |
| | eleven_3, eleven_flash_v2_5, etc. | BYO | Most human-like voices, voice cloning |
| | Inworld AI voices | BYO | Gaming NPCs, interactive characters |
| | Your own model | BYO | Bring your own TTS service |
Built-in vs BYO:
TRTC Built-in TTS provides a ready-to-use voice library with no external account needed. For all BYO providers, you will need the corresponding service account and API key. See each provider's sub-page for the complete configuration.
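The built-in versus BYO distinction can be illustrated with a small sketch. This is not the actual TTSConfig schema: the class `TTSConfigSketch`, its field names (`provider`, `model`, `integration`, `api_key`), and the provider strings are all hypothetical, chosen only to show the rule that BYO providers need credentials while the built-in option does not. Refer to each provider's sub-page for the real fields.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class TTSConfigSketch:
    """Hypothetical stand-in for a TTS configuration; not the real TTSConfig."""

    provider: str                  # hypothetical provider identifier
    model: str                     # model name, e.g. one listed in the table
    integration: str               # "built-in" or "byo"
    api_key: Optional[str] = None  # assumed to be required only for BYO

    def validate(self) -> None:
        """Raise ValueError if the configuration breaks the built-in/BYO rule."""
        if self.integration not in ("built-in", "byo"):
            raise ValueError(f"unknown integration type: {self.integration!r}")
        if self.integration == "byo" and not self.api_key:
            raise ValueError("BYO providers require a service account API key")


# Built-in: no external account or key is needed.
builtin = TTSConfigSketch(
    provider="trtc-built-in",      # hypothetical identifier
    model="flow_01_turbo",
    integration="built-in",
)
builtin.validate()

# BYO: the key comes from your own account with the third-party provider.
byo = TTSConfigSketch(
    provider="example-byo-provider",  # hypothetical identifier
    model="eleven_flash_v2_5",
    integration="byo",
    api_key="YOUR_API_KEY",           # placeholder, not a real credential
)
byo.validate()

# Plain-dict form, e.g. for serializing into a request body.
payload = asdict(byo)
```

Validating before sending the request surfaces a missing key locally rather than as a provider-side authentication error at synthesis time.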