Text-To-Speech Configuration

The TTS module converts the LLM-generated text response into natural-sounding speech and delivers it to the user in real time. With TRTC's ultra-low-latency pipeline (end-to-end audio under 300 ms), synthesized speech reaches users with minimal delay. You can either use TRTC's built-in real-time TTS, which comes with a curated voice library, or bring your own (BYO) third-party TTS provider through the TTSConfig object.
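As a rough illustration of the choice described above, the sketch below models a TTS configuration as a small Python dataclass and serializes it to JSON. The class name `TTSConfigSketch`, the field names (`provider`, `model`, `voice_id`, `api_key`), and the provider strings are assumptions made for this example, not the documented TTSConfig schema; the parameter reference remains the source for the real fields.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class TTSConfigSketch:
    """Hypothetical stand-in for a TTSConfig payload (all field names are assumed)."""
    provider: str                   # assumed label, e.g. "builtin" or a BYO provider name
    model: str                      # model identifier, e.g. "flow_01_turbo"
    voice_id: Optional[str] = None  # assumed optional voice selector
    api_key: Optional[str] = None   # only meaningful for BYO providers

    def to_json(self) -> str:
        # Drop unset fields so the payload carries only what was configured.
        payload = {k: v for k, v in asdict(self).items() if v is not None}
        return json.dumps(payload, sort_keys=True)


# Built-in TTS: no credential field is set.
builtin = TTSConfigSketch(provider="builtin", model="flow_01_turbo")

# BYO provider: the placeholder key stands in for your own provider credential.
byo = TTSConfigSketch(provider="cartesia", model="sonic-multilingual", api_key="YOUR_API_KEY")
```

Calling `builtin.to_json()` yields a payload with only the provider and model, while `byo.to_json()` also carries the credential field.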

Available Providers

| Provider | Models | Integration | Best for |
| --- | --- | --- | --- |
| TRTC Built-in TTS | flow_01_turbo, etc. | Built-in | Lowest latency, no external account needed |
|  | speech-2.8-turbo, speech-2.8-hd, etc. | BYO | Emotionally expressive Chinese voices |
| Azure TTS | Neural voices (400+) | BYO | 400+ voices, 140+ languages |
| Cartesia | sonic-3-2026-01-12, sonic-multilingual, etc. | BYO | Ultra-low latency, real-time streaming |
|  | eleven_3, eleven_flash_v2_5, etc. | BYO | Most human-like voices, voice cloning |
| Inworld | Inworld AI voices | BYO | Gaming NPCs, interactive characters |
|  | Your own model | BYO | Bring your own TTS service |
Built-in vs. BYO:
TRTC Built-in TTS provides a ready-to-use voice library and requires no external account. Every BYO (bring your own) provider requires an account and API key with that service. See each provider's sub-page for the complete configuration.
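The built-in versus BYO rule can be expressed as a small pre-flight check. This is a minimal sketch that assumes a dictionary-based config with hypothetical `integration` and `api_key` keys; neither key name is taken from the TRTC parameter reference.

```python
def missing_credentials(config: dict) -> list:
    """Return the names of required-but-missing fields for a config dict.

    Assumptions (hypothetical keys): `integration` is "built-in" or "byo",
    and a BYO config carries its provider credential under `api_key`.
    """
    integration = config.get("integration", "").lower()
    if integration == "built-in":
        return []  # built-in TTS needs no external account
    if integration == "byo":
        return [] if config.get("api_key") else ["api_key"]
    raise ValueError(f"unknown integration type: {integration!r}")
```

Running a check like this before starting a session surfaces a missing key early, rather than at the first synthesis request.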
For the complete TTS parameter reference, see the Text-to-Speech Configuration.