Billing of Speech-to-Text
These billing instructions apply to two services: Speech-to-Text and AI Real-Time Translation.
Speech-to-Text: Transcribes spoken audio into text using Automatic Speech Recognition (ASR/STT). This is commonly used to generate real-time captions.
AI Real-Time Translation: Translates transcribed text into target languages to deliver real-time multilingual subtitles.
Billing Information
Speech-to-Text
This service recognizes and transcribes audio streams from specified users or all users in a TRTC room.
This capability is available only to applications subscribed to an RTC Engine Monthly Package.
For eligible packages (RTC Engine Lite and above), the service is billed on a pay-as-you-go basis once it is unlocked.
To ensure consistency and output quality, third-party STT is not supported in AI Real-Time Translation scenarios.
Billing mode: Postpaid.
Billing cycle: Daily. For specific billing details and the statement issuance time, see Billing Statement.
AI Real-Time Translation
This service translates transcribed content into one or more specified target languages in real-time.
Billing mode: Postpaid.
Billing cycle: Daily. For specific billing details and the statement issuance time, see Billing Statement.
Pricing
The following table provides the list prices and language support details for both the Speech-to-Text and AI Real-Time Translation services:
Service Type | Unit Price (USD/Minute) | Supported Languages |
Speech-to-Text | 0.02 | 22 languages: Chinese, Chinese (Traditional), English, Vietnamese, Japanese, Korean, Indonesian, Thai, Portuguese, Turkish, Arabic, Spanish, Hindi, French, Malay, Filipino, German, Italian, Russian, Swedish, Danish, and Norwegian. |
AI Real-Time Translation | 0.016 | 15 languages: Chinese, English, Vietnamese, Japanese, Korean, Indonesian, Thai, Portuguese, Arabic, Spanish, French, Malay, German, Italian, and Russian. |
Metering & Usage Notes
Note:
Service duration is metered in seconds and accumulated per SDKAppID. For billing, each day's total seconds are converted to minutes, and any remaining seconds are rounded up to the next full minute.
When speech-to-text or AI real-time translation is enabled in a TRTC room, a robot will join as a virtual participant to subscribe to the relevant audio/video streams. This subscription incurs audio and video usage duration.
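The rounding rule in the note above can be sketched as follows. This is a minimal illustration, not an official billing implementation: the function name `daily_billable_minutes` is hypothetical, and it assumes the rounding is applied once to the day's accumulated total for a single service under one SDKAppID, rather than to each session separately.

```python
def daily_billable_minutes(session_seconds):
    """Sum one day's metered seconds, then round up to whole minutes.

    session_seconds: iterable of non-negative integers, one per metered
    session for a single service under a single SDKAppID (an assumption
    made for this sketch).
    """
    total_seconds = sum(session_seconds)
    if total_seconds < 0:
        raise ValueError("metered duration cannot be negative")
    # Integer ceiling division: any leftover seconds count as one more minute.
    return (total_seconds + 59) // 60


# Three short sessions total 55 seconds, so the day is billed as 1 minute.
print(daily_billable_minutes([10, 20, 25]))  # 1
# 125 seconds is 2 full minutes plus 5 seconds, billed as 3 minutes.
print(daily_billable_minutes([30, 45, 50]))  # 3
```

Because the seconds are summed before rounding, many short sessions within a day do not each incur a separate rounded-up minute under this reading.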
Speech-to-Text
Only the duration of audio streams actively undergoing recognition is billed.
In multi-stream scenarios, the cumulative duration of all input streams is used for billing.
AI Real-Time Translation
Billing is based on the duration of the input audio streams that are actively being translated.
If a single input stream is translated into multiple target languages, the billed duration is calculated as Input Duration × Number of Output Languages.
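The two metering rules above can be expressed as a short sketch. The function names and the input shapes are hypothetical, chosen only for illustration; the logic follows the stated rules (cumulative stream duration for Speech-to-Text, and input duration multiplied by the number of output languages for AI Real-Time Translation).

```python
def stt_billable_seconds(stream_seconds):
    """Speech-to-Text: cumulative duration of all recognized input streams."""
    return sum(stream_seconds)


def translation_billable_seconds(streams):
    """AI Real-Time Translation: input duration x number of output languages.

    streams: iterable of (input_seconds, target_language_count) pairs,
    one pair per translated input stream.
    """
    return sum(seconds * languages for seconds, languages in streams)


# Two input streams of 300 seconds each, both translated into 2 languages.
print(stt_billable_seconds([300, 300]))                  # 600
print(translation_billable_seconds([(300, 2), (300, 2)]))  # 1200
```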
Billing Examples
For example, suppose Users A and B are having a voice call in Chinese. Viewer C requires English subtitles, while Viewer D requires Japanese subtitles, so both the Speech-to-Text and AI Real-Time Translation services are involved. The total call duration is 5 minutes. The corresponding charges are calculated as follows:
Billing Type | User A | User B | Subtotal |
Speech-to-Text | 5 minutes | 5 minutes | 10 minutes |
AI Real-Time Translation | 5 minutes × 2 | 5 minutes × 2 | 20 minutes |
Speech-to-Text charges: 10 minutes of usage at a unit price of 0.02 USD/minute costs 0.02 × 10 = 0.20 USD.
AI Real-Time Translation charges: 20 minutes of usage at a unit price of 0.016 USD/minute costs 0.016 × 20 = 0.32 USD.
The total fee in this scenario is 0.20 + 0.32 = 0.52 USD.
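The arithmetic in this example can be reproduced with a short script. This is a sketch only: the unit prices are the list prices from the pricing table, while the function name `example_charges` and its parameters are hypothetical. `Decimal` is used so the currency values are not affected by binary floating-point rounding.

```python
from decimal import Decimal

# List prices from the pricing table, in USD per minute.
STT_PRICE = Decimal("0.02")
TRANSLATION_PRICE = Decimal("0.016")


def example_charges(speakers, minutes_per_speaker, target_languages):
    """Return (stt_cost, translation_cost, total) in USD for a simple call
    in which every speaker's stream is transcribed and translated into the
    same number of target languages."""
    stt_minutes = speakers * minutes_per_speaker
    translation_minutes = stt_minutes * target_languages
    stt_cost = stt_minutes * STT_PRICE
    translation_cost = translation_minutes * TRANSLATION_PRICE
    return stt_cost, translation_cost, stt_cost + translation_cost


# Users A and B speak for 5 minutes each; subtitles in English and Japanese.
stt_cost, translation_cost, total = example_charges(2, 5, 2)
print(stt_cost, translation_cost, total)  # 0.20 0.320 0.520
```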