
AI Smart Recognition Billing Description

These billing instructions apply to three services: Speech-to-Text, AI Real-Time Translation, and Real-Time Speech Synthesis.
Speech-to-Text: Transcribes spoken audio into text using Automatic Speech Recognition (ASR/STT). This is commonly used to generate real-time captions.
AI Real-Time Translation: Translates transcribed text into target languages to deliver real-time multilingual subtitles.
Real-Time Speech Synthesis: Converts text into speech in real time using Text-to-Speech (TTS) technology.

Speech-to-Text

This service recognizes and transcribes audio streams from specified users or all users in a TRTC room.
This capability is available only to applications subscribed to the RTC-Engine Monthly Package.
For eligible packages (RTC Engine Lite and above), the service is billed on a pay-as-you-go basis after unlocking.
To ensure consistency and output quality, third-party STT is not supported in AI Real-Time Translation scenarios.
Billing mode: Postpaid.
Billing cycle: Daily. Specific billing details and the statement issuance time are subject to the Billing Statement.

AI Real-Time Translation

This service translates transcribed content into one or more specified target languages in real time.
Billing mode: Postpaid.
Billing cycle: Daily. Specific billing details and the statement issuance time are subject to the Billing Statement.

Real-Time Speech Synthesis

This service converts text to natural, fluent speech in real time, enabling live voice output.
Billing mode: Postpaid.
Billing cycle: Daily. Specific billing details and the statement issuance time are subject to the Billing Statement.

Pricing

The following table provides the list prices and language support details for the Speech-to-Text, AI Real-Time Translation, and Real-Time Speech Synthesis services:
| Service Type | Model Type | Unit Price | Supported Languages |
| --- | --- | --- | --- |
| Speech-to-Text | Standard | 0.02 USD/minute | 22 languages: Chinese, Chinese (Traditional), English, Vietnamese, Japanese, Korean, Indonesian, Thai, Portuguese, Turkish, Arabic, Spanish, Hindi, French, Malay, Filipino, German, Italian, Russian, Swedish, Danish, and Norwegian |
| AI Real-Time Translation | Standard | 0.016 USD/minute | 15 languages: Chinese, English, Vietnamese, Japanese, Korean, Indonesian, Thai, Portuguese, Arabic, Spanish, French, Malay, German, Italian, and Russian |
| Real-Time Speech Synthesis | Flash | 0.06 USD/1,000 characters | Chinese and English |
| | Multilingual model | | 37 languages: Spanish, French, Russian, German, Portuguese, Arabic, Italian, Japanese, Korean, Indonesian, Vietnamese, Turkish, Dutch, Ukrainian, Thai, Polish, Romanian, Greek, Czech, Finnish, Hindi, Bulgarian, Danish, Hebrew, Malay, Persian, Slovak, Swedish, Croatian, Filipino, Hungarian, Norwegian, Slovenian, Catalan, Nynorsk, Tamil, and Afrikaans |

Metering & Usage Notes

Note:
Service duration for Speech-to-Text and AI Real-Time Translation is metered in seconds and accumulated per SDKAppID. For billing, each day's total seconds are converted to minutes, and any remaining seconds are rounded up to the next full minute.
When Speech-to-Text or AI Real-Time Translation is enabled in a TRTC room, a robot joins as a virtual participant to subscribe to the relevant audio/video streams. This subscription incurs audio and video usage duration.
Real-Time Speech Synthesis is billed on the daily cumulative character count, measured at the character level. The pricing unit is 1,000 characters, calculated to three decimal places.
For Speech Synthesis (TTS) billing, the character count is calculated as follows: each Chinese character (including Japanese Kanji, Korean Hanja, and other CJK ideographs) counts as 2 characters. All other characters (English letters, characters from other languages, punctuation marks, symbols, spaces, and line breaks) count as 1 character.
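The two metering rules in the note above can be sketched in Python. This is only an illustration, not the service's actual implementation: the function names are hypothetical, and the set of Unicode ranges treated as "CJK ideographs" is an assumption, since this page does not list the exact code points the service uses.

```python
# Assumed code-point ranges for "CJK ideographs"; the billing page does not
# specify the exact ranges, so treat this list as illustrative only.
CJK_RANGES = (
    (0x3400, 0x4DBF),    # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),    # CJK Unified Ideographs
    (0xF900, 0xFAFF),    # CJK Compatibility Ideographs
    (0x20000, 0x2FA1F),  # Supplementary-plane extensions and compatibility supplement
)


def is_cjk_ideograph(ch: str) -> bool:
    """Return True if the single character falls in one of the assumed ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CJK_RANGES)


def tts_billable_characters(text: str) -> int:
    """Count 2 for each CJK ideograph and 1 for every other character,
    including punctuation, spaces, and line breaks."""
    return sum(2 if is_cjk_ideograph(ch) else 1 for ch in text)


def billable_minutes(total_daily_seconds: int) -> int:
    """Convert a day's accumulated seconds to minutes, rounding any
    remaining seconds up to the next full minute."""
    return (total_daily_seconds + 59) // 60
```

Under these assumptions, `tts_billable_characters("你好, world")` gives 11 (4 for the two ideographs, 7 for the remaining characters), and `billable_minutes(3001)` gives 51.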

Speech-to-Text

Only the duration of audio streams actively undergoing recognition is billed.
In multi-stream scenarios, the cumulative duration of all input streams is used for billing.

AI Real-Time Translation

Billing is based on the duration of the input audio streams that are actively being translated.
If a single input stream is translated into multiple target languages, billing is calculated as Input Duration × Number of Output Languages.
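As a rough illustration of the multiplier rule above, the following Python sketch accumulates billable translation time across several input streams. The function name and the tuple layout are hypothetical and chosen only for this example.

```python
def translation_billable_seconds(streams: list[tuple[int, int]]) -> int:
    """Sum the billable seconds over all input streams.

    Each entry is (seconds actively translated, number of target languages);
    a stream translated into N languages is counted N times.
    """
    return sum(seconds * languages for seconds, languages in streams)


# Two 50-minute input streams, each translated into 2 target languages.
total_seconds = translation_billable_seconds([(3000, 2), (3000, 2)])
print(total_seconds // 60)  # 200 billable minutes
```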

Real-Time Speech Synthesis

Usage is measured based on the number of input text characters for real-time speech synthesis.
For each individual broadcaster stream, charges are applied according to the number of characters that need to be synthesized.

Billing Examples

The billing figures in the following example are all calculated to three decimal places.
Consider the following use case:
Users A and B: having a conversation in Chinese
Viewer C: requires English captions and English speech output
Viewer D: requires Japanese captions and Japanese speech output
The system processes the conversation through the following steps:
1. Speech-to-Text recognition (converting Chinese speech to text)
2. Real-time translation (translating the text into English and Japanese)
3. Real-time speech synthesis (converting the translated text to speech)
Usage details:
Conversation duration: 50.000 minutes
Total text characters: 40.000 thousand characters (each user's text is synthesized in 2 languages)
User A's Chinese text: 9.000 thousand characters
User B's Chinese text: 11.000 thousand characters
The corresponding charges are calculated as follows:
| Billing Type | User A | User B | Subtotal |
| --- | --- | --- | --- |
| Speech-to-Text | 50.000 minutes | 50.000 minutes | 100.000 minutes |
| AI Real-Time Translation | 50.000 minutes × 2 | 50.000 minutes × 2 | 200.000 minutes |
| Real-Time Speech Synthesis | 9.000 thousand characters × 2 | 11.000 thousand characters × 2 | 40.000 thousand characters |
Speech-to-Text charges: 100.000 minutes of usage at a unit price of 0.020 USD/minute costs 0.020 × 100.000 = 2.000 USD.
AI Real-Time Translation charges: 200.000 minutes of usage at a unit price of 0.016 USD/minute costs 0.016 × 200.000 = 3.200 USD.
Real-Time Speech Synthesis charges: 40.000 thousand characters of usage at a unit price of 0.060 USD/thousand characters costs 0.060 × 40.000 = 2.400 USD.
The total fee for this scenario is 2.000 + 3.200 + 2.400 = 7.600 USD.
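The arithmetic in this example can be checked with a short Python sketch. The unit prices and usage figures come from this page; the function and variable names are illustrative, and the use of `Decimal.quantize` with its default rounding mode is an assumption, since the page only states that figures are calculated to three decimal places.

```python
from decimal import Decimal

# List prices taken from the pricing table on this page.
STT_PRICE = Decimal("0.020")          # USD per minute
TRANSLATION_PRICE = Decimal("0.016")  # USD per minute
TTS_PRICE = Decimal("0.060")          # USD per 1,000 characters
THREE_PLACES = Decimal("0.001")


def billing_example() -> dict[str, Decimal]:
    """Recompute the charges for the A/B conversation scenario."""
    target_languages = 2  # English and Japanese
    conversation_minutes = Decimal("50.000")
    # Text per speaker, in thousands of characters.
    text_thousand_chars = {"A": Decimal("9.000"), "B": Decimal("11.000")}

    # One recognized stream per speaker, each lasting the whole conversation.
    stt_minutes = conversation_minutes * len(text_thousand_chars)
    # Each input stream is translated into every target language.
    translation_minutes = stt_minutes * target_languages
    # Each speaker's text is synthesized once per target language.
    tts_thousand_chars = sum(text_thousand_chars.values()) * target_languages

    costs = {
        "speech_to_text": (STT_PRICE * stt_minutes).quantize(THREE_PLACES),
        "translation": (TRANSLATION_PRICE * translation_minutes).quantize(THREE_PLACES),
        "speech_synthesis": (TTS_PRICE * tts_thousand_chars).quantize(THREE_PLACES),
    }
    costs["total"] = sum(costs.values())
    return costs


if __name__ == "__main__":
    for item, cost in billing_example().items():
        print(f"{item}: {cost} USD")
```

Running the sketch reproduces the figures above: 2.000, 3.200, and 2.400 USD for the three services, and 7.600 USD in total.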

Integration Guide

For integration steps, please refer to the Speech-to-Text and Translation Integration Instructions.
To configure Real-Time Speech Synthesis (TTS) in Conversational AI, refer to Conversational AI TTS Configuration.