
Speech-to-Text

Use Cases
Tencent Real-Time Communication (TRTC) provides a speech-to-text feature that converts the audio streams of specified users, or of all users in a room, into Chinese text for use cases such as real-time captions.
Prerequisites
Log in to the TRTC console, activate the TRTC service, and create an RTC-Engine application.
Go to the purchase page to buy an RTC-Engine package of any version to unlock the speech-to-text feature.
Note:
The speech-to-text feature incurs fees based on usage. See Fee Details for more information.
Feature Overview
After a task is initiated, the TRTC AI Service sends an Automatic Speech Recognition (ASR) bot into the TRTC room. The bot pulls the streams of the specified users or of all users, performs speech-to-text recognition, and relays the recognition results to the client and server in real time.
Integration Guide
Step 1: Receiving Speech-to-Text Results
Method 1: Receiving Text Messages via Client SDK
Use the custom message receiving feature of the TRTC SDK to listen to callbacks on the client and receive real-time speech-to-text result data.
The following example shows the client callback on the web:
trtc.on(TRTC.EVENT.CUSTOM_MESSAGE, event => { // Receive custom messages.
   // event.userId: The userId of the ASR bot.
   // event.cmdId: The message ID, which is fixed at 1 for transcriptions and captions.
   // event.seq: The sequence number of a message.
   // event.data: ArrayBuffer type. For content of transcriptions or captions, see the explanation of the data field below.
   const data = new TextDecoder().decode(event.data)
   // Explanation of the data field is as follows.
   console.log(`received custom msg from ${event.userId}, message: ${ data }`)
})
Data field explanation
Real-Time Captions
Field Name	Type	Meaning
type	Integer	10000: the message type delivered for real-time captions and for complete sentences.
sender	String	The userId of the speaker.
receiver	Array	The list of recipient userIds. The message is actually broadcast to the whole room.
payload.text	String	The recognized text, Unicode encoded.
payload.start_time	String	Message start time, measured from the start of the task (for example, "00:00:02").
payload.end_time	String	Message end time, measured from the start of the task (for example, "00:00:05").
payload.end	Boolean	If true, the message contains a complete sentence.
{
  "type": 10000,
  "sender": "user_a",
  "payload": {
     "text":"",
     "start_time":"00:00:02",
     "end_time":"00:00:05",
     "end": true
  }
}
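The decoded data field can be turned into a small, typed-looking object before it reaches the UI. The following is a minimal sketch, assuming messages follow the JSON shape shown above. The helper name `parseCaptionMessage` and the returned property names (`startTime`, `endTime`, `isFinal`) are hypothetical and are not part of the TRTC SDK.

```javascript
// Message type used for real-time captions and complete sentences (see the field table).
const CAPTION_MESSAGE_TYPE = 10000;

// Hypothetical helper: decodes the binary data of a custom message and returns
// a normalized caption object, or null if the data is not a caption message.
function parseCaptionMessage(buffer) {
  let message;
  try {
    message = JSON.parse(new TextDecoder().decode(buffer));
  } catch (err) {
    return null; // Not valid JSON, so it cannot be a caption message.
  }
  if (!message || message.type !== CAPTION_MESSAGE_TYPE || !message.payload) {
    return null; // Some other custom message; ignore it.
  }
  const { text, start_time, end_time, end } = message.payload;
  return {
    sender: message.sender,
    text: typeof text === 'string' ? text : '',
    startTime: start_time,
    endTime: end_time,
    isFinal: end === true,
  };
}
```

Inside the `TRTC.EVENT.CUSTOM_MESSAGE` handler, this could be called as `parseCaptionMessage(event.data)`, skipping the message when the result is null.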
Note:
How results are pushed:
Transcription: Each sentence is pushed once, after it has been fully recognized.
"How's the weather today?"
Captions: A sentence is pushed in segments while it is being recognized. Each segment contains the previous one, so the text can be displayed in real time.
"Today"
"Today's weather"
"How's the weather today?"
Message sequence: caption message > caption message > ... > caption message (end = true)
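Because each caption segment contains the previous one, a client would typically replace the line being displayed instead of appending to it, and archive the sentence once end = true arrives. The following is a minimal sketch of that bookkeeping, assuming already-parsed messages with the sender and payload fields described above. `createCaptionTracker` and its methods are hypothetical names for illustration and are not part of the TRTC SDK.

```javascript
// Hypothetical helper: tracks the caption currently shown for each speaker.
function createCaptionTracker() {
  const interim = new Map(); // sender -> latest unfinished segment
  const sentences = [];      // completed sentences, in arrival order

  return {
    // message: a parsed caption message with `sender` and `payload` fields.
    push(message) {
      const { sender, payload } = message;
      if (payload.end) {
        // Final segment: the sentence is complete, so archive it.
        interim.delete(sender);
        sentences.push({ sender, text: payload.text });
      } else {
        // Each segment contains the previous one, so replace rather than append.
        interim.set(sender, payload.text);
      }
    },
    // Text of the sentence still in progress for a speaker ('' if none).
    current(sender) {
      return interim.get(sender) || '';
    },
    // Copy of all completed sentences received so far.
    completed() {
      return sentences.slice();
    },
  };
}
```

Keeping one interim entry per sender means that overlapping speakers do not overwrite each other's in-progress captions.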
Method 2: Receiving via Server-side Callbacks
The speech-to-text service also provides server-side event callbacks, so that your server can receive conversation messages in real time. See Detailed Callback Events.
Step 2: Initiating a Speech-to-Text Task
TRTC provides the following Tencent Cloud APIs for initiating and managing speech-to-text tasks:
Start a speech-to-text task: StartAITranscription
Query a speech-to-text task: DescribeAITranscription
Stop a speech-to-text task: StopAITranscription
Note:
The speech-to-text feature has a concurrency limit of 100 tasks per SDKAppId. Submit a ticket if you need to increase this limit.
