Android, iOS, Windows, and Mac
Description
The Voice-to-Text feature recognizes voice messages that you have successfully sent or received and converts them into text.
Note:
Voice-to-Text is a paid value-added feature and is currently in beta. To enable the full feature, contact us through the Telegram Technical Support Group.
This feature is supported only by the Enhanced SDK v7.4 or later.
Display Effect
You can use this feature to achieve the text conversion effect shown below:
API Description
Speech-to-Text
You can call the convertVoiceToText interface (Java / Swift / Objective-C; ConvertVoiceToText in C++) to convert a voice message into text. Its parameters are described below:
| Input Parameter | Meaning | Description |
| --- | --- | --- |
| language | Target recognition language | 1. If your users mainly speak Chinese and English, pass an empty string. The Chinese-English model is then used for recognition by default. 2. To recognize a specific language, set this parameter to the corresponding value. For the currently supported languages, see Language Support. |
| callback | Recognition result callback | Receives the recognition result. On success, the result is the recognized text. |
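The callback parameter delivers the result asynchronously. If your app prefers futures, the callback can be adapted into a CompletableFuture. The sketch below is a minimal illustration under stated assumptions: TextCallback is a hypothetical stand-in that only mirrors the onSuccess/onError shape of V2TIMValueCallback<String>, so the sketch compiles without the SDK, and VoiceToTextFutures, ConversionException, and the converter hook are hypothetical names, not SDK API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.BiConsumer;

// Hypothetical stand-in for V2TIMValueCallback<String>; it mirrors only the
// onSuccess/onError methods used in the sample code below.
interface TextCallback {
    void onSuccess(String result);
    void onError(int code, String desc);
}

final class VoiceToTextFutures {
    private VoiceToTextFutures() {}

    // Carries the error code and description reported through onError.
    static final class ConversionException extends RuntimeException {
        final int code;

        ConversionException(int code, String desc) {
            super("convertVoiceToText failed, code: " + code + " desc: " + desc);
            this.code = code;
        }
    }

    // Runs a callback-style conversion and exposes its outcome as a future:
    // onSuccess completes the future, onError completes it exceptionally.
    static CompletableFuture<String> convert(String language,
                                             BiConsumer<String, TextCallback> converter) {
        CompletableFuture<String> future = new CompletableFuture<>();
        converter.accept(language, new TextCallback() {
            @Override
            public void onSuccess(String result) {
                future.complete(result);
            }

            @Override
            public void onError(int code, String desc) {
                future.completeExceptionally(new ConversionException(code, desc));
            }
        });
        return future;
    }
}
```

With the real SDK you would replace TextCallback with V2TIMValueCallback<String> and pass a converter such as (lang, cb) -> soundElem.convertVoiceToText(lang, cb).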
Warning:
The voice to be recognized must be recorded at a 16 kHz sampling rate; otherwise, recognition may fail.
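Voice messages may be stored in a compressed format, so the following covers only one assumed case: a minimal sketch, assuming the recorded voice file is an uncompressed RIFF/WAVE (PCM WAV) file. It reads the sampling rate from the WAV "fmt " chunk so that audio that is not 16 kHz can be rejected or re-recorded before conversion. WavSampleRate and its methods are hypothetical helpers, not part of the SDK.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; assumes the voice file is a RIFF/WAVE file.
final class WavSampleRate {
    // Sampling rate required by Voice-to-Text (16 kHz).
    static final int REQUIRED_RATE = 16_000;

    private WavSampleRate() {}

    // Returns the sample rate stored in the "fmt " chunk of a RIFF/WAVE byte array.
    static int sampleRate(byte[] wav) {
        if (wav.length < 12 || !tag(wav, 0).equals("RIFF") || !tag(wav, 8).equals("WAVE")) {
            throw new IllegalArgumentException("not a RIFF/WAVE file");
        }
        ByteBuffer buf = ByteBuffer.wrap(wav).order(ByteOrder.LITTLE_ENDIAN);
        int pos = 12; // first chunk after "RIFF" <size> "WAVE"
        while (pos + 8 <= wav.length) {
            String id = tag(wav, pos);
            int size = buf.getInt(pos + 4); // chunk payload size, little-endian
            if (size < 0 || size > wav.length) {
                break; // corrupt chunk size; stop instead of overflowing pos
            }
            if (id.equals("fmt ") && pos + 16 <= wav.length) {
                // Payload layout: audioFormat (2 bytes), channels (2), sampleRate (4), ...
                return buf.getInt(pos + 12);
            }
            pos += 8 + size + (size & 1); // chunks are padded to an even length
        }
        throw new IllegalArgumentException("no fmt chunk found");
    }

    // True if the WAV file at `file` is sampled at 16 kHz.
    static boolean isSixteenKilohertz(Path file) throws IOException {
        return sampleRate(Files.readAllBytes(file)) == REQUIRED_RATE;
    }

    // Reads a 4-character ASCII chunk identifier starting at `off`.
    private static String tag(byte[] b, int off) {
        return new String(b, off, 4, StandardCharsets.US_ASCII);
    }
}
```

If you record with Android's MediaRecorder, setAudioSamplingRate(16000) requests a 16 kHz recording, although the rate actually used depends on the encoder and device.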
Below is the sample code:
```java
import android.util.Log;
import com.tencent.imsdk.v2.V2TIMMessage;
import com.tencent.imsdk.v2.V2TIMSoundElem;
import com.tencent.imsdk.v2.V2TIMValueCallback;

// Get the V2TIMMessage object (here, from messageList)
V2TIMMessage msg = messageList.get(0);
if (msg.getElemType() == V2TIMMessage.V2TIM_ELEM_TYPE_SOUND) {
    // Retrieve the sound element from the V2TIMMessage
    V2TIMSoundElem soundElem = msg.getSoundElem();
    // Convert speech to text; an empty language string selects the default Chinese-English model
    soundElem.convertVoiceToText("", new V2TIMValueCallback<String>() {
        @Override
        public void onError(int code, String desc) {
            // Recognition failed: report the error code and description
            String str = "convertVoiceToText failed, code: " + code + " desc: " + desc;
            Log.e("VoiceToText", str);
        }

        @Override
        public void onSuccess(String result) {
            // Recognition succeeded: 'result' is the recognized text
            String str = "convertVoiceToText succeeded, result: " + result;
            Log.i("VoiceToText", str);
        }
    });
}
```
```swift
// Get the V2TIMMessage object (here, from messageList)
let msg = messageList[0]
// Check the element type and unwrap the sound element from the V2TIMMessage
if msg.elemType == .V2TIM_ELEM_TYPE_SOUND, let soundElem = msg.soundElem {
    // Convert speech to text; an empty language string selects the default Chinese-English model
    soundElem.convertVoiceToText("") { code, desc, result in
        // On success, 'result' is the recognized text; otherwise code and desc describe the error
        print("convertVoiceToText, code: \(code), desc: \(desc ?? ""), result: \(result ?? "")")
    }
}
```
```objc
// Get the V2TIMMessage object (here, from messageList)
V2TIMMessage *msg = messageList[0];
if (msg.elemType == V2TIM_ELEM_TYPE_SOUND) {
    // Retrieve the sound element from the V2TIMMessage
    V2TIMSoundElem *soundElem = msg.soundElem;
    // Convert speech to text; an empty language string selects the default Chinese-English model
    [soundElem convertVoiceToText:@"" completion:^(int code, NSString *desc, NSString *result) {
        // On success, 'result' is the recognized text; otherwise code and desc describe the error
        NSLog(@"convertVoiceToText, code: %d, desc: %@, result: %@", code, desc, result);
    }];
}
```
```cpp
#include <functional>
#include <utility>
// The IM SDK's V2TIM headers must also be included.

// Adapter that forwards V2TIMValueCallback results to lambdas
template <class T>
class ValueCallback final : public V2TIMValueCallback<T> {
public:
    using SuccessCallback = std::function<void(const T&)>;
    using ErrorCallback = std::function<void(int, const V2TIMString&)>;

    ValueCallback() = default;
    ~ValueCallback() override = default;

    void SetCallback(SuccessCallback success_callback, ErrorCallback error_callback) {
        success_callback_ = std::move(success_callback);
        error_callback_ = std::move(error_callback);
    }

    void OnSuccess(const T& value) override {
        if (success_callback_) {
            success_callback_(value);
        }
    }

    void OnError(int error_code, const V2TIMString& error_message) override {
        if (error_callback_) {
            error_callback_(error_code, error_message);
        }
    }

private:
    SuccessCallback success_callback_;
    ErrorCallback error_callback_;
};

// Get the V2TIMMessage object (here, from messageList)
V2TIMMessage msg = messageList[0];
// Retrieve the first element of the message and check that it is a sound element
V2TIMElem* elem = msg.elemList[0];
if (elem->elemType == V2TIM_ELEM_TYPE_SOUND) {
    V2TIMSoundElem* sound_elem = static_cast<V2TIMSoundElem*>(elem);
    // Create the callback only when a conversion is started, so it is always freed
    auto callback = new ValueCallback<V2TIMString>{};
    callback->SetCallback(
        [=](const V2TIMString& result) {
            // Conversion succeeded; 'result' is the recognized text
            delete callback;
        },
        [=](int error_code, const V2TIMString& error_message) {
            // Conversion failed; inspect error_code and error_message
            delete callback;
        });
    // Convert speech to text; an empty language string selects the default Chinese-English model
    sound_elem->ConvertVoiceToText("", callback);
}
```
Language Support
The currently supported target languages for recognition are as follows:
| Supported Language | language Parameter Value |
| --- | --- |
| Mandarin Chinese | "zh (cmn-Hans-CN)" |
| Cantonese | "yue-Hant-HK" |
| English | "en-US" |
| Japanese (Japan) | "ja-JP" |
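As a minimal sketch of choosing the language argument, the helper below passes a requested tag through when it appears in the supported-language table and otherwise falls back to the empty string, which selects the default Chinese-English model. VoiceToTextLanguage is a hypothetical helper, not SDK API. The table lists Mandarin as "zh (cmn-Hans-CN)"; this sketch assumes the value to pass is "zh", so confirm that before relying on it.

```java
import java.util.Set;

// Hypothetical helper for picking the `language` argument of convertVoiceToText.
final class VoiceToTextLanguage {
    private VoiceToTextLanguage() {}

    // Empty string: use the default Chinese-English recognition model.
    static final String DEFAULT_CHINESE_ENGLISH = "";

    // Values from the Language Support table.
    // Assumption: Mandarin's "zh (cmn-Hans-CN)" entry means the value "zh".
    static final Set<String> SUPPORTED = Set.of("zh", "yue-Hant-HK", "en-US", "ja-JP");

    // Returns the requested tag if it is supported, otherwise the default (empty string).
    static String resolve(String requestedTag) {
        if (requestedTag != null && SUPPORTED.contains(requestedTag)) {
            return requestedTag;
        }
        return DEFAULT_CHINESE_ENGLISH;
    }
}
```

Falling back to the default model is one design choice; rejecting unsupported tags instead would surface configuration mistakes earlier.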