Android&iOS&Windows&Mac

Description
The Voice-to-Text feature recognizes voice messages that you have successfully sent or received and converts them into text.
Note:
Voice-to-Text is a paid value-added feature that is currently in beta. To enable the full feature, contact us through the Telegram Technical Support Group.
This feature is supported only by the Enhanced SDK v7.4 or later.
Display Effect
You can use this feature to achieve the text conversion effect shown below:
API Description
Speech-to-Text
You can call the convertVoiceToText interface (Java / Swift / Objective-C / C++) to convert a voice message into text.
The interface parameters are described below:
Input Parameters	Meaning	Description
language	Target language for recognition	1. If most of your users speak Chinese and English, pass an empty string; recognition then defaults to the Chinese-English model. 2. To recognize a specific language, pass one of the values listed in Language Support.
callback	Recognition result callback	The result is the recognized text.
Warning:
The voice message to be recognized must use a 16 kHz sampling rate; otherwise, recognition may fail.
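As a rough illustration of checking the sampling rate before requesting recognition, the sketch below reads the rate from a WAV header. It rests on assumptions that are not part of the SDK: the recording is an uncompressed WAV file whose "fmt " chunk directly follows the RIFF header, and the class name WavSampleRate is hypothetical. Recordings stored in other containers need a different check.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not part of the SDK): reads the sample rate from a canonical WAV header. */
final class WavSampleRate {
    /** Sampling rate required by the warning above, in Hz. */
    static final int REQUIRED_HZ = 16000;

    private WavSampleRate() {}

    /**
     * Returns the sample rate in Hz, or -1 if the bytes do not look like a
     * RIFF/WAVE header with the "fmt " chunk at offset 12 (assumed layout).
     */
    static int read(byte[] header) {
        if (header == null || header.length < 28) {
            return -1;
        }
        String riff = new String(header, 0, 4, StandardCharsets.US_ASCII);
        String wave = new String(header, 8, 4, StandardCharsets.US_ASCII);
        String fmt = new String(header, 12, 4, StandardCharsets.US_ASCII);
        if (!riff.equals("RIFF") || !wave.equals("WAVE") || !fmt.equals("fmt ")) {
            return -1;
        }
        // In this layout the sample rate is a little-endian 32-bit integer at byte offset 24.
        return ByteBuffer.wrap(header, 24, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    /** True only if the header was parsed and reports exactly 16 kHz. */
    static boolean is16k(byte[] header) {
        return read(header) == REQUIRED_HZ;
    }
}
```

Passing the first 44 bytes of the file to WavSampleRate.is16k is enough under these assumptions. A false result means either a different sampling rate or a header layout that this sketch does not handle.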
Below is the sample code for Java, Swift, Objective-C, and C++:
// Java
// Get the V2TIMMessage object from VMS
V2TIMMessage msg = messageList.get(0);
if (msg.getElemType() == V2TIMMessage.V2TIM_ELEM_TYPE_SOUND) {
    // Retrieve the soundElem from V2TIMMessage
    V2TIMSoundElem soundElem = msg.getSoundElem();
    // Invoke speech-to-text conversion; an empty string selects the default Chinese-English recognition model
    soundElem.convertVoiceToText("", new V2TIMValueCallback<String>() {
        @Override
        public void onError(int code, String desc) {
            // Recognition failed; 'code' and 'desc' describe the error
            String str = "convertVoiceToText failed, code: " + code + " desc: " + desc;
            // ToastUtil is a UI helper used by this sample; replace it with your own UI or logging
            ToastUtil.show(str, true, 1);
        }

        @Override
        public void onSuccess(String result) {
            // If recognition is successful, 'result' will be the recognition result
            String str = "convertVoiceToText succeed, result: " + result;
            ToastUtil.show(str, true, 1);
        }
    });
}
// Swift
// Get the V2TIMMessage object from VMS
let msg = messageList[0]
if msg.elemType == .V2TIM_ELEM_TYPE_SOUND {
    // Retrieve the soundElem from V2TIMMessage
    let soundElem = msg.soundElem
    // Invoke speech-to-text conversion, using the Chinese-English recognition model by default
    soundElem.convertVoiceToText("") { code, desc, result in
        // If recognition is successful, 'result' will be the recognition result
        print("convertVoiceToText, code: \(code), desc: \(desc ?? ""), result: \(result ?? "")")
    }
}
// Objective-C
// Get the V2TIMMessage object from VMS
V2TIMMessage *msg = messageList[0];
if (msg.elemType == V2TIM_ELEM_TYPE_SOUND) {
    // Retrieve the soundElem from V2TIMMessage
    V2TIMSoundElem *soundElem = msg.soundElem;
    // Invoke speech-to-text conversion, using the Chinese-English recognition model by default
    [soundElem convertVoiceToText:@"" completion:^(int code, NSString *desc, NSString *result) {
        // If recognition is successful, 'result' will be the recognition result
        NSLog(@"convertVoiceToText, code: %d, desc: %@, result: %@", code, desc, result);
    }];
}
// C++
#include <functional>
#include <utility>

// Adapter that forwards V2TIMValueCallback results to std::function handlers
template <class T>
class ValueCallback final : public V2TIMValueCallback<T> {
public:
    using SuccessCallback = std::function<void(const T&)>;
    using ErrorCallback = std::function<void(int, const V2TIMString&)>;

    ValueCallback() = default;
    ~ValueCallback() override = default;

    void SetCallback(SuccessCallback success_callback, ErrorCallback error_callback) {
        success_callback_ = std::move(success_callback);
        error_callback_ = std::move(error_callback);
    }

    void OnSuccess(const T& value) override {
        if (success_callback_) {
            success_callback_(value);
        }
    }
    void OnError(int error_code, const V2TIMString& error_message) override {
        if (error_callback_) {
            error_callback_(error_code, error_message);
        }
    }

private:
    SuccessCallback success_callback_;
    ErrorCallback error_callback_;
};

// The callback object deletes itself once either handler has run
auto callback = new ValueCallback<V2TIMString>{};
callback->SetCallback(
    [=](const V2TIMString& result) {
        // Speech-to-text conversion succeeded; 'result' is the conversion result
        delete callback;
    },
    [=](int error_code, const V2TIMString& error_message) {
        // Speech-to-text conversion failed
        delete callback;
    });

// Get the V2TIMMessage object from VMS
V2TIMMessage& message = messageList[0];
// Retrieve the first element from V2TIMMessage
V2TIMElem *elem = message.elemList[0];
if (elem->elemType == V2TIM_ELEM_TYPE_SOUND) {
    V2TIMSoundElem *sound_elem = static_cast<V2TIMSoundElem *>(elem);
    // Invoke speech-to-text conversion, using the Chinese-English recognition model by default
    sound_elem->ConvertVoiceToText("", callback);
}
Language Support
The currently supported target languages for recognition are as follows:
Supported Languages	Input Parameter Settings
Mandarin Chinese	"zh (cmn-Hans-CN)"
Cantonese Chinese	"yue-Hant-HK"
English	"en-US"
Japanese (Japan)	"ja-JP"
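To illustrate how the language parameter might be chosen at runtime, here is a minimal sketch that maps a java.util.Locale to one of the values in the table above. The class VoiceToTextLanguage and its method are hypothetical helpers, not part of the SDK. The strings are copied verbatim from the table, and falling back to an empty string (the default Chinese-English model) for unlisted languages is an assumption.

```java
import java.util.Locale;

/** Hypothetical helper (not part of the SDK): picks the language argument for convertVoiceToText. */
final class VoiceToTextLanguage {
    private VoiceToTextLanguage() {}

    /**
     * Maps a locale to a value from the Language Support table.
     * Returns "" for null or unlisted languages, which selects the
     * default Chinese-English model (fallback choice is an assumption).
     */
    static String forLocale(Locale locale) {
        if (locale == null) {
            return "";
        }
        switch (locale.getLanguage()) {
            case "zh":
                return "zh (cmn-Hans-CN)"; // Mandarin Chinese, as listed in the table
            case "yue":
                return "yue-Hant-HK";      // Cantonese Chinese
            case "en":
                return "en-US";            // English
            case "ja":
                return "ja-JP";            // Japanese (Japan)
            default:
                return "";                 // not listed: use the default model
        }
    }
}
```

In the Java sample above, the empty string passed to convertVoiceToText could then be replaced with VoiceToTextLanguage.forLocale(Locale.getDefault()).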