Android&iOS&Windows&Mac

Description

The Voice-to-Text feature recognizes voice messages that you have successfully sent or received and converts them into text.
Note:
Voice-to-Text is a value-added paid feature that is currently in beta. To enable the full feature, contact us through the Telegram Technical Support Group.
This feature is supported only by the Enhanced SDK v7.4 or later.

Display Effect

You can use this feature to achieve the text conversion effect shown below:




API Description

Speech-to-Text

Call the convertVoiceToText interface (Java / Swift / Objective-C / C++) to convert a voice message into text.
The interface parameters are described below:
| Input Parameter | Meaning | Description |
| --- | --- | --- |
| language | Target language for recognition | 1. If most of your users speak Chinese and English, pass an empty string. The Chinese-English model is then used for recognition by default. 2. To specify the target language for recognition, pass a specific value. For the currently supported languages, see Language Support. |
| callback | Recognition result callback | The result is the recognized text. |
Warning:
The voice to be recognized must use a 16 kHz sampling rate; otherwise, recognition may fail.
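
Because recognition may fail when the recording is not sampled at 16 kHz, it can help to verify the sampling rate before requesting a conversion. The following Java sketch shows one possible check. It assumes the recording is stored as a canonical PCM WAV file, whose header holds the sampling rate at bytes 24 to 27 in little-endian order. The class WavSampleRateCheck and its methods are hypothetical helpers and are not part of the SDK. Recordings stored in other containers would need a different check.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the SDK.
class WavSampleRateCheck {
    /** Sampling rate required for recognition, in Hz. */
    static final int REQUIRED_SAMPLE_RATE_HZ = 16000;

    /**
     * Reads the sampling rate from a canonical PCM WAV header, where the
     * "fmt " chunk starts at byte 12 and the rate is stored at bytes 24-27.
     */
    static int readSampleRate(byte[] header) {
        if (header == null || header.length < 28) {
            throw new IllegalArgumentException("WAV header is too short");
        }
        String riff = new String(header, 0, 4, StandardCharsets.US_ASCII);
        String wave = new String(header, 8, 4, StandardCharsets.US_ASCII);
        String fmt = new String(header, 12, 4, StandardCharsets.US_ASCII);
        if (!riff.equals("RIFF") || !wave.equals("WAVE") || !fmt.equals("fmt ")) {
            throw new IllegalArgumentException("Not a canonical WAV header");
        }
        return ByteBuffer.wrap(header, 24, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    /** Returns true when the header declares a 16 kHz sampling rate. */
    static boolean hasRequiredSampleRate(byte[] header) {
        return readSampleRate(header) == REQUIRED_SAMPLE_RATE_HZ;
    }

    /** Builds a minimal 44-byte header for demonstration; unused fields stay zero. */
    static byte[] demoHeader(int sampleRate) {
        ByteBuffer buf = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN);
        buf.put("RIFF".getBytes(StandardCharsets.US_ASCII)); // bytes 0-3
        buf.putInt(36);                                      // bytes 4-7: chunk size
        buf.put("WAVE".getBytes(StandardCharsets.US_ASCII)); // bytes 8-11
        buf.put("fmt ".getBytes(StandardCharsets.US_ASCII)); // bytes 12-15
        buf.putInt(16);                                      // bytes 16-19: fmt chunk size
        buf.putShort((short) 1);                             // bytes 20-21: PCM
        buf.putShort((short) 1);                             // bytes 22-23: mono
        buf.putInt(sampleRate);                              // bytes 24-27: sampling rate
        return buf.array();
    }

    public static void main(String[] args) {
        System.out.println(hasRequiredSampleRate(demoHeader(16000))); // true
        System.out.println(hasRequiredSampleRate(demoHeader(44100))); // false
    }
}
```

In an application, the header bytes would come from the first 44 bytes of the recorded file rather than from demoHeader.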

The sample code for calling convertVoiceToText in Java, Swift, Objective-C, and C++ is given below, in that order:
Java

    // Get the V2TIMMessage object from the message list
    V2TIMMessage msg = messageList.get(0);
    if (msg.getElemType() == V2TIMMessage.V2TIM_ELEM_TYPE_SOUND) {
        // Retrieve the soundElem from V2TIMMessage
        V2TIMSoundElem soundElem = msg.getSoundElem();
        // Invoke speech-to-text conversion, using the Chinese-English recognition model by default
        soundElem.convertVoiceToText("", new V2TIMValueCallback<String>() {
            @Override
            public void onError(int code, String desc) {
                // Recognition failed; 'code' and 'desc' describe the error
                TUIChatUtils.callbackOnError(callBack, TAG, code, desc);
                String str = "convertVoiceToText failed, code: " + code + " desc: " + desc;
                ToastUtil.show(str, true, 1);
            }

            @Override
            public void onSuccess(String result) {
                // If recognition is successful, 'result' will be the recognition result
                String str = "convertVoiceToText succeed, result: " + result;
                ToastUtil.show(str, true, 1);
            }
        });
    }

Swift

    // Get the V2TIMMessage object from the message list
    let msg = messageList[0]
    if msg.elemType == .V2TIM_ELEM_TYPE_SOUND {
        // Retrieve the soundElem from V2TIMMessage
        let soundElem = msg.soundElem
        // Invoke speech-to-text conversion, using the Chinese-English recognition model by default
        soundElem.convertVoiceToText("") { code, desc, result in
            // If recognition is successful, 'result' will be the recognition result
            print("convertVoiceToText, code: \(code), desc: \(desc ?? ""), result: \(result ?? "")")
        }
    }

Objective-C

    // Get the V2TIMMessage object from the message list
    V2TIMMessage *msg = messageList[0];
    if (msg.elemType == V2TIM_ELEM_TYPE_SOUND) {
        // Retrieve the soundElem from V2TIMMessage
        V2TIMSoundElem *soundElem = msg.soundElem;
        // Invoke speech-to-text conversion, using the Chinese-English recognition model by default
        [soundElem convertVoiceToText:@"" completion:^(int code, NSString *desc, NSString *result) {
            // If recognition is successful, 'result' will be the recognition result
            NSLog(@"convertVoiceToText, code: %d, desc: %@, result: %@", code, desc, result);
        }];
    }

C++

    #include <functional>
    #include <utility>

    // Adapter that forwards SDK callback results to std::function handlers
    template <class T>
    class ValueCallback final : public V2TIMValueCallback<T> {
    public:
        using SuccessCallback = std::function<void(const T&)>;
        using ErrorCallback = std::function<void(int, const V2TIMString&)>;

        ValueCallback() = default;
        ~ValueCallback() override = default;

        void SetCallback(SuccessCallback success_callback, ErrorCallback error_callback) {
            success_callback_ = std::move(success_callback);
            error_callback_ = std::move(error_callback);
        }

        void OnSuccess(const T& value) override {
            if (success_callback_) {
                success_callback_(value);
            }
        }
        void OnError(int error_code, const V2TIMString& error_message) override {
            if (error_callback_) {
                error_callback_(error_code, error_message);
            }
        }

    private:
        SuccessCallback success_callback_;
        ErrorCallback error_callback_;
    };

    // The callback object releases itself once a result or an error is delivered
    auto callback = new ValueCallback<V2TIMString>{};
    callback->SetCallback(
        [=](const V2TIMString& result) {
            // Speech-to-text conversion succeeded; 'result' is the conversion result
            delete callback;
        },
        [=](int error_code, const V2TIMString& error_message) {
            // Speech-to-text conversion failed
            delete callback;
        });

    // Get the V2TIMMessage object from the message list
    const V2TIMMessage& message = messageList[0];
    // Retrieve the first element from V2TIMMessage and check that it is a sound element
    V2TIMElem *elem = message.elemList[0];
    if (elem->elemType == V2TIM_ELEM_TYPE_SOUND) {
        V2TIMSoundElem *sound_elem = static_cast<V2TIMSoundElem *>(elem);
        // Invoke speech-to-text conversion, using the Chinese-English recognition model by default
        sound_elem->ConvertVoiceToText("", callback);
    } else {
        // No conversion was started, so release the unused callback
        delete callback;
    }

Language Support

The currently supported target languages for recognition are as follows:
| Supported Language | Input Parameter Setting |
| --- | --- |
| Mandarin Chinese | "zh (cmn-Hans-CN)" |
| Cantonese Chinese | "yue-Hant-HK" |
| English | "en-US" |
| Japanese (Japan) | "ja-JP" |
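
To avoid scattering these string values through application code, the supported settings can be collected in one place. The following Java sketch is one way to do that. The LanguageParamDemo class, the RecognitionLanguage enum, and the fromLocale helper are hypothetical names introduced for illustration and are not part of the SDK. Falling back to the empty string (the default Chinese-English model) for locales that are not listed is an assumption of this sketch, not documented SDK behavior.

```java
import java.util.Locale;

// Hypothetical helper, not part of the SDK.
class LanguageParamDemo {
    /** Values accepted by the language parameter, as listed in Language Support. */
    enum RecognitionLanguage {
        DEFAULT_CHINESE_ENGLISH(""),
        MANDARIN("zh (cmn-Hans-CN)"),
        CANTONESE("yue-Hant-HK"),
        ENGLISH("en-US"),
        JAPANESE("ja-JP");

        private final String param;

        RecognitionLanguage(String param) {
            this.param = param;
        }

        /** String to pass as the language parameter of convertVoiceToText. */
        String param() {
            return param;
        }

        /** Maps a locale to a supported language; unlisted locales use the default model. */
        static RecognitionLanguage fromLocale(Locale locale) {
            if (locale == null) {
                return DEFAULT_CHINESE_ENGLISH;
            }
            switch (locale.getLanguage()) {
                case "zh":
                    return MANDARIN;
                case "yue":
                    return CANTONESE;
                case "en":
                    return ENGLISH;
                case "ja":
                    return JAPANESE;
                default:
                    return DEFAULT_CHINESE_ENGLISH;
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(RecognitionLanguage.fromLocale(Locale.JAPAN).param()); // ja-JP
        System.out.println(RecognitionLanguage.fromLocale(Locale.US).param());    // en-US
    }
}
```

The selected value would then be passed as the first argument, for example soundElem.convertVoiceToText(RecognitionLanguage.JAPANESE.param(), callback).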