Enhancing Real-Time Audio and Video Quality with Tencent's Advanced Encoding Technologies
Looking back at 2023, Tencent Multimedia Lab and the Tencent RTC team continued to collaborate on real-time audio and video scenarios. Working at both the product pipeline and the underlying technology, they further optimized video quality and compression performance, and kept refining core components such as O264rt, the self-developed H.264 real-time encoder. This article looks at the optimization details behind the product from a technical perspective.
In audio and video, people always pursue higher clarity, richer experiences, and more vivid effects in order to feel more immersed. Explorations in these directions, such as 8K, VR, and HDR, have therefore sparked a great deal of discussion and research. More realistic images can greatly enhance the user experience, but delivering them is not easy in real-time audio and video.
Challenges in Real-Time Audio and Video
What most distinguishes real-time audio and video from other live scenarios is its extremely low latency, which makes remote interaction and communication feel nearly seamless. Users benefit in scenarios such as more efficient video conferences, more immersive live streams, and more comfortable video calls. However, extremely low latency also requires efficient encoding on the client side and imposes strict requirements on codec delay, CPU consumption, and jitter resistance.
Figure 1: Latency illustration for different audio and video scenarios
Generally speaking, improving clarity means processing higher-resolution images at better picture quality. This raises encoding complexity and bandwidth usage, which can easily cause other problems that hurt the user experience, such as more stuttering and excessive CPU consumption on the device. According to Tencent RTC user data, the most common terminal devices are still mid-range models from three or four years ago. Under these conditions, hastily increasing encoding complexity and bandwidth to obtain high-definition images will inevitably reduce actual QoE, causing more stuttering and device heating. Clarity and overall QoE should complement each other, but because of the constraints of real-time encoding on the client side, they appear to conflict.
To solve this problem and improve video quality and overall QoE at the same time, the joint research and development team of Tencent RTC and Tencent Multimedia Lab has worked hard to overcome it. Over the past year, the teams have made significant changes in video encoding and decoding, network optimization, and client-side adaptation.
Optimization Behind the Numbers
30%: this is the image quality improvement of O264rt, the Multimedia Lab's self-developed H.264 real-time encoder, compared with x264 at the fast preset. At the same time, the algorithm and engineering teams optimized the encoder's core modules of interpolation, quantization, and entropy coding in depth, and kept adding assembly optimizations for multiple platforms. Through these detailed optimizations, the encoding speed, which is directly related to CPU usage, increased by a further 10%, significantly surpassing similar solutions. This allows the encoder to compress at higher quality in real-time narrowband scenarios and deliver better image quality. Lower CPU usage in the encoder also means that computing resources are saved, device heating is reduced, and the user experience improves.
Rate-distortion optimization during quantization jointly considers the impact of bitrate and distortion at the level of individual coding units and finds better solutions than plain rounding. The basic idea can be traced back to the 1990s, and it appears in H.264, H.265, and even the latest-generation codec standard, H.266. However, the complexity of this scheme is extremely high, and many encoders enable it only at slower presets. To bring this technology to real-time scenarios, the algorithm team explored the optimization points in depth and kept developing acceleration strategies on top of conventional algorithms such as Viterbi. In 2023, through the joint efforts of the Multimedia Lab and the TRTC technical team, more than 50 related algorithm optimizations were made to the O264rt encoder alone. This continuous polishing of the encoder is what brings users better image quality and QoE without additional encoding cost.
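To make the idea concrete, here is a minimal sketch of rate-distortion optimized quantization with a Viterbi search. It is not O264rt's implementation, which the article does not describe. The function names, the candidate set (zero plus the two levels bracketing each coefficient), and the rate model in toy_rate are all assumptions made for illustration. Each coefficient's level is chosen to minimize J = D + lam * R, where the rate of a level depends on the previous level.

```python
def exp_golomb_bits(value):
    """Length in bits of the order-0 Exp-Golomb code for a non-negative integer."""
    return 2 * ((value + 1).bit_length() - 1) + 1


def toy_rate(level, prev_level):
    """Hypothetical rate model (bits): a zero that follows a zero is cheaper,
    mimicking the context dependence of a real entropy coder."""
    if level == 0:
        return 0.5 if prev_level == 0 else 1.0
    return exp_golomb_bits(abs(level)) + 1  # magnitude bits plus a sign bit


def candidate_levels(coeff, step):
    """Zero plus the two integer levels that bracket |coeff| / step."""
    sign = -1 if coeff < 0 else 1
    low = int(abs(coeff) / step)
    return sorted({0, sign * low, sign * (low + 1)})


def rdoq_trellis(coeffs, step, lam):
    """Viterbi search for the level sequence minimizing J = D + lam * R."""
    # Maps last chosen level -> (accumulated cost, levels chosen so far).
    survivors = {0: (0.0, [])}
    for coeff in coeffs:
        next_survivors = {}
        for level in candidate_levels(coeff, step):
            distortion = (coeff - level * step) ** 2
            best = None
            for prev_level, (cost, path) in survivors.items():
                total = cost + distortion + lam * toy_rate(level, prev_level)
                if best is None or total < best[0]:
                    best = (total, path + [level])
            next_survivors[level] = best
        survivors = next_survivors
    # Cheapest surviving path after the last coefficient: (cost, levels).
    return min(survivors.values(), key=lambda entry: entry[0])
```

Calling rdoq_trellis([10.2, -3.9, 0.6, 0.4], step=1.0, lam=0.0) reduces to choosing the nearest level for each coefficient, while a larger lam pushes small coefficients toward zero to save bits. The cost of the search grows with the number of candidate levels per coefficient, which is one reason pruning and other acceleration strategies matter in real-time encoding.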
Figure 2: Trellis search and optimization
Exploring a Wider Range of Scenarios
Tencent RTC serves business (B2B) customers across many different scenarios. To keep improving the user experience, the H.264 real-time encoder is optimized for specific application scenarios and requirements, with work such as ROI tuning, screen content detection optimization, and adaptive scene optimization algorithms. Taking low-latency live streaming as an example, by adjusting the rate control algorithm and introducing temporal complexity information for joint optimization of bitrate allocation, the H.264 real-time encoder can further improve encoding quality in such scenarios and enhance the user's subjective perception. The following image shows the subjective quality comparison before and after optimization:
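As a rough illustration of how temporal complexity can steer bitrate allocation, the sketch below measures complexity as the mean absolute difference between consecutive frames and splits a bit budget across a short window in proportion to complexity raised to an exponent. This is an assumed, simplified model rather than the rate control used in O264rt. The function names, the exponent beta, and the representation of frames as flat lists of pixel values are all hypothetical.

```python
def temporal_complexity(prev_frame, cur_frame):
    """Mean absolute pixel difference between two frames of equal size."""
    diffs = [abs(a - b) for a, b in zip(prev_frame, cur_frame)]
    return sum(diffs) / len(diffs)


def allocate_bits(complexities, total_bits, beta=0.6, floor=1e-6):
    """Split total_bits across frames in proportion to complexity ** beta.

    beta < 1 compresses the range, so complex frames get more bits than
    simple ones without starving the simple ones (illustrative choice).
    """
    weights = [max(c, floor) ** beta for c in complexities]
    weight_sum = sum(weights)
    return [total_bits * w / weight_sum for w in weights]


# Usage: three frames, the middle one has four times the motion of the others.
budget = allocate_bits([1.0, 4.0, 1.0], total_bits=3000)
```

In a low-latency setting the window has to stay short, because the encoder cannot wait for many future frames before deciding how many bits to spend on the current one.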
Figure 3: Comparison of image quality optimization with O264rt encoder
Deepening the Technical Foundation
Tencent has worked continuously on audio and video standards and their practical application. In video encoding and decoding, for example, Tencent Multimedia Lab has been deeply involved in formulating international and domestic codec standards, including H.266/VVC and AVS3, since 2017. It has built a portfolio of high-performance video encoders covering almost all mainstream coding standards, serving scenarios such as real-time audio and video, transcoding, and VR. In the MSU video codec comparison, it has achieved industry-leading results for several consecutive years. Because this work is the technical foundation of real-time audio and video products, sustained investment is essential to build core competitiveness and bring better value to users.
If you have any questions or need assistance, our support team is always ready to help. Please feel free to Contact Us or join us on Discord.