
Building a Web-based Audio and Video Call Engine with WebAssembly

10 min read

Nov 4, 2024

As web technology advances and audio and video calling needs evolve, it is worth exploring how new web technologies can deliver value in real applications. Drawing on practical experience, this article introduces the value and advantages of WebAssembly, WebCodecs, WebTransport, and related technologies for the audio and video industry.

The article covers four parts: background, the WebAssembly engine, the WebAssembly implementation, and open issues and prospects.

Background

With upgraded network infrastructure, advances in audio and video transmission technology, and changing consumption habits, multimedia has evolved from on-demand playback and conventional live streaming to ultra-low-latency live streaming and real-time audio and video interaction. WebRTC laid the technical foundation for this evolution.

Figure: WebRTC (Web Real-Time Communication) architecture.

This is a schematic diagram of WebRTC's architecture. WebRTC provides a rich set of Web APIs. Audio and video capture, encoding and decoding, preprocessing and postprocessing, transmission, and rendering are all made possible by WebRTC. When developing web-based audio and video applications, the use of WebRTC reduces development difficulty and cost.

WebRTC also has shortcomings. First, it does not allow custom codecs. Second, it cannot reuse an existing service framework and the optimizations built on it. Finally, it offers only limited customization.

Are there new web technologies that can replace WebRTC and address these problems? The following technologies are candidates.

WebAssembly is a new type of code that runs in modern browsers and delivers near-native performance. Its design goals are to be fast, efficient, and portable; readable and debuggable; secure; and not to break the web. WebAssembly addresses JavaScript's performance limits in demanding scenarios such as 3D games, computer vision, image and video editing, and many other areas that need native performance. Moving such workloads from JavaScript to WebAssembly can significantly improve efficiency. Its compact binary format also reduces the cost of downloading and parsing large JavaScript applications.
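To make the JavaScript-to-WebAssembly boundary concrete, here is a minimal sketch that compiles a hand-assembled 41-byte module exporting a single `add` function and calls it from TypeScript. The module and the `loadAdd` helper are illustrative only. A real SDK would ship a compiler-generated `.wasm` file (for example, from C++ via Emscripten) and load it asynchronously.

```typescript
// Hand-assembled WebAssembly module exporting add(a: i32, b: i32) -> i32.
const ADD_MODULE_BYTES = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // "\0asm" magic + version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type section: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // function section: one function of type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export section: "add" = function 0
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // body: local.get 0, local.get 1, i32.add
]);

type AddFn = (a: number, b: number) => number;

// Synchronous compilation is fine for a tiny module. For real-sized binaries,
// browsers expect WebAssembly.instantiateStreaming(fetch(url)) instead.
function loadAdd(bytes: Uint8Array): AddFn {
  const module = new WebAssembly.Module(bytes);
  const instance = new WebAssembly.Instance(module, {});
  return instance.exports.add as AddFn;
}

const add = loadAdd(ADD_MODULE_BYTES);
console.log(add(2, 3)); // 5
```

Calls into the module are ordinary function calls. The cost that matters in a media pipeline is moving large buffers across the boundary, not the call itself.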

WebCodecs gives developers direct access to the media codecs already built into the browser. It enables low-latency encoding and decoding and exposes flexible configuration interfaces. The image on the right shows the configuration options for a video encoder, including software versus hardware encoding, VBR versus CBR, and quality-first versus latency-first modes. Encoding H.264 with High Profile, for example, is easy to set up with WebCodecs, which is a great convenience at the encoding level.
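The options listed above map onto fields of the WebCodecs `VideoEncoderConfig` dictionary: `hardwareAcceleration`, `bitrateMode`, and `latencyMode`. H.264 High Profile is selected through the RFC 6381 codec string `avc1.PPCCLL`. The sketch below builds such a config object. `buildEncoderConfig`, `avcCodecString`, and the chosen level (4.0) are illustrative assumptions, not the SDK's actual settings.

```typescript
// Build an RFC 6381 H.264 codec string "avc1.PPCCLL":
// PP = profile_idc, CC = constraint flags, LL = level_idc (all two-digit hex).
function avcCodecString(profileIdc: number, constraintFlags: number, levelIdc: number): string {
  const hex = (n: number) => n.toString(16).padStart(2, "0");
  return `avc1.${hex(profileIdc)}${hex(constraintFlags)}${hex(levelIdc)}`;
}

const H264_HIGH_PROFILE = 100; // profile_idc of High Profile
const LEVEL_4_0 = 40;          // level_idc 40 = Level 4.0 (covers up to 1080p30)

// Each boolean corresponds to one of the knobs described in the text.
function buildEncoderConfig(opts: {
  width: number;
  height: number;
  bitrate: number;   // bits per second
  framerate: number; // frames per second
  preferHardware: boolean;
  constantBitrate: boolean;
  lowLatency: boolean;
}) {
  return {
    codec: avcCodecString(H264_HIGH_PROFILE, 0x00, LEVEL_4_0),
    width: opts.width,
    height: opts.height,
    bitrate: opts.bitrate,
    framerate: opts.framerate,
    hardwareAcceleration: opts.preferHardware ? "prefer-hardware" : "prefer-software",
    bitrateMode: opts.constantBitrate ? "constant" : "variable",
    latencyMode: opts.lowLatency ? "realtime" : "quality",
    avc: { format: "annexb" }, // emit Annex B start codes in each EncodedVideoChunk
  } as const;
}
```

In the browser, check support with `const { supported } = await VideoEncoder.isConfigSupported(config)` before passing the config to `encoder.configure(config)`. Not every device offers hardware H.264 High Profile encoding.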

WebTransport is a new communication protocol that supports both reliable and unreliable transmission. A single connection can therefore carry data that must arrive intact as well as data where timeliness matters more. Its goals are to be faster, more efficient, secure, and low-latency. Because it runs over HTTP/3 and QUIC, it supports connection migration, offers flexible congestion control, and performs better on weak networks. Its independent streams and datagrams also provide more flexible ways to deal with head-of-line blocking.
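One way to use both delivery modes is to route each packet by type. The sketch below assumes a hypothetical policy: media goes out as unreliable datagrams, and signaling goes over reliable streams. `TransportLike` is a minimal structural view of the real `WebTransport` object, so a browser session can be passed in directly and tests can pass a fake. The article does not say how the SDK actually splits traffic.

```typescript
// Minimal structural view of the WebTransport pieces used here.
interface ChunkWriter {
  write(chunk: Uint8Array): Promise<void>;
  releaseLock(): void;
}
interface StreamChunkWriter extends ChunkWriter {
  close(): Promise<void>;
}
interface TransportLike {
  datagrams: { writable: { getWriter(): ChunkWriter } };
  createUnidirectionalStream(): Promise<{ getWriter(): StreamChunkWriter }>;
}

type PacketKind = "audio" | "video" | "signaling";

// Hypothetical policy: media tolerates loss (FEC/NACK repair it at the media
// layer) but not delay, so it uses datagrams. Signaling must arrive intact,
// so it uses a reliable stream.
function isReliable(kind: PacketKind): boolean {
  return kind === "signaling";
}

async function sendPacket(t: TransportLike, kind: PacketKind, payload: Uint8Array): Promise<void> {
  if (isReliable(kind)) {
    // One unidirectional stream per message: delivery is reliable and ordered
    // within the stream, and a stalled stream does not block other streams.
    const writer = (await t.createUnidirectionalStream()).getWriter();
    await writer.write(payload);
    await writer.close();
  } else {
    // Datagrams may be lost or reordered and are never retransmitted.
    const writer = t.datagrams.writable.getWriter();
    try {
      await writer.write(payload);
    } finally {
      writer.releaseLock();
    }
  }
}
```

With a real session you would run `const t = new WebTransport(url); await t.ready;` first. A QUIC datagram cannot be fragmented, so video frames must first be segmented into packets that each fit in a single datagram.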

WebAssembly Engine

New technologies and architectures aim to provide users with more possibilities. Custom codecs, custom transmission methods, custom data encryption, custom audio and video preprocessing and postprocessing, and custom QoS operations have all been implemented in practical projects.

This is the architecture diagram of the entire WebAssembly engine. It consists of the WebSDK, the User Scheduling Center, the WebTransport/WebSocket Gateway Cluster, and the backend TRTC service cluster and scheduling. Because the backend TRTC service can be reused directly, most of the work lies in developing the WebSDK and the WebGateway. The WebSDK exposes Client, LocalStream, and RemoteStream interfaces. Client provides the operations available to the user, LocalStream delivers audio and video data callbacks for the local user, and RemoteStream delivers audio and video data callbacks for remote users. The bus coordinates the overall operation of the WebSDK. The underlying layers include log reporting, quality reporting, exception detection, state recovery, capture and rendering, the Wasm SDK, WebCodecs, and WebTransport/WebSocket. The parts in orange are the main technologies used. WebCodecs and WebTransport/WebSocket are browser-provided APIs that simply need to be used correctly.
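The bus that ties Client, LocalStream, and RemoteStream together can be pictured as a typed publish/subscribe channel. The sketch below is a guess at the shape, not the SDK's real interface. The event names in `SdkEvents`, the `EventBus` class, and this `RemoteStream` are all hypothetical.

```typescript
// Hypothetical event map; the article does not name the SDK's real events.
interface SdkEvents {
  "local-audio-frame": { timestamp: number; samples: Float32Array };
  "remote-video-frame": { userId: string; timestamp: number };
  "quality-report": { rttMs: number; lossRate: number };
}

type Handler<T> = (payload: T) => void;

// Minimal typed pub/sub bus standing in for the SDK "bus".
class EventBus<E> {
  private handlers = new Map<keyof E, Set<Handler<any>>>();

  // Subscribe; returns a function that removes the subscription.
  on<K extends keyof E>(event: K, handler: Handler<E[K]>): () => void {
    const set = this.handlers.get(event) ?? new Set<Handler<any>>();
    this.handlers.set(event, set);
    set.add(handler);
    return () => {
      set.delete(handler);
    };
  }

  emit<K extends keyof E>(event: K, payload: E[K]): void {
    this.handlers.get(event)?.forEach((h) => h(payload));
  }
}

// Hypothetical RemoteStream exposing per-user video callbacks via the bus.
class RemoteStream {
  private readonly bus: EventBus<SdkEvents>;
  readonly userId: string;

  constructor(bus: EventBus<SdkEvents>, userId: string) {
    this.bus = bus;
    this.userId = userId;
  }

  onVideoFrame(cb: (timestamp: number) => void): () => void {
    // Only forward frames that belong to this remote user.
    return this.bus.on("remote-video-frame", (frame) => {
      if (frame.userId === this.userId) cb(frame.timestamp);
    });
  }
}
```

Returning an unsubscribe closure lets a RemoteStream detach its callbacks cleanly when the remote user leaves the room.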

The WebAssembly SDK is divided into five modules. Audio processing covers echo cancellation, AI noise reduction, and gain control. Protocol encapsulation and decapsulation covers the video protocol, segmentation and reassembly of video packets, and FEC. Downlink quality control covers the video jitter buffer, audio NetEQ, FEC recovery/NACK, and audio/video synchronization. The remaining functions are uplink and downlink quality statistics, congestion control, and audio encoding and decoding.
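The FEC recovery step can be illustrated with the simplest possible scheme: one XOR parity packet per group, which lets the receiver rebuild any single lost packet without waiting for a NACK round trip. This is only an illustration of the principle. The article does not say which FEC code the SDK uses, and production systems often use stronger codes such as Reed-Solomon or the ULPFEC format of RFC 5109. The sketch also assumes equal-length packets, whereas real FEC headers additionally protect the packet length.

```typescript
// XOR parity over a group of packets.
function xorParity(packets: Uint8Array[]): Uint8Array {
  if (packets.length === 0) return new Uint8Array(0);
  const len = Math.max(...packets.map((p) => p.length));
  const parity = new Uint8Array(len); // zero-initialized; shorter packets act as zero-padded
  for (const p of packets) {
    for (let i = 0; i < p.length; i++) parity[i] ^= p[i];
  }
  return parity;
}

// Rebuild the single missing packet (marked null) from the survivors plus
// parity. Returns null unless exactly one packet is missing.
function recoverOne(group: (Uint8Array | null)[], parity: Uint8Array): Uint8Array | null {
  const missing = group.filter((p) => p === null).length;
  if (missing !== 1) return null;
  const survivors = group.filter((p): p is Uint8Array => p !== null);
  // a ^ b ^ c ^ a ^ c == b: XOR-ing parity with the survivors yields the lost packet.
  return xorParity([parity, ...survivors]);
}
```

Two losses in the same group defeat XOR parity. At that point the receiver has to fall back to NACK-based retransmission, which is why the SDK combines FEC recovery with NACK.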

In the data-flow diagram, the light-colored part on the far left is the JS layer, the upper and lower parts are the WebCodecs layer, the middle is Wasm, and the far right is network transmission. On the sending side, the JS business layer captures audio and video data and hands it to WebAssembly for audio preprocessing. The data is then encoded by WebCodecs, packetized, and sent over the network. On the receiving side, data from the network is unpacked in WebAssembly and goes through audio and video post-processing, then is decoded by WebCodecs and rendered by JS. In practice, audio encoding and decoding are implemented in the WebAssembly SDK.
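Handing captured audio to WebAssembly means copying samples into the module's linear memory, letting the Wasm code process them in place, and reading the result back. The sketch below assumes a module exporting `memory`, an allocator `alloc`, and a `process_audio(ptr, samples)` function. These export names are hypothetical; Emscripten, for instance, exposes C functions with a leading underscore.

```typescript
// Hypothetical exports of an audio-processing Wasm module.
interface AudioWasmExports {
  memory: WebAssembly.Memory;
  alloc(bytes: number): number;                      // returns a byte offset into memory
  process_audio(ptr: number, samples: number): void; // in-place 3A processing
}

// Copy one capture frame into linear memory at `ptr` (allocated once, 4-byte
// aligned), process it in place, and copy the result out for the next stage.
function preprocessFrame(wasm: AudioWasmExports, ptr: number, frame: Float32Array): Float32Array {
  // Create views on every call: if memory grows, earlier ArrayBuffers are detached.
  new Float32Array(wasm.memory.buffer, ptr, frame.length).set(frame);
  wasm.process_audio(ptr, frame.length);
  // slice() copies the samples out so later calls cannot overwrite the result.
  return new Float32Array(wasm.memory.buffer, ptr, frame.length).slice();
}
```

In the browser, the exports would come from `(await WebAssembly.instantiateStreaming(fetch(url))).instance.exports`. Allocating the frame buffer once and reusing `ptr` avoids per-frame allocations on the hot path.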

WebAssembly Implementation

Tencent RTC's new SDK has been fully implemented on the web and is already used in a number of industry customers' projects. Its memory usage is similar to WebRTC's, its CPU usage is lower, and the WebAssembly approach offers more flexibility. In a test where two people join a room, with a 1 Mbps encoding bitrate, 30 fps, and a 10 ms RTT, multiple screenshots taken from capture to rendering show end-to-end latency within 100 ms. This shows that WebAssembly is a reliable choice for ultra-low-latency communication.
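For scale, the test parameters above work out as follows. This is plain arithmetic on the stated numbers; the article does not break the 100 ms down by pipeline stage.

```latex
\Delta t_{\text{frame}} = \frac{1000\ \text{ms}}{30} \approx 33.3\ \text{ms},
\qquad
\bar{S}_{\text{frame}} = \frac{10^{6}\ \text{bit/s}}{30\ \text{frame/s}} \approx 33{,}333\ \text{bit} \approx 4.2\ \text{kB},
\qquad
\frac{100\ \text{ms}}{\Delta t_{\text{frame}}} \approx 3
```

The whole capture-to-render path therefore fits within roughly three frame intervals. Assuming a symmetric path, the 10 ms RTT contributes only about 5 ms of one-way network delay, so most of the budget is spent in capture, processing, encoding, decoding, and rendering.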

From the initial technical exploration to production, the SDK went through many iterations. It started out single-threaded, but real-world use exposed problems such as poor timer precision, high load on a single core, and UI work blocking the underlying media processing. We then introduced Web Workers: the main thread handles only capture, rendering, and similar tasks, and everything else runs in Workers.
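Once work moves into a Worker, the UI and the Worker communicate only through messages. A discriminated union keeps that protocol type-checked on both sides. The command and reply shapes below, and the `handleCommand` function, are hypothetical illustrations rather than the SDK's real message format.

```typescript
// Hypothetical messages from the UI (main) thread to the media Worker.
type Command =
  | { type: "join"; roomId: string; userId: string }
  | { type: "mute"; kind: "audio" | "video"; muted: boolean }
  | { type: "leave" };

// Hypothetical replies from the Worker back to the UI.
type WorkerReply =
  | { type: "joined"; roomId: string; userId: string }
  | { type: "state"; audioMuted: boolean; videoMuted: boolean }
  | { type: "left" };

interface WorkerState {
  roomId: string | null;
  userId: string | null;
  audioMuted: boolean;
  videoMuted: boolean;
}

// Pure handler that runs inside the Worker. Wiring (browser only):
//   worker side: self.onmessage = (e) => self.postMessage(handleCommand(state, e.data));
//   UI side:     worker.postMessage({ type: "mute", kind: "audio", muted: true });
function handleCommand(state: WorkerState, cmd: Command): WorkerReply {
  switch (cmd.type) {
    case "join":
      state.roomId = cmd.roomId;
      state.userId = cmd.userId;
      return { type: "joined", roomId: cmd.roomId, userId: cmd.userId };
    case "mute":
      if (cmd.kind === "audio") state.audioMuted = cmd.muted;
      else state.videoMuted = cmd.muted;
      return { type: "state", audioMuted: state.audioMuted, videoMuted: state.videoMuted };
    case "leave":
      state.roomId = null;
      state.userId = null;
      return { type: "left" };
    default: {
      // Exhaustiveness check: an unhandled Command variant becomes a type error.
      const unhandled: never = cmd;
      throw new Error(`unknown command: ${JSON.stringify(unhandled)}`);
    }
  }
}
```

Keeping the handler free of DOM calls makes it testable outside the browser. Only the two `postMessage` lines depend on the Worker environment.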

The UI collects user commands and delivers them to the Worker thread via postMessage, and the Worker sends its results back to the main thread the same way. Signaling encapsulation and decapsulation, stream handling, status statistics, WebCodecs encoding and decoding, and the WebAssembly SDK's audio and video processing are all done in Workers. The current architecture has two Workers, one for the uplink and one for the downlink. We also introduced a Worklet to reduce copying of audio data and make audio transfer more efficient. In special cases, SharedArrayBuffer can be used to share video data between threads and minimize the cost of moving it.
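A common way to move audio between a Worker and an AudioWorklet without copying through postMessage is a lock-free single-producer/single-consumer ring buffer on a SharedArrayBuffer. The article does not describe its exact mechanism, so `AudioRingBuffer` is a hypothetical sketch of this general pattern. In browsers, SharedArrayBuffer is only available on cross-origin-isolated pages, which are served with `Cross-Origin-Opener-Policy: same-origin` and `Cross-Origin-Embedder-Policy: require-corp`.

```typescript
// SPSC ring buffer of Float32 samples backed by a SharedArrayBuffer.
// Layout: [readIndex:int32, writeIndex:int32, samples:float32 * capacity].
class AudioRingBuffer {
  readonly capacity: number;
  private readonly indices: Int32Array;
  private readonly data: Float32Array;

  // Producer and consumer each wrap the same SharedArrayBuffer.
  constructor(sab: SharedArrayBuffer, capacity: number) {
    this.capacity = capacity;
    this.indices = new Int32Array(sab, 0, 2);
    this.data = new Float32Array(sab, 8, capacity);
  }

  static allocate(capacity: number): SharedArrayBuffer {
    return new SharedArrayBuffer(8 + capacity * 4);
  }

  availableRead(): number {
    const r = Atomics.load(this.indices, 0);
    const w = Atomics.load(this.indices, 1);
    return (w - r + this.capacity) % this.capacity;
  }

  // Producer side (e.g. the downlink Worker). Returns samples written.
  push(samples: Float32Array): number {
    const r = Atomics.load(this.indices, 0);
    let w = Atomics.load(this.indices, 1);
    // One slot is kept empty so that "full" and "empty" are distinguishable.
    const free = (r - w - 1 + this.capacity) % this.capacity;
    const n = Math.min(free, samples.length);
    for (let i = 0; i < n; i++) {
      this.data[w] = samples[i];
      w = (w + 1) % this.capacity;
    }
    Atomics.store(this.indices, 1, w); // publish only after the samples are written
    return n;
  }

  // Consumer side (e.g. AudioWorkletProcessor.process). Returns samples read.
  pop(out: Float32Array): number {
    let r = Atomics.load(this.indices, 0);
    const w = Atomics.load(this.indices, 1);
    const n = Math.min((w - r + this.capacity) % this.capacity, out.length);
    for (let i = 0; i < n; i++) {
      out[i] = this.data[r];
      r = (r + 1) % this.capacity;
    }
    Atomics.store(this.indices, 0, r); // hand the consumed slots back to the producer
    return n;
  }
}
```

The Worker would allocate the buffer and hand the SharedArrayBuffer to the AudioWorklet, for example through a MessagePort. Each side then wraps it with `new AudioRingBuffer(sab, capacity)`, and in `process()` the consumer fills any shortfall with silence. Only the index updates are atomic, and the audio thread never blocks.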

The backend largely reuses the existing RTC network architecture. The server side uses the BBR algorithm with more aggressive congestion control to deliver lower latency on weak networks, and it adjusts the weak-network strategy according to packet loss and jitter. Finally, we designed an adaptive FEC strategy based on network conditions.
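The article does not say how its adaptive FEC strategy maps network conditions to redundancy. One textbook approach, shown here purely as a hypothetical sketch, picks the smallest number of parity packets `m` per group of `k` media packets that keeps the probability of an unrecoverable group below a target. This assumes an erasure code that recovers any `m` losses out of `k + m` packets (such as Reed-Solomon) and independent random loss. Real loss is often bursty, which this model underestimates.

```typescript
// P(X > m) for X ~ Binomial(n, p): probability that more than m of n packets are lost.
function binomialTailAbove(n: number, p: number, m: number): number {
  let tail = 0;
  let term = Math.pow(1 - p, n); // P(X = 0)
  for (let x = 0; x <= n; x++) {
    if (x > m) tail += term;
    // Recurrence: P(X = x + 1) = P(X = x) * (n - x) / (x + 1) * p / (1 - p)
    term = term * ((n - x) / (x + 1)) * (p / (1 - p));
  }
  return tail;
}

// Hypothetical policy: fewest parity packets keeping residual group loss <= target.
function chooseParityCount(k: number, lossRate: number, target = 0.01, maxParity = k): number {
  const p = Math.min(Math.max(lossRate, 0), 0.99); // clamp to keep 1 - p > 0
  for (let m = 0; m <= maxParity; m++) {
    if (binomialTailAbove(k + m, p, m) <= target) return m;
  }
  return maxParity; // cap redundancy; beyond this, rely on NACK/retransmission
}
```

With 10 media packets per group and 10% measured loss, `chooseParityCount(10, 0.1)` selects 4 parity packets. In practice the loss rate would be smoothed over time, and the cap would keep FEC from consuming too much of the bandwidth that congestion control allows.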

How do you solve problems encountered while developing with WebAssembly? Through debugging, which is convenient because it offers a visual interface.

The program being debugged is written in C++. First, install the browser extension shown in the figure and complete a few simple configuration steps; debugging can then begin. When the application starts, the .wasm file and the source files are loaded automatically. The picture on the right uses Opus encoding as an example. The left pane shows the source code with a breakpoint set, the middle pane shows detailed variable information, and the lower-right pane shows the call stack. As with regular C++ programs, you must compile with the -g option. Without it, the debugger cannot find the source files and debugging is not possible.

Issues and Prospects

The high degree of customization is one of the major advantages of the WebAssembly approach. Custom audio and video codecs, custom encryption and decryption, support for the Chinese national cryptographic (SM) algorithms, and custom 3A (echo cancellation, noise suppression, and gain control) are all supported. Implementing the national cryptographic algorithms in WebAssembly can make them tens of times faster. The AI noise reduction in the custom 3A is already in production and handles more than 200 types of noise. QoS can be tuned with custom strategies or by reusing existing QoS strategies. Server logic stays simple because existing backend service logic can be reused. Network transmission is faster and more secure, and WebTransport offers better firewall traversal.

The approach also has drawbacks. It introduces three new technologies: WebAssembly, WebCodecs, and WebTransport. WebAssembly adds complexity, which raises development difficulty and requires more technical expertise. Browser compatibility is also limited. WebTransport does not run in Safari and only runs in Chrome and Edge 97 and above. WebCodecs only runs in Chrome and Edge 94 and above and the latest Safari versions. In addition, WebTransport's uplink congestion control algorithm cannot currently be adjusted. We considered solving uplink congestion control through negotiation, but when the browser acts as the client, it ignores the negotiation result, so we can only wait for official support.
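Given these compatibility gaps, an SDK like this one has to decide at startup which paths the current browser supports. The sketch below uses feature detection rather than user-agent sniffing. It falls back to the WebSocket gateway (part of the architecture described earlier) when WebTransport is missing. `detectCapabilities` and the `Capabilities` shape are hypothetical, and the `scope` parameter exists only so the check can be exercised outside a browser.

```typescript
interface Capabilities {
  transport: "webtransport" | "websocket";
  webCodecs: boolean;
  wasm: boolean;
  sharedMemory: boolean;
}

// Feature-detect by checking for the globals themselves.
function detectCapabilities(
  scope: Record<string, unknown> = globalThis as unknown as Record<string, unknown>,
): Capabilities {
  // True if the global exists as a constructor/function or a namespace object.
  const has = (name: string): boolean => {
    const value = scope[name];
    return typeof value === "function" || (typeof value === "object" && value !== null);
  };
  return {
    // WebTransport is missing in Safari: fall back to the WebSocket gateway.
    transport: has("WebTransport") ? "webtransport" : "websocket",
    // Video is encoded and decoded through WebCodecs, so both are required.
    webCodecs: has("VideoEncoder") && has("VideoDecoder"),
    wasm: has("WebAssembly"),
    // SharedArrayBuffer is only usable on cross-origin-isolated pages.
    sharedMemory: has("SharedArrayBuffer") && scope["crossOriginIsolated"] === true,
  };
}
```

Calling `detectCapabilities()` once at startup lets the SDK choose its transport and decide whether the shared-memory path is available. Browsers without WebAssembly or WebCodecs would need some other path, which the article does not describe.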

Conclusion

In the future, we hope for more open web technologies. WebTransport will become more complete, providing more flexible congestion control algorithms, WebGPU will open up hardware capabilities, and WebAssembly's SIMD support will improve. At the same time, more complex application scenarios and higher levels of customization are also part of the future goals. Cloud gaming, custom encryption and decryption, remote desktops, spatial audio, audio and video pre- and post-processing, and more scenarios can be customized.

If you have any questions or need assistance, our support team is always ready to help. Please feel free to contact us or join us on Discord.
