
Zero-Latency Remote Driving for Robotaxis: Paving the Way for Open Road Commercialization


Jan 16, 2025

The global commercialization of autonomous driving is advancing rapidly. Alongside continued innovation in autonomous driving technology, automotive companies are exploring new ways to further enhance safety. One such approach is to back autonomous vehicles in commercial operation (Robotaxis) with remote safety operators.

When the self-driving system encounters a situation it cannot handle, the vehicle slows down and hands control to a remote operator. The vehicle then proceeds at low speed until the operator guides it out of the situation.

A stable, low-delay video feed is essential for the operator to react quickly. Any lag between the vehicle's camera and the operator's screen can lead to dangerous situations, and unreliable network connections make matters worse by interrupting the video, which makes it difficult for the operator to respond effectively.

The Tencent RTC Teleoperation Solution draws on more than two decades of technical expertise and on lessons refined in popular products such as Tencent Meeting and WeChat Channels. Built on Tencent Cloud's real-time audio and video technology, the solution prioritizes real-time performance and stability and is optimized for Robotaxi commercialization scenarios.

● Ultra-Low Latency: Guarantees end-to-end video latency of just 200ms, enabling real-time decision-making by remote safety operators.

● Exceptional Stability: Keeps the 150ms video freeze rate (counting frame gaps longer than 150ms as freezes) below 0.1%, virtually eliminating perceptible stuttering.

● Consistent Performance: Maintains a 99th percentile video delay under 200ms, even in challenging conditions with signal blind spots and low bandwidth.

● Safety Focused: Provides a stable and clear view of the vehicle's surroundings, accelerating the safe commercialization of autonomous driving technology.
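The latency and freeze figures above are statistics over many frames. As a minimal Python sketch, assuming a nearest-rank percentile and counting the full length of every frame gap longer than 150ms as frozen time (both definitions are illustrative assumptions, not Tencent's published metric definitions), they could be computed like this:

```python
import math

# Assumed metric definitions (illustrative, not Tencent's published ones):
#   latency sample = capture-to-display delay of one frame, in ms
#   freeze         = any gap between consecutive displayed frames > FREEZE_GAP_MS
FREEZE_GAP_MS = 150.0


def percentile(values, p):
    """Nearest-rank percentile, p in (0, 100]."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def freeze_rate(display_times_ms, gap_ms=FREEZE_GAP_MS):
    """Fraction of the session spent inside frame gaps longer than gap_ms."""
    if len(display_times_ms) < 2:
        return 0.0
    total = display_times_ms[-1] - display_times_ms[0]
    frozen = sum(
        b - a
        for a, b in zip(display_times_ms, display_times_ms[1:])
        if b - a > gap_ms
    )
    return frozen / total if total > 0 else 0.0


# Usage: 100 latency samples, and a short display timeline with one 200ms gap.
latencies = [120 + (i % 50) for i in range(100)]
print(percentile(latencies, 99))           # → 169 (P99 latency in ms)
print(freeze_rate([0, 33, 66, 266, 299]))  # 200 / 299 of the session frozen
```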

End-to-End Optimization for Ultra-Low Latency

To minimize latency, we've optimized the entire end-to-end path: camera capture, encoding, transmission, decoding, and rendering. The result is approximately 100ms lower latency than CPU-based RTC solutions.

Accelerated Capture, Image Transformation, and Encoding

Built on the industry-leading NVIDIA autonomous driving chip platform, Tencent Cloud's remote control solution optimizes the entire sending pipeline for minimal latency. Pre-encoding data is processed entirely in the chip's memory and never passes through CPU memory.

1. Camera Capture: We use DMA instead of the common MMAP method for camera data acquisition. This eliminates one CPU copy and transfers data directly into the chip's physical memory, where it is referenced by buffer handles.

2. Image Transformation and Encoding: The encoder's input queue doubles as the processing buffer, so buffers cycle between image transformation and encoding without blocking waits.

3. Encoded Output: Only the encoded bitstream is exported to CPU memory via MMAP. All pre-encoding data stays out of CPU memory, reducing both latency and CPU usage.
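The buffer flow in the three capture-to-encode steps can be modeled with a toy Python sketch. It is not NVIDIA or Tencent code: `BufferPool`, `run_pipeline`, and the integer handles are hypothetical stand-ins for DMA buffers in accelerator memory. It shows a fixed set of buffers cycling from capture to transform to encoder and back, with only the encoded output leaving the loop.

```python
from collections import deque


class BufferPool:
    """Fixed set of pre-allocated buffer handles.

    Handles are hypothetical stand-ins for DMA buffers in accelerator memory:
    stages pass handles around, and pixel data is never copied to CPU memory.
    """

    def __init__(self, count):
        self.free = deque(range(count))
        self.count = count

    def acquire(self):
        if not self.free:
            raise RuntimeError("pool exhausted: encoder is not returning buffers")
        return self.free.popleft()

    def release(self, handle):
        self.free.append(handle)


def run_pipeline(num_frames, pool_size=3, encoder_depth=2):
    """Capture -> in-place transform -> encode, cycling a fixed buffer pool.

    pool_size must be at least encoder_depth, or capture runs out of buffers.
    """
    pool = BufferPool(pool_size)
    encoder_input = deque()  # doubles as the transform's output buffer queue
    bitstream = []           # only encoded output is "exported to CPU memory"
    handles_seen = set()

    def encode_one():
        frame_no, handle = encoder_input.popleft()
        bitstream.append(f"frame{frame_no}")  # stand-in for MMAP'd encoded bytes
        pool.release(handle)                   # buffer goes straight back to capture

    for frame_no in range(num_frames):
        handle = pool.acquire()  # capture DMA-writes into this buffer
        handles_seen.add(handle)
        # Transform runs in place on the same buffer, then enqueues it for encoding.
        encoder_input.append((frame_no, handle))
        if len(encoder_input) >= encoder_depth:
            encode_one()  # encoder lags capture by (encoder_depth - 1) frames
    while encoder_input:
        encode_one()      # flush at end of stream
    return bitstream, handles_seen, pool


bitstream, seen, pool = run_pipeline(10)
print(bitstream[:3], sorted(seen))
```

Because the encoder returns each buffer as soon as it has consumed it, ten frames flow through just three buffers with no per-frame allocation.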

Send/Receive Buffer Optimization

To minimize latency, Tencent Cloud's remote control solution also optimizes buffering. On the sending side, it relies on the sending module's network buffer to smooth transmission instead of adding pacing at the application layer. On the receiving side, the jitter buffer uses a dynamic buffer adjustment algorithm that estimates network conditions faster and adapts more smoothly, keeping playback fluid and frame-interval fluctuation within 15ms.
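A dynamic jitter buffer of this kind can be sketched as follows. This is an illustrative Python sketch, not Tencent's algorithm. It assumes the RFC 3550 interarrival-jitter filter for network estimation and a hypothetical asymmetric update (`up`, `down`), so the target delay grows quickly when jitter rises and shrinks slowly when it falls. All parameter values are made up.

```python
class AdaptiveJitterBuffer:
    """Hypothetical dynamic jitter-buffer sizing, for illustration only."""

    def __init__(self, min_ms=10.0, max_ms=200.0, k=3.0, up=0.5, down=0.05):
        self.min_ms, self.max_ms, self.k = min_ms, max_ms, k
        self.up, self.down = up, down  # fast growth, slow decay
        self.jitter_ms = 0.0
        self.target_ms = min_ms
        self._last = None  # (send_ts_ms, recv_ts_ms) of the previous frame

    def on_frame(self, send_ts_ms, recv_ts_ms):
        """Update the jitter estimate and return the new playout target (ms)."""
        if self._last is not None:
            prev_send, prev_recv = self._last
            # Difference in transit time between consecutive frames.
            d = (recv_ts_ms - prev_recv) - (send_ts_ms - prev_send)
            # RFC 3550 interarrival-jitter estimator (gain 1/16).
            self.jitter_ms += (abs(d) - self.jitter_ms) / 16.0
        self._last = (send_ts_ms, recv_ts_ms)
        desired = min(max(self.k * self.jitter_ms, self.min_ms), self.max_ms)
        rate = self.up if desired > self.target_ms else self.down
        self.target_ms += rate * (desired - self.target_ms)
        return self.target_ms


jb = AdaptiveJitterBuffer()
for i in range(20):  # steady 30fps arrivals: target stays at the minimum
    jb.on_frame(i * 33, i * 33 + 40)
for i in range(20, 60):  # every other frame arrives 20ms late: target rises
    jb.on_frame(i * 33, i * 33 + 40 + (20 if i % 2 else 0))
print(round(jb.target_ms, 1))
```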

Decoding, Rendering, and Super-Resolution Acceleration

A single-stage rendering pipeline performs image format conversion and super-resolution upsampling together, significantly reducing the latency these steps would add if they ran as separate stages.
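To illustrate why a single pass helps, the Python sketch below converts YUV to RGB and upsamples in the same loop, so no intermediate full-size image is written and read back. Assumptions: 4:4:4 input planes, full-range BT.601 conversion coefficients, and plain bilinear interpolation standing in for the super-resolution model, which the source does not specify. A production version would run as a GPU shader. The function names are hypothetical.

```python
def clamp8(x):
    """Round and clamp a float to the 0..255 range."""
    return 0 if x < 0 else 255 if x > 255 else int(round(x))


def sample_bilinear(plane, w, h, x, y):
    """Bilinearly sample a 2D plane at fractional source coordinates."""
    x0, y0 = min(int(x), w - 1), min(int(y), h - 1)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = plane[y0][x0] * (1 - fx) + plane[y0][x1] * fx
    bottom = plane[y1][x0] * (1 - fx) + plane[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def yuv444_to_rgb_upscaled(Y, U, V, scale):
    """Upsample and convert YUV to RGB in one pass over the output pixels."""
    h, w = len(Y), len(Y[0])
    out_h, out_w = int(h * scale), int(w * scale)
    out = []
    for oy in range(out_h):
        sy = min(max((oy + 0.5) / scale - 0.5, 0.0), h - 1)  # pixel-center mapping
        row = []
        for ox in range(out_w):
            sx = min(max((ox + 0.5) / scale - 0.5, 0.0), w - 1)
            y = sample_bilinear(Y, w, h, sx, sy)
            u = sample_bilinear(U, w, h, sx, sy) - 128
            v = sample_bilinear(V, w, h, sx, sy) - 128
            # Full-range BT.601 YUV -> RGB, applied to the upsampled sample directly.
            row.append((clamp8(y + 1.402 * v),
                        clamp8(y - 0.344136 * u - 0.714136 * v),
                        clamp8(y + 1.772 * u)))
        out.append(row)
    return out


gray = [[128, 128], [128, 128]]
print(yuv444_to_rgb_upscaled(gray, gray, gray, 2)[0][0])  # → (128, 128, 128)
```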

QoS: Multi-Video Stream Joint Scheduling and Real-Time Super-Resolution

On public roads, coverage blind spots and network fluctuations often make video quality unstable, hindering the commercial deployment of remote driving. To address this, the Tencent RTC Teleoperation Solution adds multi-video stream QoS priority scheduling and dynamic real-time super-resolution on top of traditional network optimization techniques such as adaptive audio and video bitrate and HARQ. By adjusting resolution and priority, it keeps the primary video streams smooth and high-quality even under low bandwidth.

Multi-Stream Joint Bandwidth Estimation

The receiver reports the receiving interval and packet loss of each video stream separately, and the sender combines these per-stream reports into a joint bandwidth estimate. Compared with aggregating all video streams into a single feedback report, independent feedback with joint estimation responds faster to network changes and estimates bandwidth more accurately, because each stream's frame timing and frame sizes are independent of the others.
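A minimal Python sketch of joint estimation from per-stream feedback follows. The report fields and `joint_estimate` are hypothetical. The 2% and 10% loss thresholds and the 0.85 back-off factor are borrowed from Google Congestion Control (GCC) purely for illustration, since the source does not give Tencent's rules. The sketch shows each stream's delay signal being checked on its own timeline before the results are combined.

```python
from dataclasses import dataclass


@dataclass
class StreamFeedback:
    """Per-stream receiver report (hypothetical fields)."""
    stream_id: str
    bytes_received: int
    interval_ms: float     # receive window covered by this report
    send_spread_ms: float  # first-to-last packet time at the sender
    recv_spread_ms: float  # first-to-last packet time at the receiver
    loss_ratio: float


def joint_estimate(reports, current_bps, overuse_ms=5.0):
    """Combine per-stream reports into one sender-side bandwidth estimate."""
    received_bps = sum(r.bytes_received * 8 * 1000 / r.interval_ms for r in reports)
    # Delay signal, per stream: packets arriving more spread out than they were
    # sent means a queue is building somewhere on the shared path.
    overuse = any(r.recv_spread_ms - r.send_spread_ms > overuse_ms for r in reports)
    # Loss signal: each stream's loss weighted by its share of received bytes.
    total_bytes = sum(r.bytes_received for r in reports) or 1
    loss = sum(r.loss_ratio * r.bytes_received for r in reports) / total_bytes
    if overuse or loss > 0.10:
        factor = 0.85 if overuse else (1 - 0.5 * loss)
        return min(current_bps, received_bps) * factor
    if loss < 0.02:
        return min(current_bps * 1.05, received_bps * 1.5)  # probe gently upward
    return current_bps


front = StreamFeedback("front", 125_000, 1000.0, 900.0, 902.0, 0.0)
rear = StreamFeedback("rear", 125_000, 1000.0, 900.0, 930.0, 0.0)
print(joint_estimate([front, rear], 1_500_000))  # rear shows queuing: back off
```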

Joint Bitrate and Resolution Adjustment

Hardware encoders typically support only a limited bitrate range at a given resolution. Setting the bitrate below that range can cause inaccurate rate control or significant quality degradation. To keep bitrate control accurate and avoid this degradation, a dynamic joint bitrate and resolution adjustment strategy is used: the encoding resolution is adjusted according to how the quantization parameter (QP) of the encoded output shifts within its range.
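One way to realize QP-driven joint adjustment is a resolution ladder with hysteresis. In this Python sketch the ladder, the per-resolution bitrate floors in `MIN_KBPS`, and the QP thresholds are all hypothetical values. Real floors depend on the specific hardware encoder.

```python
# Hypothetical resolution ladder and per-resolution bitrate floors (kbps) below
# which the hardware encoder is assumed to miss its rate target.
LADDER = [(1920, 1080), (1280, 720), (960, 540), (640, 360)]
MIN_KBPS = {(1920, 1080): 2500, (1280, 720): 1200, (960, 540): 700, (640, 360): 300}


class ResolutionController:
    """Joint bitrate/resolution control driven by the encoder's output QP."""

    def __init__(self, qp_high=38, qp_low=28):
        # The gap between qp_low and qp_high adds hysteresis so the resolution
        # does not flap when QP hovers near a single threshold.
        self.qp_high, self.qp_low = qp_high, qp_low
        self.level = 0  # index into LADDER; 0 = highest resolution

    def update(self, target_kbps, avg_qp):
        current = LADDER[self.level]
        can_go_down = self.level < len(LADDER) - 1
        can_go_up = self.level > 0
        if can_go_down and (target_kbps < MIN_KBPS[current] or avg_qp > self.qp_high):
            self.level += 1  # QP saturating, or bitrate below the encoder's range
        elif (can_go_up and avg_qp < self.qp_low
              and target_kbps >= MIN_KBPS[LADDER[self.level - 1]]):
            self.level -= 1  # headroom to spare at the next resolution up
        return LADDER[self.level]


rc = ResolutionController()
print(rc.update(3000, 41))  # QP too high at 1080p → (1280, 720)
```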

Priority-Based Sending Scheduling

A sending scheduler based on proportional fairness gives primary video streams priority, allowing them to take a larger share of the bandwidth. A backpressure mechanism complements it: when data accumulates in a secondary stream's send queue and that stream is backpressured, its bitrate is reduced further.
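The sketch below shows one simple way to combine weighted priority with backpressure in Python. It uses weighted fair sharing (serve the stream furthest below its weighted share) as a stand-in for the proportional fairness-based scheduler. The class names, weights, 50,000-byte backpressure threshold, and 0.8 cut factor are hypothetical.

```python
from collections import deque


class Stream:
    def __init__(self, name, weight, bitrate_kbps, primary=False):
        self.name, self.weight, self.primary = name, weight, primary
        self.bitrate_kbps = bitrate_kbps
        self.queue = deque()  # queued packet sizes in bytes
        self.sent_bytes = 0


class PriorityScheduler:
    """Weighted fair-share sender with backpressure on secondary streams.

    Each stream is served in proportion to its weight, so giving the primary
    stream a larger weight lets it take a larger share of the bandwidth.
    """

    def __init__(self, streams, backpressure_bytes=50_000, cut_factor=0.8):
        self.streams = streams
        self.backpressure_bytes = backpressure_bytes
        self.cut_factor = cut_factor

    def next_packet(self):
        ready = [s for s in self.streams if s.queue]
        if not ready:
            return None
        # Serve the stream that is furthest below its weighted share.
        s = min(ready, key=lambda st: st.sent_bytes / st.weight)
        size = s.queue.popleft()
        s.sent_bytes += size
        return s.name, size

    def apply_backpressure(self):
        """Cut the bitrate of any secondary stream whose send queue is backing up."""
        for s in self.streams:
            if not s.primary and sum(s.queue) > self.backpressure_bytes:
                s.bitrate_kbps *= self.cut_factor


front = Stream("front", weight=3, bitrate_kbps=4000, primary=True)
rear = Stream("rear", weight=1, bitrate_kbps=1000)
front.queue.extend([1000] * 40)
rear.queue.extend([1000] * 40)
sched = PriorityScheduler([front, rear])
sent = [sched.next_packet()[0] for _ in range(40)]
print(sent.count("front"), sent.count("rear"))  # → 30 10
```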

Real-time Super-Resolution Algorithm

To mitigate the image quality loss caused by dynamic resolution reduction, sender-side downsampling and receiver-side super-resolution are optimized jointly. Because latency is critical, downsampling uses hardware acceleration and super-resolution runs as shader rendering, adding less than 5ms of latency. Both the downsampling and super-resolution algorithms are trained with PSNR as the objective, achieving a PSNR above 32dB relative to the original image at 2x downsampling and above 35dB at 1.5x downsampling.
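PSNR compares a reconstructed image with the original through the mean squared error (MSE): PSNR = 10 log10(MAX² / MSE), with MAX = 255 for 8-bit pixels. Here is a short Python helper, treating images as flat lists of 8-bit values for simplicity:

```python
import math


def psnr(reference, test, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-size images."""
    if len(reference) != len(test) or not reference:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


print(round(psnr([0, 0], [1, 1]), 2))  # every pixel off by 1 → 48.13
```

Working backwards, 32dB corresponds to an MSE of about 65025 / 10^3.2 ≈ 41 (a root-mean-square error of roughly 6.4 gray levels), and 35dB corresponds to an MSE of about 20.6 (roughly 4.5 gray levels).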

Multi-Path Network Transmission for Seamless Video Streaming

Open roads often include areas with weak network coverage, which makes stable video transmission difficult. To improve video continuity and stability, the Tencent RTC Teleoperation Solution uses multi-path network transmission, sending video streams simultaneously over networks from different carriers. This allows real-time network switching when bandwidth drops, keeping video transmission uninterrupted.

To minimize switching latency and improve transmission efficiency even when both networks are weak, a joint transmission approach is used: forward error correction (FEC) is applied jointly across the primary and secondary channels of the different networks. By dynamically adjusting FEC redundancy and puncturing rates, the system switches networks seamlessly with near-zero lag. The same technique keeps video transmission robust when both networks are weak by drawing on their combined bandwidth.
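The core idea of joint FEC across two networks can be illustrated with the simplest erasure code, a single XOR parity packet per group. This Python sketch is not Tencent's scheme. Data packets are interleaved across paths "A" and "B", the parity goes on whichever path currently reports lower loss, and any one lost packet in a group can be rebuilt from whatever arrived on either path. `group_size_for_loss` is a hypothetical redundancy policy. Puncturing is not modeled, and equal-length packets are assumed.

```python
def xor_all(packets):
    """XOR equal-length packets together byte by byte."""
    out = bytearray(len(packets[0]))
    for p in packets:
        for i, byte in enumerate(p):
            out[i] ^= byte
    return bytes(out)


def group_size_for_loss(loss):
    """Hypothetical policy: one parity per k data packets, smaller k at higher loss."""
    if loss > 0.10:
        return 2
    if loss > 0.03:
        return 4
    return 8


def send_group(data, loss_a, loss_b):
    """Interleave data across paths A/B; put the parity on the less lossy path."""
    sends = [("A" if i % 2 == 0 else "B", ("data", i, p)) for i, p in enumerate(data)]
    parity_path = "A" if loss_a <= loss_b else "B"
    sends.append((parity_path, ("parity", None, xor_all(data))))
    return sends


def recover(k, received):
    """Rebuild a group from packets that arrived on either path.

    Entries left as None could not be recovered (more than one loss, or the
    parity itself was lost alongside a data packet).
    """
    got = {idx: p for kind, idx, p in received if kind == "data"}
    parity = [p for kind, _, p in received if kind == "parity"]
    missing = [i for i in range(k) if i not in got]
    if len(missing) == 1 and parity:
        # XOR of all surviving data packets and the parity equals the lost packet.
        got[missing[0]] = xor_all(list(got.values()) + parity)
    return [got.get(i) for i in range(k)]


k = group_size_for_loss(0.05)
data = [bytes([i] * 4) for i in range(1, k + 1)]
sends = send_group(data, loss_a=0.05, loss_b=0.01)
# Path A drops packet 2; path B's parity still lets the receiver rebuild it.
arrived = [pkt for path, pkt in sends if not (path == "A" and pkt[1] == 2)]
print(recover(k, arrived) == data)  # → True
```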

Drawing on Tencent's 21 years of experience in network and audio/video technologies, Tencent RTC primarily offers multi-party audio/video call and low-latency interactive live streaming solutions.