Enhancing Gaming Experience with 3D Voice: How to Implement Spatial Voice in Games

Peng Gao-Solutions Architect
Jan 9, 2024

In-game sounds such as gunshots and footsteps usually come with 3D sound effects. In voice chat, however, players may find that their teammates' voices give no sense of direction, no matter where those teammates are. The technology that adds this sense of direction to voice chat is called 3D voice.

3D voice gives players additional auditory information, helping them identify the positions of their teammates and enemies by voice and feel their presence much as they would in the physical world. This makes the gaming experience more convenient and fun.

Many game developers may ask: How does 3D voice work? How do I add it to my games? Below is a quick guide to 3D voice technology.

How Tencent RTC helps build 3D voice chat in games

Tencent RTC's Game Multimedia Engine is a one-stop voice communication service for creating immersive multiplayer gaming experiences and boosting player engagement.

Its 3D spatial voice and proximity voice features help players perceive the positions of other players in the game world, allowing them to communicate as naturally as they would in the physical world.

How do we determine sound source positions?

We can determine the position of a sound source mainly because the sound reaches the left and right ears at different times and with different intensities. Specifically, we identify the horizontal position from the differences in time, sound level, and timbre between the binaural signals, while the auricle acts as a comb filter that helps identify the vertical position of a complex sound source. Sound localization also depends on factors such as sound level, spectrum, and personal experience.
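
To make the time-difference cue concrete, here is a minimal sketch that estimates the interaural time difference (ITD) with Woodworth's spherical-head approximation, ITD ≈ (r/c)(sin θ + θ). The head radius and azimuth are illustrative values, not parameters from any SDK.

```cpp
#include <cmath>
#include <cstdio>

// ITD in seconds for a source at the given azimuth (radians; 0 = straight
// ahead, positive = toward the near ear), assuming a rigid spherical head.
double WoodworthItd(double azimuthRad, double headRadiusM = 0.0875,
                    double speedOfSoundMps = 343.0) {
    return (headRadiusM / speedOfSoundMps) *
           (std::sin(std::fabs(azimuthRad)) + std::fabs(azimuthRad));
}

int main() {
    const double kPi = 3.14159265358979;
    // A source 45 degrees to the side reaches the far ear roughly 0.4 ms later.
    std::printf("ITD at 45 degrees: %.3f ms\n",
                WoodworthItd(45.0 * kPi / 180.0) * 1000.0);
    return 0;
}
```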

How are the voice positions of players simulated?

A head-related transfer function (HRTF) is needed to do so. It can be regarded as a comprehensive filtering process where sound signals travel from the sound source to both ears. The process includes air filtering, reverb in the ambient environment, scattering and reflection on the human body (such as torso, head, and auricle), etc.
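
As a concrete illustration of this filtering process, the sketch below convolves a mono voice frame with a pair of head-related impulse responses (HRIRs, the time-domain form of the HRTF), one per ear, to produce a binaural signal. The 3-tap HRIRs are toy placeholders; a real renderer selects measured HRIRs for the source direction and usually filters in the frequency domain for speed.

```cpp
#include <cstdio>
#include <vector>

using Samples = std::vector<float>;

// Plain time-domain convolution of a signal with an impulse response.
Samples Convolve(const Samples& signal, const Samples& hrir) {
    Samples out(signal.size() + hrir.size() - 1, 0.0f);
    for (size_t n = 0; n < signal.size(); ++n)
        for (size_t k = 0; k < hrir.size(); ++k)
            out[n + k] += signal[n] * hrir[k];
    return out;
}

int main() {
    // Placeholder 3-tap "HRIRs": the right ear gets a delayed, quieter copy,
    // crudely mimicking a source on the listener's left.
    const Samples hrirLeft  = {1.0f, 0.2f, 0.05f};
    const Samples hrirRight = {0.0f, 0.6f, 0.15f};

    const Samples monoVoice = {0.5f, -0.3f, 0.8f, 0.1f};  // toy voice frame
    const Samples left  = Convolve(monoVoice, hrirLeft);
    const Samples right = Convolve(monoVoice, hrirRight);

    std::printf("rendered %zu samples per ear\n", left.size());
    (void)right;
    return 0;
}
```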

Implementing real-time 3D virtualization for voice is not merely a matter of calling an HRTF. It also entails mapping the virtual space in the game onto a real-life acoustic environment and performing these operations at high frequency. The process can be summarized as follows. Assume N players in a game are on mic. Given the strict real-time requirements of gaming, each player's device should receive at least (N-1) packets containing voice data and relative position information every 20 ms to ensure a smooth experience. Based on the relative positions, the high-precision HRTF model in the 3D audio algorithm processes the voice data, combined with information about obstacles in the path, ambient sounds in the game (such as running water or the echo of a room), and so on. In this way, realistic real-time 3D sound is rendered on the players' devices.
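
A minimal sketch of that per-tick flow is shown below. The types and function names (VoicePacket, ApplyHrtf, ApplyOcclusion, RenderTick) are hypothetical and not part of the GME SDK; the placeholder stages only mark where HRTF filtering and occlusion handling would happen.

```cpp
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// One packet per remote speaker per ~20 ms tick.
struct VoicePacket {
    std::vector<float> pcm;  // ~20 ms of mono voice samples
    Vec3 relativePos;        // speaker position relative to the listener
    bool occluded;           // whether an obstacle blocks the direct path
};

// Placeholder spatialization stages. A real renderer would convolve with
// direction-dependent HRIRs and apply a proper acoustic occlusion model.
std::vector<float> ApplyHrtf(std::vector<float> in, const Vec3& /*pos*/) {
    return in;
}
std::vector<float> ApplyOcclusion(std::vector<float> in, bool occluded) {
    if (occluded)
        for (float& s : in) s *= 0.5f;  // crude muffling stand-in
    return in;
}

// Spatialize and mix all remote speakers for one 20 ms audio tick.
std::vector<float> RenderTick(const std::vector<VoicePacket>& packets) {
    std::vector<float> mix;
    for (const VoicePacket& p : packets) {
        std::vector<float> s =
            ApplyOcclusion(ApplyHrtf(p.pcm, p.relativePos), p.occluded);
        if (mix.size() < s.size()) mix.resize(s.size(), 0.0f);
        for (std::size_t i = 0; i < s.size(); ++i) mix[i] += s[i];
    }
    return mix;
}
```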

The entire process is compute-intensive, and some low/mid-end devices may be unable to handle it. Minimizing resource usage on players' devices while keeping gameplay smooth remains an industry challenge. In addition, some HRTF libraries cause serious attenuation at certain frequencies in the audio signal, most notably for musical instrument sounds with rich frequency content. This not only reduces the accuracy of sound localization but also dulls the instrument sounds in the rendered ambience.

How does Game Multimedia Engine (GME) work?

Game Multimedia Engine (GME) launched the 3D voice feature in partnership with Tencent Ethereal Audio Lab, a top-notch audio technology team. Through the high-precision HRTF model and the distance attenuation model, the feature gives players a highly immersive gaming experience in the virtual world.
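
As an illustration of the distance attenuation side, here is a minimal sketch of a common attenuation curve: full volume inside a reference distance, inverse-distance roll-off beyond it, and silence past the maximum audible range. The distances are illustrative defaults, not GME's actual parameters.

```cpp
// Gain factor applied to a speaker's voice as a function of distance.
float DistanceGain(float distanceM,
                   float refDistanceM = 1.0f,    // no attenuation inside this
                   float maxDistanceM = 40.0f) { // inaudible beyond this
    if (distanceM <= refDistanceM) return 1.0f;
    if (distanceM >= maxDistanceM) return 0.0f;
    return refDistanceM / distanceM;  // classic inverse-distance law
}
```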

Thanks to optimized on-device rendering algorithms, computing efficiency increases by nearly 50%, and the real-time spatial rendering time for a single sound source is around 0.5 ms, so most low/mid-end devices can sustain real-time 3D sound rendering. To address signal attenuation in the rendering process, GME improves the 3D rendering effect with its proprietary audio signal equalization techniques, keeping ambient sounds crystal clear.
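
GME's equalization techniques are proprietary, but as a generic illustration of the idea, the sketch below applies a standard RBJ-style peaking filter that boosts back a frequency band attenuated earlier in the rendering chain. The center frequency, Q, and gain are placeholder values.

```cpp
#include <cmath>
#include <vector>

// Generic peaking equalizer (RBJ audio EQ cookbook): boosts or cuts a band
// around a center frequency. Used here only to illustrate band compensation.
struct PeakingEq {
    float b0, b1, b2, a1, a2;          // normalized biquad coefficients
    float x1 = 0, x2 = 0, y1 = 0, y2 = 0;

    PeakingEq(float sampleRateHz, float centerHz, float q, float gainDb) {
        const float A = std::pow(10.0f, gainDb / 40.0f);
        const float w0 = 2.0f * 3.14159265f * centerHz / sampleRateHz;
        const float alpha = std::sin(w0) / (2.0f * q);
        const float a0 = 1.0f + alpha / A;
        b0 = (1.0f + alpha * A) / a0;
        b1 = -2.0f * std::cos(w0) / a0;
        b2 = (1.0f - alpha * A) / a0;
        a1 = -2.0f * std::cos(w0) / a0;
        a2 = (1.0f - alpha / A) / a0;
    }

    // Direct-form-I biquad: y[n] = b0*x + b1*x1 + b2*x2 - a1*y1 - a2*y2
    float Process(float x) {
        const float y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2;
        x2 = x1; x1 = x;
        y2 = y1; y1 = y;
        return y;
    }
};

// Example: boost a band around 4 kHz by 6 dB after spatial rendering.
void Equalize(std::vector<float>& samples, float sampleRateHz) {
    PeakingEq eq(sampleRateHz, 4000.0f, 1.0f, 6.0f);  // illustrative values
    for (float& s : samples) s = eq.Process(s);
}
```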

The GME documentation describes how to integrate its APIs for 3D sound effects.
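
As a rough orientation before reading that documentation, the sketch below outlines a typical integration flow with hypothetical class and method names (SpatialVoiceEngine, EnableSpatialAudio, UpdateListenerPose); they are placeholders, not the real GME API surface. The usual steps are: initialize the engine, enter a voice room, enable spatial audio, then keep the SDK updated with the local player's position and orientation every frame.

```cpp
struct Vec3 { float x, y, z; };

// Placeholder voice-SDK handle; consult the official GME docs for real APIs.
class SpatialVoiceEngine {
public:
    void Init(const char* /*appId*/, const char* /*userId*/) {}  // auth + engine setup
    void EnterRoom(const char* /*roomId*/) {}                    // join the voice room
    void EnableSpatialAudio(bool /*enabled*/) {}                 // switch on 3D rendering
    void UpdateListenerPose(const Vec3& /*pos*/, const Vec3& /*forward*/,
                            const Vec3& /*up*/) {}               // per-frame pose sync
};

void SetUpVoice(SpatialVoiceEngine& voice) {
    voice.Init("your-app-id", "player-123");  // placeholder credentials
    voice.EnterRoom("room-42");
    voice.EnableSpatialAudio(true);
}

// Call every game frame so remote voices stay aligned with the game world.
void OnGameFrame(SpatialVoiceEngine& voice, const Vec3& pos,
                 const Vec3& forward, const Vec3& up) {
    voice.UpdateListenerPose(pos, forward, up);
}
```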

Conclusion

In FPS, battle royale, and VR games, players need to accurately identify the positions and directions of other players, which is vital both for communicating tips and tactics and for an immersive gaming experience. The 3D virtualization technology that comes with voice SDKs offers an effective solution: the positions players perceive through voice correspond to the relative positions of their characters in the game, and as the distance between characters changes, the voice intensity rises or falls accordingly, closely simulating real-world conversation.

To learn more, please visit our official website, Tencent RTC, and explore freely.

If you have any questions or need assistance, our support team is always ready to help. Please feel free to contact us, or join us on Discord.