Game Voice Evolution: RTC SDK, In-Game Voice, and Immersive Audio

Peng Gao, Solutions Architect
Jan 8, 2024

1. Unveiling the Evolution of Game Voice

Social networking is now an essential aspect of the gaming experience. On one hand, gaming is inherently a social activity that offers a range of topics and scenarios for social interaction. On the other hand, social interaction satisfies a basic human need and improves the gaming experience, which can significantly increase player retention.

As an important feature of social networking, game voice has drawn growing attention from game developers and gained considerable popularity among players. There are myriad game voice applications and solutions on the market, offering a wide variety of features. Below is an overview of the evolution of game voice.

Game voice tools have evolved with the development of the internet. The last 20+ years have witnessed huge leaps in game voice technology:

  • from support for a single platform to cross-platform interoperability.
  • from one-to-one chat to interactive voice chat in a room with tens of thousands of online users.
  • from third-party voice communication SaaS tools to PaaS SDKs.
  • from monotonous voice chat to immersive voice experiences.

Instead of following a chronological order or revisiting the evolution of their features, this article looks at the development of game voice tools from the perspective of game voice experiences.

The most basic experience game voice offers is letting players talk to each other through voice chat. Gaming creates a virtual world in which a conversation between players is really a conversation between their characters. As gameplay design and image quality improve, players expect more from voice chat, and a voice experience that resembles a conference call can no longer meet their demands. In response, a disruptive technical solution has emerged: the "Immersive Voice Solution".

Game voice technology has gone through several stages, starting from the most basic voice chat to immersive voice experiences and beyond. As breakthroughs in sensors, computing power, audio algorithms, IoT, and other technologies are on the horizon, all-real voice will eventually become a reality, delivering the ultimate voice experience the metaverse demands.

2. Version Iterations

/ 2.1 Game voice v1.0: Third-party voice chat tools

At this stage, players use third-party voice chat tools to communicate with each other while gaming. Whether or not the game itself offers a voice communication feature, third-party tools allow players to quickly create chat channels and talk to each other.

/ 2.2 Game voice v2.0: In-game voice

In-game voice solutions mainly take the form of game developers integrating SDKs developed by voice communication PaaS providers. The basic APIs that come with the SDKs are used to implement various in-game voice scenarios:

  • channel voice between teammates: teammates can talk to each other regardless of their positions in the game.
  • range voice between different teams: players on different teams can hear each other only when their in-game positions are within a specified range of each other.
  • blocklist/allowlist.
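
The three scenarios above can be read as a single audibility rule evaluated per listener and speaker. The following is a minimal sketch of that rule, not any SDK's API: the `Player` class, the `can_hear` function, and the `range_limit` parameter are all hypothetical names introduced for illustration, and only a blocklist (not an allowlist) is modeled.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Player:
    # Hypothetical player record; a real SDK tracks this state internally.
    name: str
    team: str
    position: tuple  # (x, y, z) in game units
    blocklist: set = field(default_factory=set)


def can_hear(listener: Player, speaker: Player, range_limit: float) -> bool:
    """Return True if `listener` should receive `speaker`'s voice."""
    if listener.name == speaker.name:
        return False  # players do not hear their own voice
    if speaker.name in listener.blocklist:
        return False  # the blocklist overrides every other rule
    if listener.team == speaker.team:
        return True  # channel voice: teammates hear each other anywhere
    # Range voice: other teams are audible only within range_limit.
    return math.dist(listener.position, speaker.position) <= range_limit
```

In this sketch the blocklist is checked first so that a blocked teammate stays muted; whether a given SDK applies the rules in that order is an assumption.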

Unlike third-party voice chat tools, in-game voice solutions require game developers to integrate the SDK and design specific voice scenarios. For players, voice channels, audio attributes, and features are set up and adjusted automatically as the game progresses and scenarios change, taking the game voice experience to the next level. Some voice SDKs, such as Tencent RTC's gaming voice solution (GME, Game Multimedia Engine), provide a broad suite of APIs to meet game voice needs beyond voice chat:

  • voice messaging
  • speech recognition
  • accompaniment playback

Based on these APIs, game developers can design more powerful social networking features for games.

In-game voice solutions take a first step toward integrating voice with gaming scenarios, but mostly at the level of feature integration. As a result, standalone voice SDKs can only give players a game voice experience similar to a conference call, although players no longer have to establish a chat channel themselves or run resource-consuming third-party software.

/ 2.3 Game voice v2.5: Upgraded version of in-game voice

To further improve players' game voice experiences, voice SDKs like TRTC offer voice processing capabilities such as voice changing and virtual 3D sound field. With these features, players can change their voice in real time based on their selected voice type, which adds fun to gaming and allows a vast design space for game voice features.

In FPS, battle royale, or VR games, players need to accurately identify the positions and directions of other players, which is vital both for communicating tips and tactics and for an immersive gaming experience. The 3D audio effects technology that comes with voice SDKs offers an effective solution. The positions that players perceive through voice correspond to the relative positions of their characters in the game. As the relative distance between the characters changes, the voice intensity increases or decreases accordingly, which closely simulates real-world conversation.
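
As a rough illustration of how position and distance can drive what a player hears, the sketch below computes left and right channel gains for one speaker. It is a simplified 2D model under stated assumptions (inverse-distance attenuation plus constant-power stereo panning), not the algorithm of any particular SDK; production 3D audio typically relies on HRTF-based rendering. The function name `spatialize` and the `min_dist` and `max_dist` parameters are hypothetical.

```python
import math


def spatialize(listener_pos, listener_yaw, speaker_pos,
               min_dist=1.0, max_dist=50.0):
    """Return (left_gain, right_gain) for one speaker, in 2D.

    listener_yaw is the facing direction in radians, measured
    counter-clockwise from the +x axis.
    """
    dx = speaker_pos[0] - listener_pos[0]
    dy = speaker_pos[1] - listener_pos[1]
    distance = math.hypot(dx, dy)
    if distance >= max_dist:
        return 0.0, 0.0  # out of earshot

    # Inverse-distance attenuation, full volume inside min_dist.
    gain = min_dist / max(distance, min_dist)

    # Angle of the speaker relative to where the listener is facing;
    # a positive angle means the speaker is on the listener's left.
    relative = math.atan2(dy, dx) - listener_yaw
    pan = -math.sin(relative) if distance > 0 else 0.0  # -1 left, +1 right

    # Constant-power panning: map pan in [-1, 1] to an angle in [0, pi/2].
    angle = (pan + 1.0) * math.pi / 4.0
    return gain * math.cos(angle), gain * math.sin(angle)
```

With the listener at the origin facing +x, a speaker at `(0, 5)` lands almost entirely in the left channel, and moving a speaker from `(2, 0)` to `(20, 0)` lowers both gains. Plain stereo panning cannot tell front from back, which is one reason real implementations go further than this.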

3D audio effects technology combines voice processing with gaming scenarios, but only in terms of position and distance information. For a truly immersive experience, voice processing should cover all aspects of gaming scenarios. A voice SDK is unlikely to provide a dedicated API for every potential factor; doing so would make the SDK extremely complicated and bulky, and it is not really necessary. To take the game voice experience up a notch, we need a new solution, namely the "Immersive Game Voice Solution".

/ 2.4 Game voice v3.0: Immersive voice

An immersive voice solution means that players' voice effects are rendered in real time based entirely on the game process. All players' voices are processed through digital signal processing (DSP) algorithms, and then played back in the headphones to simulate voice communication in real-world settings. Voice chat processed in this way can deliver a more immersive game voice experience, allowing players to communicate in a natural way.

So how is an immersive voice solution implemented? As mentioned above, it is not advisable to pack a single voice SDK with all sorts of APIs. Moreover, voice service providers generally have less expertise in audio processing algorithms than specialist audio technology companies. Developing an all-encompassing voice SDK is therefore hardly viable.

In view of this, a combination approach will work best, just as with the Wwise + GME solution. GME is dedicated to end-to-end real-time voice communication, and the Wwise interactive audio engine is adopted by many game developers as a tool for game sound design. The Wwise plugin acts as a bridge for data interactions between GME and the Wwise engine, and GME voice streams are seamlessly connected to the Wwise audio pipeline, so Wwise's rich sound effects processing and control features can be used in voice chat. Such a design makes it possible to deliver an immersive game voice experience.
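
The division of labor described above can be sketched conceptually: the voice SDK delivers decoded voice frames, and the audio engine runs them through an effect chain that the game reconfigures as the scene changes. The code below is a toy model of that idea only. It does not use or imitate the real Wwise or GME APIs; `VoiceBus`, `gain`, and `low_pass` are hypothetical names.

```python
from typing import Callable, List

Frame = List[float]  # one block of mono PCM samples in [-1, 1]
Effect = Callable[[Frame], Frame]


def gain(factor: float) -> Effect:
    """Scale every sample by a constant factor."""
    def apply(frame: Frame) -> Frame:
        return [s * factor for s in frame]
    return apply


def low_pass(alpha: float) -> Effect:
    """One-pole low-pass filter, e.g. to muffle a voice behind a wall."""
    state = 0.0

    def apply(frame: Frame) -> Frame:
        nonlocal state
        out = []
        for s in frame:
            state += alpha * (s - state)  # smooth toward the input sample
            out.append(state)
        return out
    return apply


class VoiceBus:
    """Stand-in for the audio engine's processing path for one voice."""

    def __init__(self) -> None:
        self.effects: List[Effect] = []

    def set_effects(self, effects: List[Effect]) -> None:
        # Called by game logic when the scene changes (indoors, radio, ...).
        self.effects = list(effects)

    def process(self, frame: Frame) -> Frame:
        # Frames received from the voice SDK pass through each effect in turn.
        for effect in self.effects:
            frame = effect(frame)
        return frame
```

For example, when a speaker walks behind a wall, game logic might call `bus.set_effects([low_pass(0.2), gain(0.6)])`. The voice transport stays unchanged and only the effect chain differs, which is the point of keeping the two responsibilities separate.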

As an interactive audio authoring tool, Wwise is generally used to create high-quality audio content for games, and GME complements Wwise in the field of game voice. Now sound engineers can also use Wwise to create immersive and interesting voice features, opening up new gameplay possibilities.

/ 2.5 Game voice v4.0: All-real voice

Immersive voice, however, is definitely not the acme of game voice experiences – all-real voice takes it further.

With the advances in AR, VR, and MR technologies, the metaverse has become a hot topic. Many technology giants are expanding into the metaverse, which is considered the next big opportunity for the internet in the coming decade. The metaverse refers to a parallel virtual world that is both independent of and interconnected with the real world, where people can interact, work, and do much more in a realistic way.

To make virtual worlds more lifelike, software and hardware technologies need to be integrated to simulate human senses. As voice communication is an important form of human interaction, metaverse scenarios have higher requirements for voice, that is, all-real voice. Currently, the metaverse is still more of a concept than reality, and we'll see what the future holds.

Gaming is inherently a social activity in the internet age. Although voice chat is not a core feature for most game genres, it makes gaming more enjoyable and thus increases player retention. Therefore, it has become a common feature of online games.

Game voice technology has evolved in response to players' growing demand for better experiences and gameplay. As players have higher expectations of gaming experiences, voice is bound to hold greater weight in gaming.