Implementation Steps
Introduction
To implement a complete online Karaoke scenario, several functional modules are required: room management, seat management, song selection management, and Karaoke management. The table below lists the key actions and features of each module. The following sections describe each module in detail so that you have a complete picture of the functions needed to build a Karaoke room.
Room Management | Seat Management | Song Selection Management | Karaoke Management |
Room List | Take/Leave Seat | Song List Display | Karaoke Play Mode |
Create Room | Seat Control | Search for Songs | Song Switching |
Join Room | Lock Seat | Song Selection | Vocal Volume Adjustment |
Leave Room | Invite to Seat | Pin Song to Top | Reverb/Sound Effects |
Destroy Room | Mute Seat | Selected Song List | Lyric Synchronization |
The room owner creates the Karaoke room, and users can join any room that interests them. After entering the room, a user can take a seat to interact by voice with the room owner, or take a seat directly to join the chorus. These are two different Karaoke play modes. The overall business process of the online Karaoke scenario is shown in the figure below.
Room Management
Room management is mainly responsible for maintaining the room list. Its main functions are creating, joining, leaving, and destroying rooms. Unlike an ordinary room, a Karaoke room also needs a separate Karaoke room identifier so that the related components, such as song selection management and Karaoke management, can be started.
Create Room: After logging in to the business system, a user can create a room. Once the room is created, it must be added to the room list.
Destroy Room: After all users have left the room, the room must be destroyed and removed from the room list.
Note:
Room management is a necessary module for implementing online Karaoke, but it is not the main functional module. It can be implemented by combining the business system with the TRTC SDK. For details, refer to the voice chat room scene access solution.
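The create/destroy bookkeeping described above can be sketched as follows. This is a minimal in-memory illustration only, not the TRTC SDK or any real business backend: the `Room` and `RoomList` names, the `is_karaoke` flag (standing in for the Karaoke room identifier), and the rule that the room is destroyed once its last member leaves are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Room:
    room_id: str
    owner_id: str
    is_karaoke: bool = True  # hypothetical stand-in for the Karaoke room identifier
    members: set = field(default_factory=set)


class RoomList:
    """In-memory stand-in for the room list kept by the business system."""

    def __init__(self):
        self._rooms = {}

    def create_room(self, room_id, owner_id, is_karaoke=True):
        if room_id in self._rooms:
            raise ValueError(f"room {room_id} already exists")
        # The owner is counted as the first member of the new room.
        room = Room(room_id, owner_id, is_karaoke, {owner_id})
        self._rooms[room_id] = room  # the new room appears in the room list
        return room

    def join_room(self, room_id, user_id):
        self._rooms[room_id].members.add(user_id)

    def leave_room(self, room_id, user_id):
        room = self._rooms[room_id]
        room.members.discard(user_id)
        if not room.members:
            # Everyone has left: destroy the room and drop it from the list.
            del self._rooms[room_id]

    def list_rooms(self):
        return sorted(self._rooms)
```

In a real deployment the same transitions would be driven by the business backend, with the TRTC room entered and exited alongside each call.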
Seat Management
Seats in a Karaoke room are generally ordered and limited in number. Seat management defines the number of seats in the room and manages the status of every seat in the current room according to the business scenario. It mainly covers the following functions: taking and leaving a seat, locking a seat, inviting a user to a seat, and muting a seat.
After entering the room, a user can apply only for seats that are in the idle state.
After the host approves the request, the seat status must be changed to a non-idle state.
After the user stops streaming and leaves the seat, the seat status must be reset.
The host has the right to lock a seat, invite a user to a seat, force a user off a seat, and mute a seat.
Note:
Seat management is a necessary module for implementing online Karaoke, but it is not the main functional module. It can be implemented by combining the business system with the IM SDK. For details, refer to the voice chat room scene access solution.
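The seat rules above amount to a small state machine, which the following sketch illustrates. It is not the IM SDK: the `SeatState`, `Seat`, and `SeatManager` names are hypothetical, and the choice that only an idle seat can be locked is an assumption made for the example.

```python
from enum import Enum


class SeatState(Enum):
    IDLE = "idle"
    OCCUPIED = "occupied"
    LOCKED = "locked"


class Seat:
    def __init__(self):
        self.state = SeatState.IDLE
        self.user_id = None
        self.muted = False


class SeatManager:
    """Hypothetical seat bookkeeping for one room with a fixed seat count."""

    def __init__(self, host_id, seat_count):
        self.host_id = host_id
        self.seats = [Seat() for _ in range(seat_count)]

    def _require_host(self, operator_id):
        if operator_id != self.host_id:
            raise PermissionError("only the host may perform this action")

    def approve_take_seat(self, operator_id, index, user_id):
        # The host approves a request; only idle seats can be taken.
        self._require_host(operator_id)
        seat = self.seats[index]
        if seat.state is not SeatState.IDLE:
            raise ValueError("seat is not idle")
        seat.state, seat.user_id = SeatState.OCCUPIED, user_id

    def leave_seat(self, index):
        # Reset the seat once its user has stopped streaming and left.
        seat = self.seats[index]
        if seat.state is SeatState.OCCUPIED:
            seat.state, seat.user_id, seat.muted = SeatState.IDLE, None, False

    def force_leave(self, operator_id, index):
        self._require_host(operator_id)
        self.leave_seat(index)

    def lock(self, operator_id, index):
        # Assumption: a seat must be idle before it can be locked.
        self._require_host(operator_id)
        seat = self.seats[index]
        if seat.state is not SeatState.IDLE:
            raise ValueError("only an idle seat can be locked")
        seat.state = SeatState.LOCKED

    def set_muted(self, operator_id, index, muted):
        self._require_host(operator_id)
        self.seats[index].muted = muted
```

Keeping the permission check in one helper makes it easy to see that locking, muting, and forcing a user off a seat are host-only actions, while leaving a seat is open to the seated user.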
Song Selection Management
Basic Introduction
Song selection management is an important part of the online Karaoke scenario. It mainly covers song list display, song search, song selection with queue management, and the list of selected songs. Each Karaoke room needs to maintain its own list of selected songs with automatic queue management, which the business backend must implement. Song list display and song search are implemented with Yinsuda Authorized Music for Live Streaming.
Implementation Process
Song selection management mainly involves the business-side app, the business backend, and the Yinsuda backend, each with its own responsibilities:
Business-side app:
Call the song selection API to report song information.
Call the song-switching API to notify the business backend to update the list of selected songs.
Call the singing confirmation API to notify the business backend.
Business backend:
Maintain the list of selected songs.
Send notifications to the business-side app to update the current list of selected songs.
Yinsuda backend:
Provide APIs to obtain the recommended song lists and song list details for live interactive music (Song List / Song List Details).
Provide an API to obtain the details of a live interactive music track (Get Live Interactive Music Details), including the playToken and the lyric download URL.
Provide an API to search for live interactive music (Search Live Interactive Music).
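The list of selected songs that the business backend maintains can be pictured as a simple queue. The sketch below is illustrative only: `SongRequest`, `SelectedSongList`, and their method names are hypothetical, and it assumes that the song at the head of the queue is the one being sung and that a pinned song moves to the position directly behind it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SongRequest:
    music_id: str
    user_id: str


class SelectedSongList:
    """Hypothetical per-room queue of selected songs."""

    def __init__(self):
        self._queue = []  # index 0 is the song currently being sung

    def select(self, music_id, user_id):
        # Song selection: append the request to the end of the queue.
        self._queue.append(SongRequest(music_id, user_id))

    def pin_to_top(self, music_id):
        # Move a waiting song to just behind the current song.
        for i in range(1, len(self._queue)):
            if self._queue[i].music_id == music_id:
                self._queue.insert(1, self._queue.pop(i))
                return True
        return False

    def switch_song(self):
        # Song switching: drop the current song and return the next one, if any.
        if self._queue:
            self._queue.pop(0)
        return self._queue[0] if self._queue else None

    def snapshot(self):
        # The list the backend would push to the apps after every change.
        return [request.music_id for request in self._queue]
```

After each `select`, `pin_to_top`, or `switch_song` call, the backend would send the new `snapshot()` to the business-side apps so that every client shows the same list.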
Karaoke Management
Karaoke management mainly covers the following functions: Karaoke play modes, starting, stopping, and switching songs, vocal volume adjustment, reverb/sound effects, and lyric synchronization. The sections below describe how to implement the Karaoke management module through two typical play modes: solo singing and real-time chorus.
Solo Singing
Solo singing is mainly a multi-user interactive Karaoke scenario. After taking a seat, the host can select songs to sing. Once the host has selected a song, all song selection information is displayed on the song selection platform, and the host can then choose to begin singing.
(1) Solution Architecture
The solution mainly uses the VOD SDK to download songs, the VOD backend to obtain each song's playToken and lyric download address, and the TRTC SDK to handle the singer's voice, song playback, and stream publishing. The overall solution architecture is as follows:
(2) Specific Implementation
In the solo singing scenario, the implementation differs by role. There are two roles: singer and audience.
Role | Description | Differences |
Singer | The singer in the Karaoke room is the host who, after taking a seat, selects a song and sings. When the singer leaves the room, the room is automatically dissolved and the list of selected songs is automatically cleared. | The role must be a host; publishes audio and video upstream (black frames when there is no video); plays the BGM; sends SEI information (lyric information); selects songs |
Audience | The audience in the Karaoke room plays the singer's stream. | The role is an audience, but can become a host by taking a seat; pulls audio and video streams downstream; receives SEI information (lyric information) |
The basic implementation processes for different roles are as follows:
【Host】
The host creates and joins a TRTC room, automatically takes a seat, and becomes the singer after selecting a song.
After a song is selected, the song and its lyrics are downloaded, and the song is then played through the BGM playback interface.
If the singer is not publishing video, black-frame video upstream must be enabled so that SEI information can be sent.
The singer synchronizes everyone's lyric progress through SEI information.
The singer can switch songs at any time while singing. The next song and its lyrics are downloaded, and singing resumes once the download is complete.
After the host leaves the room, the TRTC room will be dissolved.
【Audience】
The audience joins the TRTC room.
Listen for changes to the room's current song and load the lyrics.
Pull the singer's stream.
Parse the SEI information sent by the singer and synchronize the lyrics.
The audience's main task is to listen for the song's SEI information and update the corresponding song controls.
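The lyric synchronization step on both sides can be sketched as follows. The payload format is an assumption made for this example (a small JSON object with hypothetical `music_id` and `progress_ms` fields); the source does not specify what the singer puts into the SEI message. Sending and receiving the bytes would go through the SDK's SEI interface, which is not shown here.

```python
import bisect
import json


def encode_progress(music_id, progress_ms):
    """Singer side: build the bytes to send as SEI (hypothetical JSON layout)."""
    message = {"music_id": music_id, "progress_ms": progress_ms}
    return json.dumps(message).encode("utf-8")


def decode_progress(payload):
    """Audience side: parse the SEI bytes back into (music_id, progress_ms)."""
    message = json.loads(payload.decode("utf-8"))
    return message["music_id"], int(message["progress_ms"])


def current_lyric(lyrics, progress_ms):
    """Return the lyric line active at progress_ms.

    lyrics is a list of (start_ms, text) tuples sorted by start_ms.
    Returns None before the first line starts.
    """
    starts = [start for start, _ in lyrics]
    index = bisect.bisect_right(starts, progress_ms) - 1
    return lyrics[index][1] if index >= 0 else None
```

On the audience side, each received SEI payload would be passed to `decode_progress`, and the resulting position to `current_lyric`, to decide which line the lyric control should highlight.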
(3) API Call Sequence
The API calls for different roles are sequenced as follows:
[Figures: API call sequence diagrams for the Host and the Audience]
Note:
Because the solution above has a relatively high technical barrier, TRTC provides an open-source audio and video UI component, TUIKaraoke, on its official website. By integrating TUIKaraoke into your project, you can add online Karaoke scenarios to your application with just a few lines of code and try TRTC's capabilities for Karaoke scenarios, such as Karaoke, seat management, gift giving and receiving, and text chat.
Real-time Chorus
In real-time chorus, the connected clients play the song at the same time and the users on the seats sing together. In multi-user mode, the singers hear each other's voices with almost no delay, which makes a true real-time chorus possible.
(1) Solution Architecture
In terms of media streams, the singers publish and pull each other's streams. One lead singer publishes the music, while the other singers play the music locally, with time synchronized through NTP. In addition, the mixing robot mixes the song and the voices of all singers into one stream and pushes it back to the TRTC room. The audience only needs to pull this one stream to hear the synchronized voices from all clients, which produces the multi-person chorus effect. The solution architecture for real-time chorus is shown in the following figure.
The advantages of this solution are:
It reduces end-to-end latency.
It provides a solution for users to join the chorus midway.
It accurately synchronizes music, lyrics, and vocals between different ends.
It reduces the impact of differences in device performance, local clock accuracy, and network latency across clients.
Note:
Depending on business needs, you can choose a real-time chorus solution for audio-only or audio and video scenarios. In an audio-only scenario, black frames must be added so that SEI messages can be sent for lyric synchronization.
The lead singer needs to use a sub-instance to publish both the music and the vocals at the same time. Other singers only need to pull each other's vocal streams and play the music locally. The audience only needs to pull the one mixed stream.
The figure shows the RTC viewing solution, in which the mixing robot pushes the mixed stream back to the RTC room. In the CDN viewing solution, the mixing robot pushes the mixed stream to the live CDN, and the audience pulls the CDN stream to watch.
(2) Specific Implementation
The users in an online Karaoke room can be divided into three roles: lead singer, chorus, and audience, as shown in the table below.
Role | Description | Differences |
Lead Singer | The lead singer is responsible for selecting songs, sending chorus signals, and sending SEI messages. | The role must be an Anchor; publishes music and vocals upstream; selects songs and initiates the chorus; initiates the mixed-stream push back to the room; sends SEI messages |
Chorus | Chorus members receive and process chorus signals and join the chorus from their seats. | The role must be an Anchor; publishes vocals upstream; plays music locally; receives chorus signals |
Audience | After entering the Karaoke room, the audience can pull the stream from the seats and can also take a seat to join the chorus. | The role must be an Audience; pulls the mixed stream downstream; receives SEI messages; can apply to become an Anchor and take a seat |
The basic implementation processes for different roles are shown in the following figure:
【Lead Singer】
The lead singer needs to select a song and send chorus signals.
The lead singer creates a sub-instance to push vocals and music, and pulls the vocals of other singers.
After pushing the stream, the lead singer is responsible for initiating the mixed stream push task.
After starting the performance, play the music and synchronize the lyrics through the playback progress callback.
SEI messages need to be sent to synchronize the song progress on the audience end.
All singers need to calibrate the local song playback progress according to NTP.
【Chorus】
Each chorus member publishes one vocal stream and pulls the vocal streams of the other users on the seats.
The chorus needs to listen for and receive chorus signals, and pre-load music resources.
After starting the performance, play the music locally, and the chorus synchronizes the lyrics through the playback progress callback.
All singers need to calibrate the local song playback progress according to NTP.
【Audience】
Pull the mixed stream to listen to the chorus.
Parse the song progress information in the SEI of the mixed stream for lyric synchronization.
After taking a seat, stop pulling the mixed stream, switch to pulling the vocal streams from the seats, and start the chorus mode.
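The NTP calibration step that the lead singer and chorus lists mention can be sketched as follows. This is an illustration under stated assumptions, not the SDK's mechanism: it assumes the chorus signal carries an agreed start time on the shared NTP clock, and the function names, the `start_ntp_ms` field, the 3000 ms start delay, and the 50 ms tolerance are all hypothetical values chosen for the example.

```python
def build_chorus_signal(music_id, ntp_now_ms, start_delay_ms=3000):
    """Lead singer: announce a start time slightly in the future.

    The delay gives the other singers time to receive the signal
    and pre-load the music before playback begins.
    """
    return {"music_id": music_id, "start_ntp_ms": ntp_now_ms + start_delay_ms}


def expected_progress_ms(ntp_now_ms, start_ntp_ms):
    """Where playback should be, given the shared NTP clock."""
    return max(0, ntp_now_ms - start_ntp_ms)


def calibrate(actual_progress_ms, ntp_now_ms, start_ntp_ms, threshold_ms=50):
    """Return the position to seek to, or None if playback is close enough.

    Called periodically by every singer, for example from the
    playback progress callback.
    """
    expected = expected_progress_ms(ntp_now_ms, start_ntp_ms)
    if abs(expected - actual_progress_ms) > threshold_ms:
        return expected
    return None
```

Because every singer derives the expected position from the same NTP time rather than from messages exchanged between clients, a singer who joins midway can compute the correct position in the same way.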
(3) API Call Sequence
The sequence of API calls for different roles is as follows:
[Figures: API call sequence diagrams for the Lead Singer, the Chorus, and the Audience]
Note:
Because the implementation above requires considerable technical expertise, TRTC's official website provides an open-source audio and video UI component, TUIKaraoke, that you can integrate into your project. With just a few lines of code, you can add real-time Karaoke scenarios to your application and try TRTC's capabilities for Karaoke scenarios, such as singing, seat management, gift exchange, and text chat.