
Fundamental Processes and Workflows of OTT and RTC

Tencent RTC-Dev Team
Sep 26, 2024

OTT (Over-the-Top) and RTC (Real-Time Communication) technologies have revolutionized how we consume media and communicate online. These technologies encompass a wide range of processes, from data capture to rendering. In this blog post, we'll explore the basic workflow of OTT and RTC systems, focusing on two primary operations: pushing and pulling streams.

Understanding the Workflow

The workflow for OTT and RTC technologies can be broadly categorized into two main processes:

  1. Pushing Stream: This involves sending data from the local end to a remote destination.
  2. Pulling Stream: This involves retrieving data from a remote source to the local end.

Let's delve into each of these processes in detail.

Pushing Stream Process

The pushing stream process is crucial for live streaming, video conferencing, and other real-time communication scenarios. Here's a breakdown of the steps involved:

[Figure: OTT and RTC workflow. Video capture and coding, audio acquisition and coding, pre-processing, data packaging, and output formats (HTTP, RTMP, file).]

1. Data Capture

  • Video Capture: Using a camera, the system captures raw video data.
  • Audio Capture: A microphone captures raw audio data.

2. Preprocessing

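  • Both video and audio data undergo initial processing to improve quality and discard information that does not need to be encoded, for example noise suppression or echo cancellation for audio, and scaling or denoising for video.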

3. Encoding

  • Video Encoding: Raw video data is compressed into a more efficient format (e.g., H.264, H.265).
  • Audio Encoding: Raw audio data is compressed (e.g., AAC, Opus).
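
To get a feel for why encoding is necessary, the arithmetic below estimates the bitrate of uncompressed video and compares it with an encoded bitrate. This is a rough sketch: the 12 bits per pixel figure assumes 8-bit YUV 4:2:0 frames, and the 4 Mbit/s target is an illustrative value, not a recommendation from any particular encoder.

```python
def raw_video_bitrate_bps(width: int, height: int, fps: float,
                          bits_per_pixel: float = 12.0) -> float:
    """Bits per second of uncompressed video.

    12 bits per pixel corresponds to 8-bit YUV 4:2:0 (an assumption).
    """
    return width * height * bits_per_pixel * fps


def compression_ratio(raw_bps: float, encoded_bps: float) -> float:
    """How many times smaller the encoded stream is than the raw one."""
    return raw_bps / encoded_bps


raw = raw_video_bitrate_bps(1920, 1080, 30)
ratio = compression_ratio(raw, 4_000_000)  # 4 Mbit/s is an illustrative target
print(f"raw: {raw / 1e6:.1f} Mbit/s, ratio at 4 Mbit/s: {ratio:.0f}:1")
# prints "raw: 746.5 Mbit/s, ratio at 4 Mbit/s: 187:1"
```

Under these assumptions, a 1080p stream at 30 frames per second has to shrink by roughly two orders of magnitude before it fits a typical consumer connection.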

4. Packetization

  • Encoded video and audio data are multiplexed (muxed) into a single stream, often using container formats like MP4 or FLV. This is the counterpart of the demuxing step on the pulling side.
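
The idea of packaging audio and video together can be sketched as interleaving two packet sequences by timestamp. The `Packet` type and `mux` function below are hypothetical names invented for illustration; real containers such as MP4 or FLV add headers, indexes, and codec metadata that this toy version ignores.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    pts_ms: int     # presentation timestamp in milliseconds
    kind: str       # "video" or "audio"
    payload: bytes  # encoded data (placeholder bytes here)


def mux(video: list[Packet], audio: list[Packet]) -> list[Packet]:
    """Interleave two timestamp-sorted packet lists into one ordered stream."""
    # heapq.merge assumes each input is already sorted by the key.
    return list(heapq.merge(video, audio, key=lambda p: p.pts_ms))


video = [Packet(0, "video", b"v0"), Packet(33, "video", b"v1"), Packet(66, "video", b"v2")]
audio = [Packet(0, "audio", b"a0"), Packet(21, "audio", b"a1"), Packet(42, "audio", b"a2")]

for packet in mux(video, audio):
    print(packet.pts_ms, packet.kind)
```

Keeping packets in timestamp order lets a player consume the stream sequentially without buffering one track far ahead of the other.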

5. Transmission

The packaged data can be delivered through several output channels:

  • HTTP: For progressive download or adaptive streaming.
  • RTMP: Commonly used for live streaming.
  • File: Written to local storage instead of being sent over the network.
  • UDP: Used in RTC for low-latency transmission.
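
For the UDP case, data has to be split into datagrams, and the receiver has to cope with datagrams arriving out of order. The sketch below shows one minimal way to do that. The 4-byte sequence-number header and the 1200-byte payload limit are assumptions made for this example and do not correspond to any specific RTC protocol; `send_over_loopback` is a hypothetical helper that only exercises the code on the local machine.

```python
import socket
import struct

HEADER = struct.Struct("!I")  # hypothetical header: 4-byte sequence number
MAX_PAYLOAD = 1200            # assumed limit to stay under a common 1500-byte MTU


def packetize(data: bytes, max_payload: int = MAX_PAYLOAD) -> list[bytes]:
    """Split data into datagrams, each prefixed with its sequence number."""
    return [HEADER.pack(seq) + data[off:off + max_payload]
            for seq, off in enumerate(range(0, len(data), max_payload))]


def reassemble(datagrams: list[bytes]) -> bytes:
    """Rebuild the original data, tolerating out-of-order arrival."""
    chunks = {HEADER.unpack_from(d)[0]: d[HEADER.size:] for d in datagrams}
    return b"".join(chunks[seq] for seq in sorted(chunks))


def send_over_loopback(data: bytes) -> bytes:
    """Send data through a UDP socket on 127.0.0.1 and return what arrived."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        receiver.settimeout(1.0)
        received = []
        for datagram in packetize(data):
            sender.sendto(datagram, receiver.getsockname())
            received.append(receiver.recvfrom(65535)[0])
        return reassemble(received)
    finally:
        sender.close()
        receiver.close()
```

Calling `send_over_loopback(b"x" * 3000)` should return the same 3000 bytes after a round trip through three datagrams. This sketch does not handle loss; a real low-latency transport also needs retransmission or error concealment.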

Pulling Stream Process

The pulling stream process is essential for playback in both OTT and RTC scenarios. Here's how it works:

[Figure: Pulling stream flowchart. Input sources (file download over http://, live stream pull over RTMP://, local files via file://, low-delay transmission over UDP) are separated into video and audio coding data. Video is decoded to color data, post-processed, and rendered on the monitor; audio is decoded to waveform data, post-processed, and rendered on the speaker.]

1. Input

Data can be received from various sources:

  • File download (HTTP)
  • Live stream (RTMP)
  • Local file
  • Low-latency transmission (UDP)
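
A player usually decides how to open an input by looking at its URL scheme. The sketch below maps schemes to the four input types listed above. The `classify_input` function and the `INPUT_KINDS` table are hypothetical names for this example, and treating `https` the same as `http` is an assumption added here.

```python
from urllib.parse import urlparse

# Scheme-to-input mapping; "https" is an addition assumed for this sketch.
INPUT_KINDS = {
    "http": "file download",
    "https": "file download",
    "rtmp": "live stream",
    "file": "local file",
    "udp": "low-latency transmission",
}


def classify_input(url: str) -> str:
    """Return the input type for a URL, based only on its scheme."""
    scheme = urlparse(url).scheme.lower()
    if scheme == "":
        # A bare path such as /tmp/movie.mp4 is treated as a local file.
        return "local file"
    if scheme not in INPUT_KINDS:
        raise ValueError(f"unsupported input scheme: {scheme!r}")
    return INPUT_KINDS[scheme]


for url in ("http://example.com/movie.mp4",
            "rtmp://example.com/live/stream",
            "file:///tmp/movie.mp4",
            "udp://127.0.0.1:5000"):
    print(url, "->", classify_input(url))
```

The example URLs are placeholders. Note that a Windows path such as `C:\video.mp4` would be parsed as having the scheme `c`, so a real player needs extra handling for that case.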

2. Demuxing

  • The received data is separated into video and audio streams.
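
Demuxing can be pictured as routing each packet of an interleaved stream to a per-track queue. In the toy sketch below, a packet is a `(kind, pts_ms, payload)` tuple; that layout and the `demux` function are assumptions for illustration, since real demuxers parse container headers to identify tracks.

```python
from collections import defaultdict


def demux(packets):
    """Split an interleaved packet stream into one list per track kind."""
    streams = defaultdict(list)
    for kind, pts_ms, payload in packets:
        streams[kind].append((pts_ms, payload))
    return dict(streams)


interleaved = [
    ("video", 0, b"v0"),
    ("audio", 0, b"a0"),
    ("audio", 21, b"a1"),
    ("video", 33, b"v1"),
]

streams = demux(interleaved)
print(sorted(streams))        # prints ['audio', 'video']
print(len(streams["video"]))  # prints 2
```

Each resulting list preserves the arrival order of its packets, so the video and audio decoders can consume their tracks independently.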

3. Decoding

  • Video Decoding: Compressed video data is decompressed back into raw format.
  • Audio Decoding: Compressed audio data is decompressed.

4. Post-processing

  • Both video and audio data may undergo additional processing for enhancement or effects.

5. Rendering

  • Video Rendering: Processed video data is displayed on the screen.
  • Audio Playback: Processed audio data is played through speakers or headphones.

Key Differences in OTT and RTC

While the basic workflow is similar, OTT and RTC have some key differences:

Latency Requirements:

  • OTT: Can tolerate higher latency (seconds to minutes).
  • RTC: Requires ultra-low latency (typically well under 1 second, and often a few hundred milliseconds for interactive conversation).
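
The gap between the two becomes clearer when end-to-end latency is written as a sum of per-stage delays. Every number below is an illustrative assumption, not a measurement: the OTT profile assumes 6-second segments with a player that buffers three of them before starting playback, and the RTC profile assumes a small jitter buffer.

```python
def total_latency_ms(stages: dict[str, float]) -> float:
    """End-to-end latency as the sum of per-stage delays in milliseconds."""
    return sum(stages.values())


# Illustrative budgets only; real values depend on the deployment.
ott_segmented = {
    "encode": 500,
    "segment_buffering": 3 * 6000,  # three 6-second segments (assumed)
    "cdn_delivery": 500,
    "decode_render": 100,
}
rtc = {
    "capture": 20,
    "encode": 10,
    "network": 60,
    "jitter_buffer": 40,
    "decode_render": 20,
}

print(f"OTT (segmented): {total_latency_ms(ott_segmented) / 1000:.1f} s")  # 19.1 s
print(f"RTC: {total_latency_ms(rtc):.0f} ms")                              # 150 ms
```

In this sketch, segment buffering dominates the OTT budget, which is one reason segment-based delivery sits in the range of seconds while RTC pipelines avoid large buffers.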

Scalability:

  • OTT: Designed for large-scale distribution, often using CDNs.
  • RTC: Focused on small to medium-sized groups, using peer-to-peer or server-mediated connections.

Protocols:

  • OTT: Often uses HTTP-based protocols (HLS, DASH).
  • RTC: Typically uses UDP-based protocols for lower latency.

Interactivity:

  • OTT: Generally one-way communication.
  • RTC: Supports real-time, two-way communication.

Conclusion

Understanding the fundamental processes of OTT and RTC technologies is crucial for developers working in the digital media and communication space. While the basic workflow of pushing and pulling streams remains consistent, the specific implementations can vary greatly depending on the use case, be it video-on-demand, live streaming, or video conferencing.

As these technologies continue to evolve, we're likely to see further optimizations in areas like encoding efficiency, network adaptation, and latency reduction. Developers who follow these advances will be better placed to build applications that deliver seamless, high-quality user experiences.