File Packaging and Unpackaging in OTT and RTC: Understanding MP4, FLV, and TS Containers

Sep 26, 2024

In the world of Over-the-Top (OTT) and Real-Time Communication (RTC) technologies, file packaging and unpackaging play crucial roles in managing and transmitting audio-visual content. This blog post will delve into the concepts of packaging and unpackaging, with a focus on three popular container formats: MP4, FLV, and TS.

The Need for Packaging

After audio and video data pass through their respective encoders, we typically end up with two separate data streams: an AAC stream for audio and an H.264 stream for video. These streams are separate because they originate from different sources (microphone for audio, camera for video) and are processed by different components.

However, in practical applications, audio and video streams are rarely used independently. To synchronize sound and visuals from the same scene, we need to package audio and video data together into a single file.

What is Packaging?

Packaging, also known as multiplexing or muxing, is the process of combining multiple data streams into a single container file. The main purposes of packaging are:

  1. To integrate information from the same scene.
  2. To facilitate audio-video synchronization and unified configuration.

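The resulting packaged file is called a container. Common container formats include MP4, FLV, TS, AVI, and MKV. The reverse process, unpackaging (also known as demultiplexing or demuxing), splits a container back into its separate streams so that each one can be handed to its decoder.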
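To make the idea concrete, here is a minimal sketch of muxing and demuxing as plain interleaving by timestamp. It is a toy model, not a real container format: the `Packet` type, its fields, and the `mux`/`demux` helpers are names made up for this illustration, and the payloads are treated as opaque bytes.

```python
import heapq
from typing import Dict, Iterable, Iterator, List, NamedTuple


class Packet(NamedTuple):
    pts_ms: int     # presentation timestamp in milliseconds (hypothetical field)
    stream: str     # "audio" or "video"
    payload: bytes  # encoded frame data, opaque to the muxer


def mux(*streams: Iterable[Packet]) -> Iterator[Packet]:
    """Interleave streams that are each already sorted by timestamp."""
    return heapq.merge(*streams, key=lambda p: p.pts_ms)


def demux(packets: Iterable[Packet]) -> Dict[str, List[Packet]]:
    """Split an interleaved sequence back into one list per stream."""
    out: Dict[str, List[Packet]] = {}
    for packet in packets:
        out.setdefault(packet.stream, []).append(packet)
    return out


video = [Packet(0, "video", b"I-frame"), Packet(40, "video", b"P-frame")]
audio = [Packet(0, "audio", b"a0"), Packet(23, "audio", b"a1"), Packet(46, "audio", b"a2")]

interleaved = list(mux(video, audio))
streams = demux(interleaved)
```

A real container adds headers, metadata, and a byte-level layout on top of this ordering, but the core job is the same: keep audio and video that belong together close to each other in the file.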

Metadata: The Key to Efficient Playback

Containers are like boxes of varying sizes, with some supporting a wide range of codec formats (like MKV) and others supporting fewer (like AVI). To enable efficient playback, containers include metadata - information about the packaged media.

Metadata typically includes:

  • Codec information for audio and video
  • Resolution
  • Frame rate
  • Bit rate
  • Sample rate
  • Bit depth
  • Number of audio channels

When a media player starts playing a file, it first reads the metadata to determine which codecs to use and how much buffer space to allocate before beginning playback.

For online streaming with features like "play while downloading," it's crucial to have the metadata at the beginning of the file. Some packaging formats place metadata at the end, which requires either remuxing the file to move the metadata to the front (the container is rewritten, but the audio and video are not re-encoded) or downloading the entire file before playback can begin.

Popular Container Formats

Let's explore three popular container formats used in OTT and RTC systems:

1. MP4 (MPEG-4 Part 14)

MP4 is a versatile media file structure standard that can embed various types of data. Most MP4 files contain H.264 or MPEG-4 encoded video and AAC encoded audio.

MP4 files are structured as a series of nested "boxes." Here are some common box types:

Box Type | Description
ftyp | File type, indicates the file format
moov | Metadata container, stores file metadata
mvhd | Movie header, contains file header information
trak | Track container, holds audio/video track information
tkhd | Track header, contains track header information
mdia | Media container, holds the media information for one track
mdhd | Media header, defines media header information
hdlr | Handler, specifies track type (video/audio/hint)
minf | Media information, describes how to interpret the track's media and holds the sample table
stbl | Sample table, stores sample mapping information
mdat | Media data container, holds the actual media data
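Because every box starts with a 4-byte big-endian size followed by a 4-byte type, the top-level layout can be walked without understanding the box contents. The sketch below lists the top-level boxes and checks whether `moov` comes before `mdat`, which is the "metadata at the beginning" property discussed earlier. The function names are made up for this example, and only the basic size rules are handled (size 1 means a 64-bit size follows, size 0 means the box runs to the end).

```python
import io
import struct
from typing import BinaryIO, Iterator, List, Tuple


def iter_boxes(f: BinaryIO, end: int) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, offset, size) for each box from the current position up to `end`."""
    while f.tell() + 8 <= end:
        offset = f.tell()
        size, box_type = struct.unpack(">I4s", f.read(8))
        header_len = 8
        if size == 1:
            # A 64-bit "largesize" follows the type field.
            extra = f.read(8)
            if len(extra) < 8:
                break
            size = struct.unpack(">Q", extra)[0]
            header_len = 16
        elif size == 0:
            # The box extends to the end of the enclosing range.
            size = end - offset
        if size < header_len:
            break  # malformed box; stop instead of looping forever
        yield box_type.decode("latin-1"), offset, size
        f.seek(offset + size)


def top_level_boxes(f: BinaryIO) -> List[str]:
    """Return the types of the top-level boxes in file order."""
    f.seek(0, io.SEEK_END)
    end = f.tell()
    f.seek(0)
    return [box_type for box_type, _, _ in iter_boxes(f, end)]


def metadata_first(f: BinaryIO) -> bool:
    """True if the moov box appears before the mdat box."""
    types = top_level_boxes(f)
    return ("moov" in types and "mdat" in types
            and types.index("moov") < types.index("mdat"))
```

With a real file this would be called as `metadata_first(open("movie.mp4", "rb"))`, where the file name is only a placeholder. Nested boxes such as `trak` inside `moov` can be walked the same way by calling `iter_boxes` on the parent's payload range.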

2. FLV (Flash Video)

FLV is a popular web video container format developed by Adobe. Its simple structure and easy decoding make it ideal for online video, especially when combined with Adobe's RTMP protocol.

FLV files consist of an FLV header and FLV body:

FLV header | Tag0 | Tag0 data | Tag1 | Tag1 data | ... | TagN | TagN data

The FLV body contains multiple tags, which can be audio, video, or script (for keywords or file information).

FLV Header structure:

Field | Description
Signature (3B) | Always "FLV" (0x46 0x4C 0x56)
Version (1B) | Version number, usually 0x01
Flags (1B) | Upper 5 bits: reserved, 0; mask 0x04: audio tags present; mask 0x02: reserved, 0; mask 0x01: video tags present
Header size (4B) | Size of FLV header in bytes, usually 9
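The 9-byte header is small enough to decode in a few lines. The sketch below follows the layout in the Adobe FLV specification, where the audio flag is the 0x04 bit and the video flag is the 0x01 bit of the flags byte. `FlvHeader` and `parse_flv_header` are names invented for this example.

```python
import struct
from typing import NamedTuple


class FlvHeader(NamedTuple):
    version: int
    has_audio: bool
    has_video: bool
    header_size: int


def parse_flv_header(data: bytes) -> FlvHeader:
    """Decode the first 9 bytes of an FLV file."""
    if len(data) < 9:
        raise ValueError("an FLV header needs at least 9 bytes")
    # 3-byte signature, 1-byte version, 1-byte flags, 4-byte big-endian size
    signature, version, flags, header_size = struct.unpack(">3sBBI", data[:9])
    if signature != b"FLV":
        raise ValueError("missing FLV signature")
    return FlvHeader(
        version=version,
        has_audio=bool(flags & 0x04),
        has_video=bool(flags & 0x01),
        header_size=header_size,
    )


header = parse_flv_header(b"FLV\x01\x05\x00\x00\x00\x09")
```

Here the flags byte 0x05 sets both the audio and the video bit, so `header.has_audio` and `header.has_video` are both true.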

FLV Body structure (for each tag):

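Field | Description
Previous Tag Size (4B) | Size of the previous tag, including its header (0 before the first tag)
Tag Type (1B) | Audio (0x08), Video (0x09), or Script (0x12)
Data Size (3B) | Size of the tag's data portion
Timestamp (3B + 1B) | Timestamp of the tag in milliseconds; the extra byte extends it to 32 bits
StreamID (3B) | Stream identifier, always 0
Tag Data | The actual media data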
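Walking the body is a matter of reading one tag header after another. The sketch below assumes the field widths from the Adobe FLV specification: a 4-byte previous tag size, then an 11-byte tag header (1-byte type, 3-byte data size, 3-byte timestamp plus 1 extended byte, 3-byte stream ID), then the data. `FlvTag` and `iter_flv_tags` are hypothetical names, and the whole file is assumed to be in memory.

```python
from typing import Iterator, NamedTuple

TAG_NAMES = {0x08: "audio", 0x09: "video", 0x12: "script"}


class FlvTag(NamedTuple):
    kind: str          # "audio", "video", "script", or "unknown"
    timestamp_ms: int  # 32-bit timestamp in milliseconds
    data: bytes        # tag payload, still in its codec-specific form


def iter_flv_tags(buf: bytes, header_size: int = 9) -> Iterator[FlvTag]:
    """Yield every complete tag found after the FLV header."""
    pos = header_size + 4  # skip the header and the first previous-tag-size field
    while pos + 11 <= len(buf):
        tag_type = buf[pos] & 0x1F  # the low 5 bits carry the tag type
        data_size = int.from_bytes(buf[pos + 1:pos + 4], "big")
        # The lower 24 bits come first; the extended byte holds the upper 8 bits.
        timestamp = int.from_bytes(buf[pos + 4:pos + 7], "big") | (buf[pos + 7] << 24)
        body_start = pos + 11
        body_end = body_start + data_size
        if body_end > len(buf):
            break  # truncated tag, for example a download still in progress
        yield FlvTag(TAG_NAMES.get(tag_type, "unknown"), timestamp, buf[body_start:body_end])
        pos = body_end + 4  # skip the previous-tag-size field that follows the data
```

Because each tag carries its own size and timestamp, a player can start decoding as soon as the first tags arrive, which is part of why the format works well for streaming.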

3. TS (Transport Stream)

TS, defined by the MPEG-2 standard, is commonly used in digital video broadcasting and in the HTTP Live Streaming (HLS) protocol.

The TS packaging process:

  1. Elementary Streams (ES) - raw encoded audio or video data
  2. Packetized Elementary Streams (PES) - ES data packaged into PES packets
  3. Transport Streams (TS) - PES packets multiplexed into TS packets
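As an illustration of step 2, the sketch below wraps one chunk of elementary stream data in a PES packet that carries only a presentation timestamp (PTS). It follows the PES header layout from the MPEG-2 systems standard as far as this simple case needs it; the function names are made up, and the default stream ID 0xE0 (the first video stream) is just an example value.

```python
import struct


def encode_pts(pts: int) -> bytes:
    """Pack a 33-bit PTS into 5 bytes, using the '0010' prefix for 'PTS only'."""
    return bytes([
        0x20 | ((pts >> 29) & 0x0E) | 0x01,  # prefix, PTS bits 32..30, marker bit
        (pts >> 22) & 0xFF,                  # PTS bits 29..22
        ((pts >> 14) & 0xFE) | 0x01,         # PTS bits 21..15, marker bit
        (pts >> 7) & 0xFF,                   # PTS bits 14..7
        ((pts << 1) & 0xFE) | 0x01,          # PTS bits 6..0, marker bit
    ])


def decode_pts(b: bytes) -> int:
    """Inverse of encode_pts."""
    return ((((b[0] >> 1) & 0x07) << 30) | (b[1] << 22)
            | ((b[2] >> 1) << 15) | (b[3] << 7) | (b[4] >> 1))


def wrap_pes(es_data: bytes, pts: int, stream_id: int = 0xE0) -> bytes:
    """Build a PES packet: start code, stream ID, length, flags, PTS, payload."""
    # Flags byte 0x80 is the fixed '10' marker; the second 0x80 means "PTS present";
    # 5 is the number of optional header bytes that follow (the PTS itself).
    optional = bytes([0x80, 0x80, 5]) + encode_pts(pts)
    length = len(optional) + len(es_data)
    if length > 0xFFFF:
        length = 0  # 0 means "unbounded", which is only allowed for video in a TS
    return b"\x00\x00\x01" + bytes([stream_id]) + struct.pack(">H", length) + optional + es_data


pes = wrap_pes(b"\xaa" * 10, pts=90000)  # 90000 ticks of the 90 kHz clock = 1 second
```

Step 3 then cuts this PES packet into pieces and places them in the payloads of successive TS packets.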

TS packets have a fixed size of 188 bytes, consisting of a 4-byte header (which starts with the sync byte 0x47), an optional adaptation field, and the payload.
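The fixed packet size makes a transport stream easy to scan. The sketch below splits a buffer into 188-byte packets and decodes the main fields of the 4-byte header defined by the MPEG-2 systems standard: the sync byte 0x47, the payload-unit-start flag, the 13-bit packet identifier (PID), the adaptation field control bits, and the 4-bit continuity counter. `TsHeader`, `iter_ts_packets`, and `parse_ts_header` are names chosen for this example.

```python
from typing import Iterator, NamedTuple

TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


class TsHeader(NamedTuple):
    pid: int                    # 13-bit packet identifier
    payload_unit_start: bool    # True when a new PES packet or table starts here
    has_adaptation_field: bool
    has_payload: bool
    continuity_counter: int     # 4-bit counter, per PID


def iter_ts_packets(buf: bytes) -> Iterator[bytes]:
    """Yield consecutive complete 188-byte packets; a trailing partial packet is ignored."""
    for offset in range(0, len(buf) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        yield buf[offset:offset + TS_PACKET_SIZE]


def parse_ts_header(packet: bytes) -> TsHeader:
    """Decode the 4-byte header of a single TS packet."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError("a TS packet must be exactly 188 bytes")
    if packet[0] != SYNC_BYTE:
        raise ValueError("missing sync byte 0x47")
    b1, b2, b3 = packet[1], packet[2], packet[3]
    return TsHeader(
        pid=((b1 & 0x1F) << 8) | b2,
        payload_unit_start=bool(b1 & 0x40),
        has_adaptation_field=bool(b3 & 0x20),
        has_payload=bool(b3 & 0x10),
        continuity_counter=b3 & 0x0F,
    )


# A hand-built packet on PID 0x100 that starts a new payload unit.
example = bytes([0x47, 0x41, 0x00, 0x17]) + b"\xff" * 184
header = parse_ts_header(example)
```

A demuxer groups packets by PID, uses the payload-unit-start flag to find where each PES packet begins, and watches the continuity counter to detect lost packets.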

Conclusion

Understanding file packaging and unpackaging is crucial for anyone working with OTT and RTC technologies. Each container format has its strengths and is suited for different use cases:

  • MP4 is widely used for internet video-on-demand services
  • FLV is popular for both live streaming and video-on-demand
  • TS is commonly used in HLS for HTML5 video delivery

As the field of digital media continues to evolve, staying updated with these container formats and their applications will be essential for developing efficient and high-quality audio-video streaming solutions.