File Packaging and Unpackaging in OTT and RTC: Understanding MP4, FLV, and TS Containers
In Over-the-Top (OTT) and Real-Time Communication (RTC) systems, file packaging and unpackaging determine how audio-visual content is stored and transmitted. This post explains both concepts, with a focus on three popular container formats: MP4, FLV, and TS.
The Need for Packaging
After audio and video data pass through their respective encoders, we end up with two separate data streams, for example an AAC stream for audio and an H.264 stream for video. These streams are separate because they originate from different sources (microphone for audio, camera for video) and are processed by different components.
However, in practical applications, audio and video streams are rarely used independently. To synchronize sound and visuals from the same scene, we need to package audio and video data together into a single file.
What is Packaging?
Packaging, also known as multiplexing or muxing, is the process of combining multiple data streams into a single container file. The main purposes of packaging are:
- To integrate information from the same scene.
- To facilitate audio-video synchronization and unified configuration.
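The file format that holds the packaged streams is called a container. Common container formats include MP4, FLV, TS, AVI, and MKV. Unpackaging, also known as demultiplexing or demuxing, is the reverse process: a player splits the container back into its separate audio and video streams before decoding them.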
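As a rough illustration of what a muxer does, the sketch below interleaves two timestamp-sorted packet lists into one sequence ordered by presentation time. The `Packet` structure, the `mux` function, and the sample timestamps are hypothetical and exist only for this example; a real muxer also writes container headers and metadata.

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Packet(NamedTuple):
    """One encoded frame (hypothetical structure, for illustration only)."""
    timestamp_ms: int  # presentation time in milliseconds
    stream: str        # "audio" or "video"
    payload: bytes     # encoded data, e.g. an AAC frame or an H.264 access unit


def mux(audio: Iterable[Packet], video: Iterable[Packet]) -> Iterator[Packet]:
    """Interleave two timestamp-sorted streams into one timestamp-sorted stream."""
    # heapq.merge assumes each input is already sorted by the key.
    return heapq.merge(audio, video, key=lambda p: p.timestamp_ms)


audio = [Packet(0, "audio", b"a0"), Packet(23, "audio", b"a1"), Packet(46, "audio", b"a2")]
video = [Packet(0, "video", b"v0"), Packet(40, "video", b"v1")]

muxed = list(mux(audio, video))
for packet in muxed:
    print(packet.timestamp_ms, packet.stream)
```

Keeping packets in timestamp order is what lets a player read the file sequentially and still have matching audio and video available at the same moment.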
Metadata: The Key to Efficient Playback
Containers are like boxes of varying sizes, with some supporting a wide range of codec formats (like MKV) and others supporting fewer (like AVI). To enable efficient playback, containers include metadata - information about the packaged media.
Metadata typically includes:
- Codec information for audio and video
- Resolution
- Frame rate
- Bit rate
- Sample rate
- Bit depth
- Number of audio channels
When a media player starts playing a file, it first reads the metadata to determine which codecs to use and how much buffer space to allocate before beginning playback.
For online streaming with features like "play while downloading," it's crucial to have the metadata at the beginning of the file. Some files place metadata at the end (an MP4 file, for example, can have its moov box after the media data). Such a file must either be rewritten to move the metadata to the front, which is a remuxing step and does not require re-encoding, or be downloaded in full before playback can begin.
Popular Container Formats
Let's explore three popular container formats used in OTT and RTC systems:
1. MP4 (MPEG-4 Part 14)
MP4 is a versatile media file structure standard that can embed various types of data. Most MP4 files contain H.264 or MPEG-4 Part 2 encoded video and AAC encoded audio.
MP4 files are structured as a series of nested "boxes." Here are some common box types:
Box Type | Description |
ftyp | File type; identifies the file format and compatible brands |
moov | Movie box; container for all of the file's metadata |
mvhd | Movie header; overall information such as timescale and duration |
trak | Track box; container for one audio or video track |
tkhd | Track header; per-track information such as track ID and duration |
mdia | Media box; container for the track's media information |
mdhd | Media header; the track's media timescale and duration |
hdlr | Handler reference; specifies the track type (video/audio/hint) |
minf | Media information box; describes how the track's media data is stored |
stbl | Sample table; maps samples to their times, sizes, and file offsets |
mdat | Media data box; holds the actual audio and video data |
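To make the box structure concrete, here is a minimal sketch that walks the top-level boxes of an MP4 byte string and checks whether moov comes before mdat, which is what progressive playback needs. It relies on the standard box header layout (a 4-byte big-endian size followed by a 4-byte type, with size 1 signalling a 64-bit size and size 0 meaning "until end of file"). The function names and the synthetic sample are assumptions made for this example, and nested boxes are not descended into.

```python
import struct
from typing import Iterator, Tuple


def iter_top_level_boxes(data: bytes) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, offset, size) for each top-level box in an MP4 byte string."""
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header_len = 8
        if size == 1:
            # A 64-bit "largesize" field follows the type field.
            if offset + 16 > len(data):
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header_len = 16
        elif size == 0:
            # The box extends to the end of the file.
            size = len(data) - offset
        if size < header_len:
            break  # malformed box; stop rather than loop forever
        yield box_type.decode("ascii", errors="replace"), offset, size
        offset += size


def metadata_comes_first(data: bytes) -> bool:
    """Return True if a top-level 'moov' box appears before 'mdat'."""
    order = [box_type for box_type, _, _ in iter_top_level_boxes(data)]
    if "moov" not in order or "mdat" not in order:
        return False
    return order.index("moov") < order.index("mdat")


def make_box(box_type: bytes, payload: bytes = b"") -> bytes:
    """Build a synthetic box for the demo (not a valid playable file)."""
    return struct.pack(">I4s", 8 + len(payload), box_type) + payload


sample = (
    make_box(b"ftyp", b"isom\x00\x00\x02\x00")
    + make_box(b"moov")
    + make_box(b"mdat", b"\x00" * 16)
)
print([box_type for box_type, _, _ in iter_top_level_boxes(sample)])
print(metadata_comes_first(sample))
```

For a real file, the same functions could be applied to the bytes read from disk, although very large files would call for seeking between boxes instead of loading everything into memory.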
2. FLV (Flash Video)
FLV is a web video container format originally developed by Macromedia, which Adobe later acquired. Its simple structure is easy to parse, which made it a common choice for online video, especially when combined with the RTMP protocol.
FLV files consist of an FLV header and FLV body:
FLV header | Tag0 | Tag0 data | Tag1 | Tag1 data | ... | TagN | TagN data
The FLV body contains multiple tags, each of which is an audio tag, a video tag, or a script tag (script tags carry metadata about the file, such as its duration).
FLV Header structure:
Field | Description |
Signature(3B) | Always "FLV" (0x46 0x4C 0x56) |
Version(1B) | Version number, usually 0x01 |
Flags(1B) | Upper 5 bits: reserved, 0. Bit 2 (0x04): audio tags present. Bit 1: reserved, 0. Bit 0 (0x01): video tags present |
Header size(4B) | Size of the FLV header in bytes, usually 9 |
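The header layout can be read with a few lines of code. The sketch below assumes the flag layout from Adobe's FLV specification, in which mask 0x04 marks audio and mask 0x01 marks video; the `FlvHeader` type and `parse_flv_header` function are names invented for this example.

```python
import struct
from typing import NamedTuple


class FlvHeader(NamedTuple):
    version: int
    has_audio: bool
    has_video: bool
    header_size: int


def parse_flv_header(data: bytes) -> FlvHeader:
    """Parse the 9-byte FLV file header."""
    if len(data) < 9:
        raise ValueError("an FLV header needs at least 9 bytes")
    # 3-byte signature, 1-byte version, 1-byte flags, 4-byte big-endian size.
    signature, version, flags, header_size = struct.unpack_from(">3sBBI", data, 0)
    if signature != b"FLV":
        raise ValueError("not an FLV file: bad signature")
    return FlvHeader(
        version=version,
        has_audio=bool(flags & 0x04),  # audio flag (assumed per FLV spec)
        has_video=bool(flags & 0x01),  # video flag (assumed per FLV spec)
        header_size=header_size,
    )


# A header announcing both audio and video tags (flags = 0x05).
header = parse_flv_header(b"FLV\x01\x05\x00\x00\x00\x09")
print(header)
```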
FLV Body structure (for each tag):
Field | Description |
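Previous Tag Size(4B) | Size of the previous tag, including its 11-byte tag header; 0 before the first tag |
Tag Type(1B) | Audio (0x08), Video (0x09), or Script (0x12) |
Data Size(3B) | Size of the tag's data portion in bytes |
Timestamp(3B + 1B) | Timestamp of the tag in milliseconds; the fourth byte extends it with the upper 8 bits |
StreamID(3B) | Stream identifier, always 0 |
Tag Data | The actual media data |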
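The repeating structure of the body lends itself to a simple loop. The sketch below assumes the field widths from Adobe's FLV specification: a 4-byte previous tag size, then an 11-byte tag header made of a 1-byte type, a 3-byte data size, a 3-byte timestamp plus a 1-byte timestamp extension, and a 3-byte stream ID. `FlvTag`, `iter_flv_tags`, and `make_tag` are illustrative names, and the sample payloads are arbitrary bytes rather than real media.

```python
from typing import Iterator, NamedTuple

TAG_HEADER_SIZE = 11  # type (1) + data size (3) + timestamp (3 + 1) + stream ID (3)


class FlvTag(NamedTuple):
    tag_type: int       # 0x08 audio, 0x09 video, 0x12 script
    timestamp_ms: int
    data: bytes


def iter_flv_tags(body: bytes) -> Iterator[FlvTag]:
    """Yield the tags found in an FLV body (everything after the file header)."""
    offset = 0
    while offset + 4 + TAG_HEADER_SIZE <= len(body):
        offset += 4  # skip the previous tag size field
        tag_type = body[offset] & 0x1F  # low 5 bits hold the tag type
        data_size = int.from_bytes(body[offset + 1:offset + 4], "big")
        timestamp = int.from_bytes(body[offset + 4:offset + 7], "big")
        timestamp |= body[offset + 7] << 24  # extension byte = upper 8 bits
        start = offset + TAG_HEADER_SIZE
        data = body[start:start + data_size]
        if len(data) < data_size:
            break  # truncated tag
        yield FlvTag(tag_type, timestamp, data)
        offset = start + data_size


def make_tag(prev_size: int, tag_type: int, timestamp_ms: int, data: bytes) -> bytes:
    """Build one 'previous tag size + tag' unit for the demo."""
    return (
        prev_size.to_bytes(4, "big")
        + bytes([tag_type])
        + len(data).to_bytes(3, "big")
        + (timestamp_ms & 0xFFFFFF).to_bytes(3, "big")
        + bytes([(timestamp_ms >> 24) & 0xFF])
        + (0).to_bytes(3, "big")  # stream ID
        + data
    )


body = make_tag(0, 0x08, 0, b"\xaf\x00") + make_tag(13, 0x09, 40, b"\x17\x00\x00")
tags = list(iter_flv_tags(body))
for tag in tags:
    print(hex(tag.tag_type), tag.timestamp_ms, len(tag.data))
```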
3. TS (Transport Stream)
TS, defined in the systems part of the MPEG-2 standard (ISO/IEC 13818-1), is commonly used in digital video broadcasting and in the HTTP Live Streaming (HLS) protocol.
TS packaging builds the stream up in three layers:
- Elementary Streams (ES) - raw encoded audio or video data
- Packetized Elementary Streams (PES) - ES data packaged into PES packets
- Transport Streams (TS) - PES packets multiplexed into TS packets
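TS packets have a fixed size of 188 bytes, consisting of a 4-byte header that begins with the sync byte 0x47, an optional adaptation field, and the payload. The header's 13-bit packet identifier (PID) tells the demuxer which stream each packet belongs to.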
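As a sketch of how a demuxer reads the fixed-size packets, the code below parses the 4-byte TS header using the bit layout from the MPEG-2 systems specification. The `TsHeader` type, the `parse_ts_header` function, and the sample packet are made up for this example; adaptation fields and PES reassembly are left out.

```python
from typing import NamedTuple

TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


class TsHeader(NamedTuple):
    pid: int                       # 13-bit packet identifier
    payload_unit_start: bool       # True when a new PES packet starts here
    adaptation_field_control: int  # 1 = payload only, 2 = adaptation only, 3 = both
    continuity_counter: int        # 4-bit counter, per PID


def parse_ts_header(packet: bytes) -> TsHeader:
    """Parse the 4-byte header of a single 188-byte TS packet."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError("a TS packet must be exactly 188 bytes")
    if packet[0] != SYNC_BYTE:
        raise ValueError("missing sync byte 0x47")
    return TsHeader(
        pid=((packet[1] & 0x1F) << 8) | packet[2],
        payload_unit_start=bool(packet[1] & 0x40),
        adaptation_field_control=(packet[3] >> 4) & 0x03,
        continuity_counter=packet[3] & 0x0F,
    )


# Synthetic packet: PID 0x100, start of a PES packet, payload only, counter 7.
packet = bytes([0x47, 0x41, 0x00, 0x17]) + b"\xff" * 184
ts_header = parse_ts_header(packet)
print(ts_header)
```

Because every packet has the same length, a demuxer that loses its place can resynchronize by searching for 0x47 at 188-byte intervals.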
Conclusion
Understanding file packaging and unpackaging is crucial for anyone working with OTT and RTC technologies. Each container format has its strengths and is suited for different use cases:
- MP4 is widely used for internet video-on-demand services
- FLV is popular for both live streaming and video-on-demand
- TS is commonly used in HLS for HTML5 video delivery
Knowing how each container lays out its metadata and media data makes it easier to choose the right format and to build efficient, high-quality audio-video streaming solutions.