[Blog] Behind the scenes: MP4 live
Hello streaming media enthusiasts! It's Jaron again, back with my first proper blog post after the introduction I posted earlier this year. As mentioned by Carina in the previous post, I'll be explaining the background of MP4 live streaming in this post.
What is MP4?
MP4 is short for MPEG-4 Part 14. It's a media container standard developed by the International Organization for Standardization (ISO), and is what most of the world recognizes as "a video file" these days. MP4 is based on Apple's QuickTime file format as published in 2001, and the two are effectively (almost) the same thing. As a container, MP4 files can in principle hold all kinds of data: audio, video, subtitles, metadata, et cetera.
MP4 has become the de facto standard for video files. It uses a mandatory index (the "moov" box), which is usually placed at the end of the file, since logically the index can only be generated after the entire file has been written.
A file where this index is moved to the beginning - so it is available at the start of a download and playback can begin before the entire file has been received - is referred to as a "fast start" MP4 file. Since MistServer generates all its outputs on the fly, our generated MP4 files are always such "fast start" files, even if the input file was not.
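The layout difference is easy to see by walking the top-level boxes of a file: MP4 data is a flat sequence of boxes, each starting with a 32-bit size and a four-character type. A minimal sketch (the box parser and the synthetic files below are illustrative, not MistServer code) that checks whether the "moov" index comes before the "mdat" media data could look like this:

```python
import struct

def top_level_boxes(data: bytes):
    """Yield the four-character type of each top-level MP4 box."""
    pos = 0
    while pos + 8 <= len(data):
        size, boxtype = struct.unpack(">I4s", data[pos:pos + 8])
        yield boxtype.decode("ascii")
        if size == 1:  # 64-bit "largesize" follows the 8-byte header
            size = struct.unpack(">Q", data[pos + 8:pos + 16])[0]
        elif size == 0:  # box extends to the end of the file
            break
        pos += size

def is_fast_start(data: bytes) -> bool:
    """True if the index ('moov') appears before the media data ('mdat')."""
    for boxtype in top_level_boxes(data):
        if boxtype == "moov":
            return True
        if boxtype == "mdat":
            return False
    return False

def box(boxtype: bytes, payload: bytes = b"") -> bytes:
    """Serialize a minimal box: 32-bit size, 4-char type, payload."""
    return struct.pack(">I", 8 + len(payload)) + boxtype + payload

# Two tiny synthetic files: the usual layout (index last) and fast start.
regular = box(b"ftyp", b"isom") + box(b"mdat", b"\x00" * 16) + box(b"moov")
fast    = box(b"ftyp", b"isom") + box(b"moov") + box(b"mdat", b"\x00" * 16)
```

A downloading player can start decoding `fast` as soon as the first bytes arrive, while `regular` forces it to wait for the index at the very end.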
The impossible index
Such a mandatory index poses a challenge for live streams. After all: live streams have no beginning or end, and are theoretically infinite in duration. It is impossible to generate an index for an infinite duration stream, so the usual method of generating MP4 files is not applicable.
Luckily, the MP4 standard also contains a section on "fragmented" MP4. Intended for splitting MP4 data into multiple files on disk, it allows for smaller "sub-indexes" to be used for parts of a media stream.
MistServer leverages this fragmented MP4 support in the standard, and instead sends a single progressively downloaded file containing a stream of very small fragments and sub-indexes. Using this technique, it becomes possible to livestream media data in a standard MP4 container.
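The shape of such a stream can be sketched in a few lines: one initialization segment up front (a "ftyp" and a "moov" that carries no full sample index), followed by an endless series of fragments, each a small "moof" sub-index paired with its "mdat" media data. The sketch below only illustrates that layout - the "moof" payload is a stub, not a real mfhd/traf structure, and none of this is MistServer's actual implementation:

```python
import struct

def box(boxtype: bytes, payload: bytes = b"") -> bytes:
    """Serialize a minimal MP4 box: 32-bit size, 4-char type, payload."""
    return struct.pack(">I", 8 + len(payload)) + boxtype + payload

def fragment(seq: int, samples: bytes) -> bytes:
    """One fragment: a 'moof' sub-index followed by its 'mdat' media data.
    (A real 'moof' contains mfhd/traf boxes; a stub sequence number is
    used here purely to illustrate the stream layout.)"""
    moof = box(b"moof", struct.pack(">I", seq))  # stub payload
    mdat = box(b"mdat", samples)
    return moof + mdat

def live_stream(chunks):
    """Generator emitting an fMP4 byte stream: the init segment once,
    then one (moof, mdat) pair per incoming chunk of encoded media."""
    yield box(b"ftyp", b"iso5") + box(b"moov")  # init segment, no full index
    for seq, samples in enumerate(chunks, start=1):
        yield fragment(seq, samples)
```

Because every fragment carries its own sub-index, the generator never needs to know when the stream ends - the output can simply be written to the client as a progressive download of unbounded length.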
The big reason for wanting to do this is because practically all devices that are able to play videos will play them when provided in MP4 format. This goes for browsers, integrated media players, smart TVs - literally everything will play MP4. And since fragmented MP4 has been a part of the standard since the very beginning, these devices will play our live MP4 streams as well.
The really fun part is that when used in a browser, this method of playback requires no plugins, no scripts, no browser extensions. It will "just work", even when scripting is disabled by the user. That makes MP4 live the only playback method that can currently play a live stream when scripting is turned off in a browser. When used outside of a browser, all media players will accept the stream, without needing a specialized application.
MP4 live is a relatively new and (until now) unused technique. As such, there are a few pitfalls to keep in mind. Particularly, Google Chrome has a small handful of bugs associated with this type of stream. MistServer does browser detection and inserts workarounds for these bugs directly into the bitstream, meaning that even the workarounds for Chrome compatibility do not require client-side scripting.
Now that most browsers offer roughly equivalent theoretical compatibility, some have started identifying themselves as Chrome in their User-Agent strings, hoping to be served the "more capable" version of websites. This throws a wrench into our bug workaround efforts, as such browsers are wrongly detected as Chrome when they are not. Applying our workaround to any browser other than Chrome causes playback to halt, so we must correctly detect these not-quite-Chrome browsers as well, and disable the workaround accordingly. MistServer does all this, too.
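The general technique looks roughly like this: a browser that claims to be Chrome but also carries another browser's marker token in its User-Agent string is not actually Chrome. The token list below is illustrative (these markers do appear in real Edge, Opera, Samsung Internet, and Yandex UA strings), but it is not MistServer's actual detection logic:

```python
def is_real_chrome(user_agent: str) -> bool:
    """Heuristic sketch: treat a UA as genuine Chrome only if it claims
    Chrome AND carries none of the marker tokens that other Chromium-based
    browsers append to their otherwise Chrome-like UA strings."""
    if "Chrome/" not in user_agent:
        return False
    impostor_tokens = ("Edg/", "OPR/", "SamsungBrowser/", "YaBrowser/")
    return not any(tok in user_agent for tok in impostor_tokens)
```

Only when this check passes would the Chrome-specific bitstream workaround be inserted; for everything else it stays disabled so playback is not halted.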
Finally, iOS devices and all Apple software/hardware in general don't seem to like this format of stream delivery. This makes sense, since MP4 was based on an Apple format to begin with, and the original Apple format did not contain the fragmented type at all. It would seem that Apple kept their own implementation and did not follow the newer standard. While it is logical when looked at in that light, it is a bit ironic that the only devices that will not play MP4 live are devices made by the original author of the standard it is based on. Luckily, the point is a bit moot, as all those devices prefer HLS streams anyway, and MistServer provides that format as well.
Naturally, we don't expect MP4 to stay the most common or best delivery method until the end of time. We're already working on newer standards that might take the place of MP4 in the future, and are planning to automatically select the best playback method for each device when such methods become better choices for them.
That was it for this post! You can look forward to Balder's post next time, where he will explain how to use OBS Studio with MistServer.