29 Oct 2018

[Blog] Transcript: Making sense out of the fragmented OTT delivery landscape.

Hello Everyone,

Last month was IBC2018 and they've finally released the presentations. We thought it would be nice to add our CTO Jaron's presentation as he explains how to make sense of the fragmented OTT landscape.

You can find the full presentation, slides and a transcript below.




Alright, hello everyone. Well as Ian just introduced I'm going to talk about the fragmented OTT delivery landscape.


Because, well, it is really fragmented. There are several types of streaming.

To begin we've got real time streaming, the streaming that everyone knows and loves, so to speak.

And we have pseudo streaming, which is like if you have an HTTP server and you pretend a file is there, but it's not really a file it's actually a live stream and you're just sending it pretending there’s a file.

But that wasn't enough, of course! Segmented streaming came afterwards - which is the current popular method, where you segment a stream in several parts and you have an index and then that index just updates with new parts as they become available.

Now it would be nice if it was this simple and it was just three methods, but unfortunately it is a little bit more complicated.


All of these methods have several protocols they can use to deliver. There's different protocols for real-time, pseudo and segmented and of course none of these are compatible with each other.

There are also players, besides all this. There is a ton of them, just in this hall alone there's at least seven or eight and they all say they do the same thing; and they do, and they all work fine. But how do you know what to pick? It's hard.


We should really stop making new standards.


So, real time streaming was the first one i mentioned. There's RTMP, which is the well known protocol that many systems still use as their ingest. But it's not used very often in delivery, as Flash is no longer supported in browsers. It's a very outdated protocol, it doesn't support HEVC, AV1, Opus; all the newer codecs aren't in there. But the protocol itself is highly supported by encoders and streaming services. It's something that’s hard to get rid of.

Then there's RTSP with RTP at the core, which is an actual standard unlike RTMP which is something Adobe just invented. RTSP supports a lot of things and it's very versatile. You can transport almost anything through it, but it's getting old. There's an RTSP version 2, which no one supports, but it exists. Version one is well supported but only in certain markets, like IP cameras. Most other things not as much and in browsers you can forget about it.

And then there's something newer for real-time streaming, which is WebRTC. WebRTC is the new cool kid on the block and it uses SRTP internally which is RTP with security added. Internally it's basically the same thing, but this works on browsers, which is nice as that means you can actually use it for most consumers unlike RTSP.

That gives you a bit of an overview of real time streaming. Besides these protocols you can also pick between TCP and UDP. TCP is what most internet connections use. It's a reliable method to send things, but because of it being reliable it's a bit slower and the latency is a bit higher. UDP is unreliable but has very low latency. Depending on what you're trying to do you might want to use one or the other.

All of these protocols work with TCP and/or UDP. RTMP is always TCP, RTSP can be either and WebRTC is always UDP. The spec of WebRTC says it can also use TCP, but I don't know a single browser that supports it so it's kind-of a moot point.


Then there's pseudo streaming which as i mentioned before uses fake files that are playing while they're downloading. They're infinite in length and duration, so you can't actually download them. Well, you can, but you end up with a huge file and you don't know exactly where the beginning and the end is, so it's not very nice.

While pseudo streaming is a pretty good idea in theory, there are some downsides. Besides disagreement on what format to pseudo-stream in, because there's lots of formats - like FLV, MP4, OGG, etcetera - and they all sort of work but none perfectly. The biggest downside is that you cannot cache these as they're infinite in length. So a proxy server will not store them and you cannot send them to a CDN, plus how do you decide where the beginning and end are? So pseudo streaming is nice method, but it doesn't work on a scalable system very well.


Now segmented streaming kind of solves that problem, because when you cut the stream into little files and have an index to say where those files are you can upload those little files to your CDN or cache them in a caching server and these files will not change. You just add more and remove others and the system works.

There are some disagreements here too between the different formats. Like what do we use to index? HLS uses text, but DASH uses XML. They contain the same information, but differ in the way of writing it. The container format for storing the segments themselves is also not clear: HLS uses TS and DASH uses MP4. Though they are kind of standardizing now to fMP4, but let's not go too deep here. The best practices and allowed combinations of what codecs work and which ones do not, do you align the keyframes or not - all of that differs between protocols as well. It’s hard to reach an agreement there too.

The biggest problem in segmented streaming is the high latency. Because many players want to buffer a couple of segments before playing, which means if your segments are several seconds long you will have a minimal latency of several times a couple of seconds. Which is not real-time in my understanding of the word “real-time”.

The compatibility with players and devices is also hard to follow. HLS works fine in iOS, but DASH does not unless it's fMP4 and you put a different index in front of it and it'll then only play on newer iOS models. It's hard to keep track of what will play where, that's also a kind of fragmentation you will need to solve for all of this.


So I kind of lied during the introduction when I had this slide up, there's even more fragmentation other than just these 3 types and their subtypes.

There's also encrypted streaming. When it comes to encrypted streaming there's Fairplay, Playready, Widevine and CENC which tries to combine them a little bit. But even in that they don't agree on what encryption scheme to use. So encryption is even fragmented into two different levels of fragmentation.

Then there are reliable transports now, which are getting some popularity. These are intended for between servers, because you generally don't do this to the end consumer. There are several options here too: some of these are companies/protocols that have been around for a while, some are relatively new, some are still in development, some are being standardized and some are not. That's also a type of fragmentation you may have to deal with if you do OTT streaming.


When it comes to encrypted streaming there is the common encryption standard, CENC. Common encryption, that is what it stands for, but it's not really common because it only standardizes the format and how to transport it. It standardizes on fMP4, it standardizes on where the encryption keys are, etc. But not what type of encryption to use. All encryption types use a block cipher, but some are counter based and others are not. So depending on what type of DRM you're using you might have to use one or the other. It's not really standardized, yet it is, so it's confusing on that level as well.


Then the reliable transports, they are intended for server to server. All of them use these techniques in some combination. Some add a little bit of extra fluff or remove some of it. But they all use these techniques at the core.

Forward error correction sends extra data with the stream that allows you to calculate the contents of data that is not arriving. This means not wasting any time asking for retransmits, since you can just recalculate what was missing so you don't have to ask the other server and have another round-trip in between.

Retransmits are sort of self-explanatory, where the receiving end says "hey i didn't receive package X can you retransmit it, send me another copy". This wastes time but eventually you do always get all the data so you can resolve the stream properly.

Bonding is something on a different level altogether where you connect multiple network interfaces like wireless network and GSM and you send data over both, hoping that with the combination of everything it will all end up arriving.

If you combine all three techniques of course you will get really good reception, at the cost of lots of overhead.

There's no standardization at all yet on reliable transports. It's very unclear what the advantages and disadvantages are of all these available ones. The ones listed in the previous slide all claim to be the best, to be perfect and to use a combination of the techniques. There's no real guide as to which you should be using.

So... lots of fragmentation in OTT.


So what do you do to fix all that fragmentation? Now this is where my marketing kicks in.

Right there is our booth, we are DDVTech, we make MistServer and it's a technology you can use to build your own systems on top of. We give you the engine you use underneath your own system and we help you solve all of these problems so you can focus on what makes your business unique and not have to worry about standardization, what to implement and what the next hot thing tomorrow is going to be.

We also allow you to auto-select protocols based on the stream contents and device you're trying to play on or what the network conditions are. Basically everything you need to be successful when you're building an OTT platform.


That’s the end of my presentation, if you have any questions you can drop by our booth or shoot us an email on our info address and we’ll help you out and get talking.