16 Oct 2019

[Blog] Transcript: Common OTT Problems and How to Fix Them

Hello everyone,

This blog post covers the presentation our CTO Jaron gave during the IBC2019. The presentation was about common OTT problems and how to fix them, we're sure it's a great view if you've got some questions about what protocol, codec or scaling solution you should pick. The slides are available here.

Transcript follows

Alright! Well hello everyone, as was mentioned my name is Jaron. There we go... Hi, my name is Jaron, the CTO of DDVTech. We build MistServer, so if you hear DDVTech or MistServer, one is the company and the other is the product. We're right over there. Well of course we fix all OTT problems, but you might not be using MistServer or not wanting to use MistServer so you might want to know how we solve these problems. Some of the common ones, at least.

So, I present: OTT de-Mist-ified! How you would solve these problems if you weren't using us. I'm first going to go through a little bit of OTT history and then dive into the common problems of protocol selection, codec selection, segmenting and scaling. And I'm going to try and not talk as fast because otherwise this is going to be over in a few minutes. I'm sure everyone would be happy with me not talking so fast.

Alright, history-wise. Well, first internet streaming was mostly audio-only because of bandwidth concerns. People were using modems over phone lines and "surprisingly" these tend to have enough bandwidth for about a conversation or so. It's like they were designed for it!

Then a little bit later, as bandwidth increased, people did that we now call VoD: pre-recorded videos. Back then it was just... they were uploaded to a website, you would download them, and play them in a media player. Because browsers playing video back then, well, it's just... yeah, that was it. That was not something you could take serious.

Shortly after, IP cameras appeared which used the real time streaming protocol (RTSP). Most still use RTSP to this day (even the very modern ones), because it's a very robust protocol. But, it doesn't really work on the rest of the Internet (in browsers); it just works for dedicated applications like that.

Then, Adobe came with their Flash Player and suddenly we had really cool vector animation online, and progressive streaming support. Later they also added RTMP (Real Time Media Protocol) which is still sort of the standard for user contribution.

Now we're arrived roughly today, and we have HTML5. That is proper media support on the Internet, finally! And, well, Flash went away in a flash (chuckle).

That's where we are now. What we call OTT: streaming over the internet instead of over traditional broadcast.

Let's go into the problems now. What should we do - today - to do this?

Protocol selection: protocol selection is mostly about device compatibility, because no matter where you want to stream there's some target device that you want to have working.

This could be people's cell phones, it could be set-top boxes, smart TVs... There's always something you want to make sure works, and you can't make everything work. Well, we do our best! But making everything work is infeasible, so you tend to focus on a few target groups of devices that you want to work. This decision basically boils down to, if you are targeting iOS you're using:

  • HLS or WebRTC for live, because those are the only two that Apple will allow for live on iOS.
  • For VoD you tend to do MP4 because that does work and it's a bit easier to work with. Though you could do VoD over HLS and WebRTC if you really wanted to.

For most other consumer devices it's a mixture of protocols that tend to work roughly equally well:

  • MP4 and WebM which are similar to file downloads, except they play directly in-browser.
  • The fancy new kid on the block, WebRTC, which does low latency specifically.
  • The segmented protocols, which are HLS, DASH and CMAF. They are roughly equivalent nowadays.

For special cases like IP cameras or dedicated devices like set top boxes, RTSP, RTMP and TS are still used in combination with the rest.

For most people it tends to come down to using HLS for live and MP4 for VoD because that's the combination that works on practically everything. It's not always ideal because HLS tends to have a lot of latency and not everything might work well with MP4 because it has a very large header so it can take a while to download and start. So, you might want to pick a different protocol instead.

So! That's how you should select your protocol, roughly of course. I'm going to do this pretty quickly because you can't go into all the problems in one go.

The next problem is codec selection. Now, codec selection is also largely about device compatibility because depending on what chipset is in a device you may or may not have hardware accelerated decoding support. Hardware accelerated decoding support is important, because if it's not there you can literally feel the phone burning up in your hand. It's trying to decode something that it doesn't know how to do. Without a chip it's doing it in software, software requires CPU, which requires power, which burns up your battery. So, if you're doing anything mobile you need to have hardware acceleration, and hardware acceleration (included on chips) tends to change over time.

Right now, H.264 is the most widely supported codec. It works on pretty much every consumer device that still works. Maybe some people have something really old from 15 years ago that's somehow still functional; then you might have a problem, but I don't think anyone expects modern video to play on devices like that anymore.

We also have HEVC (which is also known as H.265, it's two names for same codec). It's on newer devices. It works great, gives you better quality per byte of data. The annoying parts are:

  • Not all devices have it. Just the more modern ones.
  • The patent pool is a nightmare. H.264 also has a pattern pool, but it's pretty clear how you pay royalties. With HEVC no one even really knows who you should be paying and how much and so people tend to stay away from it, especially because not all devices are compatible.

For the future, I'm kind-of hoping that AV1 is going to be the next big thing. Because there's so many companies backing it; there's hardware vendors doing it. I'm guessing that within the next few years we will see support for AV1 on all modern devices. And, since there are theoretically no patent pools for AV1 it would also be free to use. I think if all devices start adding this and it's free to use, it would be pretty much a no-brainer to switch to AV1 in the future. So, prepare for AV1 and use H.264 today. If you really care about limiting bandwidth as much as you can while keeping high quality you might want to think about HEVC as well.

Alright, next problem: segmenting. This is specific to segmented protocols, so: HLS, DASH, CMAF. Also HSS, but that's not very popular anymore since it's mostly just CMAF with a different name. These protocols work with small segments of data, so basically smaller downloads of a couple seconds (sometimes milliseconds or minutes) of media data that are then played one after another. The idea is that you can keep the old methods of downloading files and sending files over regular plain old (web)servers without having to put anything specific for media in there. You can still do live streams, because you have tiny little pieces that you can keep adding to the stream and then removing the old ones.

The issue with segmenting is how long those segments are. Do you make them one second? Half a second? A minute? The things that are affected by segmenting are startup time, latency and stability. When it comes to startup time, smaller segments load faster, because it takes less time to download. That means that the startup time is reduced. So, if you want low startup time make small segments. The same goes for latency: because there are smaller segments the buffer on the client side can be smaller and then latency is lower. This is the technique that everyone is using to do low latency HLS over the last few years. There's new stuff coming out soon, but this is the current technology. Basically they make the segments really, really, small and the latency is low. They call it all kinds of fancy names but that's what they're doing underneath.

The big downside to small segments is stability, because longer segments are much more stable and have less overhead. So you're wasting more bandwidth by doing small segments, and you're decreasing your stability by doing small segments. If there's something wrong with the connection and even one segment is missing your stream starts buffering or stalling and nobody wants that. It's a constant battle between making them long or short and it depends on if you care more about the the latency and startup time or more about the stability. The annoying part is that most people want all three and you sadly can't have all three so the key thing is knowing your market, where your priorities are, and if you're doing something where latency is a big thing and startup time is a big thing.

For example if you're in the gambling industry or something you want to do small segments. Or, maybe not even go with segments at all but use a real real-time streaming protocol like WebRTC or RTSP. If you're more traditional and sending out content then it's a good idea to make longer segments. It will mean it will take slightly longer to start up, but it will play much more smoothly and it will use less bandwidth so it's all about knowing what you're doing and picking the configuration that goes with that.

All right the final problem is scaling. Scaling can mean two different things: it can either mean delivering at scale, so you have lots and lots of viewers or lots of lots of different broadcasts or even lots of both; or it can mean that you want to scale up and down as demand changes. Maybe as a platform that has a few people using it at night and then during day time the usage explodes and goes way up and then at night it goes way back down. It would be a waste to have servers running all night long not doing anything, so you kind of want to turn most of them off and then you want to put them back up in the morning. Something like that. There are several approaches to solving the scaling problem.

You can solve it by, for example, partially using cloud capacity. The capacity you would always need you would have in-house, and then on the peak times you put some extra cloud capacity in there. It tends to be more expensive but since it's really easy to turn it on and off people love adding that for their peak times.

You could use a CDN to offload the scaling problem. You create the content, you send it to the CDN, and now it's their problem. It's not really solving it, it's more moving it to someone else. Which is nice, because they'll solve it for you.

Peer to peer can be be a solution. A couple companies here to do that. By sending the data between all your different viewers you don't have to send it all from your own servers. You can save bandwidth and make scaling slightly easier. The problem is peer to peer only really works well if you have a lot of viewers watching the same moment in the same stream. So, for stuff like the Olympics or something this will work great, but if it's like, you know, two years ago this episode of some TV show... you're probably not going to have a good time doing this.

Of course there's the traditional adding and removing of servers. Which is a pretty obvious way to do things, but it's hard to do logistically, because you need to wait (lead time on physical servers) to do this.

Load balancing is required for most of these things, if you want to do them. Deciding where viewers are going to go. Are they going to go to a particular server, or not? You can sort-of move them away from servers you want to turn off, and then move them to the servers you want to keep on, and switch them in and out as needed this way.

There's always just buying more bandwidth for your existing servers, or having some kind of deal with your bandwidth provider that lets you use more or less depending on time of day. And combining any of these. There's no real easy answer to this, and it really depends a lot of how your business is structured.

Key to doing this (scaling) properly is having good statistics and metrics on your system. If you know what the usage is during a particular time of day, you can prepare for it. There tends to be a very clear pattern in these things. So if you know what's happening and how much load you're expecting for particular events or times of day, you can kind-of anticipate it, and make sure that these above things are done in time and not after the fact.

Alright! So back to us, MistServer. We provide ready-made solutions for all of these problems and many others. So you don't have to reinvent the wheel yourself, and you can sort-of offload some of the thinking and the troubleshooting to us. If you're facing some other problem we can probably help as well.

You can ask questions if you would like to, or you can drop by our booth right there, or shoot us an email if you're watching this online, or if you come up with a question later after the show.