Ep. #2, Adaptive Streaming & DASH
In this episode of Demuxed, Matt, Steve, & Phil are joined by Andrew Sinclair, Principal Software Engineer at Brightcove. The group has a great conversation on the history of Adaptive Video Streaming on the web before discussing the current state of Dynamic Adaptive Streaming over HTTP (DASH).
Andrew Sinclair is currently Principal Software Engineer at Brightcove and is a highly experienced media technology professional.
Transcript
Matt McClure: Hey everybody, welcome to the Demuxed podcast. Today we have Andrew Sinclair joining us. Why don't you give everybody a quick intro, Andrew?
Andrew Sinclair: Hi everyone, I'm Andrew. I'm a video engineer working at Brightcove. I've been working in the video space for about 10 years now, in a variety of roles across broadcast, infrastructure, web apps and other general things. Mostly in the last few years I've been focusing on DASH, which I believe is what we're going to talk about today.
Matt: Perfect. Real quick before we jump in, as Andrew said, we're going to be covering DASH. We'll go briefly into a high-level overview, but also talk about exciting things that have happened in the last few weeks. Or as exciting as DASH revelations can be.
Just to give a quick, high-level overview about some other stuff that's going on, Demuxed tickets are officially on sale. So if you didn't hear before, the date is set for October 13. Once again, the Foundation of Open Media Standards will be the day before, on the 12th.
Steve Heffernan: Demuxed.com.
Matt: Demuxed.com, that is probably an important thing to know. All the links you need are there.
Let's start by giving a quick primer for everybody that's not involved or knows a lot about the segmented streaming world. Let's talk about what exactly DASH is. Somebody want to give us a brief overview on DASH?
Andrew: I can kick us off there. DASH has been an evolution of a whole variety of standards. I was thinking about this yesterday, really going way back to the first of the adaptive bit rates, which is probably Move Networks, which is going a fair way back now.
Everyone was quite excited about that, and Move came and exploded onto the scene and then quietly disappeared after a few years of trying, as no one really caught on to the adaptive world, and they've never been heard of since. I think they got acquired or something. I don't know if anyone else can remember what happened to Move.
What happened next, I guess, is some of the bigger guys started to develop their own formats in that space. Probably the most notable ones to come out of the adaptive landscape are obviously Apple's HLS, Adobe's HDS, and Microsoft's Smooth Streaming.
They had all stuck around for a fair while and had all their pros and cons. Apart from Apple attempting to be slightly non-proprietary, the rest were all fairly proprietary standards. The best way to get the information on them, initially, was to reverse-engineer.
There were a few other components that contributed to DASH, but the good thing that came out is eventually everyone got together.
Well, actually, initially Apple were part of the party, with the DASH formulation, and got together and started creating the DASH standard. I think, as anyone who has ever experienced the evolution of a standard, it can certainly take some time.
Steve: Apple was initially involved with the DASH-IF? Or was this pre DASH-IF?
Andrew: I think this is pre DASH-IF. This would be into the formulation of the ISO standards work.
Matt: For anybody that isn't aware, DASH-IF is the DASH Industry Forum. That's what Andrew is referring to, talking about everybody coming together and working on the standard. As we know it today, it has been an iteration by this group of people called the DASH-IF, which is made up of Microsoft, the Alliance for Open Media, which involves Google.
Steve: Akamai and a bunch of other big names.
Andrew: If you look at the actual DASH spec, a transport stream is a valid subformat, basically. You can use the DASH manifest with TSes.
Matt: I guess we should go into a little bit about what segmented streaming really is.
Steve: Take a little bit more steps back.
Matt: I guess everybody that's watched video online at some point has encountered progressive download, especially today. There's an MP4 in a video element somewhere, and then your browser can buffer that as much as it can and just plays it back.
But the browser handles all the playback, assuming that the file has a MOOV atom and that the server that's hosting it supports something called byte range requests. Then your browser will just request bits of the file as it needs it.
Steve: It's basically: you have a player, you give it a single file, it plays back that single file.
Matt: But it can be smart around buffering and things like that. It doesn't have to wait for the entire thing to download, assuming the MP4 is well formed, it doesn't have to wait for the entire file to download to start playing back.
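For illustration, the kind of byte-range fetch Matt is describing looks like this on the wire (the URL and sizes are hypothetical):

```
# Ask for just the first kilobyte of a hypothetical MP4:
curl -r 0-1023 -o first-chunk.bin https://example.com/video.mp4

# The equivalent raw HTTP exchange:
GET /video.mp4 HTTP/1.1
Host: example.com
Range: bytes=0-1023

HTTP/1.1 206 Partial Content
Content-Range: bytes 0-1023/10485760
Content-Length: 1024
```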
That was actually, initially, a big deal, because I remember watching videos on eBaum's World way back when, and the initial view links were literally, "click to download." You'd go to click the thumbnail to watch the video, and it would just give you a link to then "download some Quicktime file or RealPlayer file," or something like that, and actually view it locally.
So being able to stream an MP4 was actually a pretty awesome advancement.
Andrew: Yeah. It was, definitely.
Phil Cluff: But what's the problem with that?
Matt: Why don't you tell us, Phil?
Phil: Well, the biggest problem with that is people don't want just one MP4, they want an adaptive experience. They want to see changing bit rates depending on how good the connection is.
If you've got one MP4, it's very hard to jump to a different one at an arbitrary place and give someone a better or worse quality experience. And thus is born adaptive streaming.
Matt: A lot of these adaptive streaming solutions such as DASH and HLS are generally just a manifest file that your player downloads. That manifest contains references to all these other smaller segments of the file, and then your player can pick different renditions from this manifest based on its own assessment of its bandwidth and capabilities.
HLS for instance, you have a master manifest that links to different bit rates of smaller manifests. So your player basically initially decides, "Okay, I have this much bandwidth, I'll pick this manifest." And then once it gets the sub-manifest, it just plays. I mean,
M3U8, which is the HLS manifest format, is literally Winamp's old playlist format, which I think is pretty amazing.
Steve: We should say these manifest files are just text files, really. They're just text files that have pointers to other text files or media files, and that allows the player to then go and grab these different things as it needs them.
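To make that concrete, here's a minimal sketch of the two layers Matt and Steve are describing for HLS; the bandwidth numbers and filenames are invented:

```
# master.m3u8: one entry per rendition
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
low/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
high/index.m3u8

# low/index.m3u8: the media playlist, pointing at segments
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:10.0,
segment0.ts
#EXTINF:10.0,
segment1.ts
#EXT-X-ENDLIST
```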
Phil: Interestingly, except HDS, which does actually have a binary manifest format.
Matt: Really? This is Adobe's HDS?
Phil: Yep.
Matt: True story?
Phil: Yep, Adobe's HDS is a binary manifest format.
Andrew: Just to make it easier to reverse engineer.
Matt: Exactly. So, on that note, let's talk about what the other options are. You briefly alluded to a few of these: Smooth Streaming, HLS. And Move Networks' initial one, which I have to assume was the best of all of them, just died an unfortunately premature death.
Let's talk about a few of these and how they compare to DASH, and why DASH seems to be having all the momentum right now.
Andrew: For the closest ties to DASH, HDS and Smooth fundamentally are fairly similar in that they rely on an MP4. And really, when we look at who threw the most into the DASH court from their proprietary stuff, it was probably Microsoft, heavy users of the early evolution of the fragmented MP4 format.
In that sense, along with an XML-style manifest, Smooth and DASH are fairly similar. As Phil alluded to, HDS is also fairly similar and, correct me if I'm wrong, also relied on a fragmented MP4 underneath. And it had a similar kind of thing.
Obviously, early days with HDS, it was very closely tied to Flash, and I can't remember what version of Flash it was. It requires going back a while now, but one of those versions came out with new adaptive streaming support. Smooth, the way it could play was Silverlight initially, too. So we're in early desktop days here.
And then of course it was HLS, which was fundamentally different; it had that Winamp format. Still not entirely sure why it's got the Winamp format and they persist with it to this day. Every time I look at it, it's still like, "Right. That's different."
I guess the fundamental difference, though, with the Apple stuff too, is it has the transport stream, the TS format, underneath. Which, for all of those that work in the video processing and encoding space, adds its own whole level of different processing, creating a transport stream versus an MP4.
They're quite different for packaging, obviously. The same video codecs live through all of this stuff, too, which might be worth mentioning. Most adaptive streaming has been and is to this day H.264, AAC audio. They're sort of the fundamental blocks that sit underneath this.
Also coming from that HDS and Smooth is that MP4 format which has evolved into the ISO standard for the base media file format. I remember, as one does, getting quite excited when that became a standard. Thinking things would be quite nice and easy, and here I am still battling with the base media file format on a daily basis.
But hey, it's good. And if anyone's worked with, as we were talking about earlier, the progressive MP4 format and how you then chop one of those up to be delivered, it really takes the guesswork out of it.
Rather than saying, "Hey, I think 15 minutes and 10 seconds through the video is about here," we can be a lot more accurate now based on the way that that file is fragmented up.
Phil: I actually once heard a story from one of the people in the San Francisco Video meetup where he told me about why Apple picked transport streams for HLS. Has anyone else heard that story?
Andrew: I haven't heard that specific story, no.
Steve: I've just heard it's a really old format. Like, it's an old broadcast format or something like that.
Andrew: Broadcast, yeah.
Phil: From what I hear, it was really down to the chips that existed in the first-gen iPhones. The first-gen iPhones had hardware transport stream decoders in them, so you could pretty much just stick a transport stream down onto the hardware and it would decode and render it. And from that day forward, they stuck with transport stream for simplicity and backwards compatibility.
Steve: That's funny. Maybe they had a Winamp decoder too.
Phil: This may be complete conjecture, for the record.
Steve: It seems like they just reused a lot of the existing technologies. Maybe that's a good strategy.
Matt: To be fair, HLS' market penetration is, I assume, still greater than DASH's. It's probably a little bit different now that Netflix delivers such a vast majority of premium content online; I have to assume that them switching over to DASH really ate away at HLS' market share numbers.
But I still have to assume that, after all this time, HLS is still pretty up there in terms of what people are using for adaptive bit rate streaming, especially since for a while it was you had to at least deliver that with Apple. And you still do.
Anybody who's delivering DASH right now is probably also delivering an HLS stream.
Steve: Yeah, just from the requirement that you have to deliver HLS on all iOS devices, I think that maybe Netflix is using DASH. But given the sheer number of companies doing streaming video, most of them are going to try and stick with just one format instead of making two and doubling their CDN bills and everything else.
Andrew: There's nothing wrong with the transport stream either, really. I mean it segments pretty well and all that. It's got a lot of body of knowledge behind it for broadcast in terms of quality, and it's fairly light on delivery. So it's not a bad format.
Phil: Some people criticize the overhead, right? You don't necessarily need all the stuff that is used in transport streams for packet counting and all that sort of thing when you're delivering an HTTP chunk. That's the argument I hear a lot about why we probably shouldn't still be doing transport stream chunks.
Andrew: Yeah, correct. There's a lot of stuff in there which is assuming you're spraying it out over a UDP network and packets can arrive in any old order and you've got to reassemble it.
Phil: Which you don't need when it's just chunks over HTTP, right?
Andrew: Nope, you have the TCP underneath, you sort of know it's going to get there; you don't need to retransmit. It's not so bad.
Matt: For a while it felt, especially with HLS, kind of the big use-case before people really cared about adaptive bit rate streaming for everyday delivery, was for the live use-case.
At least for a little while, especially in the Video.js world, anybody that wanted less than 10, 20 seconds, anybody that wanted to get close to real-time latency, basically required RTMP. Which meant that we accepted a pull request to the Flash SWF for Video.js that then nobody really knew how it worked.
It was this terrifying black box that we just refused to touch for like two years.
But what does that world look like today? Twitch.tv is probably the closest to real-time I've seen with HLS, especially in the wild with people that are delivering live video to millions and millions of users. But everybody else that's trying to do anything close to real-time, that can afford to not use a CDN, still feels like they use RTMP.
So, could we talk about what these latency requirements mean? For the everyday streamer that doesn't really care about latency, what's the major pull of HLS over RTMP?
Andrew: There's obviously a fundamental problem with adaptive streaming and live compared to something like RTMP or RTSP or the good old broadcast UDP stuff we were talking about before. In those kinds of streaming protocols you encode, you produce a packet, you send the packet, right? It's pretty low latency in that model. Your slow part is the encoding, whereas in adaptive you're sending out chunks, or segments, of packets.
So you've got to get a bunch of those, decide how many of those you want to group together, and then send it. So the default kind of setting for HLS is about 10 seconds, but it can be anything. Generally, low-latency live streaming would be around two seconds, but you're still going to have two seconds of writing out that file before you send it, or buffering it up. Versus the lower-latency stuff, where you're pushing it straight out.
Phil: Even with a small-chunk size, you still can have a lot of the same problems around, for example, buffering behavior.
The smaller they make the segment size doesn't necessarily mean a better experience for a user. It can often mean a lot more buffering.
Even if we say a two-second segment size, there's real-world latency on top of that, because you have to reload the manifest frequently and you have to have several chunks in the buffer. You can't just have one chunk available at a time. You're still going to end up at four to six seconds, best case, even if you have a two-second segment size.
People seem to think you can get down to about two seconds with HLS. Realistically, that's very difficult. Even best case, I think Twitch are doing four-second segments right now, as far as I can tell, and they're still having to hold a good amount of those chunks in cache to give a good buffering behavior for clients.
Steve: Yeah, that makes sense. What you're saying is, essentially, that there's diminishing returns, right? Because you want to buffer a certain amount of data before you start playing back the video anyway, to make sure that it doesn't start rebuffering later, right? And so no matter how small you get that segment size, you still have that requirement.
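As a back-of-the-envelope sketch of why small segments hit diminishing returns (the inputs are illustrative assumptions, not values from any spec):

```typescript
// Rough glass-to-glass latency estimate for segmented live streaming.
function estimateLiveLatency(
  segmentSeconds: number,        // the encoder must finish a segment before publishing it
  bufferedSegments: number,      // players hold a few segments to avoid rebuffering
  requestOverheadSeconds: number // manifest reloads plus per-request round trips
): number {
  return segmentSeconds + segmentSeconds * bufferedSegments + requestOverheadSeconds;
}

// Two-second segments, two segments buffered, ~0.5s of request overhead:
console.log(estimateLiveLatency(2, 2, 0.5)); // ~6.5 seconds, not 2
```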
Matt: Absolutely. And not to mention the fact that the smaller you make the chunks, eventually you start running into the fact that you're losing all of your encoder efficiency. Because you don't have big enough chunks of video to make informed decisions on how to encode that piece. So you end up with actually, overall, a much bigger video.
Phil: Glossing over the fact that people can often be on latent networks, right? If you consider a network that might have a couple-hundred-millisecond latency on it, suddenly, if you're doing a two-second segment, you're doing an HTTP request every two seconds to go off and get the chunk. And on top of that, you have to keep refreshing your manifest in HLS.
So you're doing a good few HTTP requests every couple of seconds, and if you've got a good amount of latency, that really starts to add up on the network, right?
Matt: Absolutely.
Andrew: Combine that with the player becoming a little less efficient with smaller chunk sizes too, being that it has more opportunity to change bit rates. So you could be shifting up and down a bit there as well, which I guess are not problems with your straight RTMP-style sprays.
If you get down to four to six seconds though, it's pretty good with adaptive streaming. If you think cable broadcast, if you were at about five seconds, that's pretty good. There's still delays. All networks have latency, whether it's all terrestrial or it happens to be bouncing up and down over a satellite. Certainly I don't think anyone's solved this super-low latency adaptive streaming problem, despite a few claims out there.
Steve: There's a whole other problem, and that is real-time video for two-way communication type of stuff. I don't know that anyone's trying to use HLS or an adaptive format for that. That's going to be in the realm of web RTC and technologies still, like RTMP.
Phil: Right. So, 20 minutes in. Shall we get moving?
Phil: Got distracted by a brief history of HTTP live streaming. Really, at the core of DASH, and we've touched on this a bit, we've got the ISO BMFF format: the MPEG-4 Part 12 format. Which, as I'm sure a few of you have heard, is what we refer to as "fragmented MP4" in the industry.
What does fragmented MP4 look like? Pretty much like any other MP4 file. Generally, at the top of your file you're going to have an FTYP and a MOOV atom, then you're going to have MOOF and MDAT pairs scattered through your file.
In general, when we look at DASH we look at two types of DASH: one is a fragmented MP4 file that is segmented, and one's a fragmented MP4 file that's not segmented. What we mean by that is whether it's literally different files in terms of its delivery and generally on disk as well.
It really boils down to two ideas, generally referred to as the DASH Live profile and the DASH On-Demand profile, where the On-Demand profile is the fragmented MP4 files that aren't then segmented onto disk.
And the way that works is all the segments are fetched by byte ranges. So we would have a byte range of a known set of atoms in the file, and we would byte range in to grab those.
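As a sketch of what byte-ranging into "a known set of atoms" involves, here's a minimal top-level box walker for an ISO BMFF buffer. It only reports each box's type and byte range, which is the information an On-Demand-profile player works from; the size handling follows the standard box layout:

```typescript
// Walk the top-level boxes (FTYP, MOOV, SIDX, MOOF, MDAT, ...) of an
// ISO BMFF buffer and report each box's byte range.
function listTopLevelBoxes(buf: ArrayBuffer): { type: string; start: number; end: number }[] {
  const view = new DataView(buf);
  const boxes: { type: string; start: number; end: number }[] = [];
  let offset = 0;
  while (offset + 8 <= buf.byteLength) {
    let size = view.getUint32(offset); // 32-bit big-endian size
    const type = String.fromCharCode(
      view.getUint8(offset + 4), view.getUint8(offset + 5),
      view.getUint8(offset + 6), view.getUint8(offset + 7)
    );
    if (size === 1) {
      // A size of 1 means a 64-bit "largesize" follows the type field.
      size = Number(view.getBigUint64(offset + 8));
    } else if (size === 0) {
      size = buf.byteLength - offset; // box extends to the end of the file
    }
    if (size < 8) break; // malformed box; stop rather than loop forever
    boxes.push({ type, start: offset, end: offset + size });
    offset += size;
  }
  return boxes;
}
```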
Andrew: It's interesting to note too that that fragmented MP4 file, while it's often considered as part of DASH, is actually separate from DASH; it's its own standard. People are out there using fragmented MP4s, obviously with different streaming technologies. I mentioned Smooth earlier. Netflix have also used a slightly different kind of manifest setup and stuff along with some of those key ISO BMFF underlying files.
Another interesting one that goes hand in hand with that base media file format, and the fragmentation, is the common encryption format. I know we'll talk a bit more about encryption a bit later on, but I think that and this fragmented MP4 format, they are some of your key components that allow all the different things that you can do with DASH.
Once we have that file format, we can then create a map. We can create multiple files which represent our different renditions for our adaptive streaming. Then we can put our manifests over the top and a whole lot of different encryption schemes on top of that as well.
Phil: I think this is interesting because a lot of us talk about Netflix doing DASH, and on the desktop it's kind of not really true from that perspective, right? We know Netflix is doing fragmented MP4 delivery and they're using the typical browser APIs you use to display DASH content.
But really, is it DASH if it doesn't use a DASH manifest format? It gets a bit debatable at that point.
Andrew: Same with YouTube as well, while we're talking about that. It's those same building blocks there but not everyone has to use the DASH manifest on top.
The DASH manifest has its pros and cons and it's probably one of the less elegant components transitioned across from Smooth Streaming. While it is a bit nicer than Smooth Streaming, big XML documents are probably not everyone's favorite in this day and age, either.
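For reference, a stripped-down sketch of the XML in question; the profile, codec strings, and segment naming here are invented for illustration:

```xml
<?xml version="1.0"?>
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static"
     mediaPresentationDuration="PT60S" minBufferTime="PT2S"
     profiles="urn:mpeg:dash:profile:isoff-live:2011">
  <Period>
    <AdaptationSet mimeType="video/mp4" segmentAlignment="true">
      <SegmentTemplate media="video-$Bandwidth$-$Number$.m4s"
                       initialization="video-$Bandwidth$-init.mp4"
                       duration="4" startNumber="1"/>
      <Representation id="360p" bandwidth="800000" width="640" height="360" codecs="avc1.64001e"/>
      <Representation id="720p" bandwidth="2500000" width="1280" height="720" codecs="avc1.64001f"/>
    </AdaptationSet>
  </Period>
</MPD>
```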
Phil: I hate to hear you say that. I love my XML documents.
Matt: You shut your face.
Steve: It's also just the fact that it is a document in itself that has to be fetched. Whereas if you know that a certain webpage is dedicated to a specific video, you could potentially just deliver all that information with the initial HTML page as opposed to waiting to request another manifest and put another step in the process of getting to starting to play that video.
That's my understanding of what YouTube is doing. I think it's more on the lines of that, delivering some of the data more early on so you don't have to have that in-between step.
Phil: One of the interesting situations here is, like Andrew said, the kind of dream of DASH was, "Here's a manifest. I can chuck it in my video element." Kind of as you can if you're using HLS on Safari; I can totally set the source in my video element to be an HLS manifest.
A long time ago, part of a dream of DASH was that we'd have browsers where you could just do the same thing. But that's turned out not to be true, right? We've ended up in a completely different world where we've ended up building EME and MSE in the browser, which is what we're now using to be compatible with DASH.
This dream of "Here's an XML file, we chuck in the sources list" just doesn't exist. And I think that's also really interesting from a perspective of, part of the argument around "Hey, let's use an XML manifest" was this should be something that's easy to parse and easy to deal with in the DOM. But that's not true, right?
We're not dealing with it as a DOM element. We're loading it in through XHR requests, and this is where the argument comes from,
is DASH really well thought through for the browser market?
Matt: We can talk about why SASH is a better alternative there. Or why SASH came around at all. But I think this is a good segue into browser support in general for DASH and what the differences are here.
As both Steve and Phil have alluded to, I think one of the big features of HLS, especially on iOS devices, or only on iOS devices and Safari desktop, was that you could just chuck this thing into a video element just like a progressive MP4, and it would just play back. That included live, HLS manifest, all the other stuff.
The downside of that is you have zero control over playback as a developer, so you're stuck in the same black box of just the video element.
Phil: I think you should have control. There are things like adaptive bit rate switching, right? You have no control over that if you're just chucking an HLS manifest in a video element.
Matt: Or you can do what Video.js does, where you theme the video element and things like that. But for the most part, you're assuming that the playback is a black box, which has ups and downs.
The positive there is that it's incredibly easy to put an HLS video on a webpage and just have it play back. With DASH, however, with great power comes great sadness in building a lot of players. But the good thing is we have MSE. The bad thing is we have to use MSE, right? It's this double-edged sword.
Steve: MSE being Media Source Extensions, an API that allows us to just push bytes into the video element.
Matt: Right. So instead of just chucking a file blindly into a video element, you actually have to push bytes into this API.
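A minimal sketch of that push-bytes flow; the codec string and segment URLs are assumptions, and a real player also handles buffer management, rendition switching, and errors:

```typescript
// Attach a MediaSource to a <video> element and append an init segment
// followed by fMP4 media segments.
const video = document.querySelector('video') as HTMLVideoElement;
const mediaSource = new MediaSource();
video.src = URL.createObjectURL(mediaSource);

mediaSource.addEventListener('sourceopen', async () => {
  // Hypothetical fMP4 rendition; the codec string must match the media.
  const sb = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.64001f, mp4a.40.2"');
  const urls = ['init.mp4', 'seg-1.m4s', 'seg-2.m4s']; // made-up names

  for (const url of urls) {
    const bytes = await (await fetch(url)).arrayBuffer();
    sb.appendBuffer(bytes);
    // appendBuffer is asynchronous; wait for this append before the next one.
    await new Promise<void>(resolve =>
      sb.addEventListener('updateend', () => resolve(), { once: true })
    );
  }
  mediaSource.endOfStream();
});
```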
Andrew: It sort of opens up the player capability and the browser. That's probably one of the most exciting things, or least exciting, that's happened, depending on the way you look at it.
Under the umbrella of DASH is MSE, you know? Previously, think of all the different players we had to deal with. Whether it was a plugin or it was Flash or it was Silverlight. At least now we've got this ability to say, "Hey, I'm going to throw something in this HTML5 media element." With a bit of tweaking, it should hopefully play.
That does allow us to do things like DASH in the browser. I think that's been one of the main advancements under the DASH umbrella, really.
Steve: I really like the approach that the browsers, the W3C, has taken here. It's like, "Hey, on the easiest side of video, where you're just trying to play a simple file, we're going to try and make that as easy as possible. You give us the file. We'll play it back.
"But otherwise, if you're trying to do something advanced like adaptive streaming or linear editing, things like that, we're just going to open up the door and just give you the pipe to push bytes into and let you take it from there.
"We're not going to try and be in the middle and try and be smart. Somewhere in the middle where we're trying to automate the adaptive streaming side of things, where ultimately there will be further disagreement and people will get it wrong." And we'd have even more problems if they just didn't open the door for us. So I like that approach that they've taken there.
Phil: I think there's two sides to that one though. Because one side is "I would love my black box where I don't have to care about any of this." There's a chance the same thing might work everywhere. And the flip side is when we get to the MSE and EME world, the standardization across browsers there has taken so long and is still in a very bad state, realistically.
Media source extensions? Yeah, it's pretty well supported across modern browsers now. EME? Very variable across browsers still. And there's a lot of work going on still to try and standardize the CDM API, and all that sort of thing, to try and get some sort of interop into the availability of these APIs.
Matt: Just to define there, EME is encrypted media extensions, which we'll talk about briefly in a second. CDM is a content decryption module. So these are both things around delivering DRM content within a browser, and we'll go over why people have their pitchforks out about that.
But just to wrap up the MSE, things that make it cool, another thing to point out here is that this isn't just cool for delivering DASH. People have also started delivering HLS cross-browser via MSE.
Granted it's an extra step, because you also have to take in the TS segments and then transmux them to MP4 so you can chuck them into the MSE API. But people are still doing it. We're doing it with contrib-hls in Video.js, and I think that's, especially for people that are kind of stuck in the middle and don't want to deliver both HLS and DASH, that's an option for people right now, which is cool.
MSE is not just cool for the DASH world, it's kind of cool for anybody that's able to get the right bytes into the MSE API, which is pretty awesome.
Phil: This is getting to a lot of the internet now. MSE is available on a good chunk of the internet; I think 60% was the last number I read. Which is pretty huge if you think about it.
The places where we're struggling now, realistically, are old IE, right? Real big problem with corporate networks: a lot of people with old machines, not wanting to upgrade or can't upgrade. But realistically, if you've got a modern IE or if you've got a modern Chrome, or if you've got a modern Firefox or Safari, you've got a good, solid implementation of media source extensions, which is great.
Andrew: Let's just clarify, for that spot you mean 60% of browsers? Because the next challenge you've got is there's browsers, and now there's a gazillion other video playing spots that don't support MSE. And also that'd be desktop browsers too, because MSE support on mobile is sort of another thing altogether, isn't it? Where it's completely different.
That's been the other problem with DASH at the moment, is the way desktops have been getting better, it's still been fairly hit-and-miss on mobile. Obviously zero support, effectively, on iOS. Though some people have done some interesting stuff to port some iOS-based players, and we'll see how that pans out with some new developments coming up.
The other side, Android, has been a bit all over the place with its DASH. There you're quite often relying on third-party players. There's a lot better support built in from the ground up from 4.3 onward, so the future looks fairly good there.
However, and I think all you guys will agree,
every one of these players has its own little idiosyncrasies and does everything a little bit different to everywhere else, which can be quite challenging.
Matt: The irony there is that that's roughly the same story as HLS on Android.
Phil: Yeah, I mean if anything, HLS is worse on Android.
Matt: Let's move on to the early adopter. You mentioned device support; that's a great segue into what I wanted to talk about next, which was how things have picked up recently around device support and who's actually delivering this.
The early adopters, YouTube and Netflix notably, were the ones that initially struck out down the DASH path before everyone else had really even started. HbbTV, which is a European OTT...
Phil: What does it stand for, Matt? Come on, what does it stand for?
Matt: (inaudible)... TV.
Andrew: Googling, googling.
Phil: "Television" is the TV, I'll give you that much of it.
Steve: I knew that part. I actually have no idea what Hbb part stands for.
Andrew: Hybrid. I can never remember which way they go round, but hybrid broadcast broadband television. Just because we needed another acronym.
Matt: This is definitely something the video industry needs.
Andrew: Basically an attempt to standardize internet-connected TVs for the purpose of broadcasters is really what it is. There's a whole little sub sort of culture, for want of a better word, under there called the Open IPTV Forum, that produced a whole bunch of standards on how stuff should work on these TVs.
Now we're talking about browsers and mobile being fragmented, and smart TVs are just another whole world completely. They need a lot more standardization. HbbTV was an attempt, but no one really seemed to adopt it that well.
But one of the good things they did do is they were heavy adopters of the whole DASH specification, and that's certainly how I got into DASH, is through the HbbTV stuff.
Apart from all these bits, like fragmented MP4 and MSE and EME and stuff, existing around the place, HbbTV is really sort of a full-stack implementation, with the exception of the browser, where they do this strange OIPF-based specification, an HTML/JavaScript hybrid, completely unrelated to anything that would run on a browser.
Steve: Awesome. As far as I could tell, HbbTV was actually a pretty great forcing function. I remember that was kind of the thing at Brightcove for a little while. Which forced us to actually implement a lot of stuff, especially on Zencoder.
It was HbbTV that really got the ball rolling for us to start doing a lot more stuff with actual monetary backing, because now there were people that were actually going to pay for it that needed it. Not like the NAB-cool-factor thing, you know what I mean?
There were actually checks behind this whole thing; people really wanted to give us money to implement it. So HbbTV, I assume, not just for Brightcove but for other companies as well, was a forcing factor to actually get people to pay attention.
Matt: Was it a requirement in the EU, along some lines? What was the deal there?
Andrew: The way it worked is that broadcasters tend to have these associations where they get together, particularly the free-to-air kind of broadcasters, more so than the pay TV/cable ones. They got together and said, "Look, we are all going to put out HbbTV services that all have combined target dates and testing criteria, and all those kinds of things." And they went out with, I guess, all of it at the same time, which obviously gets people like us coding away.
Phil: One of the markets where that's had a lot of penetration as well is not just Europe, it's Australia, right? Where you're from.
Andrew: That's certainly how I got into it, yes. And that was very much what happened here, too, was all the broadcasters got together and went, "Look, we're going to push this out. We're going to pick these target devices."
They all got very excited about it and ran into all sorts of complexities of free-to-air television that have got nothing to do with technology. But it is available now, so if you do happen to have one of those TVs, you can play it.
Phil: I think you're the only person with one in Australia, right Andrew?
Andrew: I have more than one.
Steve: Let's move on to EME a little bit. We talked about this briefly earlier, but encrypted media extensions. This falls right in line with broadcasters actually wanting to support a streaming format online that doesn't require something like Silverlight or Flash to deliver.
Obviously the first question that people are going to have in that world is, "How do I apply DRM to this thing?" And encrypted media extensions was an attempt by the W3C to bring DRM to the browser without needing plugins like Silverlight and Flash. So this of course has sparked some controversy, but Andrew's actually worked with this a lot. Is that a decent synopsis?
Andrew: Looks good. And I'll just jump back again too. For me, personally, having done a lot of work with DRM and dealt with things like Widevine and PlayReady, Adobe's multiple namings for its DRM, Apple FairPlay, 15 other completely proprietary formats,
the thing that underlies all this, which is enabled by EME in the browser, is that common encryption format.
Once again, everyone getting together and saying "Look, we're just going to use the same kind of block chaining on AES encryption, and we're going to do it roughly the same." That was the big thing for the DRM world that just made things so much easier.
Now there's always the complication of delivering keys and everything, which, as I think Phil mentioned before, is still where things are a little bit all over the place with EME and the browser. But that building block has really, at least, got us starting on these things.
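To give a feel for the moving parts, a skeleton of the EME handshake; the Widevine key system string is real, but the license server URL is hypothetical and real apps need far more error handling:

```typescript
// Pick a key system, wire up MediaKeys, and exchange a license when the
// media signals that it is encrypted.
async function setUpDrm(video: HTMLVideoElement): Promise<void> {
  const access = await navigator.requestMediaKeySystemAccess('com.widevine.alpha', [{
    initDataTypes: ['cenc'],
    videoCapabilities: [{ contentType: 'video/mp4; codecs="avc1.64001f"' }],
  }]);
  const mediaKeys = await access.createMediaKeys();
  await video.setMediaKeys(mediaKeys);

  video.addEventListener('encrypted', async (event) => {
    const session = mediaKeys.createSession();
    session.addEventListener('message', async (msg: MediaKeyMessageEvent) => {
      // Forward the CDM's license request to a (hypothetical) license server.
      const response = await fetch('https://license.example.com', {
        method: 'POST',
        body: msg.message,
      });
      await session.update(await response.arrayBuffer());
    });
    await session.generateRequest(event.initDataType, event.initData!);
  });
}
```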
As DRM does, it always brings up a lot of contention and conflicting views on why we need it. But, hey, the people that pay want it and they won't release the content without it, so we do need to implement it.
And particularly in the browser, it had been heavily plugin-reliant before. Quite over the top in terms of how closed in it was, and pretty bad on the CPU usage and all that kind of thing. Sitting there with your Flash player open, decoding and decrypting everything at the same time. Hoping you can finish a movie before your battery runs out is always a bit of a challenge.
Steve: Sterilizing every man trying to watch a video on a train with his laptop in his lap.
Phil: One of my friends once had a program on his laptop where it would look at his remaining battery and change the playback speed of the movie he was watching to make sure he could finish the movie before his battery ran out.
Steve: I'm going to assume the end of that story was amazing, but I didn't hear any of it.
Phil: Oh no, sorry. You'll hear it on the podcast later.
Steve: Splice in some laughs.
Phil: Oh wow, guys, that's... Thanks.
Matt: That's kind of the same thing we did at Demux with your puns last year. I'm kidding. They were great.
Phil: That keeps coming up. It really hurts.
Matt: So let's talk about why this is so conflicting. I think the first thing to point out is that EME, for all intents and purposes, is a better experience than the way it was five years ago. Flash could only take up one core of your CPU, so you'd end up just chucking that thing at 100% while it was trying to decode, and you melted your lap.
If you were lucky, you could get through a video without killing your battery life. Not to mention the fact that Flash itself is just kind of a constant source of fun, and zero-days.
So to me it's an obvious win over that world, and especially as Andrew said, the people that actually have the content are just simply requiring it.
I think no matter what your feelings are about DRM, if you want to watch content online, you're kind of stuck with it.
I remember I got in an argument with somebody one time about why this was required. They were like, "Well, I only watch videos on this one website. They don't have DRM." And the number one piece of content was some Minecraft movie.
I was like, "All right, man. Yeah, that content doesn't require DRM, so that sounds great." But for everything else, if you want to watch Captain America online within two years, it's absolutely going to have DRM on it.
Phil: Even longer than that, right? Talk to the guys out at MUBI; they're licensing studio content from five to seven years ago. Hey, it's still going to have to have DRM if you want to deliver it online. It's certainly not even a couple of years. It's a big deal still for older stuff.
Matt: Exactly. And so I see DRM as a zero-sum game. But I also see it as one of those necessary evils where, especially if you're somebody like a person that's building the technology around this stuff, like us, collectively talking, if you're just building players and things that deliver this content, but not the content itself, we really have very little power over what these content providers will give us.
If we say, "No, we're not going to apply DRM," they're just going to say, "Okay, this other vendor will. So, see ya." I think people like to pretend that as the people building this technology we have the power to not do it. But we don't.
Phil: I completely agree. It's the studios and the contracts that are the problem. We just implement it. I completely agree.
Steve: Chrome is actively dropping support for these other plugins that do DRM, right? Which is somewhat telling. I mean, I guess now that they have EME, they're allowed to do that.
But if we were to, say, drop support for both plugins and not have EME, what that does is just push the movie studios over to native apps, right? And then we're off the web completely, and I don't think anyone here wants to see that direction happen.
Phil: I think what's actually really interesting here is how people view DRM and EME in terms of not being a plugin. Because the ultimate deal is, it is a plugin. You go to your Chrome settings, it shows up as a plugin, actually.
Steve: A mandatory plugin.
Phil: Well, actually it's not. When you clean-install Chrome, you don't get a Widevine plugin. You actually have to go to a website that tries to use Widevine, and then Chrome downloads the plugin in the background for you and installs it for you.
It's actually really interesting that people kind of see it as a "no more plugins" world, whereas really it truly is a plugin, and a purer version of a plugin at that.
Andrew: It's fundamentally, I think, what people don't like about DRM, isn't it? To be secure it's very hard to have a completely open standard. And to implement something that's secure and obfuscated and is not an open standard, it's got to be a proprietary piece of code.
Particularly in certain open browsers like Firefox, it's been quite challenging for people to implement, because then at the other side of it there is not really any open DRM standard there.
You're all talking about the Google standard, your Adobe standards, your Apple standards, your Microsoft standards. So there is that element that then needs to be incorporated with all these browsers.
While there have been these good open elements making things more cross-platform and standard, where it's definitely fallen down is with EME and the browsers. It's why we have Chrome with Widevine, IE with PlayReady, Adobe strangely enough with Firefox, and FairPlay with Safari. So, surprise surprise.
Steve: Did Marlin ever get any traction?
Phil: Marlin hasn't appeared in any browser yet. There was some brief talk of it appearing in Firefox, but then the whole Adobe thing happened. And of course, the most interesting thing there is the latest Firefoxes also have Widevine.
Someone pulled the Widevine CDM out of a Chrome build and put an API around it, and you can now use Widevine in your Firefox install. And the reception that got is huge. People are really excited about that, actually. The idea that we might actually get to a place where Firefox has workable EME is a really huge, big deal.
Steve: That was my understanding of Firefox's mentality behind it. The Firefox crew was not super-excited about DRM in the first place, but they knew if they were going to implement it, they weren't going to do it in a way that was forcing proprietary technology.
They wanted it in a way that you could have options, right? And so I think early on Marlin was going to be that other option, but it sounds like it's moved to Widevine, which is cool.
Phil: And Marlin is certainly out there in terms of devices. Marlin's actually one of the approved HbbTV DRM technologies, right, Andrew?
Andrew: That's correct, yep.
Phil: Obviously Marlin's actually also the DRM standard for something called YouView out in Europe as well. That's the only place I've really seen it in use. I think the dream in HbbTV really was that a lot of people would use it, but the vast majority of usage I've seen somehow ended up still being PlayReady on HbbTVs.
Andrew: It was, definitely. That's, once again, like with some of the browsers and all the other devices: once you implement DRM, you're implementing someone else's proprietary code to do all that decryption and key exchange and everything. So it's a matter of who you pick and how you implement it, which is up to every player maker or browser maker.
Matt: Before we move on, what is the reception in the wild looking like right now for EME? I know Netflix, MUBI, YouTube; it feels like a lot of the big players have actually started moving this direction already. And Netflix has been doing it since before it was a public API, as far as I can tell, at least in Chrome. But how does reception look in the wild right now?
Andrew: It's certainly a very real thing now, I guess. It's certainly moved from being, as you said, pre-release, some particular customers being able to get access to the APIs, to fairly good support around the place amongst either people implementing it themselves, or service providers providing solutions that give you desktop DRM in the browser without a plugin, across as many browsers as you can possibly support.
It's looking pretty good, and hopefully we'll need fewer plugins as we go forward, particularly for players, which really has boiled down to less Flash and less Silverlight. Although there have been quite a few other plugins around there; the Google Widevine one was another one that people may have been forced to install for a while. So yeah, looking good and getting better.
Matt: I think overall that seems like a positive thing to me.
Phil: I agree, and there's some really interesting stuff going on. I'm looking forward to FOMS. FOMS should be great this year. Reminder: that's October 12 in San Francisco.
Matt: Fomsworkshop.org, if you're interested.
Phil: I think there'll be some brilliant conversations there. The EME spec has a really interesting point. I don't know how much you guys were paying attention, but I'd love to dig into this. I was thinking maybe we should have a DRMcast at some point, where we just sit and talk about DRM for an hour.
Steve: And get all of two listeners.
Matt: Just like the one that we do on ad specs. Those two would probably be the raging successes.
Phil: Who's interested about VAST 2.0, guys? No one? Okay, just me.
Matt: I need more interactivity in my ads, please. I need to punch monkeys as they come across. Remember that banner ad back in the day? That's what I want in my videos.
Phil: Slight side note, we actually spent a lot of time talking about interactive ads over the last seven days, and boy was that a painful conversation.
Andrew: Yeah, the one thing that everyone loves more than DRM. Ads.
Phil: I actually want my ads to be DRMed. That would be great. Don't want people seeing my ads.
I think one of the interesting things is the Free Software Foundation has been anti-EME since it came about. They had a big thing, "Say no to DRM in HTML5." And they're still running in this direction.
I think, over the last four or five weeks, they put forward a suggestion as to how we can get to a place where the FSF will support EME. Which I think is really huge, really exciting, and I really hope that's something we go over when we get to FOMS later this year. That'd be really cool.
Matt: Absolutely. I remember the blog post you're talking about. We'll post a link to it on the Demuxed Twitter account, @demuxed on Twitter, after this.
Phil: Well, there's a plug.
Matt: Follow us, please. Be my friend.
Phil: Like and subscribe, guys.
Matt: We're running a little bit into time constraints here so I wanted to move on to the big news of the last WWDC. Clearly it was not the new iPhone or the MacBook Pros with OLED touchpads above the keyboard.
Phil: I'm so bitter about that. I'm so bitter. I want my new MacBook.
Matt: Just crushed. I tried to use a Pixel for like six months waiting for a new MacBook to come out. Thank god I just bit the bullet and bought a new one.
Phil: That's like two years ago now.
Matt: Yeah, imagine using an SSH client that's actually like a Chrome app. It's not fun. I know some people use Pixels very well for web development. Not for me.
But I think some of the biggest news out of WWDC16, and I'm obviously biased because video news was cool to me, but some of the things that they quietly announced around video support in iOS and desktop Safari was pretty astounding.
Some of this, specifically the big one, which was fragmented MP4 officially being supported in HLS. Cheering. We're going to insert some cheering into this, right, Ted? Yeah, fragmented MP4 support officially being in HLS. This is one of those things that I may have heard from people that heard drunkenly from people at parties that worked at Apple, that this was going to be a thing that was supported, or that you could quietly do in Safari, as of like six months ago.
But actually having it be announced and in the wild is pretty cool. So, what are the implications of this, for the video world? Clearly we're not going to be able to standardize around manifests yet, but what does this mean, just being able to say, "Okay, we can do away with TS delivery and just go to fragmented MP4, pretending that iPhone 4s didn't exist anymore in the wild." What does this mean for everyone?
Phil: I think it's huge. It is a huge step in a more unified direction. To a certain extent, I guess that we'll come on to the flip side of it in a second, but to a certain extent it is a massive step forward. We've got Apple finally sitting there and saying, "Yeah, TS probably isn't the best thing to deliver over HTTP."
As we talked about earlier, you don't need a lot of the features that are built into transport streams these days. I'm really appreciative that Apple have really sat down and said, "Okay, here we are." But I think, like you said,
the flip side is we're not going to see a standardization on manifest delivery at this stage.
If Apple were going to go in that direction, they would've announced it by now and we would be in a place where we were delivering a single manifest format everywhere, as well as a single media format to most places.
I think though, as you also touched on, realistically this is great for anyone who upgrades to what is it, iOS 10? This is going to be great for everyone on iOS 10. Who else is going to get it? No one.
I think this is the difficult situation we're now in. It's not a case of "You can now deliver one thing everywhere," it's a case of, "Hey you still need fragmented MP4 because you want to do DASH and you want to do HLS with fragmented MP4 to whomever can take it." But unfortunately, you're still going to need transport stream-based HLS for a much longer time.
Matt: Hey man, 2026 is not only going to be the year of the Linux desktop, it's also going to be the year of fragmented MP4 everywhere. It's going to be great.
Phil: But of course that's the other side I was going to come on to. Yeah, it sounds great in theory, but let's face it. The flip side here is, and it's a bit debatable, but I'm not entirely convinced we're going to get fragmented MP4 everywhere, because of the Alliance for Open Media.
The flip side of this is we have VP9 being packaged usually as WebM when you want to do DASH with it. There is a proposed format for containing VP9 chunks within an FMP4 container; it's one that Netflix have been working on. It's how Netflix do a lot of stuff already.
Matt: That sounds like a turducken of terribleness.
Phil: There's nothing wrong with the box model, right? There's nothing wrong with the atom model. I'm not hugely opposed to the atom model, by any stretch of the imagination. It works very well. It's well understood, which is a reason we're standardizing around that stuff.
But at the same time, I don't see that the Alliance for Open Media, when they publish AV1, as it's going to be called, is really going to settle on packaging that using MP4.
Andrew: Like you said, it sums up the whole standards and adoption landscape, really. Isn't it silly? As we take one step forward, we take two sideways. And keep going.
So it does hold a whole level of hope, where now there's a whole lot of clock cycles that potentially could be saved in a few years, once everyone's on iOS 10, in terms of not having to run that TSing code as well as the MP4ing code.
And really, if you're just having to generate another manifest, that by all appearances looks fairly straightforward. There's obviously always a devil in the detail in how it all works, but yeah, you almost translate your SIDX, which is your index in your fragmented MP4, straight into this HLS manifest format, which is an easy way to deliver these same files.
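As a sketch of that translation, an fMP4 HLS media playlist points EXT-X-MAP at the init segment and turns the SIDX-derived ranges into EXT-X-BYTERANGE entries; the filenames and byte offsets below are invented:

```
#EXTM3U
#EXT-X-VERSION:7
#EXT-X-TARGETDURATION:4
#EXT-X-MEDIA-SEQUENCE:0
#EXT-X-MAP:URI="video.mp4",BYTERANGE="800@0"
#EXTINF:4.0,
#EXT-X-BYTERANGE:400000@800
video.mp4
#EXTINF:4.0,
#EXT-X-BYTERANGE:405000@400800
video.mp4
#EXT-X-ENDLIST
```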
And also then having them say they've adopted the common encryption format as well, which as we mentioned before is my favorite little one, it's pretty encouraging, really.
Matt: How does that work without EME support? Explain like I'm five: how does that work without EME support on mobile devices like iOS? Which I assume isn't happening; I didn't hear anything about that coming down the pipeline.
Andrew: You're going to need a native app to access the APIs and everything to play that FairPlay content, so that's generally what's going on, and how you need to set it up and authenticate your app and do all the key exchange and everything. However, it's pretty unusual to have a premium movie video outlet on mobile without an app anywhere. So that's probably not so bad.
Matt: Yeah, that's fair. Finishing off with kind of the high-level things we want to talk about, Phil, do you want to tell us about CMAF?
Steve: What the heck is CMAF?
Phil: What the heck is CMAF?
Matt: It sounds like another standard. And if there's one thing I love, it's more standards.
Phil: The best thing about standards is...
Matt: You can always have one more?
Phil: No.
Matt: No standards?
Phil: There's just so many to choose from. The best thing about standards is there's just so many to choose from. Come on, guys. Call yourselves software engineers.
CMAF is one of these things that's been bouncing around in MPEG working groups for maybe the better part of a few months now. It stands for common media application format, for presentation of segmented media.
The idea is this is an extension on top of things, in particular DASH and HLS, going to a place where we decide what formats are good for containing those things. Really it's kind of a standardization around ISO BMFF for making fragmented media files.
It sets a load of limits: kind of one track per file, how the timescales work. It also starts to set up agreements around what protocols can be in there, what media types can be in there, what caption formats you use, and kind of how you do DRM systems and that sort of thing. It's quite exciting. A standard that's a collection of standards would be a good way of describing it. It's quite interesting, though.
It should hopefully get us to a place, again this is a dream, where everybody agrees on the same sort of things.
I think what's really interesting is Apple are involved and, as we kind of said, Apple were involved in DASH in the early days. This suddenly encompasses more than just DASH. Which is good. It suddenly still continues to have HLS in there, and that sort of thing. But I think it's certainly a stepping stone.
Steve: Wait, so help me understand where this fits in the ecosystem. Is it a competitor to DASH? Is it something that might be inclusive of it? Where does it fall in the scheme of things?
Phil: The way I would read it is DASH is part of CMAF, so we get to a place where someone can say they're CMAF compliant. Which would mean they can do DASH and HLS and MP4 and we can use the common media format, the ISO BMFF with it.
It's kind of mostly a collection of standards, which everyone agrees to a particular set of them and then hopefully everyone can say, "Are you CMAF compliant? Yes or no?"
Steve: So it is super opinionated, I would hope, right? It is going to say "DASH or HLS," not try and be an umbrella where it's like, "You can use either, as long as you do it this way."
Phil: I really think that, overwhelmingly, it's going to be DASH. I'm not 100% convinced where Apple is sitting in this. Apple's involvement is, certainly they're going to be standardizing, as far as I can tell, around the ISO BMFF stuff within CMAF.
There's some minor changes there, mostly restrictions. Nothing other than that, not too exciting, but really just focusing on that area around manifests and that sort of thing? Good luck. I'm not 100% sure how that's going to play out.
Andrew: Yeah, it's got some fairly broad and enthusiastic goals, which is great. And obviously these things will take a while to work through, but I certainly think it addresses some of the deficiencies of where DASH sort of leaves off, and where some of the initiatives around the edges of DASH have been going at the moment.
Like, how do we improve that network delivery in the variety of different scenarios you can deliver over a network? Whether it's streaming, download to own, live broadcast, there's all that kind of stuff I guess hopefully it'll tackle.
And there's a mention in there too of trying to tackle the other area that's been somewhat challenging with DASH, which is server- and client-side ad insertion, bringing it into a bit of a closer alignment with some of the other ad insertion standards, like SCTE-35, VAST, VMAP, etc. For anyone who's tried to do ad insertion in DASH, it seems challenging at the moment.
Phil: Nonexistent would be the phrase I would use.
Andrew: Some standards brains on it would certainly be good.
Phil: What amazes me here is we've just talked about DASH for one hour six minutes, and what we haven't talked about is the big elephant in the room, the big DASH problem. Anyone else notice that, or was it just me?
Steve: We touched on some of these things a little bit. Design by committee is always a problem.
Andrew: I think we could start with, hands up, who's read all of the DASH specs?
Steve: Nobody's hands went up, in case anyone's curious.
Phil: You realize we can't see each other, right? We've got a lot of collective experience in the room with making DASH manifests, making DASH media, playing back DASH media, loading manifests, parsing things, building players.
The one thing that we will all appreciate is just how distributed DASH features are.
DASH is a huge spec that encompasses tons and tons and tons of features and tons of ways of doing things, and no two players really implement those sets of features.
Certainly, if you're on an HbbTV 1.5 device versus DASH.js in the wild, versus Bitmovin's player, versus any number of the other players that are on the market, you're going to have a different set of DASH features supported. And that's a really big problem right now.
There is no one DASH manifest that's going to play everywhere. I think the bright side out of that is 99% of the time it is a manifest problem. If you have to generate seven or eight different manifests, or you have to dynamically generate your manifests, that's actually relatively easy.
Overwhelmingly, the media we deliver's going to be the same. I very rarely come across a device where I have to manipulate my media to make it play. It's usually "I don't like your particular way you do manifests." But that's a big problem, and there's been a bunch of work and a bunch of people try to improve that, right?
HbbTV 1.5 and 2.0 adopt limited subsets of the DASH spec; DASH-264, or DVB DASH as it's called in Europe, does the same sort of thing. So I think, from that perspective, there are places that are trying to get better with it. But I don't think it's by any means a solved situation right now.
Matt: The fact of the matter is, there's probably 40 companies in the DASH-IF at this point. And they all argued to get their own little bits and bobs here and there in the spec. So you know, I think we've made our bed. Now we need to, well, we are lying in it, so...
Steve: Now we need to burn it.
Matt: Now we need to burn it.
Phil: What we need is a new spec.
Steve: SASH.
Matt: I think that that's probably all we should cover today in terms of the DASHcast. But looking forward, we've touched on things like the Alliance for Open Media, so it would be really interesting to get somebody in and talk a little bit about that. Maybe even go into more about DRM and EME and what exactly that means for both us as engineers working with it and the end user.
Phil: I would really love to get the guys from the FSF in on that. Let's sit them down with a studio guy on the other side of a table, metaphorically fight it out, why not?
Matt: I think that covers it for today, at least. So thank you Andrew for joining us all the way from Australia. I'd like to mention that we were across three time zones for this one. But thanks, Andrew, I appreciate it. This was great.
Andrew: Thanks for having me.
Matt: Well, until next time, everybody. This is the Demuxed podcast. Talk to you later.