Ep. #26, Video and Image Optimization with Cloudinary
about the episode

In this episode of JAMstack Radio, Brian and Ben are joined by Robert Moseley, Director of Solution Engineering at Cloudinary, an end-to-end media management platform for developers. The group discusses just how much of the internet is images and videos, and poorly optimized ones at that, and how tools like Cloudinary can make major improvements behind the scenes.

Robert Moseley is Director of Solution Engineering at Cloudinary, an end-to-end media management platform for web and mobile developers. Previously Robert was Senior Solutions Engineer for Ensighten and worked as an analytics and optimization consultant for Adobe, where he worked on tools including Scene7.

transcript

Brian Douglas: Welcome to another installment of JAMstack Radio. Back on the line for another episode is Ben.

Ben Mischenko: Hey.

Brian: Hey, Ben. And then as our guest from Cloudinary, we've got Robert Moseley.

Robert Moseley: Hey, Brian.

Brian: Rob, since you're a guest, do you want to explain who you are and what you do at Cloudinary?

Rob: I am Robert Moseley and I am Director of Solution Engineering at Cloudinary. Basically, on my team we take a look at customers, prospects of ours that are out there in the wild and figure out the cool things they can do with our tool.

Cloudinary, for those that don't know, is kind of like an end-to-end image and video management service. You upload your assets into Cloudinary, then it's all API based. You can manipulate them however you see fit.

Brian: Rob, you mentioned that Cloudinary is video and image management. What sort of management are we talking about when you talk about videos and images?

Rob: Think about the web today, if you think of all the bandwidth on the internet, actually, 65% of it is images.

The web is built, you know all the web languages, so you have JavaScript and the various libraries out there. You have CSS and HTML, then you have the backend languages.

But a JPEG image isn't written in any of those languages, an MP4 video isn't written in any of those languages. So you actually have the vast majority of internet bandwidth being these foreign objects that can't be dealt with in the standard coding pipeline. You can't edit an image with JavaScript.

Basically what Cloudinary does is we've built an interfacing language where you upload your assets, whether it's an image or a video or Photoshop file or anything like that.

In Cloudinary, we give you a URL for it and then right in that URL you pass parameters, say like the width and the height you want, what crop mode you want, what format you want, what compression you want, saturation adjustments, brightness, gamma, and all sorts of different really cool things like that.
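As a rough illustration of parameters living in the URL, here is a minimal Python sketch. The `res.cloudinary.com/<cloud>/image/upload/<params>/<id>` layout and the short keys (`w`, `h`, `c`, `f`, `q`) follow Cloudinary's public documentation as best understood here; the cloud name `demo`, the file `sample.jpg`, and the helper `transformation_url` are placeholders, not anything named in the episode.

```python
from urllib.parse import quote


def transformation_url(cloud_name, public_id, **params):
    """Build a delivery URL with the transformation encoded in the path."""
    # Each parameter becomes key_value; parameters are joined with commas.
    transformation = ",".join(f"{key}_{value}" for key, value in params.items())
    base = f"https://res.cloudinary.com/{quote(cloud_name)}/image/upload"
    return f"{base}/{transformation}/{quote(public_id)}"


# Placeholder account and asset names, used only for illustration.
url = transformation_url("demo", "sample.jpg", w=400, h=300, c="fill", f="auto", q="auto")
print(url)
```

Changing the width, crop mode, or format is then just a matter of changing one keyword argument and requesting the new URL.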

Brian: Image handling is something that I've had practice with a lot for a lot of these Rails apps I throw together, a lot of tutorials. I know the basic Rails tutorials will have you add S3 and stuff like that.

But I just realized, Ben, since you're on, I know the Netlify CMS was a project you worked on, is actually going through a bit of a facelift. And there's also questions about image handling, within the CMS itself. Do you know if there's a solution for that yet?

Ben: Currently we're right about to roll out a media library that's just the simplest possible thing and just stores images in Git. But we've actually been talking about building integrations for third-party services like Cloudinary because storing them in Git quickly gets out of control. So there's nothing solid in the CMS about that yet, but it's definitely something we're looking at pursuing.

Brian: I think for Netlify, the marketing site that I'm working on, we have the same sort of issue where a lot of our images are hosted within the repo. So every time you clone that repo, all those images come along for the ride. Which is a bit painful once we get to, we're at the gig mark now for like blog post images. So we've explored Cloudinary as a possible solution to make that work.

What are some other benefits that you could probably look to see about not just manipulations but for handling images on-the-fly, Rob?

Rob: Well, I mean, the big thing is it's like most of those just are static files. There's also the CDN that you have to take into account. So you could get all your image manipulation down perfectly but if you're serving them from some Origin server in Iowa, you know any of your visitors that aren't in Iowa are still going to get a really poor performance.

Distributing globally across a CDN is something to take into account as well, and that's also just built right into the platform. Kind of going back to the content management system integrations, it's actually a super common integration because if you think about who's using a content management system. I mean, content management systems were built so non-technical people could go in and edit content.

Brian: Yeah.

Rob: And that's typically who's uploading the assets. So take a media company, let's say like CNN or Buzzfeed or something like that, they'll have an author that will upload an image to attach to an article but they're not thinking about what's the right format for this image.

Or what quality settings should I be using, or is this cropped correctly, is it the right size, and how do you even handle that if the site's responsive?

So basically the whole idea here is to give developers a tool to automate that entire workflow. Upload it, make an automatic decision, what's the best format for this image? What's the best quality settings for this image while maintaining the visual quality? Then how do you crop it and do everything else that you would do there. And then also distribute it globally across a CDN, because your visitors could be anywhere.

Brian: Yep, I read your blog post about the optimization. The actual title was "The Laughable Curve." Do you want to talk about that curve a little bit more?

Rob: Yeah. If there's any economics geeks listening, there's a concept in economics called the Laffer Curve. It has to do with government revenue and tax rates. There's basically, you can keep increasing tax rates, but at some point if you increase tax rates above a certain level, the amount of money that the government is taking in actually goes down because of the additional burden that the high tax rate has placed on the economy.

Basically what the Laughable Curve is, is it's very similar.

You can experience really, really good performance benefits by lowering and lowering the quality of your images. But at some point the images look laughable. And it actually hurts performance or it hurts the user experience.

So there actually is a point where you balance the visual quality of the image and the file size and then how you discover what that is. And if you can find that point you can deliver your images at the right quality and balance the file size with the visual quality, you know then you're kind of completely optimized, at least as far as your assets go.

One thing that we like to do is run experiments because with Cloudinary, all you have to do is put a parameter in the URL to generate the image at various quality settings. If you want it at 10%, which would look terrible, you can do that, just put q_10 in the URL and it'll generate that image.

So you can quickly iterate and actually like run these A/B tests and find out for each segment, what do they prefer?

Are they willing to have kind of like a crappy looking image, if it means that the site actually loads for them? Probably.

For a desktop user, maybe they want a super, high-quality image that's triple resolution. You know, who knows, right? So we give you the ability to test these things and figure out what that is.
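A quality sweep like the one Rob describes might be sketched as follows. The `q_<number>` path segment follows Cloudinary's documented quality parameter; the base URL, the asset name, and the helper `quality_variants` are illustrative assumptions.

```python
def quality_variants(base_url, public_id, qualities):
    """Return {quality: url} for one image at several quality settings."""
    return {q: f"{base_url}/q_{q}/{public_id}" for q in qualities}


# Placeholder base URL and asset; only the q_ value changes between variants.
BASE = "https://res.cloudinary.com/demo/image/upload"
variants = quality_variants(BASE, "sample.jpg", [10, 50, 80])
for quality, url in variants.items():
    print(quality, url)
```

Each URL can then be shown to a different segment, with the test tool's own reporting deciding which setting that segment tolerates.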

Brian: Yeah, so you're saying you can do A/B tests on the Cloudinary side? So you can see whether or not there's a certain point between 100K, kilobits, difference, or anything like that?

Rob: I mean, we can, but typically there's a ton of A/B testing tools out there already. So it would just integrate right into those. Because they all work either on the backend, something like SiteSpect can manipulate code on the server side, as actually a proxy. Or even on the frontend you can do that with like Adobe Target or Optimizely or any of those guys.

All we have to do to change the image that's being delivered is just manipulate the URL string and it tells Cloudinary to generate a new image.

So all those tools could do that so typically they have reporting and all that so we just set it up in there and let it run.
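To make "just manipulate the URL string" concrete, here is a hedged sketch of what a testing tool might do: bucket each visitor deterministically, then splice that bucket's quality setting into the image URL. The arm names, the `/image/upload/` marker, and both helper functions are assumptions for illustration; real A/B tools have their own assignment logic.

```python
import hashlib

VARIANTS = ["q_40", "q_60", "q_80"]  # hypothetical test arms


def assign_variant(visitor_id, variants=VARIANTS):
    """Deterministically map a visitor to one arm so they always see the same one."""
    digest = hashlib.sha256(visitor_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


def with_variant(url, variant, marker="/image/upload/"):
    """Insert the variant's transformation right after the upload marker."""
    head, sep, tail = url.partition(marker)
    if not sep:
        return url  # not a URL this sketch knows how to rewrite; leave it alone
    return f"{head}{sep}{variant}/{tail}"


original = "https://res.cloudinary.com/demo/image/upload/sample.jpg"
print(with_variant(original, assign_variant("visitor-123")))
```

Hashing the visitor ID rather than picking at random keeps the experience stable across page loads, which most reporting tools assume.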

Brian: Our approach to Cloudinary, on a test branch that I have it working on, is we're using Gulp, we have two Gulp tasks, basically. One to upload the images and the other one to actually hot-swap the URLs for production. So when we run our production build we're using Cloudinary basically for production use only.

Is that pretty much the basic use case? The most basic use case for Cloudinary, getting it up and running?

Rob: That's definitely a pretty basic use case. It sounds like you're using the Gulp task to upload using our upload API, we'll actually handle and store the assets for you, and in the upload response you get the URL, and you can manipulate the URL to format the image and crop it and whatever else you want to do.
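The hot-swap step in a production build could look roughly like this. It is a sketch only: it assumes local images live under `/img/`, that the uploaded assets keep the same relative path as their public ID, and that `f_auto,q_auto` is the desired default transformation. None of those details are stated in the episode.

```python
import re

# Assumes local images are referenced as src="/img/...".
IMG_SRC = re.compile(r'src="/img/([^"]+)"')


def swap_image_urls(html, base_url, transformation="f_auto,q_auto"):
    """Point every local /img/ reference at the hosted copy instead."""
    return IMG_SRC.sub(
        lambda m: f'src="{base_url}/{transformation}/{m.group(1)}"', html
    )


page = '<img src="/img/blog/hero.jpg" alt="Hero">'
swapped = swap_image_urls(page, "https://res.cloudinary.com/demo/image/upload")
print(swapped)
```

Running this only in the production build keeps local development working off the files in the repo.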

Another common way is just to remotely fetch your assets. Everybody has something that works, but nobody has something that works really well. And so ripping everything out and replacing is sometimes a really tall task. There's also the ability to just remotely fetch your images from where they already exist and then proxy them through Cloudinary and manipulate them and then cache them at the end. So it's another way to implement like that.
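A remote-fetch URL might be assembled as below. The `image/fetch` delivery type follows Cloudinary's public documentation as understood here; the cloud name, the remote URL, and the helper `fetch_url` are placeholders.

```python
from urllib.parse import quote


def fetch_url(cloud_name, remote_url, transformation="f_auto,q_auto"):
    """Build a URL asking the service to fetch, transform, and cache a remote image."""
    # Percent-encode characters such as ? and = so the remote query string
    # is not mistaken for part of the outer URL.
    encoded = quote(remote_url, safe=":/")
    return f"https://res.cloudinary.com/{cloud_name}/image/fetch/{transformation}/{encoded}"


fetched = fetch_url("demo", "https://example.com/photos/cat.jpg?size=large")
print(fetched)
```

Because the original image stays where it already lives, this approach avoids ripping out the existing storage.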

Then there's even more advanced implementations that marketing teams use, integrated with like marketing automation platforms or within their Google Display Network to dynamically render images with different content depending on who's actually viewing it and what you know about them. Which is kind of more of a marketing style use case, but still solves that bottleneck of actually creating assets the same way that we do for developers.

Brian: Did you get most of your image handling experience from Cloudinary or is this something that you had like more of an interest before you walked into this job?

Rob: Yeah, that's a good question. I definitely got a lot better at it since I worked here, but previously I was at Adobe. I worked on a product called Scene7, which does some image manipulations. Adobe bought it a while ago, I think it's rebranded as part of their content management system now. It does some similar stuff to Cloudinary but not nearly as advanced or as developer friendly.

Got a lot of experience there and then when I came over here, there's a guy named Jon, who actually invented the FLIF image format, which is really, really cool. It has a lossy and a lossless component. NASA uses it for astronomy images. A lot of hospitals use it for things like X-ray images because it maintains a high level of detail while also having extreme compressibility.

Learning things like that from him has just been amazing, and there's all sorts of people within Cloudinary. We kind of hire the industry's best, the absolute experts in the field, so we learned a ton there.

Another really cool thing that Jon did, when you talk about image optimization, we were talking about the Laughable Curve before. It's really easy in Photoshop to say, "I'm going to save this image at 90% quality. It still looks really good to me and now I'm going to save it as 80% quality. Okay, it still looks pretty good. Now I'm going to save it at 70% quality. Well, now it kind of looks bad."

I know that I want to be somewhere between 70% and 80% quality, but that process took me probably like three minutes to do. And so if you're dealing with millions of images and image formats, maybe even things like user-generated content, obviously that's not a scalable way of building web images.

What Jon did to tackle this problem is he actually built a perceptual metric, it's an algorithm that, it's almost AI, it sees like a human.

It recognizes the image artifacts that a human would see.

So things like blockiness in JPEG, which we've all experienced when your aunt shares a meme on Facebook that's been shared a million times before. You'll always notice it has this really terrible quality and has like a blockiness, especially around the text and things like that. That's due to the photocopier effect. It's really, really present to humans when the quality gets low.

There's also things like color bleeding and ringing, and basically, this perceptual metric sees all these things like a human and agrees with a human that this image is really crappy-looking and that this image isn't good-looking.

Which allows us to actually scale out image compression to say that, for this particular image, I know what the image settings have to be. And I can do this with a machine as opposed to manually.

Now we can do this at scale, billions of images a day or an hour and compress them as much as possible while maintaining what looks like, to a human, to be a very good-looking image.
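The manual "90%, then 80%, then 70%" routine can be automated once a machine can score how an image looks. The sketch below is not Cloudinary's algorithm: it is a plain binary search over quality, where `score_at` stands in for any perceptual metric and is assumed to never decrease as quality rises.

```python
def lowest_acceptable_quality(score_at, threshold, lo=1, hi=100):
    """Smallest quality in [lo, hi] whose score meets the threshold, or None.

    Assumes score_at(quality) never decreases as quality rises.
    """
    if score_at(hi) < threshold:
        return None  # even the best setting does not look good enough
    while lo < hi:
        mid = (lo + hi) // 2
        if score_at(mid) >= threshold:
            hi = mid  # mid looks fine; try compressing harder
        else:
            lo = mid + 1  # mid looks bad; need more quality
    return lo


def toy_score(quality):
    """Toy stand-in for a perceptual metric: the score is the quality itself."""
    return quality


print(lowest_acceptable_quality(toy_score, 72))
```

A search like this needs about seven metric evaluations per image instead of a person eyeballing each save, which is what makes per-image settings feasible at scale.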

So he does all sorts of crazy stuff and there's some other stuff that we're working on here as well but, yeah, long answer to a short question. Learned quite a bit being at Cloudinary.

Ben: I was wondering, Rob, what kind of functionality do you guys have for like different image formats and transcoding between them? And what does your support look like for manipulations of different formats, is that pretty widespread?

Rob: Yeah, it's pretty widespread. I mean, there's only a handful of really web-friendly formats. Which we've all probably seen before. There's JPEGs, there's PNGs and there's GIFs. And those are kind of like the de facto web standard. All browsers support them and so that's pretty much what everybody uses.

But now we're seeing like more modern image formats. You guys might have heard of the WebP format Google developed. A really interesting story behind that image format. It compresses on average around 30% smaller than a JPEG but it's only supported by Chrome. So very, very few people are using WebP today.

Then Microsoft has an answer to WebP called JPEG-XR, similar thing, very cool. Compresses about 30% smaller than a traditional JPEG with the same visual quality, but only supported by Edge. So no one uses that format either. Then there's even another format called JPEG 2000 and that's only supported by Safari. And so you get like all these really interesting, neat modern formats nobody really uses.

One of the things that we can do is actually, each image, take a look at it and figure out what image formats can I use. For instance, if the user's in Chrome I know that WebP is an option, that I could deliver this image back and they'll be able to render it. If they're in Firefox I can't use WebP.

So we'll do like a user-by-user decision, and say okay, they're in Chrome, I can use WebP, now I'm going to go and figure out, is WebP actually better for this image, and then do an on-the-fly format conversion. Because formats are really just compression algorithms for visual information, so they have different strengths.

Some images might be better as a JPEG. Some images like illustrations and logos are better as a PNG, and some images are better as WebP, and so on and so forth. So we take all that work off the hands of the developer, and you wouldn't know this stuff unless you're a PhD image scientist. So we're kind of like the PhD image scientist for you.

All you have to do is tell Cloudinary to make a format decision for me, I don't care how it happens just do it. So we handle all that for you.
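The per-image, per-browser decision can be pictured as "encode in every format the browser can render, keep the smallest". This sketch assumes the encoded sizes are already known; the byte counts and the helper `pick_format` are made up for illustration and say nothing about how Cloudinary decides internally.

```python
def pick_format(encoded_sizes, supported):
    """Return the supported format with the smallest encoded size."""
    candidates = {fmt: size for fmt, size in encoded_sizes.items() if fmt in supported}
    if not candidates:
        raise ValueError("no supported format available")
    return min(candidates, key=candidates.get)


sizes = {"jpeg": 120_000, "png": 310_000, "webp": 84_000}  # made-up byte counts
print(pick_format(sizes, {"jpeg", "png", "webp"}))  # a browser that renders WebP
print(pick_format(sizes, {"jpeg", "png"}))  # a browser that does not
```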

Brian: I was going to ask about that. So I can upload all my images as WebP and Cloudinary will make the decision for me, like if someone's looking at Edge?

Rob: Yep.

Brian: Will they do that on-the-fly conversion for me as well?

Rob: Yeah, if you uploaded all your images as Photoshop files, like PSD files, we can still do that. Flatten it, rasterize in a web friendly format.

Ben: Oh, really.

Rob: Which actually leads to some really interesting use cases. We have one where it's a T-shirt builder, and so when you think of, they upload a Photoshop file with multiple different layers, there will be the model. So it's got a guy that's wearing a shirt. Then there will be like a shirt layer and then maybe like a texture layer or something like that.

We can actually take, and we do, we take just the shirt layer and manipulate the color, change it from white to purple or some like RGB value. We can do that on-the-fly. Now they really only have to upload one Photoshop file, then on-the-fly they can choose that shirt to be any color within the RGB color spectrum.

We can make that conversion and it's all flattened and rasterized and sent back as a JPEG or a WebP or whatever. But some really cool manipulations can happen with especially things like Photoshop and Illustrator files.

Ben: Nice, you were talking about your format auto-detection. Does that require the JS library, or do you sniff the user agent on URL requests?

Rob: Yeah, we go off the user agent and the accept headers at the CDN edge. So no JavaScript necessary, which if you're starting to use JavaScript, then basically you're saying, "I don't want my images to preload." And the performance benefits of using WebP if that's the concession you have to make are probably pretty debatable.
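A server-side check of the Accept header might look like the sketch below. It covers only the header half of what Rob mentions: formats a browser renders but does not advertise would need a user agent lookup, which is left out. The MIME-to-format table and the helper name are assumptions.

```python
def supported_formats(accept_header):
    """Map an HTTP Accept header to the image formats the client claims to render."""
    baseline = {"jpeg", "png", "gif"}  # the formats every browser handles
    modern = {"image/webp": "webp", "image/jxr": "jxr"}
    # Drop any ;q= weights and normalize case before comparing.
    media_types = {
        part.split(";")[0].strip().lower() for part in accept_header.split(",")
    }
    return baseline | {fmt for mime, fmt in modern.items() if mime in media_types}


print(sorted(supported_formats("image/webp,image/apng,image/*,*/*;q=0.8")))
print(sorted(supported_formats("image/png,image/*;q=0.8")))
```

Doing this at the CDN edge means the `<img>` tag in the page never changes, so the browser's preloader still finds the image.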

Ben: I was reading a couple of your blog posts that you posted and you touch on personalization a few times. In particular in one you were talking about how it can end up like A/B testing for a bunch of different sections can end up providing a lot of data that you can't really act on because the content can't keep up with that many divisions.

You mentioned some stuff about programmatically generating content. I was wondering if you had any more thoughts on that or if you could expand on that at all?

Rob: Yeah, that's one of the cooler marketing-type use cases, but it's kind of the same issue. It's content generation bottlenecks. Let's say that you want to personalize based on whether they're male or female or what was the last product that they looked at, if you're a retailer, or what was the last article that they read. Maybe you want like a category affinity value.

Basically there's all this data that's being gathered by DMPs and whatnot about all your visitors. But if you actually want to show a relevant experience, you actually have to have somebody creating the content. So now we're back to that manual process where you can't do image optimization with a human being because it just takes too long for a human to do that.

You can't really do personalization with a human being and have them creating the unique content experience for each person, because it just takes them too long.

What ends up happening is you limit yourself and the amount of personalization you can do. Basically, the example I give in that article is Williams Sonoma. They have 10 top-level categories like "knives" and "cookware" and whatever else. That's the only thing they can personalize on.

If they're in the knives section, you can only show them knives stuff, but you can't create a hero image, in the right format, the right size, the right dimensions for every single knife product that you have. Because it would take somebody in Photoshop to do that and if you have 10,000 products that's way too much time.

So what Cloudinary can do is it can resize and manipulate and add text overlays and image overlays and stuff like that and you can come up with a single URL and just pump data about the user into the URL and then we generate that programmatically. So just lifting that content creation bottleneck, removing that from human hands and putting that in the machine hands.
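"Pump data about the user into the URL" could be sketched like this. The `l_text:<font>_<size>:<text>` overlay form and the `g_`/`y_` positioning keys follow Cloudinary's public documentation as best recalled, so treat them as assumptions; the asset path, headline, and helper name are invented for the example.

```python
from urllib.parse import quote


def personalized_hero(base_url, public_id, headline, width=1200):
    """One URL template; visitor data fills in the overlay text."""
    # safe="" also encodes "/" so the headline cannot break the URL path.
    text = quote(headline, safe="")
    overlay = f"l_text:Arial_60:{text},g_south,y_40"
    return f"{base_url}/w_{width},c_scale/{overlay}/{public_id}"


BASE = "https://res.cloudinary.com/demo/image/upload"
hero = personalized_hero(BASE, "knives/hero.jpg", "Back in stock: chef knives")
print(hero)
```

Headlines containing commas may need extra escaping beyond what this sketch does, since commas also separate parameters. With a template like this, 10,000 products need 10,000 strings of data rather than 10,000 Photoshop sessions.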

It's kind of like instead of using a hand-pushed plow if you're a farmer, and now you have like a combine to do that work.

Ben: Nice, that's really cool.

Brian: Wasn't there a contest with CodePen not too long ago where Cloudinary was looking for the coolest use cases for some of this manipulation? I might be conflating two different things together?

Rob: Yeah, we do quite a bit of that. One of the fun things that we've been working on lately is machine learning and like AI stuff. One of the first things we came up with as just a fun project was style transfer.

Brian: Oh, yeah.

Rob: I'm sure you guys have probably seen some of this out in the wild before, there's a few libraries out there for this. We're the only ones that can do it on-the-fly.

If you took a picture and you want it to look like a Bob Ross painting, you can upload a picture of a Bob Ross painting, upload your picture of a mountain, and then apply the style of the Bob Ross painting onto your picture and the outcome is your picture but it looks like a Bob Ross painting.

Some cool filters and fun things that you can do with that. You can take a picture of a skyline during the day and make it look like a picture of the skyline at night. All sorts of different cool things. That was one of the competitions we did and we do all sorts of those things.

Once you open up the capabilities of image manipulation through an API and put it in developers hands, the possibilities become endless, really. There's a million things that we haven't thought of yet.

Brian: Awesome. One thing we haven't touched on yet is we haven't really dug too much into video itself. I know video files can be large and cumbersome to deal with and also generally for hosting it can be painful and costly. What's your solution around video at Cloudinary?

Rob: Video is in even worse shape than images today and you can pretty much go to any website on the internet right, and with video it's even worse.

The other day I was on Gopro.com and I noticed it loaded extremely slow. So I looked and they had one of those hero videos that runs in the background, underneath some text or whatever. That video was a 60 MB video file. It was insane.

One thing with video is that it's even harder for developers to try to handle these things. We've actually opened up our platform to video as well where we can automatically determine what's the best codec to use. So video formats like MP4 and WEBM and MOV are really just container formats. Within that there's different codecs or ways of encoding and compressing information.

As an exercise I grabbed that video and loaded it into Cloudinary and requested it with the optimal codec. Think of that as auto-quality for video. Then we went from 60 megabytes down to two and a half.

These are things that most people don't think about, because they don't know anything about video. Even less than they know about images. So it's just another way, the same idea. You upload the video, we give you a URL and then you can manipulate the video with that URL.
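The same URL idea applied to video, plus the arithmetic behind the numbers Rob quotes. The `video/upload` path and the `q_auto` and `vc_auto` keys are taken from Cloudinary's public documentation as best recalled and should be read as assumptions; the asset name is a placeholder.

```python
def video_url(cloud_name, public_id, transformation="q_auto,vc_auto"):
    """Build a video delivery URL with the transformation in the path."""
    return f"https://res.cloudinary.com/{cloud_name}/video/upload/{transformation}/{public_id}"


def percent_saved(original_mb, optimized_mb):
    """Size reduction as a percentage of the original, to one decimal place."""
    return round(100 * (original_mb - optimized_mb) / original_mb, 1)


print(video_url("demo", "hero.mp4"))
print(percent_saved(60, 2.5))  # the 60 MB to 2.5 MB example from the episode
```

Going from 60 MB to 2.5 MB works out to roughly a 96% reduction in bytes for the same background video.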

Take still images out of it or overlay audio or extract the audio or concatenate videos together or add effects, overlay text and time when that text shows up, change the format, change the codec. All sorts of really, really cool things that you can do.

Brian: Wow, that's pretty cool to be able to overlay text and things like that on-the-fly. We have a couple videos on our site that just kind of live there in the ether, but I'll need to see if we can optimize that.

Rob: Yeah, 'cause most likely it was somebody in Adobe Premiere or iMovie that just saved the video and then uploaded it, right? And they probably didn't choose the optimal codec for web delivery. So you end up with a giant video that is way too big.

Another thing you can do with a video is adaptive bit-rate streaming. I'm sure you've been looking, watching Netflix. If you watch, like, Stranger Things 2, sometimes you'll notice the quality gets like really, really low.

You'll be watching and it'll be crystal clear then all of a sudden the quality will suck and it will be all grainy, that's due to adaptive bit-rate streaming where instead of Netflix buffering your video stream, they just choose a less heavy, lower quality, lower bit rate version to send you instead until your bandwidth availability increases. That's something that we can automate and support for you as well.
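The core decision in adaptive bitrate streaming can be sketched in a few lines: given a ladder of renditions and a bandwidth estimate, pick the heaviest one that still fits. The ladder values and the 80% headroom are illustrative assumptions; real players also smooth their estimates and watch buffer levels.

```python
def choose_rendition(bitrates_kbps, bandwidth_kbps, headroom=0.8):
    """Pick the highest bitrate that fits within a fraction of measured bandwidth."""
    budget = bandwidth_kbps * headroom
    affordable = [b for b in bitrates_kbps if b <= budget]
    # If nothing fits, fall back to the lightest rendition rather than stalling.
    return max(affordable) if affordable else min(bitrates_kbps)


LADDER = [400, 1200, 2500, 5000]  # hypothetical renditions, in kbps
print(choose_rendition(LADDER, 4000))
print(choose_rendition(LADDER, 300))
```

Re-running this choice for each segment is what produces the sudden drop to a grainy picture and the later recovery.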

Another thing to consider with that, too, is you have an iPhone, it'll shoot video in 4K, but it can't display 4K video. So you shouldn't be showing a 4K video to screens that can't display it. It's the same idea as with responsive images. If they can't display it, why send it? Every extra pixel is wasted bytes.

So showing it at not only the optimal dimensions and size but also taking into account their bandwidth availability is a huge problem with video and it's something that we just totally automate for all the developers out there.
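"Every extra pixel is wasted bytes" translates into a simple sizing rule: never deliver more pixels than the screen can show or than the source contains. In this sketch the rounding step of 100 pixels is an arbitrary assumption, chosen so that similar screens share one cached rendition.

```python
import math


def delivery_width(source_width, css_width, device_pixel_ratio=1.0, step=100):
    """Width in pixels worth delivering for a given layout slot and screen."""
    needed = css_width * device_pixel_ratio
    # Round up to a step so similar screens share one cached rendition.
    stepped = math.ceil(needed / step) * step
    # Never upscale beyond what the source actually contains.
    return min(source_width, stepped)


print(delivery_width(3840, 375, 3))  # 4K source, phone-sized slot at 3x density
print(delivery_width(800, 1440, 2))  # small source, large desktop slot
```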

Brian: That's awesome. I'm starting to really geek out on the idea of optimizing videos, and also images as well, and seeing what I can probably do for my own sites. But I know we actually had a late start, so want to transition us to picks.

JAMpicks, these are picks, things that keep us going, things that we can either do while working or things we like to try out at work. I think most of the listeners got the gist of what JAMpicks are at this point. Rob, I think you actually have picks on our show notes. Do you want to go first and show the guest?

Rob: Yeah, sure. First JAMpick is the Lick Observatory. I'm in the area, California, right outside San Jose. If you look at the mountain, the biggest mountain, Mount Hamilton, you'll see some white buildings up there. It's actually the Lick Observatory, which was built in like the late 1800s.

I studied astronomy in college, so I'm a super big geek about this stuff. You can actually take tours up there and actually look through these massive telescopes and see some really, really cool things.

And then they also have amateur astronomers out there that have $20-$30 thousand, super-nice amateur telescopes. They show it to you for free, and so you can look at the nebulas and galaxies and planets. You see the rings of Saturn and really detailed clouds and stuff, it's pretty awesome.

Brian: You guys get a lot of clear skies down there in the South Bay?

Rob: Yeah, I mean, that's why they put it on top of a big mountain.

Brian: Because we have the same thing in Oakland, at Chabot Observatory, but it's cloudy every night. So it's almost useless.

Rob: I think the mountain's a bit higher and we're blocked from the fog and things like that by the Santa Cruz Mountains, so it's usually more clear.

Brian: Cool, and hopefully I'll be able to take my son down there sometime.

Rob: Yeah, it's amazing.

Brian: Cool.

Rob: Then also another thing work-related is using SVG placeholder images. SVG is kind of like a vector format. You can build images, there's libraries out there that mimic the layout and the general shape of a much larger JPEG image, that you can use while you're lazy loading your much larger JPEG images.

These things can only be a kilobyte, so it's been fun playing with that and figuring out cool use cases for SVG placeholder images.
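The simplest possible SVG placeholder is a single colored rectangle with the real image's aspect ratio, which already prevents layout shift while the JPEG lazy loads. The libraries Rob mentions trace shapes and are more elaborate; this sketch, with an assumed default fill color, only shows why such placeholders stay tiny.

```python
def svg_placeholder(width, height, fill="#cccccc"):
    """Return a tiny SVG that reserves the same aspect ratio as the real image."""
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 {width} {height}">'
        f'<rect width="{width}" height="{height}" fill="{fill}"/></svg>'
    )


placeholder = svg_placeholder(1600, 900)
print(len(placeholder.encode("utf-8")), "bytes")
```

The result is small enough to inline directly in the page, so it costs no extra request.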

Brian: Cool, I'll definitely check that out. Ben, you got some picks for us?

Ben: Yeah. My Pick is, I don't know if you guys have ever opened one of those LaTeX, or TeX PDFs. They always have that distinctive font, Computer Modern. What I didn't know about those was there's actually a whole family of fonts, and one of them is a mono serif font which is a rare find, which I've been using for coding lately.

I also found out that whole font family, somebody on their site has them all set up for web use. You can download the whole font family with all the formats you need for web fonts. I've been using that lately and the mono font is called Computer Modern Typewriter Text. It's really great if you're in a Debian-based system, you can actually just get it from the repos. But it's pretty fun.

Rob: You can actually even upload those font files to Cloudinary and use them to overlay on top of images.

Brian: Wow. Coming up with the hot tips. All right. More to check out.

My picks, actually, I've got two Picks. One pick is Super Nintendo. My son who's four years old was Mario for Halloween. And the only reason he knows Mario is because of Super Mario Run on iPhone, it's the only game he's exposed to Mario from.

I have a Super Nintendo, so I blew the dust off of it and showed him like the actual Mario. And he had a blast and we played Aladdin and Lion King. Brought a lot of memories back over the weekend, probably going to do that again next weekend.

My other pick is flame graphs, which is it's more of a Go thing. I think I learned it from the Go community, but you can actually do API requests and calls and see exactly or similar what you can do in the Chrome console. But this is more for backend, and we started using it when we were exploring GraphQL.

Our CTO added it to the project. I'd never heard of it and it's actually pretty cool. You can dig deep just like you can do in the Chrome Console between network requests. So that's flame graphs and that's my picks.

Rob, thanks for coming on and talking about Cloudinary for the listeners. They can definitely check out cloudinary.com to check it out, see if it's right for them. And see how it compares to their image handling today. And Ben, thanks again for coming on and sitting with me to talk with Rob.

Ben: Thank you.

Rob: Thanks Brian, thanks Ben.

Brian: And listeners, keep spreading the JAM.