Ep. #29, The State of Open Source & Docker Security
In episode 29 of The Secure Developer, Guy sits down with Liran Tal, Developer Advocate at Snyk, to discuss the state of open source, Docker security, and developer infrastructure.
Liran Tal is a Developer Advocate at Snyk and a member of the Node.js Foundation Security Working Group. He was previously Senior Software Engineer and Full Stack Team Leader at Hewlett Packard.
transcript
Guy Podjarny: Hello everybody, welcome back to The Secure Developer. Today we have Liran Tal joining us here. Welcome, Liran.
Liran Tal: Hey, Guy. Thank you for inviting me.
Guy: Thanks for coming on the show. We have a lot of topics. We have a data-packed element here, Liran. You work at Snyk here. You ran a couple of security reports we're going to talk about.
Liran: Indeed.
Guy: These are the state of open source security, talking about open source security as a whole, and one on Docker security.
Before we dig into the details on those, tell us a little bit about yourself. How did you get into this wonderful world of security?
Liran: I've been doing that since an early age. Security has been a large part of my life since then, from the early days of 2600.com and IRC channels and stuff like that. I was running my own BBS and getting acquainted with the digital world at that point in time.
I've been doing that for a while. Then in the last maybe couple of years I've been diving more into Node-specific security. I managed to write a book about it, with all of the best practices and guidelines for developers on how to secure their Node applications.
A lot of that, along with OWASP involvement and stuff like that, got me into working full time helping secure the ecosystem at Snyk.
Guy: We go from nerding out on the infrastructure to getting to securing the apps on it. Cool, let's dig into the report.
First of all we're talking about these two state of X security type of reports, what's the motivation for doing these types of reports in the first place? Why bother?
Liran: A lot about it is exposing this information out, sharing a lot of those insights and exploring the state of the ecosystem around security. And a lot about the report, the stuff that I like about it is it covers 360 degrees of the ecosystem.
It's not just focusing on application libraries or operating systems, it's combining all of those different data points by surveying people and asking them questions, maintainers and developers, about their practices in security.
A lot about it is measuring the state of security in open source ecosystems and communities, and of course the software that we have.
Everyone is building software based on open source. You can't run away from it, and it's good once we have the visibility into what's going on there.
Guy: There is a lot to unpack there. One aspect is you can't optimize what you can't measure. If we put some measurement on it, then we're able to maybe make it better.
Have you indeed seen conversations that flowed from those conversations? Like, from the stats that were published?
Liran: I think the Docker stats really surprised a lot of people. The notion that people are surprised by the amount of vulnerabilities they are shipping with their Docker containers to production is something that hit a virality point around social media.
People were surprised about the amount of vulnerabilities present in their Docker images, what they build their containers on, and that really made a good hit.
Guy: This is a bit of a recurring theme on the show, which is the importance of visibility in security. Security is naturally invisible, so doing things that raise it a little bit to your awareness.
The fact that there is a security concern, let alone a security breach, is worthy of highlighting. Let's switch into the actual report.
First of all, how did we get to this point? What data got collected, how were the reports built up and structured?
Liran: We looked at different data sets. One of them is--the data elements of it are looking at Snyk related data from customers and all the users that we have, scanning how-- Looking at how people or developers use software in their projects.
How much of that is vulnerable? What are their versioning schemes around it? We also looked at open source repositories. Things like the ecosystems, the registries, GitHub-related data itself.
Lastly, we had a human element in it, which is asking several hundred people in a survey "What is their security posture around open source security, specifically?" Trying to measure a lot of the human element and blend that with the data that we collected.
Guy: Do you find the-- Was the data fairly consistent? Were there discrepancies between the data that we saw and the answers people said?
Liran: Not a lot of those. The general trend being that, there is an uprising trend in terms of more vulnerabilities that we find in open source libraries.
In terms of application libraries, operating system libraries-- The trend is up, meaning we are finding more and more vulnerabilities in the ecosystem year over year.
There is an interesting trend around the survey itself, whereas developers or maintainers state that they want to own security, and on average maintainers have 6.6 out of 10 skill level of security. That sounds like a medium, that's fairly high to rate yourself on an average.
At the same time when we asked maintainers on "How did they find out about vulnerabilities?" Half of them find out about it when other people are opening public issues on the project.
There's the effect of maybe people rate themselves as having a high security knowledge, but at the same time they don't find out about the vulnerabilities themselves.
Guy: Right, the actions don't necessarily speak for that. We start with good intent and then we proceed from there.
Liran: True.
Guy: OK. Interesting. Putting aside the fact that we as humans are pretty lousy at assessing our skill levels, we have these data sources.
So you run the survey, you ran the analysis on a bunch of data, you collected some data from Snyk usage. You collected all of those into these two reports.
Let's-- I know the reports are fairly intermixed a little bit, in the data and maybe the insights that they have, but let's still strive to break them apart. Let's start with this state of open source security report.
What are the key insights? We're not going to go through the full data set from there, but give us a highlight or three.
Liran: Sure. As you said, open source software maintainers, they have good intent but they're not well equipped to handle it. They do need a little help with tooling and stuff like that to help them raise this stuff.
Some highlights for example, only 20% of maintainers we asked say they have a high level of security knowledge. As we said before, half of them only find out about issues when users open security issues on their repositories, so there's a discrepancy there between how maintainers rate themselves and what really happens.
If you think about it, when we asked maintainers "How do they audit their codebases to leverage information about vulnerabilities, to expose it?"
One in four maintainers don't even audit their codebases. That's truly a high number.
So, go through your package.json or whatever project, and count one out of four dependencies that didn't go through a security review or a security audit, code review and best practices and stuff like that. That's virtually not even there.
It's a bit of terrifying stats to have, especially in terms of the tooling that we have in the ecosystem, but specifically for Snyk that makes it really easy to find those vulnerabilities. We'd want to see more people leverage the developer friendly tooling that we have to raise those.
Guy: It's a scary stat and yet somehow not entirely unexpected. It's great to see maintainers care about it, but we have such high expectations from open source maintainers when you build almost single-handedly or a very small group, you build a piece of software that gets downloaded millions and millions of times a month.
The bar is high. Your impact, you're doing a lot of good, but that also means that the responsibility on your shoulders is pretty heavy.
It's good to see the intent's there, but it's indeed a little unsettling to hear about security practices not necessarily being there.
Liran: It's good you mentioned, we're going back to the intent part, because the positive or the optimistic stats to take out of it is--
When we asked developers "Who do they think should own security for their application?" 81% of them said developers should own that.
That's a really good trend. That's a high number in terms of the statistics and showing that full stack development, at least that term and that terminology brought with itself a lot of responsibility into developers.
Meaning they don't just focus on front end, now we have the full stack so it's front end and back end. The DevOps thing is empowering developers to own their infrastructure. As a byproduct, a lot of those abstractions in software, like owning a Dockerfile but not the actual infrastructure, allow developers to take more ownership.
As they take more ownership around security, that's great. That's the good security posture that you want developers to have when they build applications.
Just have that mindset.
Guy: Indeed. You have to start there, otherwise nothing will happen. That's insight one on the maintainer side. Sounds like maintainers are well-intentioned. They do accept that they, as maintainers and as developers, should own this responsibility.
They potentially slightly overrate their security skills, but still a bunch of them, only about 3-in-10, if I understand correctly, rate it as high. Hopefully over time we see improvement there.
We did have Marten Mickos from HackerOne on the show talking about the internet bug bounty, maybe they can use those. I believe in last year's report there were a bunch of these examples and recommendations repeated this year around having a disclosure policy, and the likes.
Check out the report for concrete advice as a maintainer on what you can do.
Liran: Maintainers and developers.
Guy: Yeah, maintainers and developers as a whole. Indeed, let's shift a little bit. This is a highlight of insight on the maintainer side, what about the consumer side? What have we learned, what did you find out on the consumer side?
Liran: We tried to get insight into what happens once developers develop their apps. They have a lot of CI infrastructure to run tests, run accessibility checks, stuff like that.
An important part of that is being able to address security in CI as well, shifting security to the left, as in trying to raise vulnerabilities much sooner than finding out about them in production or, God forbid, as a data breach.
One of the questions was "What security testing is being used in CI?" 37% of developers are not even at all using any security testing during their CI. We also asked about Docker images scanning, for example. Out of which only 14% of developers said they do Docker scanning.
Guy: 14%?
Liran: 14%, yes. It's a fairly low amount.
Guy: Yes.
Liran: It correlates a little bit with the ownership that developers have around DevOps. As in, they haven't truly embraced the idea of owning the DevOps pipeline, with a Dockerfile managing that.
Even if they have good intentions of owning that, there isn't yet enough visibility around what that entails in terms of security risk and stuff like that.
Other interesting information is as we asked them about "How do they find out about vulnerabilities in general in their CI?" Not just about scanning.
We learned that 36% of users use a dependency management tool to handle scanning, which is a fairly good amount of that, yet 27% of them have no automatic or proactive way of raising information around vulnerabilities.
There is still a large amount of users that, even though tooling exists to help them find out about vulnerabilities in CI, are not actually using any of that in their CIs.
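As a rough illustration of what security testing in CI can look like, here is a minimal sketch of a build gate. It assumes a scanner has already produced a list of findings; the finding format, the severity names, and the package names are all hypothetical and do not reflect the output of Snyk or any other specific tool.

```python
from typing import Dict, List

# Hypothetical severity scale, lowest to highest (an assumption for this sketch).
SEVERITY_ORDER = ["low", "medium", "high"]


def failing_findings(findings: List[Dict[str, str]], threshold: str = "high") -> List[Dict[str, str]]:
    """Return the findings whose severity is at or above the threshold."""
    minimum = SEVERITY_ORDER.index(threshold)
    return [f for f in findings if SEVERITY_ORDER.index(f["severity"]) >= minimum]


def ci_exit_code(findings: List[Dict[str, str]], threshold: str = "high") -> int:
    """Return 1 (fail the build) if any finding meets the threshold, else 0."""
    return 1 if failing_findings(findings, threshold) else 0


# Made-up findings, purely for illustration.
example_findings = [
    {"package": "example-lib", "severity": "low"},
    {"package": "another-lib", "severity": "high"},
]

print("exit code:", ci_exit_code(example_findings))
```

A CI job would pass the returned value to its exit status, so that a finding at or above the chosen threshold stops the pipeline instead of surfacing later in production.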
Guy: Interesting. Take a new Struts vulnerability, to pick on the infamous Struts vulnerability, or the jQuery one that came out recently. Only about 1-in-4 developers would automatically be notified about one of those.
A bunch of others would either not know about it or would have to do some proactive exploration to find that out.
Liran: True.
Guy: What about the data bit? I know one of the aspects in the report talked about when you consume open source most of the vulnerabilities came not through the libraries you consumed, but rather indirectly? Do you want to tell us a bit more about that?
Liran: That's an interesting insight that we were able to figure out. As you develop, there are dependencies and libraries that you consume directly in your app, but due to the nature of open source those libraries may be built on other libraries that other people have written.
Out of that we were able to conclude, out of data that Snyk has on users scanning their projects, we're seeing an interesting trend where most of the vulnerabilities are not coming from the libraries that you're explicitly using, but those indirect dependencies that they're bringing with them.
For NPM for example, that percentage is around 75-78%. Meaning as Snyk detects vulnerabilities for NPM projects, 78% of the time it's going to find it in nested dependencies and indirect dependencies that you didn't explicitly bring into your project, but they got brought in by other dependencies through the dependency chain.
It's a worry around managing dependencies in general.
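To make the direct versus indirect distinction concrete, here is a minimal sketch that walks a nested dependency tree and separates the packages a project asks for from the ones that arrive through the dependency chain. The tree is a hypothetical example with made-up package names; its nested "dependencies" shape is an assumption chosen for illustration.

```python
from typing import Any, Dict, Set, Tuple


def split_dependencies(tree: Dict[str, Any]) -> Tuple[Set[str], Set[str]]:
    """Return (direct, indirect) package names from a nested dependency tree."""
    direct: Set[str] = set(tree.get("dependencies", {}))
    indirect: Set[str] = set()

    def walk(node: Dict[str, Any]) -> None:
        # Everything below a direct dependency was pulled in indirectly.
        for name, child in node.get("dependencies", {}).items():
            indirect.add(name)
            walk(child)

    for child in tree.get("dependencies", {}).values():
        walk(child)

    # A package that is both requested directly and pulled in indirectly
    # is counted as direct in this sketch.
    return direct, indirect - direct


# Hypothetical project: two direct dependencies, four indirect ones.
example_tree = {
    "name": "my-app",
    "dependencies": {
        "web-framework": {
            "dependencies": {
                "body-parser-lib": {"dependencies": {"query-string-lib": {}}},
                "cookie-lib": {},
            }
        },
        "logger": {"dependencies": {"color-lib": {}}},
    },
}

direct, indirect = split_dependencies(example_tree)
print("direct:", sorted(direct))
print("indirect:", sorted(indirect))
```

Even in this tiny made-up tree, the indirect packages outnumber the direct ones, which is the part of the project a manual review of the manifest alone would not cover.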
Guy: You'll have to dig deep, that is, indeed we talked about the high bar for maintainers but we expect us developers to do a fair bit as well.
Not only do you need to care about the dozen or so libraries that you put in your package.json file, vulnerabilities in those would only account for, again, every fourth vulnerability that might be disclosed.
You're really-- Three out of those four would be in libraries that you never ever ask for. It's libraries--
Liran: That are just coming in.
Guy: Yeah. That those libraries in turn pulled in. A little bit biased here, but I'd say a little bit of the realm of the tools.
OK. On the consumption side we have this insight from the data side around vulnerabilities coming more from the indirect. We have, it seems, a not terrible but not highly exciting amount of CI security testing, and it's a little bit more daunting on the Docker security side.
Maybe indeed let's use this as a bit of an opportunity to shift to the Docker security report. Before we move off, if somebody wants to read the full set of data for the state of open source security, where should they go?
Liran: It's at snyk.io/opensourcesecurity-2019.
Guy: OK. Let's switch to this Docker specific insight. You took this data you collected, or you took the survey data, you took some additional analysis of data, and you released this state of Docker security report. Really, container security I imagine.
Tell us what were the key learnings from there? You already alluded to a couple from the broader report, but what else have you learned?
Liran: Sure. Vulnerabilities wise, we just scan through 10 of the top most popular images on Docker Hub, which is a fairly popular central place to find your Docker images.
We went through that, going by the popularity of images, took the top 10, and scanned all of them with their default tags, which is what people tend to use. We found out that every one of those top ten includes at least 30 vulnerabilities when you use it.
Guy: Which-- Let me just echo that back. Every single one of the top 10 had at least 30 vulnerabilities on it?
Liran: True.
Guy: How come? Like, why? These are the top 10 most popular images on Docker Hub.
Liran: Yes.
Guy: Why would they not have zero vulnerabilities?
Liran: Good question. Choosing a base image is a crucial part of building your Docker containers. As we're learning, Docker images as they are built simply have a lot of vulnerabilities inside them.
These could be general purpose images, for example, the node image has a little bit more than 30 vulnerabilities. It's something like 580.
Guy: Ouch.
Liran: Just a bit more. It hurts me, as I am a node guy. But--
Guy: If I'm not mistaken you're also on the Node Security Working Group.
Liran: Yeah. We're trying to get that, of course, but the raw stats are that as you scan the default node tag it's bringing 580 or something like that amount of vulnerabilities inside it. That's true, the idea that the base image, the default layer that the image is built upon, is bringing with it a lot of libraries and tools.
Compilation headers and tools that you may not need are linking back or dropping into your base image for whatever app you're building, because the default thing for you to do is build your app and start a Dockerfile from node:10 or something like that.
This is where visibility is coming into play. You're not really understanding, or people may not be truly aware of what they're bringing in when they're doing something like a "FROM node:10" and they're bringing in a default image that has over 500 vulnerabilities.
Guy: It's interesting. You're describing a bit of a Russian doll of these base images element.
On one hand the fact that these top images have vulnerabilities is something you as a consumer of those base images should know, like you're pulling down a whole bunch of vulnerabilities you should know what's there and address it.
But you're also saying that those base images themselves, one of their key reasons for being vulnerable is because they in turn use other base images that also have vulnerabilities?
Liran: Yep.
Guy: Is that how we got to that 580 number in node world?
Liran: Indeed. There's the technical part of that. The node:10 image is built on a popular image that's called buildpack-deps. It's an image that is intended to build other base images upon. It's something that gives you a collection of cross compilation headers, so you can go and do NPM install or gem bundle, whatever. It can do all of that by default.
The idea is to give you an out-of-the-box good experience when you build the base image. All of these tools work seamlessly, you don't need to install anything else.
As it turns out, node, in the default node image at least, is based on this distro which is Debian based and brings with it all of this goodness of tools. But at the same time, a lot of vulnerabilities are pulled in with them as well. There are better choices in building base images that we could make.
Guy: What are those, for instance? Two questions come to mind, one is what do I do? I still want to use Docker. I still want to use node. I felt like node, using that base image appears to be a fairly reasonable decision.
How do I help myself out here a little bit? Maybe a related question is, are there examples of base images that do it well? That maybe, dare I say, even get it down to zero ?
Liran: Right. One consideration is to try to see which tag brings with it fewer vulnerabilities, meaning there is node:10 but then there is a collection of tags that exist in this Docker repository that you can pick and choose from, cherry picking your own specific flavor of the OS that Node is running on.
For example if you were to choose node:10-jessie, you would get this amount of vulnerabilities, but if you would choose node:10-slim or maybe even Alpine, you would get fewer vulnerabilities simply because those base images are built differently in terms of the dependencies and the libraries that exist in them.
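The tag comparison described here can be sketched in a few lines, assuming each candidate tag has already been scanned and you have a vulnerability count for it. The tag names and counts below are placeholders invented for illustration, not measurements of any real image.

```python
from typing import Dict


def pick_least_vulnerable(counts: Dict[str, int]) -> str:
    """Return the tag with the fewest known vulnerabilities.

    Ties are broken alphabetically by tag name so the result is deterministic.
    """
    return min(counts, key=lambda tag: (counts[tag], tag))


# PLACEHOLDER tags and counts, made up for this sketch; real numbers
# would come from scanning each candidate tag.
placeholder_counts = {
    "example:default": 500,
    "example:slim": 50,
    "example:alpine": 5,
}

print(pick_least_vulnerable(placeholder_counts))
```

The count alone is a blunt measure. In practice you would also weigh severity and whether your application still builds and runs on the smaller image.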
Guy: OK, cool. Step one is to choose the variant of your base image that is the best.
Liran: Choose wisely.
Guy: Yeah, choose wisely indeed. How do I find out how many vulnerabilities are in each of those images?
Liran: You have it in the report, and you can use a tool to scan Docker images. Snyk is a way to scan your Docker images, it allows you to get this visibility into the amount of vulnerabilities you have across the image.
It even gives you an idea of which other similar base images you can choose based on what you have, with different tags that would allow you to use one that has a lower vulnerability count.
It could be a smaller change for you to make or a major one, depending on your preference for both mitigating the risk and making sure that your application doesn't break and runs well.
Guy: Docker Hub gives you some stats about vulnerabilities as well.
Liran: It does.
Guy: It's when you scan, so Snyk would show you the specific "You're using whatever node, you can switch to node Alpine. That would reduce your vulnerabilities to so and so," but then Docker Hub itself will also show you some of this data if you browse around, albeit manually.
Liran: They do expose some of that as well.
Guy: Cool.
Liran: I'm happy you're bringing this up, because there is however an interesting aspect of that, and that is containers-- Long living containers and stuff like that. Let's talk about it a bit, because it relates to the visibility of data.
If you go and scan your Docker image and see that it has maybe zero vulnerabilities maybe, or a low amount of them and you're good with that, you're shipping that out. That is one aspect of mitigating the risk.
The thing is, if you're using that image in production, you deploy it and it's running there for a week or two and during that time a new vulnerability surfaces that affects tools or libraries in that base image. You may not have visibility into that.
Some key takeaways around Docker images is if you can rebuild that image often, meaning monitor it and see that there aren't any new vulnerabilities surfacing on that specific image tag that you're using.
As you do that continuously you are going to release new containers into your production environment that have less and less vulnerabilities. We are not only releasing containers due to tests and fixes or features that we develop, but also due to the idea of lowering the attack surface due to vulnerabilities that we want to mitigate against.
One key takeaway is definitely to scan the Docker images that you have in production and in development, throughout the lifecycle, and then be able to deploy a new version that you can rebuild the image upon.
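The monitor-and-rebuild loop can be sketched as a simple decision, assuming you know when the running image was built and when advisories affecting its packages were published. The function name, the 7-day maximum age, and the dates are hypothetical choices for illustration.

```python
from datetime import date, timedelta
from typing import Iterable


def needs_rebuild(
    built_on: date,
    advisory_dates: Iterable[date],
    today: date,
    max_age_days: int = 7,  # assumed policy, not a figure from the report
) -> bool:
    """Decide whether a deployed image should be rebuilt.

    Rebuild when an advisory relevant to the image was published after the
    build, or when the image is simply older than the allowed maximum age.
    """
    if any(published > built_on for published in advisory_dates):
        return True
    return (today - built_on) > timedelta(days=max_age_days)


# Hypothetical dates, for illustration only.
built = date(2019, 6, 1)

print(needs_rebuild(built, [date(2019, 6, 3)], today=date(2019, 6, 4)))
print(needs_rebuild(built, [], today=date(2019, 6, 4)))
```

The age check covers the case where no advisory feed is watched at all: rebuilding on a schedule still picks up whatever fixes the base image has received since the last build.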
Guy: OK. This is quoting some stats from the report, correct me if I'm wrong here, one is we talked about choosing-- the stats from the report talk about the vulnerable images, select wisely using the tools at your disposal. Two is rebuilding often, which would get rid of--
How often was it? That if a vulnerability was found, it can be fixed by a rebuild?
Liran: 20% of the vulnerable Docker images can be fixed by a rebuild. That's what we found. You can mitigate that by rebuilding images.
Guy: That's amazing. This is a whole topic for a whole conversation on its own. It is a topic for many outside security, but the non determinism of Docker builds, that sounds like a pretty high ROI that if you rebuild often you can fix 20% of the vulnerabilities that you encounter.
OK, cool. That's a second takeaway, and a third was to scan in the first place.
Liran: Obviously, try and scan in the first place. We're seeing a positive trend from the survey around Docker images as well as we've seen around developers embracing security.
As a stat around it, 68% of developers believe that they should be responsible for container security. So that's a good positive trend all around, yet 50% of those don't scan their OS layers for vulnerabilities.
It's the same conflict in terms of there is a good intention, but not enough action on it to mitigate the security problems.
Guy: To me, all in all, that's the way change happens. I know you start from appreciating that it needs to happen, and then you go from there. You can take the glass half full or the glass half empty a little bit here, but cool.
To read up the full Docker security report, check it out also on the Snyk website.
Before I let you go I like to ask every guest that comes on the show if they have one security pet peeve or security tip? A bit of security advice for a team looking to level up their security, what would that be?
Liran: Mine would be to try to leverage, coach someone from the team to be a security champion and empower them to take actions on that.
From previous experience leading teams, I found it very helpful and impactful to have someone from the R&D team owning security and loving it, trying to help the rest of the developers through doing a lot of AppSec work.
Guy: Cool. Security champions is a really good theme, it's been mentioned many times on the show. Thanks for coming on the show.
Liran: Thank you for having me.
Guy: Thanks everybody for tuning in, and join us for the next one.