Ep. #37, Dagger with Solomon Hykes
In episode 37 of The Kubelist Podcast, Marc and Benjie speak with Solomon Hykes about his new project, Dagger. This talk explores CI/CD in depth and examines the difficulties teams face when implementing it. Finally, Solomon looks back on the lessons he learned while building Docker and how they ultimately influenced the creation and trajectory of Dagger.
Solomon Hykes founded Docker in 2007 and served as the company’s CEO until 2013 when he shifted to the role of CTO. In 2019 he founded Dagger, a programmable CI/CD engine that runs pipelines in containers.
transcript
Benjie De Groot: Now you've got this thing called Dagger, and honestly it's really cool. I'd love for you to just tell us quickly what is Dagger? What's the elevator on Dagger? Then let's just talk a little bit about maybe where it came from, where you guys are at and where you're planning on going, then maybe a little bit of community talking because I know this is an open source project as well.
Solomon Hykes: Sure. So Dagger is my new startup. I started it with two close friends who were the earliest employees of Docker, and even DotCloud before that so we went through the pivot together and everything. Sam Alba ran engineering, basically, and Andrea Luzzardi was the very first engineer and wrote the first line of code in the Docker repo, among many other things. He was the lead on Swarm.
Anyway, we got back together and built a team around solving more problems in this general space, and we started by just starting over as beginners. We just went out there and spoke to as many software teams as we could about their problems to make sure we didn't make any assumptions because 10 years is a long time, some things remain the same, others change.
So what we learned is that CICD is a terrible pain, and teams have basically stopped expecting their CICD tooling to actually continuously integrate and deliver their product. They've given up on setting that expectation that everyone building your product can actually integrate all of their work in one place, and then deliver it all as a whole, and then do that continuously.
Literally, the words continuously integrate and deliver, that's not happening in practice. You've got pockets of it in pockets of the team, but as soon as you have more than one team shipping one cohesive component, so let's say you're large enough that frontend team, backend team, maybe AI team, mobile, whatever, you've got these silos now. Inside that silo, everything is great.
You've got native tooling, you're developing super fast, you're collaborating super fast, but now you want to run integration tests. You want to deploy a staging environment. You want to go to production. You want to do a bunch of things that involve not just your silo, but the other silos, so you want to integrate and then things break down.
It's slow, hard to test, hard to collaborate on, and so it's a constant source of problems. So Dagger aims to solve that: instead of throwing everything away and starting from scratch with a new buzzword, you just start from the CICD you have and improve it so it can actually integrate everything. The way you do that is meet developers where they are, so we drop this engine, called Dagger, on top of your existing CI platform and then you write code for it, you express your pipeline logic in code, and then you can run those pipelines on the developer's laptop and on your CI runner.
That gives you a few things. Developers can actually run the full suite of CICD pipelines early as part of the development process. They don't need to run all the pipelines all the time. But you have this shift left process where, okay, you're participating in the integration process early so things like getting an end to end environment from different pieces and then aiming your tests at it in an automated way becomes easy.
By the way, I'm going to plug Shipyard here. If you're using Shipyard, you can plug your Shipyard environment as part of the whole pipeline. The point is, there's a lot of tools out there and all of them need to be plugged into your CICD pipelines together, that's the whole point of the I for integration. You're supposed to integrate everything, and if you can't actually integrate everything from all your teams continuously, then it's not CICD. It's broken.
Benjie: Right. Well, thank you for plugging Shipyard, by the way. I will send you the check that I promised you at a later date and time. So that makes a whole lot of sense, and I will say, and I think Marc can talk to this even better than I can, yeah, continuous is a little bit more of a promise than a reality, is the way I would put it.
I think everyone aspires to it, but it's tough. So that really makes a whole lot of sense, that that's the problem that you wanted to tackle. When did you guys start? You started talking, I mean, this part is really interesting for our listeners, so you went out, you had the team, you had this awesome core team, you went and talked to a bunch of folks.
What was the impetus of saying, "I want to go raise some money, I want to start a company, I want to start this open source project"? How did you decide it was going to be open source? I know that you guys have made a few changes a little bit there, based on CUE and all this other stuff. Just talk to us a little bit about that origin.
Solomon: Yeah. So in our case, I don't know how well this applies to a first startup, but coming out of a first one that was, let's say, very tense. 10 years of my life, lots of ups and downs, serious burnout at the end. The way we got the band back together was actually by talking not about a specific problem we wanted to solve, but about how we wanted to feel going to work every morning.
The team culture, the work environment, the values. Okay, what are we going to do with our life? And how do we build a team that's awesome, and can find the perfect problem to solve in the market? And then crush the perfect solution, and keep executing and scaling, and the whole time you're happy you're there? Because it turns out what I learned the first time is if you worry about the last part, being happy you're there, only at the end, actually you're never going to get it because it's too late once you're scaling and everything is set.
The culture is set, the team dynamics are set. If you want to optimize for a group of people that are just awesome at what they do and they move as one whilst happy they're there, they love working with each other, and they have shared sets of values and a common culture, you got to start with that. So that was our approach actually, and so I think we were successful.
I mean, it's 22 of us now, still early, but what happens is we built the team culture we wanted and then over time we built the team we wanted, and along the way we've started this process of discovery. You look for problems to solve. Those things co evolve, the culture and the problem, and then from there the product. So that's the first thing.
The second thing is, well, pretty quickly this discovery process pulled us in the direction of what we were talking about before, deploying your app, integrating your app. A CICD problem, because it was just so obvious that CICD is a terrible experience. Every team above 10 people universally has bad CICD. Or CICD they've learned to live with. So then the process was I think two years of constant prototyping and iteration.
I think we shipped 60 prototypes and the first 30 was just the three of us, then the next 30 there was always an audience, always someone that externally we showed it to. Then eventually we got to an alpha product in 2020, and then another alpha product every three to six months, continuously iterating and changing things until we launched a year ago.
So March, 2022. That's when we launched openly, and that was open source. But in 2021 we actually spent a whole year iterating on and shipping and even charging money for a proprietary cloud product, no open source. That was 2021, and then we learned that that was never going to scale in our case, and in our case we needed an open source engine.
So we backtracked and we basically open sourced half of the product as an open source engine, the one that made sense as an open source engine. The other half is an optional cloud service that we're shipping now actually. We started onboarding our first customers on that, and we haven't launched it publicly yet. But that's the other half of Dagger. Dagger as it exists today is a hybrid product, it's an open source engine plus a proprietary cloud service. But it took many iterations to get there.
Marc Campbell: I think that that story is super important, by the way. Off on a tangent for a second, you often see the Hacker News, it's like, "Here's this new project. Dagger." And you're like, "Okay, great. I understand this." But there's two years of prototyping and experimentation and validation with customers that led up to that really simple articulate message in a polished alpha product. There's always all of that work that leads up to that that you don't see.
Solomon: Totally, and we still got it wrong as we had to relaunch six months later. Benjie, as you mentioned, you alluded to that. Even after all of that, you could argue maybe we launched too late. We should've just gone to a broader launch sooner. I would actually argue that you should definitely do that. But either way, whether you're iterating in the open or in a partially closed community, what you said is exactly right. There's just endless, endless iteration and you never get it right on the first try. Anyone that seems to be getting it perfectly right on the first try, that's just an illusion. That just means they're good at marketing.
Benjie: Such an understatement, it's hilarious. All right, so let's just get a little pragmatic for a second because let's say that I am a loyal Kubelist listener. Hello, loyal Kubelist listeners. And I want to kick the tires on Dagger, I'm currently using Argo or I'm using Circle or I'm using GitHub Actions. What's a good place for me to start? What's a good, cool, little side project? Obviously I'm going to go check out Dagger.io, but give me some little, pragmatic, how I can kick the tires to see the power of this thing?
Solomon: Sure. Yeah, so the first thing is if you're a small team of devs and you're shipping an early version of your product, and you don't spend a lot of time doing DevOps or CICD work, you've set it up and forgotten about it while shipping like crazy. So relatively small, early teams, you probably don't need Dagger right now. Feel free to play around with it because it's cool, but the main target today is a slightly larger team that they've been shipping for a while now, they're starting to grow.
Maybe they just got their first full time DevOps hire, or someone from the dev team just has to spend half of their time or more on tooling and automation and CICD, and you start needing to retool. That starts being a bottleneck that's slowing you down in a bunch of ways. That's where Dagger can really help.
There's got to be some sort of pain related to CICD. Typically the pain can take a few forms. One way that it manifests itself is someone is spending way too much time changing YAML files for their CI, maybe it's Circle CI, maybe it's GitHub Actions, maybe it's GitLab or a Jenkins File. And then they think they got it right but they're not sure, so they're going to commit, they're going to push, and then they're going to wait, and then they get an error.
"Oh, I forgot a tab." Then they're doing it again. So you're back to the Stone Age of software development because it's software, your pipeline is software, it's 2023, you deserve to be able to run the damn pipeline locally in a second and know if you made a typo before you commit it. That's the very basics. So that's a pain point, that's usually how you get started with Dagger. You're fed up with that and you hear Dagger can help, so you just show up.
Benjie: So I'm using, let's not name it, let's say Jenkins because that's open source and that's not going after any company.
Solomon: Yeah, all those are equally bad in that sense.
Benjie: Sure. No comment on any of those things. But yeah, they're hosted is really what it comes down to, and they're remote, and so the issue is that if you want to make a change, it's a relatively long cycle. Maybe a few minutes, which isn't that long, but relative to you developing code, for example, that starts to add up if you have a bad letter or your YAML file has an extra space or a trailing whatever. That stuff starts to add up and you don't actually get that feedback, so it brings this instant feedback loop locally for my CI so I know when I'm messing up a configuration file or something, rather than having to wait five minutes for it to run and then grab a coffee.
Solomon: Exactly.
So what happens is it's so paralyzing that basically you do as little of it as possible, and as a result you don't fully leverage the power of these pipelines because the continuous word, I would argue, is super important because as a software team you want to go fast, you want to ship improvements to your product fast, you want to adapt to changes in the market or requirements fast, and so you can develop the feature or the new thing or whatever it is. You get to have all the cloud infrastructure in the world ready. But until you have a pipeline that can continuously take the next change and automate the process, taking the line all the way to production, with all the steps in between, you're not done, basically.
The example we use these days in our sales conversation that comes up a lot is the AI feature. "Oh my god, we've got to add AI to the product. The devs have a prototype. Look, we have this agent that does blah-blah-blah." Okay. So now that's like saying, "We have a prototype of a new car." But you're not done when you have the prototype, you're done when you've shipped the car to the customers, and that's a manufacturing problem so you need the pipeline in the middle.
What's happening now is the experience of iterating and improving and testing those pipelines is so bad that everyone's settling for very basic pipelines and you're staying as far away from it as you can. No one is having fun innovating on their CICD pipeline, it's a chore. So it's a competitive advantage if you can overcome that, and actually make the process of improving your pipelines as fun and productive as the process of shipping the rest of the code because then you're going to ship those features faster, you're going to get a few more tests in there.
You're just going to get something out there faster, more reliably, it becomes a competitive advantage. That's the area we're trying to break down. We're trying to make those delivery pipelines, whether it's build, test, deployment, and all those custom workflows, it should be code so you should be able to run it locally. The second thing is the developers using those pipelines should be able to understand them and customize them, and sometimes create their own and they should be able to reuse each other's pipelines, even though they use different tools and languages.
So you want the frontend team, if they say, "I need a staging environment right now," and so they have their own thing, using whatever, Vercel or Netlify, their own thing. Own build on everything. A bunch of NPM scripts. But now they need to do end to end scripts, so they also need to stand up a backend environment, and then they need to run the backend test suite and they don't know how that works.
There should be a standardized way to do that, maybe the backend team is using Shipyard. Just taking that example randomly. Okay, but how do they trigger that Shipyard deployment? What's the test tool? They don't know. So anyway, you need collaboration across development teams. You need local execution. You need proper testing, and you need all that to be in languages that these dev teams understand so they can participate in it.
Benjie: Right. And so Dagger actually is in multiple languages, is that right?
Solomon: Yeah, exactly. There's a Python SDK, there's a Node.js SDK, so JavaScript, TypeScript. There's a Go SDK. Then all of this is based on a GraphQL API, so really any language that supports GraphQL, you can write what we call a controller. So you can write 10, 20, maybe 50 lines of code for an elaborate project that basically tells the Dagger API, "Here is how to run my pipelines," and then those pipelines will run locally in development, and then also run on top of your CI runner, on top of Circle CI or on top of GitHub Actions, et cetera, and they're the same.
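To make the "any language that supports GraphQL" point concrete, here is a minimal sketch of how a chain of pipeline steps could be folded into one nested GraphQL query, which is roughly the kind of request a thin SDK layer might send. The helper `build_query` is hypothetical, and the field names (`container`, `from`, `withExec`, `stdout`) and the image tag are illustrative assumptions rather than anything stated in the conversation; check the Dagger API reference before relying on them. The sketch only builds the query text and does not contact any engine.

```python
import json


def build_query(steps):
    """Fold a list of (field, arguments) steps into one nested GraphQL query.

    Hypothetical helper for illustration only; it is not part of any SDK.
    """
    body = ""
    # Work from the innermost field outwards so each step wraps the next one.
    for field, args in reversed(steps):
        if args:
            # JSON literals for strings and lists of strings are also valid
            # GraphQL literals, which keeps the rendering simple.
            rendered = ", ".join(f"{k}: {json.dumps(v)}" for k, v in args.items())
            call = f"{field}({rendered})"
        else:
            call = field
        body = f"{call} {{ {body} }}" if body else call
    return f"query {{ {body} }}"


# Assumed field names: start from a container image, run a command, read stdout.
steps = [
    ("container", {}),
    ("from", {"address": "alpine:3.18"}),
    ("withExec", {"args": ["echo", "hello"]}),
    ("stdout", {}),
]
query = build_query(steps)
print(query)
```

Because the whole pipeline is just a query, the same text can in principle be sent from a laptop or from a CI runner, which is the portability point made above.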
Benjie: So it's kind of just extending the whole concept of portability, again, the theme between... Not to simplify, but the theme between Docker and Dagger, besides the D, is portability.
Solomon: Yeah, portability.
Benjie: Portability of code, and now portability of pipelines, really.
Solomon: Totally. And by the way, it's all built on containers so part of this is we got containers now and it turns out that you can't standardize every application in the world on a single architecture, because there's too many different kinds of applications out there. Sure, a lot of them use containers, but all of them will never use containers and those that use containers will not all use containers in the same way.
There's going to be fragmentation. The application space is too vast. But all of those applications, I can't tell a software team what their stack will look like in five years, but I can tell them what their pipelines should look like because that's going to remain the same. Everyone's pipelines should be running in containers all the time with no exception.
There's no good reason not to do it, because a pipeline running in containers can ship any sort of application, whether or not it runs on containers. See what I mean? So we're focusing on containerizing the pipeline, instead of containerizing the app. If you happen to containerize the app, that's great. It's a great fit. But you don't have to. There's zero opinion on how your application should run or where it should run. The only opinion is you need a pipeline to ship it, and you need those pipelines to run in containers for portability, and then you need an API on top of that to express the pipeline logic in code.
Marc: So Solomon, I want to go back to the beginning when you talked about one of the things that just was really good at Docker was the community. Open source, building this community, really focusing on that and getting that traction. Can you talk a little bit about how you're focusing on community at Dagger now?
Solomon: Yeah. So we're spending a lot of time on it, community. We're doing what you would call community led growth. Right now if you're successfully using Dagger, the chances are very high, almost 100%, that you started out in the open source community and you spent some time there, you interacted, you asked questions, you helped other users and gradually you followed a journey to being more successful. That's very similar to what we learned at Docker.
I think the market is more sophisticated, more people are savvy in doing that, so it's expected. We do all the expected things, and we spend a lot of time being nice and helpful, and encouraging that engagement, building things together. The big difference for us that we've discovered is that Docker was a pretty single player tool, you can adopt Docker on your own very quickly.
Then later you'll pull people on, but you don't really need anyone else to use Docker in your team to be happily using Docker. At least for the first phase. Dagger is different, because if you notice how I pitched it earlier, it's really about a team solving a problem together that's fundamentally a collaboration problem. We're here to integrate everybody's work in one place, and like I said, if you're just one team developing the same thing with the same tools all day long, you don't actually need anything but the most basic CI.
So that has affected our community and the dynamics of it because the people who show up, sure, they're asking questions like, "How does it work? And I have this problem, of course. What cool things have other people built? Here's a cool thing I built." But also there's a lot of conversations around how do I help my team get onboard with this? How do I help this part of my team who's using this language? How do I convince my boss? How do I convince the infrastructure team?
Or people are saying, "Someone is trying to make me use this Dagger thing, convince me." So it's interesting that the multiplayer aspect of it changes the community dynamics. But the fundamentals remain the same, the best thing you can do in a community is provide ammo and get out of the way. We have this biweekly community call where Dagger users show up and they show something cool that they've built or they did a presentation about how they got their team to use Dagger, and that's way better and more fun than a Dagger employee saying, "Here is what we should do."
So everyone loves it, we learn a lot, and then we share all of it on YouTube so it's great marketing content. If the product is a good fit for it, I mean, it's a lot of fun also because you're building the thing all together. You're hanging out on Discord all day with your users and you're happy to be there, they give you feedback. It's more fun that way, I think.
Marc: Yeah, that's really cool. Giving customers that platform to really get up there and talk about what they're building on top of what you're building, and you learn from it, they get an opportunity to share. That's awesome.
Solomon: Yeah. There's lots of little things you can do, so for example, one thing we do. Of course we're a remote team, that just happened because we started during COVID, and so we're on Discord. Our community is also on Discord. At Docker we were on Slack and then we had a community Slack, but they were separate.
This time it's the same Discord, so it's just a private area where the team hangs out, and then right there, just a few channels above, there is where we hang out with the community. So you're just in those rooms and switching back and forth all day, it's a completely integrated experience, and a lot of times we'll start a conversation in the private side, and say, "Oh wait, let's just discuss this on the open side," then we start over on the other side.
So it's very seamless, and it seems like a little thing, but actually those little things, they have an impact in how you structure your every day. There's lots of little, subtle things like that.
Marc: Yeah, I get that. We have a private Slack, but then we have projects that we have contributed to the CNCF and there's a public Slack there. And often, with that disconnect of leaving one Slack account and going to a different one, conversations end up in private, not intentionally, but they're there and you're like, "Oh, there's a lot of effort to move it out." So that's actually a cool side effect too.
Solomon: You get it.
Benjie: So what about roadmap, Solomon? What's next? What's the big next six months to a year for Dagger? Do you have that worked out? Or are you still iterating, you think, to figure that out?
Solomon: Yeah, both. We have a plan and a direction, and we're also iterating like crazy. I think we just reached a milestone where we actually have paying customers now, so that's new. Because of course the open source engine that you see on our website and our repo, that's free, of course, and we'll never charge for that, so we've got to charge for something.
So one thing we did differently from Docker is we're starting the work much earlier to build a fully formed product and business because until you've done that, you can't guarantee to your community that you'll be around for the long run. So you're putting that community at risk of either ending up unsupported, or being a battlefield for fragmentation and drama later. So it's better to figure it out early, what the model is, and then you explain what it is and then people can say, "I'm in," or, "I'm out."
So we've been doing the work since the beginning of the year, we decided, "Okay. We're going to just sell something, and talk to our power users who are successfully using Dagger today in production with their team, and figure out what's missing and how we can solve these problems in a natural way and charge money for it." So we did that and we closed our first customer, so we're still very early.
But that forced us to ship a lot of product and also learn a lot, just listen a lot to what important problems do we solve that are valuable enough that someone will pay us for it. So we learned a lot, and so the next step is to take what we shipped and what we learned that is right now hidden at the bottom of our funnel with a relatively small number of teams that we're helping.
We'll package that into a story that's for the whole market, so that probably means a clearer explanation on the website of what Dagger is and also of course launching the full product, including the part we can charge for. So that's the next six months, because right now if you go to our website, the reality is it's going to resonate very strongly with, say, 1% of the DevOps community.
They'll see things like your pipelines are code and they run in containers, "Oh my god, that's exactly what I was looking for because it will solve these problems, X, Y, Z." But we don't exactly spell it out, what problems we solve for you right now. Actually I think the pitch I gave earlier, which was a little rambly, that's still better than what we say on the website.
The reason for that is the website we shipped six months ago, the latest version, and what I explained today, that's what our customers are telling us today. So they actually know why Dagger is valuable, better than anyone, and so we got to go and tell that story better to everyone else so that it's not just the 1% of the DevOps community that can get it and get started. But the other 99% of the DevOps community also because everyone has this problem.
Benjie: I think that it's interesting just in general, the way you approach stuff. It's clear to me that you've learned so many lessons over the course of your career in the way that you're approaching building Dagger and the lessons you learned from prior companies, obviously. So I think that's really interesting, and a good lesson that we're learning every day, I think all of us are, probably. It's not how you think about it or how you talk about it, it's how your customers think and talk about it. I have one technical question for you, what about caching?
Solomon: Yeah, caching is a huge part of this. The thing is when you say your pipelines should be code, okay, that makes sense. That'd be cool, if the dev team could understand what's going on and write their own, et cetera. But what API will that code target? If the API is just the underlying operating system, you're reading files and you're executing commands on your Linux machine or your Mac machine or whatever, then actually the result will be terrible because you'll have something very slow and the APIs are not a good fit for CICD pipelines.
So you need an API that's specialized and specifically that's what most of the two years of iterating was, just figuring out the right design for that. The right design, we think, is it should be a DAG. The Dagger API, that's why it's called Dagger, it's basically an API for describing a DAG, which is a sort of graph. Each node in the graph is a very simple operation, and then the lines, the edges between the nodes of that graph, are data flowing from one node to the next.
So one node, one operation might be execute this command in this container, or pull this Docker image, or pull this Git repo, or modify this file. Those are building blocks, turns out from those building blocks you can build any CICD pipeline if you have a good composition model and abstraction model. And so that's when the caching comes in, one beauty of that model is that if you do it right, if you engineer it correctly, each node you know all its inputs and all its outputs because those are the arrows.
Then you just compute a digest of each input and then from there you record that, you do that same thing for the output, so you basically have a full map of everything that happened every time you run data through that DAG. Picture an assembly line in a factory or a supply chain, and in each run you're recording what happens and then that's a really powerful picture to have because basically you're looking at an X-ray view of your actual software supply chain.
Once you're running your pipelines on Dagger, Dagger will show you back what your supply chain actually is, and along the way it can cache it. So if it finds a specific node, if you ask for a specific node to process a certain set of inputs like, "Run this container from this image with this command and this blah, blah, blah." Then if it's done it before, it will just fast forward and give you the result from last time.
That's exactly, by the way, how Docker Build works because Dagger is built on the same tech under the hood that Docker Build is built on. It's called BuildKit and it does a lot of that magic. When you're in Docker Build you have the whole DAG of what to do. We basically decided that that is so powerful, you can do way more than build with that. You can do any sort of CICD pipeline, build, test, deploy, and all these arbitrary workflows.
All of that stuff should be in a DAG, and all of it should be cached by default. The result of that is a lot of times when you switch over from your old school YAML shell script monstrosity... Sorry, artisanal script to Dagger, and just the caching, you get a massive boost in runtime. It just runs faster, like twice as fast, sometimes 10 times faster. It's really crazy.
We get a lot of the credit for that, but really we're forcing you into a model that's just way more efficient, and then we cache the hell out of it by default. So it's like switching from manually optimizing your assembly code to switching to a higher level language and the compiler does all of the optimizations for you. It's that kind of switch.
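The caching model described in this answer (each node's inputs are digested, and a node whose inputs were seen before is skipped) can be sketched in a few lines. This is a toy illustration under stated assumptions, not how BuildKit or Dagger is implemented: `Node` and `Engine` are hypothetical names, the cache lives in memory, and the cache key covers only the operation's name and input values, so changing a node's function without renaming it would not invalidate the cache here.

```python
import hashlib
import json


class Node:
    """One operation in the graph: a function plus the nodes that feed it."""

    def __init__(self, name, func, inputs=()):
        self.name = name
        self.func = func
        self.inputs = tuple(inputs)


class Engine:
    """Runs nodes depth-first and reuses results for already-seen inputs."""

    def __init__(self):
        self.cache = {}      # digest of (operation, inputs) -> recorded output
        self.executed = []   # names of the nodes that actually ran

    def run(self, node):
        # Resolve upstream nodes first; their outputs are this node's inputs.
        input_values = [self.run(dep) for dep in node.inputs]
        # Digest the operation name together with every input value.
        payload = json.dumps([node.name, input_values])
        key = hashlib.sha256(payload.encode()).hexdigest()
        if key in self.cache:
            return self.cache[key]   # same inputs as before: skip the work
        output = node.func(*input_values)
        self.cache[key] = output
        self.executed.append(node.name)
        return output


engine = Engine()
source = Node("source:v1", lambda: "print('hello')")
build = Node("build", lambda src: f"artifact({src})", [source])
test = Node("test", lambda artifact: f"tested {artifact}", [build])

result = engine.run(test)
first_run = list(engine.executed)    # all three nodes executed

engine.executed.clear()
engine.run(test)
second_run = list(engine.executed)   # nothing executed, every node was cached

# Changing the source changes the digests downstream, so those nodes rerun.
engine.executed.clear()
new_source = Node("source:v2", lambda: "print('bye')")
new_build = Node("build", build.func, [new_source])
engine.run(Node("test", test.func, [new_build]))
third_run = list(engine.executed)
```

The recorded digests double as the "map of everything that happened" mentioned above: every cache entry ties one operation to the exact inputs it processed.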
Benjie: Right. So BuildKit is the compiler in this particular analogy?
Solomon: Exactly. Yeah, BuildKit. We're wrapping BuildKit so over time we're adding pieces here and there, and we used to support vanilla BuildKit, now we ship our own locked in BuildKit. But yeah, it's really 90% of the magic under the hood is BuildKit.
Benjie: So layers, really. We have layers. You've got lots of layers.
Solomon: Yeah. But BuildKit I think is one of the most underestimated open source projects in this space right now. It's just so powerful. You can think of Dagger as an effort to really leverage BuildKit to its full potential.
Marc: So Solomon, is there anything else that we haven't talked about that you want to share?
Solomon: One thing that I do want to say, I guess, is that if anyone out there is interested in the topic of CICD and specifically feels like we should look at it with fresh eyes, CICD can be a boring thing for many but some of us think it doesn't have to be boring. There's a lot of cool work to be done, and so if anyone out there wants to geek out with fellow CICD geeks who want to demand more, it's always fun to talk and exchange ideas. That's the most fun part for me, so just find me, reach out and I'd love to talk.
Benjie: Yeah, if you go to Dagger.io, you can get that button right to the Discord server. Apparently you can harass Solomon all day long on that Discord, so go find him. All right, cool. Well, thank you so much for coming on. I thought that this was a really great conversation, and I learned a lot. Didn't expect to learn as much as I did, but really learned a whole lot, and really appreciate you coming on. So thank you so much, Solomon Hykes, of Docker and Dagger fame.
Solomon: Thank you.
Benjie: We'll talk to you soon.
Solomon: Thanks, guys. Thanks for having me. It was really fun.