The Kubelist Podcast
66 MIN

Ep. #41, vCluster with Lukas Gentele of Loft Labs

about the episode

In episode 41 of The Kubelist Podcast, Marc and Benjie speak with Lukas Gentele of Loft Labs about vCluster. Together they dive deep into how vCluster works and some of the new use cases the project enables in larger, more mature enterprises. This technical discussion explores the problems Lukas and his team set out to solve when they created vCluster.

Lukas Gentele is Co-Founder & CEO of Loft Labs and a maintainer of vCluster.

transcript

Marc Campbell: Welcome back to another episode of The Kubelist Podcast. Today Benjie and I are joined by Lukas Gentele, CEO and Co-founder of Loft Labs and maintainer of vCluster. Welcome, Lukas.

Lukas Gentele: Awesome to be here. Thank you so much for having me on.

Marc: Well, thanks for joining. So to get us started, will you just share a little bit about your background, kind of leading up to how you started Loft?

Lukas: Yeah, we originally got started with an open source project called DevSpace. It's a CNCF Sandbox project now; we handed it over to the CNCF governance model last year. We got that project started in 2018, originally with the goal of building DevSpace Cloud, a PaaS solution.

I feel like a lot of founders in our space-- like, when you think about Docker and Docker's roots, right? Everybody has to build a PaaS first, fail with their PaaS and then pivot to something else. So we've pretty much done that exact same journey. But one thing that we realized while building DevSpace Cloud is that multi-tenancy is really hard in Kubernetes.

We were running a multi-tenant Kubernetes setup ourselves, and while people didn't want to pay for our PaaS to be the solution for handing out dev environments, et cetera, we were wondering: they still need to somehow do that. They still need multi-tenancy in their own Kubernetes environments. How do they do that?

And then we started investigating and there wasn't a really great solution for this. So we launched another open source project called Kiosk that essentially did a lot of multi-tenancy plumbing; we called it a multi-tenancy extension for Kubernetes. And that became pretty popular quickly, and AWS put it in their best practices guide for multi-tenancy.

And we're like, "Oh wow." So the AWS team also sees this as a great answer. And then we started investigating, okay, what are actually the limits of Kubernetes and what do we need to do to overcome them? And the result was vCluster, because Kiosk was a lot of automation rather than actually reinventing Kubernetes, but vCluster was taking it a step further and actually solving a lot of the underlying problems that Kubernetes multi-tenancy has. And that's how our journey got started.

Benjie De Groot: So wait, we're going to dive into all that stuff in a second, but I want to back up even a little bit further. Lukas, just tell us about-- I sense a German accent here. I've known you for a little while now. By the way, super happy to have you on. Marc, I don't know if you know this, Lukas and I have run into each other and both awkwardly been like, yeah, we're going to set up our Kubelist episode.

Like, we almost have set this up like 100 times. So I personally am super excited to have you on, but I'd love to know a little bit more about where you came from, like when you got into this stuff, in general-- not a resume per se, but just a high level thing-- and then also specifically where and how you discovered open source, and how you decided that you wanted to contribute to it in the first place with DevSpace and eventually Kiosk and then vCluster.

Lukas: Yeah, that's an interesting question. I think when we got started with open source with DevSpace in the very, very early days, we had no clue about open source. I mean, we did occasional contributions obviously, and comments in open source projects, but we didn't know how the whole thing of open source as a core foundation for starting a company would work out-- the intricacies of everything related to the CNCF and that project graduation model. I think everything was much, much earlier when we got started.

The roots of all of this-- and you're right, I'm German, born and raised in Germany for the majority of my life, and then I moved first to Berkeley for the SkyDeck Accelerator Program when we were still building the PaaS. That's how I made the transition over to California, and I've been living in San Francisco for the past five years.

But the origins of building DevSpace essentially come from me starting a business in college. Since I was 16 I'd been building websites and online shops and those kinds of things as a kind of single-person company, which is what I did for several years. And then I started hiring every good engineer I met and we got deeper and deeper into the stack, obviously with the rise of containers, Docker, and then obviously things like Mesosphere back in the day, right?

And then Kubernetes-- there were just so many exciting topics that we took on a lot of these contract jobs to work on cloud infrastructure for different SaaS companies or small and medium-sized businesses, right? We were essentially just this team you could hire to build custom projects with. Nothing product-wise, right? It was purely a services business.

And given that we were getting really deep into Kubernetes in a lot of these projects, because it was a very fascinating technology that seemed like the future to build things on, we had this problem that building on Kubernetes wasn't easy, and there needed to be a way to be faster and more repeatable-- just having a better setup for how do I update an application and iteratively deploy and develop it.

And that's when we started building DevSpace, and then we launched it on GitHub not knowing anything about how to successfully launch an open source community, et cetera. But yeah, a couple of folks discovered it, including Michael Hausenblas, who was at Red Hat back then. He put it, I think, in his newsletter. And then a whole bunch of folks discovered it through that and we were like, "Oh wow, this I guess is the power of open source. This is really cool."

And I guess we're hooked on open source ever since. My co-founder, Fabian and I, since we saw that initial spark and all these ideas coming back at us about what to build in DevSpace, we're like, "Oh wow, open source." That is the way to build software going forward.

Benjie: So what year did you start your career as a consultant or professional services person building websites wherever you can? What year was that?

Lukas: Oh, I think that started way back in 2008, I think. Again, I was still in high school at that time, right? That's when I got going.

Benjie: I mean, for a lot of our audience it's a similar story. I was a very high-priced Drupal consultant a long time ago, which is embarrassing, but I'll tell the truth.

Lukas: Oh, in the early days I did a lot of things with PHP as well.

Benjie: Yeah, yeah. Everyone has to start somewhere. But yeah, so, okay, so you started off with that. So 2008, obviously that's way before dotCloud, let alone Docker. And so then as you kind of got more and more advanced, you built out this business. Where in Germany were you, out of curiosity?

Lukas: Yeah, so I went to college in Mannheim at the University of Mannheim. That's where I met my co-founder Fabian as well. And he was actually one of the first people-- because as I said, I hired every good engineer I met, right? And he was one of the first people that I wanted to hire.

He was already working for SAP at the time and they paid him really well because he was a very talented engineer. He was kind of a student doing part-time work for SAP. It took me the longest amount of time to actually convince him to give up his job there and start working with me on these much more interesting, cooler projects.

And obviously when we had the idea for DevSpace, and starting this journey of building the PaaS ultimately years later, he was obviously the person I turned to, to say, "Hey, do you want to co-found this with me?"

Benjie: Right, and so as you start building more and more, you start seeing these problems. You're excited by all these things. Tell me a little bit about DevSpace, just the 30-second version. What was DevSpace?

Lukas: Yeah, I think back then there were two projects predominantly out there to develop applications with Kubernetes, right? Obviously everybody knows how to develop with Docker, right? Docker did a great job with Compose as this kind of tool to run locally, to connect to your Docker instance and hook in your volume, right? You mount your local laptop into a container and then iterate on it. Very smooth experience.

And we were obviously targeting Kubernetes and we were thinking, okay, why doesn't something like this exist for Kubernetes? We did some research and there was a project out there, Draft, which I think the Azure team got started, and then there was Skaffold out there, which still exists. Draft is discontinued as a project at this point. And we looked at both of these solutions and they did the same thing.

What they did is they were basically watching your code repository, and whenever you changed a line of code-- or I think sometimes you had to press enter, depending on how you had configured it-- it would rebuild your images, tag them, push them to a registry, and then instantiate them in your cluster again.

Which obviously is a much, much longer process than just changing a file and seeing it immediately in your mounted volume inside your Docker container, which is what Docker Compose had. And then we were thinking, "Okay, what can we do about this? How can we actually create something that is a lot faster than this?"

So we investigated how we could maybe mount the volume from your local machine into a Kubernetes cluster that doesn't run on your local machine. Today, obviously, running Kubernetes locally is super painless with Docker.

Benjie: I'm going to have to interrupt you on that one, Lukas. We're a fact-checking organization here and I know--

Marc: Less pain, how about less pain?

Benjie: Less. I'll give you less pain. I'm not going to. Sorry, we stick to the facts here, okay? And that is a false opinion. No, I'm just kidding. I didn't mean to interrupt you, but I'm going to have to disrespectfully say it is still a pain in the backside. No, but I didn't mean to interrupt you. Keep going, keep going.

Lukas: Yeah, I'll take that back. So let's say there are more options and it may be easier to spin up a local cluster today, but back then we were really optimizing for: what if your cluster runs remotely? Because that's what a lot of people had going at that time.

And obviously mounting your file system into a container in a remote system that runs across different VMs is really challenging. So we came up with the idea of: how about we synchronize the code into this pod and then we build the application in that pod instead? Then you skip the whole image building and pushing process. You don't need a registry in the middle, right?

How about we enable this hot reloading experience by just synchronizing your local file changes directly into the container? And obviously that's only for development. It's not for production, where obviously you want to have a single binary in a distroless image, et cetera.

But for development, how about you set up this dev container, right? Where you then synchronize things to. That was the original idea of DevSpace. And then it evolved into: how do I handle dependencies and different Helm charts?

How do I orchestrate microservices across eight different Git repositories, right? So many different requirements came up afterwards as the community was like, hey, this is my setup, what can I do about it? And DevSpace really was designed to be this Docker Compose, but for Kubernetes. That's essentially the pitch for it.
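
To make that pitch concrete, a minimal devspace.yaml along the lines Lukas describes might look roughly like this (a sketch only; the image name and paths are made up, and field names vary between DevSpace versions):

    version: v2beta1
    name: my-app
    dev:
      my-app:
        # which running container to turn into the dev container (illustrative image)
        imageSelector: ghcr.io/example/my-app
        sync:
          # synchronize the local working directory into /app in the container,
          # skipping the image build/push/redeploy loop entirely
          - path: ./:/app
        terminal:
          # open a shell inside the dev container to run the app with hot reload
          command: ./devspace_start.sh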

Marc: Yeah, there's definitely like a long tail of problems that you have to solve when you do that. I actually remember early days of Replicated, we were doing the same, right? We adopted Kubernetes obviously, and then we started giving every engineer-- to your point, it was painful to run Kubernetes locally on a Mac, and with the number of microservices you'd just run out of memory. It was an Intel, so the fan would run, the computer would get hot all the time.

Lukas: Oh yes.

Marc: And so we started giving every developer like a VM in the cloud and then running Skaffold and then using VS Code remote extensions. You actually weren't synchronizing the code; your terminal window was just an SSH session into that pod that was running, and that was one way to solve it. But to your point, DevSpace and others-- there are other tools.

Tilt was around for a while, which we actually spent quite a bit of time with, really innovating and making this experience better, because fundamentally there are differences. You mentioned one key one there: you want to run a distroless image that has zero vulnerabilities. You don't need a bunch of debugging tools in production, but you definitely want all those in dev, and you want to run Kubernetes in dev and in prod because you're going to detect problems. The closer you match your production environment to dev, the better you are.

Lukas: Yeah, absolutely. And I think we were really proud to be so early in that space and invent this synchronization mechanism that didn't exist. And we saw over time Skaffold adopted that synchronization mechanism. Tilt ultimately adopted it as well. And there's a whole bunch of other solutions out there like Okteto and Garden, and probably 50 other projects exist that do exactly this piece today.

And then I think there's a really interesting project out there as well, Telepresence, right? Which is a CNCF project, which was super early as well. They addressed the problem similarly to what we were thinking about, in terms of mounting volumes in there and proxying traffic. They actually ended up with running one service locally and then proxying traffic to the other microservices that run in your cluster, which is definitely another approach as well.

It's still not a solved problem these days and there's no, I would say, golden path, but there are a lot of options now and a lot of different tools that people can use to get there much faster without having to reinvent everything and do a whole bunch of custom plumbing.

Benjie: Yeah, I think that it's really matured. And a call out: we had Richard from Telepresence on Kubelist episode eight, and we had Nick from Tilt on episode 25, if you want to listen to some old backlog episodes about this problem. But let's keep it going.

So you attempted to build a PaaS off of this particular DevSpace thing, and what ends up happening is what happens a lot with a PaaS, like you said: you did not really get the traction that you were hoping for and you were struggling to monetize. Is that a fair assessment?

Lukas: Yeah, I think individual developers loved the experience of DevSpace Cloud, because if you don't get a Kubernetes cluster from your company and running it yourself locally is a pain, then getting this isolated kind of namespace, which is what we were offering at the time, in a hosted environment with a generous free tier is obviously a super appealing option.

But when it came to how does your company adopt this now and make this the standard and pay for these environments, that was a really critical piece that I think is very difficult, because then IT gets involved, folks that manage your AWS accounts, right? Folks that want to have more control over that infra, despite us obviously running on top of AWS as well, right?

It just was much more difficult for them to have that layer in between. So I think commercialization really was a problem. And then we realized, okay, they seem to be interested in the experience. There seem to be issues with handing out namespaces and fractions of a cluster. Handing out entire clusters is something that some companies have resorted to over the years, but it's very, very expensive.

So what is a great solution, right? And that's when we started investigating, with Kiosk and then ultimately with vCluster, multi-tenancy solutions for Kubernetes, and how to address that for a company that wants to run this and build this themselves, but on top of solid technology and solid building blocks rather than using a PaaS or doing everything completely from scratch.

Benjie: So vCluster is I think why we're here to be honest with you. But I do want to understand Kiosk a little bit because I think it's interesting. What does Kiosk do very quickly and why didn't you go further with Kiosk and why did you decide to do vCluster by itself?

Because I think understanding your open source journey is really interesting. Just looking at going from DevSpace, which still has a good amount of activity and a bunch of contributors, over 200, 300 contributors, you then decided, okay, we can't monetize this, let's get Kiosk going. And then what happened there, and why did you then go to vCluster? What's the difference? What was the thought there?

Lukas: Yeah, I think Kiosk was an experiment, right? We realized, okay, the PaaS doesn't work, so people don't want this as a service, right? But they do want to have that platform. They just want to build their platform themselves. And today, actually, looking back, everybody's speaking about platform engineering, and that literally is exactly this: building this platform.

And the question is, what are the building blocks to build your platform on top of? And Kiosk essentially was helping you with automating the process of how do I provision namespaces. So we had this abstracted resource called Space, which effectively represented a namespace, but it set up a whole bunch of things when you created it, right?

So Kiosk is a controller, it runs in your cluster, and you create this Space resource-- and however you create that, right? That's up to you, whether you expose that via a CLI or spin it up via Terraform or Argo CD on a pull request, right? That's up to the company and how they set up their platform and their processes.

But ultimately, once that resource gets created, a controller reconciles it and actually creates the namespace. And not just the namespace: it also sets up the RBAC and the network policies and the resource quotas and all these other things, right?

And then it essentially helps you to streamline that process of, okay, we want to hand out namespaces as a service, right? And we want to build that as a platform, where Kiosk is a great building block to start with so you don't have to start from scratch. That was the idea. And that was for us a good way to test the market.
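
For a sense of what that building block looked like, a Kiosk Space was roughly a manifest like this (illustrative; the names are made up and the exact API group and fields depend on the Kiosk release):

    apiVersion: tenancy.kiosk.sh/v1alpha1
    kind: Space
    metadata:
      name: team-payments-dev      # becomes the backing namespace
    spec:
      account: team-payments       # the Kiosk Account (tenant) that owns this Space

Creating that one object is what the controller reconciles into a namespace plus the RBAC, network policies and resource quotas Lukas mentions.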

We wanted to test: okay, the PaaS obviously is something that companies won't adopt. Individual users love it, but a company doesn't say, hey, we're going to build on this and everybody is using this PaaS. So what about Kiosk? What if we give them this building block-- will they build a platform themselves and run it in their own AWS clusters on top of this?

And as soon as we saw that experiment go through the roof, we started working on vCluster as the next-level experience, because ultimately a namespace is very restrictive in comparison to getting your own cluster. So we tried to figure out a way to solve that, and the result is vCluster.

Benjie: So with Kiosk, how did you test it, Lukas? I think that's a good question. A lot of people in open source have a product. How did you test it? What were the signals?

Lukas: Obviously on Reddit; we wrote a couple of blog posts about it. I think we posted on Hacker News potentially, I'm not 100% sure even, but just the classical channels where folks hang out. At that time we already had a lot of experience with running DevSpace and we saw where other people were spreading the word around DevSpace. When we got started, we had no idea, right?

But then from the experience of DevSpace, we knew exactly where people discover these kinds of tools. And then we were posting about Kiosk there and we were just looking for the responses, and the response was really, really positive. People were really curious.

And what got us inspired was really that a lot of people started suggesting features, and then AWS started recommending Kiosk, right? In their best practices guide for multi-tenancy. And we were like, okay, so there's definitely a need here for a multi-tenancy solution, but Kiosk is just the first step in that direction.

Marc: So I want to go back and talk about the multi-tenancy model. And some of the problems. I think it's going to be a good transition into what you're building and focused on now, but you kind of set that up and talked about that developer experience and a platform team being able to have it.

But did you find that there was-- or did you think about upfront and target, or just also support-- the use case of distributed teams running Kiosk in production for different parts of the organization to have namespaces as a service, right? And then there's full autonomy for the teams to be able to manage their application, but still some layer of shared platform experience for the platform engineering team to manage the infra.

Lukas: Yeah, I think that's always the difficulty, right? When you're looking at platform engineering initiatives these days in a lot of the G2000 companies, it's always about standardization and golden paths and what is the recommended way. But then at the same time, you don't want to take the freedom away from folks, right?

You want them to move fast, you don't want to be a roadblock for them. You want to enable that velocity for folks. And I think that's a fine balance. That's I think why vCluster is a much nicer approach than anything namespace based because in a vCluster you are ultimately admin, right?

That's what we recommend for folks to do. You hand out the vCluster; it's like a VM where you're root inside, right? You can do whatever you want, but you run on a shared underlying physical machine. The same happens with a virtual cluster. And then I think the beauty is, because it's still running on the same underlying cluster, you can now share an Ingress controller, but if this particular tenant wants to test out the newest version of Istio, they can roll with that as well and deploy it themselves.

So for 90% of use cases and developers, you can deliver something out of the box as an organization and provide some standards, but you're never stopping somebody from saying, hey, let me try out Argo for deployments instead. If you just hand out a namespace, well, you can't even deploy Argo because it has CRDs and it needs to be installed at the cluster level.

And I think that was one of the biggest things we realized, what happens if people need to do things across namespaces? What if they need cluster wide resources and CRDs? And CRDs became very, very popular at that point in time when we were actually in the process of launching Kiosk and starting to think about the idea of a virtual cluster.

Marc: Yeah, and if it's not obvious here, one of the challenges is that CRDs have to be installed cluster-wide. So you end up needing pretty elevated access to install and manage them. Also, if you're installing namespaces and just managing who has access to each namespace, there are going to be conflicts between versions and expectations there.

Which totally makes sense as to why you took it one layer deeper. But I'd like to understand a little bit more about the architecture of vCluster. Let's say I understand hypervisors and virtual machines, and I understand containerization, cgroups and that layer of runtimes. vCluster seems like it's got a foot in both, and a little bit of ability to blur that line. Explain the architecture a little bit. How does it work?

Lukas: Yeah, it's really important to understand that if you want true, complete, hard multi-tenancy, you're going to need vCluster plus x, right? So if you want isolation on the virtual machine that runs your pods, then you will want to look into solutions like Kata Containers, right? Or micro VMs, or I think there's a concept of nodeless Kubernetes, right?

There are a lot of options there to essentially address that layer of isolation on the VM level, because vCluster doesn't necessarily tackle that part. But it is the layer on top that actually allows you to share the Kubernetes cluster itself. Because one of the biggest things-- even let's say you're using Kata Containers, right? Your containers are neatly isolated now, right?

But the problem that is still shared is the Kubernetes control plane and the cluster-wide resources, right? With the vCluster, what you're doing architecturally is you're launching another control plane inside a pod and then you're exposing the service of this pod through an Ingress or a load balancer, right? And that's the API server that you're now talking to.

So if you have an EKS cluster and you have five vClusters on top, you're not talking to EKS's API server anymore. You're talking to each individual vCluster's API server, which can be exposed via an Ingress, for example, right? And the beauty of this is you can have a different Kubernetes version, you can have a different Kubernetes distro running in your vCluster, completely independent from the underlying cluster, because you have your own control plane that is encapsulated inside a container and it has its own state as well.

So besides the API server and the controller manager, which we take from any of the distros that we support-- currently K3s, K0s, vanilla Kubernetes and EKS-- we also attach a data store to it. So you can use an external etcd cluster, or you can use SQLite in a persistent volume mounted into that control plane pod if you're using K3s. So it can get super lightweight, but it can also get more heavyweight by attaching an entire etcd cluster.

But it has its own state. So when I'm creating something in the vCluster, it at first has no impact on the underlying cluster. But once I create a pod-- so let's say I create a CRD which creates a stateful set, which then creates a pod, right? That pod will end up on the underlying cluster.

The other resources are entirely encapsulated inside the vCluster's state and will never reach the underlying cluster. And since the pod itself runs on the-- we call it the host cluster, that is the underlying cluster-- you then have to make sure, if you run true hard multi-tenancy, that you use something like Kata Containers, et cetera, to then isolate these containers from each other.
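
As a rough illustration of that flow with the vcluster CLI (names are made up; flags can differ slightly between CLI versions):

    # create a virtual cluster inside the "team-a" namespace of the host cluster
    vcluster create team-a-vcluster --namespace team-a

    # point kubectl at the virtual cluster's own API server
    vcluster connect team-a-vcluster --namespace team-a
    kubectl get namespaces    # served by the vCluster's control plane, not the host's

    # switch back to the host cluster's kubeconfig and the whole control plane is
    # just a pod (plus the synced workload pods) living in that one namespace
    kubectl get pods -n team-a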

Marc: But that has to be on the host cluster; that's where you need that layer of isolation. So I can create a large EKS cluster, secure it the way that I want to. My platform team or my SRE team is managing that. If multi-tenancy is a core goal, I can throw Kata Containers, gVisor or something along those lines on there. So that's the host cluster. What do you call the other cluster?

Lukas: That's just a virtual cluster, right?

Marc: Cool, and so how are you doing that? So normally, right, the cluster has the CRI-- getting a little bit technical for just a minute, not too technical I don't think-- but the cluster has the CRI, which is responsible for actually scheduling the workloads. The common one with those vCluster distros that you mentioned would probably be containerd. If I install K0s or K3s inside as my vCluster, how are you making that pod run on the host cluster?

Lukas: Yeah, that's an excellent question. So all these higher-level resources, whether that's CRDs or stateful sets or deployments, right? All of these are just entries in your data store, right? Nothing needs to actually be run and scheduled.

But as soon as you create a pod, which ultimately always kind of is the goal in Kubernetes, then the question is: how does that pod get scheduled? How does it actually land on the machine, right? How does all of this work? In a vCluster, we don't have a scheduler. Instead we have a so-called syncer, and that syncer has a kube context to both the underlying host cluster and the vCluster. So it watches in the vCluster, and it's all within that vCluster container.

It's kind of baked in there. So it runs alongside that API server and that controller manager, and it swaps out the scheduler. And then that syncer essentially watches if a pod is created, right? And what it does, instead of scheduling it to a node-- because in the vCluster you don't have to; if we had a regular scheduler, you would have to attach nodes to the vCluster, which is a lot of complexity, and that would make it a real cluster ultimately, not a vCluster anymore.

But what the vCluster does-- it doesn't have any nodes, it just copies the pod down to the host cluster. That's why it has two kube contexts. It has a context to the underlying cluster and it has a context to the virtual cluster API server, and then it copies down that pod to the underlying cluster, and then the underlying cluster's scheduler would schedule that pod, right?

And it will use the same CRI, et cetera, as the underlying cluster. Same container runtime, same setup, same Kubelet, right? It's ultimately as if your tenant had created that pod directly in the underlying cluster. So vCluster is a way to elevate privileges for somebody, if you're already using namespace multi-tenancy, without actually elevating their privileges.

And the really interesting piece was really locking down the vCluster pod's access to the underlying cluster to only creating these resources. So the vCluster can't create a deployment in that namespace anymore, because it doesn't need to, right? The only thing that it does is create the pod and sync the pod status up.

So you'll still have visibility into things like ImagePullBackOff and all of these kinds of statuses on the pod. But the pod in the underlying cluster-- you know, we're mapping from N namespaces in the vCluster to one namespace in the underlying cluster. So we rename the pod, we make a whole bunch of changes. In your vCluster you don't see any of that. But in the underlying cluster, if you were to look, that's where you see all the containers running from that particular vCluster.
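
To picture that mapping, the same pod looks different from the two sides; roughly like this, where the renaming scheme shown is illustrative and depends on the vCluster version:

    # inside the vCluster: the tenant sees their own namespace and pod name
    kubectl get pods -n checkout
    # NAME    READY   STATUS
    # web-0   1/1     Running

    # on the host cluster: the syncer has copied the pod into the vCluster's single
    # namespace and renamed it to encode the original name, namespace and vCluster
    kubectl get pods -n team-a
    # NAME                                 READY   STATUS
    # web-0-x-checkout-x-team-a-vcluster   1/1     Running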

Marc: I think that speaks a lot-- first of all, that's really creative and solving a good problem in a good way. And it also speaks a lot to the extensibility and the forward thinking of the Kubernetes API and the architecture of Kubernetes to enable that type of, I don't know, orchestration, maybe, is the word to define it. But it's cool.

Lukas: Yeah, it's definitely a different solution than anything else out there, right?

Benjie: Yeah, it's interesting. So the syncer is kind of the secret sauce-- or not the secret sauce, the open source sauce-- of vCluster. But okay, so I'm thinking about this from a security perspective, and from more of a foot-gun perspective. So I want to know how you're protecting me from a developer at my organization shooting themselves in the foot. And that's what I'm not sure I fully understand.

Marc: Benjie, in this example, just to clarify, you are the infrastructure team managing that host cluster?

Benjie: Sure. That's a good point, Marc. So I'm on the platform team or I'm a DevOps person, and first off, I'm making a questionable decision: I'm letting my developers run their own Kubernetes clusters.

I don't know why, but I'm doing that-- not because they can't, just because it's complicated-- but they need to do that because our application needs its own full cluster, and this is obviously an amazing solution for that. So I give them a vCluster, and I assume when you say give me 1.27, right? It's whatever, some distro of it.

And so now I give my developer a kubeconfig and they can just get straight to that cluster, right? They can connect directly to the Kubernetes API now and they can install or do whatever they want, so they can install a stateful set, for example, right? Obviously.

So when I actually install a stateful set, what does that actually do, right? Like, that needs to be on every node that is running any of its pods. So does that mean that a pod that is running on a host node gets that installed on the node, but that node might have other pods shared on there? Does that question make sense?

Lukas: So are you asking whether nodes are shared between multiple vClusters?

Benjie: It sounds like they are, but if I need to install something that's a stateful set, right, that needs to live on the node, right? So how is the stateful set isolated from pod to pod, even if they're in two different vClusters?

Lukas: Yeah, I mean, vClusters by default don't have allocated nodes, right?

Benjie: Right.

Lukas: But the syncer is a very, very flexible component. So when the syncer synchronizes a pod down to the host cluster, the first level of security you have is that everything you have in the underlying cluster still applies. So if you have admission control, OPA, right?

Anything in the underlying cluster still has an effect on that pod: if you're denying privileged pods, if you're restricting your entire organization to a certain image registry-- everything has to be pulled from our internal registry and images have to be imported and scanned first, right? That still works, right? The beauty of the vCluster is you don't have to change anything.

If you're already doing that for real clusters and you're handing out limited non-admin kube contexts, or you're handing out namespaces in a namespace-based system, then you probably already have these restrictions in place, and now you're layering in vCluster and these restrictions still apply. If you have network policies on the namespace, they still apply, right?

Additionally vCluster will create additional restrictions like limit ranges, network policies, et cetera. We have a lot of options that give you a default set of isolation on that Kubernetes layer, but ultimately the nodes are shared, right? Because the scheduler is going to put these pods on the nodes that are available in this cluster.

But one thing you can do in the syncer is customize a lot of things. So you can tweak the resources during the sync process. One of these tweaks is, for example, that you define a node selector. And that way you can say, hey, I have a couple of different business units, right? And this business unit always needs to end up on that node group and this business unit needs to end up on this node group, or these types of nodes should actually be statically allocated.

You can achieve that by essentially configuring the syncer in the vCluster, so your tenant doesn't need to worry about this infrastructure piece, right? For them it just looks like, hey, my pod got scheduled on whatever node. But underneath, you as a platform team can define where what goes. And if you decide to share a node, then again, things like gVisor or Kata Containers, et cetera, will give you that level of isolation, to actually be able to share a node more securely than if you just use vCluster standalone. That's typically what the security model looks like.
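
As a sketch of that kind of syncer tuning, Helm values for a vCluster along these lines pin a tenant's synced pods to a node group and switch on the default isolation resources (illustrative only; the node label is made up and the exact keys and flags vary across vCluster chart versions):

    # values.yaml for the vCluster Helm chart (older chart layout, as an assumption)
    syncer:
      extraArgs:
        - --node-selector=business-unit=payments   # hypothetical label on the node group
        - --enforce-node-selector                   # force every synced pod onto those nodes
    isolation:
      enabled: true   # adds a resource quota, limit range and network policy in the host namespace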

Marc: In the example you gave-- I don't know, the way you describe it, one of the benefits that comes to mind is that larger organizations end up with all kinds of security and compliance things, right? You might have, to your point, this team, this business unit that for GDPR reasons or for whatever the reason is needs nodes to run in the US, or not in the US, or things like this, or whatever it is.

And you don't want your entire engineering team who's writing Kubernetes manifests to have to understand which label selectors, which tolerations, how to actually do this. Instead you can manage this, and the developers that are running in the vClusters kind of get a vanilla Kubernetes experience.

They're just like I know Kubernetes, I just deploy my Kubernetes application. I'm not burdened with like all of the challenges of compliance or all the challenges of how we actually want to manage this infra. And you have one team who can like really focus on that and kind of abstract that completely away from the developers. Is that like one of the strong benefits of it?

Lukas: Yeah, for sure. I mean, it's definitely creating that clear separation between what's running inside the vCluster, which is typically what the application teams are concerned with, and what's running in the underlying cluster, right? For example, if you were to do namespace-based multi-tenancy today, it's like, this OPA is running in a different namespace than yours.

But I have to really lock you down to make sure that your pod can't directly interfere with this OPA or bring it down or consume all the resources, right? The vCluster sets up a lot of boundaries for you to do that while still giving you the freedom to feel like you have your full-on cluster. So a lot of folks that start using vCluster actually come from the approach of handing out individual clusters, right?

The biggest number of clusters we've seen in a company at this point is over 3000 clusters, which is a lot of clusters. And for them, obviously-- I mean, they're not going to switch over to one giant cluster, that's pretty unrealistic, but they may be able to reduce it to 50 or 30, right? And that way the complexity becomes a lot less.

Benjie: So you saw one customer with 3000 and you think they could get that 3000 down to 50?

Lukas: No, I don't think we've existed long enough to see that full rollout. But yeah, I think people are putting those kinds of plans into action.

Benjie: But that's the scale that we're talking about and that's helpful for me from a context perspective because I'm just like, well I got my seven clusters.

Marc: Well wait, doesn't that create some interesting other challenges for you? Like, what if my host cluster is EKS 1.26 and my vCluster is K3s 1.29, and I have a developer that writes a Kubernetes manifest that uses some of the 1.29 APIs. What do you do in that case?

Lukas: Yeah, I mean, the cool thing is the vCluster and the host cluster need to agree on very few things, and one of them is the pod spec. So for example, we saw the introduction of ephemeral containers-- I think that was 1.19 or something like that, right? So if you have a vCluster that supports ephemeral containers and your host cluster doesn't, right?

Then obviously we can't just make it up and emulate that piece, right? We have to tell you this part of the spec is not supported. But as long as you use parts of the pod spec that are there in your underlying cluster, you won't have any issues.

And the pod spec honestly doesn't change very much. It's only that things are being added to it, additional capabilities. And then obviously, yes, we definitely also recommend upgrading your host cluster regularly, right? To stay up to date. Your host cluster shouldn't be five years behind, right? Obviously that's not the goal of this.

Marc: Sure, yeah, the version skew policy is pretty well documented. Does the same versioning and everything then apply? Let's say, I don't know, Ingress moves from v1beta1 to v1. Do the host and the vCluster have to also agree on that, or do you do anything there? Does vCluster handle that?

Lukas: That's a really interesting question, and that ties into what the platform team wants to offer as shared services versus what they want to allow the tenants to do. So by default, actually-- because Ingress is such a popular thing for people to request in a vCluster to just work, right? An application team wants to create the Ingress resource without having to run and maintain their own Ingress controller.

Again, if you take the example of running 3000 clusters: right now they have to run 3000 Ingress controllers, which is kind of annoying, right? And then you have to migrate them individually, which is also annoying. With our approach you can essentially say, hey, in the host cluster-- if you as a platform team decide Ingress should be a centralized, standardized service that you want to offer to engineers, then you run an Ingress controller in the host cluster and you tell the syncer, please sync the Ingress resource.

That means engineers can now create the Ingress resource in their vCluster and the underlying cluster's Ingress controller handles it. The Ingress is similar to a pod: it will be synced down, and the same OPA and all these restrictions will apply, right? But then the underlying Ingress controller is handling it and wiring it up to these pods that also happen to run on the underlying cluster, right?
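
In the vCluster values that opt-in is roughly a one-liner (again illustrative; the key layout differs between chart versions):

    # tell the syncer to copy tenant Ingress objects down to the host cluster,
    # where the platform team's shared Ingress controller picks them up
    sync:
      ingresses:
        enabled: true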

Marc: But if the host cluster-- like, if they had configured the syncer to sync the Ingress resources and they had the NGINX Ingress controller, as an example, but in my vCluster I wanted to run the HAProxy Ingress controller, I could then deploy that Ingress controller into my vCluster and it wouldn't sync down, and then it would be handled in the vCluster?

Lukas: Actually, yeah, that's the beauty of it, right? If you have this team that says, we want to try out the Gateway API-- like, I want to try something that is not the currently company-supported gold standard that is super stable, right? We want to try something new-- you can. That's the whole beauty of it.

We're not stopping you, we're not taking that velocity away from you. And then ideally what should happen in a company is that these front-running teams within the company spearhead these initiatives, right? Like, one of your teams says, we've tried out Istio, we tried out Argo CD, we had great experiences, and they take it to the platform team, and then the platform team can decide, okay, Argo CD or Istio is something we want to make a standardized offering now for everybody, right?

Same with versions, right? You have somebody that says like I'm exploring this new version of Istio now, I'm running it myself because I can, right? There's obviously also ways in a vCluster to say, hey, we're forbidding these things, right?

So for example, you can block services from getting a load balancer IP, and then obviously your teams cannot create an Ingress controller on their own because they would have to have a load balancer for that. But by default, the idea is your teams should actually be able to run things on their own if they want to.

But again, the whole goal is: probably 90% plus of your engineers are not that cutting edge and they don't want to run these experiments. They just want to know, how do I expose this on a domain? And if you tell them, if you create the Ingress resource, we've got you covered-- that's perfect, right? That's the experience people actually should have if they don't want to maintain the infrastructure.

But you don't want to slow down those folks who know more and are deeper in it, right? And actually know what they're doing and want to spearhead some of these initiatives. That's the whole balance of velocity versus standardization.

Benjie: Yeah, I mean this is super interesting and I think what I find the most interesting about it is, again, the scale and the types of companies you're working with. You're clearly really talking to these enterprise companies that have these policies, these things that you need to do. Which kind of leads us into the next question.

So you had Kiosk, you started writing vCluster, it starts to get some momentum, and now you have solved, or at least begun to solve, the monetization problem. So tell us a little bit about the juxtaposition between DevSpace and vCluster: what's the difference there, how it works from an open source perspective, what the monetization strategy is, and how all that has come into play?

Lukas: Yeah, right now our monetization strategy is essentially monetizing the vCluster Platform and vCluster Pro. So we're doing two things. If you run vCluster open source, there may be certain things-- like, let's say you're starting out with the default, which is K3s with SQLite, right? A lot of our customers, when they start, are especially using vClusters in production for multi-tenant production setups.

So let's say right now you're spinning up an EKS cluster per customer, which is very expensive, again having to manage all these Ingresses, et cetera. And you refactor that to spinning up a vCluster per customer, and now you can share the Ingress controller and it's much easier to operate your production system for your managed software offering, right? Then you may want to look for support, obviously, but you're also looking for a better vCluster.

So the K3s vCluster with SQLite is probably not what you need in production. And then the question is, what is it that you need in production? Do you need regular vanilla Kubernetes? Then you've got to manage an etcd somewhere. How does that work?

So for our enterprise offering, vCluster Pro, we have certain performance, security and scalability features that are exactly designed for those use cases: making the vCluster really, truly highly available, making sure you're operating that backing store in a way that scales and doesn't become a bottleneck.

If you're using K3s with SQLite, one of the things you'll immediately notice once the vCluster gets a little bigger, once you host someone there that has a lot of requests: well, that SQLite single-file data store may become a bottleneck. And you can obviously host your own etcd cluster, but we also have a feature in vCluster Pro that is called embedded etcd. You can turn it on and it runs an etcd for you alongside the vCluster that is managed by our software, and it scales out pretty neatly.

You can run it as one replica and no additional pod is needed. It's embedded in the code, and then obviously you wouldn't scale the replica number up to two, because then you have issues with etcd quorum, et cetera, but you can spin it up to three or five, et cetera. And vCluster Pro does all the wiring and maintenance and controls that etcd alongside the rest of the control plane.

Benjie: So my etcd in that case is actually running in the host Kubernetes cluster as a pod, but you guys manage and maintain that pod for me.

Lukas: Yeah, it actually runs baked into the vCluster. So there's only one pod. The pod contains the vCluster and etcd inside of it, which is really interesting, and there's an automated migration path from SQLite. So you can literally have SQLite until the point where you can't, and then you turn on embedded etcd, provide a license key and boom, right? You have a much more scalable cluster and you can make it HA, right?
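
In the newer vcluster.yaml layout, switching the backing store looks roughly like this (a sketch under the assumption of a 0.20+-style config; the exact keys and the Pro licensing mechanics are documented by Loft):

    controlPlane:
      backingStore:
        etcd:
          embedded:
            enabled: true   # replace the default SQLite store with embedded etcd
      # for HA you would then also raise the control plane replica count to an odd
      # number (3, 5, ...) so etcd can keep quorum if a replica fails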

SQLite you can't really make HA, right? It's a single file that is read by one party at a time, and that often becomes a bottleneck. But that's just one example; there are a ton of other features. For example, so far we've been talking about slicing up a real cluster into vClusters, right?

And assuming that there's one real cluster, then 20 namespaces and then 20 vClusters inside these namespaces. But if you're running this in production, for example, or if you want to guarantee the uptime for the vCluster control plane, you may actually not want to have the workloads run in the same namespace as the vCluster itself. You may actually want to have these workloads run in an entirely different workload cluster and all the control planes in another cluster.

That's another Pro feature. We call that isolated control plane. And that ties into something really exciting. Now we are blurring the lines of: is vCluster just a single-cluster slice-and-dice solution, or can I actually schedule workloads from one vCluster control plane to three other clusters, and maybe to Fargate in the future, right? It's really interesting.

That's an R&D topic for us right now. I would say we're maybe 80% there to make it multi-cluster and even multi-cloud. There are some really interesting things on the networking side, with the WireGuard protocol, that the team is working on to obviously make sure that it feels like you're still working in this one cluster and the experience is the same.

And we're working very closely with CoreWeave, the GPU cloud that made a lot of headlines last year, because they're using this isolated control plane. We actually gave a talk at KubeCon showing how they use that to build their GPU cloud on top of it.

And obviously as a cloud provider, they guarantee the uptime for this control plane, just like Amazon and AWS guarantee the uptime of your EKS control plane, right? There's an SLA attached to it. So obviously that control plane runs in a different environment than your workload containers that you schedule through EKS.

Benjie: What was the name of the cloud provider? I didn't catch that.

Lukas: CoreWeave.

Benjie: CoreWeave? And so you're saying CoreWeave uses vCluster to provide GPUs to folks, and you gave a talk on that at KubeCon in Chicago?

Lukas: In Chicago, yeah.

Benjie: Okay, so you're monetizing, you have basically a pro version and an open source version, and some of these features-- reliability, high availability, that type of stuff-- are kind of the monetization path there. You said there were two ways.

Lukas: Yeah, so this is the first path. This is essentially if you need each individual vCluster to be more performant, more resilient, more scalable, right? So we have a fork of our own open source project where we add these kinds of features in. It's a direct fork of our upstream vCluster project, right? But it adds these additional capabilities to vCluster.

And then the second monetization path is the platform that we're selling. And that's essentially if you are a platform team and you want to hand vClusters out to a lot of engineers, or you want to manage a fleet of vClusters-- whether that's in production or for this internal developer use case-- you get to the point where you're saying, okay, now I need 3000 vClusters instead of 3000 clusters.

So how do I make sure we can upgrade them? How do I make sure we restrict these vClusters? How do I set up limits and fair resource sharing between different tenants? And that's where our platform comes in. It has a nice UI, it integrates with Argo CD, with things like Terraform, with HashiCorp Vault, right? With Rancher, which is something we recently announced.

So if you're using Rancher right now, for example, to offer self-service namespaces and to manage real clusters, with our extension you can essentially deploy our platform, hook it up to your Rancher, and users in Rancher automatically become users on our system. The SSO is connected, and people can now also provision virtual clusters and see them in Rancher and manage them in Rancher, right?

But you need this glue in the middle that essentially makes the deployment of these vClusters work and hooks everything together. So that's vCluster Platform. That's really the two components: Pro is making each vCluster more performant, more scalable, more secure, and then vCluster Platform is for orchestrating and managing the vCluster process at scale and also enabling that self-service.

Benjie: Super cool, super cool. So, okay, we're getting a little low on time here, Lukas, but I do know that we're recording this before KubeCon Paris in 2024. I know you are announcing something that you maybe wanted to share with us.

Let's talk about what you're announcing, maybe a few roadmap items, and then we want to dive into some licensing and community questions, and then we'll wrap it up. So yeah, tell us what's happening at KubeCon in Paris? What are you guys announcing?

Lukas: Yeah, a big announcement was the Rancher integration. I think there has been a lot of demand in the Rancher community. Actually, if you go through their Git repository and you search the term vCluster, there are a lot of hits there. There are a lot of issues where people describe, hey, can't we get virtual clusters alongside namespaces and real clusters in Rancher?

And these issues, some of them are two years old. We obviously open sourced vCluster-- we started this in 2021, so we are about three years out there-- and these calls came pretty early from the community, in early '22, et cetera. So we've been building this integration and hopefully that is something that the Rancher community would like to use, right?

To extend the Rancher capability to get virtual clusters in. I think that's our biggest recent announcement. In terms of R&D, what's on the roadmap for the rest of this year? There are two very, very exciting things. One is, I think I teased it earlier a little bit: multi-cluster vClusters, right?

How can I schedule-- right now we have the isolated control plane, so you can run the control plane in one cluster and schedule to one other workload cluster. But what if you could have the control plane in one cluster and then schedule workloads to three different clusters-- maybe one GKE cluster, one EKS cluster, one on your private cloud? That would be super interesting.

But for your user, it all seems like one regular Kubernetes cluster and they don't have to bother about this underlying infra, right? I think that's a really, really exciting topic, but it's still early R&D. We've been working on this for about a year and it's not quite ready for primetime yet, but we're getting very, very close to it.

Benjie: That to me is when the name virtual cluster is really going to be a virtual cluster. I'm excited, I'm excited. You could call it the virtual virtual cluster, but I think that's really cool. I'm very excited for that one, when you're literally able to get that kind of separation.

I know a lot of folks that will be very interested in using that for physical separation of particular resources. Marc brought up the whole EU limitations thing, and so this could be super duper interesting. My control plane lives in North America, my actual workload lives in Ireland, whatever. That's really cool. Sorry, what was the other thing on the roadmap?

Lukas: Yeah, and then the other thing is regarding how do I move a vCluster from one cluster to another? I think that is something really exciting as well. You can think about it like a snapshot, right? You can snapshot a VM and then take the snapshot and run it on a different server. What if you could do the same thing with a vCluster?

Obviously it's interesting for doing things like backups and restore mechanisms, but also for these kinds of migrations. So let's imagine you're running GitLab in your company and you run that today on top of Kubernetes, and now you decide, ah, we've got to tear down this cluster and redo it because we didn't have experience with Kubernetes starting out. I think all of us have done this, where we kind of re-architected our clusters.

Or you're deciding, hey, this should actually move over to a different cloud provider region. Or even we want to move from AWS to Azure, right? How do we do that?

It's really difficult today to move workloads from one cluster to another, but if this runs in a vCluster going forward, you can think about how to snapshot this vCluster, similar to how we snapshot a VM, including all the persistent data, and then reinstantiate that snapshot in a different host cluster. I think that's a super, super exciting thing. And that's something that is being heavily worked on by our team on the R&D side right now as well.

Benjie: So the real roadmap is that there's basically everyone runs everything in a virtual cluster. Is that the real roadmap? So virtual clusters are the new cluster?

Lukas: Yeah, I think that is the goal, right?

If you are asking for a compute machine today in an enterprise, nobody's going into a data center and plugging in a server for you anymore. You're going to get an EC2 instance, a VM somewhere, right? And if we're doing a good job, if somebody needs Kubernetes five years from now and they're saying, hey, I need a Kubernetes cluster, they should be getting a virtual cluster by default, and only if they really need a real cluster should they be getting a real cluster.

I think if we get to that point, and again, there's a lot of work to do to get to that point, right? VMware didn't build this kind of VM experience in a day, and obviously AWS and EC2 weren't built in a day either, but today these things are the standard, and for Kubernetes I hope virtual clusters will take the same path.

Benjie: Super cool. So let's talk about licensing. vCluster is Apache 2.0, is that right?

Lucas: That is correct, yes.

Benjie: And it's not currently a CNCF project?

Lucas: Yes, that's correct. DevSpace is a sandbox project and we contributed it rather late. A lot of startups over the past few years have taken the approach of committing projects super, super early into the sandbox and then evolving them within the CNCF, but we actually contributed DevSpace much later.

That may also have been a lack of experience in how to set up these projects and connect them to that whole governance model of the Linux Foundation. But we contributed DevSpace really, really late. And for vCluster, right now we are obviously building a commercial product on top of it, and we definitely see the benefits in terms of governance and putting things in a foundation, but we think it's premature to take that step of committing it to the CNCF at this point.

Marc: But it's a possibility to consider in the future. Right now it's Apache 2.0, use the software, and you're keeping options open to make sure that you have control of that future.

Lucas: Yeah, I mean, absolutely. I think it's a little bit of a fallacy to say there is absolute security around the licensing and governance of a project just by putting it in the CNCF. Sure, it will never be made proprietary, right? A move like what HashiCorp did with Terraform is not possible once you hand it over to the Linux Foundation.

But a lot of these projects that you see on the landscape, and in particular in the sandbox, are driven by individual vendors, right? They're not necessarily driven by a broad community. I think Kubernetes is really a community project driven by so many different parties, right? But then I look at individual projects in the sandbox, or even some incubating projects, which are primarily driven by one organization.

So you're still dependent, in terms of governance, on that one organization being around, right? And doing a good job steering the project. I don't see the immediate benefit of handing over vCluster to the community. Hopefully we are steering the project in a way that makes the community happy and lets the project flourish; that is obviously our goal regardless of who the owner of the trademark is, right?

Marc: Yeah, you brought up Terraform as an example, and it's worth talking about for just a second. I think you're right, there are a lot of projects, sandbox for sure and incubating also, where one company is the core maintainer.

And I know the TOC is doing work to try to encourage broadening the maintainer base, but it's less about the longevity of the project, right? Because it doesn't matter whether it's Apache 2.0 licensed by Loft Labs or a CNCF incubating project. If there's a single maintainer and the company goes under, the value of open source is that I can say, okay, great, I can fork and maintain this thing moving forward.

Going back to HashiCorp and Terraform, the concern there is really around the licensing change they made to the BSL. It sounds like you're saying we're just going to keep this as Apache 2.0, and we still don't see the need to move it into some kind of foundation. Are you not seeing pushback from any customers or users of the software saying, ooh, but what's going to happen with the licensing model in the future? What guarantees can you provide me around that?

Lucas: Yeah, for us, we are not considering changing the license. I think we set up the project and the commercial model in a way that they interplay very nicely, right? Imagine if Terraform had had that proprietary fork from the start, right? Where they put some of the enterprise features, similar to, let's be honest, how GitLab does it with their enterprise model, right?

It's more of the classic open core kind of approach. Then I think there is a certain security around, okay, why would this have to be relicensed? Because there is already an enterprise version of it. I think that setup works really, really nicely for us today, and I don't think there will really be a need to change that.

I think that creates a lot of peace of mind for our community, versus a lot of CNCF projects, or a lot of projects in the Kubernetes ecosystem that have been created over the years, where there is not that clear differentiation of, okay, what are enterprise features? How do I upgrade?

When you look at GitLab, they had that model in place very, very early and they made it very clear: this is what we monetize, this is enterprise, this is open source, here's the line, and here's how we decide which goes where. And here's also how we move things from enterprise to open source over time, right, as we develop new enterprise capabilities.

And I think if you're smart about how you set up the project, then people trust that you essentially won't be forced to re-license, because let's be honest, no company likes to do that, right? I think HashiCorp probably had a lot of internal debates and discussions around taking that crucial step, just like every company that has to re-license. It's typically not something you do for fun.

It's something you're probably pressured to do because of organizational and business reasons. And we're obviously trying to learn from the past; we're standing on the shoulders of giants here, right? Looking back and saying, hey, let's try not to repeat mistakes.

And I think the community looks at our track record. We also launched another open source project called DevPod last year. We have DevSpace, we have Kiosk, right? I think we've shown that we are good stewards of the open source projects that we maintain. And I think that kind of trust is really important with the community.

Benjie: I think that's fair. I think that's totally fair. And DevPod is CNCF already, I believe. Is that correct or?

Lucas: No, DevSpace is CNCF. DevPod is just something we launched about nine months ago. It's like a GitHub Codespaces competitor, but open source, less opinionated, more Terraform style.

You bring your own infrastructure and we spin up these code spaces pretty much wherever you want to spin them up, and then you can develop against them. But it uses the same devcontainer.json format as GitHub Codespaces does, and it kind of opens things up to the open source world, essentially. It's a really neat solution and it gained a lot of attention last year.

Benjie: Well, we'll maybe save DevPod for an upcoming episode because we're pretty low on time, but really important question: "kubectl" or "KubeControl," how do you say it?

Lucas: I'm typically resorting to kubectl, which I know a lot of people are not a fan of.

Benjie: That's not correct.

Marc: Wow

Benjie: kubectl, kubectl. The big mission of Kubelist is really getting kubectl to be what everyone says. Everything else you do is really cool, Lucas, but you definitely failed that question. We'll forgive you, though.

Okay, if I want to contribute to vCluster, how does the community work? Are there monthly meetings, weekly meetings? Where do I find you and how do I contribute?

Lucas: Yeah, we do have community office hours, and I think the first step should be joining the Slack channel. I think that's the easiest thing. Go to the website, hit the join Slack button. There are about, I think, 3,000 users already in that community. It's very vibrant, and it is the quickest path to surfacing an idea.

Discuss things quickly and then obviously formalize it in a GitHub issue, right? But sometimes it can be intimidating starting an issue or even starting a pull request on GitHub. So Slack is a really low-key way to ask questions, familiarize yourself with the project, get in touch with people directly, right, and essentially start the conversation around things. So I always encourage folks to join the Slack channel.

Benjie: All right, awesome. And then is there anything else about the community, any events coming up in the next two months, three months that you think are kind of interesting to highlight? We're probably going to air this after KubeCon Paris, but just anything that you want folks to come check you out on or anything like that?

Lucas: Yeah, I think KCD New York City. I think this is the first Kubernetes Community Days event in New York City. I think we're some kind of platinum or gold or whatever sponsor for it as well. I think that's going to be a fun event.

I'm going to be there myself, and there are also going to be other team members joining. I think this is going to be a really, really exciting event to join. I think it's in May, so it's definitely worth checking out.

Benjie: I should be at KCD as well, so I will make sure to put some Shipyard stickers up on the Loft kiosk, which I'm sure you're going to be good with. I'll just save some space there. I do that to Replicated whenever I can as well. Just kidding.

But, okay, great. I think I'm going to spend some more time with vCluster. I'm super interested in this stuff. I'm really excited about the full virtual cluster, and I'm really excited by your roadmap. I think it's really cool, and the fact that it seems like an easy path to the promise of multi-cloud excites me, I will say that.

Lucas: Awesome, cool.

Benjie: Thanks for coming on Lucas.

Lucas: Yeah, thank you so much for having me. This was fun.