Ep. #51, Performance Engineering with Henrik Rexed of Dynatrace
In episode 51 of o11ycast, Charity Majors and Jessica Kerr are joined by Henrik Rexed of Dynatrace. This conversation covers a wide array of software and observability topics including trends in performance engineering, insights on minimizing system complexity, the shortcomings of auto-instrumentation, and best practices for avoiding vendor lock-in.
Henrik Rexed is a cloud native advocate at Dynatrace. Prior to Dynatrace, Henrik worked in the performance engineering ecosystem for more than 15 years as a consultant and vendor advocate. He also launched the YouTube channel Is It Observable? and is one of the producers of the podcast PerfBytes.
transcript
Henrik Rexed: So my first usage of Kubernetes was through my previous role as a performance engineer, and at the time I was mainly dealing with partners and trying to build integrations.
At that time I had a close connection with Dynatrace, and Dynatrace was working on Keptn, one of the frameworks that provides continuous delivery and other use cases, and, yeah, they basically said, "Hey, let's build an integration."
And then I think, "Hey, I'm not afraid about it, let's code!" And then I discover a few things, I say, "Wow, okay. That's different from what I experienced."
Yeah, so basically the first week was a lot of sweating, a lot of swearing, a lot of nightmare, a lot of drinking and at the end, yes, the next week I was confident.
Or maybe I was drunk, I don't know. But at least I was confident.
Jessica Kerr: This sounds like a good time for you to introduce yourself.
Henrik: So yeah, my name is Henrik Rexed. I am a cloud native advocate at Dynatrace. Prior to Dynatrace I worked in the performance engineering landscape for more than 15 years, as a consultant and then as an advocate for a vendor.
Outside of that, I launched a YouTube channel around six months ago to help the community get started with observability topics, more on the technical side.
Also, because my heart is still with the performance engineers, I still produce content for performance engineers, and that's why I'm one of the producers of a podcast called PerfBytes as well.
Charity Majors: And what did you say was the name of your YouTube channel for observability?
Henrik: It's Is It Observable?
Charity: Is It Observable? Excellent, we will put that in the show notes.
Henrik: The name was inspired, I don't know if you remember but a few years back there was a show where there was a doctor saying, "Will it-"
Jessica: Will It Blend?
Charity: Yeah, Will It Blend? You should have such great stickers that mimic that, like Is It Observable? Will It Instrument?
Jessica: With a blender.
Charity: Yes. Because what is production, if not the worst blender ever? Whatever you feed into it, it's just going to get chopped up and spat back out.
Jessica: So I see there are videos on things like OpenTelemetry instrumentation step by step and how to collect metrics in Kubernetes and how to build a Prometheus query something, something client, something.
Charity: Yeah. I was really excited to have you on because you and I were on a panel together a week or two past, and while we were doing that it occurred to me just how little we've really heard from performance engineers in the modern observability movement.
And this is near and dear to my heart because I used to identify very much as a database performance engineer, and talk about people who know about instrumentation and the value of being able to see what the hell your code is doing.
Performance engineers were kind of the original observability engineers, weren't they?
Henrik: I agree. The requirements when you do a load test, when you stress a system, obviously the amount of data that you need to understand precisely what is going on is completely different from a normal production system.
Usually you can pick up metrics every 30 seconds, one minute or so, but then there is this aggregation and, as a performance engineer, when you start saying, "We're going to aggregate."
Then I don't know what happens but people turn red and angry because you need details, and aggregation is the opposite of details.
Charity: Yeah. I have so many stickers that are like, "Aggregates are the lies the Devil tells you."
And many other nasty... Yeah, I fucking hate aggregates. Yes, they're cheap. Yes, they're simple. Yes, they're useful for looking at trends over time.
But if you actually want to understand your code you have to look at it request by request, hop by hop.
All of that context, as much context as possible otherwise you can't correlate like, "These errors, what do they have in common?"
The other problem with metrics is the cost goes up linearly with the number of metrics that you try and collect and every metric is a question, and you can't ask multiple questions of your metrics because of the way they're stored on disk.
You really need those wide structured events really wide, like hundreds of dimensions per request, in order to tell, "Oh, so these errors are all of the ones from this ID or this type of device and this language pack and this region and this shard key and all these things."
Because if you can't chain together all of these high cardinality dimensions you can't describe these very precise conditions under which a bug is reproduced or seen.
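The idea of asking "these errors, what do they have in common?" over wide events can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the events, the field names (`device`, `region`, `shard`, `lang`) and the helper `common_dimensions` are all hypothetical and do not come from any particular observability product.

```python
# Hypothetical wide events: one dict per request, each carrying many dimensions.
events = [
    {"status": 500, "device": "android", "region": "eu-west", "shard": "s7", "lang": "fr"},
    {"status": 200, "device": "android", "region": "eu-west", "shard": "s2", "lang": "fr"},
    {"status": 500, "device": "ios", "region": "us-east", "shard": "s7", "lang": "en"},
    {"status": 200, "device": "ios", "region": "us-east", "shard": "s1", "lang": "en"},
    {"status": 500, "device": "android", "region": "us-east", "shard": "s7", "lang": "de"},
]


def common_dimensions(events, predicate):
    """Return the {field: value} pairs shared by every event matching predicate."""
    matching = [e for e in events if predicate(e)]
    if not matching:
        return {}
    shared = dict(matching[0])
    for event in matching[1:]:
        # Keep only the fields whose value is identical in this event too.
        shared = {k: v for k, v in shared.items() if event.get(k) == v}
    return shared


errors_share = common_dimensions(events, lambda e: e["status"] >= 500)
print(errors_share)
```

With this made-up data the failing requests differ in device, region and language but all sit on shard `s7`, which is the kind of answer that per-metric aggregates cannot give because the dimensions were never stored together.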
Henrik: Yeah. In fact this is a big trend with the performance engineers is let's get the raw data, as much raw data as possible.
Especially when you deal with your tests, so then you have enough material to understand, because when you deal only with averages, if you have a very low response time and then you have spikes, then if you start averaging over those spikes, you don't even see them. It's like *fwip* disappeared.
Charity: Exactly.
Jessica: So you miss the most interesting pieces of data.
Charity: The most interesting pieces of data are always the outliers, right? Even 99.99th percentiles can cover over a whole bunch of sins, right?
You really need to be able to see the max and the min and, in and of themselves, they are always interesting.
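A small Python sketch of how an average, and even a high percentile, can hide spikes. The latency values are invented for illustration, and `nearest_rank_percentile` is a hypothetical helper using the simple nearest-rank definition, not any vendor's percentile implementation.

```python
import statistics

# Hypothetical response times in milliseconds: 998 fast requests and 2 spikes.
latencies_ms = [20.0] * 998 + [5000.0, 8000.0]


def nearest_rank_percentile(samples, p):
    """Nearest-rank percentile for an integer p between 1 and 100."""
    ordered = sorted(samples)
    # Ceiling of p percent of the sample count, using integer arithmetic.
    rank = max(1, -(-p * len(ordered) // 100))
    return ordered[rank - 1]


summary = {
    "mean": statistics.fmean(latencies_ms),
    "p99": nearest_rank_percentile(latencies_ms, 99),
    "min": min(latencies_ms),
    "max": max(latencies_ms),
}
print(summary)
```

Here the mean comes out at roughly 33 ms and the 99th percentile at 20 ms, so both look healthy, while the max shows that someone waited 8 seconds. Only the raw samples (or at least min and max) reveal the outliers.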
Henrik: Yeah. I think when you deal with production, a lower granularity makes sense because, as you mentioned before, you're looking for trends, you're looking for patterns, so that could do the job.
But when you're testing, obviously you're spending hours or you're spending a lot of investment in saying, "We need to stress this and validate this."
And then I think it would be a big shame to say, "Okay, I did my test but I have no clue what happened."
Charity: What's the point?
Henrik: I'm pretty sure any product leader would just be very, very angry about it.
Charity: Yeah. I think that the shift here from performance engineering, it's very much about the health of the system, right?
Like what happens to the system while all these things are happening? Versus observability I think of as being very much about every single user is a test of their very own and you have to look at the experience, and end users are the highest possible cardinality for the most part in your data set.
But they don't actually care if your system is 99.99% up, if the shard that they happen to be on is down, even if it's like .001% of the traffic.
This is the thing about distributed systems, the more complex our systems get, the more there are these little corners lurking everywhere where for 100% of people it sucks and it's completely drowned in the overall numbers and the overall statistics and you have to be able to slice and dice by every possible dimension in order to locate them.
Henrik: This reminds me of a story, in fact, to be honest. It was in 2003 and at that time there were few profiling systems.
I mean CA Wily was there for many Java environments, and I remember we were using the Mercury/HP suite at that time, and HP came up with this solution called HP Diagnosis. I remember that.
And we were so excited to say, "We have it part of our license. It's amazing. It means we're going to do traces so we're going to be able to drill down on everything." Just give that to the hands of a performance engineer, you see they will be excited, and then they launched the test and then they discover a reality. The real world.
When you do profiling, at that time at least, because things have improved significantly on that side, but I remember we were doing a test and even with five users the system was not usable at all.
The profiling system was slowing down everything. It was a nightmare, so then we said, "All right, profiling is nice on paper but in real life it's completely different."
Charity: Just like trying to attach GDB to a running thread or a running trace, you just can't do it.
The amount of information is just going to drive your system to a halt. I want to circle back to Kubernetes for just a sec though, because we at Honeycomb, we just finished migrating from the stuff that I set up six years ago, which was a lot of autoscaling groups and VMs and everything, to using Kubernetes for stuff.
We did this rather reluctantly, there was a bit of like, "Oh fuck. Okay, it's time." And yes, there are some technical reasons, like we'll get spikes in traffic and we weren't able to respond within seconds, we were only able to respond within minutes and scale up.
So Kubernetes solves some real problems for us, but we keep stressing the fact that it's not about Kubernetes.
People keep coming to us asking for Kubernetes monitoring solutions and everything, and it's like, "Okay, we can give you that. But that's not what you care about. What you care about is the performance of your application for the most part, right? What you need to be instrumenting is your application."
And yes, the underlying infrastructure, yeah, sure. But the stuff that you need to care about is your crown jewels as a company, the code that you're changing every day, the code that your users use, the code that your team is supposed to know intimately. Infrastructure should be as boring as possible, right?
Jessica: It's like the weather, you need to know what the weather is to know whether your car should maybe go slower than usual and maybe that's due to there's ice on the road.
You need to know that, but it's not what you're going to change. It's not what makes your business special.
Charity: You need your car's dashboard, you need all the information about how fast am I going, am I slowing down, am I speeding up, you need these rich feedback loops.
Jessica: So you can make decisions about driving the car.
Charity: So you can make decisions, exactly.
Henrik: Yeah. But if you were driving a diesel car since, I don't know, many, many years and suddenly you shift to an electric car and then suddenly you discover that you have to charge it every night. It's the same thing, it's a change of mind.
I think Kubernetes is great, but you deserve to understand... It's like there is no magic without tricks behind the scenes, so you have to understand the tricks that are happening behind the scenes to handle them in your day-to-day job.
I think you make a good point. In the company, in the organization, there are people that are mainly focused on applications, users, business, so they will be very, very keen on looking at how is my code running, how satisfied are my users and so on.
Then there is another side of the organization which operates clusters for several, several, several projects and they are-
Charity: Your platform teams?
Henrik: Yeah. And those, they have other expectations.
Charity: I come from that side, right? So I'm not trying to diss on my people when I say this, that it's increasingly a commodity.
Every year we move further up the stack. When I was starting out I had to get a taxi to go to the colo in the middle of the night to flip the power switch on the MySQL server when it went down.
I don't think about that anymore. Nowadays even people who work on Kubernetes don't really think about Linux anymore, right?
And we're moving up and up the stack, which is good because it means we can do more powerful things with fewer people and less time.
But I think it does mean that if you're not solving infrastructure problems for the world... If you're solving infrastructure, then those are your...
If you're solving other business problems, it's your responsibility as a platform team to be as cheap and as boring as possible, and to not get dragged down into the weeds.
To the extent possible, which is always the caveat, right? Any time you're doing something new or interesting you will have to understand what you're doing.
Henrik: But I think the main problem is that a few users, when they start using Kubernetes, suddenly say, "Oh, it's completely different."
And then they suddenly forget, "Oh, Kubernetes runs on servers or virtual machines, so it means that the constraints that I had a few years back, I will still have them in Kubernetes." Yes, that's true. "Oh, I didn't know that."
And people just forget about this. You could build services that will allocate ports through NodePort or ClusterIP services, and in the end you're running on a machine, and this machine has a limited number of ports and a limited number of IPs.
So if you don't keep track of that, then one day you just deploy a new workload and it says, "There is no port available."
Charity: This is one of the reasons that I think most people should not be running Kubernetes themselves.
They should be using Amazon's Kubernetes or they should be letting people who do Kubernetes for a living do their Kubernetes.
Once you've exceeded a certain amount of scale and have custom problems, this doesn't hold true. But for most people starting out... and this is why I get a little pissed off about Kubernetes, because I feel like it was this resume-driven development for a long time, where it wasn't actually solving real problems for most people who adopted it.
But it was the cool thing so they wanted to adopt it so that they could get better jobs where they were using it, it just irritates me.
Henrik: I think what I like with the notion of Kubernetes, I mean Ansible has introduced it, is the notion where I can code my app and then I can design through code how I want to deploy that, how I want to manage it.
I think it's just great and it helps you to automate, it helps so many things, so I think we are giving a great power into the hands of users so they need to understand those powers and handle the responsibilities of the powers.
Charity: Well, that's the thing. Yes, it is very powerful and you do need to understand a lot in order to use it well, which seems like a step backwards honestly, from the idea of having these composable infrastructure blocks that you can use without having to dedicate your life to becoming an expert in them.
Henrik: I think it reminds me of a few years back when the clouds were starting, everyone said, "Oh, it's wonderful, everything scales automatically, it's less expensive, great, great, great."
And everybody was drinking the marketing messages from the hyperscalers and going straight to the cloud, and then a few months later they said, "Oh, in fact it's expensive."
Uh, yeah. But at least you can take one server for an hour and just delete it, I think that's a luxury so it's the same thing.
You need to understand the consequence of such a service and then manage the consequence of it.
Charity: I completely agree. You're already one of my favorite guests because you're not scared to disagree when we say something.
That's fantastic, people are always too quick to agree when they go on podcasts and it's not good.
Switching gears slightly, I saw something on your LinkedIn that I have to ask about. What is a Christmas performance engineer?
Henrik: So a few years back I started a conference on behalf of my previous employer, Neotys, which has now been acquired by Tricentis, and we wanted to create a conference that would bring together all the performance engineers.
So we decided to create the conference with the name Performance Advisory Council, and we did a live event, and we decided that for the live event, why don't we do a 24 hours live conference where we start from 6:00 AM in local French time and we start with Australian speakers, and then we follow the sun.
We started with New Zealand, then Australia, then India, and then suddenly we went to Europe, then once we covered Europe then we move onto East Coast and then finish with the West Coast.
So we stayed up for 24 hours. I was the one moderating the conference for the 24 hours, in fact a bit more, and we had 22 speakers, and I moderated four editions of this conference.
In every edition we did a different theme. The first one we were a bit shy, so we used the Tron theme.
For the second one, because the conference was called the PAC, we did PAC To The Future.
Then the third was Jurassic PAC. And last we did the PAC Heroes, so there was a rock and roll rockstar concert, and that was my last event with Neotys.
Charity: But where does Christmas performance engineer come in?
Henrik: When the COVID pandemic arrived, we thought we were losing track with our community, so let's do every month, once per month we do a meetup where we organize and we bring eight experts, like a round table, and I was moderating this.
Then we were inviting people for one hour and a half, discussing about their different topics, performance in IoT, performance or whatever for user experience.
The first one was during Christmas, so that's why I changed the role, just to announce that essentially I'm a Santa.
Charity: I saw that and immediately assumed, "Ah, the biggest load of all is over the holidays, so it's like a special role where all year they're preparing for Christmas load."
Henrik: No, no, no. It was really about promoting the assets because I also changed my role for Jurassic PAC. I call myself The Doctor.
Charity: I got it. Well, that's very cute but disappointing. Now I want this to exist in the world. Wouldn't that be cool, Jess?
Jessica: No, no. I'd rather be a Christmas drunkard or something.
Charity: Well, it doesn't mean you need to be performance engineering on Christmas. It's so that everybody gets to be drunk on Christmas, you do all the performance engineering in advance so that...
Because one of my pet peeves is how everybody has this self inflicted disaster over the holidays where they're like, "Cool. It's after Thanksgiving, let's freeze deploys for a month." And then they keep writing code, it all rots.
Jessica: Depending on how much you enjoy spending time with your family over Christmas, it can be a positive to have pages to answer.
Charity: I'm not saying do stupid things, I'm not saying... But saving up all of the changes for weeks or however long just guarantees that as soon as you turn everything back on...
The worst outages of my life have all happened right after the holidays when they unfroze a freeze.
Jessica: Right. The thaw, that's kind of Jurassic Parky, what will you find in the amber in January?
Charity: It's kind of Jurassic Parky. But the point is that you shouldn't be freezing deploys, you should be thoughtful about what you merge, but as soon as you merge it you should assume that it's going to go live quickly. If you don't want to ship something, don't fucking merge it.
Jessica: Code freezes are one of those things where it's like each individual decision of, "No, I'm not going to ship this now," is probably not bad. But in aggregate it's bad.
Henrik: But the code freeze, I remember that was mainly when we were dealing with Waterfall. I don't remember seeing code freeze.
Charity: People still do it.
Jessica: They totally do it.
Charity: Everybody still does it, especially for between Christmas and New Year's. Two weeks almost everybody does coding freezes.
Jessica: Also before a big release they'll do a code freeze.
Charity: Yeah. I want to reiterate, I'm not saying keep deploying like anything and everything, I'm just saying that it rots when it's in the codebase and it's not live. You should assume that whatever is checked into git is live within minutes, an hour at most.
Jessica: And it's going to rot on your branch too, but then you know it's rotten.
Charity: You know it's rotten and you treat it with the appropriate care, you get a code review.
Part of why this is so bad, I think, is because it encourages so many bad habits around deploys in general because it encourages you to batch a bunch of things up.
That's terrible. You should be deploying one person's changes at a time, right, so that you can tell what actually happens, you can roll back.
Otherwise it breaks and you're like, "Okay, 13 people committed to this. Now everyone's afternoon gets ruined."
Jessica: Yeah. And you can page all of them and they'll all be like, "It's probably someone else's."
Charity: Or almost worse, they all jump in and then they'll all blow their entire afternoon.
15 people just lost an entire workday because one person's change didn't work. Also, I'm trying to drive this idea that the CD in CI/CD does not stand for Continuous Delivery anymore, it stands for Continuous Deployment, because what the fuck is the point of CI, what the fuck is the point of Continuous Integration, if you're not preparing it to be in production?
Don't say, "Well, we can deploy at any point." That's bullshit, you know you don't know if it works or not until it's in production. I love that quote that shipping is the heartbeat of your company and it should be as regular and as boring and as automatic and as not a big deal as a heartbeat.
It should happen in the absence of action. You merge, it happens. You shouldn't need somebody to go, "I'm going to deploy now."
Trot, trot, trot. That's just such an anti-pattern I don't understand why people are still doing it in this day and age.
Henrik: Beware, I'm pressing enter. Beware.
Charity: The key press that brought down the world. Let's talk about OpenTelemetry and what it means to be a good contributor to the open source ecosystem.
Henrik: Anyone can contribute. I think the problem is that people mainly think contributing to an open source project is mainly code, mainly technical aspects.
That's true, there's a lot of technical stuff to deal with. But outside of just coding there is a lot of other stuff, like documentation.
Look around, there are a lot of open source projects where I think the documentation could really be improved to increase adoption, so people writing really good documentation could help a lot of projects.
Also, providing training content, helping customers to adapt, so I think it's just about not only technical stuff.
Jessica: And it's not only the documentation on the site, it's not only the official OpenTelemetry documentation, for instance.
Your YouTube channel with the videos about OpenTelemetry is contributing to OpenTelemetry.
Henrik: When I first made a step into OpenTelemetry, my first reaction was to say, "Okay, so which repo?"
Because when you just look at OpenTelemetry there is one repo for every single agent, then there is the OpenTelemetry Collector, then there is another one for the operator.
Then you start to say, "Okay, okay, okay. Where am I going to start?" We are engineers, we work on it, we're not afraid of spending time and trying things, but I'm pretty sure that if you're under very high pressure and you need to provide results to your company and then someone says, "Hey, we need to implement OpenTelemetry," I think that could be very stressful for someone who has never looked at the community itself.
So I think we need to have people helping, because I think OpenTelemetry is awesome: a standard that a company can rely on, which means if today I instrument my code and decide to use one of the observability backend vendors on the market, and three years later I say, "Oh, there's a new one, I want to change," then the effort of switching is very minimal. I think that's great news.
Charity: It's a huge step forward. We've been needing this, I remember thinking about this eight years ago like, "Why are we so behind as an industry when it comes to standards for instrumentation and telemetry and how do we do this?" People were asking us all the time, "How do I instrument my code?"
And I'm just like, "Oh my god, we're starting from scratch with every single person because they've all been brought up on the lie that automatic instrumentation will save their life, and it won't."
Auto instrumentation is about as good as automatic commenting for your code, it's not nothing. It doesn't translate intent.
Jessica: It'll get you to the hospital, but it won't apply the defibrillator.
Charity: Yeah. It's not targeted. When you're writing the code you know what matters, you know why you did it, you know what's going to be useful to you debugging five years from now and you have the capability right there to capture that original intent and embed it in your code for future generations to thank you for it.
And relying on automatic instrumentation, it's just a very blunt instrument.
It's a great one for getting started, for bootstrapping or whatever, but it's a blunt instrument that will never actually translate your intent in the code.
Henrik: Agreed. I tried to look for some materials on this, and I didn't find any that have been used so far as methods within organizations because, again, the project is still new.
But I think it would be great to have a process for companies saying, "Okay, first I am doing some testing, my app is not in production, so let's instrument at least my test cases so then when I do my testing I can at least look at the distributed traces that are being produced. Do I have any details to help me to troubleshoot my issues related to my tests?"
And then you adjust, you improve. I try to find a way of saying, "Let's have, at the end of a CI run or a pipeline, something that calculates a tracing coverage," say, "Out of all the tests that you had, you had 90% of distributed traces generated for your test cases."
And normally the test cases should be covering at least most of the risk, most of the areas of the application. That would be a way, at least before moving on to production, of keeping track of how good you are at instrumenting your applications. Because I think now, like you mentioned, we all start with automated instrumentation agents, which are great, but again those will only be instrumenting existing frameworks.
But then potentially out of the framework you have a very critical method that your business is doing the calculation of prices or a payment method or risk level of a customer.
And maybe you would be interested in also instrumenting that part, because if it fails, then in terms of business or risk for the company it's much higher than just keeping track of whether my instrumented method has been opened or not. You know what I mean?
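The "tracing coverage" idea Henrik describes, a number computed at the end of a pipeline that says what share of test cases produced distributed traces, could be sketched roughly as follows. This is a hypothetical illustration: the function `tracing_coverage`, the test IDs, and the assumption that each trace can be mapped back to the test that triggered it are not part of OpenTelemetry or any CI tool.

```python
def tracing_coverage(test_ids, traced_test_ids):
    """Fraction of test cases that produced at least one distributed trace."""
    tests = set(test_ids)
    if not tests:
        return 0.0
    return len(tests & set(traced_test_ids)) / len(tests)


# Hypothetical pipeline data: 10 tests ran, 9 of them produced traces.
tests_run = [f"test_{i}" for i in range(10)]
tests_with_traces = [f"test_{i}" for i in range(9)]

coverage = tracing_coverage(tests_run, tests_with_traces)
untraced = sorted(set(tests_run) - set(tests_with_traces))
print(f"tracing coverage: {coverage:.0%}, untraced tests: {untraced}")
```

In this made-up run the coverage is 90%, and the list of untraced tests points at the code paths (for example a pricing or payment method outside any auto-instrumented framework) that may deserve manual instrumentation before going to production.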
Charity: Yeah. I know, totally. I feel like there's a split between black box monitoring of that type, very much it reflected the division between dev and ops, right?
You've got developers who write the code, you've got ops people who run the code, so all they can do is sit on the outside of the house knocking on the door going, "Are you okay? Are you okay? Are you okay?"
Jessica: How's the weather in there?
Charity: How's the weather in there? Right? When what you need is to blend those and to be inside the house, going, "Okay, here's where the furniture is, here's where the doors are, here's what's important."
Henrik: And also, moving to the metrics, I'm really excited about metrics and going through OTel with metrics because now we're able to... Of course, we've been ingesting metrics for years and ages.
I mean we are used to that, but for maybe six years we have been starting to include more context in the metric, so then the metric itself is not just a data point on a graph, it has more meaning.
I think being able to produce metrics on a code level, to attach it to a specific context through a code execution, I mean that is just amazing.
You would then use the data differently.
Charity: I agree. Although I think that honestly metrics are a very mature technology, they've been used in production for, what? 30 years now?
And I feel like we've really come to the end of the road in terms of innovation with metrics, and I think the future is really about wide events, where you append a couple of structured fields and now you've got traces, where they bundle up all that context. Because of how metrics are stored on disk, there are limits to how much cardinality they can afford. Capturing tags is incredibly expensive.
The cost goes up linearly when you're adding more metrics. I just think that it's a dead end of a data model at this point and I'm really looking forward to seeing what we do.
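One way to see why tags get expensive: in tag-based metric stores, each distinct combination of tag values is typically stored as its own time series, so the worst-case series count for one metric is the product of the tag cardinalities. The sketch below only illustrates that arithmetic; the tag names and counts are invented, and `series_count` is a hypothetical helper.

```python
import math


def series_count(tag_cardinalities):
    """Worst-case number of distinct time series for a single metric name."""
    return math.prod(tag_cardinalities.values())


# Hypothetical low-cardinality tags on a request-duration metric.
tags = {"region": 5, "status_code": 6, "endpoint": 40}
low = series_count(tags)

# Adding one high-cardinality tag such as a user ID multiplies everything.
high = series_count({**tags, "user_id": 100_000})

print(low, high)
```

With these made-up numbers, 1,200 series become 120 million once `user_id` is added as a tag, whereas a wide event simply carries the user ID as one more field on the request.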
Jessica: I like that trend about including more context in the metrics in order to hook them up with the corresponding traces and spans, because I want to know, when I turned this particular corner, was it wet? OpenTelemetry does this with the resource.
Henrik: Yeah. And then you can just search by your trace ID and then you get the metrics attached to it.
I want to say, this particular trend related to... Let's say I have a range of trace IDs and then I search for that, and then I get the distribution of the metrics related to this.
I see a lot of useful implementation where I have a bunch of people that had problems where I extract all the traces and then I search for the metrics related to this that will be a way of just... Voom.
Charity: Yeah. But see, if the metric is embedded in the trace, then it's basically just a key-value pair in the event, and for that, two thumbs up.
What I'm against is the metrics that are disconnected from the connective tissue of the event.
Henrik: No, no. It needs to be connected.
Charity: Okay. Then we're completely on the same page.
Jessica: Okay, yeah. Metrics in wide events of, "I'm turning this corner, I emit the event, I include whether it's wet." And then I don't have to have a detailed weather report for every corner.
Charity: Yeah. And I think this is where people get confused because I've tried to be very clear, whenever I say the word metric, I'm talking about the number with tags appended, and not the event because otherwise people get really confused.
Before we run out of time though, I wanted to talk about this from a slightly different angle, which is the thing I was ranting about on Twitter last week, which is that certain members of the OpenTelemetry... A lot of vendors have gone in on OTel, like us, like Lightstep, like New Relic, like Splunk, and it should be better for everyone because if you instrument your code once then with just a couple of quick tweaks you should be able to try out different vendors, right? This is great for people.
You don't have to suffer vendor lock-in, you get to try out different vendors, you make them compete for your business on functionality and effectiveness, but the problem is that right now Datadog is the main offender, where they're telling their people that it's deprecated, that it's not stable, that it's not good.
First of all, these are integrations that were based on the Datadog integrations, which they announced with great fanfare like two or three years ago.
They were like, "All right, integrations." And now they're telling people behind closed doors not to use them, that it's not safe.
There was even a guy who had a pull request with the Datadog collector which would let people move back and forth from Datadog, and the sales team behind closed doors pressured him into retracting it.
They're willing to support, "support", OTel in order to get data in from other vendors but they're not supporting OTel to let people move freely.
They're trying to get everyone into their walled garden and then get them on their own custom integrations so that they can't move, and I find that so offensive.
Henrik: I think if they keep that position, the market will see it and it won't be very positive for them.
Charity: I hope so, they're a dominant player in the ecosystem now, so...
Henrik: Yeah. But I think now most people know the value of OpenTelemetry and most of the vendors are trying to improve it, to bring value, and what I'm really focused on is making sure that the adoption is there, because in the end, if everyone adopts it, then it's a win and we are able to move and improve and cover other stuff. It's just great.
Charity: This is a situation where users are going to have to hold companies accountable, right?
It's the same any time you have an open standard or open source whatever, some kids are going to think that they can get all the upside and none of the downside, but at the end of the day you just have to say, "Is this acceptable or not? Are we going to put up with this behavior or not?"
Henrik: I think at the moment the community is trying to cover at least the features that will help people adopt it and make the use cases work properly in their environments.
But I'm pretty excited to see what's going to be next because I think there are a lot of things, especially with logs. It's very time-consuming to build log pipelines, ingest the logs and do it in a smart way.
Charity: Man, logs need to just die in a fire, as far as I'm concerned. I'm sure logs should just be end of life, like nobody's allowed to emit another unstructured log ever.
Henrik: But I think the logs, I'm a big fan of logs to be honest, because it happens very often that the level of detail that you're looking for to be able to understand a problem precisely is only available in logs.
Jessica: That's not like you like logs going forward. That's like you like logs in the past. Is it impossible to put that detail in--?
Charity: I think we're on the same page, I think we're just using different terms.
What I am advocating is for these arbitrarily wide structured data blocks, you can call them logs if you want to, you can call them events if you want to, you can call them traces if you want to. I don't care.
What matters is they're arbitrarily wide, they've got lots of context and it's oriented around the request of the user, right?
If you're going to do that in an unstructured way, that's just going to be a tire fire, it's going to be a mess, right? It's got to be structured, and as soon as it's structured I don't call them logs because I associate logs so much with unstructured strings, I guess. So this is probably just a terminology confusion.
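To illustrate the structured-versus-unstructured distinction made above, here is a small sketch. The log line, the field names, and the `slow_failures` helper are invented for illustration and are not tied to any particular logging library or pipeline.

```python
import json

# An unstructured line: a human can read it, but a machine must guess
# at its layout with fragile string parsing.
unstructured = "user 42 checkout failed after 1250ms: card declined"

# The same information as a structured record (hypothetical field names).
structured = json.dumps({
    "event": "checkout",
    "user_id": 42,
    "status": "failed",
    "duration_ms": 1250,
    "error": "card declined",
})


def slow_failures(lines, threshold_ms):
    """Return parsed records that failed and took longer than threshold_ms."""
    hits = []
    for line in lines:
        record = json.loads(line)
        if record["status"] == "failed" and record["duration_ms"] > threshold_ms:
            hits.append(record)
    return hits


hits = slow_failures([structured], threshold_ms=1000)
print(hits[0]["user_id"], hits[0]["error"])
```

Whether such a record is called a log, an event, or a span matters less than the fact that every field can be queried by name.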
Henrik: That's true. I think the main problem with logs is this: I've been using Fluentd, then I looked at Fluent Bit, and I said, "It's not the same way of building a pipeline."
I mean, it's almost the same way but you have to relearn or look at the new plugins available. Then I looked at Stanza and Stanza has another approach, and then I say, "Whoa."
So I think if Stanza is going to be a part of the collector, so if the OpenTelemetry Collector brings an agent that will be the standard, that would just make my life easier.
I don't have to one day be saying, "Oh, I'm on bare metal, Fluentd is better. Now I am in Kubernetes, I need Fluentd or Fluent Bit."
And then people just go crazy because they have to manage different formats, different things. It's just crazy.
Jessica: And meanwhile, OpenTelemetry being part of a larger community, this is where the innovation is going to happen.
Henrik: Completely. I mean, open source for me, if you want to do innovation today you can do it in your basement, in the dark, and then come out into the sun and say, "Hey, I came up with this."
And then you wait for feedback and then suddenly you realize after two weeks that all you did during two weeks in your basement was wrong.
I think when you deal with open source, first of all, you're not in the basement anymore, you're open with the other people, you discuss and you can have instant feedback. When someone says, "Hey I've been working on this. You should now do this and do it differently."
And I think that's a way of improving the framework in a very fast and efficient way, and I think being part of open source is a way of always being aware of what happens, what will happen soon or maybe in one year.
If you're always in the forefront then you're always going to know what's going to happen tomorrow because you are a part of that community, you are a part of that innovation.
Charity: Well, that was quite a pitch for open source. I think we should leave it there. Thank you so much, Henrik, for joining us. This was really fun.
Henrik: This was really fun. Thank you for inviting me here.
Charity: Yay.