Ep. #75, O11yneering with Daniel Ravenstone and Adriana Villela
In episode 75 of o11ycast, Daniel Ravenstone and Adriana Villela dive into the challenges of adopting observability and OpenTelemetry in modern organizations. From breaking down vendor loyalty to getting developers to see the value of instrumentation, they explore how to foster a culture of curiosity and collaboration to make observability a core part of development.
Daniel Ravenstone is a Staff Engineer at Top Hat with over 25 years of experience in observability and monitoring. He is a passionate advocate for OpenTelemetry, helping teams integrate best practices in observability to improve system performance and reliability.
Adriana Villela is a Senior Staff Developer Advocate at ServiceNow (formerly Lightstep), where she helps companies improve system reliability through observability, SRE, and DevOps practices. She is also a Cloud Native Computing Foundation (CNCF) Ambassador, host of the Geeking Out Podcast, and a frequent contributor to OpenTelemetry.
transcript
Daniel "Dan" Ravenstone: The problem is, is that we sometimes when we go out there and we talk to people, we get either some of the other engineers on board and they figure it out, but you still got to fight with the C level, the executives, management.
But then sometimes it's the reverse. You have C level involved, but everybody else is like, "I don't have time for that," because they're getting mixed messages or what have you.
And I think our conversations are great. We all have these conversations with each other, and we kind of already know the challenges we're experiencing. We all kind of like, "Okay, yeah, I get it. I know what you're talking about."
But we're not bringing the outside folks in to say, "Hey listen, this is your problem. This is more your problem than ours, because we know the solution.
And we're presenting it to you, but you just don't see it yet. You don't see the value of why you should be doing this. It could save you money, it could get you better, more customers or keep customer retention."
All the other little things that people kind of forget about why we do this in the first place. I think the challenge is, really, for us to get away from us talking to each other and rope these C level folks in and say, "Hey, tell us why you don't want to do this. Tell us why you don't want to be the best of breed within your industry."
How do we get them to see the value of what we're talking about day in and day out? And that's why I like these recorded functions because it just allows for us to say, "Hey, just watch this."
I mean, I did get, actually with Adriana's video. She did a talk at Monitorama two years ago, the Anti-patterns of Observability. And I thought this is a genius way of actually kind of explaining, "this is the bad stuff to do with observability, so don't do this."
And that kind of helped introduce people who weren't even peers. And then you're trying to get into their head around, "Oh wait, so well I've been doing everything wrong." And they're just trying to think back, "Maybe I should start doing this better."
Anyway, that's introduction to how I met Adriana a couple years ago. I feel very blessed with that, thank you so much. I mean, I've learned so much from her already. So yeah, I'm going to stop ranting now, and we can kind of carry on with the show.
Jessica "Jess" Kerr: That's great.
Austin Parker: I was talking to Kelsey Hightower earlier this week, and he raised this really interesting point, 'cause we were talking about how the model of open source has changed so much over the past 10 years more or less.
And I think there's good and bad reasons for this change, but fundamentally what's changed is that people don't see open source, at a lot of the C levels, a lot of the decision makers don't see open source as an alternative way of building software. They see it as another type of product.
And I think that OpenTelemetry kind of has fallen into this rut of everyone is approaching it as a product and not necessarily as this framework that enables you to make changes in how you fundamentally think about how do your computer systems talk to each other, right.
Jess: There's a lot of nodding going on.
Austin: Yeah, I think this is going to be a heavy nodding episode.
Jess: Nodding about OpenTelemetry.
Austin: Yeah, nodding about OpenTelemetry.
Jess: Dan, you started to introduce Adriana, but who are you?
Dan: Who am I? I'm a crazy person who talks about OpenTelemetry and observability all the time. No, I'm not, no.
Jess: Welcome to the crowd.
Dan: I feel like I'm in good company here. So I've been doing monitoring and observability for decades now. I cut my teeth way back in the day learning how to work with Nagios, Cacti, OpenNMS, Big Brother.
I was asked to build out a NOC before they were actually kind of big. So a kind of cool way to learn about monitoring, that's where I kind of fell in love with it.
Jess: A NOC like a network operations center?
Dan: Yeah, which is like, we don't use anymore unless you're in telecom business, right? It kind of taught me this is the underdog role in operations, because nobody respected the NOC operator or NOC analyst.
But you were always that first person to actually identify the problem and bring it aboard. And as things started to grow and progress, we started getting better tools, we started learning more.
And then as our infrastructure and everything else got more complicated, there was still that need. And okay, so monitoring kind of had that stagnant period, where there was a lot of anger about it. We won't get into that, but then.
Jess: Monitoring had a stagnant period with anger? Okay, yeah.
Dan: Yeah, yeah, remember the pre-monitoring love hashtag?
Jess: Was it like a teenager?
Dan: Yeah, yeah, it was a teenager, yeah, yeah. It was going through that angst period, "I don't want to do this anymore."
And then we came, things actually started to blossom, and we got new tools, new concepts. And then we started seeing, things started working with, well, the infrastructure and the services we were providing.
And then we started into tracing and other fun things. And then eventually we even got into the whole concept of observability, this whole thought process. And I fell in love with that too.
And one of the things I think was one of the greatest, I knew we were on the right track, is when they took OpenCensus and OpenTracing and pushed 'em together and said, "We're going to just do one."
Which for once, I mean, how hard is that? But yet it is so hard. Yet somehow this community has made that happen. So we now have OpenTelemetry, which is great.
So I've been an advocate obviously for best practices and doing things properly for a long time. And I'm constantly learning new things as well because it changes every day, and there's new stuff coming out, and we're growing and expanding.
And I feel like we're way ahead of the other folks in this area. But I love talking about it, I like teaching people about it. I love talking to others about it as well.
Yeah, I currently work at Top Hat as a staff engineer. So just, I guess it means I'll tell you where I am now. Where I'm going, who knows?
Austin: Did you ever tell us your name?
Dan: Dan Ravenstone. O11yneer.
Jess: Nice.
Dan: A title I'm trying to coin for the rest of the world, yeah.
Austin: O11yneer? I like that.
Jess: Like, N-E-E-R?
Dan: O11yneer, yeah.
Adriana Villela: It almost sounds like you're going on an adventure.
Dan: Aren't we though?
Adriana: Yes. So true.
Austin: Sounds like an Imagineer.
Dan: Yes, it is borrowed from Imagineer. Because I used to be a big animation Disney fan, and they took Imagineer, and I went, "I can do this." And I also skateboard, so it has a sort of a double meaning to me.
Austin: Oh yeah. The maddest I've ever been was when I learned that Imagineer is a protected job title at Disney.
Dan: I know. I was so frustrated by that. It's trademarked, you can't touch it.
Jess: Yeah, it took something good and turned it into a product.
Austin: Hey. That's what we would like to call a callback.
Jess: Adriana, who are you?
Adriana: Oh, hello. Yeah, my name's Adriana Villela. And I think Dan and I are both fellow Canadians, so hey.
Yeah, so I am currently a senior staff developer advocate at ServiceNow, the artist formerly known as Lightstep. I guess it was ServiceNow Cloud Observability.
And I became addicted to observability when I was managing an observability practices team at Tucows. The same Tucows that you may remember, if you're old enough, the ones where you could download free Windows software.
But they don't do that anymore, they do domain wholesale. So yeah, I was brought into manage an observability team there. And I knew a little bit about observability, so then I had to educate myself properly, and went on this blog learning journey.
I learned about things, I blog about them. And that led me to becoming a developer advocate in the observability space. And I don't know.
I just feel like observability is that missing link that we need to be able to troubleshoot our systems, because we're always complaining about this doesn't work and that doesn't work. Well, now we have a means.
And so all the things that Dan was ranting about earlier, about getting that adoption, I went through the same thing at my previous organization.
One of the things that I will say that I think becomes in some ways a barrier to adoption is people fall in love with certain vendors in a way that sometimes can be unhealthy. Where they're unwilling to change how they do things, even if they're stuck with a shitty vendor because they're so invested in it, or some exec has this relationship with that vendor, and so those changes don't happen.
So those were some of the challenges that I saw when I was doing my gig in the observability practices team at Tucows. And also, it was the early days of OpenTelemetry, where traces were not GA at the time.
And meanwhile I'm telling everybody, "Hey, let's use OpenTelemetry." And they're like, "But is it even ready?" I'm like, "Trust me, trust me, it'll be a thing." And it is.
The other funny thing too is because I'm like, "Hey, you should use OpenTelemetry." And they're like, "Yes, we use OpenTracing." I'm like, "It's not the same thing."
A number of times I've had conversations with people, "Yes, OpenTelemetry, you should use this." "Yes, we use OpenTracing."
I'm like, "No, no, backwards compatible."
Austin: I'm just happy you found the OpenTracing users.
Jess: Adriana, you said something about being in love with a vendor can be an obstacle.
Adriana: It can be, yes.
Jess: Although if you have the right vendor, it can be helpful.
Adriana: I completely agree. So it's a bit of a double-edged sword, right? Because it can prevent you from doing great things, but also you have the right vendor, and it's like it fits like a glove, right?
Austin: I do wonder, because this is something I've seen very recently in OpenTelemetry especially, where I wonder if it's so much the vendor or if it's just the workflows that you're used to.
It's not necessarily like, "Oh I love," let's imagine an observability tool called Chunk maybe.
"I love the way Chunk works. And I just throw all my logs at Chunk, and I can see everything and I can search for it. And Chunk is great. And I've built all these workflows and dashboards on top of Chunk."
Or maybe someone's like, "I love these custom metrics I do in Shibainu," right?
Jess: Once you've carved the dashboards into the Chunks.
Austin: Right, you've invested so much in building things a certain way. But a lot of times what I've kind of noticed, and I think to take it back to an OTel thing, there's a pretty big discussion right now about us extending kind of our logging interface, and our logging APIs, and making logging more of a first class thing, and having semantics for it.
Because one of the things we've discovered going through the process and making instrumentation libraries, is that there are things that where it's like, okay, traces aren't quite enough, metrics aren't quite the right signal, this is some kind of instant event that has a structure.
And depending on who you ask, that's going to, some people, maybe Honeycomb people would say, "Oh, that's a wide event, or that's an Event, capital E." Some people would say, "Oh, that's a structured log."
And those people are talking about the exact same thing, but because they're coming at a different place, they're coming at it because they're a Chunk user, or a Shibainu user, or whatever, right?
They're so wrapped up in all of this other stuff that they can't necessarily get at that gooey center of, what are we actually trying to do here?
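The point that a "wide event" and a "structured log" can be the same thing under two names can be sketched in a few lines of Python. This is only an illustration using the standard library; the `build_event` and `as_structured_log` helpers and the `checkout.completed` record are hypothetical names invented here, not part of OpenTelemetry or any vendor's API.

```python
import json
import time


def build_event(name: str, **attributes) -> dict:
    """Build one structured record: a name, a timestamp, and key/value attributes."""
    return {"name": name, "timestamp": time.time(), "attributes": attributes}


def as_structured_log(event: dict) -> str:
    """Render the very same record as a single JSON log line."""
    return json.dumps(event, sort_keys=True)


# One instant in time, described once (hypothetical example data).
event = build_event("checkout.completed", user_tier="pro", cart_items=3, duration_ms=182)

# Whether you call this a wide event or a structured log, the payload is identical.
line = as_structured_log(event)
print(line)
```

Parsing the log line back gives the original record, which is the sense in which both camps are describing one thing from different starting points.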
Jess: At the abstraction?
Austin: Yeah, I think Dan has some thoughts.
Dan: A couple. Actually having used Chunk before, and knowing the struggles there, I wanted to actually throw this idea out because I'm curious to hear other people's thoughts on this.
I don't know if it's just necessarily they get used to one thing, but I also feel like too monitoring and observability, and security also too, are afterthoughts when you're doing the development lifecycle.
Jess: Yeah.
Dan: So when you first sit down on the keyboard, you're not thinking about, how am I going to watch and see when this thing falls apart? How am I going to know when an attacker gets the better of me?
That's not even a part of your thinking. You can kind of start working on it, and eventually get to the point where, "Oh yeah, I got to get some logs in there."
Aside from the logs you've kind of put in there because as a developer you're probably working on your local environment, so you're spitting out log lines to your screen.
All these things are kind of afterthoughts. So I feel like the whole idea is not baked into the developer experience. And so they, when they do get used to something, they turn around and go, "Yeah, well I got some logs, and I threw auto instrumentation in, but I use the vendor whatever, I use Chunk stuff."
And then away you go. So it's not just the adoption of a new product or doing best practice. It's also changing the way they actually approach their development to a certain degree too. I think it just might be a challenge. I'm curious what your thoughts are.
Austin: I mean, I certainly will agree, right? One thing to kind of reach back into my Lightstep days. And Adriana, I think you probably know "Dev mode."
One of the big problems we saw early on during trying to get tracing adoption over was specifically that, that local workflow. Because tracing is super useful actually in a local dev workflow, but the tooling for it is miserable compared to just console.log, right, or standard out print.
And I think a lot of people have tried to solve this in various ways, and I don't think there's really ever been a great solution. I think there are some really cool, I will say shout out to tools like otel-desktop-viewer, otel-tui, otel-T-U-I.
There's a few things out there that are really trying to kind of really build that, "Hey, here's how you look at tracing, and here's how you kind of do observability more at the local dev level." But there's still such a gap, I want to say.
Jess: Between how the software is built and the expectations of it we have in production.
Austin: Yeah, but I think specifically it's the, how tactile logs are as a dev.
Jess: Yeah, oh, okay, 'cause we're so used to them, they're right there.
Austin: They're right there, but also, it's such a clear expression of what I'm trying to do, right? When I'm trying to understand I can do console.log, got to here. And then I see printout, I'm like, "Okay, I know where the code has ran to."
Jess: Right, it's so fast.
Austin: Yeah.
Adriana: This brings up an interesting conundrum, because I think we expect through observability driven development for the developer to instrument the code because it benefits further downstream.
But I think that the conundrum is the developer doesn't see how that instrumentation benefits them, because as Austin said, we're used to using logs for everything. So you have to have that mindset shift around, how do you get the developer to start getting used to the idea of troubleshooting their own code with traces?
Jess: Yeah, how do you bring that benefit sooner? How do you bring that benefit to make it immediate?
Adriana: Mhmm.
Dan: And this is kind of the question I've been trying to sort of been wrestling a little bit with too, is 'cause I think getting some of these concepts into their hands.
So to go back a little ways, I remember when I was brought into one company, and I was asked to revamp the entire monitoring infrastructure. This was before tracing became a big thing. Kubernetes was just becoming a thing.
We just had Prometheus and it had just got released, so we had to redo everything. But one of the things I did kind of do was get a centralized logging, and I put in an elastic cluster, put it all in.
And the first thing I did is I took it and went to the developers and their manager. And went to their manager and said, "Here, check this out."
And as soon as he started to see what was going on in his applications, through the logging with all his different logs, he started to see the value of actually putting this stuff together.
So, I think getting tools, or I think what Austin just mentioned, into the hands of the developers, seeing how tracing might be able to help. But the thing is, I think we're sort of venturing in new areas where we're not even sure about whether or not they'll work.
But getting them to start thinking and seeing how there's value into, "Oh well, if I put this in now, and then I could actually get all that kind of, I can get the auto instrumentation stuff out of the way. I can kind of get, start focusing on, oh I know where there might be problems here, here, and here."
Because I know my code. I can know where it might go sideways or this function might get a little flaky when it comes to this situation.
I'm using very broad terms here. Partially because I'm not a developer myself. I don't know anything about code. I do, but I don't. I know enough to look at it, but not enough to debug it or to support it properly, so.
But yeah, that's where I was thinking that there's a huge challenge here, how to get them to see that value. And I think you've raised a couple of good valid points, and it's just how do we keep pushing that forward.
Adriana: Yeah, I think to push it forward is, I think you just have to continue being a squeaky wheel. You have to nag people to death to a certain extent. You know what I mean?
It's where that internal advocacy comes into play, where you're going to have to, you got to convince the folks up top, got to convince the individual contributors.
And I feel like you can always find an individual contributor who's super enthusiastic, who's like, "I tried the OpenTelemetry, it's so cool. Let's play with it."
So having those allies can really, really make a difference, right? Because it can't be just up top, it can't be just the ICs. Because otherwise it'll just fall flat, right?
'Cause you don't want to be told what to do if you don't believe in what's being done. But also, if you're the lone IC doing the instrumentation, no one gives a shit about what you're doing.
Jess: Yeah, we have to bring the benefits closer to both the people who are implementing it, the ICs, the developers. And also we have to bring the benefits closer to the CTOs, the directors, the people at every level who can help or hinder.
Adriana: That's right.
Jess: And we can talk to each other all day about how wonderful this stuff is because we all love it. That's why we're here. But we need to enlist not even our enemies. We need to enlist the unconcerned.
Dan: The apathetic?
Jess: The people who just have other things to worry about.
Austin: Something I've been thinking about recently is how much code. If you have an AI assistant and it's writing code for you, what is the observability of that code, right?
'Cause even before we had AI to automate copying and pasting from Stack Overflow. If you copied and pasted something from Stack Overflow and you really didn't know what was going on, right, you didn't really have a great understanding of it.
If you had a sufficient understanding of the problem domain, then you probably wouldn't be copying and pasting from Stack Overflow to begin with, unless you're lazy like me.
In which case, you're still... Bugs exist. The only software with no bugs is a software that has not been written yet.
One thing I would love to see us decide as an observability community is to maybe get over ourselves a little about what is observability.
Can we just kind of declare détente? Can we all say, you know what, we're all on the same team, we're all rowing in the same direction. There's a lot of ways you can get to the result, but the point of observability is we want to understand what our applications are doing.
Cool, we all agree on that. Let's figure out the best ways to make that actually happen for people, and not be so like, "Oh, you have to do it exactly this way or have to do it exactly that way."
There are things you can do, there are technologies you can use, there are strategies you can employ that'll be more appropriate or better results. And there are strategies that maybe won't get you as far. But that's what they are, they're strategies, right?
And you're going to make trade-offs no matter which one you pick. So let's all kind of get to that point and say, cool, we all agree here. How do we start focusing on these important things?
On stuff that is more important, like making it easy for that apathetic checked out nine to five. I don't want to say "checked out," but someone that doesn't really feel like they have to care about all this.
Jess: Right, they're concerned with other things. And the same strategies or language or tech or tools don't help everyone. You are the person in this situation, if you're the observability team leader, advocate at your company, you understand the benefits and you have to understand the people you're trying to convince.
And what they care about. If what they really love is looking at logs in Chunk, you have to say, "Oh yes, it's very important to be able to see your logs like that. We need to make sure you have that, and that you can scale up your work even further with the addition of analysis, of aggregating your chunky logs."
Adriana: I think part of it boils down to you can't tell them that their baby is ugly.
Jess: Yeah. Right, your baby is darling, and wouldn't it look great in this hat?
Adriana: Yes, exactly, exactly.
Dan: Yeah, that's a huge challenge though, I think. And it's what I've been trying to do myself a lot, is trying to engage with those apathetic users to say, "Hey, I know you're busy with other stuff." I don't like the word apathetic, but it seems like the only closest thing for what we're trying to achieve.
Jess: It's not that they don't care about anything, it's they're apathetic toward observability.
Dan: Yeah, they're apathetic towards observability because they don't see the value of having it in there. There is that challenge of sort of saying, "Hey look, there is value in having. This helps with you better understanding when your service goes a little bit sideways."
And this may help when they, an RCA, when you're doing a five whys of root cause analysis of a situation, you go, "Oh, well now we know what the problem was 'cause we were able to get to it faster and these kind of things."
But it's still, it's harder. But you can see this with them, unless management dictates to them that they need to start looking at it, I feel that there's a bit of that challenge as well. So if leadership has, "Well we're focusing on this particular direction. These things are important too, but this is the direction we're focusing on."
I find those types of folks tend to be, "Well, that's what we're supposed to be working on, we're working on that." Yeah, I know that's important too. I know that brings value, but we're told to work on this. And this is our priority, and I only have so much time.
And of course then he gets told, "Well, I only have so many resources, I don't have so much time. So I mean, I understand we need to get that done, but we have to work on this first 'cause this is a feature request or whatever."
And so it helps to sort of provide, get that from the leadership perspective, that value add of why they need to also make this a priority.
Because the heavy lifting, you only have to do kind of once. If you keep it up, it's really easy to keep moving forward. But 'cause this is evolving.
Jess: Yeah, because the moving forward is as part of this feature request. How will I know this feature is ever used, and how will I know that it's working?
Dan: Yeah, exactly. But it's getting those things to line up with folks, and getting the language, I think we have the right language, and we have the right wording, we even have the right tools at our fingertips these days.
I feel like we have a lot of things that we can say, "Here, take a look at this, take a look at that." We have so many examples now that we can look at. We have so many discussions on this.
I mean, there's a lot of podcasts on observability, SRE, DevOps, all these best practices, all these good things. Adriana herself has done a whirlwind tour, talks about the best practices all the time, and knows what she's talking about.
I feel like we're still banging our heads against a brick wall some places. I don't know, maybe it's just I'm picking the wrong places to talk to.
Adriana: I think some of it boils down to some of the culture around it, right? There's two things that come to mind. So one is the culture.
So I'm one of the maintainers of the OTel end user sig, and we have end users come talk about how they're using OTel out in the wild.
And we had one end user last year who came and talked to us about how the company that they were working at has this culture of observability.
And she addressed one thing that is a huge pet peeve of mine and that I encountered personally, which is the fact that when I was running the observability practices team, our team was basically told, "Well, can't you just instrument the code for the developers?"
I'm like, "Hey?" Like, "What?" And I thought it was really cool what she said about having this observability culture in her organization because the mandate came from up above saying, "No, developers must instrument their own code."
And so that made a huge difference, because it's like, well that's the mandate. So you can't run crying to your manager and ask for the observability team to go instrument your code for you. Which when you think about it is ridiculous.
Jess: Yeah, yeah, and suddenly the observability team, which does have the ability to make it easy for people to instrument their own code, suddenly there's a demand for that.
Adriana: Yes.
Dan: Yeah, I feel like you have to kind of take care of your own things too. When you clean someone else's house, you're not going to know all the different idiosyncrasies that a person has of their own house.
So you'll go in, you do the basics, you clean up, vacuum, mop, dust, what have you. But they may want certain things done, given the extra special care. Or there's certain areas where they know the nooks and crannies of their home, and so they'll know how to do a better job in their own place.
This is their code, they will know how to do it. Probably a horrible illustration, I apologize, but.
Jess: Oh no, no, it's great, it's great. Because yeah, the cleaners can do the floors and some of the surfaces. But some of the surfaces are really hard because they're still covered with all your crap.
Dan: Yeah.
Jess: Whereas when I clean my house, if I go help my kid clean their room for instance, not only does it get cleaner and some of the crap gets thrown away, because in the process we learn something about the code and we make it cleaner.
But also the next time they're like, "Ah, ah, I need scissors," I know right where they are.
Adriana: Exactly. And the other way to look at it too is, and I love using the quote from Liz Fong-Jones when she was on "On-Call Me Maybe," where she says, you won't ask someone to write comments for your code, right? That's a you thing, that's not a them thing.
Dan: Exactly.
Adriana: Right? Yeah. That's just weird. What, you're going to ask someone to write your unit test for you? You're going to ask someone to write your logs for you? They dunno what's important.
Jess: Yet I could get a hundred percent test coverage on somebody else's code, but it's not going to be well tested.
Adriana: Nope.
Austin: I think you actually raised a really interesting point, that kind of goes back to sort of these workflows and what we build on them. And that really takes it back to that developer loop, right?
Because it's one thing to ask someone, "Hey, yeah, you need to log what your function is doing, or the parameters to this request."
And then you do that, you say, "Okay, cool." Save, reload, send a message. And then I just see, it pops up. The button-action connection is super, super tight.
But if I go to that same person and say, "Okay, now you need to think of this as part of a bigger system, and you need to think about this request as part of a series of many requests. And it's going to and fro, and you need to think about fan out, fan in."
I think in some ways tracing is a really good way to see who's going to, really tracing is go into a room of engineers and say, "Okay, who is actually a systems thinker?" And see who raises their hands and then say, "Okay, how many of you like distributed tracing?" And then all the people that lower their hands, you found the five people that actually do like systems thinking, right?
They overlap that Venn diagram. Because it is really hard. We probably shouldn't sugarcoat it and say, "Oh, this is easy, and if you don't get it that's a you problem." It is challenging.
Jess: It is. And it's an expansion of what we're asking people to think about and the level we're asking them to think at.
Austin: Yeah, and the potential to get it wrong, I think is, it's a lot harder to screw up a log I feel like. You can make a bad log really easily.
It's really hard to make a really good log. Or it can be. It's probably less hard than making a good span, but the floor to even write that bad span is much higher than the floor to write a log that is useful. Maybe not good, but useful.
Jess: So we need to make failure cheaper. In addition to making success easier and also felt closer to you, we need to make failure cheaper. So if your vendor is charging you for attributes on spans or the number of values in that thing you just made a metric for or whatever, that's scary.
Some companies have, if you're going to add any metrics or observability attributes that could cost something, it requires a review from a whole nother team. Not going to happen then.
Dan: No, because then you're just adding sort of layers of complexity that, and especially in the development life cycle, you're not going to want to do.
Jess: Yeah, it's already hard enough.
Dan: It's already hard enough. And I agree with logging is actually incredibly hard. I mean, I don't know how many times I've seen horrible logs in my career. And I think we all have.
And I'm constantly surprised that there's such bad logs out there. And not just from internal systems either. I've seen some vendors come out. Like, where did you learn to log?
Jess: Well, that's the thing, you don't have to learn to log, you do have to learn to trace.
Dan: That is true.
Austin: No one ever teaches you either, right? I still remember college, I still remember CS, I don't remember any classes on how to write a log or a metric.
Dan: So that's what's missing. We got to go back to the education system. We need to get in there, be teachers, professors, and teach the young.
Jess: The mathematical culture of programming says that inputs and outputs are all that matters, and you should eschew side effects, including logging.
That logging and tracing are much more about the engineering culture of programming, of really getting things done in practical systems.
Austin: What I've seen, at least from talking to people, is that you do learn this on the job, right? And you tend to learn whatever your first employer does.
So if you go into a big company that has a decent development framework or service framework, then cool, they're going to have something for you. And then you're just, when you look at the code, you're going to see people are using logger dot whatever.
And then you're going to have to be like, "Okay, well I guess I should do that too," right? And whoever you're pairing with, or your mentors, or someone in PR review will tell you, "Oh, you should have done it this way and not that way."
But it very much is a learn by doing sort of thing. And whatever that first formative experience is, that's going to be what sticks with you. And OTel does present a really different sort of approach to this.
And I think that in some ways we don't do a great job of really pitching that better way. But then I think part of that is because it's really hard to understand what you get unless you kind of see the vision and you can hold a lot of things in your head at one time.
Jess: And they're not already full of all the other features you're holding in your head.
Adriana: Yeah, it can definitely be tricky. One thing that I wanted to mention too is, in terms of OTel adoption, something that I also experienced at Tucows. 'Cause when I joined, as I mentioned, traces weren't even GA.
And so people are really nervous about adopting it as a result, right? 'Cause they're like, "Well, you're telling me to adopt this thing that you're saying is going to be a big deal, but not."
And one thing that really helped was, I actually, 'cause we were considering a couple of vendors at the time. We were looking to move away from the vendor that we were using at the time. So we're talking to both Honeycomb and Lightstep.
And I had reached out to reps from both and I'm like, "What if we had Liz and Ted just come and do, Liz Fong-Jones and Ted Young, just put the vendor hats aside and just come do a Q&A with the folks at Tucows. To just answer their questions, address their concerns around OpenTelemetry adoption?"
And that went surprisingly well, better than I expected. I honestly thought, "Oh my God, my team is going to have to ask some questions because there isn't going to be enough interest."
And it was a full house. We had questions. We had them for an hour. They came and they answered all the questions, and I thought that was so beneficial to have somebody to, with firsthand experience, representing two different vendors.
So sorry, they were from two different vendors, but they weren't representing the vendors, they came under the umbrella of OpenTelemetry and vendor neutrality. And I think these kinds of conversations are so valuable and can really help to calm some of the nerves.
So it would be nice to see some of that. But I think the challenge, both Liz and Ted were very nice to have made time in their busy schedules for that. But to have some sort of means like that in the community to, so that we can help just calm people's nerves when it comes to adopting OpenTelemetry.
Dan: I think that you were touching on a very interesting point, because I've struggled with that over my-- 'Cause I used to get really hung up on certain vendors, and then I realized after a while, just because this one place worked well with this particular tooling doesn't mean that all places will work like that.
So you have to grow. So I learned to continuously keep up to date and grow. If I could share a little quick story. I went to this one place. And again, actually it's back to the same place where I went over there, brought me in to revamp their monitoring infrastructure.
And I brought in logging, but before, on my first or second day, I walked in there. Now, they had just migrated a bunch of images into AWS, they just started using the cloud. So they're out of the data center, they're into the cloud, all that stuff.
And the assistant manager who's been there like two or three months, came over to me and says, "So how are you going to monitor these things? You're going to use SNMP?" And I looked at him kind of funny, like, "What? Why would we use SNMP?"
And the reason why I'm bringing this story up is because he was still thinking about what he knew as a total way to monitor systems. And then he got really angry with me because I said, "No, I'm not going to use SNMP. Doesn't make any sense, we're in AWS."
Jess: You just said his baby was ugly.
Dan: I may have. He did quit two months later.
Jess: Oopsie.
Dan: He did, but yeah, it wasn't because of me though. It was because he actually got really angry by the fact that my boss did not want our images to be, to have SSH access. He wanted everything to be immutable.
So if something happens, close it down, bring another one up. Simple as that. He didn't like that thought at all. He was very old school. He would SSH into all the different web hosts and literally tail the Apache logs and things like that.
So very, very old school in his thinking, and obviously did not grow as the times have changed, as the tools changed, as the environments changed.
Even our services have fundamentally changed. It's no longer a three-tier system anymore. We've got microservices upon microservices, and wrapped around monoliths.
Jess: It turns out the way you learn to program is not the right way to program.
Dan: Yeah.
Jess: It's not the best way to program. It is a legitimate way of programming. It can still cry, and eat, and poop.
Austin: Yeah, it goes to a bigger problem, I think in software in general, right? Where it's just really, really hard to kind of keep track, to stay on the pace, right?
And maybe the past decade plus of ZIRP and other fun things changed how we think about--
Jess: That's zero interest rates, right. Which leads to a lot more investment, a lot more software being written.
Austin: Yeah, but if you think about it, every week, every month there was the new thing. Every year it's like, "Oh, I want to."
I mean, heck just this year I was like, "Oh, I need to write a library in TypeScript. Let me Google for best TypeScript starter library," and then add 2024 after it because I know that depending on how far back you go, you're going to get a different answer every year, right.
And that's sort of that hyper accelerated lifecycle for products and technologies, just kind of consumed the industry, I think. And we're just now almost getting to a point of relative stability.
Jess: Austin, that just means you're not looking hard enough. You're just not up on the latest.
Austin: Well, no, because all of the churn and chop is all happening over in, hmm, the AI space right now, and has passed over us mere backend engineers or full stack engineers, right.
Jess: So there it is, you're too old.
Austin: Yeah, that's actually the truth, yeah.
If you're someone that's been in industry for a while, if you're mid-career or whatever, yeah, you're probably going to see all the new stuff and be like, "Well, why do we need the new stuff? The old stuff works."
Jess: Yeah, logs work for you on your machine when you were the one who wrote the log.
Austin: Right, and it requires a pretty big lift actually. And it requires a lot of personal responsibility and responsibility shifting in some ways for people to really start to understand it.
I saw this at a job a while ago where they laid off their QA team. And the question was, well who's going to do QA now? And it was like, well, the developers have to test their software.
And this look of sheer panic of, "Wait, what? I don't know how to do that."
Jess: Wow.
Austin: It's like, well I guess you better learn, 'cause there's no more QA team.
Adriana: But--
I think that pretty much exemplifies the nature of software, right? It's changing so rapidly, you kind of have to either keep up or you kind of, you get left behind. And pretty soon your way of doing things ends up becoming so woefully outdated that it ends up doing you a disservice. It might be worse and do your team a great disservice, because you're unwilling to make that paradigm shift. But it's hard.
That's not to say we can't put the people who are in that position, because it is so hard, especially when you have seen so much success doing it a certain way and then all of a sudden it's like, eh, no longer.
Jess: And when what you're asked for is something different, is to deliver features. But it's interesting, my favorite thing about observability is that as a developer, when I'm building stuff, I get to find out what really happens.
I get to hear the end of the story. It's not just, "And then I pushed merge and lived happily ever after."
Dan: Because that always happens, right?
Jess: As far as I know.
Dan: It's about those engineers though that don't. And again, I feel for them because it is hard to stay on top of things, even for us.
So even in our world, staying on top of the changes that are occurring, looking at some of the new tools. 'Cause it's exploding in certain areas. You got--
And then with this whole concept of AI, how this can also, we're using AI to help with observability and things like that. Which I don't think is mature enough to do that yet, but I mean eventually it might. I mean, we're talking about pattern recognition.
If you got a lot of cardinality and pattern recognition going on, it'd be kind of easy to, it'd be nice to have something that can actually do that for you without you having to look at it and go, "Okay, I can see where we're going sideways."
Jess: Well, Copilot is getting better at helping me instrument my code. It's still completely wrong, but it's getting closer.
Dan: Exactly. I would use Copilot completely, but that's just because I'm lazy and I don't know how to code.
But I think the trick is, for us, we are struggling. So I can't imagine how those folks who have been in the industry say. But I mean, and then also possibly at the same place for a decade.
Jess: Yep.
Dan: Right, and they've been. And there are people you come across, engineers and developers who have been at the same place for a decade, who got there maybe at the beginning when they were small. And so what they've built in there has now become indoctrinated, it's part of the cycle.
And I always wondered to myself, why can't you just swap that old code out and put something brand new? But not realizing, this is when I was first getting into this business, that's how I used to look, thinking like, why are we doing this? This is such old school.
But now learning that it's not as easy as to say, okay, well yeah, we'll just put this in a newer code base and away we go. There's a lot more involved now. But then on top of that, also learning new concepts to add a new way of thinking to, oh, I can actually debug my code a lot easier with these types of tools and to push it from this mentality than I did with the old way.
Right, I have to do this, do this, do this. And then compile it and wait. And then maybe even push it out to a staging environment to even get to push some live traffic towards it or fake traffic.
Whatever the case may be, so. Yeah, I get it, but it's like, how do we encourage them to see that there is an easier way to do this and we'll take a little bit of that load off of their shoulders?
Jess: Yeah, how do we draw them into participation in this wider system where they get feedback from production?
Dan: Exactly.
Adriana: I think it almost becomes a, you got to experience it for yourself. Start with a small, "Look what happens if you try to instrument your own code and see what happens." Experiencing it for yourself. I think it speaks volumes to be honest.
Jess: And once it's their idea, once they start having ideas to build on it, then they're part of it.
Adriana: Yes.
Dan: Totally, very true.
Jess: Okay, this is very exciting, but we're running out of time. Okay, I need all O11yneers to have a final thought for this o11ycast.
Austin: My final thought is that, gosh, ain't this stuff hard? I think there's so much opportunity for people to build ways to make it easier, to build really cool stuff on top of OTel, and to just, and give it away, right?
Do not hide your light under a bushel. Let the world see your O11yneer selves.
Adriana: My advice is always be curious. Being curious, I think leads to wonderful things. I think that's what led me into observability and OpenTelemetry.
And piggybacking on what Austin said, there's always ways to improve OpenTelemetry. And if you think there's a cool way to improve it, I think go for it. And please, contribute back to the project.
Jess: Yeah, 'cause observability, it is very much about shining a light, it is very much about asking questions.
Adriana: Mh-hm.
Jess: My thought is that, Dan mentioned people like to tack on observability and security afterward, but these are emergent properties.
And it's like you try to bake the cake and then sprinkle on some observability. And that might look pretty, but it doesn't tell you what's inside the cake.
Because the emergent property of observability and security and like that, it's not icing. It happens when the egg and the sugar get heated up together and combine. It happens during the building of the system.
And also it's, I really like it when you put the sprinkles inside the cake, and then they kind of spread out, and the yellow cake turns into a rainbow cake on the inside. Recommend.
Dan: It's true though. I can't argue that. I mean, it is not an emergence thing, it is something you do at the beginning.
You get it all mixed up together. You bring it out and you get these colors. And I love the fact that you're all embracing O11yneer. I appreciate that. Make it a thing.
There should be senior O11yneer, staff O11yneer, principal O11yneer. These should be real titles.
Jess: But no trademarking it.
Dan: No trademarking, please. This is part of the open source community.
Adriana: Love it.
Dan: My thought is that, I think we need to still have conversations. I think we should keep pushing, keep talking about it, keep encouraging. There's value here. So yeah, O11yneer for life.