Ep. #62, Adopting OpenTelemetry with Doug Ramirez of Uplight
In episode 62 of o11ycast, Jessica Kerr and Martin Thwaites speak with Doug Ramirez of Uplight. This conversation covers many aspects of adopting OpenTelemetry including integration concerns, social challenges, building trust, and running it at scale. Listen in to find out why after 35 years of diverse experiences across several industries, Doug’s excited about observability.
Doug Ramirez is currently Principal Architect at Uplight. Doug’s expansive 35-year career began as a software developer at General Electric in 1990 and has included roles in entertainment, finance, healthcare, and manufacturing.
transcript
Jessica Kerr: So Doug, you're sitting in front of a whale and it reminds me of the company you work for. You work at Uplight, right?
Doug Ramirez: Yeah.
Jessica: Tell me how Uplight fits with your ambitions as a human being.
Doug: So I've been at Uplight for about a year and a half. I am really fortunate and I have a ton of gratitude for the experiences I have had coming up on about 35 years, starting out as a computer programmer and having held lots of different positions as individual contributor, leader, manager, owner of a business, et cetera.
During that time, I've worked with pretty much every industry out there, entertainment, financial, manufacturing, healthcare, and one of the spaces I've never really spent any time in is the energy space. I learned about Uplight and quickly became very interested in its mission, and it's to help save the planet.
We do that by helping utilities manage their grids efficiently, and ultimately making sure that the resources that we're using to cool and heat our homes and buildings is done as efficiently as possible, and as minimally as possible.
So I think that for a lot of people that immediately can resonate with you as a human being living on Earth right now at this time, and being able to take my skills and experiences and apply that to that mission is like a perfect combination of things for me right now.
The other thing that I really, really like about Uplight's mission and my ability to help them is also Uplight being a B corporation and B Certified Corporation, and I think that that was very attractive to me as I learned more about what a B corp is.
Jessica: Why do you say B corp? It's not the kind of B that Honeycomb talks about.
Doug: I think what it is, the way I would describe it in a nutshell, is to say that it's a certification that businesses pass that proves that they're using business as a force for good.
Jessica: Ooh, cool.
Doug: And there's a number of criteria associated with that and we, Uplight, are a B corp and I'm really proud of that and I'm proud of our mission. This is great for me to take the decades of experience and apply it in a completely new way.
Jessica: Perfect. And in your 35 years of very diverse experience, now you're excited about observability?
Doug: Yeah. I mention this story when people ask me about observability, and I think it resonates for pretty much anybody who's doing what we do, write code, building systems. For me, observability reminds me of the excitement that I got the first time I started writing software.
For me, it was adding some numbers on a teletype that was connected to a 300-baud modem that was connected to a very large computer on the grounds of the University of Virginia where my father was a statistician and a math professor. I think it's hard for people like us to explain why we get so excited about that, that immediate sense of feedback and gratification you get by getting the machine to do something for you that you want it to do.
For me, observability, especially these days and the type of environments, the type of tech stacks that we have to build things, it's hard to see what we do because our software gets shipped to places that we can't see, that we don't know where they are. Having observability, to me, brings back that excitement and that gratification you get of saying, "Oh, I asked the machine to do something. I gave it a set of instructions. I want it to solve this problem. I want it to do something with this piece of hardware. Now I get to see it actually happening."
And so going back a long time ago, before 35 years, I was a little kid doing that program on the teletype, today I get that same level of excitement when I ship services and APIs to somewhere in the cloud, but then I get to watch it run. So that's why observability is very interesting to me.
Jessica: That's so cool.
Martin Thwaites: Yeah. That feeling that you get right in that first Hello World application, and you hit Run and then it prints out Hello World, and then you make it dynamic, and then you add your name, and you put in your name in the command line. Then it comes back and it's that rush of excitement that I don't think we really get from just writing a bit of code, and we write a bit more code, until we actually see it.
Doug: Yeah, and for those of us that write... Lately I have been exclusively building backend systems, systems integration, APIs, services that enable things that people ultimately see. It's very, very rare that I work on or touch a piece of code that a human interacts with. It's mostly machines.
So when you deploy something like a new API or a new service, and you watch other pieces of software starting to interact with your software, you don't even know who these people are, you don't know what software they're even using. You just see their requests coming in. To me, that's also really exciting too, it's like, "Ooh, I built something and I can see it running, and I can see people using it. This is awesome."
So for me it really does go back and scratch that itch and elevate that excitement that I had, a lot of people have when you just first start writing software. So I'm a big fan, obviously, and I'm very passionate and very excited about it.
Jessica: Yeah. It's like some people really enjoy Raspberry Pi and robotics and internet of things, and anything that makes something physical that you can see. Other people really like UI development because it's something you can see and interact with as a person.
Then those of us who work on abstractions within abstractions that are summoned by other abstractions, it's not too physical until you get to look at that trace waterfall and count all the trace waterfalls. Then you can celebrate them.
Martin: I think graphs is our frontend, isn't it? The building of the graphs and the building of the heat maps. We have a channel dedicated to just the art that comes out of building graphs based on your telemetry.
Doug: Interesting.
Martin: It's such an interesting thing to see, like, "Ooh, my system created this thing." Maybe that's our UI, that's what we see, that's our visceral thing of seeing something.
Jessica: It's a window from the social to the technical half of our socio-technical system, and it knits us together.
Doug: I'm probably jumping ahead a bit here, but one of the things that also interests me about observability and following along with that knitting analogy is that with standards and open specifications, I can actually see how my software, my Hello World, my thingy actually is part of that fabric of the ecosystem of things that we build.
Now I can start to see calls coming into my service, but I can actually see calls that are making it to another service and we can view the interactions between all these myriad of services by having distributed tracing and by having these visualizations of how everything is working together. So that's the other thing that I get really excited about as well.
I can say to somebody else who might be building up a frontend or a backend to a frontend or whatever it may be, if we all agree to emit our observability signals this way then we can all let it land in some place and we can see how everything is starting to work together.
Jessica: Perfect. You can see your code's place in the wider software system. Excellent. And of course that's going to lead us directly into OpenTelemetry. Before we do that, tell our listeners who you are.
Doug: I'm Doug Ramirez, I work at Uplight, I am a principal engineer and architect of our data platform. I've been with Uplight for about a year and a half, prior to Uplight I have spent several decades in the software world. Started out writing code at General Electric, started out at a school in 1990. I had a dumb terminal in my office that I wrote COBOL on paper.
Jessica: Wow.
Doug: And then I typed it into a 80x24 character terminal using a line editor. I don't know if anyone remembers line editors, but essentially you edit a character or a set of characters at a time on a line on a terminal. Things have gotten a lot better since then.
I've held lots of different positions in the software world as an individual contributor, analyst, manager, director, head of things, owner of a consulting business, and I am now an individual contributor and an architect, and I'm really lucky to be able to take 35 years of failing a lot and succeeding sometimes and trying to help Uplight succeed in its mission.
Martin: Failure is the true educator.
Jessica: Great. So you just brought up open standards for observability for fitting everything in. You brought OpenTelemetry into Uplight?
Doug: Correct. I think I should qualify that by saying it's harder than it probably should've been, harder than I wanted it to be. But hard in the sense that, like a lot of big shifts that are introduced to a large software organization, there's always going to be some time and conversations and challenges along the way. But we have adopted it, it is our go forward observability way, and it's proving to be very, very successful.
We have a long ways to go, there's lots of things I want to do with it, but I'm happy to say that to a certain degree, we're an OTel shop, our software runs and it lets the rest of the world know what it's doing by honoring the observability signals and the specification that OpenTelemetry provides.
We can land all those signals into platforms like Honeycomb, and we can do interesting things to observe how all the software is working, in ways that we couldn't do before. So it's proven to be really, really beneficial.
Jessica: Was that a social kind of hard or a technical kind of hard?
Doug: Social. I think that fortunately for us and a lot of other organizations, the OpenTelemetry project was able to leverage some existing projects that had been around before, like OpenCensus and OpenTracing. So I think that the technology is mature, it's well thought out, and the support that the community gives via things like the collector, the SDKs, et cetera, I think that solved most of the technical challenges we had.
I think for us it was more around this idea of trusting the specifications, trusting the community, and being okay with using some pieces of OpenTelemetry that were not as stable as others. The log specification, I think is one thing that required a lot of conversations. I think up until recently it hasn't been marked as stable. I can't remember, I think the log spec, the log data model is now accepted.
I don't know about the SDKs, but there was definitely some concerns around the maturity of OpenTelemetry, even though it benefited from the maturity of OpenCensus and OpenTracing. So I think the newness from some people's perspective was a concern. I think that for some people just the idea of leaning into an open source project and not integrating directly with an APM vendor felt a little strange.
I think some people mightn't have had that experience before. I think that there was also just this idea of... I don't know how to call it, but it's almost like the fear of the least common denominator. I think people felt like they might be constrained by being 'forced' into a specification.
As we talked about it and as I talked about it, and as I evangelized this with the folks internally, I think we all started to realize that in the absence of OpenTelemetry if we had come up with our own specification, it would probably look a lot like the specification that OpenTelemetry came up with.
Jessica: And it is extensible. That's interesting. OpenTelemetry benefits from not being the first specification in this space, at least for tracing there was Jaeger and OpenTracing and OpenCensus before it. Somehow the people involved in those communities actually came together to deprecate theirs and go with this common one.
Martin: And that is the true triumph of OpenTelemetry, is two open source projects agreeing to come together and be one. The whole, "Let's just create a new standard because we need to unify the previous two standards." And what do we have? Now we have three standards. But we actually have one, OpenCensus was recently deprecated.
They started to close things off. Jaeger has sunsetted their protocol, the Thrift protocol. That is one of the true triumphs of OpenTelemetry, was these projects going, "Yeah, we're out. That one's better and we're going to take the best bits from both of them and bring them together."
Doug: Yeah. So the social challenge was taking this leap of faith. Asking people who had already done a direct integration with an APM vendor, asking them to refactor their code, to trust that the community, the specification was well thought out, to trust that the collector, the SDKs and all the tooling was there. I think to a certain degree, to trust me to say, "I am confident that this is going to provide benefit, even though we haven't done it at scale at Uplight yet."
My experience, my intuition, I became very, very confident in what I saw as the vision for observability at Uplight with OpenTelemetry, and so part of what I had to do was simply to ask people to take a leap of faith, to assume some risk, and to trust that this was all going to work out. For some people, that's hard to do.
I've been on the other side. I've heard people like me get excited and evangelize things and get super passionate about it, and it hasn't always worked out. So having some healthy skepticism is good, it definitely forced me to really think and double check my math in asking people to join me on this journey.
I think the leap of faith, I think that was the hard part, probably was the hard part for most people because they hadn't seen it running at scale, they may not have been familiar with the project, they might not have seen the previous projects, or even have a lot of familiarity with the Cloud Native Computing Foundation, and they already had something working. So I had to ask them to go and break it and make it better.
Jessica: But make it better in a way that's compatible with everything else at Uplight.
Doug: Correct.
Jessica: I read somewhere that Uplight is built out of many smaller companies.
Doug: Correct, yes. Uplight is a couple years old, the legacy companies as we call them are much older than that. But the Uplight brand, it's a collection of companies that are all trying to solve the same macro problem in a way that's very complementary. So this wasn't about collapsing companies together and then putting all the customers onto a platform and then deprecating the previous products or brands.
This is more like, hey, these are all pieces that are all working together to solve a problem, whether it is a white label marketplace to get smart devices or a machine learning model that understands the physics behind a building and how to pre cool it and pre warm it so that we can help the utility manage their grid efficiently.
So all of these different things were complementing each other, and so our challenge has been to bring everything together and get our platforms to work together and integrate with each other.
Jessica: So your software is built out of a bunch of software that was written by different companies, probably in different languages, at different times.
Doug: Yeah. Correct.
Jessica: That sounds really challenging.
Doug: It is. It is.
Jessica: Yeah. So you can't declare, "Well, we're a Java shop." Not when you intend to keep using a bunch of software that was created in many different languages, but you used the phrase earlier, "We're an OTel shop."
Doug: Yeah. I think that's one of the things that I find really powerful about OpenTelemetry. This was part of the leap of faith that I asked people to join me, and that is to say, "Yes, that's old software. Yes, it's written in a myriad of languages using a myriad of data persistence layers and using all kinds of software design patterns. But if we all simply agree to speak this one language, then guess what? We can observe this myriad of heterogeneous applications and a mismatch of technologies and stacks and frameworks. We can actually watch it all working together."
Jessica: So OpenTelemetry in all its myriad SDKs and automatic instrumentation is speaking all the different programming languages to the different applications, allowing all of those applications to speak the same language to you as a developer and maintainer and operator?
Doug: Yeah.
Jessica: So that, from your perspective, you can still work with software in a bunch of different languages.
Doug: Yeah. One of the things that I think I've benefited from with the timing of the OpenTelemetry project is that... I'm also curious to know where you've seen people talk about this before, but I think one of the things that I did that got people excited about OpenTelemetry was showing them a path to easing logging.
I think that going back to what we were talking about earlier, Hello World, it prints out, print statements, log statements, those are kind of like our Hello World. We can immediately see what's happening, I can print to the console, I can log a message somewhere, I can actually see my code running. I think that for most software developers, print statements, log statements are how they usually watch their code working.
Jessica: Yeah. They're the most direct, fast thing to be like, "Just tell me."
Doug: Right. Exactly. "I'm here. This number has this value. This took this long."
Jessica: Mine might say Jess in big letters.
Martin: Oh, the Here, Also Here, Here Three.
Doug: Right. And so I kind of benefited from the fact that the log SDK was available in most other languages.
Jessica: Was that more approachable to people than tracing?
Doug: I think so, yes. In my experience, I think really good implementations of metrics and tracing, usually I've seen that in more mature teams or more mature software organizations.
Jessica: But logs are everywhere.
Martin: And it makes me sad.
Doug: Exactly. So my thinking was if I can tell the story and create a vision of the future, and then work with people to simply start by getting logs into their applications. My first thing was just get the SDK into your repo, start emitting some logs, and watch them land at the upstream API. I think for some people, that was the first hurdle to get over.
My thought was if I could get developers excited about logging, which most of them are, it resonates, they're familiar with it, if I could get them to lean into OpenTelemetry, create 'observability' using logging, then I could level them up with very, very minimal effort.
Now I can go back and say, "Hey. I want to talk to you about trace-log correlation, and I want to talk about tracing, I want to talk about distributed tracing, and why these tools and these concepts are really, really powerful. And guess what? You only need to write a few more lines of code and you can be there because you have the SDK in your repo. You're already using it, you're emitting that signal, it's hitting a collector, it's being received, processed and exported some place else. All you have to do is a few lines of code and now, look at that, you can see a trace and a span, and guess what? You can see your logs that were emitted associated with that. How cool is that?"
Martin: I do like to talk about logs and spans, and I actually have a sticker that I created that traces are just basically fancy logs because if you start with logs and you start with this idea of, like you say, you build them up and they've got a log line. It's like, "Well, wouldn't it be great if that log line was now a stream of properties so it wasn't just a human readable thing? Now we've got a stream of properties that give us different bits of context about what was happening. Great, okay."
And then they go, "Well, what if we added a duration on there and we started to put in maybe how long it took when you did that message? You wrapped a context and now you said how long it took." "Oh, great. We'll do that. What if we also said what was happening before this happened?" And you go, "Yeah, that'd be really interesting."
Now you've got a trace. So you're kind of walking people to that idea of, well, what you've got is great, but how can you make it better? How can you get those little baby steps of just add this and just add this? Then eventually they're there with, "Oh, we've got a full trace now and that makes this diagram so much better, that makes our debugging experience so much better."
But it's hard from a human perspective to jump from, "I just want to log things in the console," to, "I'd like a big, distributed trace up on a screen somewhere that I can see when things go wrong." That does feel, from a human perspective, a big leap.
So I like that idea of how do you take people from that and just start adding bits on and adding bits on, like you say, you've already got the SDKs there. I think that's one of the powers of OpenTelemetry because it's one SDK. It's the OpenTelemetry SDK which you can get logs, traces, metrics and obviously down the road maybe more out of it. So you've already got the tools there.
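As a rough illustration of the progression Martin describes (a log line, then a set of properties, then a duration, then a pointer to what happened before), here is a minimal sketch using only the Python standard library. It is not the OpenTelemetry SDK and does not reflect Uplight's code; `EVENTS`, `log`, `span`, and `handle_request` are hypothetical names made up for this example.

```python
import time
import uuid
from contextlib import contextmanager
from contextvars import ContextVar

# Hypothetical in-memory sink standing in for a collector or backend.
EVENTS = []

# Tracks the span that is currently open, so new events know their parent.
_current_span = ContextVar("current_span", default=None)


def log(message, **attributes):
    """Step 1: a log line becomes a set of properties, not just a string."""
    parent = _current_span.get()
    event = {
        "kind": "log",
        "message": message,
        # Trace-log correlation: stamp the log with the active span's ids.
        "trace_id": parent["trace_id"] if parent else None,
        "span_id": parent["span_id"] if parent else None,
        **attributes,
    }
    EVENTS.append(event)
    return event


@contextmanager
def span(name, **attributes):
    """Steps 2 and 3: add a duration and a pointer to what happened before."""
    parent = _current_span.get()
    record = {
        "kind": "span",
        "name": name,
        "trace_id": parent["trace_id"] if parent else uuid.uuid4().hex,
        "span_id": uuid.uuid4().hex[:16],
        "parent_id": parent["span_id"] if parent else None,
        **attributes,
    }
    token = _current_span.set(record)
    start = time.perf_counter()
    try:
        yield record
    finally:
        # Record how long the wrapped work took, then restore the parent.
        record["duration_ms"] = (time.perf_counter() - start) * 1000
        _current_span.reset(token)
        EVENTS.append(record)


def handle_request(user):
    """Hypothetical handler: two nested spans, each with one log inside."""
    with span("handle_request", user=user) as root:
        log("request received", user=user)
        with span("load_profile") as child:
            log("profile loaded", rows=1)
        return root, child
```

Calling `handle_request("jess")` leaves four entries in `EVENTS`: two logs and two spans that share one trace id, with the inner span pointing at the outer one as its parent. Sorting those entries by parent is all a trace view needs.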
Jessica: Right. So they take the Hello World that they have in front of them, and step one is get that up on the console, and then turn it into traces and have those on the console. I really like that. Suddenly I'm much more enamored with OpenTelemetry logs.
Doug: Yeah. I think it's interesting, the way that our journey unfolded and because logs were kind of the last thing to really get baked into the OpenTelemetry specification. But it's where we started and I think that our approach, admittedly, kind of violated that idea of do the hard thing first. So for me, the hardest thing would be distributed tracing because now it's like contract based programming. In the world of logging, I don't necessarily need to talk to anybody else, I don't need to agree on anything else, as a developer I can just print-log, print-log all day long.
Jessica: But when you used OpenTelemetry for those logs, you might not get all that connected.
Martin: The Trojan Horse, OpenTelemetry is a Trojan Horse. I like it.
Doug: Exactly. That was my plan. I don't think I was being evil. I think I was being well intended about this idea.
Jessica: I think we call that strategic.
Doug: Yeah. I felt like introducing, starting out with distributed tracing which, to me, is kind of like the Holy Grail of observability because especially in this world that we live in today... But I thought if I had to ask people to take this leap of faith, go on this journey with me, believe in the vision that I'm painting, and start to introduce concepts like the W3C Trace Context specification and getting into those details, I felt like I was going to start to lose people because it would've meant that if I'm a developer on a team with a service, in order for this to work very well I need to get other people to agree to something.
If I could get them started by getting the logging going, getting the SDKs into the repos, then I could go back to those people and say, "Okay. Now let me show you this W3C specification around trace contexts and let me talk to you about why this specification is important. And guess what? Your piece of software can speak this already, natively, it can do it." Then you just flip that bit, the trace context comes in.
Jessica: Magic!
Doug: Yes, exactly. It all starts to work, and I think that it's proven to be successful, this idea of starting with logging and then leveling up to metrics, traces and then distributed tracing. I'm watching that unfold. I wish it was going faster, but it is happening.
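For readers unfamiliar with the W3C Trace Context specification Doug mentions, the agreement between services comes down to a `traceparent` HTTP header of the form `version-traceid-parentid-flags`. The sketch below parses that header and builds the one to send on an outgoing call, using only the Python standard library. It is a simplified illustration that handles version `00` style headers only; an OpenTelemetry SDK does this for you. The function names are hypothetical, and marking a brand-new trace as sampled is an assumption made for the example.

```python
import re
import secrets

# Field layout from the W3C Trace Context spec: version-traceid-parentid-flags
_TRACEPARENT = re.compile(
    r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})"
)


def parse_traceparent(header):
    """Return the fields of a traceparent header, or None if it is invalid."""
    match = _TRACEPARENT.fullmatch(header.strip())
    if not match:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # The spec forbids version ff and all-zero trace or parent ids.
    if version == "ff" or set(trace_id) == {"0"} or set(parent_id) == {"0"}:
        return None
    return {
        "version": version,
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


def outgoing_traceparent(incoming):
    """Continue the caller's trace if there is one, otherwise start a new one."""
    context = parse_traceparent(incoming) if incoming else None
    trace_id = context["trace_id"] if context else secrets.token_hex(16)
    # Assumption for this sketch: new traces are always marked as sampled.
    flags = "01" if context is None or context["sampled"] else "00"
    span_id = secrets.token_hex(8)  # this service's own span id
    return f"00-{trace_id}-{span_id}-{flags}"
```

The key property is that the trace id is carried through unchanged while each service substitutes its own span id, which is what lets a backend stitch calls across services written in different languages into one trace.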
Jessica: The whale is lifting off.
Martin: I mean, this all goes back to the good software engineering principles of small, baby steps. Not the Big Bang, that... Well, like you say, let's just get some logs, let's get the SDK, let's get this, let's get this, and you're building up over time which is just good software engineering in general, to try and do things incrementally. I just love that idea, that you're doing things small, incremental steps.
Jessica: You also mentioned do the hard thing first, which is different from small, incremental steps, but I think when I hear do the hard thing first, I think make sure it's possible, make sure that it's possible to correlate all these different emissions from all these different systems. OpenTelemetry has done that work for you. So at Uplight you have all of these disparate softwares running and talking to each other now, and you're not trying from what I hear, not trying to move them to a single platform. Is that true?
Doug: Yes and no. There are certainly parts of our ecosystem of software where there is some duplication and some old tech that we would like to collapse to a new go-forward tech stack for that service or that set of features. But because the applications are complementary, we don't necessarily need to go through a very risky, large effort to put everything into one place.
There are definitely places within the business where we are really leaning into this idea of platforms and trying to collapse features that are similar into larger platforms. But for the most part, the different stacks, the different applications and the teams can continue to develop code the way they were before.
Jessica: Okay. So there's an evolution that is planned, as in, moving forward as we gradually consolidate and further integrate, there's a platform for that. But in the meantime, you've asked all these different software bits to effectively conform to an API, except it's an operator interface in the form of OpenTelemetry.
Doug: Yeah.
Jessica: I think that's a great vision, and very... It feels like a responsible vision. You're not knocking down all the trees to build a highway.
Doug: Yeah. And I think that as the companies came together and as we started to see pieces of software from one company talking to pieces of software of another company, being able to observe that is just incredibly important. There's parts of our observability journey where I get really excited because I'm a huge advocate for the developer experience.
I'm constantly trying to run interference for our developers and make sure that they're protected, so that they can have all the cognitive load back to work on their work. Hopefully, to enjoy the craft while they're doing it.
So part of my passion around observability is to elevate the developer experience, protecting the developer and let them do their job. But also for Uplight, and I think a lot of companies but especially for us, being a company of companies, with complementary applications and products it's really, really important for us to be able to observe how our software is interacting with each other.
Jessica: Right. Because if you can observe all the consequences of your action to the wider system, then you can act responsibly in the wider system.
Doug: Yeah. And I think that in the spirit of our mission, I think it's also just paramount that Uplight executes observability really well. We need to be good humans, residents of the planet, and make sure that our code is operating as efficiently as possible. I think like a lot of companies we have a long ways to go in improving the efficiency of our code and our software systems, but I think that Uplight's ability to achieve that mission and stay true to its core...
It would be almost impossible to do without really solid observability. I couldn't think of how you could achieve the goal of having massive amounts of software, all operating as efficiently as possible, having the smallest footprint on the planet, without really being able to see what your software is doing when it's running in production, at scale, all day, every day.
Jessica: I have a feeling that that thing you noticed, that it's almost impossible to achieve our goals without observability, is something often felt by people once they have it. And yet, people who've never had it before are like, "What are you talking about? I'm already doing this job." Which is true, but it just feels so much easier when you can see what you're doing.
Martin: Well, it's similar to tests, isn't it? The people who didn't have unit tests, the way that they made sure their application ran was to run it locally, start doing some stuff, and then when they start working with tests and they go, "Well, this is just so much easier, I can do things so much quicker. I can verify that what I thought was going to happen, happens quicker."
Jessica: "My tightrope is so much wider."
Martin: Yeah. It's like, "How did I work before I had this?" Then they go, "Right, okay. Now I've done that." Then they go to wider testing. It's the same with any of these concepts, how do I work without an IDE? How do I work without X IDE?
Jessica: How do you work with a line editor?
Doug: Slowly and very carefully, after you've written it out on a piece of paper. One of the things that I think that I get to benefit from is being an older software engineer, computer programmer, is that in the last several years I've really started to finally understand what it means when we say things like, "We want our architecture to be intentional because we want it to be a sponsor of innovation."
Jessica: You want your architecture to be a sponsor of innovation?
Doug: Of innovation. And so one of the things that... You mentioned the testing and it got me thinking along this thread because this was also part of my pitch and my ask of Uplight engineers to follow me on this journey, was this idea of having your architecture be a sponsor of innovation.
One of the ways I think about it in the world of observability, is this idea of when you get your application emitting the signals, logs, metrics, traces, distributed traces, spans, all the things and you can see them all being tied together and you can see the interaction between your software and another piece of software. To me, that's a perfect example of how your architecture can sponsor innovation.
If I can ship code and I can immediately see how it's behaving in a runtime environment, to me that does something cosmically for the developer. It creates a sense of trust and it creates a sense of knowing that I can ship software safely, reliably, predictably. I can see how it's going to operate immediately, I'm going to be told when it does something that I wasn't expecting, and I can just lean into that trust.
I think for me and I think for other developers, it removes some of the noise and just allows your brain to be more open and think more creatively about how you want to solve the next problem that you're working on. And so to me, part of observability and baking it into the way that we do things at Uplight is really about sponsoring innovation.
Jessica: Wow.
Martin: That's deep.
Jessica: I was going to ask another question, but you know what? That's just such a beautiful place to end.
Martin: Thank you so much for being on.
Doug: Yeah, no problem. I think it's probably pretty obvious that I actually am very passionate about this stuff and I really do enjoy writing software, even to this day. I love doing it.
Jessica: So if our listeners want to find more of your wisdom, where can they look for you?
Doug: I'm kind of cringing, I'm going to say LinkedIn.
Jessica: LinkedIn does not suck.
Doug: No judgment, no comment, no editorial, no color around that. Just I'm on LinkedIn. LinkedIn/DougRamirez.
Jessica: You can find that link in the show notes. Thank you so much, Doug.
Doug: Thank you, all. It was very nice meeting you.