O11ycast
35 MIN

Ep. #65, Simplifying Tracing with Sophie DeBenedetto of GitHub

about the episode

In episode 65 of o11ycast, Jess and Martin speak with Sophie DeBenedetto of GitHub. This talk explores observability at GitHub, the value of tracing, the BEAM ecosystem, the Elixir language, and insights on leveraging observability at scale.

Sophie DeBenedetto is a staff software engineer at GitHub working on the observability team. Sophie is also a co-host on the BEAM Radio podcast, co-author of Programming Phoenix LiveView, and a graduate of the Flatiron School.

transcript

Sophie DeBenedetto: I think my mentality towards observability is probably not dissimilar from that of a lot of folks who came into development through web development. I, myself, am a career-change programmer. I didn't study computer science or anything like that in school. I did a programming boot camp, oh gosh, seven or eight years ago at this point.

I attended the Flatiron School in New York, and I learned Ruby on Rails and learned how to build web apps. When you are learning how to build web apps, you are focused on building features for customers and users in order to get that first job, and that was the stuff that was the most exciting to me at the time, building these features to solve interesting problems from a user point of view.

I didn't really have a systems-oriented mentality when I was approaching the applications that I was building, and I didn't have any experience, I think, when it comes to really operating systems in development. I think that this is a problem that many organizations still face, and I think it was probably even more of a problem seven or eight years ago when tools like Honeycomb and even Datadog and others were new or didn't exist yet.

We had this really big gulf between developers, web developers in particular, and DevOps which was this big, scary, mysterious side of the world that I certainly didn't know anything about. I was in the habit, and I think many people can get into the habit of throwing your code over a wall and now it's out in the world, it's out in production and it's out of your hands.

That is, I think, the biggest change between how I used to think and how I think now, and I think that's because just my own experiences have changed, what I work on changes, what I'm interested in changes. I've learned a lot more, I've worked on more production systems. But the ecosystems have changed and the tools have changed, and now there doesn't have to be quite as big a gap between folks that are doing feature development and folks that are building systems, operating systems, maintaining systems and observing systems.

Now I think it's much more common to consider that it is absolutely the responsibility of your average feature developer, web developer, what have you, to understand how their code operates in production, to be able to observe their code in production and to be responsible for observing and operating their code in production. It's a smaller ask than it probably was eight years ago because we have the tools for it and the practices and processes and learning that have developed out of that.

So I don't necessarily think that's a solved problem for everyone or for every organization, and even organizations that have solved that problem in many ways are still going to struggle with empowering their developers to observe and own their code in production. But that I think is the biggest mentality change, that I don't just have to throw my code into a black hole and hope it works and hope it's someone else's problem if it doesn't, but that I can understand what it means to instrument my code.

I have the tools available to me to observe it and I can learn from what I'm looking at, I can make sense of the information that's coming out of my services in order to make them better, make them more available.

Jessica Kerr: Fantastic. All right, tell everybody who you are.

Sophie: Absolutely. Hi, I'm Sophie DeBenedetto. Currently I'm a staff engineer at GitHub on our observability team, so observability is a topic that I think about all day, every day, at least five days a week. I'm also involved in the BEAM programming community, in particular on the Elixir side of things. I'm the co-author of the Programming Phoenix LiveView book that's out in beta on Pragmatic Bookshelf now.

I wrote it with Bruce Tate, who is an excellent writer himself and from whom I've learned quite a lot. I also have a podcast, if you can't get enough of listening to me talk by the end of this. You can check out BEAM Radio where, together with Bruce and a number of other excellent co-hosts, we talk about all things BEAM and we have a lot of fabulous guests on whenever we can to talk more about and learn from their experiences within the BEAM ecosystem.

Jessica: Martin, this would be a great time for your question.

Martin Thwaites: Yes.

Sophie: What is it? What am I talking about?

Martin: So I lost count of how many times you mentioned the word BEAM, and I'm really interested as to what BEAM is. I've been doing research today to try and understand what BEAM is, or more specifically, the LiveView stuff, the book that you've written and all that kind of stuff. I'm really interested, educate me, tell me what BEAM is.

Sophie: I'm thrilled that you're interested, I hope that your listeners will be a little bit interested or at least that I'll pique their curiosity. The BEAM is the Erlang VM, the Erlang virtual machine, but there are a number of other languages that run on the BEAM, including Elixir, which is the language that I learned in order to access BEAM programming, and other languages as well that I could go on and list.

What is so great about the BEAM? And what is so great about working with languages on the BEAM? The BEAM implements the Actor Model, and if you've heard of Erlang and you've heard of the BEAM you've probably heard of Joe Armstrong, who was one of the co-creators of Erlang coming out of Ericsson way, way back.

With the Actor Model we think of and we write code that models the world in terms of message passing between actors, and this allows a lot of process-based communication to grow up organically in BEAM languages. It is the philosophy that underpins OTP, which is a suite of libraries that basically comprise a lot of the code you're going to write in Erlang or Elixir. OTP stands for Open Telecom Platform, which has nothing to do with what it is, don't worry.

Martin: Oh no, not another one.

Jessica: No, that's a historical name and it's about telecom, like phones.

Sophie: Yes, exactly right. Because it came out of Ericsson labs in the 80s. Ericsson, as you know, is in telecommunication, it began with phones. What Joe Armstrong and the folks he was working with at Ericsson at the time were trying to solve were problems of fault tolerance, essentially. They wanted to make sure that you could use your phone and that if you called someone and it disconnected, the whole world wouldn't stop and come crashing down and affect everyone else talking on their phones all over the place.

They set out to solve that problem, and what they ended up with was essentially the Actor Model. What they ended up with was a VM, a language, a framework and an ecosystem that not only solves for problems of fault tolerance but also allows for massive scale, massive concurrency which is why you have WhatsApp, for example. That's a really good example of a popular app that needs to handle massive scale, massive concurrency, lots of fault tolerance and it does it because of Erlang, and it does it because of BEAM.

So OTP is a phrase that people throw around a lot, and it does really confuse people because it stands for Open Telecom Platform, which has nothing to do with what it really is. It's just a bunch of libraries and modules that you take advantage of, probably without even really realizing it, if you're writing Erlang or Elixir.

Jessica: It's the standard library for the BEAM, right?

Sophie: That's exactly right. Those are the primitives that give you the fault tolerance and the concurrency and the process-based communication models that you will use if you're writing those languages.
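For anyone who hasn't seen it, here is a minimal sketch in Elixir of the process-based message passing being described; the message shapes are invented for illustration.

```elixir
# Spawn a process that waits for one message, then replies to the sender.
pid =
  spawn(fn ->
    receive do
      {:ping, from} -> send(from, :pong)
    end
  end)

# Every BEAM process has a mailbox; communication is just send/receive
# between process ids, with no shared memory.
send(pid, {:ping, self()})

receive do
  :pong -> IO.puts("got a reply")
end
```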

Martin: I think it's probably really confusing then when you start talking about OpenTelemetry, which also uses OpenTelemetry Protocol, or OpenTelemetry Line Protocol, which is OTLP.

Sophie: Exactly, yeah.

Jessica: Oh, maybe that's why it has the L.

Sophie: That's actually probably right.

Jessica: Right? Because it's OpenTelemetry, the Ellis Island Protocol. But if we just called it OTP it would be super confusing, because OTP is the standard library on the BEAM which is extremely useful, but completely different.

Martin: Yeah. When you look at the libraries now, there's obviously OTLP, not to be confused with OLTP which is something completely different.

Sophie: Wait, what is OLTP?

Martin: Online Transaction Processing. It's the whole database modeling stuff. But they say OTLP which is the OpenTelemetry Line Protocol.

Jessica: But it's not Line. Austin says, and Austin was around for some of this naming, that it's the OpenTelemetry Protocol.

Sophie: That's what I thought it was.

Martin: But yeah, then the libraries are all OpenTelemetry Protocol with the O and the T and the P capitalized. So none of it makes sense.

Jessica: It's fine, it's fine. Acronyms are also names.

Sophie: Exactly right. Yeah.

Martin: Let's talk a little bit about observability then. So on the BEAM is observability a thing yet?

Sophie: Yes, absolutely. One of the things that I love about the BEAM as a VM, and the languages that run on the BEAM, especially Erlang and Elixir, is that they really treat observability as the first class citizen that it is, and always have. It's really led with observability from first principles. One thing that I'll call out is the Erlang Ecosystem Foundation, which is a not-for-profit whose board of directors I'm actually on, full disclosure. We're an elected board, everything is public and we're funded generally by our sponsors.

We run and support a number of working groups whose whole reason for being is to support and move forward different aspects of the BEAM community and the BEAM ecosystem. So we do have an observability working group, which is made up of a lot of the folks that maintain the OTel libraries for Erlang and who are concerned with other features and aspects of observability on the BEAM. Very much a thing in Erlang and in Elixir.

So yeah, we've got OpenTelemetry libraries for Erlang. I think Elixir is lagging behind a little bit in the OTel space, and I would love to see some more investment there. But one of the features in the Elixir language that I really love and encourage folks to work with is the telemetry module that you have access to in Elixir. What that allows you to do is basically bake tracing into every Elixir application without needing to pull in third-party services, up until the point you're ready to actually export those traces somewhere else.

It leverages ETS tables, which is Erlang Term Storage, an in-memory store. You register your telemetry handlers, and then throughout the course of your code you can use syntax that's probably very familiar to folks from tracing, where you wrap bits of code in a telemetry span and then the attached handlers are invoked. In those handlers is where you can centralize your logging logic, your metric emission, your trace submission, your exception reporting.

All that stuff can happen in one central location, and that's just baked into the Elixir ecosystem, and so any Elixir library author is going to reach for that telemetry module and make sure that telemetry is baked into their libraries from day one. Then you have access to that same paradigm and those same patterns if you're an application developer.
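As a rough sketch of that pattern, using the telemetry library's public API (the event names, handler module, and MyApp.Checkout.run/0 below are hypothetical):

```elixir
defmodule MyApp.TelemetryHandler do
  require Logger

  # Register one handler for the start and stop events of a span.
  def attach do
    :telemetry.attach_many(
      "my-app-checkout-handler",
      [[:my_app, :checkout, :start], [:my_app, :checkout, :stop]],
      &__MODULE__.handle_event/4,
      nil
    )
  end

  # Centralize logging, metric emission, trace submission, and
  # exception reporting here, in one place.
  def handle_event([:my_app, :checkout, :stop], measurements, _metadata, _config) do
    Logger.info("checkout took #{measurements.duration} native time units")
  end

  def handle_event(_event, _measurements, _metadata, _config), do: :ok
end

# Wrap interesting work in a telemetry span; the attached handlers
# are invoked for you at start, at stop, and on exceptions.
:telemetry.span([:my_app, :checkout], %{user_id: 42}, fn ->
  result = MyApp.Checkout.run()
  {result, %{}}
end)
```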

Martin: See, I love that, and I wish more languages would think about primitives for tracing data. It's been in .NET for, what, seven or eight years now. I wasn't aware that Elixir had the same sort of thing in there, but we need to separate these two ideas, don't we? The idea that the language should be observable.

This is how you do observability in that particular language. How you export it is a different thing, and so is the protocol. I've always said that the protocol in OpenTelemetry is the best thing about OpenTelemetry. The SDKs are great, but the best thing about OpenTelemetry is the protocol itself.

Jessica: OTLP.

Martin: Not OTP, because that's a different thing. OTLP.

Sophie: There's an L in there.

Jessica: Yeah. Well established.

Martin: Yeah, but this idea that you shouldn't need to pull in a third party library in order to be able to instrument your library because then other people have to take dependencies on it, and then we end up in DLL hell, which is the .Net term of versions and all of that kind of stuff. So it's amazing that that's not built in at a low level.

Sophie: Yeah. I think that's absolutely right, and that's one of the things that I love about Elixir as a language, that the barrier to entry is very low. You don't have to learn another library, you don't have to learn all about a dependency and pick one in order, in this case, to bring in tracing, because it's baked into the language itself.

That's something that I think is a running theme through Elixir. It solves for so many common development problems so elegantly and so accessibly, and that's why I think it's such a joy to work with and it's very accessible to beginners.

Martin: Do you think because it's in the language that you do it more in development?

Sophie: Absolutely. Yeah.

Martin: In the development phase?

Sophie: Yeah, absolutely.

Jessica: By it, do you mean tracing?

Martin: Yeah.

Sophie: Or just instrumentation overall?

Jessica: Yeah. And Elixir as a language, it descends from Ruby on Rails.

Sophie: In a sense. So it's built on the BEAM and it's kind of built on top of Erlang in many ways, it interops with Erlang, but it is very much inspired by the experiences of the creator, José Valim, working on the Ruby on Rails core team.

I think it came from him wanting to be able to bring real time web development into Rails more seamlessly and finally feeling that the obstacles he was running up against were not in the Rails framework, but in the Ruby language itself and he wanted-

Jessica: And the runtime.

Sophie: ... a better runtime, exactly. A runtime that would provide for him what OTP provides, which is fault tolerance, massive concurrency and fundamentally the Actor Model.

Jessica: Right. Yeah, so Elixir is friendly because Jose Valim is friendly.

Sophie: The feel-good factor, right? Yeah, just like they used to say, what is it? MINASWAN: Matz is nice and so we are nice. Yeah, the Elixir community's openness and the way that it extends welcoming arms to beginners and supports beginners as they learn.

I think it definitely comes from Jose who is famous for, if you interact with him through open source or through any of these dev forum-type things, he'll do his trademark rainbow of all the hearts emojis to respond to people's questions and contributions and feedback. I think that really says it all in setting the tone.

Jessica: Wonderful. So you mentioned that Elixir has telemetry built in, and yet you said it lags behind on OpenTelemetry. Can you export from the built in telemetry module to OTLP?

Sophie: Yeah, absolutely. So the built-in telemetry module is totally agnostic when it comes to what you are trying to do with the code that you're instrumenting. It basically provides an API through which you can emit telemetry signals from your code, with a standard pattern and in a centralized location, so you don't need to reinvent the wheel every time you're thinking, "Do I need to log here or emit a metric here or report an exception?"

You basically just wrap everything in a telemetry span and then your handlers for those telemetry event emissions will be invoked for you. But what you do at those moments in time is totally up to you. Do you want to log? You can log here. Do you want to emit a metric to Datadog? You can do that. If you want to emit a trace to Honeycomb or Datadog APM or Lightstep, you can do that as well.

It just provides a place for you to put your telemetry emission code, and it's opinionated about how you implement that instrumentation within your code. So if you want to emit traces that comply with OpenTelemetry, you can absolutely do so with the help of the telemetry module within your Elixir application. I think where it lags behind is the SDK and just the community of users around OpenTelemetry within the Elixir community.

I do think there's a lot of strong representation for observability overall within Elixir, in part because of this telemetry module but I don't see a lot of discussion of OTLP, shall I say, explicitly in Elixir at this time.
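For comparison, here is a hedged sketch of emitting an OpenTelemetry-compliant span directly from Elixir with the opentelemetry_api package; the module, span, and attribute names are invented for illustration.

```elixir
defmodule MyApp.Repos do
  require OpenTelemetry.Tracer

  def fetch_repo(owner, name) do
    # Opens a span, runs the block, and ends the span for you.
    OpenTelemetry.Tracer.with_span "fetch_repo" do
      OpenTelemetry.Tracer.set_attributes(%{
        "repo.owner" => owner,
        "repo.name" => name
      })

      do_fetch(owner, name)
    end
  end

  defp do_fetch(_owner, _name), do: :ok
end
```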

Jessica: Okay. It's interesting that because there's already an appreciation for observability, OpenTelemetry is not as big a deal.

Sophie: I have found that to be the case in a lot of places and in a lot of communities, and I'm curious to hear your guys' thoughts and experiences on it. I feel like observability overall isn't necessarily such a hard sell to individual engineers or teams or organizations for obvious reasons. But then getting people to converge on this standard, even though it is the industry standard, you're kind of dealing with a different beast.

Jessica: Speaking of observability as not a hard sell, do you find it harder to sell logs or traces?

Sophie: A loaded question.

I think that logging is easier for many engineers to wrap their heads around, and I think that it's the first experience that a lot of engineers have with the concept of observability, and I'm definitely speaking from personal experience here.

Jessica: In the bootcamp you probably used logging, right?

Sophie: Exactly. Not that we would've talked about observability at all, because you're there for three months and you're trying to learn enough to get out there and get hired. We learned about logging, and logging is extremely straightforward conceptually. A thing happened, I will emit a piece of information that indicates that that thing happened, decorated with other pieces of data that will help tell someone what went down and why. So that, I think, is what many people reach for first when they think they want to be able to observe the behavior of their systems.

Jessica: But it turns out that in production you're wrong about that piece of information telling someone what went down and why at scale.

Sophie: Exactly. At scale is the problem.

Jessica: And out of the context that you have right now when you just created that log statement.

Sophie: Exactly. Another complication is when you're dealing with privacy concerns, especially if you're moving into European regions and have to think about the various legal restrictions around emitting certain types of data. People really do want to log everything and want to log the user ID and the email or whatever, so that they can have what they need at their fingertips to respond to that customer support request that says, "Hey, I clicked a button and a thing didn't happen."

But I think the thing that's challenging with observability as a concept, especially when it comes to encouraging people to leverage traces and metrics over logs is that generally we find telemetry signals to be interesting and informative in aggregate or at scale which is not what logs are great for. Logs are great for telling someone that a specific thing happened. But when that customer submits a complaint, and you open up that Zendesk ticket, let's say, the pressure is on.

Your lizard brain immediately is like, "Oh my god, a thing went wrong. I need to tell Stacy exactly what happened because she reported a problem." And I think that's this very human mismatch between the way that we want to observe our systems and the way that we sometimes feel we need to be able to observe them to respond to urgent issues.

Martin: I think logs in production have always felt like a kludge; observability there has kind of felt like a kludge on top. Like, "I had a load of logs that I was using while I was doing development, so I'm going to try and use those to understand production." So it's kind of accidental observability, if you like, that you're trying to get in production because I wrote my log lines that say, "I was here. I was here. I was also here. Sophie was here. Martin was here."

And those things end up in your production logging, and then you try and aggregate those, you then structure those logs because you don't want to do parsing anymore. So you structure them, you add more context and it just becomes accidental that that becomes your observability. Whereas, when you start doing things like traces and metrics you're much more intentional about, "These are things that are important to me."

I think there's a mismatch then during the development cycle, that people think, "Well, I need to know I was here, I was here, I was here, during the development lifecycle. But I can't do that with tracing." But you can. And that really gets me, deep down.

Jessica: It goes back to what Sophie said at the very beginning, the difference between wanting to implement features for users versus a systems oriented mentality.

Sophie: I think that's right, and I think the more complex our systems become, the harder it is to think with a systems-first mentality.

If you've got one web application, you're great. If you've got two, three, four, five, six, seven, eight, nine, ten, it's becoming a little bit more complex but you could still sit there reasonably with a piece of paper and a pen, and map out the various pieces of your infrastructure.

Jessica: You can draw it on a whiteboard in 10 minutes.

Sophie: Exactly. If you've got hundreds of services, it's impossible for someone to hold that in their head at a given point in time. So how do we leverage observability to give us that level of visibility, to assist us in developing a consensus and understanding of what our system looks like? Even if we accept that it might be impossible to visualize it perfectly for everyone.

Jessica: Consensus. I like that word. So is that what you were working on at GitHub?

Sophie: Yeah, in part. It's something that we've been thinking a lot about, and we just had a book club discussion that was exactly on this topic, within our team and with some other observability related teams. But one of the challenges we face at GitHub is that it is a big organization. We have many services, we have many, many engineers, and no one person has it all in their head; no one could hold a perfectly correct and complete understanding of what our very distributed system looks like.

So how do we provide observability tools and services that allow people to understand their corner of the ecosystem, and understand it in relation to its dependencies both upstream and downstream? This is where we're beginning to find tracing to be really critical and essential for us. It's not that tracing is a new concept or a new tool that we've exposed to Hubbers, but we are working with Datadog APM now and it's giving us a supercharged look at tracing, and its service map feature is also a really cool way for us to actually visualize our system.

If you were to look at their service map page, like the entry point, it's still a little bit useless to us because we actually have so many services that looking at them all together is not that helpful. But you can drill down into your service and then see the dependency map of the things that touch it immediately, and that is a much more useful view of the world.

Jessica: It sounds like logging is obvious, but maybe tracing is a skill.

Sophie: I think that's well said, yeah. I think it's hard to learn. I've been on the observability team at GitHub for, I don't know exactly, over two years or close to two years, or maybe two and a half years. Somewhere between two and three years.

Martin: You have scars.

Sophie: Yes, exactly. I still feel like I'm super weak at tracing. When our users come to us with questions on how to best use tracing to understand and solve for really hard problems, identifying certain bottlenecks and eliminating them, I'm personally still not the expert on our team who can really walk them through every step. I still feel like I have a lot to learn. As a writer and as an educator myself, I find that to be frustrating.

I want to be able to understand it well enough to build these guides and make it accessible to other engineers and that's something I feel like I'm only just starting to do after a couple of years working on our observability team. If you have any tips about how to make tracing easy and accessible, I'm certainly all ears.

Jessica: We have opinions.

Martin: Opinions, not tips. Yeah. I feel like we should do a call in for that, Soph. If you've got observability tips then call in now.

Jessica: You can tweet at us at @O11yCast.

Martin: So how do you normally handle those kinds of questions when somebody comes up and says, "I've got a particular problem in a particular service. You've been evangelizing tracing, tell me your ways"? How do you normally go about that? Is it all a conversation, looking at code? How do you work with them on that?

Sophie: Yeah. I think that's a really great question, because I think people take for granted that if you get a couple of experts working in your organization, then these kinds of problems that all of your teams are facing will be solved or can be solved. We have an observability team at GitHub, we have a group of people whose only job is to understand these tools deeply and to make them available to other GitHub engineers.

But it's still really challenging for us to meet that demand and meet that need from our colleagues sometimes, and I think the biggest challenge we face is just a scale issue. There's like 1,000 or something GitHub engineers, but there will only ever be so many observability team members. The answer that I have for this is mostly process-based, and I still think that it's important to share because I think this is a hard problem, like I said, for organizations to solve.

So we've created a couple of different ways through which our colleagues can get support from us and we try to do things as async as possible to begin with, for a number of reasons. GitHub is a fully remote organization, always has been or at least has been since I joined a number of years ago. What that means for us is that we don't just do things on Zoom, let's say, instead of meeting physically in a room together.

But we try to focus on leveraging writing as much as possible, documentation, we use GitHub issues and project boards for all of our processes and we do a lot of design decision records, architectural decision records, those kinds of things, to really leave a strong paper trail when we make decisions and when we build that documentation we feel the same way.

So when other Hubbers have questions for us about how to best leverage tracing to solve certain problems, for example, we ask them to go through our team support process: they open an issue and they fill out a form, and we take it from there. That doesn't necessarily mean that we don't have a conversation with them. We'd love to have a conversation with our colleagues and we love to pair with them whenever we can.

But we try to lead with async first so that we can make sure to produce artifacts from it and build out our knowledge base on our intranet, to leave these things for other engineers for posterity. That's something that we are just starting to do a little bit better with tracing as we've been moving more into the Datadog APM world. So I actually have open in a separate tab, while I talk to you guys, my three PRs to continue to build out our tracing user guide and recipe book that we've been really putting a lot of effort into over the past couple of weeks.

Martin: I like the idea of a recipe book. What do you mean by recipe book? What does a recipe book mean to you for this context?

Sophie: So we have our Basic Tracing User Guide, which covers, "This is how you make sure that your application is emitting OTel-compliant traces to our vendor of choice, which happens to be Datadog APM at this time. This is how you can look at traces in development, and these are the attributes of your trace, and let's go into the Datadog UI and break down what's useful. These are the dashboards and so on."

That is a static guide that's probably not going to change much over the years; maybe we'll add to it if we enable more features in our vendor over time. The recipe book doesn't exist yet because I'm going to totally start it right after we stop talking today. No, seriously, that's what I'm going to do the rest of today. We are trying to pay attention to interesting ways that GitHub engineers are using traces to solve problems, whether it's to assist in incident remediation, to identify bottlenecks and resolve them, or to just generally improve the availability of their systems.

We're trying to extract their stories and document them in this recipe book, so we'll probably have a recipe on things like, "Here's a story and some guidance about how a particular team found that they were making 10 separate web requests to the same endpoint with the same client, instead of reusing the same connection, and you can identify problems like this as well."
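As a sketch of the kind of fix such a recipe might describe, here it is in Elixir with the Finch HTTP client: start one named connection pool and route every request through it, instead of opening a fresh connection per request. The pool name and URL below are made up.

```elixir
# In the application's supervision tree: one shared, named pool
# of persistent connections.
children = [
  {Finch, name: MyApp.Finch}
]

Supervisor.start_link(children, strategy: :one_for_one)

# Each request reuses a pooled connection to the same host rather
# than opening ten separate ones.
for _ <- 1..10 do
  Finch.build(:get, "https://internal.example.com/status")
  |> Finch.request(MyApp.Finch)
end
```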

Jessica: Oh yeah. What does that look like in a trace with pictures?

Sophie: Exactly, all the pictures. Yeah, so identifying those common problems that traces are designed to help us identify, like too many web requests that we can optimize, too many database requests, non-performant database queries that are taking too much time. Basically, we provide this guidance and we say traces are good in incident remediation in this way. Traces are good at helping you identify the dependencies of your system and the bottlenecks, but those are only words to people that haven't had an opportunity to really dig into some of these-

Jessica: Right. You need the story.

Sophie: You need examples and you need the story.

Jessica: And a picture. Yeah, and when you see the picture and then you show, "Oh, here's how it shows this SQL statement so that you can see what tables it was accessing, but you can't see the parameters that were passed in because that would be PII."

Sophie: Exactly.

Martin: I think there's a whole thing with tracing where it has to be seen to be believed, as to how much better it is than logs in specific contexts. Where you have context and you can use tracing, if you look at logs versus tracing in that same context it's night and day. But you have to see it. You can tell people about correlation, you can tell people about the individual bits of what a span context is and durations and all of that kind of stuff. But it's not until they see it that they go, "Oh, all right. Yeah. I get it now."

Jessica: It parallels the move from logs to traces because logs make sense to you when you write them in your context of, "I'm developing this feature." But traces make the software self documenting. They lean into async and remote and leverage writing as much as possible because the software is documenting what happened in a way that other teams can read and not just you.

Sophie: Yeah. I think that's really well said, that concept of it being self-documenting, and I think a really important piece of that puzzle is to enable auto-instrumentation wherever possible, because it's not reasonable that somebody should have to add a span event for every step of their request handling code flow. Certainly you should be able to add specific span events if you, the developer, feel that a particular interaction is meaningful that wouldn't otherwise be captured.

But it's got to be true that if you make a database call, a span event is going to be emitted for you. It's got to be true that if you're using this HTTP client to talk to this other service, the same is true, and so on. That's definitely something that can be challenging because not all of the libraries you're going to reach for may support that type of instrumentation. They may support it in ways that aren't compatible with one another. They may support it in ways that aren't necessarily compliant with OpenTelemetry.
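In Elixir, wiring up that kind of out-of-the-box instrumentation might look like the sketch below, assuming the opentelemetry_phoenix and opentelemetry_ecto packages are among the app's dependencies; the application, repo, and endpoint names are placeholders.

```elixir
defmodule MyApp.Application do
  use Application

  @impl true
  def start(_type, _args) do
    # Subscribe the OTel bridges to Phoenix and Ecto telemetry events,
    # so web requests and database calls emit spans automatically.
    OpentelemetryPhoenix.setup()
    OpentelemetryEcto.setup([:my_app, :repo])

    children = [MyApp.Repo, MyAppWeb.Endpoint]
    Supervisor.start_link(children, strategy: :one_for_one)
  end
end
```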

Jessica: Oh yeah, it needs to be OpenTelemetry because that's all about the compatibility with each other.

Sophie: That's the whole point that it exists, because it doesn't help if you have Client Library A talking to Service B, and they don't recognize the same trace context header. Maybe one of them is using the W3C traceparent header that OpenTelemetry uses and the other is using the legacy Lightstep OTrace header, and then you've got to write some translation layer in there. So converging on OpenTelemetry has been a really big push for the observability team at GitHub for the past couple of years, so that this auto-instrumentation just works for people, so that they can get tracing out of the box and they can make use of it.
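For reference, the W3C traceparent header is four dash-separated fields: version, trace ID, parent span ID, and trace flags. The sample IDs here are the W3C spec's own example values:

```
traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01
```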

Jessica: And that brings those hundreds of services and hundreds of teams, I imagine, together at the interface and when they need to know it, because that's the thing about nobody being able to hold this whole system in their head, right? You have to be able to drill into the pieces of the system that matter, when it matters. That's what observability gives you. Sophie, this has been wonderful. We talked about your personal passion for the BEAM and Elixir, and then we talked about observability at GitHub, where it sounds like you're doing really interesting work, and about observability at scale. Thousands of developers.

Martin: And the recipe books that you're going to create as soon as this call is over.

Sophie: Yeah. By the end of today, they're all going to exist.

Jessica: By the time you're listening to this episode, that recipe book exists, at least in a pull request. Yeah, I like how you're taking people's stories and turning them into something that can be consumed more widely than just the small audience that was listening.

Sophie: Yeah. That's the goal.

Jessica: And it's the goal of our podcast. When people want to learn more about you, where can they go?

Sophie: Yeah. You can always find me on Twitter. My name on there is @SM_DeBenedetto, and I'll also shout out that I love to hear from people if they want to talk about anything observability or Elixir related. In particular, if you feel like you have an Elixir book in you of any length or size, I'm also the Elixir series editor at PragProg, which means I am here to support you if you would like to even consider writing a book or get something submitted to our proposals committee. I would also encourage folks to check out, and I will share this link and ask you guys to share it along if that's okay-

Jessica: Oh yeah, we'll put it in the show notes.

Sophie: Awesome, thank you. Yeah, the Observability Working Group for the Erlang Ecosystem Foundation. It's open to everyone who cares about observability and is interested in the BEAM. You are more than welcome to join the Slack channel and linger and just say hi. People are very welcoming and friendly there, and certainly if there are things that you want to work on or get involved in, then that's what we need to see more of. So yeah, please go and check that out.

Martin: Awesome.

Jessica: Thank you so much.

Sophie: Thank you, guys, so much for having me. This was really fun.