![O11ycast](https://cdn.sanity.io/images/50q6fr1p/production/69438e658cf62a71c9335b5647e92521f3bfad33-3000x3000.jpg?auto=format)
Ep. #78, Exploring OTTL with Tyler Helmuth and Evan Bradley
Episode 78 of o11ycast examines the world of OpenTelemetry Transformation Language (OTTL) with Tyler Helmuth and Evan Bradley, the maintainers behind this innovative framework. Discover how OTTL enables powerful telemetry data transformations, its practical applications, and what lies ahead for OpenTelemetry's transformative ecosystem.
- Tyler Helmuth is an engineer at Honeycomb and a core contributor to OpenTelemetry. Since 2022, Tyler has been a driving force behind the development of OTTL and the transform processor, focusing on making telemetry data transformation accessible and powerful for developers.
- Evan Bradley is an engineer at Dynatrace and a passionate advocate for OpenTelemetry. Involved since 2022, Evan has been instrumental in maintaining OTTL and enabling enterprise users to leverage this tool for solving complex observability challenges.
Transcript
Tyler Helmuth: OTTL is a tool, I guess, it's a framework, that's primarily used in the transform processor.
Jessica "Jess" Kerr: What does OTTL stand for?
Tyler: It stands for OpenTelemetry Transformation Language.
Jess: And you said it fits in the transform processor?
Tyler: That's right. So in the OpenTelemetry collector, which we can get into... OpenTelemetry Collector is a tool in OpenTelemetry all about processing, or I guess, receiving, processing and exporting your data.
One of those processors is the transform processor. It allows you to change your data in almost any way. And the reason it can allow you to change your data in almost any way is because of OTTL. OTTL is a framework that gives you access to every single field in the OTLP payload.
So your Span names, your data point values, your log bodies, attributes for resources, scopes, spans, span events, so on and so forth.
OTTL gives you access to all that data and the Transform processor uses the OTTL statements that you write to then go actually change your underlying telemetry.
Jess: So you put these little OTTL statements, they're like commands, in the YAML for your OpenTelemetry collector and then it does stuff?
Tyler: Yes, that's exactly right. It's not a programming language, but you write strings and they're interpreted by OTTL and then the transform processor can use that interpretation to go change the underlying telemetry.
And you can do that for any signal. I guess that statement's not totally true anymore 'cause we don't support profiles quite yet, although there are open issues and people are asking that we add support for profiles.
But in general, the theory is that if there's an OTel signal the Transform processor or OTTL will be able to modify that information.
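For readers following along at home, here is a rough sketch of what those OTTL statement strings look like inside a collector config. The attribute names, span name, and match pattern are invented purely for illustration:

```yaml
# Hypothetical transform processor config: each string is an OTTL statement
# that the processor parses and applies to matching telemetry.
processors:
  transform:
    trace_statements:
      - context: span
        statements:
          # Made-up attribute and span name, purely illustrative.
          - set(attributes["example.team"], "checkout") where name == "GET /cart"
    log_statements:
      - context: log
        statements:
          - set(severity_text, "ERROR") where IsMatch(body, "panic")
```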
Ken Rimple: So how long has OTTL been around? Is this kind of a newer feature or has this always been part of the transform part of the module?
Evan Bradley: So OTTL has been around-- Tyler would know a little bit better than I would, but it's been around for I'd say at least three years, maybe even going on four at this point, probably since about, I'd estimate 2021.
Tyler: Yeah, I think the original creator proposed it in late 2021, around, like, the November, December timeframe, maybe even a little bit before that. And then the first implementations were done right at the beginning of 2022, I believe.
Ken: And before that you would've had to write what like code in some sort of regular language and hook it in?
Evan: Exactly. So it's worth noting that the collector does not support, like, live reloadable plugins. So you can't load in... Once you've compiled a collector binary, you can't load code into the collector. You need to do that at compile time.
So if you wanted to do these sorts of custom transformations, you would either need to find another component inside of your already built collector and use those to do your transformations, or you'd have to write that yourself. OTTL allows you to do a hopefully useful but slightly more limited subset of transformations on your data without needing to recompile the collector.
Jess: Nice. Okay. So who are you? And how long have you been involved in OTTL?
Tyler: My name is Tyler Helmuth. I am an engineer at Honeycomb. I've been working on OTTL in the transform processor since I got started in OTel, which would've been March of 2022.
Evan: And I'm Evan. I am an engineer at Dynatrace. I've been involved with OTel since roughly June, maybe July of 2022. And I've been involved with OTTL roughly just as long. And then Tyler and I are the primary maintainers of OTTL within the collector.
Jess: Nice. What drew you to this particular component?
Tyler: When I was getting started in OTel, it was a really good place to volunteer to do work. So I wanted to get involved in the collector and going to SIG meetings people were asking, "Hey, we need to," like, "this is a really cool component, we need help working on it."
And I was like, "Well, I'll help work on it."
Jess: Like the Transform processor?
Tyler: Yeah, transform processor, OTTL. So I volunteered to work on it because it's always great to be able to solve problems that people are asking for when you're trying to join a community.
And then also, I just really like the concept of it. So the standardization of the ways to transform, the standardization of access to all fields, I just, like, saw the power of that type of capability and so I was drawn to it.
Evan: Yeah, I'm pretty much in exactly the same position as Tyler. It's easy to get started, it's kind of isolated and you don't need to have deep, deep knowledge of, you know, observability or all the different pieces of the collector.
It's fairly self-contained. And we see that a lot with new contributors as well. They'll come in and they'll be able to implement a particular function inside of the standard set of built-in OTTL functions, and they can do this without needing to have, you know, broad knowledge of how the collector works or understand the entire code base of OTTL.
So that was one thing, but also, I guess, coming a little bit less from a personal standpoint and more, you know, why my employer has me work on the collector: Dynatrace sees a lot of enterprise customers and they've got all these sorts of weird use cases.
And it was pretty clear to me that OTTL is exactly the sort of tool that's going to be useful for these people.
Jess: Oh, oh, oh. So how are people using it?
Evan: It's hard to even really give a concrete example here, but basically just abusing their data in every way possible.
Tyler: Yeah, that's a good term. Abusing the data. And that was a key concept for OTTL: that it should not be spec'ed. You should be allowed to break all of the OTel spec with the transform processor if you wanted to.
Jess: You can break the OTel spec?
Tyler: So like your telemetry is not beholden to the OpenTelemetry specification anymore once you're in the transform processor. So for example, if you want to change your data point attributes and not re-aggregate, you're welcome to, that changes your metric identity-
Jess: It's a metrics thing.
Tyler: Yeah, like if that changes your metrics identity, that could really mess with, like if you've got some database that's doing time series metrics, like you might have changed something, the data might mathematically not be correct anymore, but if you know that the transformation is safe, like you're allowed to do it.
So basically we let you do the transformations that you want where there's not really a lot of handholding. Another example is that you could drop spans using the filter processor with OTTL, that could create orphan telemetry, but that's okay. Like, if you know it's safe you're allowed to do it.
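As a hedged example of that kind of "no handholding" dropping, a filter processor config might look roughly like this; the route and name pattern here are made up:

```yaml
# Hypothetical filter processor config: any span matching one of these OTTL
# conditions is dropped, even if that orphans its children.
processors:
  filter:
    error_mode: ignore
    traces:
      span:
        - 'attributes["http.route"] == "/healthz"'
        - 'IsMatch(name, "GET /internal/.*")'
```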
Jess: Does the transform processor and the filter processor, do they work on OTLP specifically?
Tyler: Correct. All processors in the collector work on OTLP exclusively.
Jess: Okay, so the input is going to be OTLP and the output is going to be?
Austin Parker: Well, so they work on pdata to be more precise. So anything that you send into a collector gets turned into pdata, which is an in-memory representation of OTLP.
So you are perfectly capable of taking, you know, unstructured logs that maybe you scrape with a file log receiver and then transform them in the transform processor.
And it's working on OT-- like, in memory at least it's working on OTLP, right? But the actual source of the data doesn't matter because by the time it's in a collector pipeline it is OTLP.
Jess: Okay, so the collector receivers, which might be scraping file logs or they might be receiving Zipkin, or Jaeger, or OpenTracing, or other historical formats, those receivers put that into pdata format?
Tyler: That's right, that's a requirement. All receivers have to pass their data on as pdata, which is OTLP.
Jess: Okay, so it looks like OTLP?
Austin: I mean, it's literally OTLP.
Tyler: It is, yeah. It's our wrapper around it.
Austin: It's the protobuf, or it's a data structure that is OTLP, like all the fields are named the same and it obeys all the rules and I'm pretty sure it's... Is it actually-- it's generated from the protobufs, right?
Tyler: There's a part of it that's generated and then there's some additional things, some helpers, yeah.
Jess: Okay and that's why you can manipulate all this different data, but then you might export it as some other format than OTLP.
Tyler: That's right. The processors, like the transform processor would pass on its data, whatever just changed, it passes on that payload to one or more exporters, as many as you've configured.
And then it's the exporter's job to transform that OTLP pdata payload into the exporting format, if it's not OTLP. So if there's some backend that wants the data, not as OTLP, the exporter's job is to do the translation.
Ken: So one of the things that drew me to it was this excellent presentation you had on a cookbook for OTTL. And you've got a bunch of really useful examples in here. I'm just looking at the presentation from KubeCon.
But like, you know, everything from like, parsing unstructured logs, right? You have data coming in and you're kind of either parsing things as JSON and putting things into places, or you know, some data coming in that you're kind of putting structure around so it's easier to work with.
What are some of the other ones that, like, when you show them to people, really light them up? They're like, "Wow, we got to do this. We've got to use this tool."
Tyler: One thing that I really like about that presentation is that all of our examples were based on user questions from either our own customers or from users in Slack. So it was cool to be able to build a talk out of like, this is what people are trying to do.
One thing that I always like showing people that OTTL can do is arithmetic. So a lot of times people will say, "I need to know span duration and I want to do a transformation based on span duration." That one comes up quite a bit.
OTTL supports arithmetic. So you can do like span end time minus span start time to get the span duration and then compare that against, you know, milliseconds, nanoseconds, seconds, whatever you want to do, throw that into some condition and then make a change.
That one comes up more frequently than I would think it comes up but it ends up being kind of fun to tell people. "Oh, we can do arithmetic. You can calculate a duration in OTTL on the fly." And that solves their problem. I like that one.
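A rough sketch of that duration arithmetic, with an invented attribute name and threshold:

```yaml
# Hypothetical transform processor config using OTTL arithmetic on span timestamps.
processors:
  transform:
    trace_statements:
      - context: span
        statements:
          # Duration in nanoseconds, computed on the fly.
          - set(attributes["duration_ns"], end_time_unix_nano - start_time_unix_nano)
          # Flag spans that took longer than one second (1,000,000,000 ns).
          - set(attributes["slow"], true) where end_time_unix_nano - start_time_unix_nano > 1000000000
```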
Jess: Oh, I have one that I used the other day which was pulling some information out of the body of the log and using that to set the severity.
Tyler: Yep. OTTL can do that. There's open issues to make that even better 'cause I don't know when you did it, you might have had to write like, 10 different statements to handle info warning-
Jess: Yeah, you have to set the severity code and the severity tags and yeah.
Tyler: Yeah, and then there's five or six different cases for the different numbers. And then if you were really going to handle all 29 or whatever it is, log severity numbers, it would be a lot. So we have had some users say, "We'd like to make this a little less verbose, what can we do?" And we're in talks about that.
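To illustrate the verbosity Tyler is describing, here is a sketch of the severity-from-body pattern handling just two levels, assuming the log body is a plain string; the match patterns are invented:

```yaml
# Hypothetical transform processor config: one pair of statements per severity level.
processors:
  transform:
    log_statements:
      - context: log
        statements:
          - set(severity_number, SEVERITY_NUMBER_WARN) where IsMatch(body, "WARN")
          - set(severity_text, "WARN") where IsMatch(body, "WARN")
          - set(severity_number, SEVERITY_NUMBER_ERROR) where IsMatch(body, "ERROR")
          - set(severity_text, "ERROR") where IsMatch(body, "ERROR")
```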
Jess: Nice. The benefits of a domain specific language.
Ken: So what are some of the limitations? What can't you do with OTTL? That maybe if you were going to approach it, you thought, "Well this should be something I can do."
The thing I'm thinking of that we talked about when I was getting ready was, like, you know, looking at multiple events, right? That's, like, one thing it doesn't really easily do: looking at multiple messages and being able to process those, right? Like, a concept of state.
Evan: That's right. OTTL statements are kind of stateless by default. You can record data between statements and we have, like, a map-like structure that we call a cache that's kind of similar to programming language variables. I guess one of the upcoming things that we're looking to add to OTTL is handling lists of items.
So like you're saying like if I have a list of events and I want to somehow consider multiple events in, you know, creating a new event and just processing in general, OTTL doesn't do much list handling right now. That's something that we're hoping to add to the language.
OTTL also, if you're just comparing it to general programming languages and maybe not necessarily things that we're looking to add, state is one thing that we're probably not really looking to add. It's just a data processing language. We're not looking for it to be like a full-fledged general programming language where you can, you know, use singleton patterns and stuff like that.
We're also not really looking to have it make network calls. No infinite loops. So nothing that's going to be like, you know, an event loop or anything like that. Probably no asynchronous work. I highly doubt that's going to end up on our roadmap.
So anything that needs to kind of act as a stateful application in any way, shape or form is unlikely to be something that you're going to want to use OTTL for or something that you're going to hope that OTTL could be used for in the future.
Jess: Except this cache thing?
Austin: Yeah, the stateful, statelessness thing is so interesting to me because I think one of the bigger, like, realizations that people have to come to around OTel... Because OTel is by default immutable and, you know, prizes a lot of things like immutability and statelessness, right?
And quite often I see people come in and they're like, "Well, I just want to do X." And it seems like it should be so easy. For instance, back propagating decisions or something. And it's like, yes, but if we did let you do that, then that would open up all of these other... The wonderful wide world of mutable data structures, and stateful data structures, and race conditions, and so on, and so forth.
And I think that, I feel like as an industry we've spent so long telling people, "Hey, yeah, statelessness is good, immutability of variables is good, you should do that." And then we never learn. Or at least we have to constantly relearn why it's a good idea.
Tyler: And there's a couple different concepts of state to talk about for OTTL as well. So in OTTL there are functions, and each statement that you write in OTTL, you define a function like set, like you say, I want to use the set function to set a value.
Jess: Okay, so you're using a function, right? There's a limited set of functions-
Tyler: Yeah, yeah, you're not defining it. Sorry, you're using a function, they're predefined. And they're compiled into the collector. At the moment, none of those functions remember anything about the input and the output that they used.
We haven't made any rules yet against a function remembering something that would be a form of state, but we don't have any functions today that do that.
For the language itself, what everyone was talking about was more like variable type stuff, like OTTL, the language doesn't support defining a variable and then three lines later referencing that variable and using it. You can't do that in OTTL.
The data types that we work with, so spans, metrics, logs, that kind of stuff, we have added a feature, a thing we call the cache, which is a map. It's outside of the language.
And when you're using the transform processor in your group of statements, you can use it to remember things but then it's forgotten once you go on to the next span, or the next log, or the next data point.
Jess: So could I do things like count the span events?
Tyler: You can count the span events but only because we have a function called length that can get the length of a slice. An example of when you would use the cache is, let's say you had a JSON string in a body and you wanted to parse that JSON string into a map, and then you wanted to access different values from that parsed string and use it to set an attribute. You could parse the string and save it-
Jess: Oh, so you can avoid parsing the body over and over, and over every time you need one thing in it?
Tyler: Correct, so you could parse the body, put it into the cache and then reference that in the next statements. But then, that would be only for that particular log or whatever. Once you move on to the next log it would forget about all that.
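A sketch of that parse-once pattern, with made-up JSON keys and attribute names:

```yaml
# Hypothetical transform processor config: parse the JSON body once into the
# per-record cache, then reuse it across subsequent statements.
processors:
  transform:
    log_statements:
      - context: log
        statements:
          - merge_maps(cache, ParseJSON(body), "upsert")
          - set(attributes["http.method"], cache["method"])
          - set(attributes["http.status"], cache["status"])
```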
Austin: It's honestly a lot like if you've used Rust, like the way the borrow checker works and where it really enforces things going in and out of scope, right? Like as soon as you iterate past whatever you're currently on, then you lose the context of it.
Jess: As you said, Austin, we learn over and over again that statefulness on a global scope is really painful. But we also learn over and over again that statefulness in a very local scope is super useful.
Tyler: There's a couple more features that OTTL doesn't support that we want to support, and we've got open issues, and we're like working on making it better. So I want to call those out too.
So the first is the thing that Evan mentioned around dealing with lists. So the idea of... I would like to write a statement and I would like to look at a list of items and maybe do certain things to only some of them.
So, like, maybe I want to change only a subset of my attributes and not all of them based on some condition, that looping concept we can't do yet but we want to do it and we've got people working on it.
Another thing that we can't do yet but we want to do is dynamic indexing. So say you've got a list or a map and you'd like to index that map based on a key, but you don't actually know the name of that key, maybe that key is saved off in another attribute or it's the name of your span or something.
Being able to say, "I would like to access my attributes map based on some other field." We can't do that yet, but that would be nice.
And then finally, OTTL allows you to set values higher in the OTLP hierarchy. So say you're in a span and you've got a span attribute, maybe even service.name on your span attribute, and you're like... Oh, service.name that's not a span attribute, that's supposed to be a resource attribute. I'd like to set a resource attribute.
OTTL allows you to do that. You can say "set my resource attribute using my span attribute." But there's a lot of caveats to that and I won't get into 'em unless you want me to. But that upward direction setting has some gotchas and we're working on fixing that as well.
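As a hedged sketch of that upward setting (with the caveats Tyler mentions), promoting a span attribute to the resource might look roughly like this:

```yaml
# Hypothetical transform processor config: copy a span attribute up to the
# resource, then remove it from the span.
processors:
  transform:
    trace_statements:
      - context: span
        statements:
          - set(resource.attributes["service.name"], attributes["service.name"]) where attributes["service.name"] != nil
          - delete_key(attributes, "service.name")
```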
Jess: Oh, that is something that I like to do.
Austin: Yeah, I do think one thing that's kind of nice about OTTL is like, it is technically, like, independent of the transform processor. So there's room for, like, either at an ecosystem level or people that want to come in and write.
You know, if you do want these processors that let you do funky stuff, or stateful stuff, or if you want to promulgate rules across a, you know, cluster or whatever, like, you can use OTTL independently of any of this.
You could use OTTL if you wanted to extract it and build some sort of like thing that just sits on Kubernetes pods and interacts with telemetry at that layer and lets SDKs go and talk to it. I don't know, like, there's stuff you can do, right?
One thing that I think I would love for people to think about is like how can we extend these sort of open telemetry primitives into bigger, more exciting things.
Evan: I think that's a great call out. I mean one of the big benefits of using the collector as opposed to a different similar data processing tool is that it does its processing in an OTel native format and you can extend that to OTTL. Since OTTL works with a pdata or an OTLP like structure, it's very OTel native and we've leveraged that.
OTTL is considered, like, a Go module, so it's a package that can be reused in any of a number of different collector components, and we've made it so that it's used in, you know, the transform processor and filter processor like we've mentioned so far.
But the routing processor also uses it, so you can use OTTL to match conditions on your data as it comes through and figure out where it goes, and the tail sampling processor, so you can use it to figure out whether traces are sampled or not.
We have additional components as well that are continuing to use it. I think that there are probably at least two or three others, and people are encouraged to use OTTL. Its intention is to be the standard way of directly interacting with data in a way where the operation that you want to do on that data isn't necessarily prescribed.
Or if the operation is prescribed then maybe the conditions aren't prescribed. The goal is that when you're using the collector, you only need to know OTTL and you don't need to know all of these other various expression languages and ways of referring to the data.
We've made it so that our paths are hopefully easy to, you know, they're intuitive, they're easy to understand. You know, I have a metric name, metric.name should be hopefully intuitive and if I want to interact with that data directly, I already know what part I want. OTTL is kind of the canonical way of doing that.
Ken: So from a practical perspective then, as people start engaging with OTTL, what are some tricks for things like debugging or understanding what you're doing, to kind of help 'em get started? What would you do to kind of figure out what you're doing wrong?
Tyler: We get a lot of questions about how to debug OTTL, which is fair because it can be tricky and sometimes the error messages aren't great. We try to make them better, but sometimes working with a parser is just hard.
So what I like to do whenever I'm writing OTTL statements is start simple. So normally I have my input data as like a log and I use the file log receiver to read it in, that way I know exactly what my payload is that I'm looking at.
I do my transformations in the transform processor and then I use the debug exporter with verbosity detailed to print the data out to see what it looks like.
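A rough sketch of that debugging loop; the file path and the statement are placeholders:

```yaml
# Hypothetical debug setup: read a known file, apply one OTTL statement,
# and print the result with the debug exporter.
receivers:
  filelog:
    include: [/tmp/sample.log]

processors:
  transform:
    log_statements:
      - context: log
        statements:
          - set(attributes["parsed"], true)

exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    logs:
      receivers: [filelog]
      processors: [transform]
      exporters: [debug]
```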
And one thing that's really, really important when you're using OTTL, or the transform processor, or any processor that uses OTTL is to remember that the statements work based on what the collector sees as your data, not how you see it in your vendor of choice's UI.
A lot of backends will do things to data, they might transform it a little bit, they might show it as an underscore instead of a dot, or maybe they've renamed the field entirely or whatever.
If you think the field name is something, make sure that that is actually how the collector sees the field name. The debug exporter is really good at showing you what the collector sees as the telemetry, because if you take your attribute key from your backend and that doesn't match what the collector knows the data to be, then it's not going to work.
We've also added some pretty in-depth debug logs to the transform processor and all of OTTL specifically actually.
So if you went to your collector and you turned on, I believe it's service telemetry logs level debug, the collector's going to start spitting out a ton of debug logs, and some of those debug logs will be the transform processor's or OTTL's debug logs, and it'll show you exactly what the transform processor is working on right now and then what it looks like after the transform processor or OTTL ran a statement.
So you can see the before and after and it'll just log each one of the transformations so you can see exactly how it's changing after each step. And that's quite helpful.
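The setting Tyler mentions looks roughly like this in the collector config:

```yaml
# Turn on debug-level logs for the collector's own telemetry.
service:
  telemetry:
    logs:
      level: debug
```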
Ken: Are you looking long term at maybe having the ability to put these things outside of YAML? My only thought is YAML is?
Austin: Great, lovely, wonderful. The best thing since sliced bread?
Ken: Is it though? It's not bad. I'm just asking.
Austin: I like YAML.
Ken: But I mean, I guess my point being all right, in addition to just having the YAML, editing YAML, you got a lot of debug output, which is really helpful and ways to do that and some techniques.
Linting it, anything like that? Has anyone asked for a linting syntax? It's probably very hard to write a linter for this, right?
Evan: The short answer is yes. We actually, Tyler and I have had discussions recently with somebody who's done like professional compiler design and has actual programming language experience specifically I think with Scala.
And we have looked into the possibility of putting OTTL into separate files, you know, writing something like an LSP server for it, things along those lines.
I will not say that they're on the short list. They're, you know, challenging problems on their own and they are going to take some time to develop. Somebody's going to have to take that up. But I definitely think that it's possible and it is something that we'd like to see.
Austin: Good first issue, right? Write the LSP integration.
Tyler: We'll add help wanted on that.
Austin: Yeah, that's a help wanted, but I do think that... I mean, it's totally valid because there is a, like, OTTL is not Turing complete, but there is a set of rules. Like, one nice thing with the fact that YAML and JSON are, you know, convertible, is that you could publish, like, JSON or YAML schemas for it and that gets you a little closer.
Like I think something that would probably be, like, a good weekend-- like if someone's looking for, like, good weekend projects, you could pull in the OTTL parser and lexer, and everything from the collector Go mods, and then write just a little, you know, loop that gives you a UI, lets you paste in, you know, some input, and then you click go and then it applies all the rules and shows you the output.
Jess: A little JSFiddle but for OTTL.
Tyler: There's actually, people have asked for a way to try their OTTL statements without running a collector because there's, you know, input and set up for that. So someone has opened an issue for, "Hey what if we had an OTTL playground?" Like the Go Playground.
Jess: There it is.
Austin: Yeah.
Tyler: The answer is yes, that would be great. I hope someone works on it and someone did work on it. I don't know if it's live right now, but they posted in that issue a publicly accessible website that had input OTLP JSON representation and you could write your statements and it showed the output and-
Austin: Nice.
Tyler: It was really great. So yeah, the community is asking for that and also the community is responding so that's an exciting bit of work right there.
Austin: Yeah, I was also going to say like bonus since it's Go, you can compile it to WASM and not have to like, just run the actual module in the browser. WASM being WebAssembly.
Ken: Very cool. I'm going to point people to the Cloud Native Computing Foundation Slack as a place you could go and look for some of this stuff, right? You can see the conversation going on around OpenTelemetry in there.
We'll post your cookbook presentation, of course, and a link to the project itself. Where are other resources people can go to for more information on OTTL?
Evan: The big one is... For better or worse just the package README inside of the GitHub repo. Tyler has a good blog on it.
And then we have those two other talks that we gave at two previous conferences that are, in my opinion slightly better introductions to OTTL. I think that the cookbook was more intended for people who are already using it to maybe expand their horizons a little bit about what it can do.
Tyler: Yeah, we'd like the OpenTelemetry website to have more docs around the transform processor, and filter processor, and OTTL and how to use it to select data and all the different components and so on.
But as it's just changing so much still, like the collector in general I mean, we don't have a lot of collector docs in the OpenTelemetry site yet for specific components.
We've got a lot of like how do you run the collector? How do you install the collector? We've got a bunch of that stuff but we haven't moved our individual component documentation to the OTel site yet.
Jess: The collector is changing rapidly?
Tyler: When I say the collector's changing rapidly, not necessarily for end users, but we release every two weeks and the change log is big. So yeah, like, a lot of features go out every two weeks in the collector.
Like, collector-contrib is the most actively contributed-to repository in OpenTelemetry, and I don't even think it's close. Like, the second most isn't even close to Contrib.
So we get a lot of engagement on that repository. That's where all of the community's, like, OpenTelemetry-owned components live. So there's a lot of features added, a lot of bug fixes, sometimes there's breaking changes, and so at the moment we've been keeping all of our documentation in the README still.
Jess: Can you tell us about the contrib repository structure and how it contrasts with the other repository which is just OpenTelemetry collector maybe?
Evan: So Contrib and the collector are both fairly similar. So OpenTelemetry dash collector without the contrib, without any suffix, we usually call core. Just because it's the APIs. So a lot of the APIs and then a handful of very central important components are in there.
So stuff like the batch processor, or the OTLP receiver, or the OTLP exporters, and those are in there mostly because we want a set of components kind of already in the repo when we go to make any API changes.
Contrib is largely organized the same way. So they're both organized with the different component types at the top level of the repos, so you've got your receivers, your processors, connectors, extensions, exporters.
And within that, then you have the individual components themselves. And those are all fairly self-contained. Of course, you have other things at the top level, but I think that those are probably for the average user, the most relevant folders that they'd want to look through.
Jess: So from an end-user perspective, the stuff in the OpenTelemetry core repository is pretty stable but then there's in the OpenTelemetry collector contrib repository, there's a whole slew of different components with varying degrees of stability. Right?
Tyler: Yeah. And really it's best to look at the components specifically that you care about. So each component that OpenTelemetry manages has a stability level for any signal that it supports.
So you can go to a component and you can see this is in beta for metrics, traces and logs, but some other component might be in development for logs, alpha for metrics, and beta for traces or something like that.
So it's a good practice to look at the individual components when you're trying to understand the stability levels for what you're running.
Ken: And that's in the READMEs?
Tyler: That's correct. Right at the top.
Jess: Right, so you go to like github.com/open-telemetry/opentelemetry-collector-contrib/exporter, maybe, if you're looking for a particular one. And then in there, you look for the particular place you would want to export to or whatever.
You click in there into that package and then you look at the README, and then you look at the stability level for the signal that you want, so if you're looking for logs for instance, and if it says beta, that's actually usually pretty good.
Tyler: Beta's the top that you could be right now.
Jess: Really? Okay.
Tyler: We have defined stable, but we have had a rule that we're not going to call any component that we manage stable until the collector core libraries, the APIs that Evan was talking about, are 1.0.
Austin: I also want to add a little color here 'cause I see people talk about like, "Oh, OTel is so unstable or not production ready." Part of the reason is that we actually have like really strict requirements for like what stability means and really strict support guarantees.
So anything that gets marked stable, we basically support forever up until like any kind of breaking change or any kind of change really. We have to support the old way of doing it for at least three years after that change.
Jess: Three years?
Austin: Well that's the whole freaking point of OTel is that it's supposed to be a stable platform for instrumentation. We don't want, you know, we don't want to change things, which means there's a, one, there's a bias away from stabilizing stuff because you know, we know we're going to have to live with our decisions for a long time.
But two, like, we tend to stabilize things once we have sufficient evidence from the community and production that like it truly should be stable, right? So I think this is maybe something people aren't really as used to anymore. Like, that level of support for an open source project.
You know, if I tried to go use a 3-year-old Node.js library that hadn't been touched since then, I imagine I'd be in for a lot of fun times, right? Like how many versions of React have we gone through in three years? How many versions of-
Jess: And how many months of development effort have we spent on every version bump of React? Oh my gosh.
Ken: Or pay for price when you don't.
Austin: Right, so OTel stabilizes slowly, you know, specifically to avoid, one, us burning out a bunch of maintainers and contributors with having to deal with our own bad decisions.
And two, because we want to get, you know, we've made promises and we want to keep those promises to our end-users. So don't be scared of beta in OTel.
Jess: That's really important. Is this definition of stability a CNCF definition?
Austin: No, it's something we came up with.
Jess: Okay. And this is a way that OpenTelemetry is compatible with enterprise software development?
Austin: I mean, I would say more or less, right?
Like it's also just an important thing for everyone to keep in mind is that the goal of OpenTelemetry is to make instrumentation built in, make it native. We want this to be something that just exists without you having to think of it. And the only way to get people bigger than us, you know, to get languages or frameworks or whoever to really adopt it and integrate it is to provide guarantees like that.
So that they can plan around, like, you know, if Java today said, "Okay, I'm going to go write some core framework things, some standard library stuff that integrates with the OTel API."
Like they could know that whatever they write today is going to be good for three years. Like that's hugely important when you're trying to do a standards project.
So yeah, it's not just like to make it enterprise friendly or whatever, it's to really achieve the goals of the project, I would say.
Jess: Yeah, I mean, it's enterprise friendly in the sense that people want to know this stuff is going to still work and OpenTelemetry as a project is very oriented around that. Meanwhile, for any particular collector component, it sounds like beta is effectively, we don't expect to change this, but we'll adapt if the underlying APIs change.
Tyler: Yeah, the collector has really strict requirements for what it means to do a breaking change for beta components.
Jess: Oh, nice.
Tyler: There's a entire feature gate process for doing a breaking change against a beta component. And normally that means introducing the change behind a feature gate that you can opt into. Eventually switching the gate to be on by default so then you can opt out of, and then eventually removing the gate.
And that's at minimum, a three release process, which would be six weeks but most of the time it's longer than that. For end-user stuff, like config changes, feature gates are out there for a while.
Jess: Nice, so beta here means something very different than beta in whatever you found on npm?
Evan: Absolutely.
Tyler: I'd say so, yes.
Jess: That's reassuring.
Evan: So one thing that I want to call out real quick, just so that we're not giving any guarantees that we can't keep here. For the collector specifically, the guarantees for support are a little bit shorter than OTel in general.
We're not quite as worried about things like somebody, you know, baking instrumentation into the Java standard library. This is mostly used by people running it on servers and things like that.
So for binary releases, we only give a support guarantee of one year, and then for package releases, six months. And this is after the next stable version is released; if this is the current stable version, it's in support, you know, for as long as the project continues.
Jess: That makes sense. Yeah, the collector is easier to upgrade than code that you baked into your library.
Ken: And you could react to the new things that are coming out and handle them, add new features to it because of that.
Jess: Well, thank you for this, like, deep perspective on what's going on in OpenTelemetry, and the collector, and all the different packages. And OTTL? Is OTTL beta?
Tyler: If only. Last year at KubeCon in 2023, we stood on stage and said it's almost beta. And then, both Evan and I worked on a lot of other stuff in 2024, specifically getting the collector to 1.0 instead of OTTL.
So there is one issue left. There's a tracking issue that you can go see if you want to look at where OTTL's status is. There's really one issue left right now and we've got a very active contributor working on it.
It's kind of a complex one, they've been working on it all fall and they're making great progress, and really as soon as that issue is closed, we'll mark OTTL as beta. As long as there's not some other big thing that comes up.
But when it comes to the configuration of OTTL, we don't foresee any big breaking changes, so we're really close. But we did the thing that OpenTelemetry in general does, where we said that we were really close and then a year later we hadn't finished it yet, but that's okay, someone's working on it now.
And we're really thankful for the community members that contribute to OTTL because Evan and I are busy doing a lot of stuff in collector land and outside of collector land. And so it's really helpful when we get new contributors doing things.
Jess: Great, so when someone wants to contribute, where should they go?
Tyler: Collector Contrib and you can look for issues that are marked help wanted, and if you're really new, also include "Good First Issue."
Jess: Yay. That's great. And where can people find more from the two of you?
Tyler: I don't use social media, so my GitHub handle is "TylerHelmuth." You can reach me there. It's the same on the CNCF Slack, if you wanted to find me, it's the same name.
Evan: Yeah, mine's evan-bradley. Or just Evan Bradley on CNCF Slack.
Jess: Great, thank you.
Ken: Thank you very much. This was a great conversation.