Ep. #74, The Universal Language of Telemetry with Liudmila Molkova
In episode 74 of o11ycast, Liudmila Molkova unpacks the importance of semantic conventions in telemetry. The discussion highlights the challenge of agreeing on naming conventions, the role of working groups in maintaining these standards, and the potential for future integration with AI technologies. Liudmila underscores the importance of a shared, vendor-agnostic approach to telemetry, enabling smoother interoperability across platforms.
Liudmila Molkova is a Principal Software Engineer at Microsoft, specializing in observability and distributed tracing. She is a key contributor to the Azure SDKs team, focusing on improving observability across various programming languages.
transcript
Liudmila Molkova: And I think a lot of people find themselves in your own shoes, right? So they try, let's say, OpenTelemetry or they tried OpenTracing or something else in the past, and they see all these amazing traces, and they're so happy.
They can finally see things with their eyes instead of, I don't know, reading these boring walls of text and grepping stuff, right? And they might think, okay, I'm done. That's perfect. That's the heaven.
Until they need to, let's say, find a needle in the haystack or aggregate this data, and they realize, oh, I don't understand what this span describes. I have ten different ways to record URL on my spans.
Or they move from one team to another, or from one org to another, and they realize that they have no idea whatsoever what that system generates, what kind of telemetry.
So my dream is that if I leave Microsoft and go work at some other company, as I've done in the past, and I look at their telemetry, I would understand what it means.
Martin Thwaites: Shared vocabulary, basically. We're talking, everybody's talking the same language, the universal language of telemetry and observability data, really.
Jessica "Jess" Kerr: This is a noble dream.
Martin: We will fight with you.
Austin Parker: And it's so simple to do and so easy, because as everyone knows, the easiest thing in the world is to get a bunch of developers in a room together and have them agree on what to name something.
Martin: It's one of the easiest problems in computer science.
Jess: There's a lot of nodding happening right now.
Martin: So tell us about you. Why is it you're passionate about this? What is it that you do day to day, that means that you're interested in this.
Liudmila: Yeah, great question. So I'll start with a quick intro and then I'll explain why I'm passionate about it. So I work at Microsoft on the Azure SDK team. My main role is observability architect, if you want to give it a name.
Jess: What's your name?
Liudmila: I'm Liudmila Molkova. It's a hard one to pronounce. I know, you can pronounce it whatever way you want, but my GitHub handle is "Lmolkova." It's easy to spell. So this is how you can find me.
I'm the observability architect of the Azure SDK team. Most of my time I spend actually working on OpenTelemetry or helping people inside Microsoft to make some progress on their telemetry solutions.
I used to work on Azure Monitor and I have some backend development stuff going on there for tracing. My actual observability journey started years ago at Skype.
We ran some cool services for picture sharing. And it's interesting. So even a very simple thing like image distribution needs some very powerful observability techniques.
So when I send you a picture, it's an asynchronous operation. You receive it maybe days later, maybe seconds. And it's interesting to know how many times, how many people will receive it.
You want it for caching, right? It's interesting too, if we are both online, to know the end-to-end latency, how much time passed between when I hit send and when you actually started receiving.
And it's hard to imagine how much semantic conventions and observability work goes into making it possible. I don't think we actually ever achieved it properly.
Well, we had some very complicated pipelines, Hadoop and whatnot, to actually analyze all this data and find the end-to-end latencies. And I'm super passionate about it because maybe I just don't understand stuff.
So people talk about some very high-level things. They say, okay, end-to-end latency, but they never bother to define it. Or they say, "Okay, our service is slow." But what does it mean? Slow in what?
Or people say, "Okay, we need to change this stuff. This algorithm is imperfect."
But if you have some telemetry and you look at it, you will find that this algorithm has nothing to do with reliability or performance, it's just something else. It doesn't even get to this algorithm. It's stuck somewhere else.
Martin: Yeah, what's the idea? If we don't have data, all we've got is opinions. And you know whose opinion matters the most? Well, that's the debate, really?
Jess: Yours, because your beard is the longest and your mouth is the loudest.
Martin: My name's Martin, and I approve of this message. But, yeah, this idea of being able to understand these complex operations like video, is a really interesting difference.
A lot of people will work in web. They'll open a request, and they'll see a thing, and that's easy to reason about. It's easy to know what's going on. Like you say, you've got these: "What's a URL? What's a route, or route," depending on where you are in the world.
Those things are really easy, but it gets a lot deeper than that, how you expand that information out. And that shared vocabulary, that shared language that we have, if we have that between teams, if we have that between organizations, it makes onboarding easier, and it makes fixing things easier.
It reduces stress, because, well, I can go and see the thing, and it tells me what went wrong. It tells me where to go, which is the holy grail, really.
Liudmila: Keep talking. I enjoy listening to it so much.
Martin: Oh, I could talk for a long time.
Liudmila: I actually really like talking to Martin because he gives a lot of feedback that other people don't give. And I don't always agree with it, but I enjoy listening to it a lot.
Austin: Yeah. This is actually secretly Martin's performance review for the year.
One interesting anecdote from a couple of years ago, back when we were really just getting rolling with OpenTelemetry: it was around when we invited the Prometheus maintainers in a little more to try and better align what they were doing with OpenMetrics with what we were doing in OpenTelemetry.
And I do recall very clearly there was a conversation where someone from the Prometheus community was like, "I don't know about the rest of the stuff you're doing, but these semantic conventions seem great. We're really, really interested in that."
And since then, it's been this really interesting journey. Let's look at some historical notes. We had the Elastic team come in and kind of donate their Elastic schema.
Martin: Elastic Common Schema.
Austin: Yes, thank you. Elastic Common Schema.
Martin: It was amazing. Everybody loved it. It caused no disruptions whatsoever, just helped everything along.
Jess: Okay, okay, okay. For the sake of people who don't have all the inside jokes, no, we cannot agree on naming. And part of that was, oh, well, let's change the semantic conventions to match this Elastic Common Schema.
Austin: Oh, no, no, it's align them.
Jess: No?
Austin: Subtly different.
Jess: Align. Oh.
Austin: So what I wanted to ask Liudmila though, is I think a lot of people, when they come into OTel, semantic conventions seem like this big scary thing that's kind of happening off in the corner.
Jess: Can we define semantic conventions real quick?
Austin: So, yeah, let's do a little like soup to nuts. How do we get here? What's going on? Why are these important? What do they do? And why do we spend so much time on them as a project?
Liudmila: Oh, great questions. So let's do the journey of this developer who just enabled tracing. They look at the traces, they're happy, and now they want to, I don't know, create some shared code between all the services, the microservices in their team.
They want to write some HTTP handler, and they want everyone to use this handler. What should it do? At the bare minimum, it should give the span a name. They want to set a status.
Well, what else? Maybe record exceptions. You probably need to capture something about HTTP requests. You want to also say, "What does the span represent?" Is it a single try? What do you do for redirects? And so on.
So semantic conventions is actually the project that tries to answer these questions. There is this piece of technology, like HTTP or databases, and you want to document what we capture, how we record telemetry about this technology. And we can go very broad.
We can say all databases should do this, and we can have a way to narrow it down and say, "Okay, for MongoDB it means this specifically," but the general stuff still applies. And then this developer who just enabled tracing and now writes some common handler, goes and implements this handler in a certain way.
Or better, he can go and use the instrumentation created by someone else that follows the semantic conventions, and somebody else like OpenTelemetry or some runtime that actually wants to provide some native instrumentation, they can implement the semantic conventions.
So, to summarize, semantic conventions is a contract, that says "Everything that captures this telemetry should do it in this way, roughly." It allows some wiggle room, but the goal is that for someone who uses this telemetry or visualizes it, they can look at the spans, metrics, logs, anything, and know what's going on there, know how to visualize it.
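As a rough illustration of the shared HTTP handler Liudmila describes, here is a minimal sketch in plain Python with no OpenTelemetry dependency. The function name `describe_http_server_span` and the return shape are hypothetical. The attribute keys (`http.request.method`, `http.route`, `http.response.status_code`) follow the names used by the OpenTelemetry HTTP conventions, and the status rule (only 5xx marks a server span as an error) is an assumption of this sketch that should be checked against the current conventions.

```python
from typing import Any, Dict, Optional, Tuple


def describe_http_server_span(
    method: str,
    route: Optional[str],
    status_code: int,
) -> Tuple[str, Dict[str, Any], str]:
    """Return (span_name, attributes, status) for one handled HTTP request.

    Hypothetical helper: it only decides *what* to record, not how to export it.
    """
    # Low-cardinality span name: "{method} {route}", or the method alone
    # when no route template is known.
    span_name = f"{method} {route}" if route else method

    attributes: Dict[str, Any] = {
        "http.request.method": method,
        "http.response.status_code": status_code,
    }
    if route:
        attributes["http.route"] = route

    # Assumption of this sketch: a server span is an error only on 5xx.
    status = "ERROR" if status_code >= 500 else "UNSET"
    return span_name, attributes, status


name, attrs, status = describe_http_server_span("GET", "/users/{id}", 200)
print(name, attrs, status)
```

Every service that calls the same helper ends up with the same span names and attribute keys, which is the "contract" being described.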
Martin: I think it's a lot about removing the naming debate out of a lot of things. I mean, there's consistency, I get that. The main thing for me about the semantic conventions is nobody can argue about naming anymore.
Do we call it HTTP.route? Or do we call it HTTP.request.URL? Do we call it request.target? No, semantic convention says, "This is the name. Shut up."
Jess: We argue, so you don't have to.
Martin: That should be the tagline for the semantic convention sig. But no, this idea that we can use standardized names for things, because then everybody uses the same name, whether you're implementing it in Go or you're implementing it in .NET or PHP or Java, we all know that HTTP.route is what that means.
It means how you did the routing infrastructure inside of your application, how it went to how you did the templating. All of those things have a meaning now.
The big thing for me though, and I don't know whether this is one of the goals of semantic conventions, is that it allows telemetry vendors, the people on the backend, to make assumptions about data. Being able to visualize and analyze that data is the whole reason it exists.
So if you've got a data catalog that tells us why these things are there, and if this thing's there and this thing's there, oh, it means this. So if messaging.type is on there and it says, "Service Bus" or "RabbitMQ," oh, I know that there's a RabbitMQ instance there. I can put an icon on the span that's got a little rabbit on it.
These things help, these visualizations and all of that kind of stuff really, really helps. And to remove that away, if we really want OpenTelemetry to be this idea of a vendor agnostic interface to those back-ends, we all need to use the same names, regardless of language, regardless of backend system, regardless of SDK, we need to use those same names.
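The "little rabbit icon" idea can be sketched as a lookup that a backend might run over incoming span attributes. This is a hedged illustration only: the icon table and the function `icon_for_span` are hypothetical, and the sketch assumes the attribute key is `messaging.system` with values such as `rabbitmq` and `servicebus`, rather than the `messaging.type` name mentioned in conversation.

```python
from typing import Mapping

# Hypothetical icon table keyed by the value of the `messaging.system` attribute.
ICONS = {
    "rabbitmq": "rabbit",
    "kafka": "kafka-logo",
    "servicebus": "azure-service-bus",
}


def icon_for_span(attributes: Mapping[str, object]) -> str:
    """Pick an icon name for a span using nothing but its attributes."""
    system = attributes.get("messaging.system")
    if isinstance(system, str):
        # Known messaging systems get their own icon, unknown ones a generic queue.
        return ICONS.get(system, "generic-queue")
    # No messaging attribute at all: the backend cannot assume anything.
    return "generic-span"


print(icon_for_span({"messaging.system": "rabbitmq"}))
```

The lookup only works because every producer uses the same key and the same values, regardless of language or SDK.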
Liudmila: Yes, see Martin can say it much better than me.
Jess: He is preaching the mission of the semantic conventions, I guess. So Liudmila, are you on the Semantic Conventions Committee?
Liudmila: I'm the semantic conventions maintainer. There are other maintainers. I probably contribute a lot of micro things to semantic conventions and also macro things.
I'm actively trying to move database and messaging semantic conventions forward. They have been experimental for I don't know how many years, I think from OpenTracing years.
And now we are trying to stabilize them. We have made some great progress. We are almost done on the database side, and messaging is still in progress. Also there are some--
Okay, so how the semantic conventions group works: we have kind of a big repository for semantic conventions, and there is everything there.
You can find people trying to add profiling stuff. You can find some hardware metrics defined there. So somebody said, "If it's about computers, if it has anything to do with computers, it's probably in semantic conventions."
So obviously no one is an expert in everything. And how we try to work is that we have some focus groups, working groups that take a specific area and they co-work on it or just maintain it.
So we have Database Working Group, we have GenAI Working Group or LLM, we have CI/CD Working Group, we have a bunch of others.
And those groups are, they meet, they discuss things, they work on prototypes, they submit proposals, and effectively the overall semantic convention community, once the group agrees on something, it's happy to give it more of a cosmetic review and make sure it follows whatever else we have.
So sometimes, let's say the CI/CD Working Group wants to add something, but it also affects the Security Working Group, because they are similar to some extent, and we want to make sure they stay in sync and they will do something consistent rather than doing stuff in small silos.
Yes, so I'm part of at least several of these groups, but also I'm the maintainer and I'm trying to keep this balance between individual groups and overall stuff.
Martin: So you're not really selling me getting involved, because that sounds like my own personal hell. But you know, like Jess says, I think the, you know, "We argue so you don't have to" is a really cool tagline for that meeting.
I mean, it is for the benefit of everyone, isn't it? You know, if everybody's using the same names, then it just makes things easier, you know? So we appreciate you.
Liudmila: Thank you. Some of the time when we argue about a specific name or should versus must, I feel that I'm doing some very important work. I use my time wisely.
Austin: So on the subject of naming, one thing that I've really seen kind of grow in popularity over the past couple of years, especially concomitant with the rise of OpenTelemetry, is the amount of startups and tooling built on top of generative AI, large language models, things like that, that leverage OpenTelemetry data.
And I think that for as much as we like to think about how useful semantic conventions are to us as humans, so that we have this shared language and context, it's also very useful for language models and other people building sort of generative AI stuff.
Do you think that's accurate? Do you think we're going to see more stuff like this?
Liudmila: I'm a GenAI pessimist. I think it could be interesting to give the GenAI a trace or a bunch of telemetry. Not too much though, and ask it to summarize what's going on there. But then you need to go back and validate it and check. Okay, did you really understand?
So I hope there will be some future developments that would allow to maybe instead of showing you the whole huge trace, give you some parts of it that actually demonstrate something interesting. But I think we're pretty far from it.
Maybe Honeycomb is ahead. In my part of the world, I haven't seen something optimistic. So I'm kind of a very pragmatic person and I want to just build a foundation. I don't know how people will use it.
I'm happy if they will use GenAI. I want to build a foundation that all of these things are possible. You can write your query manually or you can tell GenAI to write this query. It doesn't matter, right? As long as data is there, it's structured, it's consistent, I'm happy.
Jess: Foundations. That's a good description of OpenTelemetry in general and semantic conventions in particular.
Austin: I mean, that's kind of the point of OpenTelemetry, right, to have that stable base layer that's just kind of everywhere.
Jess: And the other thing I like about semantic conventions is that you don't have to use OpenTelemetry to use the conventions, right?
Liudmila: Absolutely, you don't have to. Well, I work on OpenTelemetry. We actually want you to use OpenTelemetry. If there are reasons you cannot, then yeah, at least you can use semantic conventions.
I can give you an example close to my home. So, Azure, you can imagine Azure services had some observability for years, not sure if it happened before OpenTracing or along with it, but effectively they have years of existing observability around them, and they expose this data to end users, some of this data, and they use it internally.
Can they move to OpenTelemetry overnight? Well, no. Can they use OpenTelemetry over a decade? Maybe some. Effectively, if Azure services emitted their events or logs or whatever, but they followed the same annotations as OpenTelemetry defines, that would be nice.
We have some projects, let's say in Java, we have the Micrometer Observation API and Micrometer Tracing and Micrometer metrics. There are systems that use them, there are communities that want to use them, we can't always change it.
We want to play nicely with them, we want to work with them, we want to unify at some point, we hope. But if the only thing they are open to using is semantic conventions, well, it's awesome.
At least people can put this data in the same backend and they can use this data in the same way, even though there could be some difficulties with correlation, some features and stuff, but at least the shape of this telemetry will be similar.
Martin: Yeah, and it's about shape and also the data that's contained within it. That data catalog is, you know, yes, we've got a big blob of data which we call attributes, but if at least they conform to the format, the API contract, but then also conform to using the right names for the right things.
It really doesn't matter about the internals of whether you're using Java or whether you're using .NET or PHP, but also whether you're using the OpenTelemetry library or whether the Azure SDK is producing its own telemetry. As long as it comes out in the same format, everybody benefits from the same thing.
Austin: One other thing that's really, I think, exciting about this that we haven't talked about is that it is designed as kind of a separate thing, right?
Like OpenTelemetry isn't just saying, "This is how everything should be named," it's providing tools and formats and all this stuff to build an ecosystem of this, because you can imagine a future where you, as you're building a framework or a library or a product or something, you could just ship your own convention file and say, "Hey, are you using MyCoolPlatform.exe? Well, here's all the data it emits and what that data means."
And then tools that also speak in terms of these conventions could import that and use it and build on it and do all sorts of cool stuff.
Martin: And that's where that GenAI, the LLMs comes in, isn't it? Because if we can describe the data up front, they've got a better idea.
Austin: Even humans can, right Liudmila?
Liudmila: Yeah, and even before we get there, I think that there is a bunch of cool stuff that happened in OpenTelemetry regarding this, but it's like we grow.
So I think I saw a few examples of OpenTelemetry growth that touched things that never happened before. So for example, instrumentations, right? The presence of common instrumentation libraries, it's been there for years, but at some point we start seeing native ones coming and native ones opening some problems.
I don't know, if a runtime instruments its HTTP client, they end up discovering a lot of interesting edge cases and corner cases for just the HTTP. Or when we instrument, I don't know, messaging stuff, again we discover that there are a bunch of core things, like I don't know, the throughput or latency.
There are also things that are specific to my messaging system. How do I describe it? Where do I put it? I can keep it in OpenTelemetry. I can keep it to myself. I can document it somewhere else and give everyone my documentation.
One cool thing that happens in OpenTelemetry is it's called Project Weaver. The goal is to allow you to define your own semantic conventions. You can generate Markdown files, you can generate code.
Currently, code generation is, well, it's not awesome. It just generates you some attribute names, and maybe you can generate the metric definition, the function that will give you an instrument. But in the future, what we can do, well, that's another dream. Imagine you have a Swagger or OpenAPI description for your service.
What if you describe the telemetry there? What if you can say, "Okay, this operation can be described as the span, and you can also have a metric, and these are the attributes, you get them from these parameters of the model or from the headers or whatnot," and you hit generate, and you get your SDK generated with telemetry in it.
That's the dream I have for not just the semantic conventions, but every piece of tooling around it. And you don't need to host the semantic conventions in OpenTelemetry.
I can imagine, as we have instrumentations outside of OpenTelemetry repositories, that we may have some registries of semantic conventions, some complementing instrumentation libraries hosted somewhere else. They can be specific to certain domain, let's say security, or they can be highly optimized, they can have some other properties. It's just there could be multiple complementing forms of instrumentations and different parts of the semantic convention definitions.
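To make the "define your own conventions, then generate Markdown and code" idea concrete, here is a small sketch in plain Python. It does not use Weaver and does not reproduce its real schema or templates: the `REGISTRY` structure, the `mycoolplatform.*` attribute names, and the helper functions are all hypothetical, invented only to show the shape of the workflow.

```python
from typing import Dict, List

# Hypothetical, simplified convention definitions; not Weaver's actual format.
REGISTRY: List[Dict[str, str]] = [
    {"name": "mycoolplatform.job.id", "type": "string",
     "brief": "Identifier of the job being processed."},
    {"name": "mycoolplatform.job.retry_count", "type": "int",
     "brief": "Number of times the job has been retried."},
]


def constant_name(attribute_name: str) -> str:
    """Turn 'a.b.c' into the constant-style name 'A_B_C'."""
    return attribute_name.replace(".", "_").upper()


def render_constants(registry: List[Dict[str, str]]) -> str:
    """Generate Python source that defines one constant per attribute."""
    lines = []
    for entry in registry:
        lines.append(f"# {entry['brief']} (type: {entry['type']})")
        lines.append(f"{constant_name(entry['name'])} = {entry['name']!r}")
    return "\n".join(lines)


def render_markdown(registry: List[Dict[str, str]]) -> str:
    """Generate a Markdown table documenting the same attributes."""
    rows = ["| Attribute | Type | Description |", "| --- | --- | --- |"]
    for entry in registry:
        rows.append(f"| `{entry['name']}` | {entry['type']} | {entry['brief']} |")
    return "\n".join(rows)


print(render_constants(REGISTRY))
print(render_markdown(REGISTRY))
```

The point of the design is a single source of truth: documentation and code constants are both derived from the same definitions, so they cannot drift apart.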
Martin: So you think like the idea of, well, let's create an E-commerce semantic convention. So if you're writing an E-commerce site, then maybe there's some common things for how we describe products or categories.
It might be to do with, well, you run an observability platform. Maybe this is how you do your internal telemetry. So it's not just limited to the tech, but also to the domain.
And people could be, "yeah, we're the arbiters of the E-commerce semantic convention domain," or I'm sure there's lots of other ones, not just like security and stuff like that, but "I create products like this."
Again, somebody else has had a naming debate so that you don't have to argue about it internally and you can just bring those things in.
Liudmila: Absolutely.
Austin: I want to kind of call back. We did talk about how you don't have to use OpenTelemetry.
But I think that the thing that OpenTelemetry really shows you is the relationship between semantic conventions for metadata and then actual semantic data itself, because you can take these and just use it as like, okay, this is what I should call stuff.
But when you combine that with OpenTelemetry and OpenTelemetry gives you this very powerful system of like, okay, this is what a trace or a span actually represents. This is what a metric actually represents.
This is what an event or a log is. And we link all this together with context. It's the jelly to the peanut butter, right? It's two things that taste great, that go great together.
Because just knowing the names is like, okay, well, that's fine, but knowing, oh, this is the exact type of this metric and this is how it's recorded, and this is how frequently it's recorded, and this is data that you can use to interpret the data, really helps push a lot of these decisions.
It shifts them left. Instead of your observability vendor, whoever having to say, "Okay, well, here's all of your stuff. Here's your dashboards, your alerts, whatever."
It means that if I'm creating my framework or library or product or whatever, I can just say, "Oh, no, this is what you need to pay attention to." I describe it through semantic conventions. I shape the data using OpenTelemetry and OTLP.
And then you don't have to guess anymore. You don't have to figure this all out through experience. You can just use the people, use someone else's expertise and save time.
Liudmila: Yes, I consider this using semantic conventions only as a first step. If you cannot use the rest, you start with the basics, and this is part of your transition journey. But of course, there are things you would only benefit from if you use the whole stack.
Martin: And I like the fact that inside of the semantic convention, some of it is just, "here's a name for a thing and here's what it should contain."
But some of them say, "If you've got a span for an HTTP request, here's what should be there and here's what Must be there."
The idea of, you have to have these, which helps describe a whole span, not just the name.
Liudmila: Yes, the Must versus should.
Jess: What is that about?
Liudmila: Oh, yeah. Anytime anyone introduces Must, you get an enormous amount of pushback, usually.
Jess: What does Must imply?
Liudmila: Yeah, the capital Must implies that no matter what, even if the earth explodes or something, you should, you must do something.
I'll give you an example. The part of the debate I've been into. We have sampling in OpenTelemetry and sampling can be just random, it can be opinionated. You can say, "Okay, I want to sample out all my HTTP head requests, because they are useless and I don't care about them."
But sampling is a thing that, well, at least usually it runs before you collect any information. It runs when the span starts, right? And in order for you to have a sampler like this, you need instrumentations to provide you some information, right?
For example, the HTTP method name. But instrumentations can require high performance. They can have some opinions. They might say, "Okay, it's too expensive for me to collect the HTTP method for all spans, even though usually I will collect maybe 1% of them. So I want to optimize on performance and I don't want to collect the HTTP method name and provide it at the start time."
So I was very radical. I wanted all instrumentations to collect this information. I pushed hard for must there. So if you collect these attributes, you must provide them at span start time.
So I think I made progress. We put this in the spec, and then when we were working on HTTP semantic conventions, we started figuring out some edge cases where, okay, there are some questionable attributes, not everyone can collect them.
There are performance implications, security implications and whatnot. And we changed it to should. Should is: you must, but there could be good reasons for you not to. And I think I bored all the listeners by now. But must versus should is, I think, the thing I spend maybe 20% of my time on.
Jess: So if the semantic conventions have like a Must, you must include this attribute at the start of the span, then things like head samplers can rely on that being there. So it's a plus for that. But it puts a limitation on any instrumentation library.
And then if for some reason it can't provide that, then it's not conforming to the semantic conventions.
Liudmila: Right. And it means that if I'm a backend and I visualize something or maybe I have my custom sampler, things can get broken. I expected it to be there. I relied on this, right?
And the instrumentation didn't follow the contract if it didn't provide things there. So this is the contract and this is a broken contract, right?
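The must-versus-should trade-off for head sampling can be sketched in a few lines. This is a simplified stand-in, not the OpenTelemetry sampler interface: `should_sample` is a hypothetical function that receives only the attributes available at span start, and the key `http.request.method` is assumed to be the name the instrumentation would use.

```python
import random
from typing import Mapping, Optional


def should_sample(
    start_attributes: Mapping[str, object],
    default_ratio: float = 0.01,
    rng: Optional[random.Random] = None,
) -> bool:
    """Decide at span start whether to record the span.

    Opinionated rule from the conversation: drop all HTTP HEAD requests,
    sample everything else at `default_ratio`.
    """
    rng = rng or random.Random()
    method = start_attributes.get("http.request.method")
    if method == "HEAD":
        return False
    # If the instrumentation did not provide the method at start time,
    # `method` is None here and the HEAD rule silently cannot apply.
    return rng.random() < default_ratio


# With the attribute present at start, the rule works as intended.
print(should_sample({"http.request.method": "HEAD"}, default_ratio=1.0))
# Without it, a HEAD request is indistinguishable from any other request.
print(should_sample({}, default_ratio=1.0))
```

This is the dependency a "must" would guarantee and a "should" only encourages: the sampler is correct only when the attribute is actually present at start time.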
Martin: And that causes some real problems.
You know, head sampling is one really interesting area where having more information allows you to do better head sampling. If literally all you've got is the URL, that's the only thing that you can use, it becomes hard to do head sampling.
That one change between should and must has caused so many problems around the sampling debates that have happened even in .NET world, when we've been talking about how to do the head sampling in various different bits within the .NET instrumentation.
It's caused so many problems because people want to be able to say, "I want to sample the auth route differently than my homepage. I want to sample POSTs versus GETs to the homepage different."
But none of that information is provided as part of the sampling decision, which makes it hard.
Jess: Or isn't guaranteed.
Martin: Well, the .NET world, it's just not supplied at all, because when they looked at the spec, it said, "should" not "must," and they went, "Performance!"
Austin: One other kind of thing to throw in here too is, when we're building these as a project, OpenTelemetry is a project that is extremely dedicated to, we have a few core tenets, and one of those is that we're never going to optimize around a particular storage destination.
We're never going to optimize around a particular telemetry consumer, right? So as a project, we want to make, you know, the most semantically accurate, useful language for how to represent software systems and the connections between those systems and the work that they're doing.
And this runs headlong into semantic conventions quite often because you'll get people that'll come in, it's like, well, I'm using X, Y, or Z, and the cardinality of this attribute is too high for my metrics-based system.
And maybe it would be fine if they were doing mostly tracing or mostly logging, or this must versus should doesn't comport with my security profile, my security stance.
So if you want to talk a little bit about some of, to kind of give the audience more context, I guess, you know, how many different things are coming into making these decisions and why it takes so long sometimes for the process to flow out.
Jess: Oh, so much reality.
Liudmila: So, you know, the HTTP evolution, the semantic conventions for HTTP gives a good perspective. I think we had maybe a handful of iterations that completely changed how HTTP stuff is recorded.
And the good segues for these changes were: yes, there is ECS, the Elastic Common Schema. They define things in a different way. This is a well-established schema, and we should learn everything we can from it.
Another one is, who is this for? Earlier we discussed it's for the consumers, right? But early in the days it was more what can we collect and how we can record it, because it can be collected.
And the shape of the telemetry changes a lot, because if you optimize instrumentation and what can be collected, you will end up with a different set of things you want to collect and different names for these things.
Jess: That's interesting. So did you just say that the focus shifted from it started with, "Well, what can we collect? Put it in there," to "What do people need? How will it be used?"
Liudmila: Yeah, and a lot of the times when people start and define their own semantic conventions, that's my imagination. I'm happy if I'm wrong, but it feels that people start with, okay, what can be useful to know about this and what I can provide.
Let's say messaging headers contain this information. I need to define an attribute for every property in this header, but the question is why should we collect it, how users will benefit from it, and yes, every property is useful, but to how many people? Does it justify the cost of collecting it?
The GenAI semantic conventions give a great perspective. People want us to record everything, the requests to models, the responses from models. There are a lot of good reasons for it, but there is a balance.
You cannot collect it by default, all of it. You need to make sure that there is a way to reduce the amount of telemetry. You need to define what's important to collect.
So the things that usually come into the decision-making process, do we know that like almost everyone will benefit from it? How do we know? What are the common scenarios?
Second, is there something else that we can use? Is there something existing? Should it be in this layer? A good example is again GenAI, you have the logical span. I run the chat completion, I want to do the chat completion for this call.
But then there are internal steps, there could be HTTP requests. If you move one layer higher and you do some complex flow with GenAI, what information should go in which layer? Like the endpoint, the URL, it probably should be in the HTTP layer.
The GenAI-specific things, they should probably be in the GenAI layer. If you do RAG, like you store vectors in the database, it's usually used in the GenAI topic, but it's not GenAI-specific. You should not put it in the GenAI layer.
You should define something separate for vector databases. And then you go into the place where, okay, there are general database semantic conventions, vector databases are not specific to vectors usually.
And all those things need to be taken into consideration when you define something new in the area. The usefulness, the costs, and if it's the right place to capture things.
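To make the layering point concrete, here is a minimal Python sketch of a check that flags attributes recorded on the wrong layer, such as a URL placed on a GenAI span. The `Span` class, the `LAYER_PREFIXES` mapping, and the `misplaced_attributes` helper are all hypothetical and not part of any OpenTelemetry API. The attribute keys are modeled on OpenTelemetry-style names and should be checked against the current semantic conventions before use.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from a layer to the attribute prefixes that belong to it.
# Illustrative only; not taken from any specification.
LAYER_PREFIXES = {
    "genai": ("gen_ai.",),
    "http": ("http.", "url.", "server."),
    "db": ("db.",),
}


@dataclass
class Span:
    """A toy span: a name, the layer it describes, attributes, and child spans."""
    name: str
    layer: str
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def misplaced_attributes(span):
    """Return (span name, attribute key) pairs whose prefix does not fit the span's layer."""
    allowed = LAYER_PREFIXES[span.layer]
    found = [(span.name, key) for key in span.attributes if not key.startswith(allowed)]
    for child in span.children:
        found.extend(misplaced_attributes(child))
    return found


# A logical GenAI span with an HTTP call and a vector lookup underneath it.
chat = Span(
    "chat example-model",
    "genai",
    {
        "gen_ai.operation.name": "chat",
        "gen_ai.request.model": "example-model",
        "url.full": "https://example.com/v1/chat",  # belongs on the HTTP layer
    },
    children=[
        Span("POST", "http", {"http.request.method": "POST",
                              "url.full": "https://example.com/v1/chat"}),
        Span("query vectors", "db", {"db.operation.name": "query"}),
    ],
)

print(misplaced_attributes(chat))
```

In this sketch the vector lookup is modeled as an ordinary database span rather than a GenAI one, which mirrors the point that vector storage is used alongside GenAI but is not GenAI-specific.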
Austin: I also want to point out there's a tension or maybe a balance I should say, that isn't necessarily just in SemConv, but I think SemConv is where it gets felt most acutely, that we do have to balance as a project, like okay, what is the realm of possibility that exists today with existing tooling, and then trade it against, where should the industry be going?
What capabilities are we trying to unlock for the next generation of observability tools and platforms? Because OpenTelemetry, it's not, I think there's a popular maybe conception that like, oh, this is just ossifying the way things are, and that's not true at all.
Like, we're trying to build something that's going to last 10, 20, 25 years. And so we have to think like, yes, what is sort of the minimal set that we can do today or what is going to be broadly compatible with what exists today, but also has the hooks there and has the stuff there so that in the future, as technology improves and progresses, you'll be able to build new things that we can't even think about right now.
We don't want to hold back observability. So it's never, I would say there is literally no simple decision in SemConv.
Liudmila: There are some, to remove stuff.
Martin: There's an outsized impact.
Austin: Okay, yeah. Deleting code always is simple.
Liudmila: Yeah. But also, I think this is a great topic in OpenTelemetry in general. There are technical things we want to make happen, but there are things that are agreeable, right?
You can find energy and consensus in the community to some small things that unblock this, like multi-step progress towards something. And a lot of the times as in our needs, it's very easy to move forward small changes that people agree to.
It creates some interesting problems, again, in the semantic conventions where sometimes we are doing the small incremental things that slowly, one by one, move us in the wrong direction. We don't tackle some big problem. But sometimes it's a no-brainer. You want to introduce something, and if there is energy in the community, the community believes it's a good thing, a lot of people want it, you probably will not get any pressure against such a change.
Jess: So there are some sources of energy and joy.
Liudmila: Oh, of course, yes. One of the awesome things I want to highlight, CI/CD Working Group, if you are interested in instrumenting CI/CD or testing, I mean, this is the holy grail of observability.
You don't need to think about verbosity, performance, and whatnot. And everyone wants observability in their CI/CD systems. I want it. I'm dying for it. And they just introduced a bunch of semantic conventions for this area.
I'm excited to see the progress there, and they are actually starting to meet regularly. If you want to join CI/CD Working Group, please do, because everyone knows what CI/CD is.
Everyone wants to see some progress there. I think they are moving super fast and making great progress.
Jess: Wonderful. Yeah, and where can people join the Continuous Integration Deployment Working Group?
Liudmila: The best place to start is to go to OpenTelemetry community and there should be projects there. There is a CI/CD project listed in the Community Repo.
There should be some Slack channel listed there as well. Again, they are picking the meeting times so there are no meeting times yet, but there will be soon.
Jess: Great, where else can people go if they want to learn more about you, about semantic conventions, about participating?
Liudmila: Yeah, the OpenTelemetry Semantic Conventions Repo is a good place to start. It's probably overwhelming. There is tons of stuff there.
We have a Monday meeting 8:00 AM Pacific Time every Monday. There is a Slack channel. Please join, ask your questions.
Jess: Is that in the CNCF Slack?
Liudmila: The CNCF Slack, yes. So one thing we are experimenting with in semantic conventions meeting is leaving some time for brainstorming and design to actually help people design semantic conventions rather than discussing their proposals or some comments on the pull request so they can actually come and have this open floor discussion and get some suggestions.
Jess: Nice.
Liudmila: Yeah, I have very high hopes for people actually using this opportunity to come and design stuff with us.
Jess: Fantastic. Liudmila, thank you so much for joining us on o11ycast.
Liudmila: Thank you for having me.