
Ep. #43, Going Serverless with Apostolis (Toli) Apostolidis of Cinch

about the episode

In episode 43 of o11ycast, Charity and Liz speak with Apostolis (Toli) Apostolidis of Cinch. They discuss Cinch’s move from containers to serverless, lessons learned from Team Topologies, and advice on structuring your org with strategic constraints.

Apostolis (Toli) Apostolidis is Principal Software Engineer at Cinch, a U.K.-based company removing the faff from buying used cars online.

transcript

Apostolis "Toli" Apostolidis: The funny thing is that we didn't start out serverless.

So an agency built the first iteration of our product which is a website where you can buy secondhand cars online.

We started container-based, so we had containers in Azure.

We started cloud but we started container-based in Kubernetes actually in Azure.

And while we started building our organization that would take over the agency that built the initial software, we had a lot of issues with actually operating and understanding Kubernetes and understanding everything that goes on.

Although everything was infrastructure as code and everything was very well-written, we identified really early on that that wasn't going to be our differentiator.

At the start of the pandemic, we actually pivoted our product as well, from a standard second-hand car listing website to a website where we sell Cinch cars, our cars, directly to consumers.

So that was a massive product pivot. And we also decided, well, maybe it's a good opportunity to switch to serverless. And the reasoning there was, we didn't want to care at all about anything underlying our software so not even containers.

Liz Fong-Jones: It's one of those funny things that Intercom says. They say, "Run Less Software."

And I think Kubernetes counts as software.

If you can get away without running Kubernetes, you absolutely should.

Toli: Yeah. And it was a very funny thing for me because I came on thinking, oh, I haven't used Kubernetes, sounds fun.

I've done a form of optimized scheduling in the past, so I thought this is an interesting problem.

But actually we found that we didn't want to solve that problem.

It wasn't a problem we wanted to resolve.

So serverless made complete sense.

And not only did we move from containers to serverless, we moved from Azure to AWS, because AWS made sense as well.

And eventually I think that made complete sense because six months later we had a product that we launched and we had massive success.

We had no issues at all. We had a very minimal platform, if I can call it a platform, and that was just configuring AWS accounts.

Liz: So you did this before you did your major launch to customers.

So it wasn't like changing the tires on the car mid-race, it was-

Toli: Yeah. We left the website running as it was and we launched the new website with a click of a button six months later.

Charity Majors: Nice. So that's not easy either, trying to build and ship something that's got a completely different architecture.

Toli: It was not easy at all. And we had to scale at the same time.

We started at the start of 2020 with two product engineering teams, that we just created to take over the agency that I spoke about earlier.

And in March we probably had two or three, and then by October we had six product engineering teams building this website on the side.

Charity: 6 teams building it on the side.

Toli: Yeah. And we actually spent quite a bit of time thinking about how we do it in a way that it's not a big bang, if that makes any sense.

Customers didn't see the website, but we were deploying to prod as if it was prod.

And at the end we ended up just having a, I think it was a feature switch.

We just switched the feature and it was-

Charity: Yeah. That was to be my next question.

The hardest part with developing two versions of the same thing in parallel, tends to be, well team two is trying to frantically build to catch up, but team one can't just stop building, right? It can't stop shipping new features or fixing bugs and stuff.

And at some point you just have to call a hard stop and be like, "I'm sorry. No further progress may be made on this branch so the other one can catch up."

Toli: Yeah. As an organization we're quite influenced by Team Topologies.

Liz: Oh, yes. Manuel Pais. Yes. I love Manuel, he's great.

Toli: Yeah. And Matthew Skelton as well.

I think Matthew Skelton is based in Leeds as well, where I'm based.

And it's interesting because we talk about the two parallel teams working two parallel software systems.

I was actually tech lead of one of the teams that was dealing with the live website, if that makes sense.

And we do design our organization, we do design our teams in a way that makes sense.

So that team would take ownership entirely of the day-to-day, that'd be BAU if you like, features.

Charity: Talk a little bit more about, what the lessons are that you learned from Team Topologies that really helped you practically.

Toli: I think the number one lesson is the idea of the Inverse Conway Maneuver, that you need to be very careful in how you design your organization.

You need to be very careful with what constraints you put on your organization.

It's all about the team types.

Interestingly, Team Topologies talks about four team types.

And up until recently we only ever had one, and that one was the stream-aligned team.

The reason is we didn't need any other type at that point in time.

And the other big thing is communication between teams. This idea that it doesn't have to be maximum communication it's actually, too much communication might be a problem.

Liz: Yeah. In fact, the developer advocate we just hired, who just joined Honeycomb, Jessica, posted an article this week about this.

It was about how the more freeform communication and collaboration you have between different teams in completely different areas of the business, and the less structured it is, the more potential there is for these weird inter-team dependencies that should be formalized.

Toli: Don't get me wrong.

We've got those weird dependencies that weren't formalized, but we tried very explicitly to evolve our organization in a way that some constraints were put in place.

So we are quite influenced by the Spotify model terminology, at least.

So we have product engineering teams that are autonomous and cross-functional, and there we call them squads.

And then initially we just had squads. So then we said, okay, well actually we have too many squads.

Now we need to break them down into logical business units that made sense.

And then we ended up with three tribes that were logically separated.

And that creates micro worlds.

And it creates some separation between squads because naturally a tribe that does search and convert for example, is quite different from a tribe that does operations, which is delivery and inventory and things like that.

And they end up with quite different tech and they end up with different types of engineering practices. So it's helped us quite a bit.

Liz: Right. That's a fun little overload of the word operations to mean, not software operations, but driving-cars-around operations.

Toli: It triggered me really early on. So I did call that out.

But the other interesting thing is talking about designing with intention is that, we had some constraints from the very start.

So we started out with no manual QA.

We thought we're not going to have any manual QA at all.

And what that meant is that we had a lot more focus on automation.

So test automation, CI/CD automation, so pipelines and deployment patterns.

Liz: It definitely sounds like you had a greenfield environment to some extent, right?

You're able to be thoughtful about, how do you want to build this out?

How do you want to make it scale?

Toli: I think our tech leaders were quite bold to say, we inherited this from the agency.

Can we break from that and create something new?

And that was quite brave at the start, but I think it paid off eventually.

It's not that the old backend, for example, is not still used.

It is and we're still phasing it out.

But it really didn't have much overhead in terms of operating it.

It sits there, it works. It does what it needs to do and keeps parts of the website running as well, but we didn't need to iterate over it.

So it's worked quite nicely.

Charity: Where do SREs fit into this. Do you have any SREs?

Toli: We don't right now. SRE is a practice that we want to evolve.

We're currently, I would say, probably designing for that a bit, designing our organization for that.

I'd say that I've been speaking a lot about service level objectives internally.

And trying to go from monitor everything to, okay, what matters for your service?

Charity: And you have how many software engineers?

Toli: Right now, we probably have about 100.

But don't hold me to that. I'm not entirely sure.

Charity: Is that down to your amazing work or is it, oh my God, there are so many efficiencies that you probably could have if you had some SREs?

Toli: Yeah. Completely.

I think we would benefit a lot from SREs.

One thing that we have done is, each product engineering team has, as you would expect all the roles or the cross-functional roles that you'd expect, which is a number of engineers.

You'll have a tech lead. You'll have maybe a UX or you'll have an analytics person.

So all the skills that you need to deliver software.

We also have another role, which we termed as automation engineers.

Now that may be quite a badly named role, but their role is very much, almost like, a DevOps subject matter expert or an engineer.

A software engineer who focuses a lot more than normal on these questions.

Every software engineer should, but a lot more than normal.

How can we improve our infrastructure configuration?

How can we improve our deployment patterns?

How can we improve our observability instrumentation?

How can we improve our monitoring?

Charity: It's great to have engineers devoted to that, whether they're software engineers or SREs. And to be fair, there aren't really that many SREs out there who have a bunch of serverless experience.

Although there are some, and I think that there are some amazing force multipliers, but we should pause to ask you to introduce yourself here.

Toli: So I'm Apostolis Apostolidis. People call me Toli, for short.

I currently work at Cinch, a UK startup, or scale-up, you could say now, that has built a platform that takes the faff out of finding and buying a used car online.

At Cinch, I started out at the company's inception.

We were about five or 10 people at the time, and I started as a DevOps advocate as such.

Because we wanted to build the organization with a DevOps-first mindset or an observability-first mindset.

And I moved a few roles since.

But right now I've probably moved to more of a staff engineering role, focused on DevOps across the organization.

And I'm about to move into more of an engineering practice lead role, focused on DevOps again.

So you can see how my role has evolved.

Liz: That makes a lot of sense, then, why you might not have needed to hire SREs, if you had, as the number six employee in engineering, that expertise in DevOps already being baked into your culture from the start.

Charity: It's living the dream, honestly.

Toli: Yeah. My biggest thing really is, I'm a software engineer.

I've worked across all the stacks in the past.

I studied maths and physics in the past.

But then I was told, well, you're here to learn about DevOps and you're here to learn about observability, so go away and follow this person, Charity Majors, and you'll learn everything.

And your role is to learn and bring things into the organization.

That's actually a really nice role to have.

It's challenging at times, but because it's explicit that your role is that, it means that it's important for the organization.

You're not a DevOps team. You're a DevOps advocate, if that makes sense.

So there's a very subtle differentiation.

Liz: Right. It's a matter of, is it a shared responsibility, or is it only this person's responsibility?

Toli: Yeah.

Liz: So you mentioned observability in there.

What led Cinch to consider observability as the primary concern of DevOps?

What was important to you about observability?

Toli: So as an organization, we wanted to build a platform that, at the time in 2019, seemed a bit crazy, but very quickly we realized that it's something that's happening in the industry quite a bit.

So we recognized really early on that we need to experiment and fail fast and optimize for fast flow and small batches and experimentation, actual software experimentation.

So to do that, it's not enough to just deliver.

We wanted to be able to understand what our software is doing in production.

And when I say we, from the start, we didn't say, oh, we'll have an ops team analyze things, or we'll have a data team analyze things.

We wanted from the start, the teams to take ownership of their production systems and the teams to understand how they behave, and be curious and figure out what their changes are doing.

And it goes back to that principle, that teams who build the software know the software the best. So they're in the best place.

Charity: I want that quote, like, in gold.

Just like, teams that build the software know the software best. It's just like one of the immutable laws of software physics.

Liz: Yeah. I'm really grateful to hear that there are teams that are approaching things this way, rather than, we didn't practice this.

Now we're in trouble. Now we're in pain, right?

Instead, when you start from this proactive stance of, let's empower devs to own their software in production, you get better results.

Charity: And how hilarious, and ironic and awesome to see it come from a group of developers, starting from the beginning.

I've often lamented at how few ops founders there are.

How few teams have ops from the beginning.

But you don't have to have a specialist from the beginning.

If you just take this shit seriously, software engineers are more than capable of learning what needs to be learned.

If they just respect the discipline and level up at it.

Toli: You have an ops background, Charity. Right?

Charity: Yeah. 100,000%.

Toli: I can definitely recognize that ops is hard.

Software engineers are finding it hard, but they're learning.

Especially at Cinch, we're learning a lot, but it takes time.

We have hiccups on the way. We have, where are my logs?

I want my logs. I can't find my logs.

I just want to log this. Just let me log this. But then it's, oh no, you can instrument your code.

You can have tracing. You can have custom metrics.

You can have things on the front-end as well.

That was a big battle that we fought of, you can measure real user traffic.

You don't have to rely on either logs or synthetics.

Liz: Did people think that it was going to be too hard to do, or were they just unfamiliar with it?

What are some of the barriers that you saw, that you had to push out of the way?

Toli: Biggest barriers were, what does a good observability instrumentation look like?

I don't know. So we need to help with that. And we're still learning.

Charity: To be fair, very few engineers in the world have ever experienced it.

Toli: Yeah. It's funny because when we were in Azure, we actually got it into a place where it was really good, but we only had four services.

But as you scale, it becomes really hard.

I guess our biggest challenge was, okay, software engineers, really like writing code and really like delivering software.

Even all the challenges of deployment patterns and all that.

But then how do you teach people to be curious?

How do you teach people to know where to find things, or ask questions or get familiar with their production telemetry data?

I think that was the biggest challenge.

And tooling is pivotal for this because it's an extra UI that you need to learn.

It's an extra SDK you need to learn.

It's extra concepts that you need to learn around observability, which are not just out there.

It's not like TDD is now. Everyone puts TDD on their CV, but observability is not on everyone's CV.

Charity: Right. Hey Toli, you had submitted a talk for O11yCon, right? What was that talk about?

Toli: So I wanted to talk about our journey at Cinch.

How we went from container-based cloud services in Azure to serverless AWS software systems.

And how we went about taking our teams, our product and engineering teams on a journey of observability.

And how we got to a place where each team owns their production telemetry data, if I can call it that.

From the instrumentation of their code through to collecting that telemetry data in useful queries.

And, dare I say, in dashboards. Initially, I personally was very against this idea that, oh, we'll just create dashboards.

Our tech leaders would probably say, "Where's the dashboard where I can just look things up?"

We went from a place where it's a dashboard-gazing type of exercise to a place where dashboards are actually used at Cinch as a regular ceremony.

So after stand-ups, you'd look at your dashboard like, is there anything out here?

As a team, you'll look at your dashboard and say, is there anything on here?

Liz: Right. Exactly. It's not that dashboards are inherently bad. It's that unmaintained dashboards are bad.

So if it's something you're looking at regularly, if you're regularly immersing yourself in your system.

Charity, what's that thing you say about knowing what things look like, if you look at them regularly.

Charity: Yeah. I say all kinds of shit. But yeah, dashboards are not evil.

It's just that dashboards are a jumping-off point, and my rage is only for static dashboards.

If they're dashboards you can click on and dive down into and slice and dice on, zero rage, those are great.

I don't think of those as dashboards. I think of those as like querying tools, right?

And that's the mindset that people need to be in while they're trying to debug their systems.

That, what about this? And what about this? And what about this?

It's this active interrogation, instead of what I've seen, and been that person, too many times.

It's just flipping through countless dashboards, looking for a spike that correlates, or just hazily pattern matching, going, "It feels like Redis," which is super validating when you're right.

But it takes a whole fuck ton of intuition and experience and stuff, to be able to pattern match.

And you can't expect that of everyone, right?

It's better to have a tool that allows you to ask questions, not at the level of low-level system details like this or that, IPv6 and such, whatever.

Those spikes are just constant and almost meaningless, but at the level of functions and endpoints, the world that software engineers spend their lives immersed in.

You should be able to engage with it dynamically at that level.

And I feel like just, if people rely on dashboards too much, they're often blind to, what's not captured there.

Their mental model of the system is much weaker and they're jumping to the answer and then looking for evidence that they're right, over and over again.

Instead of iteratively, following a trail of breadcrumbs to find the problem.

Toli: I think the biggest challenge we've faced is trying to understand what's important for a team.

So one key thing is that our product is, you would call it, a high-value product. Every sale matters.

It's a car that you buy. So everything that happens in the system is not important.

Not everything is important, but you want to-

Liz: There's a real dollar value associated with some transactions.

Toli: Yeah.

So if you can't check out, for example, you need to be instrumenting for that, and having all that information about why you can't check out, or knowing that customers can't check out.

And for us, it's been quite a journey because serverless has allowed us to actually think about these things quite a bit.

I keep thinking about this: why have I not done observability?

Why have I not measured things this well in the past?

And it's probably because we cared a lot about, okay, where is my VM?

Where is my server? Where is my container? How do I get to production?

All this stuff. Whereas all this stuff at Cinch is really quite, okay, we have our testing, regression testing.

We have various parts of regression testing.

Charity: Transactional. But it's not zeroing in on the business use case. Right?

Toli: Yeah. So we have a lot of business metrics.

So we, if I could say, the observability types that we started with were more custom metrics in AWS, either through logs, or through AWS custom metrics.

And those, they would measure things like cars sold or cars delivered, things like that.

Whereas, then we started thinking, okay, so what else is important?

What other high-cardinality data points can we have that we can query against? And that's been a journey.
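As a minimal sketch of emitting a business metric such as cars sold through logs, the Python below builds one log line in CloudWatch's Embedded Metric Format, which CloudWatch can turn into a custom metric when the line reaches a log group (for example, when printed from a Lambda function). The namespace `ExampleShop`, the metric and dimension names, and the `orderId` property are illustrative assumptions, not Cinch's actual setup.

```python
import json
import time


def business_metric_log_line(name, value, dimensions, properties=None,
                             namespace="ExampleShop", timestamp_ms=None):
    """Build one Embedded Metric Format (EMF) log line as a JSON string.

    `dimensions` become metric dimensions (keep these low-cardinality).
    `properties` are extra fields kept on the log event only, which is
    where high-cardinality values such as an order ID can live.
    All names used here are hypothetical.
    """
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)  # EMF expects epoch milliseconds
    record = {
        "_aws": {
            "Timestamp": timestamp_ms,
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    # One dimension set, listing the dimension key names.
                    "Dimensions": [sorted(dimensions)],
                    "Metrics": [{"Name": name, "Unit": "Count"}],
                }
            ],
        },
    }
    record.update(properties or {})  # searchable in logs, not metric dimensions
    record.update(dimensions)        # dimension values live at the top level
    record[name] = value             # the metric value itself
    return json.dumps(record)


if __name__ == "__main__":
    print(business_metric_log_line(
        "CarsSold", 1, {"Channel": "web"}, properties={"orderId": "o-123"}))
```

Fields passed as `properties` stay out of the metric dimensions but remain on the log event, which is one way to keep high-cardinality data queryable next to the metric.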

Charity: So did you start thinking about it, first as a cardinality problem or a tracing problem, or is it both at once when you were coming to this realization?

Toli: I think there was a big difference between what I was thinking, and where I was influenced from, and what the teams were actually thinking.

And I think we had to meet somewhere.

So the first question that the teams tried to answer is, how do I know if my service is not doing what it's expected to do in production?

So they probably started with metrics, that's it.

The metrics, not even any dimensions on the metrics or any tags, so they would know that.

But then they started using tracing, initially with X-Ray, because it comes almost out of the box.

And then they use logs as well. There's teams that use logs heavily.

One big thing is we use real user monitoring on the front-end.

I think that's been quite an eye-opener for a lot of front-end engineers.

We have software engineers across the board. You might have a front-end background a bit more than others, or a backend background a bit more than others, but typically you're a software engineer.

But still, front-end engineers are not used to measuring real traffic.

Liz: Right. Exactly. Can I actually see this user transaction flowing all the way from the front-end through to the backend?

It's huge. It's really eyeopening.

Toli: Exactly. And breaking those barriers down.

I'd love to sit here and tell you, well, actually, we started the tracing and we could see the whole, the full trace which we did actually in Azure.

But in serverless, it was a bit harder.

I'd say biggest achievement we've had is that teams own their production systems.

And from your machine to production is really close.

And then on production, most teams know what's going on.

Charity: How often do your teams typically, get paged or woken up in the middle of the night?

Toli: This is a great question.

Woken up in the middle of the night doesn't really happen, almost at all because we're entirely based in the UK right now.

So we don't really have anything. People don't buy cars in the middle of the night that much.

Charity: Really? No impulse shopping cars?

Toli: Buying cars is funny. Because you might be looking at a car for a few weeks.

I think it's about two or three weeks that you might be browsing a car and you buy it eventually.

It's not a thing that you do in half an hour as a choice, I guess.

In terms of incidents, we started out really sensitive with incidents.

Because, as I said, we have a relatively high-value product, and every issue with anything on the route to buying a vehicle was an issue.

And we raised it as an incident.

I think there's a very special separation between in-hours incidents and out-of-hours incidents.

In-hours incidents are entirely owned by teams, especially when the incidents are related to services.

Charity: The impact is like, out of hours is like five times that of in-hours.

Toli: Yeah. And I think that the way we deal with it is different as well.

So typically, out of hours we take the stance that, okay, you might have an incident, but unless it's really critical, it can wait until the morning to fix it.

We never really have any issues that, oh, website down, we need to deal with it right now.

I can't remember an instance right now.

Charity: It's always more subtle than that. Right? It's usually more subtle.

And how is the on-call rotation staffed, then?

Is it staffed per team, or do you have, like Intercom does, a volunteer rotation of engineers who volunteer for the night shift?

Toli: Right now it's a volunteer rotation, and I'm part of the rotation as well.

Liz: That's so good to hear, right?

That you can make teams accountable for their stuff during business hours, when they're more likely to be doing pushes anyway, and also have on-call be pleasant enough that people are volunteering for it, rather than being dragooned into it.

Toli: Yeah.

Liz: That's so great to hear.

One other question I wanted to ask was, you mentioned the distance.

You've literally said, the distance from desktop to running in the cloud is very short.

What's that interval between someone writing code and it running in production?

Toli: In terms of time?

Liz: Yeah. In terms of time.

Toli: For a backend change, it probably takes about five to 10 minutes to get to production from the time you commit.

The funny story is that when we started building up our AWS account structure, bear in mind, we didn't know much about AWS.

We were learning on the go.

We had a prod account and we had a dev account and we were about to create a UAT or a staging account.

Just because that's what software engineers do. Right?

And we decided against that. We said, we don't need a UAT.

What's the reason for a UAT account?

So that has served us until now. We're restructuring our accounts now.

But for a long time, we had a dev account, which acts in various ways, usually as an integration account.

And then your next step is prod and that's it.

So you deploy to dev, you deploy to prod, and that's it.

And that's really shortened the distance.

Liz: Yeah. Test in prod.

Toli: Yeah. And it puts a special emphasis on prod.

Charity: Serverless is so good at that.

Toli: Yeah.

Charity: Have you guys embraced feature flags and progressive deployments, anything like that?

Toli: We haven't, as much as we'd probably want.

What we do do is, we have started using feature flags that are build-time feature flags.

And that's the extent. And we also have AB testing on the front end.

So typically when we release a new feature, we'll AB test it.

We'll progressively roll it out, even from 1% to 50% of traffic.

And we want to look more into this. We looked at it a bit, and with serverless it's not as straightforward.

But yeah, we probably want to investigate more on that because right now it's either on or off. Right?
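A progressive rollout like the 1% to 50% ramp described above can be sketched with deterministic bucketing: hash each user into a stable bucket from 0 to 99 and show the feature to buckets below the current percentage. The function names, the experiment name, and the choice of SHA-256 are assumptions for illustration; the episode does not say which A/B testing tool Cinch uses.

```python
import hashlib


def rollout_bucket(user_id, experiment):
    """Map a user to a stable bucket in [0, 100) for one experiment.

    Hashing the experiment name together with the user ID means the same
    user can land in different buckets for different experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest[:8], 16) % 100


def sees_new_feature(user_id, experiment, percent):
    """Return True if this user is inside the current rollout percentage."""
    return rollout_bucket(user_id, experiment) < percent


if __name__ == "__main__":
    # Hypothetical experiment name and user ID.
    for percent in (1, 10, 50):
        print(percent, sees_new_feature("user-42", "new-checkout", percent))
```

Because the bucket never changes for a given user, anyone who saw the feature at 1% still sees it at 50%, so raising the percentage only ever adds users.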

Liz: Right. One question I had was about your service level objectives.

You mentioned you have been starting to keep SLOs or at least keep track of key user journeys.

What are things you encourage your teams to do, given that they're kind of, you have a bunch of independent teams now?

Toli: Right now we're very early on with service level objectives.

Where we definitely have some maturity is monitoring.

Unfortunately we've got the monitor-everything mantra.

So I think we've caught the attention of our software engineering teams, that you want to know if something goes wrong.

I think in terms of a journey, it felt like an easier win.

Our next stage is, okay, so what does your service do and what level is okay for you?

And then the next question is how do you measure that?

So we started out with one or two squads, or product engineering teams.

And they're measuring things like, latency of our Lambdas.

They're measuring things like, are we returning vehicles, that the types of the makes and models of vehicles that we expect, things like that.

So it's a mixture between performance operations and even business as well.
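The questions of what level is okay for a service and how to measure it can be made concrete with a small calculation. The sketch below treats a request as good when it finishes within a latency threshold, computes the resulting service level indicator, and reports how much of the error budget is left against a target. The thresholds and targets are made-up numbers for illustration, not values from Cinch.

```python
def latency_sli(latencies_ms, threshold_ms):
    """Fraction of requests that completed within the latency threshold."""
    if not latencies_ms:
        raise ValueError("no requests observed")
    good = sum(1 for latency in latencies_ms if latency <= threshold_ms)
    return good / len(latencies_ms)


def error_budget_remaining(sli, slo_target):
    """Share of the error budget still unspent.

    1.0 means the budget is untouched, 0.0 means it is exactly used up,
    and a negative value means the objective has been missed.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    allowed_bad = 1.0 - slo_target  # fraction of requests allowed to be bad
    actual_bad = 1.0 - sli          # fraction of requests that were bad
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    # Hypothetical Lambda latencies in milliseconds.
    sli = latency_sli([100, 200, 300, 1200], threshold_ms=500)
    print(sli, error_budget_remaining(sli, slo_target=0.5))
```

The same shape works for the business checks mentioned above: replace the latency test with a check such as "the search returned the expected makes and models" and the rest of the calculation is unchanged.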

Liz: Which makes total sense, right? You have these critical user journeys.

Fast is good; slow is equivalent to down.

And you want to make sure and validate that people are, at the end of the day, able to buy cars.

Toli: Yeah. As I say, we're very early on.

And I definitely think that we would benefit from SRE concepts like SLOs.

I think that there's a certain level of cognitive overhead that, that entails.

So what do other organizations do, to actually get to that place, is it separate teams?

Because I know Google, well, I don't know. I've heard Google has separate SRE teams that help with this.

Liz: I don't think that people need separate SRE teams per se.

But I do think there is a discipline involved in saying, you know what?

Are we getting meaning out of these alerts?

Are we getting value out of these alerts or should we turn them off?

Charity: This is not one of those new things in systems.

This is something that SREs and ops people have been battling with for fucking 30-odd years.

And so there are a lot of best practices, but it's not easy to just give someone a rule book and go, do these things.

It's much more of a feedback loop and an adaptive process.

For example, a mistake that a lot of software engineering teams make is, they care deeply about their service.

So they alert themselves on everything.

And the shift to SLOs and error budgets is, in large part, meant to relieve that.

To not page you on symptoms or page on flaps or whatever, but actually to align operations and engineering pain strictly with when customers are in pain, right?

If your customers aren't in pain, your engineering teams aren't in pain.

And that is something where, when teams are early on, you can go a long way over-alerting yourselves, right?

But there comes a point at scale where you're just capped, and you have to relax your standards a bit and only alert when your customers are being affected.
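One common way to tie paging to customer pain rather than to every flap is to alert on how fast the error budget is being burned, and to require both a short and a long window to agree before paging. The sketch below is an illustration under assumptions: the function names are hypothetical, and the default threshold of 14.4 is borrowed from commonly cited multi-window burn-rate guidance (a sustained rate of 14.4 uses up 2% of a 30-day budget in one hour), not from anything stated in this episode.

```python
def burn_rate(bad_events, total_events, slo_target):
    """How many times faster than allowed the error budget is being spent.

    A burn rate of 1.0 spends the budget exactly over the SLO period.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_events == 0:
        return 0.0  # no traffic means no customers in pain
    observed_bad = bad_events / total_events
    return observed_bad / (1.0 - slo_target)


def should_page(short_window, long_window, slo_target, threshold=14.4):
    """Page only when both windows burn faster than the threshold.

    Each window is a (bad_events, total_events) pair. The long window
    filters out brief flaps; the short window lets the page clear quickly
    once the problem stops.
    """
    return all(
        burn_rate(bad, total, slo_target) >= threshold
        for bad, total in (short_window, long_window)
    )


if __name__ == "__main__":
    # Hypothetical counts: (failed checkouts, total checkouts) per window.
    print(should_page((20, 100), (200, 1000), slo_target=0.99))
```

A brief spike that shows up only in the short window does not page, which is the relaxation of standards described above: the alert fires only when a meaningful share of customers is affected for long enough.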

Liz: I think there's another piece of that question that you asked, which was the value of adding incremental telemetry data.

Where we tend to push back really hard on this idea that more logs are better or more metrics are better. Right?

That it's better to have one set of good signals than three sets of crappy signals.

Charity: Yeah.

Toli: Yeah. I think for us, the telemetry data is there now.

And we're in a good place with instrumentation practices.

I think the big challenge now is, how do we get to a point where we own our service level objectives, or we own what service level is acceptable?

And probably that's a theme throughout building Cinch is, ownership helps with everything.

Lack of ownership means that it won't happen, basically.

So I can see the solution being something along the lines of, if you own your service level, if it's your responsibility, then you'll find a way to measure it.

You'll find a way to know.

Liz: Exactly.

And I think the corollary to that is, if no one can remember why that alert is there, or if you always turn it off every time, you don't need to wait for someone's approval to turn it off. Just turn it off. Right?

You have autonomy over that. It's not just that your team has autonomy, in air quotes, but you're bound to your past decisions.

You're really not. You can change it.

Toli: If the question is asked then delete the monitor, delete the alert.

Same answer for tests: delete the test if it's not useful. If you're wondering about it, delete it.

Charity: Yep. If it's really important, you'll find out. Yeah.

I think there's a tendency also, for engineering teams to, just like with tests.

They'll keep the ones that are failing around because they see it as a point of pride and stuff and it's just, there's diminishing returns.

Like you said, just delete the test and move on.

So we wanted to talk a little bit about the analogy between adopting a very observability-first mindset with software, and the pandemic and the observability of biological systems, human systems in production.

Toli: Yeah. I was thinking this summer, that the world has gone through a weird phase right now, whereby the system is not as green as we thought it was initially.

And we all know about it. It's impacted all of us and that's why it's termed the pandemic.

And I was just thinking about how we've gone through a journey of assuming that everything is fine.

And assuming that we don't need to care about things.

To a place where actually we all use terminology or we all know things, that we know how to do things that we didn't know two years ago at all.

It was something that we had to think maybe, by exception.

So I was thinking, especially with data, we started out thinking about, well, how impactful is COVID?

How much problem has it caused, to the point where now we almost all have an, I'd probably say an agreed way of measuring the impact of COVID?

And that's, for me, a great analogy to software systems: both biological systems and software systems are complex systems where you don't always need to know the detail of what's going on. That just takes too long.

Liz: And you have to act before you have all of the information.

I think this other interesting piece here is this idea that it doesn't matter if you are missing it, as long as you have the ability to zoom in and enhance and pose that new question and gather the data for it.

And then you pose the next few questions, and you have the data for it.

That's how we get there iteratively, rather than doing it all at once.

Toli: Yeah. For me, the analogy with Cinch as well is, in general, the approach that Cinch has taken is, that we democratized the observability of our biological systems, right?

In the pandemic we all learned to contribute to the data.

We instrument our bodies, right? We all go and do PCR tests.

We all go and do rapid tests or self tests.

We are aware of the various metrics. We learn from experts.

We were guided along the way and we were taught to be curious and we notice things.

And that's very similar, a very similar approach in software systems.

And we can't accept any more that the system is largely green.

We now need to accept that we'll have incidents, and learn from them.

And we learned to notice various states and trends.

Liz: Yep. Spotting the common patterns.

Toli: Yeah. And unfortunately, so my partner is a molecular biology researcher.

So she was explaining the dynamics.

She told me, well, it's happened now, we need to deal with it.

That's what we say in software as well.

An incident happens, that's just something we need to accept and learn from it.

Liz: Well, thank you very much for joining us on the show Toli. It was a pleasure having you.

Toli: Thank you. It was my pleasure.