Generationship
24 MIN

Ep. #33, Developer Experience with Nicole Forsgren

about the episode

In episode 33 of Generationship, Rachel Chalmers is joined by Nicole Forsgren—developer productivity researcher and co-founder of DORA—to discuss how AI is reshaping the software development landscape. From coding with LLMs to evaluating trust in AI-generated outputs, they explore the future of developer experience and the enduring value of human intuition.

Nicole Forsgren is a researcher, author, and expert in software engineering performance and DevOps. She is best known for her work on the DORA (DevOps Research and Assessment) team and as co-author of Accelerate: The Science of Lean Software and DevOps. Her research has helped organizations worldwide understand the real impact of DevOps on business outcomes.

transcript

Rachel Chalmers: Today I am thrilled to welcome Nicole Forsgren onto the show.

Nicole is a DevOps and developer productivity expert who thrives on helping large organizations reshape their culture, processes and technology to improve their business and enhance the developer experience.

She combines AI tools with developer expertise to help teams unlock productivity and innovation.

This work improves value delivery for organizations, improves software and process efficiency, and increases the satisfaction and wellbeing of developers.

Simply put, she uses research, strategy, data, and collaboration to make things awesome.

She's the brains behind DORA (DevOps Research and Assessment), the industry-standard method for assessing software delivery, and the predictive models that help organizations improve it.

Her career journey also includes roles as software engineer, professor, and award-winning author.

She's currently a partner at Microsoft Research, where she helms the Developer Experience Lab using AI to revolutionize how developers work, and serves on the board at ACM Queue.

Nicole, welcome to the show. It's great to have you.

Nicole Forsgren: Thank you so much for having me. I'm really excited.

Rachel: DORA has been incredibly influential in how people think about developer productivity and how writing software benefits a business's bottom line.

What do you think the impact of LLM-based code gen tools like Cursor might be?

Nicole: You know, I'm really excited to see all of the developments here.

And I think it'll have a few impacts in terms of value. I think a lot of that is just easy to see right now.

Although, you know, anytime we say it's undisputed, there's a lot of disputes out there.

But I think we're seeing a really wonderful place where it clicks in right now, right?

Even with today's technology, in today's models, where it's helping us create boilerplate work, it's helping us do chore work, it's helping us understand code bases faster, which I think can be really wonderful because it can then unlock our ability to spend more time solving complex problems, being creative, coming up with innovative solutions.

Rachel: I do obviously, living in San Francisco, run across the guys who are like, "Stop hiring humans. The code gen agents are going to write everything."

And then the counter argument is typically, "Oh, they're not capable of systems thinking yet."

And then the counter counter argument is, "Oh, they will become capable of systems thinking."

But I really like Grady Booch's joke that in the future we won't need programmers. We'll just need folks who are really, really good at giving very specific instructions to computers.

What's your take on that whole argument?

Nicole: That's perfect. That really needs to be up on a wall somewhere.

Rachel: Absolutely.

Nicole: I do think there is going to be more and more opportunity for code agents to step in, right?

Especially when we're looking at situations where folks have a quick idea, they want to do a quick prototype. We're seeing more and more of that.

But when we start thinking about complex distributed systems, right?

When we're looking at how several components interact, this is where I think humans and machines have this really kind of incredible joint superpower, right? Humans are just very good at recognizing things with very little data, right?

Now, sometimes it leads to intuition, which isn't great, but you know, when we think about training a model, training a model to recognize faces requires this massive dataset.

Toddlers, babies can do this very, very quickly with just a very limited dataset, very limited exposure to faces.

And so while computers can handle and process and reason over significantly more data points than a human can, we still need humans to help kind of disambiguate, and translate and understand what some of these patterns mean, especially when they're brand new patterns, right?

And I think that's where we're going to see a lot of opportunity. And we saw it in the past, right? Like humans have still been writing.

You could write by hand, you could write on a typewriter, you could write with a word processor. The humans are still really important.

We just now have better tools that help us move faster or do more work or, you know, other things. I was joking with my mom.

She helped with my dad's master's thesis in civil engineering, and it was on a typewriter and she typed it, and you couldn't have any mistakes anywhere.

And I'm like, "This is terrifying." When I was writing my dissertation, or when I write papers now, I write in different segments.

And I circle back around, and I think, everyone's like, "Well, yeah, of course you do." That was not historically the way things were done.

You wrote something top to bottom or maybe you had some scribbles. I used to write, you know, handwritten letters when I was young.

But it's really changed and I think up leveled the way that we can write documents. I think the same will be true for software.

Rachel: I was looking at a very beautiful photo of Dorothea Lange's the other day and thinking that even with photography, the human meaning of framing an image is still completely apparent.

Like the difference between a frame of a CCTV video feed and a Dorothea Lange shot is the why, it's the creation of human meaning from the data.

And that felt like an interesting parallel to what we're seeing now with the agents.

Nicole: Yeah, absolutely. And, you know, I think there's also something interesting when we think about framing both a picture but also framing the work that we're doing.

An interesting paper, I want to say, was posted to arXiv, and they found that people who frequently use ChatGPT for writing tasks are very, very accurate detectors of when something has been written by AI.

And I think they misclassified only about one out of 250 to 300.

Rachel: Yeah.

Nicole: Right? And some of it was, we know when they're being super verbose, we know when they're being a little too formal, but also we could just kind of tell.

And that's just with writing tasks, right? That's the things that, like we are all, like, everyone is very, very familiar with.

I suspect we're also, you know, we talk about writeprints and codeprints, it's probably going to be a lot the same.

Rachel: Yeah, it's got a signature. It's like when you have that composite face of all of the Hollywood actors and it's recognizable, but it's recognizably in the uncanny valley.

Nicole: Right. And when we can find ways to kind of leverage that for usefulness, right? Boilerplate code, it's probably fine, right?

But when we're really thinking about really in depth complicated problems or new approaches, we can kind of see, we can tell.

Rachel: Yeah, it's going to push us towards articulating the why a little more.

Nicole: Yep. Exactly.

Rachel: How is developer experience going to change in the age of AI?

Nicole: I love this question, and I think some of it is still a little open.

Now, I would say that the goal of developer experience will probably remain the same. And that is reducing friction, increasing opportunity, right?

Reducing cognitive load so that we can really focus on the things that are important. Improving flow.

I think from a 30,000 foot view, that will still largely remain the same.

Now, when we get down into like the details, that's where I think we'll really see a shift in emergence of what the developer experience means, what it means to write code.

And, you know, some of the questions that I always get are, how should we think about detecting that and measuring it so that we can help improve it?

And those are some of the things that are continuing to kind of shift under our feet right now.

Rachel: How should young aspiring developers prepare themselves for the workplaces of the future? What advice should we be giving kids in college?

Nicole: Oh gosh, I will say I was a professor for years, and I am glad that I'm not a professor now because that is incredibly challenging to be doing work when you're at kind of the cusp of a significant, significant shift, and it's moving so fast.

That said, I would say for junior developers, I think it's going to be a mix of being able to write code and understand code very well without any type of LLM assistance, right?

Because being able to know what's important from a performance standpoint, from a security standpoint, from a structuring standpoint, is super important.

And it's important because when we use LLMs more and more, as we know, we almost shift from writing code to reviewing code, even when it's our own code, because we're given so much of it.

And at the same time, I think it's very, very important to play with and use these tools as much as possible.

You know, I think back in the day we would joke that, I remember I had a job actually as a mainframe developer and database developer, and I was quite young, but I was in there as a senior consultant because I could Google-fu the best, frankly, right?

Like my Google-fu was very, very good. And when my contract was finally coming to an end, I was going to move and go back to get my PhD, and they were like, "Well, what are we going to do about filling your role?"

Like six months before, I'm like, "Okay, listen, none of this is magic. You're sending me these really, really complicated problems, and," to the point about coding, I knew the general shape of it, but sometimes I didn't know a specific command or something.

I mean, I was young at this time. And I was like, "Okay, let's start practicing Googling for the answers. We need to practice searching, and thinking about which words to pull together."

And the first month or two, and these were incredibly smart people, but they struggled because it was such a completely different skillset.

Rachel: Yep.

Nicole: And for a while they're like, "It's okay, we'll just keep paying you, we'll just keep paying a senior."

I'm like, "Thank you. And also..."

And so, you know, as we're seeing now with prompt engineering, it's kind of the equivalent of having really good Google-fu.

How can we think about crafting our requests and evaluating the responses to get what we need out of it to understand what's truly happening, to look for those catches?

But I really think we need both. And I worry about folks who lean too far on either end, because it's not going to serve engineering, and people who want to be software engineers, well.

Now, if you're a technical PM or you want to pull up some quick prototypes, it's fine, right? That's not meant for scale. But when we think about engineering, I think both are going to be important.

Rachel: Yeah, I really take your point about being able to evaluate the responses.

If you don't have a programming practice of your own, unassisted, where you learn what works and what doesn't, you have nothing to review the generated code against.

You need to be able to compare it to something that's within your experience.

And I think when Google was more reliable and we had that ability to like Google several things and compare them, that's a really good model for how to think about the outputs from these LLMs.

Nicole: Mm-hm. And you know, to your point, now, so I'm not doing real engineering work, but I'm still taking time to write code without any type of LLM help and then write code with.

And I will say when you have both of those, it also increases your learning. And I'll point that out for the students as well, right?

Because I have ways that I typically tend to structure and break down my code and, you know, define my classes.

And then sometimes I'll see the LLM based on a high level prompt, do something different.

So there have been a few tricks I've kind of picked up from that or ways I do it a little bit differently now, or I think about it differently.

And then there are also times where I'm like, that's not the best approach here. That's just not going to fit. You kind of need both of those.

Rachel: It feels like a really tight, closed-loop pair programming.

Nicole: Yep. Exactly.

Rachel: How should managers think about assessing the quality of generated code?

Nicole: This is probably the million dollar question.

Rachel: Yeah, can you tell me so that I can start a new DORA and-

Nicole: Yeah, let's just do our own startup. It'll be great.

Rachel: Yeah.

Nicole: We'll take off. I mean, I think for some of it we need to go back to a precondition question: how do we assess the quality of code without AI?

It's largely unanswered, right? We can think about many things. We can think about the outcomes that we see downstream, right? And even how do you think about outcomes?

Are we talking about security, and reliability, and sustainability? Are we looking at business outcomes like ROI, and customer engagement and user value?

But when we think about, you know, upstream a little bit more, if we want to think about what contributes to those and what that looks like, these are still open questions that I think are going to be even more important, because now we're creating so much more code, right?

And more code means more tech debt.

Rachel: Yep.

Nicole: So how do we think about evaluating this? And there are some, you know, basic metrics and tools out there for kind of proxying technical debt or proxying complexity.

Now, they're not great. And the best ones are incredibly complicated, because something like code complexity goes back to the humans and machines thing: a good senior engineer can look at code and tell you if it is complex, or if the work to debug a multi-system problem is complex.

They can tell you if code needs to be refactored. Breaking that down into a handful of metrics can be really challenging. And experts will disagree.

Rachel: Yeah, it's a "we know it when we see it" thing.

Nicole: Yeah, exactly. So that's my worst non-answer. I will say though, that we do see some similarities.

So I've worked on the SPACE framework, which is a way to think about evaluating developer productivity or really any type of productivity frankly.

And it covers five dimensions, I'll cover this real quick.

S is satisfaction, right? So it is a human satisfied with the way the system works or the way the tooling is functioning.

P is performance, it's an outcome. So this is going to be your quality or your security, or you know, your sustainability work.

A is an activity. Now, this is anything that can be counted. These are the ones that most people think of because we can operationalize them, and instrument them very, very easily.

Rachel: Lines of code.

Nicole: Lines of code, number of PRs, number of commits, number of anything. If it's a number, there you go.

C is communication and collaboration. This can be how people talk and work together. It could be how systems do.

What are the stats on our API calls? Do they get used very often? Do they break all the time? What does our compatibility look like?

And E is efficiency and flow. How fast is something? So any kind of a time metric. Or you can also ask someone: were you able to get into the flow? Were you able to concentrate?

And again, that's one of those ones that's a little tricky but people know it when they see it, right?

So far we're seeing that this general framework for thinking about metrics applies quite well in AI. I'm also seeing that we should probably look at adding onto that with something like trust or acceptance, right? Which has always been important.

I did early work with these admins who were working in highly complex, high-risk situations, and there, how much they could trust a tool, or trust the output, or rely on it was top of mind, because if you make a mistake, you bring the entire site down, right?

You break the internet. Here, I don't want to say that it wasn't top of mind for developers, it just wasn't in the realm, but a compiler just compiles, right?

Your test suite worked, or you kind of knew it was a bunch of flaky tests, so it's fine. You just resubmitted in an hour or something, right? Which is not great.

But now when we're being presented with dozens or hundreds of lines of code or recommendations that, you know, really are non-deterministic, whether or not we can trust something is top of mind because it will affect my minute, my hour, my day, my next weeks.

You know, if I accepted something that wasn't quite right, I have to go rewrite the whole thing. I have to refactor everything.

So I think that is another area that's surfacing even more than I think most people know.

Researchers are going to have a good debate about this 'cause we've always known it's important.

Rachel: But it will mess up the lovely acronym.

Nicole: I know.

Rachel: SPACE-T? T-SPACE? That's a perfect lead into my next question.

What risks do you see in the widespread adoption of gen AI and how might we mitigate those risks?

Nicole: Oh gosh, I think this could be its own discussion.

Rachel: Yeah.

Nicole: In part because, you know, this is such a big conversation right now among folks and what are these risks?

And I think we could break these down a few ways. We can break these down, we can think about them at the detailed level and we can think about them at the higher level.

So at a detailed level, you know, is AI introducing quality problems, right? Do we see it missing security incidents?

Do we see it missing patterns that it just hasn't been trained on, right? And then, you know, the risks then also fall on through the rest of the SDLC, right?

How do people review this code? Do they trust it as much? If we see a similar pattern to, you know, writeprints, can people just know if it's different?

And then will they treat it differently? Because we already know that there's bias in the review process, because I mean, there's bias in people.

So I think there's risks kind of throughout that chain. There's another risk: once we get to a point, as we approach a point, where most of our code is written by machines and not by humans, what does that mean from a training standpoint?

How is that changing what our models will in turn recommend and suggest? From a higher level, I think there's this question that we kind of alluded to earlier.

There could be a risk that we lose some of the feel for our systems. What it means to be a senior engineer could change drastically. And we may have an increasingly small number of people who really, truly understand how these systems work.

I mentioned I worked in mainframes early in my career 'cause I was going through my first couple years of school as everyone was preparing for Y2K. And so people were like, "How do you know mainframes?"

I'm like, 'cause they were teaching everyone mainframes because we thought the computer world was going to end. But now if you look back, there are almost no COBOL programmers or AS/400 programmers.

Rachel: But the big iron's still out there. It still needs to be maintained.

Nicole: It's still there. I occasionally will get an outreach from a recruiter asking me to come do some AS/400 development. And I'm like, "Ah, no."

Rachel: It's running the general ledgers of our banks.

Nicole: It really is. I'm like, "It's been years." And they're like, "Everyone else is retired, like gone, gone retired or passed." We don't have anyone who knows these systems.

Now, luckily I think this is a unique opportunity for gen AI to come in because it can help transform some of those systems or build connectors to systems that like we don't feel comfortable changing.

We can ask it, right? It's very, very good at kind of projection thinking in some cases.

But if we kind of make that corollary, what happens if 20 to 30 years down the road, these are so integrated and that's all we use and we lose our mental model for the system?

I always make this, like, hand gesture, like the feel: when I was coding a lot, I knew what the system felt like. I could go draw it on a whiteboard, and it wouldn't be documentation-style, textbook-style detailed, but it would be enough that I could point to things, and I could walk people through it, and I knew what the system was doing.

That's also a risk.

Rachel: Yeah, I love to think of software as archeology. We just like build on top of the previous generations.

But that creates the need for archeologists who are able to derive floor plans from like a few stones and a few lines.

Nicole: Yeah. Absolutely.

Rachel: What are some of your favorite sources for learning about AI?

Nicole: You know, I think one of the exciting things is that things are just moving and changing so quickly.

And one of the challenging things is that things are moving and changing so quickly.

Frankly, sometimes it's Google or Reddit or Hacker News or something like that.

I try to follow, you know, a handful of folks and see what they're posting about, see what they're talking about. I try to have some alerts up.

And I will say, personally I like looking at this from both an engineering standpoint and PM standpoint, because I've gotten some really great insights and ideas from PMs who are very technical, but they think about things a little more holistically sometimes.

Or holistically from a product point of view, where an engineer will think about things holistically from a backend system complexity point of view.

Rachel: Yep.

Nicole: And I like having both of those perspectives to help me kind of shape where I focus some of my time digging in, because again, there's so much out there. I run out of time.

Rachel: Yeah, product is kind of the new sysadmin, in the sense that these are people who are completely essential and somewhat undervalued by the industry at large. I'm a big champion of, like, give product folks better analytics and give them more respect.

Nicole: Absolutely. I mean, a great PM is worth their weight in gold, right? Because they're helping you think about the strategy of a product.

They're actively looking for risks, competitive threats, whether it's like competition or just emerging tech or just the system breaking.

And, you know, I think it can be challenging, 'cause a PM who's not doing the role that I would think of as PMing can be challenging to work through, right?

'Cause now you have more meetings, but a really, really good PM is just magical.

Rachel: Yeah, they're catalytic in the organization.

Nicole: Yes.

Rachel: Nicole, I'm going to make you god emperor of the solar system.

For the next five years everything goes exactly the way that you say it should.

What does the world look like in five years?

Nicole: You know, I would love for, you know, generally things to be a little less chaotic, you know, more communicative.

From a technology standpoint, I'm really excited to see what happens because, you know, some of the things that were just unheard of just a couple years ago, we can now accomplish in a matter of weeks and months.

And so I would love to see a world where technology is not a barrier for most people.

Where we continue to unlock not only the ability to do more complex things, but also the ability to do them with less, right?

How can we make some of this work sustainable? Sustainable from a human standpoint, but also sustainable from a power and environmental standpoint.

Rachel: Yeah, it does feel like the early days of the web. I mean, like you, I came in because I knew how to hit view source and understand HTML.

And that formed the bridge from an education in English literature to a career in tech.

It does feel like that kind of moment where suddenly people from all backgrounds have an opportunity to get a foothold and build something.

That's what's got me really excited about AI.

Last question, my favorite question. A "Generationship" is a star ship that goes on journeys longer than a human life so that multiple generations grow up inside the ship.

If you had such a ship to the stars, what would you name it?

Nicole: I would maybe call it something like Bridge, right? Which is incredibly simple.

It's easy to translate across languages, right? Because hopefully there will be a lot of different cultures, and languages and people involved.

And I think that's one of the things that would be exciting is it could be a bridge to, not just a bridging connection to another planet, another solar system, another, you know, far away place, but also bridging people, right?

Bridging ideas, bridging cultures.

Rachel: I love that. Nicole, what a joy to finally have you on the show. Thank you so much for your time.

Nicole: Thank you so much for having me.