OCT 20, 2021

76 MIN

Ep. #8, Defining the Data Scientist with Josh Wills of WeaveGrid

Guests: Josh Wills

about the episode

In episode 8 of The Right Track, Stef speaks with Josh Wills of WeaveGrid. They address common misconceptions about data and product analytics, tips for hiring data product managers, and many invaluable lessons Josh has learned from his time at companies like Slack, Google, and Cloudera.

about the guests

Josh Wills is a software engineer at WeaveGrid with a wealth of past experience. Some of his previous roles include Director of Data Engineering at Slack, Director of Data Science at Cloudera, and staff software engineer at Google.

transcript

Stefania Olafsdottir: Hello, Josh, welcome to The Right Track.

Josh Wills: Hi, Stef. Thanks for having me.

Stefania: So honored to have you. Been following you for a while and it's ... You always have good juicy hot takes.

One of my favorites is a pinned tweet on your Twitter profile, which is your thought from 2012 actually, on the definition of a data scientist which I relate heavily to which is great. We'll probably touch on that later.

I'll ask you to define that then and now.

I think it's a long-lived definition.

Josh: Yes.

Stefania: Josh, the prophet, could you kick us off by telling us a little bit about who you are, what you do, and how you got there?

Josh: Sure. My name is Josh Wills.

Right now I am a software engineer at a company called WeaveGrid, and we do managed charging of electric vehicles for utilities.

Like a lot of people, I have been feeling, for a few years, that I should do something to work on climate change.

And as a sort of historically a data person, it was tricky to figure out what exactly I could do besides making coffee drinks for scientists, building batteries, or something like that.

But that's what I try to do at WeaveGrid now.

Before that, I used to work at Slack, so I was at Slack for about four years, and I was Slack's first director of data engineering and helped build the team, and a lot of Slack's very early 2015 era data infrastructure.

Before that, I was at Cloudera where I was director of data science, and talked a lot about--

This was sort of, was at Cloudera from 2011 to 2015, going through the whole big data science hype cycle. Got to live that whole thing.

Went around and mostly talked to people and told them how to use Hadoop and how to do sort of, again, early pre-deep learning machine learning and stuff like that.

And before I was at Cloudera, I did four years at Google. I was hired actually as a statistician.

My master's degree is in operations research, and Google hired me to be a statistician to analyze their auction system which is what I studied at grad school.

But as soon as I got to Google, all I did was write software.

I never would have made it past the Google software engineering interview, but as soon as I was in the door there was nothing stopping me from writing software all the time so I did and it was great.

And so I wrote a lot of software at Google, a lot of it for ads.

I became very much enamored with Google's experimentation framework and experimentation culture, and so I wrote the Java version of Google's experiment library.

It's still used today, which is crazy to me.

I once asked a Google friend-- There are still six to-dos with my name next to them in Google's codebase.

So I've been gone for 10 years but I think if I go back that's the first six things I'll have to do.

I've never ... I should say, there might be Google recruiters listening to this, I'm never going back. Never, ever, ever. Stop asking, leave me alone.

And that's me. I've been doing data stuff since I graduated college in 2001 and I had a five-gigabyte database. That was huge in 2001.

Stefania: That was pretty big.

Josh: It was a big deal. I had a lot of test data back from doing-- Testing microprocessors at IBM. That's me.

I don't know. I've done some things.

And then yes, the tweet and I've given talks and stuff like that.

Stefania: You prophesied the definition of a data scientist.

Josh: Yes.

Stefania: This is still a hot, hot, hot topic. What defines--

Josh: Is it?

Stefania: I think so.

Josh: It's funny.

I mean, I wrote the definition because at that time, in 2012 or whenever that was, there was this ongoing debate of "data scientist" not being a real thing.

Haven't all scientists used data? "Data scientist," it's a bunch of nonsense.

What does that even mean? Or what is it? A data scientist is a statistician who lives in San Francisco. I think that was the other definition that was popular at the time.

Stefania: That's good enough.

Josh: Which was true. It was probably ... At the time, it was honestly probably a better definition.

Stefania: For listeners-

Josh: Yes.

Stefania: Let's read the definition. I don't know if you have it top of mind. Would you like me to read it?

Josh: Oh, God. Would you read it for me?

I would say-- I try not to think about that thing so much anymore.

Stefania: So data scientist, a noun. A person who is better at statistics than any software engineer, and better at software engineering than any statistician.

Josh: That's right. That was in the 140 character days.

So I put that "n." thing there to say, "I am defining something like a dictionary."

It was only-- I only had 140 characters and I think I used just about all of them.

That is a reference to a writer who was part of the Algonquin Round Table in New York in the '20s, '30s, '40s, whatever name.

I think it was-- His name is A.J. Liebling I want to say.

And he said, "As a writer, he was faster than anyone better and better than anyone faster."

Stefania: Interesting.

Josh: Yes.

Stefania: It'll take some digesting to take that in.

Josh: Yes, indeed. Indeed. And that was sort of how I thought of what it was to be a data scientist.

It was like you are better at statistics than software engineers are, but you are better at software engineering than statisticians are.

Stefania: As an input into that, I kicked my data science career off before it was called data science.

When I joined a genetics company and I was hired as a statistician, and I spent the majority of my time writing software.

Josh: Yes, been there.

Stefania: But would have never qualified for a software engineering interview.

Josh: Exactly.

Stefania: So I heavily relate to this definition.

Josh: Very much so.

Stefania: However, also later when I was running data science at a company called QuizUp, it was something that I put a lot of effort into.

Asked everyone to read the book Clean Code, and things like that so we would at least be trying to build best practices after software engineering despite having no experience or knowledge of the subject.

Josh: Exactly. Google made me a software engineer like I said.

I'm actually reasonably confident I still wouldn't pass a Google software engineering interview, but it was an amazing education.

And I was-- I studied math, and statistics, and optimization, and stuff in school so I did very little bit of programming here and there but didn't really know what it was to be a software engineer until I got there.

Stefania: A little-known fact about mathematics is how little calculations you actually do.

Josh: Very little.

Stefania: When you're studying mathematics.

Josh: I say, rarely ever interacted with a number, to be honest with you.

Stefania: Exactly.

Josh: I didn't see numbers ever, at least not the math I was doing.

Stefania: People, please stop asking us to calculate things quickly in our mind.

Josh: I know. Or, calculating the tip at a restaurant or whatever.

It's like I don't know, just get a calculator leave me alone.

Stefania: Exactly.

Josh: If you have a differentiable manifold or something, I can help you out.

Stefania: Exactly. Is this a group?

Josh: Yes, precisely. Exactly.

Stefania: Awesome. So thank you for that intro, Josh.

Josh: Sure.

Stefania: Do you think this definition of a data scientist still holds up?

Josh: I do. I think it's still popular and still cited because again, I don't know that anyone's done a better job of expressing the idea in fewer characters I guess. It's pithy.

Stefania: A little bird whispered to me that this has actually also been cited in official research papers.

Josh: I know. It is actually cited in official research papers.

Like most people, obviously, I Google myself all the time, and mostly my Google scholar citations and stuff so I do track these things for this.

When I was at Cloudera, my colleagues and I wrote a book about doing advanced analytics with Spark and that's-- It still beats the quote for citations, but pound for pound, whatever citations per character it's tough to beat.

Stefania: So I agree. I think this holds up. Another follow-up question off that.

Do you think this is a helpful title for people? For job descriptions?

Is it helpful for people to understand what your job is going to be about if someone is advertising for a data scientist?

Josh: I think it depends on-- Is it helpful for the job seeker?

It's helpful in so far as if they want to identify as a data scientist to the extent that they--

A lot of people's self-identity is wrapped up in their jobs and they want to be a data scientist then I think it's incredibly helpful for the people who are trying to hire people.

Whether the title is actually descriptive of what your day-to-day life is going to be like if you have that title across a bunch of different companies?

Well, probably not so much.

I think talking more about the modern data stack, right, the new data scientist title is analytics engineer, and that actually is useful because if you're an analytics engineer, well, you're going to be building models in DBT, that's what you're going to be doing.

It's a title everyone wants and is actually descriptive of what you do. Data scientist, title a lot of people want, descriptive probably not.

Stefania: No. I think one of the challenges is there are a lot of people that want to become a data scientist and then they join a job and the expectation versus reality of what you-

Josh: It's a blow. It's a blow. Exactly. It was better.

It was better for us, Stef, back in the day before the title existed, when it would just-- It was, listen, I'm going to be a statistician but I'm really just going to write code all the time.

That's what I'm actually going to do, right.

Stefania: I think also maybe another way of how that has changed over time is, back in the day when we were starting off, I think the shaping of what companies were actually looking for in a role like that were--

They were just so short along if you could say that. And so we had the opportunity to just shape it.

Josh: Absolutely. I remember I actually had this happen in my-- I first interviewed for Google.

We should have a whole separate podcast, Stef, of my various terrible career decisions.

I first interviewed at Google in 2005, and I interviewed with the woman who eventually became my boss, who's Diane Tang, who's a Google fellow and just the--

Hands-down best boss and this bad-ass engineer I think I've ever worked for.

And she interviewed me back in 2005, and she's like "I don't ... I'm looking at your resume, and are you a software engineer or a statistician?"

And I was like, "Well, why can't I do both? I don't, I'm not sure, I'd like to do both. Can I do both?"

And she was like "Sure. You can do both. Okay, come work at Google."

And I was like 2005, Google stock price is, I think it's $150 a share, that's way over-priced.

It's way, and search who's going to need that anyway? Passed on that job offer.

And she was kind and then took me back a couple of years later after I realized my mistake. But anyway.

Stefania: And it wasn't too late.

Josh: That's right. It turned out to be fine.

But I think the first title for the equivalent at Google was-- It was decision engineer or something like that, which may be...

Which in a different world maybe that would've been the data scientist title.

It could've been decision engineer.

That's actually ... That's not terrible, right.

Stefania: I agree. So I want to ask you the classic question that I like to ask all of my guests.

Josh: Sure.

Stefania: Could you tell us an inspiring data story and a frustrating data story?

Josh: I think back to my younger, this is all the old-timey "back in my day" podcast.

I remember being first exposed to experiment-driven development at Google.

Where places I'd worked before, smaller companies, we did analysis.

We would analyze data and do simulations or projections or whatever and we would try to figure out what was going to happen and try to figure out is this idea any good?

Should we invest in this or not or whatever?

And then when I got to Google, I started doing that for the ad auction that I worked on. I was like okay, I'm going to run this simulator, I'm going to replay millions, billions, trillions of auctions and under different rules, and see what we think would've happened, and make all these assumptions and stuff like that.

And I'm doing this work. And then this--

One of the senior, more senior leaders of my team takes me aside and is like "You could spend all these weeks and thousands, millions of computer hours running simulations, or we could just implement the idea and then just run it as an experiment on a million users or whatever and just see what happens. And we have all this machinery built to automatically analyze the experiment, and compute confidence intervals, and calculate all these metrics, and all this stuff, and you can just do that."

At Google, the rule at the time was, if you could convince one other engineer that your idea was worth trying, that was all that was required to run an experiment on a million people and that was it.
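The kind of automated analysis described here, comparing a treatment group to a control group and reporting a confidence interval, can be sketched roughly as follows. This is a minimal illustration using a normal-approximation interval for the difference between two conversion rates; the function name, the user counts, and the conversion numbers are all hypothetical and are not taken from Google's experiment framework.

```python
import math
from statistics import NormalDist


def diff_in_rates_ci(control_conversions, control_users,
                     treatment_conversions, treatment_users,
                     confidence=0.95):
    """Return (difference, lower, upper) for treatment rate minus control rate.

    Uses the normal (Wald) approximation, which is reasonable when both
    groups are large, as in an experiment on a million users.
    """
    p_control = control_conversions / control_users
    p_treatment = treatment_conversions / treatment_users
    diff = p_treatment - p_control

    # Standard error of the difference between two independent proportions.
    std_err = math.sqrt(
        p_control * (1 - p_control) / control_users
        + p_treatment * (1 - p_treatment) / treatment_users
    )

    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return diff, diff - z * std_err, diff + z * std_err


# Hypothetical experiment: one million users in each arm.
diff, low, high = diff_in_rates_ci(50_000, 1_000_000, 50_900, 1_000_000)
print(f"lift: {diff:.4%}, 95% CI: [{low:.4%}, {high:.4%}]")
```

If the whole interval sits above zero, as it does with these made-up numbers, the change looks like a real improvement rather than noise. Because everyone's experiment goes through the same calculation, no one's idea gets special treatment.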

And it was amazing, Stef. It was honestly, it was an absolutely amazingly wonderful way to work.

I loved it so much. It's cheesy to say, but the democracy aspect of it was delightful.

And so many-- Obviously, there were tons of terrible ideas that went nowhere, right, but so many great ideas that no one would've thought of or considered or whatever, right, but just try it and see, and just find out.

And everyone is equal. No one's analysis gets a better.

The whole system is automated, it's there for everyone. It was great.

Stefania: The data speaks for itself.

Josh: It was great.

I think whenever you are working in environments where data basically directly turns into money, and that could be fraud, it could be, obviously, ad tech, a lot of recommendations stuff, any of these things where it's just really obvious connections, that's how you work.

It's not about politics. It's not about blah, blah, blah. It's just about, does it work?

This is how we measure it. We do it the same for everybody. That's it. That's the rule. And I loved it. It was great.

And I wanted more than anything I think after doing it for a few years, I wanted everyone to have experience with working like that.

I wanted it to be that it would be that way everywhere. So that's the inspiring story.

The disappointing story is the rest of my career where I regularly failed to actually do that.

Really anywhere else I've been.

Stefania: Oh, my goodness.

Josh: I try.

Stefania: What you said about the experimentation, it's a really beautiful framing actually.

And it's a notoriously difficult thing to plan and estimate software delivery.

Josh: Yes, absolutely.

Stefania: And I gave that some thought a while back and, obviously, continuously it's a thing that you notoriously always think about. Why is this so difficult?

Josh: Especially if you're like the CEO of a company, it's got to be especially frustrating.

Stefania: So I started comparing it a little bit with the process of building a dam, for example.

Josh: A dam. Okay.

Stefania: Or some sort of a huge operation where you-- There's no undo button.

It's just you build it and it fails or succeeds. It works or it just breaks and it's disastrous.

And so that's why you don't build dams before a 15-year planning process.

But this is in fact the beauty of building software. There are undo buttons.

And, of course, if you make the wrong bets it might cost you things.

It might cost you revenue, and people getting frustrated and things like that, or your customers get frustrated.

It has cost but it doesn't have this extreme cost of no undo button.

Josh: That's true.

Stefania: And I think the experimentation mindset is sort of also a symptom of this opportunity that we have in software.

Josh: Yes.

Stefania: We do this because we can.

Josh: Yes. And I really got to live that. I once personally cost Google about $2 million with a bad experiment I launched.

It essentially disabled the ad system for a few hours and I'll never forget. The VP, whatever, took me aside and was like, "What lesson did you learn from the $2 million we just spent to teach you?"

Right. Whatever. It was great.

I cost Google tens of millions of dollars in the process of making them hundreds of millions of dollars, and it was just part of the culture and it was fantastic.

It was just so, I was just, anyway. It was so freeing it was so fun.

Back in the day. Oh, well.

Stefania: Back in the day.

Josh: And it's been a long time since then, yes.

Stefania: All right. So do you have any frustrating data stories top of mind, specifics?

Josh: So many frustrating data stories. I mean, where to begin.

So many frustrating data stories. Let's see.

I sort of did this stuff at Google for a while and then I left Google when Google was getting into like Google+, and social, and stuff like that, and relentlessly chasing Facebook.

And this guy I knew named Jeff Hammerbacher, who was, as you know--

To tie this back to the first thing is, the person generally attributed with coining the term data scientist was working at a company called Cloudera, and he and I were friends, and we would go to baseball games together, and he was--

In the way that all relationships in San Francisco are somewhat transactional, we're always trying to hire somebody or invest in somebody or whatever, he just asked me point-blank, "Do you actually care if Google or Facebook is the dominant social networking company in the world? Does that actually matter to you in any way?" And I was like, "No, actually, come to think of it, I don't. I don't actually care."

So I went to Cloudera to work on something I did care about which was making it possible for everyone in the world to work with data the way that we worked with it at Google.

That was something I cared about. And so I did that for a few years. It's hard.

In fact to have to be doing developer relations stuff, and so you're evangelizing, and you're promoting, and it's great and it's very rewarding and I'm grateful for having done it.

At the same time, I'm flying 125,000 miles a year.

You're always removed from the work.

You're just advising people, you're consulting, but you're not really owning it and stuff like that in any way so it's hard to feel the impact.

And then when my son was born I really just wanted to stop traveling altogether, and so I went to Slack to build ... I think I've said this before.

My goal in joining Slack was to build data infrastructure that Google engineers would be jealous of. That was my goal, right.

Stefania: I like that.

Josh: That was my revenge-driven product development philosophy which is not great for various reasons.

But in the last-- That I wanted to go do this again. I wanted to build stuff.

I was tired of talking about building stuff, I wanted to get back to building stuff.

And it's a tough one for me, Stef. It is. It's a hard one.

I'm very proud of 85% of the decisions I and the team made building Slack data infrastructure.

I made again, a small number of absolutely terrible, horrible decisions that still inflict pain and suffering on everyone at Slack to this day, because, right, I'm not perfect and stuff like that, right.

And I think there's no shortage of these people on Twitter.

You can easily just go ask them and I think that was--

Again, it's actually the genesis of this conversation was a whole thread about logging and mistakes I made at Slack.

Which they'll, obviously, never forgive me for, but that's fine.

But building data infrastructure is not sufficient to create a data culture at the company.

You can build the world's greatest data infrastructure but if it's just the company's not there yet, if it's not time, if you don't have the people in place, if it's not top of mind, whatever, it's just not going to happen and I didn't understand that.

I was very much in the if you build it they will come mentality and it's just not true. It doesn't work that way.

Stefania: The world's worst myth for any founder.

Josh: I mean, it's so true. I mean, very much so. Very much so.

And so the hardest thing for me was I killed myself for a couple of years building this stuff, and was so burned out, and was so drained, and was so tired of fighting, and was just honestly not a great middle manager or manager or anything really.

It was just horrible. Then I left.

And the nice thing was that Keith Adams the Slack chief architect said something to the effect of "Josh is far too good of an engineer to be wasted in management," which was a nice way to say it, that I was not a very good manager but nonetheless.

I think what was hard for me was Slack did eventually become the data-driven company that I had always hoped that it'd become, but it happened after me.

And it's the same thing in startups.

Being too early is the same thing as being wrong and so it's this bittersweet thing for me.

I'm very proud of everything the team did, but will always forever be somewhat sad that I couldn't really be a part of it in some way.

Stefania: To reap the benefits of what you had sort of really got a good foundation for though.

Josh: Indeed, but that's the way it goes. Such is life.

Stefania: I personally relate a little bit to this feeling after having built the data science division at QuizUp.

It was a three-year period and I'm likewise very proud of all of the work that we did.

I definitely did get to see some of the benefits, but the company was acquired and I started working on other things sort of exactly around the moment where it would've just been like a really sweet sailing type of thing.

You're like oh, wow, we've built something so cool and it's going to be so much fun to participate in that.

Thank you for sharing this. This is a really great story.

Josh: Sure.

Stefania: And I think-- And I hope actually it's an inspiration because I think this is a shared experience for many people that are the first employee, first data person, the founding analyst, or a founding data engineer role type of thing-

Josh: Absolutely.

Stefania: At a company that is learning and going through a transition of sort of wanting.

There is someone at the company high-level that wants to be more data-driven but the organization hasn't really learned how to.

Josh: Exactly. I feel like I should just be a therapist for these people full-time.

You know what I mean? It's actually probably-- But I'm not sure I could take that much like psychological pain and stuff on a regular basis.

It's just, we get together with old Slack people sometimes for breakfast and stuff and it's God, it's great seeing you all but oh my God, is it horrible talking about this stuff.

I just don't, I do not enjoy that at all.

Stefania: You're triggering me.

Josh: I am truly triggered. I have blocked this out and that you are making me remember it.

Stefania: We should start a therapy sort of agency type of thing.

Josh: Some safe space for this. Yes, exactly.

Stefania: And this is literally one of the things that drove me to start Avo.

Josh: I can only imagine. I mean, that's-- It seems like no better way to therapeutically work through your pain.

Or you could be like me, Stef.

You could just angel invest in people who are fixing it for you and then you don't even have to think about it.

Stefania: That's coming up next.

Josh: Okay, good. I'm so glad. I'm so glad.

Stefania: And I would literally was--

So I had a post-traumatic stress disorder trigger at a company that I started just after QuizUp.

And then later it was, I think Christmas later that year, I just was talking to all of these people and everyone was just so depressed and I was like I can't have all of these people that I really.

We must be able to fix this.

Josh: I feel you.

You're making me feel grateful about my current job because there's nothing at WeaveGrid where I'm like oh, I must go start a company to fix this.

I don't feel that. Things are, at least from the data research, I feel like things are actually, they're pretty good.

I'm just liking how things are going and it's pretty chill.

And I'm like this is all right. I'm okay with this.

This feels nice. It's very soothing. I'm in a good place. I don't know.

Stefania: We've come a long way.

Josh: We really have. It's only been five years really since it was just God-awful terrible.

I don't know. It's just so much better now.

Stefania: This is actually a really good segue into--

And we've already been touching on this. How the industry has been changing?

And I like to think about this in a fairly short-term perspective, just a two-year type of thing.

But, obviously, we're also talking about a--

More like a seven and 10-year type of thing so would love to hear your perspective on how the industry has changed in the past years.

Josh: It's funny. Right after I left Slack in November 2019, I did the Software Engineering Daily podcast and Jeff asked me the same question.

And it was interesting because I was talking about well, the big news in December 2019 before ... Shortly before the world ended, right, was really Snowflake.

It was like there's this thing called Snowflake and it's super great, and it just works, and it has solved all these problems, and it's awesome and you can build around it, and all this cool stuff happens.

And not for nothing, this was before Snowflake's IPO was mega crazy, whatever, and all the investor interest picked up, and all this stuff.

I was just saying this was things in the data world pretty cool.

Not so bad anymore. And so, obviously, all that happened.

And then over the last year though there, I think it's ...

The really exciting thing is because we have this plumbing now, because we have Snowflake as this center of gravity, we're now starting to build much, much cooler stuff on top of this underlying substrate, which is fantastic and is the dream.

And so really like we were alluding to it, but it's like--

It's really for me, it's we have built this plumbing ingestion.

Again, at Slack, ingestion, compute, visualization, we built all this stuff ourselves.

We built our own version of Mode. We built our own Presto S3 data lake thing because we had to.

There was, whatever. We built our own ingestion system because we had to, right.

All this crazy stuff. You don't even have to do that anymore.

You can just swipe a credit card and Fivetran, Segment, Snowplow, Rudderstack, Meltano, Airbyte, whatever you want. It's done. It's done.

You configure it, off you go Snowflake, you got as much compute capacity as you want.

I tweeted the other day, I loved it.

I was like you can, I mean, just the genius of Snowflake as a business.

Making it possible for anyone who has a question to just throw as much money and compute power as they want at that question, I was like that's unbelievable.

That's a license to print money. It's ingenious, right. Anyway. And then the visualization side of things like Looker, Mode.

Even Tableau, bless their hearts has finally gotten with the program, is building a better product.

It's just-- We've got the end-to-end.

You can just wire this up and it's fairly standard, well understood, problems are largely solved, and now we get to start building on top of that and we get to start--

And again, where I'm super excited about DBT is my proudest angel investment by far, is the core semantic layer where you can buy--

Or, you don't even buy, just use open source DBT packages that implement stuff on top of the Stripe data API that you pull in from Fivetran or like whatever.

This is all just done for you and you-- There's just the community is great.

The data community has honestly always been great, it's always been awesome but it's just like that much better now and stuff.

And so I'm excited to see this emerge and I can't wait to see how this percolates out through ingestion all the way through visualization at the end.

This is the dream. And again, it's-- I think back to when I started doing this stuff in the late '90s when you had to pay Teradata an enormous amount of money for this box that had limited capacity and stuff.

And God, kids these days, I tell you, Stef, they have no idea how good they have it, right.

Stefania: That's right.

Josh: It's just fantastic.

You take these sort of layer underneath you and infrastructure for granted now, and you are just mostly confronting the problems of building the semantic layer and it's hard and stuff.

And we still spend plenty of time on Twitter shit posting about it but nonetheless, we're just in a vastly better place.

And I just ... Oh, it's just so great to see.

Stefania: I totally agree. Obviously, this is super exciting to see all of this, these changes happening.

And particularly what you're saying about we have the plumbing now and now it's time to also start finding ways to make the other schleps of our lives easier which is also exciting.

Josh: Absolutely. It's just absolutely so cool.

Stefania: So on this note, I definitely also want to talk a little bit about how you think data culture is changing and how the industry is changing with organizations and we'll touch on that a little bit later.

Because I think with all this accessibility and making our lives easier, I wonder what you think about probably one of the most common statements you ever hear, I don't trust this data?

And how is this changing with all of this? Why do people not trust data?

Why do people say this? How do we solve it?

Josh: Why do people say that?

I was talking to the head of data over at Convoy a few weeks ago. Chad Sanderson is great.

We were talking about the metrics layer and stuff like that so this is new companies.

And again, part of this whole semantic revolution, building metrics, and stuff like that.

And Chad said, "The consequence of the fact that our data infrastructure and our plumbing is so good is that it is cheaper for me to come up with my own metric or ask my own question directly of the database than it is to invest the time and cognitive energy in understanding your model and your definition of daily active user and your definition of blah, blah, blah, blah right."

It's just too cognitively intensive to do it and so I'm better off just asking my own custom bespoke special Snowflake question.

This is the consequence of the stuff being so good we now have this new, weird problem as a result of it being so good, right. And I think that's what you see here and the data culture stuff is ... This is the problem we're addressing right now is this fragmentation and the executive team at these data-oriented companies being like why are there 16 different definitions of DAU for every part of the company?

What the hell is going on here from a data culture perspective?
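A small sketch can make the fragmentation concrete. Assume a hypothetical event log and two teams that each write their own reasonable-sounding query for daily active users. The event names, user IDs, and both definitions below are invented for illustration and do not describe how Slack or Convoy define the metric.

```python
from datetime import date

# Hypothetical raw event log: one row per user action.
EVENTS = [
    {"user": "u1", "day": date(2021, 10, 20), "action": "message_sent"},
    {"user": "u1", "day": date(2021, 10, 20), "action": "app_opened"},
    {"user": "u2", "day": date(2021, 10, 20), "action": "app_opened"},
    {"user": "u3", "day": date(2021, 10, 20), "action": "message_sent"},
    {"user": "u4", "day": date(2021, 10, 20), "action": "app_opened"},
    {"user": "u5", "day": date(2021, 10, 19), "action": "message_sent"},
]


def dau_any_event(events, day):
    """Definition A: a user is active if they produced any event that day."""
    return len({e["user"] for e in events if e["day"] == day})


def dau_sent_message(events, day):
    """Definition B: a user is active only if they sent a message that day."""
    return len({
        e["user"]
        for e in events
        if e["day"] == day and e["action"] == "message_sent"
    })


day = date(2021, 10, 20)
print("DAU (any event):   ", dau_any_event(EVENTS, day))
print("DAU (sent message):", dau_sent_message(EVENTS, day))
```

Both queries run against the same underlying data and both are defensible, yet they report different numbers for the same day. Multiply that by every team that finds it cheaper to write its own query than to learn someone else's model, and the result is the pile of conflicting DAU definitions described above.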

If you're talking about companies that have already--

That have a data culture and that do stuff around data, that I think is the most interesting problem right now and most fascinating consequences of this, and it's just neat to see the ways in which we're--

Folks are approaching, and solving it, and stuff.

If you're talking about the genesis of data culture, of how does a company transition from the early stage, vision-driven, just crank on the feature to the Google, Facebook end state where everything is an experiment and we experiment on everything all the time, right, that's a fun one and that's a whole sort of separate thing.

Stefania: I totally agree. A huge reason why people don't trust the data is that they don't have transparency into what the data actually means.

Josh: And all the different layers of where did it come from and what transformations have been applied to it?

And who did that? And exactly. All of that is invisible these days. I don't know.

Stefania: And I think that you've hit a really interesting point there. Or Chad-

Josh: Chad, yes.

Stefania: Hit a really interesting point there-

Josh: He did.

Stefania: Saying that it's easier for us now to just try to go to the premise, the prerequisites of the world, and what can I build on top of that to answer my question?

But it causes another problem which is now I am seeing a bit of conflicting results for where we stand with our go-to-market or something like that.

Josh: That's right.

Stefania: It's really interesting.

I would love to hear that perspective of data trust from you as well because I know you built a lot of infrastructure at Slack around the logging.

Josh: Yes, we did.

Stefania: And there are so many different teams that work on that, obviously, because every product team needs to think about what they implement for their product and it's quite difficult to get that unified across everything.

Can you maybe talk a little bit about that as well?

Josh: Well, it's funny.

When you asked me the question I-- We were just talking about modern data semantic stuff and so I immediately went to this.

Even if you get to the point where you have the underlying data in the same place, maybe I don't trust the cognitive infrastructure that's been built on top of it, that's the part I don't trust.

Then there's the problem of I don't actually trust the data you generated.

I don't trust the source of this or the provenance so I have no idea where this came from.

I didn't do this. And I want to tie this around to the companies making the journey to getting started with data, right.

The two places and the only two places I think that it really makes sense to get started with data are growth and performance.

These are real, this is the core-- Any company you have anywhere ever has growth problems, they need to drive revenue.

And they have performance problems because computers are terrible, software engineers are bad at their jobs, all that stuff, right.

And then also if you're lucky sometimes you have a machine learning problem and then you have a machine learning team, right.

And for the performance engineers and the machine learning engineers, I think the sort of data ...

I don't trust this data problem is easier if only because almost always the person who generates the data is the person who consumes the data.

The person who wants to understand the performance problem or build the machine learning model does the logging and the instrumentation to get the data they need to understand the performance problem, right.

And they understand the problem, they understand how the data was generated, they know what they want to do with it.

If it doesn't conform to their expectations they have again, a good prior.

All this great stuff. So they have all this sort of-- Again, I need to think of a catchy data science definition for this sort of cognitive superstructure I keep talking about.

That it clearly deserves some pithy little whatever.

So that's great and that solves all of these problems of trusting the data.

The growth team though it's probably not necessarily the case.

The person doing the instrumentation, doing the front end logging, doing all this stuff is probably not the data scientist who is going to be analyzing all this stuff on the back end.

And so you have this crazy disconnect between these two worlds, and inevitably--

And it's just a bad situation I think generally when the person who generates the data is not the person who consumes it or vice versa.

And that's really where you need a whole process and system to manage this relationship, be very explicit about things, make it easy to visually confirm that the log event is in fact the log event that I thought it was, and all that stuff, right.

So in performance and growth-- Performance slash machine learning you have one data culture where I don't think data trust is that big of a problem.

And then in growth, which is arguably more important than performance really for almost all businesses, you have the problem and to the nth degree massively, especially when you're talking about product-led growth these days.

Your whole growth strategy is really around instrumentation as a product. It's a massive thing.

And without some structure, process, tooling it's just basically impossible. It just can't be done.

Stefania: Incredible. So we're talking about this in the context of data trust, but it's also just such a fundamental piece in how different teams view data.

Josh: Yes.

Stefania: So when your job description is a software developer and you build products, and then you typically also work--

Or at least you touch base a little bit with the growth team or a product manager whose job is also to be a data consumer.

And here you're framing fundamentally how important it is for data producers and data consumers to work closely together.

Josh: Yes.

Stefania: I remember you talked a little bit about in a previous conversation we had, about sort of the different teams at Slack, for example, and how this worked for them.

Josh: Yes.

Stefania: Can you maybe talk a little bit about how did these teams who are the data consumers and the data producers, did not maybe share a room or a kanban board or a goal?

Josh: Right. Literally a head. Literally. Ideally they share a brain.

Stefania: Exactly.

Josh: The consumer and the producer were exactly the same person. This was the ideal.

Stefania: Exactly. And that's not going to be realistic for so many situations, for them to literally share a head.

But how can we solve this particular part of the problem for the teams that don't share a head between the data consumer and the data producer?

Josh: It's a great question. I think there's two sides to it.

There's the process side of it and then there's the incentive side of it.

And I think one of the unfortunate consequences of my stint in engineering management is I'm always thinking about incentives and promotion processes for engineers and stuff as I think about things because I spent a lot of time thinking about this for growth teams, right.

So process-wise, you really can, I think it's a fair point that as I think about JIRA or whatever, it doesn't have a great way I think of representing the logging task correctly.

You know what I mean?

Maybe there's a way of doing this and I'm just not good enough at JIRA.

But at Slack, we just have a spreadsheet, right, and it's like we're developing this feature within the logging--

Evolvable logging framework that I developed at Slack.

The clogs and the slogs and all the unfortunately named Thrift schemas I had.

There would just be a list of these are the events we're going to create, these are the fields we need for each of the events, this is where they have to happen in the code, and it's basically a checklist and we have--

We develop, again, it's like a PR, a ticket like anything else, you develop the feature, you do the logging for the feature in line for it, and then we test it.

Have dev tools to help support it, logging-- Testing to make sure the logs are coming in correctly. Blah, blah, blah, blah, blah.
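The checklist Josh describes (events, the fields each event needs, and a test that the logs arrive correctly) can be sketched in a few lines. This is only an illustration in Python, not Slack's actual framework, which he says was built on Thrift schemas; the event names, field names, and the `validate_event` helper are all hypothetical.

```python
# Hypothetical event checklist: event name -> required fields and their types.
# None of these names come from Slack; they are placeholders for illustration.
EVENT_SCHEMAS = {
    "channel_created": {"user_id": int, "team_id": int, "channel_name": str},
    "message_sent": {"user_id": int, "team_id": int, "channel_id": int},
}


def validate_event(name, payload):
    """Return a list of problems found in one log record (empty if it is clean)."""
    schema = EVENT_SCHEMAS.get(name)
    if schema is None:
        return [f"unknown event: {name}"]

    problems = []
    # Every field on the checklist must be present and have the expected type.
    for field, expected_type in schema.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected_type):
            problems.append(
                f"wrong type for {field}: expected {expected_type.__name__}"
            )
    # Fields that are not on the checklist are usually typos or casing drift.
    for field in payload:
        if field not in schema:
            problems.append(f"unexpected field: {field}")
    return problems
```

Run against a record where someone wrote `userId` instead of `user_id`, a check like this reports both the missing field and the unexpected one before the event ships, rather than after the data is already in the warehouse.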

And then downstream of that we start building it. And I think the growth team at Slack developed this process through the painful process of not having this process.

And one actually situation I remember was absolutely disastrous was with the-- And Stef, I know you'll appreciate this.

When you're at a high-growth company, every I don't know three to six months there's some crisis, and so your time there is largely defined by crises.

And so at Slack, at one point, there was the mobile app crisis.

I don't really know how to describe it. It was a fairly classic thing.

They did some engineering reorganizations where they broke up the dedicated mobile development teams and they embedded them in all the feature teams that did web stuff as well.

This is like a very common idiom.

And the consequence of this was that the core mobile code base basically went to shit because no one was responsible for maintaining it anymore.

And then after a period of months, the whole app was just an absolute disaster and so they declared a fire drill.

And our mobile logging stuff, for reasons that are whatever, organizational, political, engineering, whatever, did not have our good schematized logging.

It was just JSON lawlessness basically.

And so it was this absolute frantic disaster to instrument this thing as fast as possible and get the wall-- Get the app out there and stuff.

And, oh crap, we put a typo in here, and oh no, we camel-cased it here versus here. No. What have we done?

And it was just the worst possible situation because it's like the data scientists, the data engineers were working on the highest priority, total visibility in the company, and the data is just terrible.

And you have the problem of well, once the app's out there it's out there forever.

We need to live with these logs for basically eternity, right. It was just horrible.

And I think it's the kind of thing Stef, where there's not really rewards for solving these problems ahead of time, but when you come after the disaster, and after you see the terrible, and you come up with this is the process, this is the system, and now we have schemas, and there are types, and everything's going to be fine, and we're going to test it all, that's the hero.

That's the true. Anyway. The tooling was okay, it wasn't great.

We didn't have-- We had developer experience engineers but we didn't have one dedicated to making the logging system better for the growth engineers.

And so you can have all the process you want but bringing this back to the incentives, doing the logging is not super fun for the front-end engineers.

This is not like a-- This does not spark joy.

They're not getting promoted because they did more logging better and stuff like that. Really you get promoted for developing good tooling or making stuff super cool or doing like neat front end stuff, and sadly like doing the logging is not that.

And so it's ... Again, it's ... We have to have a process and we need a process and the data is important and all that stuff, but at the same time it's hard to make it ever really as good as the pure joy and happiness of the machine learning engineer or the performance engineer who's in their loop doing their thing, thinking about how they're going to analyze this data as they're creating it.

It's the best. It's the best, Stef, it's tough to beat. And I'm asking you, how do we solve this?

How do we spark joy? How do we make this fun?

Stefania: That's a really great question.

So one of my favorite things to see when I'm working with organizations is the transition from developers seeing analytics as a task to be done that has nothing to do with the end user, into seeing it as one of the fundamental tools that they have so that they can build better products for the customers that they are serving.

Josh: Absolutely.

Stefania: And this I think has a lot to do with creating a loop where they are part of product decision making and product strategy in some way, and they see the impact that they have made by ensuring that we have proper understanding of how customers are using the things that they built-

Josh: That's right.

Stefania: And they see the impact of that applied to how do we decide on our next steps?

Josh: And that's a leap for a company too.

I mean, that's a leap for a company going from a what do you mean?

Or, we don't have OKRs, we don't have metrics, we just like ship features.

That's what we do. And that's what you're talking about.

It's not just about, I mean, I agree with you.

That is a leap, but again, I lived through that at Slack and it was unfun, it was a bummer.

I mean, the engineers again, like the certainty of I will get promoted if I ship features, I don't actually care about the impact of the features.

I'm going to get promoted if I ship features and stuff.

And making that transition to no, you get promoted for solving problems not for shipping features, right.

Stefania: Exactly.

Josh: Those two are not one and the same, that's a fun one.

No amount of money you could pay me to go through that again.

Only fun when I think about that.

Stefania: It's interesting.

One of my favorite metrics is literally for seeing the cultural impact or just the team impact of a transition to data culture.

It's literally just the conversion rate of developers looking at charts or asking data questions as a follow-up of them building something.

Josh: Totally. 100%.

Stefania: And it's really interesting.

I think one of the fundamental mistakes I often see organizations do is they try to do this as an organizational change for the entire organization as opposed to try to start.

And like you're saying, try to-

Josh: As a team.

Stefania: Exactly. They try to build-- Now we have an OKR so we have to measure all of the OKRs versus just hey, we just--

We have a bet in mind and we want to just check that out, and then you get a full closed-loop and a case study for the rest of the organization for three or four people and they're happy.

Josh: Totally. I give Slack credit for that.

They did roll out OKRs team by team and it was an evolution and stuff, but it was also a thing that was told that was coming and it was met by some degree of dread I think for reasons again, I understand but disappointing.

I don't know. I've worked with a lot of companies when I was at Cloudera and it was always fascinating to me to see in whatever industry I worked in, what do people actually care about?

And this is where-- I don't know if I can say this but whatever.

The non-profit industry is in many ways the absolute worst about this.

Where for a lot of nonprofits when I talk to them, it's very clear to me that they don't care about measuring the impact of what it is they're trying to change, they care about making the impact look good so they can get more donor money.

That's what they actually care about. And it's very disheartening to find that in a lot of places.

A lot of industries where the incentives are not at all aligned with actually changing the problem, and that's the same thing we're talking about here.

It's when you're talking about shipping features, you're basically assuming that these features are going to solve the problems the features are intended to solve and that someone else has done the process and is an all-knowing oracle who can just predict ahead of time that yes, if we ship this feature it will solve all of our problems and stuff and, obviously--

And that's just not true for anyone ever under any circumstances, right.

And so I just-- I've never understood the appeal of doing something and not measuring the impact of doing it.

And maybe this just makes me too much of an engineer. I don't know.

Why would you do anything if you weren't concerned about measuring the impact of it? What is the point?

Stefania: Exactly.

Josh: What are you doing? Well, I'm going to ship the next feature that we need to solve the next problem.

But how do you know ... Because they said it would that's-

Stefania: Exactly. This is a really good point about ... And I want to tie this together with the experimentation-

Josh: Yes.

Stefania: The evolution into being an experimentation organization.

Josh: Yes.

Stefania: And I think trying to close the loop on-- Not even with an experimentation, just some simple way of measuring.

Did we see the ... Without too much over-engineering for the first iteration.

Don't adopt an A/B testing tool for your first thing that you want to measure.

Josh: Of course not.

Stefania: Did someone convert to doing the thing that we wanted them to do?

And do the people who do convert to the thing that we just built, do they maybe do something else now in the product as well?

And just be a little bit retroactive in that.

And I think that is a really good birth into being an organization that wants to make-- Or sort of be a little bit more experiment-driven.
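The two retroactive questions Stef raises (did people convert to the new thing, and do the converters go on to do something else) need nothing more than a pass over an event log. The following is a minimal Python sketch under simple assumptions: events are `(user_id, event_name)` pairs, the event names are hypothetical, and ordering in time is ignored.

```python
def conversion_summary(events, target_event, follow_up_event):
    """Summarize adoption of a feature from (user_id, event_name) pairs.

    Assumes target_event and follow_up_event are different event names
    and does not check that the follow-up happened after the conversion.
    """
    all_users = set()
    converted = set()
    followed_up = set()
    for user_id, event_name in events:
        all_users.add(user_id)
        if event_name == target_event:
            converted.add(user_id)
        elif event_name == follow_up_event:
            followed_up.add(user_id)

    # Users who both adopted the new thing and did the follow-up action.
    converted_and_followed = converted & followed_up
    return {
        "conversion_rate": len(converted) / len(all_users) if all_users else 0.0,
        "follow_up_rate_among_converted": (
            len(converted_and_followed) / len(converted) if converted else 0.0
        ),
    }
```

A first iteration like this is deliberately not an A/B test: there is no control group, so it describes what happened after the release rather than proving the release caused it.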

Josh: Yes, it is. And I think the thing I console myself with, Stef.

And again, as the bittersweet aspect of doing the Slack experience, is that all companies reach this point.

They always do. They have to. If they don't reach this point they go out of business or they get acquired by somebody, right.

They all have to reach this point. Getting there is messy, and hard, and is different for everyone and stuff like that.

I used to start by asking, under what conditions would we roll this back? We're doing this thing.

Stefania: Right.

Josh: Under what would have to happen for us to say, "Oh God, what have we done? Let's roll this back."

And if the answer is nothing, there's nothing we can-- This is the right thing to do. Period. Full stop.

There are no circumstances under which-- Signups could go to zero we don't care, right, whatever, this is fine.

Okay, then do it and don't measure it, right. It doesn't matter.

But as soon as you know-- I find that at least at Slack, I had a lot more success...

And I've talked about this a few times. Is framing experimentation and metrics in the early days as a de-risking mechanism.

Not as a North Star. The North Star is your vision product manager, CEO.

That's the North Star. What does productivity feel like to you?

That's true vision. I'm sorry, I'm being sarcastic now. I'm making fun.

What's bad? What would badness look like? How would we know if something was bad?

How would we know if things had gone off the rails?

That's actually, think of it as, it's an insurance policy. It's not a North Star.

It's not going to replace your decision-making ability.

It's an insurance policy. It's a check. It's a way of confirming-

Stefania: It's a regression measuring-

Josh: Regression measuring tool. It's that. It's safe.

It's not threatening. It's totally fine. And I found I had a lot more success with that.

And hopefully, none of those product people will hear me saying this.

This is the gateway drug to the purely metric OKR experiment-driven world that I'm going to pull them into kicking and screaming.

That's okay, they can always go to the next company.

They'll just go work at an earlier stage company and they'll be happy.

It'll be fine. But I do think it's the way. I do think it is-- It's the safe place to start I think for a lot of data teams.
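Josh's "under what conditions would we roll this back?" framing maps naturally onto a guardrail check: agree before launch on how far each metric may fall, then compare after launch. Here is a minimal Python sketch of that idea; the metric names and thresholds are hypothetical, and a real check would also need to account for normal day-to-day noise.

```python
# Hypothetical guardrails agreed before launch:
# metric name -> largest relative drop the team is willing to tolerate.
GUARDRAILS = {
    "daily_signups": 0.10,
    "day_one_retention": 0.05,
}


def should_roll_back(baseline, current, max_relative_drop):
    """Return True if `current` fell more than `max_relative_drop` below `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    relative_drop = (baseline - current) / baseline
    return relative_drop > max_relative_drop


def breached_guardrails(baseline_metrics, current_metrics):
    """List the guardrail metrics whose drop exceeds the agreed threshold."""
    return [
        name
        for name, max_drop in GUARDRAILS.items()
        if should_roll_back(baseline_metrics[name], current_metrics[name], max_drop)
    ]
```

An empty list means nothing crossed the line that was agreed in advance, which is the "insurance policy" reading of metrics rather than the North Star one.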

Stefania: I totally agree with it.

Because it's not only always about starting the experimentation culture, it's also about what you're identifying and talking about right there is, not all things that you've built are necessarily--

You don't necessarily have to measure them as an experiment.

Josh: Of course not. Absolutely not.

Stefania: Back in the day when I was facilitating a lot of purpose meetings, as we called them, there was a way to bring the data consumers and the data producers together and talk about hey okay, what's the goal of this release here?

Well, based on that we should probably measure this and this and this.

Okay, based on that we probably need this and this and this data point.

I would typically throw in for the developers that, obviously, had objections about duplicating their code base for an A/B testing purpose, and they just wanted to iterate on the code and just ship things.

And I would typically try to just leave it up to them and say, "Okay, cool. I mean, are we concerned that our day one retention goes down because of this? Or is that--"

And if they were, it would be their estimation.

Josh: That's right. Exactly.

Stefania: Awesome. So on this note-

Josh: Yes.

Stefania: I think you talked about the growth teams challenges.

And we've touched a little bit on data culture and how org structure impacts it.

And we talked about being the founding data person and all those things and how that should and can evolve.

Can you talk a little bit about how the org structure evolved at Slack and what were the--

Or at any company that comes to mind for you for this, and how you think, potentially ...

What was a good example of a team that managed to do this well for getting data to work with product and how could they maybe have done things differently if it wasn't good?

Josh: Totally. When I think of the companies that did this well, and I don't include Slack as one of the companies that did this well--

I think generally when I give people advice I don't ... I can't tell you what to do.

I can give you detailed instructions on what not to do.

I think of Airbnb as a company that actually, in my perception, got data right, at least organizationally for a long time.

And again, they made mistakes and they had pain and suffering along the way because that's just life, right.

Stripe also probably qualifies by this metric.

And I don't think it's a coincidence, these are both remarkably successful companies.

You can think of this ... Actually, with early Facebook and early LinkedIn too.

And the thing that all these things have in common was there was a person there early, early, early in the company deeply trusted by the founders.

Someone like Riley at Airbnb. My friend Michael Manapat at Stripe. DJ Patil, obviously, at LinkedIn. Jeff Hammerbacher at Facebook.

Where it had ... They had a relationship, strong relationship, with the founders.

DJ because they were all at PayPal together. Jeff, because he and Mark went to Harvard together. Blah, blah, blah, blah, blah, right.

A lot of trust there. A lot of delegation to that person, a lot of executive support, 100%, right, to build a strong and largely independent data organization within the company.

And also not for nothing, a business where-- Was Airbnb with search, Stripe with fraud, Facebook with ... Well, I guess ads whatever it is.

Where there's this significant data component where being super good at data is deeply important to the company, right.

These are the conditions I think under which data teams truly thrive for at least a while in this sort of independent function with that sort of strong executive support, right.

Now, this is not most places.

And this-- These are sort of magical conditions. You can't engineer this, right?

You can't put the CEO and the data person in a math class together in college, it doesn't work that way, right, and so most companies don't come up this way and Slack certainly didn't.

And as a result, data at Slack was sharded across every organization.

So the analytics team reported up through business operations, which is finance and G&A and stuff like that.

So again, you got to remember Slack's early product-led growth, our billing system depended on product usage and product analytics because it was you only paid for the seats you actually used.

And so okay, well what does that mean, right?

So the billing systems, the ARR or the MRR, all this early stuff was all done by the data analyst team off of read replicas of production data to figure out how to bill people because that was what you had to do.

The marketing team had their own martech stack which was all just for them.

It was a lot of third-party services and stuff like that, right.

So I mean as-- Oh, and then I mean, obviously, customer success needed lots of data because Slack was doing very early major investments in customer success so they needed all of this data, right.

And then when things got big and out of control enough to the point that they hired me and hired a data engineering team, the company's already 250 people at this point.

Well, I'm an engineer and my title is Director of Data Engineering so I report up through the engineering hierarchy.

And, of course, the engineers need data too.

They have performance problems and stuff like that they need to understand so everyone needs data.

And so when I looked at the org and I looked at all the customers I would have for the data platform I was building, right, the only person all of these people have in common as a boss is the CEO of the company.

And that's really tricky.

If you find yourself in a position where you have all these different customers and this is the only person they have in common, you're fucked. I don't really know how to break this to you. You should quit right now.

Because the thing that these other orgs had was they had like data orgs led by one person who was trusted by the CEO to do this data stuff basically, right, and therefore they all had a boss and therefore they can have alignments around what the hell should we do?

What are our priorities? Whereas me trying to build infrastructure, trying to hire a team, I don't have time to understand the full context of all the things these people need and what I can do with the three engineers I have to support the entire company.

And so I did a bad job of it because I wasn't super good at it and was under fairly impossible circumstances.

I think even a person who was good at it would've had a hard time with it, right.

There's a fun conversation going on right now, Stef around data product managers.

I don't know if you saw this one go by. Data product managers.

Hiring the first data product manager at Slack was the greatest day of my life. It was so great.

Stefania: When did that happen?

Josh: Oh, it happened honestly, ironically right as I was conceptually on my way out.

It would have been-- God, it must've been 2017. I can't imagine I beat it.

Jesus, Stef, I made it two years without her. Her name's Austin Wilt.

We interviewed a whole bunch of people for it. It was like one of those classic interviewing things where you interview the first person and it turns out they're great, right, but you don't hire them because it's literally the first person you hire.

So we interviewed this wonderful woman named-- who's had a phenomenally great career, but it was like she was great.

We were like "She's great but, obviously, we can do better we're Slack."

And we interviewed 20 people and they're all terrible. We let her go and she went to Uber or some shit. I'm like no, right.

It was just the worst. And then we finally found Austin and we finally interviewed Austin.

So this is a woman named Austin Wilt who I absolutely think the world of and I don't mind embarrassing in the podcast by talking about how awesome she is.

And hired her to be our data product manager because literally, her first job was to unblock the shit show of requirements-gathering failure.

What is actually important for the business across marketing, customer success, engineering, analytics, finance?

Oh, we want to go public someday. Oh, blah, blah.

What product-- What actually do we need to do? What is actually important?

And since I had failed to do that, it fell to her to figure that out and I was deeply grateful for it because it made my life just infinitely better as an engineer.

As an engineering manager.

Stefania: That's incredible.

Josh: Right.

Stefania: So that sounds like a crucial hire.

Josh: It's an absolutely crucial hire. It's an absolutely crucial hire.

My other sort of bit of advice would just be if you're going to do this job just--

And the only person all your customers are going to have in common as a boss is the CEO, just make sure you're just God damn as absolutely aligned with that CEO as humanly possible about what he or she wants from the data team.

Because I wanted to build a data infrastructure that Google engineers would be jealous of.

That was what I wanted to do and that's what I did.

And that's sort of why Slack still largely runs the data infrastructure I built six years ago.

We did a good job. We did, we built some kick-ass data infrastructure.

But what Stewart really wanted to know was, right now, at this second, wherever I am on earth, I want to know how much money is Slack making?

How many seats do we have? Blah, blah, blah. This is actually what I care about.

This is actually what I want to have happen.

And I didn't do that because I'm like oh, that's just some silly flight of fancy the CEO has.

He doesn't actually need to know that, he can know it tomorrow it's fine.

Not actually true. Don't dismiss what the CEO wants because you think you know better than he does.

Why this wasn't obvious to me? I'm not sure.

I'm not very smart, but I learned my lesson now I guess. I don't know.

So can I blame it on being a new father and I wasn't sleeping for six months and stuff?

I don't know if that counts. I don't know if that's a valid excuse.

But just brain-dead stupid stuff. Just so stupid.

Stefania: I think another aspect to think about is that when you're building the first steps of the data-driven organization, it's really difficult to thread the delicate path of when are you supposed to use your time to do stuff that just does not scale at all and will forever depend on the human throughput of yourself versus trying to build something as an investment for the future?

Josh: Yes. True for every engineering manager and everyone.

The nice thing is I always talk to--

I can talk to any engineering manager, engineering leader and they can tell me, no matter where they work and what field, where they are in the company, why their job is the hardest job in the world.

They have a point. I hear about the mobile engineering problems we had at Slack.

I'm like oh my God, thank God I'm not that guy, that's horrible.

And then you have the engineering directors who are in charge of the core crown jewels of messaging, which Slack is the absolute gold standard core thing.

But then it's like you have lots of resources, but what you also have is a lot of attention.

A lot of attention. Maybe too much attention.

Maybe it'd be better if Stewart paid attention to some other product for a little while so we can get some stuff done around here.

Just maybe it'd be nice to not be the crown jewels of the company.

It'd be nice to be off to the side and not have this glare facing you all the time, right.

Stefania: Excellent point.

Josh: So everyone's miserable. I don't know. I enjoy it.

We used to do these data team lunches at Slack in the early days.

We'd get everyone together in the before times.

Once a quarter you'd all go out for lunch or you'd go over to Pinterest, or Dropbox, or Airbnb, whatever, we'd all have lunch together and we would get the data teams, we'd talk about our problems, and it was just an awesome exercise and say, "Thank God I am not Dropbox's director of data engineering. Jesus Christ, this guy's just trying to keep his NameNode up." Anyway.

Stefania: That was very good. I have two follow-up questions.

The first one is a simple one. How big was the organization when you hired the data product manager?

Josh: Oh, the company had gotten big by that point. It's a good question.

I think Slack was 240 when I joined and it was maybe 1,200 or so when I left.

So I mean, I think-- I mean, I'm projecting because, obviously, we grew in fits and spurts and stuff like that, right, but I want to say 600, 700 something like that.

Stefania: And do you think it should have happened sooner?

Josh: Oh, hell yeah.

They should have hired her before they hired me for the love of God.

Oh, hell yeah. I mean, to be building stuff in isolation with someone who doesn't ...

In this impossible job that's distinct from every other engineering director in the company who has features to ship, and there's me, I'm over here.

Okay. Supporting customer success, and marketing, and finance, and the product-- Oh, the product too. That thing.

Stefania: I actually love that your answer was this, even though I'm not-- I don't know if you're right that they should have hired her before you.

I'm giving a talk at dbt's conference in December--

Josh: Coalesce.

Stefania: Coalesce.

Josh: Great. That's awesome.

Stefania: The name of the talk is, Don't Hire a Data Engineer Yet.

Josh: That's great. I have another-- Stef, I have a favorite tweet that's not as popular as the data scientist one but it's the same thing which is the question I get asked all the time which is, should I hire a data scientist?

And the answer is no. No, you should not.

There aren't that many good ones and I need all of them, and you should not hire a data scientist.

Stefania: I at least totally agree with one of the implicit things that you're saying there is, it's really important to have good infrastructure, but the first thing to solve really is building relationships, and bridging gaps between data needs and stakeholders, and all those things.

Josh: Yes. Totally.

And to your point, I mean, again, things have gotten much better.

You don't need to hire me to build a custom version of Presto.

You don't need to do that anymore.

You can just like swipe-- Again, swipe your credit card; it goes a long way.

A credit card and a reasonably smart engineer can really build you some pretty kick-ass data infrastructure these days with not much work.

It's great. Again, we have-- I think, I'm fortunate we have some very data-savvy product managers.

I'm pretty sure one of them could have done it themselves it's not that hard.

I mean, I did it faster because I'm-- It's what I do, but they could've knocked it out in a couple weeks.

It took me a day, it would take them a couple weeks, right. Anyway.

Stefania: This is probably a good segue into, what are some of the things you wish existed or you wish you had back then?

Or even probably now? I know that you have already shared that you're sailing a dream boat right now, but can we talk a little bit about that?

Josh: I mean, I think honestly then, Stef, I dreamed of being where I am now.

I think where I don't-- It's funny I remember working with Jeff at Cloudera back in 2012 and us talking even back then about thank God we don't have to think about infrastructure anymore.

Thank God we can just-- We can start talking about problems.

When you start thinking about the data, the machine learning, and stuff like that.

That turned out to be not remotely true at all for a better part of a decade, right.

So I mean, I feel like the world that I would wish for in 2015 is from an infrastructure perspective, is the world I live in now.

So again, I am-- If happiness is all about managing expectations, I am the world's happiest data engineer.

And I think what I wish for inevitably I think this is the thing that I feel, I don't, I haven't started a data tools engineering company because I am ...

Again, I invest and I look around and I'm like okay, this is good. Good people are building the right stuff.

I wish I could make them play nicely together and stuff, and I feel like a lot of my job is drawing connections between things that I think are important, right.

But really building this end-to-end semantic layer where it's from the ingestion side of things, the data generation side of things, all the way through the modeling and the transformation, all the way to the visualization to the end, this is what we're doing.

This is the world right now. And what the right combination of pieces is, whether it's we need the metrics layer or we need this or we need that or whatever, I'm not sure.

We're figuring it out and it's super fun, and it's really exciting.

And that said, if someone said to me, "Josh, we're going to figure this all out tomorrow it's done,"

I'd be like, "Fucking great. Next problem. That's awesome." That's what I want.

It doesn't work that way, obviously. I'm saying this now, right.

Thank God. I don't know what-- Stef, what comes after the semantic layer?

What do we do after we're done with this layer? What's next?

Stefania: Those are the big questions.

Josh: It'll probably be-- Again, bringing this back around to the-- To Chad Sanderson.

It's when we solve the semantic layer problems it will reveal the problems that we need to solve at the next layer.

The semantic layer needs to solve this problem of--

Because the plumbing is so good I can just do whatever I want and build my own sort of cognitive superstructure, which makes it difficult for us to work as a team and a company so, therefore, we need the semantic layer to solve that problem.

And then when we solve that it will reveal I guess the next problem.

Because again, if you told me, "Josh, in 10 years the data infrastructure is going to be so good that this is going to be the problem."

The problem is that anyone-- Everyone can create their own individual, personalized data world effectively they live in.

That's going to be the problem. I'm going to be like sure whatever.

I wouldn't have believed you. It would've been-- Well, you sound like a crazy person.

Stefania: Exactly. I do want to also just follow up with my other follow-up question which was, you were talking about--

When you were talking about the data product management and you also talked a bit about the growth group in our last conversation.

And you talked about a lot of the tooling that was developed and the processes that were developed around logging.

They were sort of I guess developed through the growth team partially because they needed to bridge the gap between different stakeholders.

Josh: They needed it. Absolutely.

Stefania: Can you talk a little bit about what stage of the company that was at and how that developed?

Josh: So I don't know the growth story as well as I would like in the sense that I was, obviously, paying attention to my own absolute dumpster fire over in data most of the time, right.

But I mean, I think-- To me again, the story of every company, Stef is really a story of hiring.

Who was hired when, and stuff like that.

I remember when we hired Kelly Watkins, who's now the CEO of Abstract, that leveled up--

So Merci Grace led growth at Slack. She hired Kelly.

Kelly was an absolute step function raise in terms of the quality of our growth team, data usage, all that stuff. She was just amazing.

She just-- Again, she came from that world, right.

And she went on to become eventually the head of marketing at Slack so she was--

And now she's the CEO of Abstract. She's an absolute total badass.

And then Fareed Mosavat, who's over, he's doing, he's a fellow or something at OnDeck right now helping new people start companies.

He came from Zynga and Instacart.

Again, super data experiment-driven people who were used to these logging problems and stuff.

It was really the thing with my--

With building the data infrastructure I built and building it in the early days of Slack, was building a ghost town for this empty city.

But what I was really waiting for was these people to come.

I was waiting for the Kellys, and the Fareeds, and the engineers we hired from Google, and Facebook, and stuff who expected this stuff.

Who knew that this stuff had to exist, and they knew that they knew how to use it, and they knew how to make it sing.

And again, in the bittersweet sense, it's the great joy I got was seeing them use this stuff to make Slack better.

Really was-- Again, even as I left they were taking the foundations I built and making them better, and filling in the things I missed, and fixing the problems I introduced and doing great stuff with it.

And it was, anyway, I just-- It was an absolute.

Absolutely bittersweet thing to see them do such awesome stuff, but sadly not be a part of it anymore so it's one of those things.

It's just about people. It really is about people, Stef.

It's about people who've been there, who've seen it done, who know how it works and can bring it with them. Absolutely.

Stefania: Exactly. This is awesome. And this has been absolutely fantastic.

I want to maybe wrap things up with just talking a little bit about how people think about building their data cultures.

For example, talking about misconceptions and what people should do to get things right.

And a segue into people's biggest misconceptions about data and how data and product analytics should work together and work in general.

One thing that comes to mind is what you just shared.

How people think about-- What are the CEO's needs versus what is the infrastructure that we need?

And I want to leave it as an open question to you.

What do you think are people's biggest misconceptions about how data and product analytics should work or do work?

Josh: I mean, I think the biggest illusion to sort of dispel yourself of is that the data is ever right.

And I'll use just-- My favorite one.

I think one of the warm-up questions here for this was, you're asking what do people always get wrong in analytics?

The thing I see most common is users. How many users do we have?

How many users are there of Slack? Well, I have no idea.

It's like how long is the coast of England? Well, it depends on how you measure it, right.

Stefania: Exactly.

Josh: That's the biggest misconception.

I think that people operate under some sort of platonic ideal where it is possible to actually determine how many humans used our product today and it just isn't.

It's just not. And back to the idea of the trusting data stuff.

I think the problem people have is they go in naively assuming that there is a concrete answer to this question of how many users do we have?

Because it seems like the kind of thing that's pretty straightforward, right.

It's like how many people use it?

It doesn't seem like this should be so hard.

And when they find out that it's not actually easy and it's basically in fact pretty God damn near impossible to actually figure this out, they are very much disheartened, and disillusioned, and they trust nothing.

And they go to the way ... They go way existential. It is whatever.

They're done with all this stuff, right.

And then the galaxy brain thing because it's just all--

We speak in memes these days, right. Is figuring out yes, this is wrong and yes, it's a lie and yes, there's all these problems with it, but we still got to use it to make progress.

And figuring out how to understand the limitations of this thing and still make progress with it despite them to make things better for everybody, that's the enlightenment moment, but it's hard and not everyone gets there and stuff.

A lot of people are just like, "Fuck this, we can't possibly know, so let's just ship features," basically, and that's the thing.
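To make the "how many users do we have?" point concrete, here is a minimal sketch in Python. The event log, the field names, and the `count_users` helper are all hypothetical and are not anything Slack actually ran. It only illustrates that the same day of data yields different answers depending on which definition of "user" you pick.

```python
from datetime import date
from typing import Callable, Iterable, NamedTuple


class Event(NamedTuple):
    account_id: str
    email: str
    workspace: str
    day: date
    is_bot: bool


# Hypothetical activity log, invented purely for illustration.
EVENTS = [
    Event("A1", "ana@example.com", "acme", date(2021, 10, 1), False),
    # Same person, second workspace, therefore a second account.
    Event("A2", "ana@example.com", "globex", date(2021, 10, 1), False),
    Event("A3", "bo@example.com", "acme", date(2021, 10, 1), False),
    # An integration that generates activity but is not a human.
    Event("A4", "ci-bot@example.com", "acme", date(2021, 10, 1), True),
    # A real person who was active on a different day.
    Event("A5", "dee@example.com", "acme", date(2021, 9, 30), False),
]


def count_users(
    events: Iterable[Event],
    day: date,
    key: Callable[[Event], str],
    include_bots: bool,
) -> int:
    """Count distinct `key` values among events on `day`."""
    return len(
        {key(e) for e in events if e.day == day and (include_bots or not e.is_bot)}
    )


today = date(2021, 10, 1)
by_account_with_bots = count_users(EVENTS, today, lambda e: e.account_id, True)
by_account = count_users(EVENTS, today, lambda e: e.account_id, False)
by_email = count_users(EVENTS, today, lambda e: e.email, False)

print(by_account_with_bots, by_account, by_email)  # 4 3 2
```

None of the three numbers is wrong. Each answers a slightly different question, which is why a usable metric has to state its definition alongside the number.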

Stefania: This is the perfect segue into I think what I want to maybe leave our audience with.

Just well, okay, and that's very disheartening and I agree and I've seen so many people go through that.

What do you think is the first thing teams should do to start getting their analytics right?

Josh: I mean, the first thing I did at Slack, the very first thing I did, which I did an okay job of but did not do--

And I'm not here to shill for Avo, although-- Anyway.

Was introduce evolvable logging, schematized logging.

And I said in my very first week at Slack-- I got everyone together and said, "Here we go. There's Avro, there's Protocol Buffers, there's Thrift. We're going to pick one of these things, and from here on out this format is the only thing we will accept in the data warehouse.

We will not ingest anything that doesn't come with a schema with types.

And we're building this and it's designed to be evolved, and it'll have all this tooling, and it will impose all these costs on everyone forever who wants to write anything to the data warehouse.

It's not a free for all. It's not JSON anymore, but it will pay dividends for the company basically forever in ways that you can appreciate."

Slack right now can go back and reprocess the application logs from 2015 whenever they want. They can just do that.

They always can whenever they want to. It's amazing, right.

Not a lot of companies can say that I assure you. That's the most important thing.

That was the very first thing I did, Stef, literally providing that foundation and that container.
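The idea of typed, evolvable logging can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not Slack's pipeline and not the API of Avro, Protocol Buffers, or Thrift: the `Field` class, the `validate` function, and both schema versions are hypothetical names made up for the example. It shows two properties described above: records without the right fields and types are rejected at ingestion, and a newer schema that only adds optional fields with defaults can still read old records.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


class SchemaError(ValueError):
    """Raised when a log record does not match the schema."""


@dataclass(frozen=True)
class Field:
    name: str
    py_type: type
    required: bool = True
    default: Any = None


# Hypothetical event schema, version 1.
SCHEMA_V1: List[Field] = [
    Field("user_id", str),
    Field("event", str),
    Field("ts", int),
]

# Version 2 only ADDS an optional field with a default,
# so records written under version 1 remain readable.
SCHEMA_V2: List[Field] = SCHEMA_V1 + [
    Field("client", str, required=False, default="unknown"),
]


def validate(record: Dict[str, Any], schema: List[Field]) -> Dict[str, Any]:
    """Return the record normalized to `schema`, or raise SchemaError."""
    unknown = set(record) - {f.name for f in schema}
    if unknown:
        raise SchemaError(f"unknown fields: {sorted(unknown)}")
    out: Dict[str, Any] = {}
    for f in schema:
        if f.name in record:
            if not isinstance(record[f.name], f.py_type):
                raise SchemaError(f"{f.name} must be {f.py_type.__name__}")
            out[f.name] = record[f.name]
        elif f.required:
            raise SchemaError(f"missing required field: {f.name}")
        else:
            # Field added in a later schema version: fill in its default.
            out[f.name] = f.default
    return out


# A record written under version 1, reprocessed with version 2.
old = {"user_id": "U1", "event": "message_sent", "ts": 1420070400}
print(validate(old, SCHEMA_V2))
# {'user_id': 'U1', 'event': 'message_sent', 'ts': 1420070400, 'client': 'unknown'}
```

The cost is visible even in the sketch: every writer has to conform to the schema, and every change has to be an additive, defaulted one. The payoff is that old logs stay readable as the schema grows.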

And I think once again, what's exciting about the new world we're entering is like Avo exists and you can do a much better job of it than I did because you can impose not just structure and syntax but semantics in a lot of ways around these things which is exciting to me.

It makes me very happy. That's the most important thing.

If you get that it's-- I don't know. Jeff once told me about managing.

He said, "If you hire the right people and you motivate them properly, you can do everything else in management wrong and you'll still be okay, but if you don't get those two things right, you can do everything else right and it won't matter, you're still screwed."

And I feel like in many ways my career is a living embodiment of that principle.

That I-- To the extent that I did anything right, it was because I hired well and I motivated people, because I did literally everything else wrong.

And that's how I feel about the modern data stack stuff, which is if you get serious early, even more serious than I was about committing to structured logging, committing to strong semantics around this stuff, it will pay dividends for you.

I know it's not free. I know it sucks.

It will pay dividends for you forever. Forever, and ever, and ever.

Your future you's will thank you. That's actually not true.

They'll still hate you for the few decisions you made so never mind.

They won't appreciate you, but I will appreciate you. I will see you.

It's actually my favorite thing to do in data conversations with people is talk about this stuff because the data people get it.

They know how great it is because they've lived the pain and suffering of not having it that way.

So anyway. That's the most important thing to get right. It really is.

Stefania: Excellent. I think those are great final words. I want to thank you so much for taking the time. It was amazing.

Josh: Thanks for having me.

Stefania: It's been super insightful. I know that this is just--

Number one, it's so good to get the validation that you're not alone in the world, but I also know that everything that you're sharing is super valuable for so many people that are not only just starting their journey but also on their journey and trying to do things better.

Josh: That's right.

Stefania: So thank you. I really appreciate your time.

Josh: My pleasure. Thanks for having me.
