
Ep. #22, Back to the Real World with Elijah Ben Izzy

about the episode

In episode 22 of Generationship, Rachel Chalmers speaks with Elijah Ben Izzy, CTO at DAGWorks. Elijah emphasizes the importance of streaming in chatbot interactions, highlighting how partial, real-time responses can significantly enhance user engagement. The discussion covers Burr, which simplifies AI workflows through state machines, and explores its use in debugging and complex AI/ML systems. Elijah also shares his thoughts on the sustainability of AI's hype and its energy demands, offering a balanced view on the future of this rapidly evolving field.

Elijah Ben Izzy is currently CTO at DAGWorks, which aims to solve the problem of building and maintaining complex data pipelines. Previously at Two Sigma, he was building infrastructure to help quantitative researchers efficiently turn ideas into production trading models. At Stitch Fix he ran the Model Lifecycle team — a team that focuses on streamlining the experience for data scientists to create and ship machine learning models.

transcript

Rachel Chalmers: Today, I'm thrilled to welcome Elijah Ben Izzy to the show. Elijah has always enjoyed working at the intersection of math and engineering. More recently, he's focused his career on building tools to make data scientists and researchers more productive.

At Two Sigma, he built infrastructure to help quantitative researchers efficiently turn ideas into production trading models. At Stitch Fix he ran the Model Lifecycle team, a team that focused on streamlining the experience for data scientists to create and ship machine learning models.

Elijah's now the CTO at DAGWorks, which aims to solve the problem of building and maintaining complex data pipelines. In his spare time, he enjoys geeking out about fractals, poring over antique maps, and playing jazz piano.

Eli, welcome to the show.

Elijah "Eli" Ben Izzy: Thank you.

Rachel: You write about a lot of the cool projects that are happening at DAGWorks. Let's talk about how to build a streaming chatbot. First up, why do we care? Why does streaming matter?

Eli: Yeah, so it's a little bit of a parlor trick almost. The problem is that LLMs take a while to respond. They're doing a lot of complicated stuff, and there's sort of this inherent thing where, as they take longer to respond, they can do a better job. So you kind of want to trick the user into keeping their attention there.

Rachel: It's just like the little spinning donut on Mac OS.

Eli: It's a classic UI trick. You want to keep the user there thinking about it so that they don't get sick of your thing and go to the competitor. So you have to give them something to think about. That said, the AI is not ready to give them sort of a well thought out thing, so it gives them the response piece by piece.

And the nice thing is that transformer models, sort of self-attention-- All these things are built to make this fairly easy to do, so that you can actually start giving useful information back fairly quickly.

There's a metric in the AI world called "Time to first token." This is the idea that if you can sort of get a quick time to the first thing that the user sees, they'll be able to sort of engage with it better, and they'll be able to get value sooner. And thus be less likely to sort of give up or be like, "Oh this thing's broken." Turn it off, turn it on again.
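For readers who want to see the mechanics, here's a minimal sketch of token-by-token streaming with the OpenAI Python client (v1+). The model name and prompt are placeholders; the pattern is just: request a stream, then print each partial chunk as soon as it arrives.

```python
# Minimal streaming sketch using the OpenAI Python client (>= 1.0).
# The model name and prompt are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Explain streaming in one paragraph."}],
    stream=True,  # ask for tokens as they are generated
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:  # some chunks (e.g. the final one) carry no text
        print(delta, end="", flush=True)  # show partial output immediately
print()
```

Time to first token here is roughly the delay before the first print fires, rather than the time for the whole loop to finish.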

Rachel: We are all dopamine junkies.

Eli: Exactly. And I think there's something else a little less quantifiable, which is that streaming's just kind of fun. Like the first thing you do in programming is you print something, and you're like, "Cool. It says hello world."

And then you print something again and again and again. You do that in a while loop, and now you're like, "Okay, hello world, hello world two, hello world three."

And it's like that's more fun. So it's sort of the same basic thing as animation: "Oh, things are moving on a screen. This is cool."

Rachel: We have talked about Burr on the podcast before, but give us a refresher, and tell us about some other tools that might be useful as we build our streaming chatbot.

Eli: Yeah, so Burr's interesting. It's really simple. It's just sort of a way of orchestrating functions. And we're starting to see this pattern show up in entirely different places. With our other project, "Hamilton," there's kind of an American history joke for those who are aware of Hamilton and Burr, two figures in American history.

And there's all sorts of references buried in our examples and stuff. So, they're both ways of orchestrating functions. And this is sort of the fundamental abstraction that we're finding really useful in the AI and ML world.

Burr is specifically about sort of orchestrating functions by representing them as a state machine, meaning you do something, then modify the state, then decide what to do next, do something, and you sort of keep walking through that.

And it turns out that can represent like a whole bunch of computation. So if you have an AI that's making decisions, you can ask it, "Okay, what model do I want to call?" Then it'll think of it, give the model back, then you use that to choose which sort of thing to do next, and then call that model, and you sort of keep going from there, and build it out kind of as a flow chart.

So it's nice for these things where you have actions which do something, these sort of tools where your actions can call out to external things, and then you can sort of string them together nicely.

And the thing is, without a system that models it formally, such as a state machine, this can get really hard to work with. So you have code that's calling other code, that's dependent on AI. The AI is doing all sorts of crazy stuff 'cause it's AI, and you have no idea what's going on.

Burr sort of helps in two ways. It helps you mentally model that, and it helps you track it, and understand what happened, and sort of plug it into different production concerns. So really it's just a Python library that helps with all of this.
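For a sense of what that looks like in code, here's a toy chatbot loop written against Burr's documented action/state API. It's a sketch, not anything from this conversation: the action names and the canned echo reply are made up for illustration, with the echo standing in for a real LLM call.

```python
# A toy Burr state machine: one action takes user input, the next "responds."
# The action names and the echo reply are illustrative stand-ins.
from typing import Tuple

from burr.core import ApplicationBuilder, State, action


@action(reads=[], writes=["prompt"])
def human_input(state: State, user_input: str) -> Tuple[dict, State]:
    # The user's message arrives as a runtime input.
    return {"prompt": user_input}, state.update(prompt=user_input)


@action(reads=["prompt"], writes=["chat_history"])
def ai_response(state: State) -> Tuple[dict, State]:
    # Stand-in for an LLM call: echo the prompt back.
    reply = f"You said: {state['prompt']}"
    return {"response": reply}, state.append(chat_history=reply)


app = (
    ApplicationBuilder()
    .with_actions(human_input=human_input, ai_response=ai_response)
    .with_transitions(
        ("human_input", "ai_response"),  # after input, respond
        ("ai_response", "human_input"),  # then wait for the next input
    )
    .with_state(chat_history=[])
    .with_entrypoint("human_input")
    .build()
)

# Run one turn: execute actions until ai_response completes.
last_action, result, state = app.run(
    halt_after=["ai_response"],
    inputs={"user_input": "hello"},
)
print(result["response"])  # -> "You said: hello"
```

Each action reads and writes declared state fields, and the transitions are the edges of the flow chart Eli describes.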

Rachel: I really like your characterization of a state machine as a good abstraction for an agent. 'Cause I think it's easy even for us who know what's going on under the covers, to think about agents as magic, and as more powerful than they are.

Thinking of them as state machines, you know, input/output flow charts, gives us I think a much better handle on what's actually taking place. I think it's easier to reason about agents with that abstraction in mind.

Eli: Agreed. And I think that people like to sell agents as "Oh, it's code that's writing code, that's doing code stuff," or AI that's doing all this, and sell it as this whole sort of crazy system. But really it's just, yeah, decide what to do and do it, decide what to do and do it.

And the sort of interesting thing about an agent is that the system is deciding both what to do and how to do it. Whereas with a purely human-in-the-loop type system, you'll have a human being the one deciding what to do, and then the system actually doing it.

Rachel: There was another cool Burr application, you wrote that used the state machine abstraction to build what's essentially a time machine for debugging. Can you tell us more about that one?

Eli: Yeah, I was super stoked. So this is actually a request that came from a customer. They were a robotics company, I think called Peanut Robotics. They were on the Discord asking for help.

And it turns out what they really wanted to do, is they had this whole complicated set of interactions. They had all these different functions which determined how the robot would interact with different things in its environment. They're using LLMs to do it.

So it's kind of their sort of new revolutionary thing. But they really wanted to go back in time, and figure out what it did at any given point, and sort of rewind back, and then make a decision and see what the counterfactuals would all be.

So we gave them this ability. It's a pretty simple thing in the library, but Burr, to get into the nitty-gritty details, has this capability to persist, so you can sort of persist your state, and then load up from where you left off.

And then with this time travel capability we built, you can persist your state, and instead of starting off from where you left off, you can start off from some number of steps back from where you left off. And sort of fork the application.

So if your application is running along and you're like, "Whoops, it made a mistake," you can go back in time, fork it, and then you can compare the fork to the original application, and that allows you to sort of explore what happened, go back in time. And I think this is sort of a very critical way to debug.

'Cause if you're making chaotic decisions, something a few steps ago can influence what you're doing now. Take the robot, where not only is it sort of making these continual decisions, but it's also impacting the environment, which can impact future decisions.

So it's really important to be able to iron down what that one important point was.
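The mechanics are easy to picture even without Burr's specifics. Here's a library-agnostic sketch of the idea: snapshot the state after every step, then "fork" by reloading a snapshot from a few steps back and replaying from there. Every name here is hypothetical; Burr provides this through its persistence API.

```python
# Library-agnostic sketch of persist-and-fork ("time travel") debugging.
# All names are hypothetical; the idea is just: save state per step,
# then restart from an earlier snapshot.
import copy


class TimeTravelRunner:
    def __init__(self, step_fn, initial_state):
        self.step_fn = step_fn  # one "action": state -> new state
        self.snapshots = [copy.deepcopy(initial_state)]

    def run(self, n_steps):
        state = self.snapshots[-1]
        for _ in range(n_steps):
            state = self.step_fn(state)
            self.snapshots.append(copy.deepcopy(state))  # persist each step
        return state

    def fork(self, steps_back):
        """Start a new runner from `steps_back` steps before the latest state."""
        past = self.snapshots[-(steps_back + 1)]
        return TimeTravelRunner(self.step_fn, past)


# Usage: run, spot a mistake, rewind three steps, and replay the alternative.
runner = TimeTravelRunner(lambda s: {"count": s["count"] + 1}, {"count": 0})
print(runner.run(5))   # {'count': 5}
fork = runner.fork(3)  # resumes from {'count': 2}
print(fork.run(1))     # {'count': 3}
```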

Rachel: Yeah, I mean it's magical enough with deterministic code, but as we move into this more stochastic world, being able to rewind, and play out the different sliding doors futures is pretty exciting.

Eli: Yeah, absolutely. And I think this sets Burr apart from a bunch of other frameworks. Now I say it and we release this podcast, and they're all going to build the same-

Rachel: Yeah.

Eli: feature. It'll be really great. It's just a good feature to have. But most people, it's kind of interesting. Most people think about these chaotic systems, and they don't even think about the question of "What if I could go back in time and figure out what it was doing?" The only question they think about is "What if I could even know what it was doing?"

So Burr, because it's sort of an opinionated, standardized way of writing it, kind of gives all of that to you for free, and that's what I'm really excited about.

Rachel: What are some of the other exciting projects that people are building with Hamilton and Burr?

Eli: Yeah, so tons of stuff. With Burr we're getting a bunch of cool things. Obviously the robotics application, and streaming voice for support.

So there's people who are streaming back stuff from OpenAI, then running that through a text-to-voice model, and doing that in a fast enough way to have it be a live conversational response. That I'm super excited about.

There's a company doing IT support workflows. So instead of managing a complicated set of chains, this is really an agent. It's like saying, "Here are the things you can do. Decide what to do." And then a lot of RAG workflows we've seen. So, this gets to the Hamilton side: people often ingest the documents using Hamilton.

So they build sort of a set of assets, run, say, six million documents into LanceDB, and then use Burr to manage the query. They can keep track of conversation state, they can massage the prompt, they can take what they find in the database, filter it, and put it into the LLM.
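Here's a hypothetical sketch of that query path: retrieve from LanceDB, massage the prompt with conversation state, and hand it to an LLM. The embed() and ask_llm() stubs stand in for whatever embedding model and LLM client you use; the LanceDB calls (connect, open_table, search) follow its documented API.

```python
# Hypothetical RAG query path: retrieve, massage the prompt, call the LLM.
import lancedb


def embed(text: str) -> list[float]:
    raise NotImplementedError("plug in your embedding model here")


def ask_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")


def answer(question: str, history: list[str]) -> str:
    db = lancedb.connect("./vectors")
    table = db.open_table("docs")  # assumes documents were ingested upstream

    # Retrieve the most relevant chunks for the question.
    hits = table.search(embed(question)).limit(5).to_list()
    context = "\n".join(hit["text"] for hit in hits)

    # Massage the prompt: conversation state, filtered context, then the question.
    prompt = (
        "Conversation so far:\n" + "\n".join(history)
        + "\n\nContext:\n" + context
        + "\n\nQuestion: " + question
    )
    return ask_llm(prompt)
```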

So that's kind of some of the cool uses of Burr. With Hamilton, there are like too many to count. People are really excited about it. So some of them: a large enterprise using Hamilton for feature engineering, a lot of sort of large-scale stuff on top of PySpark, which naturally tends to be very messy.

Hamilton sort of helps you clean it up. The British cycling team is using Hamilton to optimize their Olympic team, or to optimize sort of the velodrome conditions.

Rachel: Okay. So they're not actually optimizing the cyclists themselves? I think that's outside the rules.

Eli: Yeah, I mean the only people who can optimize the cyclists are the cyclists themselves, but they're using the data, and Hamilton is the sort of core driver in that pipeline. If you ever go to gov.uk and enter feedback for any part of the government, it'll go through a Hamilton pipeline where they do NLP.

We recently had a really cool set of sort of asynchronous text-to-SQL workflows. So there's a new startup, Wren.AI, that is using Hamilton's async support to sort of squeeze performance out of their text-to-SQL workflow. It's very core to their library.

A really great one recently released in the Journal of Open Source Software is naturf. This is a model to sort of understand the climate, and the impact of building heights on climate. And then there's a whole bunch more. I could talk forever about this because these are all really exciting.

But the high level is that recently there's been a whole sort of explosion in the capabilities of data technology, and I think that people are starting to see that they need a layer on top of that. We're calling this sort of thing assets.

So if you're thinking about it, you've got, okay, the data layer, where you're messing around with Snowflake, sending data into and out of your data warehouse, and then orchestration, where you think about, okay, running jobs. But what you really want to be thinking about is this notion of assets, which we can map very cleanly in Hamilton and Burr to functions.

So instead of thinking about sort of all the building blocks of computation, you think at a high level about what are the functions you need, what are the data sets you want to compute, and what's your code actually doing? People are starting to pick up Hamilton and Burr to represent this. We're calling it the asset layer. Super excited about that, and it sort of feels like it's just growing quickly.
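To picture the asset layer in code, here's a toy Hamilton module. Each function is an asset, and Hamilton wires them together by matching parameter names to function names. The driver call follows Hamilton's documented Builder pattern, but the specific assets here are made up.

```python
# assets.py -- a toy Hamilton module. Each function defines one asset;
# its parameter names refer to other assets (or to supplied inputs).
import pandas as pd


def raw_orders(orders_path: str) -> pd.DataFrame:
    """Asset: the raw orders table, loaded from disk."""
    return pd.read_csv(orders_path)


def order_totals(raw_orders: pd.DataFrame) -> pd.Series:
    """Asset: spend per order (depends on raw_orders by name)."""
    return raw_orders["price"] * raw_orders["quantity"]


def revenue(order_totals: pd.Series) -> float:
    """Asset: overall revenue."""
    return float(order_totals.sum())
```

```python
# run.py -- build a driver over the module and request the assets you want.
from hamilton import driver

import assets

dr = driver.Builder().with_modules(assets).build()
# Returns the requested assets (exact container depends on the result builder).
result = dr.execute(["revenue"], inputs={"orders_path": "orders.csv"})
print(result["revenue"])
```

You ask for the asset you want ("revenue") and Hamilton resolves the dependency graph for you, rather than you scripting each step.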

Rachel: And most of what you're talking about is addressing Python devs obviously, but that particular abstraction, the asset layer, is that something that you could conceivably sell to data scientists?

Eli: Oh yeah. So I think half of the Hamilton users are data scientists, and half of them are engineers. Burr, it's the new AI engineers, or data scientists, same deal.

Hamilton's a little easier to pitch to engineers, especially the asset layer. Data scientists tend to want to move quicker, they want to experiment, versus data engineers and sort of ML engineers, who want to be able to build something a little more reliable.

So there's a constant sort of trade-off between how quickly you build and how well structured it is. We actually wrote a whole post about this trade-off, but at a high level, yes, the asset layer is sort of a good way of thinking about it for everybody. And especially once you get anything into production, it becomes really useful.

Rachel: You seem to have quite a concentration of cool use cases in Britain. Do you have like some crazy boffin in London running around singing your praises?

Eli: So yes, we do.

Rachel: That's so cool.

Eli: It's really cool. So I went to London. I did a whole trip where I did a talk in Lithuania, then I went to London, and I met some of them.

I think the consultants that started using Hamilton in England are actually in Manchester. One group of consultants brought it to the British cycling team. Then somebody from gov.uk interviewed with the consultancy, which asked them a question about it, and that's how they found out about it.

And if you look at it, it's kind of the same type of math as viral spread.

Rachel: Yes.

Eli: Right? You just see that the initial starting point is really important in figuring out where it ends up. So we have internal metrics that we look at, and England and the US are far and away above the others. And then a lot of Europe too.

I think there's also a thing where Europe is sort of tightly regulated, and if you're tightly regulated, you have less room to mess around.

Rachel: Yeah.

Eli: So thinking about things in terms of assets allows you to answer questions about assets, which is really what the government wants to know.

Rachel: Yes.

Eli: Right? So understanding privacy stuff, GDPR, all of that: in Europe, people think about that first and foremost. In the US, that's often an afterthought.

Hamilton allows you both to move quickly, to get the US speed of development that they've come to expect, and to trace through and understand what your code's actually doing. I think that's another reason for it.

Rachel: Speaking of moving quickly, as a startup founder, do you worry about this massive AI hype wave? What's going to be left behind when it crashes, if anything?

Eli: Yeah, I mean, I don't know. I don't like hype a lot. I just find it to be pretty insufferable. It kind of brings out the worst in people. As you throw money around, all the vultures are attracted.

The money, on the other hand-- I actually think that AI hype is probably going somewhere. And most hype cycles have turned into something, right?

It just hasn't turned into exactly what we thought it would. Like if you look at iOS, and mobile stuff, and cloud, that hype cycle-- that's actually just how we program now.

Rachel: Mm-hmm.

Eli: But yeah, I think it's going places. That said, I think that some parts will crash.

I think the obsession with talking as an interface is a little short-sighted. We've been spending years and years developing interfaces that are extremely well tuned. So to now replace it all with conversational stuff is like, well, it works in some places, it doesn't work in others.

Rachel: Mm-hmm.

Eli: It's not just this magic replace-all-the-people thing.

So when the tide rolls out, I think there's going to be a lot of AI use cases left. I mean, I use ChatGPT all the time, and if I use it, I'm like, okay, plenty of other people do too. It's actually adding value, it's adding to the GDP. But I don't think it's going to be replacing literally everything we do.

So I think it's somewhere in the middle, as most of these are. So when the tide rolls out, I think that obviously the people who are good at it, and the tools that are actually useful, that aren't just built around "how do I take advantage of AI and VC money pouring in," will still be used, and the ones that aren't useful will probably end up falling apart.

Rachel: The one that makes me a little anxious, I guess, is thinking about how we are in an experimental phase and a lot of these use cases are somewhat speculative. But, you know, the fossil fuels we're using to underpin all of these technologies are not speculative, and they're really getting used up. Do you think the benefits will outweigh the harms long term?

Eli: I don't know, to be honest. The way that I think about it is there will always be use cases for energy. So whether we're burning through it in AI, or we're burning through it in crypto, the thing that is most important, is to get to a point where we actually have an abundance of energy, and our energy is cheap. I think that is the like-

Rachel: Yeah.

Eli: The core problem for humanity. And if we can solve that, world peace is within our grasp.

So yes, I'm quite worried about that. Whether I think it's worth it, I don't know. I'm hopeful that, okay, we use energy, we come up with things, those things help make us smarter.

The fact that we're smarter can help us find cheaper energy, in droves, compared to what we paid for the energy in the beginning. On the other hand, maybe it's just burning through energy, and it's all sort of funneling money to funny places.

There are also cheaper ways to do energy. So I think if it forces us to optimize society, maybe that's good, but that requires a bunch of incentives to align, which is complicated. So I don't really have a good answer to that, other than "Yes, I'm worried."

Rachel: I don't think there is a really good answer to that. I just think it's worth thinking about.

Eli: Oh absolutely. Yeah, I mean, this was my big complaint about crypto.

Rachel: Yeah.

Eli: It'd be like, you're using 2% of the world's energy for this, and you're just moving money around. And they had good answers. They were like, "Oh yeah, we can use factories in China that don't need to shut down or whatever, and we'll just burn through that."

So there is something to that, like forcing it to be more efficient. But that might be a better argument in theory than in practice.

Rachel: Yeah, I guess we'll see how it plays out.

Eli: Yeah.

Rachel: What are some of your favorite sources for learning about AI?

Eli: Right, so I try to stay off LinkedIn, 'cause there's very little learning that happens on LinkedIn.

Rachel: Lot of self-congratulation.

Eli: Yeah. Self-congratulation, self-promotion. I use it strictly to self-promote, and to promote others in a transaction. Most everything that goes on there is transactional, and learning is not, I don't think, an inherently transactional activity, like a lot of people treat it as.

Rachel: Yeah.

Eli: And then the stuff from LinkedIn will be like three-weeks-old digested Hacker News stuff.

Rachel: Yep.

Eli: But sometimes you'll find interesting things on Hacker News. Really, I'll read papers if something's new and I really want to learn. And honestly, going to ChatGPT to ask about it has been extremely helpful. ChatGPT and Claude.

I've found this to be like the superpower use case for LLMs for me, where I can sort of do interactive learning. I'll start off at like a lower level of understanding.

So if I really want to dig into, like, transformers, and attention sinks and all that, I'll ask it, "Give me a lesson plan. Tell me about these things."

And then I'll sort of ask it more about a specific thing. Then, the really cool thing is, I'll ask it to evaluate my mental model. I'll present, "Here's how I think it works," and it'll be like, "That's right, that's right. This is not quite right."

And that's a really great way of doing interactive understanding. So that's where I go to learn about AI.

Rachel: Have you tested that in domains where you're pretty deep? Have you compared it to your own existing knowledge about certain fields?

Eli: That's a great idea. I haven't actually. I should. I think the problem is I get kind of bored of that very quickly, 'cause it's weird.

Rachel: Yes. Yeah.

Eli: So, which is interesting: if I'm asking about TypeScript, which I'm good at but I'm not like an expert in, it'll give me back some answers, and I'll be like, "Wow, it's really smart."

Then I'll ask about Python, where I've had like way more experience, or Java. And it'll give me back answers, and I'm like, "Eh, you don't know this thing that's new. That's a classic beginner's mistake you just made."

So from that experience, I think it hasn't been super helpful in domains that you understand well. On the other hand, if you go into it thinking that you don't have any domains that you understand well, so go into it with humility, it'll probably teach you something.

Rachel: Yeah. I mean, that's kind of my mental model for it now. You know, it's spicy predictive text. It can get you up to mid. So if you're bad at something, it can get you up to average.

But beyond average, it's still down to you. It's still down to you to understand nuance, and edge cases, and all of the subtlety that creates real mastery. I think where these things are going to be really useful is in bringing the bottom up, and letting us build from a more sound foundation.

Eli: I think so. So I have a rant that I go on when I talk about AI stuff with, like, fellow YC founders.

Rachel: Mm-hmm.

Eli: And I think they're all sick of it by now. But the basic idea is that an LLM can function as two things. It can be a database, and it can be a reasoning agent.

So it can sort of help you understand. And it obviously uses database information to act as a reasoning agent. It's actually bad at both of these.

Rachel: Yes.

Eli: But it's good enough at both of these to provide use. In some cases, you can sort of push it in the right direction, and it can be better at one or the other depending on how you build it. So the database gets at that first part, getting you from zero to mid. Right?

Which I think is actually important. People spend a lot of time on that. And if we can get people to learn faster, then we can get them into expertise quicker, and then make sort of more growth as humanity.

But getting from mid to expert, I think there is still a use case for it to help you think about things. And maybe it's as simple as something that you'd call rubber ducky debugging.

Rachel: Yeah.

Eli: Or golden lab debugging, as my old mentor used to call it, where you have some code, you start talking about it, and if you talk to a wall, or a golden retriever who's just going to look back at you all cute, you'll start realizing the bugs that you had.

It's sort of the first line of defense. So maybe it's the forcing you to think about it, plus the added sort of thinking back, that's helpful enough to help us learn in the more expert case.

Rachel: And it's back to maps, isn't it? We've just invented maps. We've got this huge n-dimensional space of written perceptions about the world that this thing has been fed on, and it knows a little bit about how those parts are related, and a little bit about how to navigate it. And yeah, it gives us sort of a crystal that we can catalyze other solutions around.

Eli: Absolutely. On a side note, have you ever asked it to create maps?

Rachel: I have not. What happens?

Eli: It is utter junk. Even the best models aren't good at this. It'll be like, okay, "Draw me a map of California." It'll have like two bays. It'll be like, "San Franbopula."

Rachel: I mean honestly, I feel bad for these things if they are sentient, 'cause we're keeping them in like a float tank, a sensory deprivation tank, and peppering them with questions. And the poor things really need to get out and touch grass.

Alright, I'm making you God Emperor of the solar system for the next five years. You get to decide how everything goes. What does the world look like?

Eli: So yeah, for the next five years, I'm excited to see, like, code standardization. I think this is a natural thing in computing: when you build higher and higher level frameworks, the lower level pieces get more and more standard, right?

Like, "Here's the different operating systems we use. Here's the different programming languages we use." And I think on this asset layer, there's a lot of room for standardization. So, not as God Emperor, but as a startup founder, I would sort of push towards that. That's kind of what we're working on. Which will allow people to move faster as they come up with the right abstractions.

Especially in sort of the data space, understanding how data flows through, and really being able to do stuff with the data, as opposed to arguing about sort of the ways to structure code.

I think that if I had my way, people would start realizing how you can use AI to help, in just sort of the ways that we talked about, and not think about it as this one cheap solution that'll solve all of our problems, or this one cheat that'll get them rich, and really think about, "Okay, how can we use it to solve our problems?"

How can we use it to reason better? How can we use it to teach people, to make people more powerful? So yeah, push it in that way. And then obviously as God Emperor, I would be very excited about the two things that I'm really excited about, which are fusion and quantum compute.

I think these are pretty essential to getting us to the next stage. So as God Emperor, I'd push towards abundant energy. I think that is the sort of underlying problem that we're all facing. And I would absolutely love to see even more progress on that.

I mean, obviously there's so much money in fusion and quantum compute, but I think that's just-- betting on the future is huge, and those are the sort of clear areas that we can improve in.

And then finally I would see if I could get people off social media, and sort of push them back into the real world. I'm starting to realize that in the Covid era, we all started working from home. We have been isolated, and I think it's kind of changed the way that we think, quite a bit.

So I think there's a lot of strength in local communities, and connections among people, even in online communities. But really, I think there's a way in which we've kind of lost our way, and are just sort of thinking in short-term crazy mode because of the sort of volatility of the internet.

Rachel: Yeah, it's fascinating how little the Spanish influenza is directly mentioned in 1920s literature, and yet the whole Roaring Twenties was clearly a post-traumatic response to both World War I and the huge wave of death in the Spanish flu.

Eli: Yeah, that's a great point. I mean, I am kind of worried that our reaction to Covid is just a Roaring Twenties, but on the internet, which sounds a lot less fun. Okay, maybe I'm old school. Maybe Gen Z has this figured out.

Rachel: Last question. My favorite. I'm giving you a generationship. We're going to the stars together. What are you going to name it?

Eli: I want to call it "Morning Sun."

Rachel: Oh that's a good one.

Eli: Like the sort of New Horizons. It's the name of a street that is dear to me. And yeah, Morning Sun, it's the sort of promise of the future.

Rachel: We will get that built for you right away, Sir.

Eli: Please do.

Rachel: Thank you so much for coming on the show, Eli. It's been a pleasure.

Eli: It was a blast.

Rachel: Good luck with everything.

Eli: Bye.

Rachel: Take care.