This week on The Data Stack Show, Eric and John welcome Misha Laskin, Co-Founder and CEO of Reflection AI. Misha shares his journey from theoretical physics to AI, detailing his experiences at DeepMind. The discussion covers the development of AI technologies, the concepts of artificial general intelligence (AGI) and superhuman intelligence, and their implications for knowledge work. Misha emphasizes the importance of robust evaluation frameworks and the potential of AI to augment human capabilities. The conversation also touches on autonomous coding, geofencing in AI tasks, the future of human-AI collaboration, and more.
Highlights from this week’s conversation include:
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Eric Dodds 00:13
Welcome back to The Data Stack Show. We are here today with Misha Laskin, and Misha, I don't know if we could have had a guest who is better suited to talk about AI, because you have this amazing background, you and your co-founder, working in sort of the depths of AI, doing research, building all sorts of fascinating things. You know, being part of that history, the acquisition by Google and, you know, the DeepMind side, and some amazing stuff there. So I am humbled to have you on the show. Thank you so much for joining us.
Misha Laskin 01:05
Yeah, thanks a lot, Eric. It's great to be here. Okay, give us just a brief background on yourself, like the quick overview. How did you get into AI, and what was your high-level journey? So initially, I actually did not start in AI, I started in theoretical physics. I had wanted to be a physicist since I was a kid. And the reason was, I just wanted to work on what I believed to be the most interesting and impactful scientific problems out there. And, you know, the one miscalibration that I think I made is that when I was reading about all these really exciting things that happened in physics, they actually happened basically 100 years ago, and I sort of realized that I had mistimed it. You know, you want to work on not just impactful scientific problems, but the impactful scientific problems of your time, and that's how I made my way into AI. As I was working in physics, I saw the field of deep learning growing and all sorts of interesting things being invented. What actually made me get into AI was seeing AlphaGo happen, which was this system that was trained autonomously to beat the world champion at the game of Go. And I decided I needed to get into AI then. So after that, I ended up doing a postdoc at Berkeley, in Pieter Abbeel's lab, which specializes in reinforcement learning and other areas of deep learning as well. And then I joined DeepMind and worked there for a couple of years, where I met my co-founder as we were working on Gemini and leading a lot of the reinforcement learning efforts that were happening on Gemini at the time,
John Wessel 02:41
yeah, so many topics we could dive into, Misha. So I'm gonna have to take the data topic. I'm really excited to talk about how data teams look the same and how they look a little bit different when working with AI data. What's the topic you're excited to dig into?
Misha Laskin 02:57
I think on the data side, there are many things I'm really interested in, but something I'm really interested in is, how do you set up evaluations on the data side that ensure that you can predict where your AIs will be successful? Because when you deploy to a customer, you don't know exactly what the customer's tasks are, and so you need to set up evals that allow you to predict what's going to happen. And a big part of what a data team does is setting up evaluations. It's maybe one of the last things that a lot of people think about when they think about AI, because they think about language models and reinforcement learning and so forth. But actually, the first thing that any team needs to get right in any AI project is setting up clear evaluations that matter, and so on the data side, that's something I'm really interested in.
Eric Dodds 03:50
Misha Laskin 04:43
Yeah, I considered cowboy first. But yeah,
Eric Dodds 04:46
cowboy, fireman, physicist.
Misha Laskin 04:51
Well, what happened is that I'm not from the States originally. I'm Russian and Israeli, and then I came to the States as a kid. And when I moved, I didn't really speak the language very well and didn't have a community here, so I ended up having a lot of time on my hands. My parents had a library, you know, a number of different kinds of books, but one of the things they brought with them was the Lectures on Physics by Feynman. It's kind of a legendary set of books that I recommend anyone read, whether you're into physics or not, because it's an example of really clear, simple, and very beautiful thinking. And I read those books, and it was just so interesting, the way Feynman described the physical world, the way you could make really counterintuitive predictions about how the world works just by understanding it from a set of very simple assumptions, very simple equations. The short answer is, I had a lot of time on my hands and got interested, actually, in a lot of things. At the time I got interested in literature as well, and ended up double majoring in literature and physics, but then ended up hard committing to physics. Wow,
Eric Dodds 06:11
absolutely fascinating. Yeah, when we were chatting before the show and you said, you know, I realized I was 100 years too late, I was like, oh, theoretical physics. The answer to that problem is, you know, traveling through time, yeah. So you can get back to, you know, back to that era, yeah.
Misha Laskin 06:27
Well, it might be that the problems that we have in physics today are just so hard that it's really hard to solve them. I think that progress is definitely not being made nearly as quickly as it was 100 years ago. And there's so much to discover. One of my kind of hopes with AI is that we develop AIs that are smart enough as scientists that they help us answer these fundamental questions that we have in physics, which, to me, seemed like a complete sci-fi thing even a few years ago. But now, almost counterintuitively, I think theoretical math and theoretical physics are going to be some of the first use cases that we apply the next generation of models to. Fascinating.
Eric Dodds 07:08
Yeah, let’s dig into that a little bit. Because one of the questions, just one of my burning questions to ask you, was, what do you know? What do you envision the future with AI to be like, what does that look like for you? Maybe in sort of some of the like, the best ways possible. So, for example, you know, AI can help scientists accelerate progress on these, like, monumentally difficult problems to create breakthroughs. I mean, that’s incredible. What other types of things do you see in the future that make you excited and in the ways that humans will interact with AI or the way that it will, you know, shape the world that we
Misha Laskin 07:49
live in? Yeah, I think that, I mean, I'm personally quite optimistic about AI. Obviously, there are a lot of things that we need to be careful about from a safety perspective. But there's one quote that I heard a friend say that really stuck with me, about artificial general intelligence, AGI. He said, you know, I think AGI will come and no one will care. I hadn't heard that before. Then I thought about it, and I think that's what's going to happen. Think about it from this perspective: we have personal computers today, which is a massive leap from what people had decades ago, or personal phones. And I would say we don't care, like, we just don't know our lives any other way. We don't know what life is like before computers or before personal phones, even though, right, I remember what it was like not having an iPhone, but from a day-to-day perspective, I never even think about it. So I think what's going to happen is that all of the ways in which AI is going to transform us are going to be similar in perception to the way technology has transformed us already. And what I mean by that is that in AI there are oftentimes really polarizing takes: either hyper-optimistic, it's going to be a completely transformed world, which obviously it is, or doomsday scenarios where things go really poorly. And I think the reality is that it's a remarkable piece of technology that's probably more transformative than mobile phones or computers themselves, but the effect on us as people is going to be that we just live our day-to-day lives. It changes our day-to-day lives, but we won't even remember what life used to be like. So, for example, from a work perspective, now we don't really take notes with pencil and paper, right? We have much better storage systems on the computer for notes and things like this. We've accelerated the amount of work we can do, the knowledge work we can do, just by having a computer. And I think there's going to be some massive increase in productivity, especially in knowledge work to start, and in physical work as well. But let's just think about knowledge work. In the future, and this is kind of how I at least think about AGI, it's a system that does the majority of knowledge work on a computer. And what I think that means is it's not like a zero-sum pie, where we go from today doing, let's say, almost 100% of knowledge work on a computer to us doing 10%, AI doing 90%, and now we're doing 10x less work. I think it's going to be that we work about the same amount that we did before, but we're getting 10x more things done, and we don't even remember what it was like to get the amount of things done that we do today. I think that's what the world is going to look like. Well,
John Wessel 10:52
that fits the historical curve too, right? Like we don’t even know what it’s like to sit down and hand write a memo and then wait several days for it to get delivered, right? Like, compare that to email, for example. So it seems like it would fit that curve, right? Like, you get that drastically faster, more leveraged life, and it’s just the life you live. Yep,
Eric Dodds 11:17
yeah, absolutely fascinating. One follow-up question to that, Misha. You talked about your co-founder developing some pretty amazing technology that could mimic what humans do, right? And actually, you mentioned seeing the world champion, the world human champion, get beat by AI. And then I believe your co-founder developed an autonomous system that could play video games by looking at a screen, which is pretty wild. And one of the interesting things, maybe you can give us some insight into the research aspects of this, is that replicating things that humans can do seems to be a consistent pattern. But one thing that's interesting about your perspective, where we sort of go about our day-to-day work and get 10x throughput, is: is that a replacement of some of the things that I'm doing as a human? Is it augmentation? Can you speak to that a little bit? Because just replicating the keystrokes that I make on my computer isn't necessarily the way to get 10x, right? And I think we know that context is something that AI is amazing with, right? It can take context and really do some amazing things with it. So can you speak to that a little bit in terms of replicating humans, augmenting humans, what does that actually look like?
Misha Laskin 12:44
Yeah. So the first thing that I'll say is that the kind of algorithms developed leading up to this era that we're in right now, the things that you mentioned too that Ioannis, my co-founder, worked on, which we call deep Q-networks in the case of video games and AlphaGo in the case of the Go example, were actually superhuman. So they got to a human level, and then they exceeded it and became superhuman. So when you look at an AI system playing Atari, it looks kind of alien, because it's just so much better than a human could be. And the same thing was true for Go. Now, what you said was right about the way these systems are trained. Let's take AlphaGo as an example. It had two phases. The first phase was, you train it to mimic human behavior. So you have all these online games of Go, there are a bunch of online Go game servers, and they picked a bunch of those games, filtered them for expert amateur humans, and taught a model to basically imitate expert amateur human behavior. And what that ended up getting was a model that was pretty proficient, but still just kind of a human-level model. And then after that, they trained that human-level model with reinforcement learning, based on feedback of whether the model was winning the game or not. And the thing with reinforcement learning is that you don't need demonstrations from people, you just need a criterion for whether the thing that the model did was correct. As long as you have that, which in the case of the game of Go is, did you win the game or not, you can basically push it, if you throw enough compute at it, to a superhuman model. It will just find strategies people have never even thought of, and that's kind of what ended up happening. There's a famous move called Move 37 in the game AlphaGo played against Lee Sedol, the world champion in Go. And Move 37 was a move that looked really bad at first. Analysts looking at it were confused, and Lee Sedol was confused. Everyone was just really confused by it. Then it turned out, a few moves later, that it was actually a really creative play that was just really hard for people to wrap their minds around, and it turned out to be the right play in retrospect. So what I'm trying to say is that we have the blueprints for how to build superhuman intelligence systems, and so I think we are heading into an era of superintelligence. Now, it does not necessarily mean superintelligence at everything, but we will have models that are superintelligent at some things.
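To make the two-phase recipe Misha describes a bit more concrete, here is a minimal sketch: a toy softmax policy first imitates "expert" moves via supervised updates, then improves with a REINFORCE-style update from a win/loss reward. The environment, the expert rule, and the hyperparameters are invented purely for illustration and are not AlphaGo's actual architecture or training setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: a feature vector maps to a distribution over N_ACTIONS.
# Purely illustrative; AlphaGo used deep networks and the game of Go.
N_FEATURES, N_ACTIONS = 8, 4
W = rng.normal(scale=0.1, size=(N_FEATURES, N_ACTIONS))  # policy weights

def policy(state):
    logits = state @ W
    p = np.exp(logits - logits.max())
    return p / p.sum()

# ---- Phase 1: imitation learning on "expert" (state, action) pairs ----
expert_states = rng.normal(size=(500, N_FEATURES))
expert_actions = (expert_states[:, 0] > 0).astype(int)  # stand-in expert rule

for _ in range(200):
    grad = np.zeros_like(W)
    for s, a in zip(expert_states, expert_actions):
        p = policy(s)
        grad += np.outer(s, np.eye(N_ACTIONS)[a] - p)   # log-likelihood gradient
    W += 0.01 * grad / len(expert_states)

# ---- Phase 2: reinforcement learning from a win/loss signal ----
def play_episode():
    """Toy one-step 'game': reward +1 if the action matches a hidden rule."""
    s = rng.normal(size=N_FEATURES)
    p = policy(s)
    a = rng.choice(N_ACTIONS, p=p)
    reward = 1.0 if a == int(s[1] > 0) else -1.0        # rule the expert never used
    return s, a, p, reward

for _ in range(2000):
    s, a, p, r = play_episode()
    W += 0.05 * r * np.outer(s, np.eye(N_ACTIONS)[a] - p)  # REINFORCE update

print("post-RL win rate:", np.mean([play_episode()[3] > 0 for _ in range(1000)]))
```

The key point mirrored here is that phase 2 needs no human demonstrations at all, only a success criterion, which is what lets such systems push past the level of the data they imitated.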
Eric Dodds 15:35
Well, I think that's a great time to talk about Reflection, because that's the focus of what you're trying to do. So tell us about Reflection and what you're working
John Wessel 15:48
before we jump into that, just because I think I've seen a lot of this thrown around in, like, news articles and stuff. So you've got AGI, right? And you've got superhuman, and I think there's been some chat around that, like, oh, we're moving past AGI to superhuman. It'd be awesome, I think, for the listeners to just take a minute and be like, all right, what do we mean by AGI? Obviously, that's general intelligence, versus superhuman. Just parse that out for them a little bit, because these words are just getting thrown around and people repeat them, right? You know,
Eric Dodds 16:19
what does it mean to go beyond human-level proficiency and be superhuman? Yeah,
16:24
right, yeah,
Misha Laskin 16:27
yeah. And I think, you know, another word to put into the mix, that may be good to talk about later, is the word agent, right? Yeah, let's throw that into this too, exactly, it means many things. So at least the way I think about it is, first, I don't think about binary events, like there's AGI and then there's superintelligence. I think about it more as a continuous spectrum. In the game of Go, for example, it's really hard to pinpoint a moment when it went from human-level intelligence to superhuman. The curves are smooth, so it's kind of a smooth continuum. And even, you know, subhuman intelligence, it's smooth from subhuman to human up to superhuman. So it's really around whether we have discovered methods that scale, where the more compute and data we throw at them, the more predictably they scale in their intelligence. Those are the kinds of systems that we're talking about. So to answer your question, to me, the distinction between subhuman intelligence, human intelligence, and superintelligence is just where on the smooth curve of intelligence you are. Now, it's helpful to define what some of these things are, and different people have different definitions for AGI. I don't think the community has converged on a definition everyone agrees on, but we have a working version that is meaningful to us, and that's kind of how we think about AGI. It's a functional definition. We're thinking about digital AGI, and we think the same thing can be applied to physical AGI. We don't know exactly what form it takes, it can be a model, it can be a model with tools on a computer, but it's a system that does the majority of knowledge work on a computer. And notice, I'm not saying the majority of knowledge work that people do today, because I think the knowledge work that's done even a few years from now is going to look largely different. So at a given point in time, when you assess the work being done on a computer that's generating economic value, is the majority of that being done by humans or by the computers themselves? To me, that's kind of what AGI is. It's more a functional definition. And what that means is that the only benchmark that matters is whether AI is doing meaningful work for you on a computer. It doesn't matter what the math benchmarks say. None of the academic benchmarks matter whatsoever. All that matters is, is it doing meaningful work for you on a computer or not? And so what's an example of a product that I think makes a real impact along that kind of benchmark? Let's say GitHub Copilot. With GitHub Copilot, you can just track the amount of code that it writes versus the amount of code that a person writes. Now, of course, you also have to decouple the amount of time a software engineer thinks about the design of the code and things like this, but it's hard to argue that it's not doing work on a computer, it's definitely doing some work on a computer. So on the smooth spectrum from subhuman intelligence to human intelligence to superintelligence, I think Copilot is on that spectrum, right?
It might not be general intelligence, but it’s on the way there.
John Wessel 19:59
Okay, so quick follow-up, and then I definitely want to dig in on Reflection's application of superhuman intelligence. Something that's frustrated me a little bit about how we talk about this is we've got this curve that you just explained, but then we treat human intelligence as a static factor, like some kind of standard to get to. And the way I think about it is that human intelligence has changed over time, for sure, and will continue to change. And there's an aspect of, whenever we talk about AGI, like, when is AGI gonna happen, it's like, well, I think humans are gonna get more intelligent too. And even with the game of Go example, I would think it's very possible that if somebody used this model to essentially learn new Go strategies, then they're better too. Now maybe the AI is still better than them overall. So, maybe just briefly, love your thoughts on that.
Misha Laskin 20:58
I think that's actually exactly what's happened, that the Go community and the chess community both learn from the AI systems now. What made Move 37 special, people analyzed it and have incorporated that into their game play. One of the things I'm really excited about is, I just remember what my life was like as a theoretical physicist, which was very theoretical-physicist, like, you know, write equations on a chalkboard, derive things with pencil and paper. You basically sit in the room, think really hard, derive things, go talk to collaborators, and try to sketch out ideas on the chalkboard. And what I'm really excited about with AI, especially once it's superintelligent in some aspects of physics, is that it's going to be this sort of patient and infinitely available thought partner for scientists to do their best work. So I think, for a while, it's going to be the combination of the scientist together with an AI system, working together to accomplish something. Because something that's kind of counterintuitive is we usually think about intelligence as this very general thing, because humans are generally intelligent, and these AI systems are generally intelligent and will continue to be as well, but general in their case means something different than in our case. That is to say, they can be intelligent across many things, but there are some things where they're not going to be as intelligent that are counterintuitive to us, because you're like, wait, that's so easy for us, right? It's kind of like, yeah, we have these systems for playing Go, but it's really hard to train robots to, you know, move a cup somewhere or something like this, right? So that's how I kind of see the interplay. I think that universal generality as we see it is maybe possible, but for a while these AI systems will end up spiking at many things that are counterintuitive to us and end up being, you know, pretty dumb at many things that are intuitive to us, and we'll sort of co-evolve together with them. Yeah,
Eric Dodds 23:04
yeah, that's such a helpful perspective, Misha. And I want to return to the point that you made around the definition of AGI, or the working definition at Reflection, around AI doing the majority of knowledge work on a computer, but with the important distinction that it's not just a wholesale replacement. So it's not that the human is not even interacting with the computer; it's that the knowledge work that a human does actually changes. And I think that's a really helpful mindset to have, in that when we talk about the future of AI, we tend to think about how it impacts the world as we experience it today, when, in fact, it will be a completely different context, right? There will be new types of work that don't exist today, which is really interesting. So I just appreciate that
John Wessel 24:00
there'll be things that it's bad at. Maybe there'll be more human cup movers, or the equivalent of that in knowledge work. That'd be interesting.
Misha Laskin 24:12
Yeah, there's actually a scene from, I think it was Willy Wonka, you know, Charlie and the Chocolate Factory, and I think it's the Tim Burton, Johnny Depp one, where they show his father being on the conveyor belt line, screwing caps onto tubes of toothpaste, and then one day he gets replaced by a robot that does that. When I was at Berkeley, I studied robotics and, you know, how to make robots autonomous. And I thought about that, and it was like, that's actually a really hard problem. That requires dexterity, all those things that, you know, in the movies we think you can do easily. That was one thing that's counterintuitive. Yeah, sure. That's hilarious,
Eric Dodds 24:53
yeah. I mean, that was truly, truly fantasy, you know, in the movie. Well, let's move over to Reflection. So, tell us about Reflection. I mean, you and your co-founder have backgrounds in research, and I'm assuming that's still a big part, because you're trying to solve some really hard problems, which requires research. But you're also building things that people can use. I know you're still early on the product side of things, but what can you tell us about what you're working on and what you're building?
Misha Laskin 25:29
Definitely happy to share more. So the way we think about our company, and the way we thought about it since we started it, is that we've been on the path, as researchers, of building AGI for the better part of a decade now; that was kind of our interest. Ioannis, my co-founder, joined DeepMind in 2012 as one of the founding engineers, when it was just a crazy thing to even say. It seemed like a complete sci-fi dream that you want to work on AGI, and in the scientific community, most people would kind of ostracize you if that's what you wanted to do, because it was just such a crazy, almost unscientific thing to say, just not serious. So he joined at that time, and this is when these methods in reinforcement learning were developed that resulted in projects like deep Q-networks and AlphaGo. But ultimately, the reason he joined, the reason I joined AI as a researcher, is this belief that at first was pretty vague: maybe we can build something like AGI within our lifetime, so we might as well try, because it seemed like the most exciting thing you can do. But since then, I think it's gotten a bit more concrete. And now I think we're in a world where this definition of a system that does the majority of meaningful knowledge work on a computer is in the realm of possibilities. It doesn't feel like sci-fi to me at all. It seems like something that we're just inevitably headed towards. And so if that's the system you want to build, you then have to think backwards: what does that mean from a product perspective, from a research perspective? And we basically started thinking about, well, what does the world look like a few years from now, once we start making, as a field, a wedge into doing some meaningful knowledge work on a computer? Where does that even start, and what does the world look like? And one useful place to start is that, before language models, we didn't even know what the form factor would be. The fact that language models work was pretty crazy. It surprised everyone, and it still does today. I just remember what the world was like before then, and it's just kind of magic that it even worked. This is one of those things, right? It happened, and we don't care. Like, yeah, language models are just magic.
Eric Dodds 27:48
Yeah, I just want to stop and appreciate that you have been researching AI for a decade, and that's the way that you describe it, that everyone was surprised by this. Because I was thinking, I wonder if Misha, you know, sort of could see this acceleration happening. But it sounds like that was a pretty surprising leap forward.
Misha Laskin 28:11
You know, I saw it happening before my eyes, because, like many researchers, I was on the front line. But there was always this question among many researchers, myself included, which was, yeah, it works at this and this, but will it really scale? And different AI researchers, at different points in their careers, got scaling-pilled and realized that, wow, these things do scale. For some people, it happened earlier. I think for the OpenAI crew, it happened earlier. I was, I would say, somewhere in the middle on that spectrum. So, you know, early enough that I got to be part of the early team on Gemini and really build that out. But still, it felt like I was a bit late to
Eric Dodds 28:52
the game. Fascinating. Okay, sorry to interrupt. So yeah, at Reflection, you are looking several years ahead, imagining what it takes for AI to do a majority of knowledge work on a computer, and you're working back. So where's the focus? Because that's a pretty broad thing, right? Like, knowledge work on a computer is pretty
Misha Laskin 29:18
broad, yeah. So I'll start with the punch line first, and then explain why, just to contextualize. We decided that the problem that needs to be solved is the problem of autonomous coding. If you want to build for this future where you have systems that do the majority of knowledge work on a computer, you have to solve the autonomous coding problem. It's kind of an inevitable problem that just must be solved, and the reason is the following: the way language models are most likely going to interact with a lot of software on the computer is going to be through code. We think about interactions with computers through keyboard and mouse, because the mouse was designed for this. And by the way, the mouse was invented what, like 60 years ago? The Engelbart "mother of all demos" was in the 1960s. So it's actually a pretty new thing, and it was an affordance that unlocked our ability to interact with computers. Now we have to think about it knowing that the form factor that's really working is language models: what is the most ergonomic way for them to do work on a computer? And by and large, it turns out that they actually understand code pretty well, because there's a lot of code on the internet, and so the most natural way for language models to do work on a computer is basically through function calls, API calls, and programmatic languages. And we're starting to see the software world evolve around that already. Stripe, a few months ago, released an SDK that is built for a language model to basically transact on Stripe reliably. And take Excel, for example: do we think that an AI is going to drag a mouse around the way people do, to click a table in Excel and manipulate data that way? Almost certainly not. It's probably going to do it, again, through function calls; we have SQL, we have querying languages. And so we need to think about how we believe software will get re-architected in a way that is ergonomic to the AI itself. That's how we were thinking about things. And if you think about it that way, you realize that there's always going to be a long tail of things that won't have code affordances, but a lot of the meaningful work, a lot of these big pieces of software that people use today, where you do most of your work, will have programmatic affordances. So if that's what you believe the world looks like, with at least significant parts of knowledge work on computers done that way, then the bottleneck problem is: okay, assume the software has all these programmatic affordances, how do you build the intelligence around that? And the intelligence around that is an autonomous coder. It's not just generating code, it's also thinking, it's reasoning. It's saying, I think now I need to go open up this file and search for this information, and then maybe send an email to this person. It needs to be thinking and reasoning, but then it's acting on the computer through code.
So then we kept thinking backwards, about what category today has the affordances that we need to start, is very valuable, and is something that we do all the time and can have high empathy for as product builders. And we inevitably converged on autonomous coding, both because we believe it's the gateway problem to automating a whole bunch of pieces of software that aren't coding, and because coding is the problem setting that is ripe today for language models, because the ergonomics are already there. Language models are good at code because there's a lot of code on the internet, and you can build the tools: to read files, to run a terminal, to read documentation. So it's just the right category today, one that's truly valuable, that we understand very well, and that is also the bottleneck category to the future. That was how we ended up centering on coding.
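As a rough illustration of the "programmatic affordances" idea, here is a minimal sketch of an agent-facing tool registry: instead of clicking through a UI, a model emits structured function calls that get dispatched to named tools. The tool names, signatures, and dispatch format here are hypothetical, invented for illustration, and not Reflection's or any specific vendor's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Any

# Hypothetical tool registry: the agent acts via named functions with
# structured arguments rather than keyboard-and-mouse interactions.
@dataclass
class Tool:
    name: str
    description: str
    fn: Callable[..., Any]

def read_file(path: str) -> str:
    with open(path) as f:
        return f.read()

def run_sql(query: str) -> str:
    # Stand-in: a real agent would hit a database; here we just echo the query.
    return f"(would execute) {query}"

TOOLS: Dict[str, Tool] = {
    t.name: t for t in [
        Tool("read_file", "Read a file from disk", read_file),
        Tool("run_sql", "Run a SQL query against the warehouse", run_sql),
    ]
}

def dispatch(call: dict) -> Any:
    """Execute a model-emitted call like
    {"tool": "run_sql", "args": {"query": "SELECT 1"}}."""
    tool = TOOLS[call["tool"]]
    return tool.fn(**call["args"])

if __name__ == "__main__":
    print(dispatch({"tool": "run_sql",
                    "args": {"query": "SELECT count(*) FROM users"}}))
```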
Eric Dodds 33:29
Fascinating. Talk a little bit about automation. Operating autonomously makes total sense for code; based on your study of AI over the last decade, that's an area that's ripe for this. What are the other areas where you think, in the relatively near term, automation is ripe as well?
Misha Laskin 34:00
There are a bunch of them. The way we think about it, and this is true both for what we're doing and for other companies working in this sort of automation with AI, building autonomous agents, be it for coding or for something else, is that a good analogy here is transportation. We're going from cars as they are today to autonomous vehicles, and I think it lands here as well. The way to think about it is that chatbots, like ChatGPT and Perplexity, and copilots, these products that are much more chat-like, you ask them something, they give you something back, we think about them as the cruise control of transportation vehicles, because they kind of work everywhere. They're not fully autonomous at anything, really, yet, but they work for everyone. So these are general-purpose tools, cruise controls that augment the human. Now, if you're trying to build a fully autonomous experience, what people refer to as agents today, the thinking is much closer to how you would think about designing an autonomous vehicle. Autonomous vehicles don't work everywhere from day one. They have a geofencing problem, and the kind of player that won there is, you know, Waymo. I took a Waymo when I was in San Francisco last, and it was just this magical experience. They did a fantastic job by basically nailing San Francisco, and they geofence it. You can't be on highways, you can't do all these things that you can do in a normal car, but within the geofenced area it works so well that it is just a transformative, magical experience. And I think that is how people should be thinking about autonomous agents. We shouldn't be over-promising. The promise is a fully autonomous vehicle in the future, or, in our case, a thing that automates a lot of stuff on a computer in the future; it's clear where things are going. But today, the important problem is geofencing. And so what are examples of that? I think customer support is an area where this kind of workflow has been shown to work really well. How does the geofencing analogy transfer there? Some tickets that your customers are asking about can be fully resolved, maybe they've written a simple question that's actually an FAQ or something like this, and so you'll route that to an autonomous agent that will just solve it. And the tickets that are more complex, you'll send to a human, or if the customer asks for it to be escalated, you'll send it to a human. So I think that successful product form factors in agency and autonomy have this sort of geofencing baked into them. They take on the thing they can do well, and then help the customer route the thing that they can't do well yet back to the normal state of affairs.
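A minimal sketch of what "geofencing baked into the product" might look like in code: tickets the system is confident it can handle go to the agent, everything else, including explicit escalation requests, goes to a human. The confidence function, topic list, and threshold are placeholders invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Ticket:
    text: str
    customer_requested_human: bool = False

FAQ_TOPICS = ("reset password", "update billing email", "cancel subscription")

def confidence_inside_fence(ticket: Ticket) -> float:
    """Stand-in for a real classifier scoring whether the agent can resolve this."""
    return 0.95 if any(t in ticket.text.lower() for t in FAQ_TOPICS) else 0.30

def route(ticket: Ticket, threshold: float = 0.8) -> str:
    if ticket.customer_requested_human:
        return "human"                      # always honor escalation requests
    return "agent" if confidence_inside_fence(ticket) >= threshold else "human"

print(route(Ticket("How do I reset password?")))             # -> agent
print(route(Ticket("My invoice totals look wrong for Q3")))  # -> human
```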
John Wessel 36:48
So I'm curious about your opinion on this. I think there's an interesting loop here where, yeah, it makes total sense, interacting with this AI thing, and then a human-in-the-loop type thing. But I think there's also this aspect that enough companies have to generally be able to do this well, from a human adoption standpoint, right? Because say the technology was a solved problem, but essentially only 5% of companies use it where it works well. Then humans are going to be like, I want to talk to a person, and they're just going to try to get past the AI agent as soon as possible. So I'm curious about your thoughts on that, because there's this what's-possible problem, and there's also, will humans adopt it, will humans use it? Because you guys must face that building a product. Yeah?
Misha Laskin 37:45
So I think for us and for others, to complete the customer support thing, the ideal experience is the human doesn't even know. The customer came in, their problem got solved, and they didn't know, they didn't care, or,
John Wessel 37:57
you know, give it a name, give it a face, right? Yeah. And
Misha Laskin 38:01
that's the way we think about autonomous coding. So when we think about geofencing, we think you want to go for tasks that are actually pretty straightforward for an engineer to do, because these models aren't, you know, super capable yet, but you want these tasks to be things that are tedious and high volume, that engineers don't like. There are so many examples of these things, like code migrations. So much of a migration, when you're going from this version of Java to that one, is kind of thankless work. Or writing tests. Or suppose you're relying on a bunch of third-party APIs or dependencies, and one got an API update that wasn't backwards compatible, your code fails, and your engineer has to change what they were doing to go fix that. Again, it's sort of undifferentiated work. Especially companies that have very sophisticated engineering teams and are doing a lot end up having a backlog of these kinds of small tasks that are not really differentiating tasks for them as a business at all. And so these are the kinds of tasks where a product like ours comes in. When customers talk to us, they don't even necessarily think of it like a copilot product, because they think, if we can just automate some subset of these for them, some subset of the migration tasks, or third-party API migrations, or whatever subset of their backlog, then it's something their engineers never even had to do, whereas a copilot helps them do the things that are on their plate faster. And interestingly, from the developer's perspective, much like the customer support use case, for the tasks where it works, it should be indistinguishable from a competent engineer sending them a pull request to review, right? A failure mode for a company that does autonomous coding is that you took on more than you could chew, and your agent is sending bad pull requests, and now the developers are wasting their time. So you have to be pretty strategic about the tasks that you pick, not over-promise, set expectations correctly, and deliver an experience that is basically indistinguishable from a competent engineer doing this. Yeah,
John Wessel 40:18
so that's really interesting. So essentially, and I mean, this is an overused term, but essentially this could look like some kind of self-healing component of an app. From an engineer's perspective, you could engineer the rest of the app, and it's able to autonomously take care of API updates and maybe, you know, a couple of other things. Yeah, that's really interesting.
Eric Dodds 40:45
One question I have is around what it takes to get to fully autonomous, right? So you used the example of tests, or API integrations, or other things like that. And you used the example of self-driving vehicles, even within the context of geofencing. For Waymo, still, in the development curve of that, they had vehicles that could do a lot of stuff, but the last 10 or 20% was really hard, because they had to deal with all these edge cases. And geofencing helped limit the scope of that, but it was still really difficult to solve for all those edge cases. Is it the same way when you think about autonomous coding? Is the last 10% really difficult, to go from something that mostly works to something that is truly autonomous?
Misha Laskin 41:47
Yeah, I think there's kind of a yes and no part to that. The part where I think the analogy to autonomous vehicles breaks is that an autonomous vehicle is truly autonomous, and safety is so important that there's absolutely no way it can do anything wrong, right? But in this instance, suppose a coding agent did most of what you asked it to do but missed some things. Well, if what it did was pretty reasonable, then you just go into code review and tell it, hey, you missed this, just like you would with developers. So I think the failure tolerance, there's more tolerance in digital applications like this. Now, what you want to avoid is a model that you asked to do something and it came back and just wasted your time, right? The amount of time it takes you to go back and forth is just wasted time. It's similar to hiring: if it's someone who, let's say, was just not trained as a software engineer, it would take longer to upskill them and train them to be a software engineer than to just do the task yourself. So I think the actual eval is, is this net beneficial to you as a developer? Are you spending less time doing things you don't like to do with this system, or not? Rather than meeting that level of perfection in autonomy. Makes total
Eric Dodds 43:14
sense. Okay, you mentioned evals earlier in the show, and how you said that's one of the most important aspects of this, especially as it relates to data. So I think the last topic we should cover is your question, John, which we made everyone wait a really long time for, around, you know, data teams. Yeah, exactly. So John, why don't you revisit your question? Because I want to wrap up by talking about the data aspect of this. I mean, I could keep going, asking a ton of questions, this is so interesting, but,
John Wessel 43:52
yeah, I think, you know, obviously a lot of our audience works on data teams, and I'm personally curious, and I bet a lot of the audience is curious, about what that looks like. So say I'm on a data team that works for Reflection, dealing with AI agents on a daily basis. How is it similar to what I might do at a B2B tech company or in another industry, and what are the main differences?
Misha Laskin 44:23
As I mentioned earlier, when you first asked the question, I think the thing that is possibly most important to any successful AI project, product or research, is getting your evaluations right. Actually, the most successful AI projects typically start with a phase where they're not training any models, not doing anything like that. They're just figuring out, how are we going to measure value and success? And the reason is, when you see all these coding products and AI products in the market, there's a sort of shooting-from-the-hip thing, where it's, I put it through some workflow, here you go, customer, does it add value or not? Whereas the way I've seen successful products like this built out is, for example, when a company develops a language model, like a GPT model or Gemini, how does it know that the thing it's training is something users will like? You have to develop all these evaluations internally that are really well correlated to what your customers actually care about. In the case of chatbots, that evaluation is basically preferences. So what does a data team do for a normal language-model, chatbot-like product? They get a lot of data from human ratings. They have different prompts, and those raters basically say which responses they liked more than others. Typically that means the thing that gets rewarded is more helpful responses, things that are formatted nicely, things that are safe and not offensive. And it's really important that those evals you're benchmarking internally actually correlate with what your customers care about in your end product. I think that's kind of a new way of operating, because these systems aren't deterministic like software as we know it, and so when you're shipping something that is probabilistic, that is going to work in some cases and not work in other cases, you have to come in with some degree of confidence. When we're coming to a customer, sometimes their use cases will not be a good fit for us, because we built evals and were able to predict that actually, for these use cases, models are not ready yet, yeah.
John Wessel 46:46
Can you give us an example of just a really simple eval? Like, what would that look like? Yeah?
Misha Laskin 46:53
So, for example, for coding, that's what we're building, these autonomous coding models. What is the eval there? The eval there, from a customer perspective, is: do they actually merge the code that our system writes, and how long of an interaction, how much back and forth, does it take them to merge it? So then the question is, well, we want that experience to be delightful for customers, and we don't want to set up complex evals for every customer, because that's just going to be a waste of their time. So how do we set up internal evals that are representative of what our customers care about? An example of this is, if we care about the merge rate, the merge rate of pull requests from our customers, then we should be tracking the merge rate on similar kinds of tasks to what our customers have. So we have different task categories, like migrations, cybersecurity vulnerabilities, these sort of third-party API breakages. And what your data team does, on the eval side of it, is curate datasets that are representative of that, and then for every version of our model, we run it through these evals. We have different evals for different use cases, and we see where our models stack up. Some of them do better, some do worse, but it allows us to come to customers, when we've identified a use case that is a good fit, with high confidence that it'll be a delightful experience. And I don't think most teams that build products, that maybe do not come from a research background, are as scientific about it, because setting up the eval takes a really long time and is a pretty complex process. Where are you going to source the coding raters who are going to rate whether you would merge these things or not? How are you going to manage that team? Where are you going to source the tasks that are representative of what your customers care about? These are the kinds of questions that the data team answers, and more. Beyond that, it's how do we collect the data that we need to train models to be good at the things that the customers care about, in various aspects: how do we collect data for supervised fine-tuning, how do we collect data for reinforcement learning? So on the data team, you need to be as nimble on data research as you are on software and model research. We think a lot about algorithms and model architectures and things like that, and the thing that is maybe equally important but less frequently talked about, like in papers, is the operational data research that needs to go in to make sure that these systems are reliable at the things you care about.
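As a rough sketch of the kind of internal eval Misha describes, here is a toy harness that curates tasks by category and tracks a proxy for merge rate per category for a given model version. The task categories echo the ones mentioned above, but the data structures, the stand-in model call, and the mergeability check are all hypothetical placeholders, not Reflection's actual eval stack.

```python
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class EvalTask:
    category: str          # e.g. "migration", "api_breakage", "security_fix"
    prompt: str
    reference_check: str   # stand-in for "would a reviewer merge this?"

def run_model(prompt: str) -> str:
    """Stand-in for calling the actual coding model."""
    return f"patch for: {prompt}"

def would_merge(patch: str, reference_check: str) -> bool:
    """Stand-in for a rater or automated check deciding if the patch is mergeable."""
    return reference_check in patch

def merge_rate_by_category(tasks: list[EvalTask]) -> dict[str, float]:
    hits, totals = defaultdict(int), defaultdict(int)
    for t in tasks:
        totals[t.category] += 1
        if would_merge(run_model(t.prompt), t.reference_check):
            hits[t.category] += 1
    return {c: hits[c] / totals[c] for c in totals}

tasks = [
    EvalTask("migration", "upgrade library X from v1 to v2", "upgrade library X"),
    EvalTask("api_breakage", "fix call sites broken by API change", "fix call sites"),
]
print(merge_rate_by_category(tasks))
```

The point mirrored here is the one in the conversation: the internal metric (per-category merge rate) is chosen because it correlates with what customers actually care about, so it can predict which use cases are ready before the product ever reaches them.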
John Wessel 49:37
Right, I love that. It is so interesting, and very true of people too.
Eric Dodds 49:41
Well, I was just gonna say there's a lot of timeless wisdom in that approach as well. Well, as we say, we're at the buzzer. Misha, I do want to ask one really practical question. I know Reflection is still, you know, in stealth mode in many ways, but I know probably a lot of our listeners have tried or are exploring different tools for augmenting the technical work that they do every day. From your perspective, if someone is saying, okay, I see all these posts on Hacker News about these tools and bots, or copilots, that can help me write code, where would you encourage people to dig in if they feel either overwhelmed or they're kind of new to exploring that space of AI-augmented technical work, and coding specifically?
Misha Laskin 50:35
I think that if people are just kind of dipping their toes in and getting started exploring this space, the best thing is to use coding products like a Copilot or a Cursor, which are, as we were talking about, kind of the cruise-control products. That's how I actually started, using both products. A lot of members of our team use those products, and they've been very, very informative, and, as I said, in a sense complementary. I think that getting autonomy right and getting agency to work is a more complex and nuanced problem, and typically what we find when we talk to customers is that by the time they're thinking about automation and agency, they've already been using a copilot for some time, and they're pretty well educated on what kinds of problems they believe they have that can be automated. So if it's someone coming from a blank slate, I would probably take an off-the-shelf product like a Copilot or Cursor, give that a shot, and start trying it out empirically to see what sorts of value it's driving for them.
Eric Dodds 51:42
Love it, all right. Well, Misha, best of luck as you continue to dig into research and build product, and when you're ready to come out of stealth mode, of course, tell John and me so we can kick the tires. We'd love to have you back on the show to talk about some product specifics in the
Misha Laskin 51:58
future. That sounds great. Thanks, Eric, thanks John for having me.
Eric Dodds 52:02
The Data Stack Show is brought to you by RudderStack, the warehouse native customer data platform. RudderStack is purpose built to help data teams turn customer data into competitive advantage. Learn more at rudderstack.com
Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
To keep up to date with our future episodes, subscribe to our podcast on Apple, Spotify, Google, or the player of your choice.
Get a monthly newsletter from The Data Stack Show team with a TL;DR of the previous month’s shows, a sneak peek at upcoming episodes, and curated links from Eric, John, & show guests. Follow on our Substack below.