Episode 208:

The Intersection of AI Safety and Innovation: Insights from Soheil Koushan on LLMs, Vision, and Responsible AI Development

September 25, 2024

This week on The Data Stack Show, Eric and John welcome Soheil Koushan, a member of the technical staff at Anthropic. During the conversation, Soheil discusses his journey from self-driving technology to his current work at Anthropic, focusing on AI safety and Claude's capabilities. The conversation also explores the allure of machine learning, the challenges of ensuring AI safety, and the dynamic between research and product development. The group emphasizes the importance of responsible AI development and the complexities of defining safety across different cultures, while also highlighting the transformative potential of AI, the need for ongoing dialogue about its implications, and so much more.

Notes:

Highlights from this week’s conversation include:

  • Soheil’s Background and Journey in AI (0:40)
  • Anthropic’s Philosophy on Safety (1:21)
  • Key Moments in AI Discovery (2:52)
  • Computer Vision Applications (4:42)
  • Magic vs. Reality in AI (7:35)
  • Product Development at Anthropic (12:57)
  • Tension Between Research and Product (14:36)
  • Safety as a Capability (17:33)
  • Community Notes and Democracy in AI (20:41)
  • Expert Panels for Safety (21:38)
  • Post-Training Data Quality (23:32)
  • User Data and Privacy (25:32)
  • Test Time Compute Paradigm (30:54)
  • The Future of AI Interfaces (36:04)
  • Advancements in Computer Vision (38:46)
  • The Role of AGI in AI Development (41:52)
  • Final Thoughts and Takeaways (43:07)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Transcription:

Eric Dodds 00:06
Welcome to The Data Stack Show.

John Wessel 00:07
The Data Stack Show is a podcast where we talk about the technical, business, and human challenges involved in data work.

Eric Dodds 00:13
Join our casual conversations with innovators and data professionals to learn about new data technologies and how data teams are run at top companies. Welcome back to the show. We are here with Soheil Koushan from Anthropic. Soheil, we’re so excited to chat with you. Thanks for giving us some time. Yeah, of course, I’m really excited to be here. All right. Well, give us just a brief background.

Soheil Koushan 00:42
So I started working in AI in 2018. Self-driving was my first gig: I worked for five years at a self-driving trucking company called Embark, trying to make big rigs that drive themselves. And then I continued my work in AI by joining Anthropic. I joined earlier this year, and I've been working on making Claude better for various use cases, especially in the knowledge work domain.

John Wessel 01:07
Soheil, we talked about so many topics before the show. It is hard to pick a favorite, but I'm really excited about talking about use cases and about common mistakes people make when interacting with LLMs. What are some topics you're excited about?

Soheil Koushan 01:21
Yeah, I think I'd love to just talk a bit about where I think things are going from here.

John Wessel 01:32
Awesome. Well, let's dig in.

Eric Dodds 01:33
Soheil, what interested you about machine learning? I mean, that was something that you wanted to explore. You considered graduate school. You ended up joining a self-driving startup. But of all the different things you could have done in the technical or data domain, machine learning drew your attention. What specific things were attractive?

Soheil Koushan 01:58
Yeah, I might be oversimplifying it, but to me, it felt like magic. Seeing some of the early vision models do things that I, as a software engineer, as a technical person, had no idea were possible, and had no way of explaining how they worked. Anytime something is cool and you don't know how it works, it's indistinguishable from magic; there's a quote that goes along those lines. And then the realization that, wait, I could do this: I could be a magician, I could build this, I could figure out how it works. So I think it was just a shock around what I would have previously thought was impossible.

Eric Dodds 02:35
Yeah, do you remember one of the specific moments when you saw a vision model do something, and, there are probably multiple, but one of the moments where you said, okay, this is totally different?

Soheil Koushan 02:52
Yeah, I think for vision, it was probably the early bounding box detectors. I remember playing around with classical, heuristic ways of trying to understand what's in an image using OpenCV, and then seeing the first real deep-learning-based bounding box detectors that could also track objects over time. After having played around with algorithmic computer vision, seeing, oh whoa, this is way better, it's able to work in a variety of conditions, different angles, was really cool. And I had a very similar moment with LLMs. I remember seeing the GPT-2 or GPT-3 blog post that had Alice in Wonderland, maybe the whole book or maybe a chapter, and it was recursively summarizing it, distilling it from 30 pages to 10 pages, then from 10 to like five, and then eventually one paragraph. That was one of those holy-crap moments for me, because it requires an actual, deep understanding of the content to be able to summarize, and to then be able to do it with new phrasing, new ways of rewriting the story, was a leap that I had never seen before until that point. So those are two key moments that I remember.
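As a concrete illustration of the recursive summarization Soheil describes, here is a minimal sketch using the Anthropic Python SDK. The model name, word thresholds, and prompt wording are illustrative assumptions, not the setup from the blog post he is recalling:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def summarize(text: str, target_words: int) -> str:
    """One compression pass: rewrite `text` in roughly `target_words` words."""
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",  # illustrative model choice
        max_tokens=2048,
        messages=[{
            "role": "user",
            "content": (
                f"Summarize the following in about {target_words} words, "
                f"using fresh phrasing rather than copying sentences:\n\n{text}"
            ),
        }],
    )
    return response.content[0].text

def recursive_summary(text: str, floor_words: int = 150) -> str:
    """Repeatedly halve the summary until it fits in one short paragraph."""
    while len(text.split()) > floor_words:
        text = summarize(text, target_words=max(floor_words, len(text.split()) // 2))
    return text
```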

John Wessel 04:15
So we're gonna spend a bunch of time talking LLMs and AI, but I have to ask about the computer vision side. I mean, self-driving is still very present in the news and stuff, but for computer vision in general, I think LLMs have really taken over the press. What are some computer vision applications that you think people don't know about, or some really neat things that maybe wouldn't show up for an average person?

Soheil Koushan 04:42
So I hate to use self-driving, because it's probably overdone, but the average person probably doesn't know that if you come to San Francisco, you can take a self-driving vehicle anywhere around the city completely autonomously, and you can download the Waymo app and do it today. There's so much work that's gone into that, over 10-plus years of engineering, and I think it's definitely still the coolest application of computer vision. I do think that on the longer time horizon, VR will probably be a very interesting application of computer vision. I saw Meta's latest release, Segment Anything 2, I think, was the model they shared, and it's a transformer-based model that essentially allows you to pick any arbitrary object and have it segmented, to understand the semantics of, okay, this is an object versus background, but also to track that over time in a way that was extremely robust, especially once the object goes out of the scene and comes right back. So there are so many cool applications in VR, and I think the technology is advancing pretty quickly. And maybe even stepping back from VR, people are working on humanoid robots. I think that's a whole topic worth discussing, and I don't actually have strong opinions on it, but a humanoid robot would require a level of computer vision understanding that goes beyond what cars are able to do today. So that's, I think, another area where vision will become really important.
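For a sense of what the promptable segmentation Soheil mentions looks like in code, here is a minimal sketch using Meta's first-generation Segment Anything package (SAM 2 adds the video tracking he describes, and its API differs). The checkpoint file name, image file, and click coordinates are illustrative:

```python
import numpy as np
from PIL import Image
from segment_anything import SamPredictor, sam_model_registry

# Load a pretrained SAM model from a downloaded checkpoint file.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

# Any RGB image as an HxWx3 uint8 array; the file name is illustrative.
image = np.asarray(Image.open("photo.jpg").convert("RGB"))
predictor.set_image(image)

# A single foreground click (label 1) picks out an arbitrary object.
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),  # illustrative pixel coordinate
    point_labels=np.array([1]),
    multimask_output=True,  # several candidate masks, each with a confidence score
)
best_mask = masks[np.argmax(scores)]  # boolean object-vs-background mask
```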

John Wessel 06:22
And it's always fascinating to me how a lot of times you see the advances, or the threshold of super-usefulness, hit late. So let's say everybody kind of moves on to humanoid robots, and then all of a sudden cars finally hit that point, like, oh wow, we're here, but everybody else has kind of moved on. And I think that happens because if you can get it right for robots, which is an even harder goal, you solve some of those downstream problems that you needed for that last 5% for cars or for trucks.

Soheil Koushan 06:57
And, you know, there are real applications of computer vision today, in manufacturing and factories. There are robots that do a lot, and a lot of them have really advanced, cutting-edge computer vision going on. So beyond the futuristic use cases, there are a lot of really cool use cases today.

Eric Dodds 07:16
Okay, Soheil, you saw early major advances in computer vision, and then LLMs, and it was magical. But now you're behind the curtain, or have been behind the curtain. Does it still feel as magical? Or do you feel like a magician?

Soheil Koushan 07:35
That’s a great question. Yeah.

John Wessel 07:39
So one of the things

Soheil Koushan 07:40
What's kind of surprising is that when I was working on self-driving, I kept being a bit of a pessimist. I was like, hey, I think this will take longer than people are saying. I think we're being a little optimistic. There are so many situations where it can fail, and the level of reliability we need is so high, that it will take longer than people think. And I don't feel that way about LLMs and transformers. I actually feel like the hype is warranted. And in both situations, I was behind the curtains, right? My only takeaway is that I do think this is real. I do think this is magic that we're building, and I do think it will progress really rapidly. I'm super excited to be a part of it. And I do think Anthropic's founding makes a lot of sense when you're aware of just the rapid pace of progress in the space.

Eric Dodds 08:39
I wanted to actually dig into that. I really enjoyed consuming Anthropic's literature, because, number one, the clear articulation of incredibly deep concepts is absolutely outstanding. But two, I think you address a lot of really serious concerns around AI and the future, and specifically, like any technology, what happens if it's used in ways that are really damaging. So I'd love to dig into that and hear from someone inside of Anthropic. Maybe one thing we could start with: I think this is something a lot of people talk about, especially if you think about your average person, right? They're not deeply aware of the inner workings of an LLM or transformers or the other components of this. So what are the dangers? What are the real dangers Anthropic sees that make safety such a core component of the way you're approaching the problem and the research?

Soheil Koushan 09:54
Yeah, I think my mental framing of it is that this is incredibly powerful technology, and incredibly powerful technology can be used for good or for harm. And this is true for all kinds of technological innovations we've made, right? Social media can be used for good or for harm. The internet has obviously been 99% good, but can also be used for harm. But I think the current pace of AI progress is showing us that this technology is super, super powerful. I try to put myself into the mindset of the Anthropic founders, right? They were part of OpenAI, working on research there. I think Dario was head of research, or VP of research, at OpenAI, and they were seeing the progress being made from GPT-1 to 2 to 3, and they were like, okay, this is going to be huge. This is one of the most powerful technologies humanity has ever created. It's very possible that in a few years we'll have superintelligence. We need to think about this seriously. It's like nuclear weapons: we're able to build this thing that can blow up an entire city. This is pretty serious stuff. We need to think about the implications of this, right? And the other thing about AI and the current technology is that it's kind of inevitable. Even if OpenAI were to suddenly stop building it, other people would build it, and it would exist. So it's almost a necessity that someone takes a good, hard, serious look at the implications of what we're building, in a way that's maybe a bit more serious than back in the social media days, where it was like, this looks fun, let's just build it, and not really think through the implications of the technology. So that's Anthropic's mission. It's basically to ensure that the world safely makes the transition through transformative AI. Transformative AI is happening; it'll very likely be built at one of these three labs. But what's most important is that the transition humanity is making goes well, that the world ends up in a better place in the end. So that's the mission, and I think everything that Anthropic does is connected to that mission. Doing interpretability research, doing safety research, doing capabilities research, and building products are all in service of this bigger goal.

Eric Dodds 12:20
Yeah. How does the product piece play into that, specifically? Because it's an interesting approach, right? Usually products come sequentially after research. If you think about academia, you have a bunch of research that's done, and then it's like, okay, we could build a company or a product around this. And those things are happening very simultaneously at Anthropic, at a very high level, or pace. I'd love to know how that works and why.

Soheil Koushan 12:57
Yeah, I mean, I think product is incredibly important, and Anthropic is investing heavily into it. You know, we hired Mike Krieger, the co-founder and former CTO of Instagram, to lead our product work here. And I think it's important for a few reasons. One, getting your technology into the hands of millions of people is really helpful for understanding it, figuring out the dynamics of how people use this thing when it's out there, in what ways it works and in what ways it doesn't. Because again, if the goal is to make this useful for humanity, it should be interfacing with humanity, and we should figure out how humanity is going to be interfacing with it, so we can learn and make it better and maybe more steerable. We figure out what people care about and don't care about, and actually feed that back into our research, right? So that part is super important. It's also super important as a business. Anthropic needs to have a thriving business. It needs to be a serious player from a financial perspective to have a seat at the table, whether that's in the space of government, or in the space of having investors invested in Anthropic, so that we can continue our work. And so I think those two together make it so that product is very important for us.

Eric Dodds 14:09
Is there a tension in the company between the research side and the product side? And when I say tension, I don't necessarily mean in a challenging way, although I'm sure there are some challenges. But is there a healthy tension there in terms of balancing both of those, and just the way the company operates? Because the outcomes, and the way you would measure success, historically tend to be very different.

Soheil Koushan 14:36
Yeah, I actually think that it is very healthy here at Anthropic. Specifically, research breakthroughs create space for new, incredible products, and relaying that all the time to the product folks is super valuable. And the inverse is also true: hey, we have this product, but it's really lacking in specific ways. Those can then feed back into research to figure out, well, why can't Claude do this? How can we make it better at this? And so this constant back and forth between product and research is, I think, really key to building long-lasting and useful products. Artifacts, for example, is, on the surface, just a UI enhancement; you could recreate Artifacts in other places too. But because of this constant back and forth between research and product, we're able to come up with paradigms, figure out things that work and don't work, ship them, and create really meaningful value for people. I don't think you're seeing as much innovation when it comes to this in the industry broadly; you're especially seeing it at startups. I think startups in particular come up with really good ideas, but at the biggest companies, everyone's kind of working on the same thing. So yeah, I do think that interplay is really important. And another one is, well, what about safety and product, right? Or safety and research? Are there tensions there? One thing that's really helpful is the responsible scaling policy that we have, which basically sets the rules as to what kinds of models we are willing to ship, and how we test them for the things we care about. Does this model make it easier to create bioweapons or not? If that's the case, then we will not ship it, regardless of whether we have really cool product breakthroughs that would go on top of it. It kind of becomes the goalpost and sets the stage. And as long as we all agree on the RSP, the need for one, and also to some degree the details of it, then you can debate the RSP: hey, are we being too strict? Are we not? But the decision about whether or not to launch something is just about whether it fits with the RSP. It's not "I want to ship" versus "you want to ship"; it's an objective question of whether it fits within the RSP or not. So that's a really cool tool we have to be able to scale responsibly and make sure everyone's aligned and on the same page about it. Another note I have on this is that I kind of view safety as a capability. We talk often about this idea of a race to the top: if we're able to build models that are less jailbreakable, that are more steerable, that follow instructions better and don't cause harm to people, that creates an incentive for everyone else to match us in that capability. And these are capabilities people will be willing to pay for: a customer support bot that doesn't accidentally say rude things to the customer, or accidentally make decisions it shouldn't, and that's really good at instruction following. Those are capabilities, being un-jailbreakable, so you can't convince it to give you a discount, right?
Those are things that are actually valuable for people, and so safety and capabilities a lot of times are actually combined. One thought experiment I have is: if you truly had an incredibly capable model, then you could just tell it, hey, here's the constitution, here are the ten things humanity cares about, follow it, and then you're done. Because it's so capable in understanding and knowledge, and it can think through things really deeply, you can give it the exact list of instructions you want it to follow, and then it can be perfectly aligned to those, right? So that's a bit of a thought experiment, but I do think there's actually overlap between safety and capabilities.

Eric Dodds 18:34
Yeah, I love that. Okay, John, I know you have questions about data, but I have one more question on safety and Anthropic's convictions and view of things. We talked about a model that harms someone. And I think one of the really fascinating questions about developing AI models is that if you look around the world, the definition of harm in different cultures can vary. So how do you think about developing something where safety is a capability, when there is some level of subjectivity in certain areas around the definitions that would define safety as a capability?

Soheil Koushan 19:22
Yeah, this is really hard. Different cultures have different definitions of harm, and I think hopefully we get to a world where, to some degree, it is almost democratically decided what we're training these models to do and how we're asking them to behave. I think for now, the best we can do is come up with a common set that has the biggest overlap with most places in the world, and that follows all the rules and regulations each place has decided on. So it's like the minimal set of overlaps. But in a future where we have really easy-to-customize models, you could give it a system prompt and say, hey, actually, in this country it's a bit more okay to talk about this, or in this country it's not okay to talk about this in this way. And hopefully we can give people, to the degree that is reasonable, the ability to steer the model to behave in a way that makes sense for their locality. There are limits, of course, but yeah.
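A minimal sketch of the kind of per-locality steering Soheil describes, assuming the Anthropic Python SDK; the model name and the system prompt contents are made-up placeholders, not Anthropic's actual policy text:

```python
import anthropic

client = anthropic.Anthropic()

# A per-deployment system prompt steers behavior toward local norms
# without retraining the underlying model.
LOCALE_SYSTEM_PROMPTS = {
    "region_a": "You are deployed in Region A. Topic X may be discussed openly.",
    "region_b": "You are deployed in Region B. Politely decline to discuss topic X.",
}

def ask(region: str, question: str) -> str:
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",  # illustrative model choice
        max_tokens=1024,
        system=LOCALE_SYSTEM_PROMPTS[region],  # hypothetical policy text
        messages=[{"role": "user", "content": question}],
    )
    return response.content[0].text
```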

Eric Dodds 20:19
I love that he just said, this is really hard. That's sort of fundamental, you know? I think philosophers have been debating the roots of that question for millennia.

Soheil Koushan 20:32
Yeah, and I think it kind of happened with Elon and Twitter. He bought it, he's all about free speech, and then he realized, okay, well, there's a reason we have some level of fact-checking, and Community Notes is actually a very prominent feature now. I think as soon as you think about it a little further, you realize that there's some level of democracy, or community, or alignment that needs to happen between groups of people. It's never purely clear-cut.

John Wessel 21:04
So on the data side, I was excited about digging into this. Obviously, you have a ton of data that you use to train these models, and a ton of compute required, so huge, large-scale problems there. I want to talk about that. But actually, some other things you said prompted this question in my mind. When you're talking about, you know, we wouldn't want to ship a model where you could build a bioweapon, how do you get the right people in the room to know that would be possible? Because I don't know anything about bioweapons, and presumably you don't either. So let's start there with data. How do you even know what you have? Do you have, like, a panel of experts that span a bunch of different knowledge domains?

Soheil Koushan 21:49
Exactly that. So, you know, we have teams of people who are focused on exactly these sorts of questions. We like to leverage external experts, we leverage government agencies, and we do all kinds of rigorous testing to understand risks around bio, around nuclear, around cybersecurity. It really is a panel of experts that contribute to making these decisions.

John Wessel 22:11
Yeah, okay, that's awesome. So on the technical side, tell us a little bit about that. How does that look? Obviously, tons of data goes into these training runs. What are some scale problems, technology-type problems, you guys have faced?

Soheil Koushan 22:24
Yeah. I mean, the scale of data is massive, right? Trillions and trillions of tokens, like dealing with the entire internet of text.

Eric Dodds 22:36
Dealing with the entire internet?

Soheil Koushan 22:39
In many ways, yeah, dealing with the entire internet. And it's not just a data storage issue; there are all kinds of other problems with internet data. And there's multimodal data now, obviously, right? There's a lot of that, and it takes up significantly more space and is much harder to process, and networking and all that. So the data challenges are massive, and we have a lot of people working on data; a pretty sizable part of our team is working on data-related things. So I think it's really important. I'm a little bit lucky because I am a bit more on the post-training side, so the sizes of the datasets I'm working with are not as big as the datasets folks are working on in pre-training. But post-training is scaling up; it's getting bigger in the scale of the data and the amount of compute going into it, so the challenges on this side are also growing. One thing that's nice, though, is that with data that is smaller and higher quality, you can visualize it better, and you can leverage models to process it for you and give you insights. I think that makes the problem a little more tractable on the post-training side. But that's kind of the data angle for me. We also focus a lot on high-quality data, so we have things like labeling vendors that we work with. We ask them to do different tasks, and then they do it, and we go back and forth on, okay, we want you to do it slightly differently, we want you to increase the quality here, we want to increase volume. So there's a lot of data collection that happens on our side that's a little different from the raw text crawling that happens more on the pre-training side. But that's kind of setting the stage as to what data looks like and what it means here at Anthropic.

Eric Dodds 24:34
Before the show, you talked a little bit about data as a differentiator. Digging a click deeper into that supply chain of data, which part is the biggest differentiator? Is it the pre-training side? Is it the labeling stuff, where you're doing a lot of the work in post-training? Is there a particular part that has an asymmetrical impact on the outcome?

Soheil Koushan 24:58
Yeah, I think one example of a differentiator when it comes to data is, when Claude 3 came out, a lot of people felt like it was a really fun model to talk to. It felt a bit more human than other models and was a bit more playful. And I think that's an example of a data collection effort that creates tangible value for people, and that's more on the post-training side than the pre-training side. So it could be a differentiator. Anthropic pretty clearly doesn't train on user data; you can look through the privacy policy, but everything you send to Claude is never used to train Claude or to make Claude better. But certain companies do train on user data, and you have to think through exactly what they're training on and whether that's okay or not. But that is pretty valuable, right? Being able to look through interactions someone has had with an LLM, places where they've complained, that sort of thing can feed into how to make improvements. You can still, on Claude, give thumbs-up and thumbs-down feedback directly, and those do get reviewed by somebody to help understand in what ways Claude succeeded or failed. But yeah, large-scale proprietary data is pretty valuable. We also have data partnerships with companies that have valuable data, and we work with them, figure out the licensing model they want to have, and after figuring out a mutually beneficial agreement, we start including that in our training data. On the opposite side of that, I do think there is a cognitive core that we need to get to when it comes to building LLMs. Right now, and this is something that Karpathy mentioned in a podcast maybe a week or two ago, a lot of the parameters of these big models are going into memorizing facts, and the core common sense and cognitive capabilities can be distilled into something smaller. This is where bigger models can help train smaller models, to help them reason and know the basic information. Models don't need to know every single thing that happened on Wikipedia, but they need to know how to look things up on Wikipedia if that's something that needs to happen for a given user request, right? So, yeah, data is really important. But once we have really capable models, we can try to distill that back down into the very core cognitive capabilities, which can then be used to create new data and have the models run on their own and learn from their own mistakes. And that can help address the data bottleneck too.

John Wessel 27:48
So I'm curious about two things on the use cases side. How do you all internally use LLM technology? And then, as a follow-up, let's dig in a little bit to how you see people outside of your world using it, and maybe what are some of the mistakes they make?

Soheil Koushan 28:12
Yeah, personally, I use Claude the most for coding. I think most of my queries involve, hey, make this better, or, how do I do this in this language? But a lot of it is also just general world knowledge, like, hey, my drain is clogged, what would be the right thing to use? Things that would previously go into Google, and then you'd have to open some blog post with 16 ads, and at the bottom it's like, okay, put baking soda and vinegar. Now it's just, baking soda and vinegar. It's very direct.

John Wessel 28:49
Yeah, that's, like, my one, where you have to scroll so far, and there's a recipe, like 30 pages in, and it's at the very bottom. Yeah, totally.

Soheil Koushan 28:58
Yeah, it starts by explaining their life story, and it's like, okay, I just want to make spaghetti, teach me how to do that. So, yeah, for all kinds of common queries. One fun example: I had a friend who was a teacher, or he is a teacher, and I reconnected with him after a long time, and I told him I work at Anthropic. He's like, oh yeah, I use it all the time for lesson planning. It takes care of so much of that: hey, today I want to do a lesson about X, come up with some ideas and then make some homework assignments. He said it does a great job at that. So there are all kinds of things in the context of work that are super helpful. I use it a lot to just do question answering, too. Instead of reading some long thing, I'll throw it into Claude and be like, hey, this is the specific thing I'm looking for, is it in here, can you answer it? And that's a big time saver. So I should probably talk to more average consumers to understand where they use LLMs, but I think most people aren't aware. Probably the average person in the world has never heard of Anthropic, and probably the average person in the States hasn't really used LLMs to their maximum potential. So I think we really should figure out where the discrepancy is, where people are not aware of how LLMs can make their lives easier. Because I think it's easy to be in an echo chamber like San Francisco and assume that everyone's using it exactly the way that you are, but I think that's probably very far from reality.

John Wessel 30:39
Yeah, so on the prompting side, I just want to ask: there's a lot out there about prompting. People have done some pretty wild things with prompting, created personas and all that kind of stuff. I'm curious, from your perspective, what do you think is most helpful, just broadly, the things you can do when you're trying to get the best answers out of an LLM when you're interacting with it?

Soheil Koushan 31:06
Yeah. So we actually have this tool called the metaprompter, where you tell it, hey, I'm trying to do this, can you help me write a prompt? And it'll recursively work on the prompt with you to make the prompt better and best suited for an LLM. So that's an example of a tool that I think can help people do prompt engineering. Honestly, I don't have very specific tips when it comes to prompting. I think using that tool can help you see examples of what a good prompt looks like versus a bad prompt. But in general, making what you're saying easy to follow, and including examples, is probably advice you'd give to any person trying to explain something, and I think it's especially true in the context of LLMs. Examples in particular really help models figure out what you're trying to do.
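To make the "include examples" advice concrete, here is what a simple few-shot prompt might look like; the task and labels are invented for illustration:

```python
# A few worked examples show the model the exact input/output format you want,
# which tends to constrain its answers better than abstract instructions alone.
FEW_SHOT_PROMPT = """Classify each support ticket as BILLING, BUG, or OTHER.

Ticket: "I was charged twice this month."
Label: BILLING

Ticket: "The export button crashes the app."
Label: BUG

Ticket: "Do you have a student discount?"
Label: OTHER

Ticket: "My invoice shows the wrong plan."
Label:"""
```

Sent to most chat models, a prompt like this should elicit a single label in the established format rather than a free-form answer.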

John Wessel 32:01
Hmm, yeah. What I've seen, which I think relates to that, is it seems like technical people especially want to program it, right? Like, okay, well, how do I prompt? And then I've seen some very complicated things, like, oh, an engineer wrote this type of prompt, and it's relatively hard to benchmark that versus a more simple prompt. But I've also seen some very simple prompts that seem to have pretty similar outputs. Is that your general experience too, where there's some really complicated stuff and some simple stuff, and maybe the gap isn't very big between the two?

Soheil Koushan 32:40
Yeah, yeah. I do think that as your instructions get bigger and bigger, models today do struggle with internalizing all of it, and may start forgetting little pieces; it's not perfect. So if you can distill it into the most key, simple parts, I think that would generally be helpful. One other tip when it comes to prompting is to think of every token the model has as input and as output as compute units. By, for example, telling the model, hey, can you explain my question and describe your understanding of it before answering it, you're doing two things. One, you're giving the model more ability to compute: every single forward pass causes some amount of computation to happen, and you're giving it more of a chance to think, which can be pretty helpful on a very complicated question. But also, you're giving it a chance to think out loud and put things down on paper, and every time it puts down a token, for the next token it can look at what it wrote down previously. So having the model be very explicit, think out loud, be descriptive, and reason costs you more money, right, because more tokens have to get processed, and it costs more from a compute perspective, but it can then make the model smarter and give you answers that are better aligned. You know, I was putting together an eval, and one thing I added before my actual question was: describe this document, figure out what the relevant parts are, and then answer this question. That sort of thing can help a lot. And I guess we're entering this paradigm now of test-time compute, where you can scale train-time compute, putting more into the model, but you can also scale test-time compute, which is having the model explain itself, think out loud, and do chain of thought. It turns out that can scale pretty nicely with capabilities, especially for certain types of things like problem solving and math and coding. So that's a lever you can pull: you can use a bigger model, with more compute going into training, or you can ask it to think out loud more and leverage test-time computation to get a better result.
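Here is a minimal sketch of the describe-then-answer eval trick Soheil mentions, spending extra output tokens as test-time compute. It assumes the Anthropic Python SDK; the file name, question, and model name are illustrative:

```python
import anthropic

client = anthropic.Anthropic()

question = "What was the quarter-over-quarter revenue change?"
with open("report.txt") as f:  # hypothetical input document
    document = f.read()

# Asking the model to describe the document and surface the relevant parts
# before answering buys extra forward passes, and lets each new token
# condition on the reasoning the model has already written out.
prompt = (
    f"{document}\n\n"
    "First, describe this document and identify the parts relevant to the "
    "question below. Then, reasoning step by step, answer the question.\n\n"
    f"Question: {question}"
)

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # illustrative model choice
    max_tokens=2048,  # room for the "thinking out loud" tokens, which cost more
    messages=[{"role": "user", "content": prompt}],
)
print(response.content[0].text)
```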

John Wessel 35:05
Interesting. Yeah,

Eric Dodds 35:06
That’s interesting. That’s a very helpful way to map your interactions to those different modes of computation. That’s super interesting.

Soheil Koushan 35:18
Yeah.

Eric Dodds 35:19
Well, we're gonna close on probably a topic that we could have started with and taken up the entire show with, which is looking toward the future. One of the ways I think would be fun to frame this: we've just been talking about very natural, day-to-day ways that we interact with Claude, right? How do I unclog my drain, make this Python code better, explain my question, all those sorts of things. But when we think about it, I love the way Anthropic talks about the concept of frontier, both in terms of research and product and models. And there are at least two really interesting things to me about the way most people interact with AI. One is that it is so consumer in nature. To put it in a very primitive analogy, opening Claude just feels so similar to opening Facebook Messenger; the interface is very similar, and there are so many ergonomics that are really similar. So that's one way, which is very consumer and, ironically, not super different from a lot of interactions that have come previously. The other really interesting thing is that, in many ways, it's disappearing into existing products. Increasingly, the products we use will have new features that feel extremely natural but make you go, whoa, that was really cool, right? There's something happening under the hood there, but it's so natural within the product experience that the AI component is blending into the product in a way that isn't discernible. Which, actually, to your point earlier, feels kind of magical, right? And maybe that's the point. Those don't feel super different to the consumer, necessarily, or to the person interacting with it; it just feels like a more powerful version of things we were doing before. And that's probably an under-thought part of the frontier, especially the research and all the crazy questions Anthropic needs to ask, like, could you create a bioweapon with this, which feels so distant from the way a lot of us interact. So that was a very long preamble. But how do we think about the future? Because it's kind of hard as a consumer to think about the power and the future of AI, because the way we interact with it on a daily basis almost obfuscates that a little bit.

Soheil Koushan 38:11
Yeah, I think part of the explanation for why people don't fully understand the safety implications is maybe because we, as an industry, have done a pretty good job of doing RLHF and making sure the models act in a reasonable, aligned way. I think if we put out the base model that has no alignment work done on it, people would be like, whoa, this model just completely ripped into me and made me feel shitty, or whoa, it just taught me how to do something that's pretty illegal. We've done a good job of preventing those sorts of interactions, and so people are like, oh, they're super safe, they're super harmless. And it's like, great, that's exactly what we were going for. And this is just today; as they become more and more capable, it becomes an even bigger problem. But I think it means we've done a good job of aligning them, making sure they act in ways people would expect and are harmless. And on the point about user interaction, whether it's a specific app or disappearing into the product, and user interaction broadly: I tell my parents that I work in AI at Anthropic, and I think my mom was like, oh man, it's so scary, things are changing so fast, I'm going to be so obsolete, I won't even know how to use the future thing. And I'm like, actually, the future thing will be way easier to use than anything you've ever used in the past. You will be able to talk to your computer. Forty years ago, you had to be an expert to use a computer. You had to understand the command line, and exactly the command you'd need to execute something very specific. Today, you can literally talk to your phone and be like, hey, how's the weather for the trip I'm going on next week in New York, and it'll be like, here's the weather. It becomes more and more natural and more and more human, which is actually going to increase accessibility, and it's going to make all these things easier and easier to use. And I think there is a little bit of jumping the gun, where people see this is where things are going, but if you build it before it's ready, you end up with lackluster product experiences. Like, okay, an AI for creating slide decks. You're like, this sounds cool, let me explain the slide deck that I want, and it does kind of a half-assed job and doesn't really create exactly what you want, and that creates a bad user experience, and then people are distrusting of it and don't use your product anymore. There's definitely a certain level of capability that needs to exist for a feature to actually feel magical, to actually feel useful, and to not be frustrating to use. But once those are there, interfaces will be very natural, the most natural human interfaces we've ever had. So yeah, I think it will disappear into the things we use every day. Your laptop will be completely AI-driven, and the way you interact with your phone will be like that too. And some of it will create full new modalities.
One really cool idea I have is, you know, in five years, maybe more or less, you can be like, okay, I'm trying to install this shelf and I don't fully get it, and you'd just pull out your phone and be like, yeah, this is the shelf, these are the instructions. And then you'll have this video avatar that pops up and talks to you, and has a virtual version of the shelf, and says, okay, you see this part of the shelf? Drill this part. And then you'll look at your thing and be like, oh, okay, I see. And this will be generated on the fly. You can't get more intuitive than that: a literal person on your phone explaining something with what you're seeing right outside the phone. That sort of thing will, I think, very likely exist. So, yeah, it's going to be a crazy future.

Eric Dodds 41:55
Wow, that's pretty wild. I was actually just putting desks together for the kids, and you get those things and you have this little Allen wrench, and the sequence is important; if you get one thing wrong, you practically start over. So let me know, actually.

Soheil Koushan 42:16
Yeah, you'll be the first to know. I'll let you know.

John Wessel 42:19
Yeah. So full circle now. I'm curious: you spent time with computer vision, and now with LLMs, and we've talked about different applications for LLMs; I mean, chat is the one everybody knows. Are there some cool things going on with computer-vision-type technology in LLMs? I've seen some things. What are some things you see in the future for that?

Soheil Koushan 42:45
Yeah, so Claude is multimodal, so you can take a picture of something, whether that's some document you're looking at or something in the physical world, and ask questions about it. It's particularly good at explaining what it sees and going through it in a decent amount of detail. But the area I'm most excited about is actually, you know, kind of away from what I was working on before, which was the natural world, computer vision on images; it's vision on digital content. So a PDF, right? Or a screenshot of your computer, or a website. That as an input exists today, and I think it'll get better and better, along with the related capabilities. The first demo of multimodal ChatGPT, I think, was: here's a sketch of a website, you take a picture and throw it in, and it tries to write the code for that. That will get better and better over time. And obviously there are multimodal output models like DALL-E, right, where you can ask it to generate an image. There's now video with Sora, and a bunch of other companies doing that. Audio output too, with the voice mode that's coming, and Google has their own, and there are a bunch of others, like Moshi. So there are three main modalities, text, audio, and vision, and they can be at the input or the output. In the case of Claude, you have text and images as inputs, as well as text as output, but this list will continue to expand in the future. And GPT-4o is actually a three-modality-input and three-modality-output model. I do think that's the future. I think vision in particular is useful. Audio, just a personal product take, is, I think, very useful from a product perspective; I don't think audio adds new capabilities to the model, but it is a much richer, more human way to interact with it. Whereas vision is truly a new capability: you cannot describe that table and the hole and where to drill it as text. Well, you could, but it'd be way, way harder than, here's an image, do this. So I think vision actually does add new capabilities, and you're seeing a lot of that. One of my focuses is on multimodal vision in the context of knowledge work: how do you make Claude really good at reading charts and graphs, and able to answer the common questions you might have about a report and stuff? That's, I think, super valuable. One thing I'll also add, connecting the prior self-driving work to what I'm working on today: people talk about AGI, and I kind of think that AGI, depending on how you define it, is already here. These are general-purpose models that can perform generally intelligent behavior, and it's more a question of what data you feed in and where. When I was working on perception and vision, it was a very narrow model: it could do bounding boxes on cars and people and pedestrians and lights and stuff. We were slowly starting to make it general, slowly starting to add other types of things you want to detect. Whereas Claude and transformers, autoregressive transformers in particular, are general-purpose thinkers. They're general-purpose next-token predictors, and so many things can be framed as a next-token prediction problem.
And so that's one of the things that's different about what I'm working on now versus before: I'm working on something very general, which is why audio just kind of works. You discretize it, tokenize it, and throw it in, and, you know, with some tricks and a bunch of things, you have the same engine that's creating text output creating audio output. And I think that's super cool. In general, it's the same way your brain is a general-purpose cognitive machine: there have been people who have had different parts of their brain ablated, and suddenly they can't do a specific skill or a specific type of kinematic motion, and then other parts of their brain reconfigure and allow them to do that over time through retraining, especially if they're young. So there's tissue in here that is a general-purpose system, and I think we've unlocked that. We have found a digital analog to a general-purpose cognitive engine, and now it's just a matter of scaling it, is the way that I feel.
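Since Claude's image input comes up here, a minimal sketch of the chart-reading use case Soheil mentions, using the Anthropic Messages API's base64 image content blocks; the file name, question, and model name are illustrative:

```python
import base64
import anthropic

client = anthropic.Anthropic()

# Encode a chart screenshot for the API's base64 image content block.
with open("revenue_chart.png", "rb") as f:  # hypothetical chart image
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # illustrative model choice
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {
                    "type": "base64",
                    "media_type": "image/png",
                    "data": image_b64,
                },
            },
            {
                "type": "text",
                "text": "Which quarter had the highest revenue, and by roughly how much?",
            },
        ],
    }],
)
print(response.content[0].text)
```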

Eric Dodds 47:07
Wow. Well, Brooks is messaging us that we're at the buzzer, although I could continue to ask you questions for hours, or perhaps days. This has been so fun. I cannot believe we just talked for an hour; I feel like we just hit record five minutes ago. Really appreciate the time. It's been so wonderful for us, and I know it will be for our audience as well. Yeah, thanks for coming on the show.

Soheil Koushan 47:29
Really glad to hear that. Yeah, I appreciate you guys. This was really fun, and I hope people get some value out of it.

Eric Dodds 47:36
The Data Stack Show is brought to you by RudderStack, the warehouse-native customer data platform. RudderStack is purpose-built to help data teams turn customer data into competitive advantage. Learn more at rudderstack.com.