Episode 2:

Data Council Week: AI Isn’t Just Hype – How To Successfully Apply LLMs Today with Tristan Zajonc of Continual

April 17, 2024

It’s a special edition of The Data Stack Show as we come to you from Data Council in Austin, Texas. Brooks and Matthew co-host the show to bring you some bonus episodes from some of the leading voices in the data space. In this episode, Tristan Zajonc returns to the podcast to discuss the evolution of AI and its integration into applications. Tristan is the Co-Founder and CEO of Continual. In this discussion, the group covers the shift toward generative AI in data science, the progression of machine learning in production, Continual’s AI copilot platform, and the importance of reliability and low latency in AI responses. The conversation also touches on the challenges and future potential of AI copilots in complex industries and large enterprises, the regulatory and technological breakthroughs needed for widespread adoption, and more.

Notes:

Highlights from this week’s conversation include:

  • Tristan’s Background and Journey into Data (1:14)
  • Evolution of Machine Learning and AI (3:13)
  • Impact of Generative AI (6:33)
  • MLOps and Challenges in Early Data Science (8:48)
  • Success and Applications of AI Today (11:34)
  • Continual AI Copilot Platform (18:04)
  • Challenges in building remarkable AI assistants (19:58)
  • Reliability and accuracy in AI responses (25:31)
  • Regulation and adoption of AI assistants (31:30)
  • Future of AI assistants and Continual AI (33:12)

 

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Transcription:

Eric Dodds 00:05
Welcome to The Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You’ll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, the CDP for developers. You can learn more at RudderStack.com.

Brooks Patterson 00:25
What’s up, Data Stack Show listeners? Welcome to episode two of Data Council Week 2024. We’re in the field at Data Council Austin for the third year in a row. Eric and Kostas both had conflicts this year, so I’m filling in: it’s Brooks, the producer of the show, coming out from behind the scenes to bring you a few special episodes this week. Matthew Kelliher-Gibson, my colleague at RudderStack, who brings over a decade of experience working in data science, is joining me to dig into the technical details. But today is all about Tristan Zajonc. Tristan is co-founder and CEO of Continual. He’s a returning guest. We had him on the show way back in September 2021, which feels like many epochs ago in the data world. So we’re excited to catch up with Tristan today. Tristan, welcome back.

Tristan Zajonc 01:06
Hey, great to be back, and in person here.

Brooks Patterson 01:09
Love, love getting to record live. In person is a special treat. Lots to cover today, especially in the midst of what I guess we’re now calling the AI revolution. But before we get there, just give us a quick background. How did you get into data, and how did you get to where you are today with Continual?

Tristan Zajonc 01:32
Well, I’m one of those people who has been around since far before the Gen AI wave that’s been happening. I’m a statistician by training. I was a grad student and basically came out of working on statistics during my grad student days. That was back in 2013, and data science was the trend then. Data science was a huge term, and there was a lot of hype around the ability of big data and data science to use data to do more. I got quite excited by that as a general area, with the long-term idea that this all led to AI, and that this was going to be a very exciting area for a decade or multi-decade career. You could sort of see AI in the future, even if it still felt a little far off; you got hints of it with things like what DeepMind was doing at the time with deep reinforcement learning. And there were lots of commercial opportunities around data science itself. So I went into the startup world and founded one of the early data science platforms, called Sense. The company was acquired by one of the large data platform providers, Cloudera, which was the leading provider of the Hadoop platform. So I got to see the whole big data world that was really hyped up then. Then we were really moving over toward machine learning operations: how do we get all this into production, since data science and analytics alone weren’t enough? So I saw that whole trend and participated in it. And obviously the last two to three years have been this whole next wave of generative AI, which is undoubtedly the most exciting period I’ve seen in the industry. Which is fun: you get excited early on, and then you get more excited as things go on.

Brooks Patterson 03:12
So you really began with the end in mind, thinking about AI from the very beginning.

Tristan Zajonc 03:21
I was thinking about AI at the beginning. I remember giving a talk around 2012 or 2013, and I was excited to talk about some of the stuff that was coming out of DeepMind. They were doing deep reinforcement learning for Atari games, and it hinted that, hey, you could learn from scratch. So I was excited by that. I think that was what made me believe the whole industry had a long future ahead of it. But just handling huge amounts of data was also interesting from a technical perspective, so there was the whole data trend. And I was a big Bayesian statistician, so I loved probabilistic programming back then, which was a whole different idea. It was all intellectually interesting. And the unstated thing was: okay, wait, there’s something here that maybe we could call artificial intelligence, though the path to it was a little bit unclear.

Brooks Patterson 04:18
Tell us a little more about the path to it, now that you’ve lived through it. Since 2012 or 2013, we’ve been through different generations of machine learning. Yesterday we were chatting on the floor at the conference, and you said you really have this personal story and connection to it all. Can you tell us a little about your experience going through these different generations and how you’ve seen the technology evolve overall?

Tristan Zajonc 04:54
Well, I do think of it as basically three phases. The one I was alluding to just previously was the data science phase. You could maybe put big data there as well, but it was largely: okay, we think there’s value in looking at data, processing data, understanding data. Data is the quote-unquote new oil or something; maybe it was, though it wasn’t exactly clear. And we needed new tools to be able to handle data at that scale. We knew we needed to go beyond SQL, and we needed to start to do data science, which was a whole set of new tools. So that was gen one, the data science generation, at least for me personally, but I think for the industry too. Then generation two was what I’d call production machine learning, or MLOps, where we all said: okay, this data science stuff isn’t delivering the value that we wanted. What’s the problem? The problem is we need to get this stuff into production; we need it to have an impact on the business, if you’re in a business context, and on actual end users. That led to the whole MLOps trend, where we said: okay, how do we efficiently and reliably scale up production machine learning so we can deliver it into real applications and make an impact? That had a good run, which I participated in, and it’s still obviously important. And then this last wave, Gen AI, honestly is a huge transformation. It’s a big break, not a continuation, from the traditional ML world. It completely changed the capabilities of these models. We used to talk about things like forecasting and classification, and maybe little bits of optimization. Now we’re thinking about generating some of the most creative text that you could imagine, or images, and about agents, autonomous agents that take on tasks. So it opened up a huge new set of potential applications. And then secondarily, and maybe even equally importantly, it became immensely simpler to implement and to productionize. With the rise of in-context learning and zero-shot learning, these large foundation models get amazing breakthrough results with very limited effort, results that were never previously even possible. Take a use case as basic as summarization: hey, you have a whole bunch of user reviews, summarize them. That was just such a hard problem to productionize previously, and then it became something any engineer could implement in an hour. Honestly, even now, in a playground environment of one of these tools, you can get amazing results that you were previously never able to get. So that was a real, profound transformation, both in how I perceive what these models can do and the timeline on which that’s arriving, and in the ability to actually put it into real products, into production, which was historically a huge challenge.
And that’s something we had been trying to simplify with MLOps. I’d spent a lot of time at Cloudera thinking about what production machine learning looks like and how you make it simpler, and the early stages of my current company, Continual, were really motivated by how you radically simplify production machine learning and the different ways you can think about that. So this journey has been a huge unlock in terms of realizing that potential.

Brooks Patterson 08:38
Matt, you’ve worked through this and have war stories as well. Anything to add, especially on the MLOps side?

Matthew Kelliher-Gibson 08:48
Yeah, I think so, because I got started about 10 years ago. At first there was this idea of: we’ve got all this data, so clearly we’re going to do amazing things with it. There was also this idea that it would be easy and cheap. Then you got into it and realized: look, I can build the model in Python or whatever, and it’s done, but it can’t go anywhere. It’s stuck. So I can predict churn, okay, but how do you get that to someone who’s actually going to do anything with it? That’s the wall you kept running up against. But I think that changes with Gen AI. We had a project at RudderStack that I just ran, classifying tickets: having it say, what was this customer success ticket about? Ten years ago, that’s a project we would have started with, hey, customer success, you need to take the next three months and just label tickets. Instead, I got the first results in an hour.

Tristan Zajonc 09:56
And better results than you would have been able to get even with all that labeling?

Matthew Kelliher-Gibson 10:00
Yes. Even if we’d labeled all our customer tickets, it would have still been much harder. And we were able to ask it to do things beyond just “pick from these labels and do it”: also, if there’s a data source it names, tell me what the source is, and format the output in this way, and do these types of things. As you said, even five years ago, I worked with people who were doing NLP stuff, and they couldn’t do that. They were doing some great things, but they weren’t doing that five years ago.

Tristan Zajonc 10:30
No, absolutely. So it opens up a whole new space. And as soon as it’s that easy and that good, your creative juices start flowing in terms of where you can apply it. Even if each individual application is relatively small, in aggregate, when you add them all up, you totally change the way you do support management. That’s a use case I see now; it’s completely disrupted. It’s had a huge impact on the way we do customer success. It’s one of the most obvious use cases, and people are already seeing a lot of success with it.
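
For readers who want to try the pattern Matthew describes, here is a minimal sketch of zero-shot ticket classification with an LLM. This is not the actual RudderStack project; the label set, fields, prompt, and model name are hypothetical stand-ins, shown with the OpenAI Python client:

```python
# Minimal zero-shot ticket classification sketch (labels/fields are hypothetical).
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

LABELS = ["billing", "bug report", "feature request", "integration help"]

def classify_ticket(ticket_text: str) -> dict:
    prompt = (
        "Classify this customer success ticket.\n"
        f"Pick exactly one label from: {', '.join(LABELS)}.\n"
        "If the ticket names a data source, extract it; otherwise use null.\n"
        'Respond with JSON: {"label": "...", "source": "..."}.\n\n'
        f"Ticket:\n{ticket_text}"
    )
    resp = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},  # ask for well-formed JSON
    )
    return json.loads(resp.choices[0].message.content)

# Example: might return {"label": "integration help", "source": "Salesforce"}
print(classify_ticket("Our Salesforce source keeps failing to sync since Tuesday."))
```

No training data or labeling pass is required, which is the “first results in an hour” point: the label set lives entirely in the prompt and can be changed as easily as editing a string.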

Brooks Patterson 11:07
What are some of the other use cases? We’re at this point in the hype cycle where there’s almost a general mandate: figure out some way to harness AI at our company, just slap AI on it. The customer success use case is one. What are some of the other ones where you’re seeing people actually have success with this stuff today?

Tristan Zajonc 11:35
Well, I mean, it is. So yeah, there is a gap between the hype and reality, though I think the hype is justified, because the future is so exciting. It does feel like we’re in a world where we’re going to have future breakthroughs: these models are going to become increasingly capable, and they are going to be able to do more and more. But let’s be honest, we’re not there yet in terms of realizing the full potential. So if you ask, in reality today, with the current models, where do you apply them and see success? One is definitely any unstructured-to-structured information task; that has just been completely opened up. The ticket example you just gave is one of those, but there are many broader examples: huge amounts of information coming in, and you want to pull structured information out of it and put it into some sort of workflow process, even if you don’t automate the workflow itself. We’re working with a company right now that essentially supports loan decisions. There are loan officers, and the first step is the loan applicant uploads a lot of data, including bank statements and transaction history, which they largely just download from their bank portal and upload as PDFs. The loan officer wants to extract not just structured information, like what transactions were there and what the balance is, but answers to subjective questions: do they have a regular payment schedule? Are they getting regular payments? Do they have one or more main sources of income? Do they have other regularly recurring outgoing payments, because they have car loans or things like that? And that’s all from a really messy, heterogeneous set of data. This company can now make that incredibly easy for loan officers. First, there’s pre-canned stuff out of the box: as part of the product, they get a whole bunch of answers automatically. And they can even enable the loan officers themselves to define custom questions they want to ask of the data, which get run on every document that gets uploaded. It’s so easy that what was previously immensely complicated ML can now be pushed into the hands of the end user, so that they can build what would previously have been, in some ways, a quote-unquote new model, although these days it’s essentially just a new prompt, and build that into a product experience. So that’s one domain where it’s super successful: this unstructured-to-structured information work. Another one, obviously, is these conversational experiences. We’ve all experienced that with ChatGPT. Where I’m seeing the most success today is, one, these product support type use cases, where you have a complicated product and the customer is asking a question about the product itself: how do I do X? Where is X in this product? Where’s my W-2, if it’s an HR product?
And you can genuinely divert 50 percent, or some significant percentage, of your support cases while actually not being annoying. Traditionally these chatbots were really annoying, and now you actually would like to ask the chatbot, because if you have a one-hour SLA on reaching a human in customer support, that’s the annoying path versus being able to ask an assistant inside the product that works. And I would say it does work now; we’ve gotten to a place where it works well. The next level, which we’re thinking a lot about at Continual, is actually being able to ask questions of the data inside these applications. I don’t think this replaces a UI, but it’s great for certain types of questions, especially ad hoc, one-off questions that the product itself didn’t anticipate: questions that aren’t repeated enough, that are unique enough, that the product never built a button or a pre-canned dashboard to answer them. There’s a class of questions like that; it’s how we use ChatGPT, right, these loosey-goosey questions where you don’t really know what to Google. There’s a version of that which happens within products, and I see a lot of success there. Then everybody’s excited for what’s next. We always talk about what’s working today, but everybody’s trying to get to the next level: automating work, agents, multimodality, true generation of different asset types. In certain domains, like images, you’re already seeing use cases. And there’s a third category, which is generative workflows: generating RFPs, generating draft job postings, generating product descriptions, generating summaries. That third version, which really focuses on the generative part, does work today. No question, if your application has that, it completely works and delivers value.
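
As a concrete illustration of the unstructured-to-structured pattern Tristan describes, here is a hedged sketch of asking both factual and subjective questions of a messy document. The field names, prompt, and model are illustrative assumptions, not the actual product’s schema:

```python
# Unstructured-to-structured sketch: factual + subjective questions over a
# document. Field names and model are hypothetical stand-ins.
import json
from openai import OpenAI

client = OpenAI()

EXTRACTION_PROMPT = """You are helping a loan officer review a bank statement.
From the statement text below, respond in JSON with these fields:
- "ending_balance": number, or null if not stated
- "has_regular_income": true/false, does the applicant receive regular payments?
- "main_income_sources": integer count of distinct main income sources
- "recurring_outgoing_payments": list of recurring obligations (e.g. car loans)

Statement text:
{document_text}
"""

def analyze_statement(document_text: str) -> dict:
    resp = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{
            "role": "user",
            "content": EXTRACTION_PROMPT.format(document_text=document_text),
        }],
        response_format={"type": "json_object"},
    )
    return json.loads(resp.choices[0].message.content)
```

Under this framing, the custom questions Tristan mentions loan officers defining would just be extra fields appended to the prompt and run over every uploaded document.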

Matthew Kelliher-Gibson 16:42
That was one of the first ones I saw. I was talking with people, and they were like, look, we take these proposals that we need to write, we feed it in, and it shoots out a framework that fills in most of it, and we just go in and edit it. It took the workload down tremendously.

Tristan Zajonc 16:58
Yeah, it’s just an incredibly time-consuming thing today. And it’s one where, let’s be honest, these models don’t produce great marketing materials; you’re not really doing your readers a service quite yet when you rely too heavily on these tools. But there are a lot of other types of content that are pretty formulaic. A job posting, where you have an existing template you’re modifying and you actually want a lot of consistency in the language. Product descriptions are similar, where you really want a brand voice. Or RFPs, where it’s like, hey, we’re just making sure the work gets done and documented. It works quite well there.

Brooks Patterson 17:43
We’ve talked around it a lot, I think, and I do want to get to what’s next and what pieces need to fall into place to take things to the next level. But can you tell us about Continual specifically, and where Continual fits into all of this today?

Tristan Zajonc 18:04
Sure. So at Continual, we’re building what we call a copilot platform for applications. Our goal is to help developers build custom embedded AI assistants inside their products. The core thesis is that every application out there is going to embed a copilot; that’s the name we use, but you could call it an AI assistant or a sidekick, something inside the product. And as these models become better and better, these assistants are going to get better and better, and they’re going to become more indispensable. You’re seeing that today. You see it with Microsoft’s Copilot for the Office 365 suite, and they’re probably one of the furthest ahead. You see it with what Google is doing with what they previously called Duet AI and now call Gemini for the Workspace suite. These are assistants embedded into software applications. You see Shopify doing it with Sidekick, which is the copilot for e-commerce that sits inside Shopify. Intuit is another leading example; they’re doing it across the whole set of Intuit products, from TurboTax to QuickBooks to Credit Karma. So the basic idea is that all of our applications are going to change. They’re all going to have an assistant inside them. That obviously includes a conversational element, but it doesn’t have to be just conversation; it’s a set of multimodal user interface enhancements you can add to your product, but intimately connected to your product and your domain. So we’re helping people do that: be deeply connected to the data of your product, be connected to all the APIs of your product, both the backend APIs and the frontend experiences, integrate conversationally, with out-of-the-box conversational capabilities or a conversational UI for your application, but also build other, more general features, things like summaries or these information extraction tasks, all on what we call a standardized copilot stack, or engine. All the data flows into one place; you can monitor it centrally, refine it, evaluate it, see what users are doing and where things are failing, and then loop back and continually improve it, right? Continual. And I’m super excited, because I think we’re going to go through the evolution we just talked about generally, where you start with these bread-and-butter, simpler use cases. But what’s exciting, and our goal, is: how do you build not just a chatbot 1.5, but a remarkable, indispensable assistant that enables you to do so much more, do things faster, and do things you were never able to do before? That exists today, but it’s going to exist even more in the future.
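
To make the “all the data flows into one place” idea concrete, here is a rough sketch of the logging side of such a copilot stack. This is illustrative only, not Continual’s actual API; the record schema and the JSONL file are invented stand-ins for a real store:

```python
# Sketch: wrap every copilot completion so requests, responses, latency, and
# user feedback land in one log for later evaluation and improvement.
# Schema and JSONL storage are hypothetical stand-ins for a real backend.
import json
import time
import uuid
from openai import OpenAI

client = OpenAI()

def copilot_complete(user_id: str, feature: str, messages: list) -> str:
    start = time.time()
    resp = client.chat.completions.create(model="gpt-4-turbo", messages=messages)
    answer = resp.choices[0].message.content
    record = {
        "id": str(uuid.uuid4()),
        "user_id": user_id,
        "feature": feature,        # e.g. "chat", "summarize", "extract"
        "messages": messages,
        "answer": answer,
        "latency_s": round(time.time() - start, 2),
        "feedback": None,          # filled in later from thumbs up/down
    }
    with open("copilot_log.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")
    return answer
```

The design point is the loop Tristan describes: because every interaction is captured with latency and feedback, you can evaluate where the assistant fails and feed that back into prompts or models.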

Brooks Patterson 20:53
To get to that more widespread experience, where we’re all able to build these really remarkable things with this technology, what are the next core pieces, or what problems do we have to solve to get there?

Tristan Zajonc 21:16
I think the biggest ones are reliability, low latency, and low cost. In the assistant use cases, you need to respond quickly, you need to be relatively cheap to deliver to the customer, and you really need reliability at the task so it becomes trusted. And those are three conflicting sets of requirements. Some of them aren’t appreciated, like latency. You might say, hey, let’s use GPT-4 and do reflection over our answer, then reevaluate whether we answered successfully and answer again: multi-shot responses, which on benchmarks can improve performance. But if you’re in a conversational chat experience, that very quickly becomes quite painful. So as you drive latency down, you typically have to run smaller models, and smaller models are typically less reliable. Even GPT-4 is not reliable enough for certain types of applications, like a lot of function calling, which is calling APIs, something very important for these models, and certainly the ability to call multiple functions for more complicated queries. Say you’re in a CRM; here are two examples of hard questions that are not possible today. One is: what happened yesterday? Or: what were the top complaints over the last month? That’s not something that can easily be done. Traditionally, one of the major ways we connect LLMs to particular applications is through retrieval-augmented generation, RAG. We do retrieval over some knowledge base, enrich the context, and then the LLM responds with that enriched information. But there are context window lengths and limits there. So a broad question like that is very hard: you can’t easily do retrieval over it, because you really need all the data. You’re trying to say, here’s a massive amount of things that happened yesterday, now go summarize them, or pull the major complaints out of all of this. You could do it in batch mode, and you can build a customized workflow for that, but it’s not something that’s easy today without crafting a customized experience. The other example is something like, to stay with the CRM: go into every deal that we currently have open, flag any customers I should respond to, and create a to-do task for each customer I should follow up with. That’s a task you could imagine giving somebody on a sales team, right? Hey, go do a deal review, create a summary on the fly, create to-do items, something like that. But if you think about how an agent or a copilot would implement that, today it requires perhaps hundreds of calls to a backend API: look up all the customers, look at all the records, analyze that data. It’s feasible; it feels like we’re on the borderline of doing it. But today these models can’t really handle that number of function calls without a whole bunch of work.
So we’re doing some of that work to make it possible, but a whole bunch of it just isn’t possible yet. What I’m excited about is using those cases as motivation for where this is going, because there are potential things on the horizon. We now have models with massive context windows: the Gemini model has a one-million-token context window in the publicly available API, and they’ve shown they can get it up to 10 million, and with that you could solve that retrieval use case in a fundamentally different way. Right now it struggles with latency; in that case it takes 60 seconds to give a response, so it still doesn’t work, but they’re pushing that envelope. And then obviously there’s a lot of excitement around agents and planning and reasoning. We all recognize that’s still a limitation, and there are a lot of teams working on it. I think progress will be made this year; it’s a little bit unclear how fast.
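
For reference, here is a toy sketch of the RAG loop Tristan names: embed a knowledge base, retrieve the most similar snippets, and answer with that enriched context. The corpus, models, and cosine-similarity retrieval are illustrative assumptions; a real system would use a vector database and document chunking:

```python
# Toy retrieval-augmented generation (RAG) sketch. Corpus and models are
# hypothetical; a real system would use a vector store and chunking.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts):
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

# A toy knowledge base; in practice, chunks of application data or docs.
corpus = [
    "Deal Acme Corp: renewal open, last contact 40 days ago.",
    "Deal Globex: expansion closed-won last week.",
    "Deal Initech: open, customer asked for a security review.",
]
corpus_vecs = embed(corpus)

def answer(question: str, k: int = 2) -> str:
    q = embed([question])[0]
    # Cosine similarity against every snippet, then keep only the top k.
    sims = corpus_vecs @ q / (
        np.linalg.norm(corpus_vecs, axis=1) * np.linalg.norm(q)
    )
    context = "\n".join(corpus[i] for i in np.argsort(sims)[-k:])
    resp = client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{
            "role": "user",
            "content": f"Context:\n{context}\n\nQuestion: {question}",
        }],
    )
    return resp.choices[0].message.content
```

Because only the top-k snippets ever reach the model, a broad question like “what were the top complaints last month?” needs the whole corpus in context, or a batch pipeline, which is exactly the context-window limit Tristan points to.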

Matthew Kelliher-Gibson 25:31
So with that, do you feel... Because I know, especially when this stuff first came out, you could point out some of the errors in it, and the response I got from a lot of people was, well, yeah, but just imagine what it’ll be like a year from now. But you also read other things arguing that we’re hitting some limits on parameters and size, or latency and so on. Do you feel like we can get to where you’re talking about with just what we have, or is there going to have to be some type of architecture or modeling change to get there?

Tristan Zajonc 26:07
i That’s a great question. And I don’t think I have a definitive answer there. Except that I do think we’re going to need some breakthroughs, to get to get the level of planning, performance reliability that we need at a low latency and low cost, like I do think we’re, I mean, if you look at, you know, like, for instance, the Google the new Google models, presumably, they have an unlimited compute budget there. Right. And they are just meeting GPT. For level, right? If you look at what clot what anthropic just released, right, which presumably was trading for hundreds of billions of dollars, they put up with quad three with their Opus model, it’s just beating or matching now GPT for level, right. And it’s not actually really solving. You know, it’s kind of meeting GB for level four, like, what we’re currently benchmarking against. Yeah, actually, not really, the next gen applications, like the next gen applications are more like these autonomous assistants. I mean, you could fully actually use your computer, and your applications. And like, honestly, none of them even can do that, like where they do it. And they do it so slowly and really badly. I think I think these models, I think neck token prediction, and these auto regressive models can go very far. Right? There’s no question about that. And I think you can solve some of the latency and costs with this mixture of extra mixture of x for type models, which is what is being done. Right, right. GPD for turbo, and, you know, these types of, you know, Databricks just released something today. All right, that was a longer slide. So you can get really far or you can get really far with this. But I think intuitively, you just think about it, you think, hey, there’s definitely opportunities to change the way we do planning. And there are lots of different research directions that people have talked about, and yes, some that something’s gonna work, right. You

Matthew Kelliher-Gibson 27:59
Like, I know from the stuff I did with NLP early on, one of the big issues people didn’t realize was that they would think of it as just accuracy: be accurate. And accuracy is important, but how wrong you are is also important, because if you give an answer that’s really wrong, especially in a B2C context, people go, “this is nothing,” and they won’t come back to it. You’ve lost them right there.

Tristan Zajonc 28:30
Well, yeah, I have a funny story on this; you can test it out and see if they fixed it. The Claude 3 models from Anthropic came out a couple of weeks ago, and my test is, of course, the egocentric test, where you ask who Tristan Zajonc is. The reason I ask that is mainly because it’s a pretty long-tail question, so naturally a model may fail at it, and I know the correct answer: I know what my bio is and whether it’s right. And this model, which is supposedly a GPT-4 quality model, says that Tristan Zajonc is the CEO of Anthropic. That’s a very weird hallucination, given all the other things this model can do, and the fact that it probably knows a lot about Anthropic generally, because there’s a bunch of information out there. Why would it hallucinate on a random name like that? I actually showed it to an Anthropic engineer last night, and he said, we’re going to fix it, so it might be fixed. But that model is amazing, and it goes to your point about reliability: I initially dismissed Claude 3 as a result of that, because, okay, it should just refuse; I’m not a notable figure or something like that. Instead it hallucinated. Then more recently I experimented with it more, and no, it’s actually a GPT-4 caliber model that actually has some advantages. So yes, this reliability issue is huge. Without naming names, we work with a large financial accounting company, and hallucination is a huge problem for them, because they view it as: they’ll get sued. They basically have to give tax advice, and it’s regulated. Same thing in the financial services sector, where you’re not a certified financial planner or advisor, and there’s a lot of regulation around giving financial advice and tax advice, which makes a lot of sense. There are so many tax questions that, honestly, GPT-4 can do a pretty good job answering, but there’s a lot of legitimate concern: it really has to meet a threshold, and that threshold is very high. We don’t really quite know where it is; obviously, human tax advisers make mistakes too. And so in that context, what you see is that the first place these copilots get adopted is actually on the back end: the human tax advisor is using the copilot, checking it, accelerating their work, and still on the hook for the final answer.

Matthew Kelliher-Gibson 31:06
That’s essentially what I was going to ask: whether you thought that was the first step, that it gets used by the advisor, and then once it gets to a certain point, you can start to push it further. I also think that’s probably going to be somewhat of a regulation question too: how do you handle that? Who’s responsible for it? Can you certify a chatbot as a financial planner or something like that?

Tristan Zajonc 31:30
Definitely. Among the customers interested in what we offer, we see basically two crowds. One is the startups. They’re all saying, hey, how do I deliver next-generation, breakthrough experiences to the end user? I’m willing to take some risk; I just want to deliver to the end user something that wasn’t previously possible. That’s obviously the startup opportunity. Then in the large enterprises, yes, there’s a lot of hesitancy around customer-facing assistants and features, but there’s a lot of appetite internally. Insurance companies are another example there. Their insurance plans are extremely complicated in terms of what the coverages are in any given plan, even for the agent on the back end to understand. If my dog eats my couch, am I covered, or is that excluded household damage, for this particular claim, this particular plan vintage, or whatever? So assisting agents on the back end to answer those questions or automate some tasks is exactly the first place to start, and it’s a huge use case, obviously. But to really be disruptive, I do think you eventually have to push that experience all the way to the end user.

Brooks Patterson 32:45
That’s exciting. Well, we’re at the buzzer here, but it’s been so wonderful to hear from you on where we are today, on how much has been unlocked and how much faster we can do so many things, and on the long way still to go to get to what we envision.

Matthew Kelliher-Gibson 33:09
Humanity is safe for now.

Brooks Patterson 33:15
Tristan, tell us: for folks who are interested in Continual and an AI copilot for their applications, where can they find out more about Continual and maybe connect with you?

Tristan Zajonc 33:28
Absolutely. So, continual.ai, easy. If you’re thinking about building an AI copilot for an internal application, a support application, or a product, we’re a very easy way to do that, make it remarkable, and improve it over time. We’re in early access right now, so you can sign up on our website. We’re probably going to announce some things in the coming weeks, so stay tuned. Maybe by the time this is out it will be up, or shortly thereafter, more probably.

Brooks Patterson 33:55
Exciting. Well, Tristan, thanks so much for sitting down for a few minutes in person here with us. This is your second time on the show, so we’ll have to get you back on in another couple of years for another update.

Tristan Zajonc 34:09
Absolutely, looking forward to it.

Matthew Kelliher-Gibson 34:10
Working towards your gold jacket.

Eric Dodds 34:15
We hope you enjoyed this episode of The Data Stack Show. Be sure to subscribe to your favorite podcast app to get notified about new episodes every week. We’d also love your feedback. You can email me, Eric Dodds, at eric@datastackshow.com. That’s E-R-I-C at datastackshow.com. The show is brought to you by RudderStack, the CDP for developers. Learn how to build a CDP on your data warehouse at RudderStack.com.