Episode 157:

From Search Engine to Answer Engine Using Grounded Generative AI, Featuring Amr Awadallah of Vectara

September 27, 2023

This week on The Data Stack Show, Eric and Kostas chat with Amr Awadallah, the Founder and CEO of Vectara. During the episode, Amr discusses his extensive experience in the data industry and his new company, Vectara, which focuses on enabling companies to integrate capabilities in their products using large language models (LLMs) with security, reliability, and ease of use. The conversation also covers the advancements in data, computing, and algorithms that have led to emergent behaviors in neural networks, the practical applications of Vectara’s technology, the challenges and considerations in working with large language models, the importance of addressing technology misuse and aligning different value systems in society, and more.


Highlights from this week’s conversation include:

  • Amr’s extensive background in data (3:23)
  • The evolution of neural networks (9:21)
  • The role of supervised learning in AI (11:17)
  • Explaining Vectara (13:07)
  • Papers that laid the foundation for AI (15:02)
  • Contextualized translation and personalization (20:07)
  • Ease of use and answer-based search (25:01)
  • AI and potential liabilities (35:54)
  • Minimizing difficulties in large language models (36:43)
  • The process of extracting documents in multidimensional space (44:47)
  • Summarization process (46:33)
  • The danger of humans misusing technology (54:59)
  • Final thoughts and takeaways (57:12)


The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.


Eric Dodds 00:05
Welcome to The Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You’ll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, the CDP for developers. You can learn more at RudderStack.com. Welcome back to The Data Stack Show. Kostas, maybe this is a trend, but Amr, who has worked in data science and AI for multiple decades and who started a company that is doing LLM stuff, is going to be on the show. We had Brendan Short on recently, who is also doing similar things with LLMs. But this is fascinating. We’re getting to talk to multiple people who are building companies entirely based on technology that hit the news, you know, six months to a year ago in terms of being a hot topic. So I’m very excited. Amr has a really interesting past. He worked on data science stuff at Yahoo in product development, founded Cloudera, and so he saw AI at the enterprise level for over a decade, then spent time at Google, and now has founded a company that’s doing AI stuff, which is pretty fascinating. So this isn’t going to surprise you or any of our listeners, but I want to hear the narrative arc of that, right? Like what data science and AI stuff was doing at Yahoo in the early 2000s. You know, we tend to sort of call that data science and the new stuff AI, but in reality, it’s all part of one bucket. And I want to hear his perspective on that. And ultimately, I want to know why he founded Vectara, because he has so much context for the problem. It’s really interesting to me why he founded a company that sort of enables LLM technology inside of companies.

Kostas Pardalis 02:10
Yeah, 100%. I think we have an amazing opportunity here to learn from, I don’t know, a very unique experience, right? And by the way, I think it’s very interesting, and it’s similar to what we were saying also about Brendan: again, we have someone here who has a very long journey through many different phases of the maturity of the industry. We are talking about Yahoo, and before that, people who listen to the episode will hear him talking about his PhD and what he did there with VMs, and then Cloudera, a public company, and after that VP at Google. We are talking about someone who has experience at some very influential companies, at least among the companies that came out of Silicon Valley, right? So having someone who has done a journey like this, and is now starting again, something from zero, with LLMs, I think it’s a unique opportunity to learn both about what it means to build value out there, but also why he’s doing it with LLMs when he could practically do anything, right? So why this space, why AI, why is it important? And I think we are going to have very interesting conversations about many things that might change around these new technologies. And as we said already, I think we have the right person to talk about that stuff.

Eric Dodds 04:05
All right. Well, let’s dig in. Amr, welcome to The Data Stack Show. We’re so excited to have you.

Amr Awadallah 04:12
Good to be here.

Eric Dodds 04:15
You have a long history in data. Can you just give us a brief overview of where you started? And then after that, I’d love to dig into Vectara and hear about what you’re doing today.

Amr Awadallah 04:28
Sure. I was doing my PhD at Stanford, and out of Stanford, I started my first company. That was a company called Octavia, which was doing comparison shopping, online comparison shopping, and it was acquired by Yahoo. A year later, we were just five people, and we became part of the backend of Yahoo Shopping. I worked on that at Yahoo for about four years: lots of data, all kinds of data to process, all of the crawls of product information across the web, the specs, the images, the prices, etc. And within Yahoo, I shifted my career to be focused more on data analytics and data science for products and how we can design better products. I ran a team called Product Intelligence Engineering and got to see the birth of Hadoop. It solved a number of key problems for me in terms of scale, speed, efficiency, flexibility, and agility. And that’s when I said, oh, if this works for me at Yahoo, this will work for many others. So I left Yahoo in 2008 and teamed up with my co-founders from Facebook, Google, and Oracle to start Cloudera. I spent 11 years with Cloudera; I was one of the founders and the Chief Technology Officer. Cloudera went public in 2017 and was taken back private about two years ago. After Cloudera, I joined Google Cloud, where I was vice president of developer relations for a number of products within the Google Cloud portfolio, including AI and data products, but others as well. And then after Google Cloud, I started Vectara.

Eric Dodds 06:04
Very cool. Well, of course, Vectara is, you know, based on LLM technology, which we’ll talk about, so of course I had to ask ChatGPT about you before the show started. And the very first line says that you’re a pivotal figure in technology, which we have now verified on The Data Stack Show. Ironically, we will talk about truth relative to LLMs on the show; that’s a topic that I definitely want to dig into.

Amr Awadallah 06:35
That’s a true statement. That statement you repeated from GPT. Obviously, that is a true one.

Eric Dodds 06:41
Yes. Yes, exactly. We know, that’s

Amr Awadallah 06:45
meaning me being a pivotal figure in the data space.

Eric Dodds 06:48
Yes, exactly. Yes. The Data Stack Show is where you go to verify ChatGPT, you know, sort of confirm it. This is where you find out if hallucinations are real or not. Exactly. Tell us about Vectara. So what does the product do? What problem are you solving? Give us an overview?

Amr Awadallah 07:11
Yes, so I’ll give the shorter version now, and hopefully during the show, as we talk with each other, we can expand on that. But essentially, Vectara is a Gen AI platform. Our goal is to enable companies to integrate Gen AI capabilities in their products with proper security, safety, and reliability around these implementations, and with ease of use. How do you do that without having to go and research which large language model to use, which retrieval-augmented generation strategy to use, which encoder to use, which vector database to use? No, we have a very simple API. On one end, you put in your data, and on the other end, you issue your prompts and get back amazing results. So that’s what the Vectara Gen AI platform is about.

Eric Dodds 07:58
Yep. Love it. Okay, now I actually want to rewind. I want to dig into Vectara, but I want to rewind first and get your perspective on how AI has changed over the last decade or two. Because, I mean, at Yahoo you were using data science, right? In many ways. Certainly, we can discuss, you know, the semantics of LLMs and AI, right, but it’s all based in data science; you were using data science to build better products at Yahoo. At Cloudera, I’m sure you got exposure to just a huge footprint of enterprises that were enabling data science, AI, ML, call it whatever you want to call it, right? And then, of course, at Google, I’m sure you saw the same thing working on the Google Cloud Platform. So I have two questions. The first one is, what are the main conclusions you’ve drawn? Obviously, you started a company that is providing this as a service. But can you just give us a narrative arc of what you’ve seen of AI, the storyline of AI, over the last decade or so?

Amr Awadallah 09:21
Yes. So there are multiple storylines, actually, so it’s hard to go over all of them right now, but I’ll try to pick the key ones. First, we have to always remind ourselves that AI, artificial intelligence, is a generic term. It’s an umbrella term under which many technologies are lumped: data science is lumped under that, machine learning is lumped under that, large language models are lumped under that, neural networks and deep neural networks. So it’s really an umbrella term, right? Within the AI space itself, there’s the field of neural networks and whether we can build these self-teaching networks that can learn from us and expand our knowledge beyond what we know and solve new problems, without us explicitly teaching them how to solve these new problems. Now, it had a very hot timeframe around the 1970s and early 1980s, but then an ice age took place where that technology just wasn’t working. All this neural network stuff stopped, and everybody stopped working on neural networks; it just wasn’t paying off. And the reason why was twofold. First, we did not have enough compute power to make these things work. And second, we did not have enough data to feed these things; we were still going through the digitization of our knowledge. So it died off, but then fast forward to the late 2000s: we now had enough data, and we had enough compute, to make it work. And that’s when neural networks really came back in force, and that’s where we are today with the outputs of these large language models. So that’s one arc, a very key arc: the revival of neural networks. That is not about AI itself; it’s about neural networks and how they came back and we were able to do that. Now, AI was always very important through the years. We had the statistical nature of AI: given a number of decisions that humans have labeled, for example fraud, how you label a transaction as fraud versus not fraud.
Initially, humans did that. There were humans looking at every transaction that Visa is doing and marking: this Visa transaction looks like fraud, and this Visa transaction looks normal. And that worked fine when we only had five credit card users in the world, but they very quickly figured out that doesn't scale; we have to automate it. So, statistically now, let’s study how these humans have been labeling things as fraud or not fraud, and then leverage machine learning, which is learning from these statistical labels that humans have been placing, to be able to do that at scale. And that was the previous wave of AI, if I might say; that’s how most AI algorithms work. It’s called supervised learning, and that’s where we’re solving the problem for it: it is just taking our solution and scaling it up to run at larger volumes. The cool thing about this new wave, with large language models and deep neural networks, is that it’s able to solve new problems that it hasn’t seen before. Right? We’re showing it some of our content, and by the way, in an unsupervised fashion, just giving it tons of pictures, tons of images, tons of text that we have written, tons of articles. And then by consuming these articles, it’s now able to do new things that it hasn’t really been exposed to. And that’s why we’re all flabbergasted and impressed and excited about the possibilities that this new movement can produce.
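The supervised-learning idea Amr describes, humans label past transactions, a model learns from those labels to classify new ones at scale, can be sketched in a few lines. This is a toy nearest-centroid classifier; the transaction data, the two features, and the class summaries are all invented for illustration, and real fraud systems use far richer features and models.

```python
# Toy supervised learning: learn from human-labeled transactions,
# then classify new transactions automatically.

def centroid(points):
    """Average of a list of feature vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def distance_sq(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Human-labeled history (hypothetical): [amount_usd, hour_of_day] -> label
labeled = [
    ([12.0, 14], "ok"), ([30.0, 10], "ok"), ([25.0, 19], "ok"),
    ([900.0, 3], "fraud"), ([1200.0, 2], "fraud"), ([800.0, 4], "fraud"),
]

# "Training": summarize each class by the centroid of its labeled examples.
centroids = {}
for label in ("ok", "fraud"):
    centroids[label] = centroid([x for x, y in labeled if y == label])

def classify(tx):
    """Assign a new transaction the label of the nearest class centroid."""
    return min(centroids, key=lambda lbl: distance_sq(tx, centroids[lbl]))

print(classify([20.0, 15]))   # a small daytime purchase
print(classify([1000.0, 3]))  # a large 3 a.m. purchase
```

The point of the sketch is the division of labor Amr outlines: humans supply the labels once, and the model applies their statistical pattern to every future transaction.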

Eric Dodds 12:52
Tell us about Vectara. What’s the overview? You have all this history doing, you know, stuff at scale at big companies, and you decided to start Vectara. So what is Vectara? What do you do? What problem do you solve?

Amr Awadallah 13:07
Yeah, so hopefully during the conversation I’ll be able to tell you more about Vectara overall and what we do, but I’ll give you the shorter version of it right now, at the start of the conversation. Vectara is a Gen AI platform. What I mean by that is we enable companies to embed Gen AI capabilities in their products, and we make it very easy for them to do that, and very safe and secure, maintaining privacy around it as well. We have a very simple API that allows them to upload their data on one end, and then another API that allows them to issue prompts or questions against that data, and the responses come back right away, without them having to worry about which vector database to pick, which large language model, which encoder technique, which segmentation technique. We just automate all of that. So ease of use is very core to what we do; we enable you to be up and running with Gen AI in your apps in seconds. That’s a big part.
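The "data in one end, prompts in the other" flow Amr describes is, under the hood, retrieval-augmented generation. Here is a minimal sketch of that flow. The embedding is a toy bag-of-words vector standing in for a neural encoder, the documents are made up, and Vectara's actual API and internals differ from this illustration.

```python
# Minimal retrieval-augmented generation (RAG) sketch:
# index documents as vectors, retrieve the closest one for a question,
# and ground the prompt in that retrieved context.
import math
from collections import Counter

def embed(text):
    """Toy encoder: bag-of-words counts (stand-in for a neural encoder)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# "Upload your data": index each document with its vector.
docs = [
    "Invoices are billed at the end of each month.",
    "Password resets are handled on the account settings page.",
    "Refunds are issued within five business days.",
]
index = [(d, embed(d)) for d in docs]

# "Issue your prompt": retrieve the best-matching document, then build a
# grounded prompt. A real system would pass this prompt to an LLM.
def answer(question, top_k=1):
    q = embed(question)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    context = "\n".join(d for d, _ in ranked[:top_k])
    return f"Answer using only this context:\n{context}\nQuestion: {question}"

print(answer("How are password resets handled?"))
```

Every decision this sketch hard-codes, the encoder, the similarity metric, how many documents to retrieve, which LLM receives the grounded prompt, is exactly the set of choices a managed platform is automating away.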

Eric Dodds 14:04
Yeah, love it. Oh man, Kostas, I’m jealous about the questions that you’re gonna ask, but I will get through mine. So I don’t want to oversimplify this, but, you know, I don’t want to assume that all of our listeners have studied the different disciplines within AI, or even LLMs, on a deep level. So I’m gonna ask you kind of a dumb question. If you think back to the work that you were doing at Yahoo and the types of use cases that you’re enabling with Vectara, is the difference just data and compute as inputs? Because it’s not like you were doing primitive stuff at Yahoo; I’m assuming it wasn’t like the Dark Ages. I mean, you were doing some advanced stuff, some really cool stuff. But is it just computation and data, or what are the big shifts that are new now, I guess?

Amr Awadallah 15:02
No, it’s not just that. It was new compute, new data, and new algorithms that came into existence that allowed us to do what we’re doing today. There are three very important papers in research that, combined with deep neural networks on top of the availability of compute and data, allow us to do the amazing things that we are doing today. The first one was in 2013, and that paper is called the word vectors paper. The word vectors paper was about how we can take words from the languages that we speak, English, Spanish, Chinese, you name it, and map these words from a word space, which humans understand, into a vector space, which is a numerical point in a multidimensional space. But do it in a way that is very intelligent, so that words that have similar meanings, like queen and king, end up close together. Queen and king have almost the same meaning; one is female and one is male, but it’s the same concept at the end of the day. So we want to map them in that word vector space to be close to each other. That was the first innovation, and it was very significant, by the way; it was really the seed that started everything else. Then after that, in 2017, came the Transformers paper. The Transformers paper was a very efficient algorithm that allows us to leverage deep neural networks to take into account the sequence of words, right? Because we know the sequence of words affects the meaning. If I say, Eric killed a bull, it has a very different meaning than, a bull killed Eric. Right? And both can happen, in different situations, etc. So the sequence of words is very important, and transformers gave neural networks the ability to capture sequences very efficiently. That was the genius of transformers. That was 2017. And then in 2018 came the BERT paper, also by Google.
The BERT paper was about how we can do unsupervised learning at scale, by taking lots and lots of text, any amount of text that we want, and asking these neural networks to fill in the blanks. So I would say, Eric went buying eggs from the blank, and it had to fill in the blank; the blank would be the supermarket, that’s where he would buy the eggs from. And as it solved more and more of these puzzles, the neural network started to learn not only the vocabulary and the grammar, it started learning the pragmatics of our language as well. It started to learn that Eric buys eggs at the supermarket, but Eric cooks eggs at home, even though “Eric cooks eggs at the supermarket” is grammatically correct. It’s not pragmatic, right? And pragmatics is something that these neural networks started to comprehend and understand. These three papers were the foundation. Once we take that and start to scale it to bigger amounts of data, we start getting what’s called emergent behaviors: new behaviors out of these neural networks that we did not pre-code or predetermine would happen. But because we started scaling them with these proper fundamentals, they start to exhibit these amazing patterns, to the level now where we have these amazing things like the large language models that can solve puzzles, and explain jokes, and translate, or rewrite a passage in the style of Shakespeare. So it wasn’t just data and compute, to answer your question; it was data and compute combined with some amazing new software techniques and algorithms that happened over the last decade. Sorry for the long answer.
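The word-vector idea Amr starts from can be made concrete with a tiny example. The vectors below are hand-made for illustration (three made-up dimensions); real word2vec embeddings are learned from huge corpora and have hundreds of dimensions, but the geometry, similar meanings land close together, and offsets like king - man + woman point at queen, works the same way.

```python
# Illustrative word vectors: words as points in a multidimensional space
# where similar meanings are close together.
import math

# made-up dimensions: [royalty, femaleness, edibility]
vectors = {
    "king":  [0.9, 0.1, 0.0],
    "queen": [0.9, 0.9, 0.0],
    "man":   [0.1, 0.1, 0.0],
    "woman": [0.1, 0.9, 0.0],
    "egg":   [0.0, 0.1, 0.9],
}

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction, 0.0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# "queen" sits close to "king"; "egg" sits far from both.
print(cosine(vectors["king"], vectors["queen"]))  # high
print(cosine(vectors["king"], vectors["egg"]))    # low

# The classic analogy: king - man + woman lands nearest "queen".
target = [k - m + w for k, m, w in
          zip(vectors["king"], vectors["man"], vectors["woman"])]
best = max(vectors, key=lambda w: cosine(target, vectors[w]))
print(best)
```

Note that `math.hypot` with more than two arguments requires Python 3.8 or later.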

Eric Dodds 18:41
So helpful. Yeah, no, that was kind of the intention of asking a little bit of a dumb question, just because you have such deep knowledge of this and have studied it, so I wanted to draw some of those things out. Two more questions, and then I want to hand it over to Kostas. That’s generally a lie, so maybe it’s three more questions. The first one, and I want to draw on some of the academic work that you just talked about: I recently read an article that popped up on Hacker News about various translations of Dostoyevsky’s Brothers Karamazov, and this article was discussing how translations have been pretty divergent, actually, because translation is highly dependent on an individual translator’s exposure to a wider set of Russian literature. We don’t want to dive into the humanities too deeply on The Data Stack Show, but what you just described is really interesting to me, because it stands to reason that you could actually leverage some of those fundamentals to produce a pretty highly accurate contextualized translation that relies on the full documented history of Russian literature as a corpus of work. Is that possible?

Amr Awadallah 20:08
Absolutely, absolutely. Yes, you can. Both angles are possible: the angle of generalizing, so the translation reflects the full richness of the style and the culture and political biases and the historical narratives in the answer, or you can personalize it to the extreme, where you say, my thought process, my beliefs, my cultural upbringing are way more aligned with this translation, so only give me this, don’t give me the rest. We can get to the point where we personalize messages for every single person exactly in the way that they expect. That’s one of the byproducts of this technology.

Eric Dodds 20:48
It’s exciting and scary, right? Like, if I can ask for the dossier that I want, it’s a little bit scary. Okay, well, let’s save that, because I think we should discuss it maybe towards the end of the show. But let’s get really practical; I want to really dig in on Vectara as I hand the mic off to Kostas. What I would love to do is hear about something you worked on at Yahoo and how you would use Vectara to do it differently. How would you have built a better product, you know, back in your Yahoo days if Vectara had been available? Maybe you can frame it in terms of: we had this problem, and we approached it in this way, but if I had Vectara, what I would have done is this. Maybe that’ll help us understand a very practical use case for how a practitioner would leverage these APIs and how a company can actually operationalize Vectara.

Amr Awadallah 21:54
It’s a very interesting question, and it’s timely in a way, because at Yahoo, in my last four years, I was working on search: how to make Yahoo search better.

Eric Dodds 22:01
Oh, wow. No way. Yeah.

Amr Awadallah 22:03
And frankly, the approach Vectara takes to search is the right approach, because it’s not about search engines anymore; it’s about answer engines. Right? And that’s what we see from ChatGPT. Why did people love ChatGPT so much, besides the fact that it can help them write their homework? It’s the fact that it can answer the question. When I ask, explain thermodynamics to me, it doesn’t give me ten links about thermodynamics and say, go read them so you can figure it out. No, it tells you. Would you like it explained at the level of a five-year-old? Yes, and it gives you the five-year-old explanation right there in the answer. You don’t have to go click on anything to get the answer. So what we are very focused on at Vectara with our first product, and we’ll have many more products aligned with moving in the direction of what I refer to as action engines, engines that take action on my behalf as well, is: today, can I give you the right answer to your question, and how can I ground that answer in your own knowledge as an organization? So this is for a specific company; we’re not doing this for consumers, for the web, by the way. You.com is a great search engine that’s trying to do that for the web; of course, Google is trying to do that today with Bard as well, and Bing is doing it with Bing Chat. We are focused instead on organizations. Say you’re an organization that has a lot of knowledge, for example, an investment firm. That investment firm has lots of analyst reports from the analysts at JP Morgan and Goldman Sachs. It has lots of investment memos that they wrote: who they are going to invest in and who they’re not, and the decision criteria. They have all of the PowerPoint slides from the entrepreneurs pitching them. And they’re storing all of that in this very smart system.
Then they come later and ask the question: there is an entrepreneur now pitching me on blockchain from Greece, should I listen to them or not? And instead of giving you a list of the previous memos, like, go read these memos, the system will respond back and tell you: this is why you should invest, or this is why you should not, weighing the pros and cons depending on your own historical decision framing and the historical knowledge that you have in these documents. And then, if we get that answer right over and over again, the next step is, now me as a knowledge worker, meaning an investor working with this system, I would say, can you just put this for me into an email I can send to our investment committee for why we should pass on this deal, or why we should look further at this deal? So that’s kind of the trajectory. I hope this makes what we do concrete: we are helping you take your knowledge and activate that knowledge as a resource. It is no longer you sifting through massive amounts of documents to come up with a PRD, or an analyst report, or pharma research documents for why a new drug has adverse effects. Now we’re giving you the response right there, increasing your efficiency 10x while doing that. This is really what this motion is about.

Eric Dodds 25:01
Yeah, 100%. And it really sounds like it’s not only a B2B play, where you can enable a knowledge worker, but also, let’s say, B2C, where you have a corpus of knowledge and you’re exposing that same knowledge to an end user who can also leverage it, right? I mean, let’s take docs as an example. If you have expansive products, like Stripe, say, with extensive docs, amazing docs, their team has done an awesome job. But the search is really primitive, right? It hasn’t changed in 15 years, like the doc search. I mean, maybe the matching algorithms have gotten better, but imagine a world where they could provide sort of answer-based search.

Amr Awadallah 25:53
Yeah, exactly. And I mean, RudderStack: we are big users of RudderStack at Vectara. Amazing product, thank you for building that amazing product. Same thing: RudderStack has lots of documentation, lots of knowledge base articles about issues that your customers are facing as they’re developing with your platform. Imagine now having an engine like Vectara, sorry for pitching you on using us.

Kostas Pardalis 26:16
Indexing all of that content.

Amr Awadallah 26:17
And when a developer now has a question, it’s not going to point them at the docs and say, you have to go read all of this to figure out how to do it. We tell them: here’s the answer, here are the steps you need to go through. Step number one, do this. Step number two, do that. Step number three, do this, and you’re done. And if they have a follow-up question, they can say, oh, tell me more about step number three, and it will tell them more, almost like they’re speaking to a live customer support rep. So customer support use cases are actually the number one use case for us; that’s what most of our customers are using us for. And I predict that five years from now, call centers, or having customer support reps, will not be required anymore; they will be completely replaced by large language models, not just from Vectara, but from other companies building this capability. To your point on B2C: I cannot mention their name, but we are working with a very large social media company that has user-generated content about many different types of topics that you can write about. And they’re evaluating our platform right now to do exactly what you just said, because when people are searching, they’re very frustrated, especially with keyword search. When you have user-generated content, people say the same thing in many ways, they misspell it, so you never find the right answer using legacy search techniques. They’re evaluating our system in this B2B2C context, where now you can ask a question, and you’re getting back not the posts that everybody else wrote; you’re getting a digest, a summary of the wisdom and the knowledge of all of these posts in the response that you see. And if you want, you can still click through and go deeper. Yeah, absolutely. Right?

Eric Dodds 27:46
Yeah, I love it. And I will certainly recommend to any investor who’s interested in a data startup to scan the episodes of The Data Stack Show, because if a company hasn’t been on The Data Stack Show, that’s, you know, a little suspect. I don’t know if you want to put your LPs’ money there.

Amr Awadallah 28:09
Absolutely, absolutely. There’s one other thing I want to double-click on, which is why Vectara now, and what am I doing differently with Vectara as compared to Cloudera, which was my previous iteration.

Kostas Pardalis 28:21
Yeah, I’d love that.

Amr Awadallah 28:26
So Cloudera was a successful company. We did an IPO, and we got acquired for 5.3 billion from the public markets. They’re making almost 2 billion in revenue every year right now. Very successful. But I have no shame in saying I was always super jealous of Snowflake. Snowflake is another company in our space that was able to achieve ten times the valuation, the ultimate valuation, that we were able to achieve at Cloudera. And when I studied it carefully: Cloudera had a very powerful platform, extremely powerful, extremely complex, that can do many things, machine learning, data science, ETL, storage, compute, high-performance computing, simulations, Monte Carlo. It can do many things. But it was so freakin’ hard to use, right? Only the rocket scientist engineers could use it and get something useful done with it. Snowflake, on the other hand, came in and said, we’re going to attack an existing problem, which is databases. Google has an amazing database called BigQuery. Amazon has an amazing database called Redshift, and extensions of that. Microsoft has, of course, SQL Server, which is number one in the world, and they have Cosmos DB. Yet Snowflake was still able to come in and disrupt that entire market by doing one thing, but doing that one thing immensely well, which is ease of use. They nailed the ease of use. They said, we’re going to have a very simple API: you upload your data on one end, and on the other end you run your SQL queries. You don’t have to worry about partitions and indexes and primary keys and rebalancing; we’ll take care of automating all of that for you. So you get amazing results. And that formula works. That formula works; there’s no question about it. So one of the key things that I learned from my experience with Cloudera, which I’m focused on at Vectara, is how we can do the same thing for large language models and Gen AI at large.
How can I provide developers with a super simple API? They plug in their data on one end, they issue their prompts on the other end with an API, and they get amazing responses in return, without having to worry about which vector database, which encoder, which neural network, which large language model. Everything just gets load-balanced and scaled and secured and made private automatically for them. So that’s the main distinction, the main lesson that I learned from my previous journey and that I’m applying in my current journey. And hopefully it proves to be the correct lesson to follow.

Kostas Pardalis 30:47
I will agree with you, and I am super happy that you brought up Snowflake, actually, because you mentioned a couple of things, the decisions that they made, that I think are very important and that, let's say, created all this success with a system like Snowflake, right? The ease of use of a technology matters no matter who the user is, I would argue, and I'd love to hear your opinion on that, because with your role as VP of Developer Relations at Google you're much more experienced than me in how developers work. Even for hardcore developers, how easy a tool is to use, how it removes obstacles and lets them go and do the work they have to do, is super, super important, and this will never change, right? And this is related to the more general topic of human-computer interaction, which I would like to chat about a little bit later, because I have a question I've wanted to ask you for quite a while now. But while we were talking with Eric, you mentioned search engines. By the way, most people using Google right now don't realize it, but one of the greatest successes of Google Search, in my opinion, is how easy it is to use, right? It's just a text box: you put something there and you get results. Returning relevant data at that scale is extremely hard. But these systems still do search; they return data back, right? They don't give answers, as you put it very well. Now, there is also something important here: I get the data, and as the user I make the decisions, and I am liable for those decisions, right? The machine is not liable, because the machine is only serving data right now. And we've seen what misinformation can do, right, especially with social media and all these things. So information is a very powerful tool.
So what happens when we move away from that, and now the machine actually gives answers, right? The balance of who is liable here is not clear anymore. We're not used to it; we don't know how to deal with it. What happens, for example, inside a company, like the customers that you're working with, if I ask something and the machine hallucinates and talks about an imaginary PRD that doesn't exist, and I'm doing a presentation on it? Right?

Amr Awadallah 33:48
I mean, exactly what happened to a lawyer. There was something in the news a couple of weeks ago about a lawyer where ChatGPT made up a number of cases for the defense, and the judge was like, none of this exists.

Kostas Pardalis 34:03
100%, 100%. We can laugh about it a little bit, but I think it is an important topic, something that, if we want to move fast and harness the value of this amazing technology, we humans have to figure out, right? So what's your opinion on that? And what's the tooling that companies like Vectara can bring out there to help in doing that?

Amr Awadallah 34:32
Yes. Excellent question. And there is another question you had there, about developers and ease of use and how I see developers in the world, given my role at Google. So first, answering the question on automation of decisions: we have been automating decisions already; this is not new. Again, I give the example of credit card fraud. Credit card transactions are being scored and automated in real time, and they have been improving significantly over the last 20 years. Twenty years ago, if you were to travel anywhere and use your credit card in a new country, it would get blocked right away, right? Because they just had a rule saying: if the transaction is not in the home country, block it. It was as simple as that. Now it's a lot more dynamic, and it figures out: oh, you just bought a ticket with your credit card, you're probably traveling. So these systems are getting smarter over time. Google Maps, for me, is one of the best decision-making tools that we have ever used at scale. We follow its directions: it tells us to go right, we go right; it tells us to go left, we go left. We are literally doing what the app is telling us to do. If it ends up sending us to the wrong address, do people file a lawsuit against Google — hey, you're liable for sending me to the wrong place? Not yet; I haven't seen anything like that. That said, when you apply that to autonomous vehicles — and I have been using Cruise and Waymo; you can use those in San Francisco now — it's a big question mark for me: who's liable when the car makes a mistake? Is it the car manufacturer that made the car? Is it the AI driving the car? Is it me, the person renting the car or buying the car to do something for them? It's actually a very big question.
I don't know the answer to that. It's actually going to be very interesting to figure out the liability for the mistakes. But what I know for sure is that we need to minimize the mistakes, right? We need to minimize the mistakes, because the more we minimize them, the more likely we will use these systems in the correct way — credit card transactions being a perfect example of that. So, if we continue to have large language models that hallucinate, they will be very hard to use in a business context. I have to first start by saying that hallucination is a feature, not a bug. Hallucination is something useful; we use it for some things. When you're creating a new Midjourney picture and it comes out as a picture of somebody you have never seen before, or you create a new movie script or a new poem by telling ChatGPT to write it, that's literally it making up something new that nobody has seen before. That's useful. But when you're asking it, write that product plan for me, or write a legal draft for me, you cannot tolerate facts that don't exist. So the solution to that problem — and it's not perfectly solved yet, I have to be clear about that, but we are minimizing it as much as we can — the approach we're adopting at Vectara is called grounded generation. Grounded generation is about how I can generate the response to the prompt that you provide, but grounded in the facts. So when you're saying, write this new PRD based on these features that we had in the past, and based on this design of our product, and based on the manuals of our documentation, then it's only constraining the output to what those documents are providing. It's not going and trying to come up with a new output based on the statistical model it has in its neural network brain. That's how we minimize hallucinations in these systems, and that's the approach that we are taking at Vectara.
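The core of grounded generation as described here can be sketched in a few lines: inject the retrieved documents into the prompt and instruct the model to answer only from them. This is a minimal illustration, not Vectara's actual implementation; the function name and prompt wording are invented for the example.

```python
def grounded_prompt(question, facts):
    """Build a prompt that constrains the model to the retrieved facts,
    rather than letting it answer from its internal (lossy) knowledge."""
    numbered = "\n".join(f"[{i}] {fact}" for i, fact in enumerate(facts, 1))
    return (
        "Answer the question using ONLY the numbered facts below. "
        "If the facts are insufficient, say you don't know.\n\n"
        f"Facts:\n{numbered}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Whatever LLM sits behind the API then sees only the curated facts, which is what constrains its output.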
That approach has other benefits, by the way. By separating the knowledge in the large language model, which is what it learned about the world, from the knowledge graph — the knowledge base of the content of your organization or your business — you are now allowing it to be real time as well, because new data can be added to that knowledge graph, and as a new prompt comes in, the large language model does not need to be retrained. We don't have to go and retrain it or fine-tune it; it can pick up that content and start giving you an answer right away. So that's another very key benefit this approach provides. It's also a lot cheaper to do it this way than to fine-tune a model from scratch. And last but not least, it solves a problem called model pollution, where you might be afraid that the large language model is being trained on your data. If you're using a large language model from another vendor, you might be afraid: oh, what if that vendor jumps into my space? So, for example, let's say RudderStack was to put all of their documentation and even source code inside a large language model — what prevents that vendor now from replicating RudderStack, right? So you need this separation, where the large language model is not being trained on your data and is simply serving the data back and interpreting the data. These are some of the problems this approach of grounded generation solves, which, by the way, is not unique to us. Grounded generation is a technique that many companies do; I think we are the best at doing it. It's also sometimes referred to as retrieval-augmented generation; that's the name of the technique. Now, can I address your other question about developers? Yes, please. Yes. So I have a layman way I use to explain this. There are two types of developers in this world. They're both important types, and they both spend money, and as vendors we should be building for both of them.
The first type I call the IKEA developers, like IKEA, the furniture company, and the other type I call the Home Depot developers. So what's the difference? The Home Depot developers are descriptive in nature — descriptive is the more technical term. What that means is: describe to me what I can do, don't tell me what to do. Give me all the Lego blocks, give me all the pieces, and let me figure it out. That's Home Depot: I'm going to go to Home Depot, I'm going to buy the planks of wood, the hammer, the nails, and the saw, and I'm going to make a nice desk myself, because I love making desks; it's something I enjoy doing. So that's the first type of developer. Most Silicon Valley-type developers tend to be in that category: give me the building blocks and let me figure it out. That's why Cloudera actually was doing very well in Silicon Valley, because Cloudera was a very complex platform that had all these Lego blocks. The IKEA developer, who prefers something like a Snowflake or a Vectara, says: don't make me figure out how to do it, I don't know how to do it, tell me how to do it. They are prescriptive in nature. They want the recipe: step number one, do this; step number two, do that; step number three, do this; and at the end I have an amazing solution that is working. That's IKEA: you buy the desk inside a box, it comes with the steps — the prescription — you follow the prescription, and you end up with an amazing desk. That's exactly the model that Snowflake followed, and it's the model that Vectara is following. I think both are important; I think you can make good money from both approaches. I might be wrong, but I think there are more developers in the IKEA category that just want to get the job done — give me a nice, easy-to-use API that gets my job done — versus the other category of give me all the building blocks and let me figure it out myself.

Kostas Pardalis 41:37
Yeah, I agree. I mean, we could talk a lot about that. Maybe we should have an episode just to chat about the different archetypes of developers and how they are affected by, or how they affect, this generative AI revolution that is happening right now. But we need to focus on a few other things today. So, okay, let's talk a little bit more about the technology behind Vectara. What it takes to build a platform that serves generative AI — I mean, we've been hearing about SaaS platforms for a very long time, and I think pretty much every engineer who has been around for a while has, even unconsciously, a mental model of how a system like that looks, right? But when it comes to generative AI, I have a feeling — and correct me if I'm wrong — that things are probably a little bit different, and the platforms might be a little bit different. So how do you architect a system like Vectara, with the goal of not being like Cloudera, right?

Amr Awadallah 43:04
So actually, Vectara not being like Cloudera — meaning being easy to use versus being powerful and complex — that's more a product approach than a technology approach, per se. It's about the product: how you design the API, what you choose to expose to your end users and what not to expose. I think it comes more from that. From the technological design point of view, it's actually very similar. We're still using Kubernetes as our underlying fabric for deploying stuff. We do need to be able to provision servers that have GPUs in them, so that may be a little bit of a difference from the previous wave, where we didn't really care about GPUs that much; being conscious of how GPUs are being utilized is key. But the Vectara pipeline, if you look at it, has, I would say, five components. We have a document parser that knows how to parse documents coming in, extract the text, and then tokenize that text. A very key thing in using these large language models is something called tokenization: how we generate tokens for different languages — English, French, Japanese, etc. So that's the first component that we have. The second component is a neural network encoder. The neural network encoder is the technology that knows how to go from human language space, meaning English, French, Japanese, Chinese, into computer language space — the lingua franca space, a symbolic vector space that understands all languages. That's a very critical component; it's actually one of our secret sauce components. My co-founder, Amin, was on the team that built one of the very first multilingual encoders at Google Research. That is the space in which most of the operations take place. You will hear that space called an embedding space — you will hear the term embedding; that's what it means. In reality, if you're a mathematician, it's a vector.
It's a multi-dimensional vector in a space that can have many dimensions, on the order of thousands of dimensions — so it's not a 3D vector — and every concept or every meaning is a point in that multi-dimensional space. And by the way, our name is Vectara because of that, right? The underlying theme of how these things work is vectors. So that's the second module. The first module is the document extraction; the second module is the encoding into a vector space. The third module is a vector database. You need to have a vector database that knows how to store these vectors, and then, when new vectors come in for a question that somebody is asking, it finds the vectors that are closest in proximity — pointing in the same direction, which means they have the same meaning. It's really as simple as that, but making it work at scale is, of course, very hard. So we do have a vector database that we built at the heart of our platform. We leveraged an open-source library from Facebook called the FAISS library, and then we added extensions for it to be scalable, to be multi-tenant, to balance between memory and disk for the economics of doing this, and to be able to fine-tune the speed with which it does the matching depending on the use case. So that's the middle component that does the matching. Then you have another neural network called a cross-attentional reranker — and sorry for getting too technical here, but that neural network is very essential, because it takes the output of the vector database, which is: here are the ten facts that are most relevant to the prompt or question you're asking. You need to re-rank those facts so that the most relevant fact is number one, based on meaning and understanding again. That cross-attentional neural network — it's a BERT, a BERT-like model — ranks the facts that have been retrieved from the vector database.
So the most relevant one to the prompt or question is first, then the second one, and the third one. That's very important, because the last step is the neural network that is the large language model doing the summarization, and these summarization systems pay more attention to things earlier in the context window — meaning what you're giving them in the prompt — versus later in the context window. Actually, they pay attention to the beginning and the end, and the middle can sometimes be glossed over. So it's very important to rank these things appropriately. The last step in the pipeline is to pass this to the summarizer: here are the input facts that we found, in the right order; please now summarize them in a cohesive way and generate an output that I can give to my end users as a function of the prompt or task or question that just came in. So that's literally our pipeline, beginning to end. You can go and build this pipeline yourself: you can use LangChain, you can use Pinecone as a vector database, you can use a model like BERT from Hugging Face, you can use GPT for the summarization, you can use Ada from OpenAI for the embedding, and you can build it all yourself. That's what the tinkerer, the Home Depot developer, would do. Or you can use Vectara, and you don't need to know about any of that stuff: you have one API where you put in the data, another API where you issue the prompt, and it just works.
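The retrieval stages of a pipeline like the one described can be sketched with plain NumPy. Everything here is a toy stand-in — the hash-based `encode` replaces the trained multilingual encoder, and `VectorIndex` replaces a real vector database like FAISS — but the core operation is the same: normalize vectors so that the dot product equals cosine similarity, then rank stored documents against the query.

```python
import numpy as np

def encode(text, dim=256):
    # Toy stand-in for the neural encoder: hash tokens into a fixed-size
    # bag-of-words vector, then normalize so dot product = cosine similarity.
    v = np.zeros(dim)
    for token in text.lower().split():
        v[hash(token) % dim] += 1.0
    norm = np.linalg.norm(v)
    return v / norm if norm else v

class VectorIndex:
    """Minimal stand-in for a vector database: store embeddings,
    retrieve the k nearest by cosine similarity."""
    def __init__(self):
        self.texts, self.vectors = [], []

    def add(self, text):
        self.texts.append(text)
        self.vectors.append(encode(text))

    def search(self, query, k=3):
        sims = np.array(self.vectors) @ encode(query)   # cosine scores
        order = np.argsort(-sims)[:k]                   # best first
        return [(self.texts[i], float(sims[i])) for i in order]
```

A real system would then pass the top hits through the cross-attentional reranker before handing the reranked facts to the summarizing LLM.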

Kostas Pardalis 47:54
Yep. Yeah, that was awesome. One question: you mentioned the term facts. What is a fact? How do you define a fact in the context of Vectara?

Amr Awadallah 48:07
Excellent question. The layman way of saying it is search results, right? What are the search results that are most relevant to the task at hand, the task I'm trying to do right now? But I think we're calling them facts because it goes back to this grounded generation: the end output is not the ten results. At the end, we don't want to show the ten results to the end user; we want to show them the answer. So that's why we say these ten results are really the facts — the underlying facts in which the answer is being grounded as you provide it back to the end user. And by the way, that's tunable. You can say: sometimes I only want the top five results, or I want the top ten results. And there's a dynamic feature where it will keep going down the facts until you hit a relevance threshold, where the next fact is not going to add much more information or content to the response. Now, this is a bit technical, but the reason you want to do that is that large language models are expensive to run — actually, very expensive to run — and the more tokens you give them in the input, the more cost you pay as you run them. So you want to minimize how many facts, how many words, you are giving to this last stage. That's why we are researching dynamically tuning how many facts we provide to the summarizer. And apologies for getting too technical here.
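The dynamic cutoff described here — keep taking ranked facts until the next one stops paying for its tokens — can be sketched as follows. The function name and threshold values are invented for illustration.

```python
def select_facts(ranked, max_k=10, threshold=0.5):
    """Take (fact, relevance) pairs sorted by descending relevance and stop
    at the first fact below the threshold, or after max_k facts, since every
    extra fact adds input tokens (and cost) at the summarization stage."""
    selected = []
    for fact, score in ranked:
        if len(selected) == max_k or score < threshold:
            break
        selected.append(fact)
    return selected
```

With a static top-k you always pay for k facts; the threshold lets low-relevance tails drop off automatically.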

Kostas Pardalis 49:25
Please, get as technical as you want; both we and the audience enjoy that. Okay, we are close to the end here, and I want to give the microphone back to Eric. But before I do that, one question, just to make the grounding process that you described, together with what you said about the facts, a little bit more clear to the audience out there. How does grounding actually happen through this pipeline that you described, and how much does the user need to be aware of the process itself? How transparent is the grounding process?

Amr Awadallah 50:09
Excellent question. Excellent question. So, to give a layman analogy: it's like you have two humans working together. There is one human who is very good at speaking English and giving you a perfectly phrased response to a question, but they have no knowledge, right? They don't know the right answers to a given question — like, for example, should I invest in this entrepreneur pitching me on blockchain from Greece? They have no idea. But if they're given the right facts, they can say them back in a very good way. So that's one human. The other human is the one that knows everything. They literally read all the documents you have ever had, they read all of the articles, they read everything about the topic. That human is not good at writing answers — they don't have good English writing skills — but they have very good comprehension skills, right? They are really good at matching concepts to each other, and they can do it across languages: English, Spanish, German, Chinese, all at the same time. So the question comes to that human first. And by the way, a key thing: that human has photographic memory. They remember everything exactly; they're not compressing the knowledge down like large language models do. So that human gets the question and leverages this photographic memory to come up with the most relevant facts for answering the question about whether you should invest in this entrepreneur pitching blockchain from Greece — but they don't know how to compile them into a good response.
So they go to the second human: you're very good at prose, you're very good at writing amazing English — or amazing Chinese, depending on the end user who asked the question, or Greek, or Italian, or whatever. And they take those facts and compile them into the response they give back to the end user. By doing these two things together, we are now preventing that second human — the creative one, which is very good at writing — from making stuff up. We're telling it: when you are generating the response to this question, don't try to rely on your memory, because your memory is a compressed memory. The way large language models work is they compress all of our human knowledge into a very small footprint, right? Petabytes and petabytes of data get literally compressed into 100 gigabytes. That's really what the large language models are doing. So by definition they're lossy, meaning they will not have the whole knowledge base in their head at all times. By telling them to only give you a response as a function of the facts we just provided, that's how you significantly reduce the probability that they will make something up. It's not zero, though; they can still make something up. We are thinking of adding another stage — and sorry for going on too long in this response — where we do what newspapers do. Newspapers, when they have reporters writing articles, always have another step before the article goes out. It's called fact-checking, right? They extract the key facts in the article and send them to a fact-checker, who double-checks that these facts are all true and not something the reporter hallucinated as they wrote the article. We haven't done that yet today, but we'll probably add something like that in the future to further minimize the probability of it making something up.
Getting hallucination to zero is required for us to move from answer engines to action engines. We cannot move to action engines without solving that problem, because, as you said earlier, you open yourself up to liability if you hallucinate the wrong response: we tell you, yes, go invest in that entrepreneur, all of your previous documents say you should invest in that guy from Greece, and then it turns out to be a bad investment.
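The fact-checking stage floated above could, in its simplest form, look like the sketch below: compare each sentence of the generated answer against the grounding facts and flag sentences with little lexical support. A production system would use an entailment or cross-encoder model rather than word overlap; the function name and the 50% cutoff are hypothetical.

```python
def flag_unsupported(answer_sentences, source_facts, min_support=0.5):
    """Flag answer sentences whose content words (>3 chars) mostly do not
    appear anywhere in the grounding facts -- a crude hallucination signal."""
    source_words = {w for fact in source_facts for w in fact.lower().split()}
    flagged = []
    for sentence in answer_sentences:
        words = [w for w in sentence.lower().split() if len(w) > 3]
        if not words:
            continue
        support = sum(w in source_words for w in words) / len(words)
        if support < min_support:
            flagged.append(sentence)
    return flagged
```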

Kostas Pardalis 53:56
Okay, that was fascinating. I'll give the microphone back to Eric now.

Eric Dodds 54:03
All right, well, we are at the buzzer, as we say on the show, but just one final question for you if we zoom out. I mean, these are really critical issues, and I love the concept of fact checking. But, Amr, when you think about AI, what keeps you up at night in terms of the risks we need to be aware of in general? I think about Vectara having an API that makes all of the knowledge of RudderStack available — I mean, that's a no-brainer; I think this is going to really change the way that our customers could interface with, you know, our knowledge base. But it's larger than that; it's more than just an API. And you've seen this over several decades. So what keeps you up at night? What are you worried about with AI?

Amr Awadallah 54:59
So first, it's not what you hear some folks saying about Skynet and gloom and doom and AI taking over the world. That's not what keeps me up at night. What keeps me up at night is humans — humans who misuse technology. In the same way, when Henry Ford and all the amazing people that helped make the car made the car, their goal was to make a device that helps us get from point A to point B very efficiently. The goal was not a device that kills 50,000 people per year, which is how many people, I think, die from cars in the US alone. That was not their goal. Their goal was not for somebody to take a car and purposely run over people and kill them with it, or drive it while drunk and kill an entire family. We humans are the ones who make mistakes like that, or purposely use technology in a dangerous way. That's what keeps me up at night: somebody using large language models to control us in a negative way. Because, as I said, with these large language models I can rehearse how to say things to you until you buy the product I care about, and you wouldn't know if you're buying the product because it's truly good for you, or because the messaging you got was so convincing that you ended up buying it. So that's the concern I have. Like Cambridge Analytica — remember Cambridge Analytica and the Trump election? Take that and multiply it by a million. It was Cambridge Analytica analyzing our profiles, but then coming up with segments of messages: we're going to send these messages here to play on the fears of these people in these slices. Now imagine making a message just for Eric, because we know exactly what will appeal to him, and he's going to vote in our direction because of getting that message.
That is one of my key concerns, actually, for how this technology is going to evolve. And my answer to it — if I had time to work on other stuff, I would go build that startup — is that we need to have the antivirus. In the same way we have viruses for computers — and viruses are of course illegal, and when we catch people building viruses they go to prison, but they still happen day in and day out, because some humans are bad, not all humans, and they do these things — we have antivirus now that can catch and stop them. We need something like that for this: to be able, when you are reading a message, or seeing an ad, or having a conversation with Amr telling you why Vectara is the best product for your company, to have something show up and say: manipulation is taking place; you're being manipulated 80% of the time, and only 20% is true goodness for you. I actually don't know what the answer to the problem is. But when you asked me the question, what keeps you up at night — that's what keeps me up at night.

Eric Dodds 57:41
Yeah, sure. Well, maybe we need to go back to Greek virtue, right? I mean, the heroes in Greek mythology operated according to a core set of values. And so perhaps it comes back to the humans on the receiving end as well, having a value system that can interpret that.

Amr Awadallah 58:03
No, but the problem is we have different value systems. This is definitely a topic for a whole episode; we could talk about this for a long time. Because I have a value system that might not be exactly your value system, or exactly the value system of somebody else. How do you do that properly? It's really hard. It's really hard.

Eric Dodds 58:19
It's very difficult. Well, what a wonderful topic, Amr. Let's have you back on, and then we can just have an open discussion on that topic, which will be great — we don't have to get into the tech as much. Thank you for sharing your story. Vectara seems amazing. Congratulations. This has been an amazing show, so thanks for giving us some of your time.

Amr Awadallah 58:45
My pleasure. It’s been awesome to be on the show with both of you.

Eric Dodds 58:48
Costas, wow, I'm trying to process that conversation with Amr from Vectara. I mean, what a heavy hitter, right? He started a company that he sold to Yahoo and did data science stuff at Yahoo, founded Cloudera, went to Google, and then started a company doing stuff with LLMs. I mean, heavy hitter might be an understatement for someone with that track record. But what an approachable guy, first of all, right? Just conversational, very helpful. I think one of the things that was really helpful for me was him breaking down the academic, the technical-academic, advancements that have enabled modern AI as it manifests in LLMs. You know, people like to talk about how they just scraped the whole internet and Microsoft provided unlimited compute power, but he really laid out the academic timeline, starting in 2013, of the major breakthroughs in algorithms that enabled what we are experiencing today. That was really helpful, to me at least, and I hope for our listeners. But the other thing — I guess if I had to describe one of my big takeaways — is that Amr really seems like a steward of technology, if that makes sense, right? He's not blindly forging ahead; he's thinking about what is being built. And I think that actually is expressed in what his company Vectara provides, which, just starting out, is an API, right? It's an endpoint. But I think that's reflective of his approach to how we wield these technologies. So, I don't know, lots to think about. What do you think?

Kostas Pardalis 1:01:07
Well, I want to emphasize only one thing; I will leave the rest for our audience to go and listen to the episode. But I want to share with everyone that we spent about an hour with someone who came to the United States from Egypt in 1995, did a PhD at Stanford, started a company that was acquired by Yahoo, after that started Cloudera — a company that went public, where he was the CTO and one of the founders — was VP of Developer Relations at Google, and today is starting something again from zero, right? And the most important thing — for me, at least, probably the most unique and exciting thing of this whole conversation — was his energy and his excitement. Think about it: you have a person who has done all these things, and after doing all these things, he has this energy and this excitement about starting something new. I find that something very unique, something that you don't often find with people in technology. I would suggest anyone listen to this episode, and they will be surprised by the things that they will learn.

Eric Dodds 1:02:43
Yeah, yep. I agree. I agree. If someone who has done all that starts something new, you probably should pay attention to it, because they're not going to make a light bet. All right, well, definitely listen to this one — a really fascinating conversation. We get into the details of LLMs on a technical level, how you actually go to market with that, and a great history as well. If you have not subscribed, definitely subscribe. You can get The Data Stack Show wherever you get your podcasts. Tell a friend, and we'll catch you on the next one. We hope you enjoyed this episode of The Data Stack Show. Be sure to subscribe on your favorite podcast app to get notified about new episodes every week. We'd also love your feedback. You can email me, Eric Dodds, at eric@datastackshow.com. That's E-R-I-C at datastackshow.com. The show is brought to you by RudderStack, the CDP for developers. Learn how to build a CDP on your data warehouse at RudderStack.com.