Episode 134:

Unpacking the AI Revolution and the Technology Behind A Feature-First Future with H.O. Maycotte of FeatureBase

April 12, 2023

This week on The Data Stack Show, Eric and Kostas chat with H.O. Maycotte, entrepreneur, founder, and CEO of FeatureBase. During the episode, H.O. shares his personal story of growing up in Mexico and his journey to becoming a CEO. The discussion also includes a look at the super evolution, AI advancements and future opportunities, the tension between authenticity and technology, and more.

Notes:

Highlights from this week’s conversation include:

  • The journey of H.O. into data and becoming the CEO of FeatureBase (2:37)
  • Characteristics of the super evolution in technology (6:36)
  • ChatGPT as the missionary of AI (9:45)
  • The tension between authenticity and technology (13:12)
  • What is FeatureBase? (17:53)
  • Comparing FeatureBase to Feature stores (25:58)
  • Workload capacities and possibilities in FeatureBase (33:20)
  • The importance of developer experience on a platform (38:23)
  • Exciting developments for FeatureBase in the future (47:13)
  • Final thoughts and takeaways (53:52)

 

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Transcription:

Eric Dodds 00:03
Welcome to The Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You'll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, the CDP for developers. You can learn more at RudderStack.com. Welcome back to The Data Stack Show, Kostas. Today we're chatting with H.O. Maycotte. And what a story he has to tell. I am not even going to get into it, because I'm so excited for our listeners to hear about his upbringing in a hill town without technology in rural Mexico. So there's your teaser. I want to ask H.O., of course, there's a lot to talk about with FeatureBase as a company, which is a fascinating technology. But he's also really into the future and really into AI. And so I want to hear about his vision for the future and for AI, and then how did that influence, you know, his founding of FeatureBase? And then, of course, I want to hear the technical details, and I'm sure you have technical questions. You've used it, so how do you use FeatureBase? What are you going to ask about?

Kostas Pardalis 01:15
So there are plenty of questions. FeatureBase is very interesting because, outside of being, let's say, a database for features, it has some very interesting use cases. I primarily, to be honest, got interested in it because I felt that it's a great technology to use for the use cases around CDPs, like working with event data and creating audiences, all that stuff that I know pretty well, both as a builder and also from working in marketing. But it's much more than that. They do a great job of exposing the technology behind it, which is obviously very interesting for me, and they get into a lot of detail, like why they've decided to build around bitmaps and how they use them and all that stuff, which might be a little bit intimidating for someone who just wants to try the product. But for someone who's curious and likes to understand how things work, it's amazing. And I think we will have a lot of opportunities to talk about building a business around a very new technology, what it takes, and what the journey looks like, because there is history behind FeatureBase. It's not a product that started, like, six months ago. And I think he is the right person to talk about that stuff.

Eric Dodds 02:45
I agree. Well, let's dig in. H.O., welcome to The Data Stack Show. Wow, we have a ton to talk about: the future, AI, FeatureBase. So why don't we start where we always do? Give us your background and kind of what led you to starting FeatureBase.

H.O. Maycotte 03:06
Yeah, thank you all for having me here. My name is H.O. Maycotte. I'm the CEO of FeatureBase. I was born and raised in Mexico, in a hill town that had very little access to media and technology. So it was a place where, like, your wildest imagination could run free without a paradigm. And, you know, from a very early age I was obsessed with the idea that humans were just on the edge of being superhuman, and that, you know, society and life, not necessarily humanity, but life in general, was sort of on the edge of this super revolution. And so as that took me through school and my career, I continued to pull on that thread, right? I think especially what's happened in the last six, eight weeks is really fascinating. You know, I've lived my life for this moment. Some call it the fourth industrial revolution, but without a doubt, AI is officially here. I think ChatGPT is like the missionary for AI. It's not exactly how I thought it was going to manifest, with, like, funny images and marketing copy, but for good or for bad, it's here. And I do think that, you know, what's happening will allow it to evolve to its next order. I'm fascinated by it. And I think that, you know, we as humans need to sort of unite to make sure that we're helping machines help humans, and not, you know, having humans help machines, because if we're not careful, you know, a lot of our innovators are focused very much on technology for technology's sake. So we've gotta keep that in mind. Every day, I love building on the future. I consider myself an ambassador of the future, but I'm also trying to make sure that, you know, at least the role I play, and that my company plays, is trying to keep that balance, and remember that we're doing this to help us and life, and not the other way around.

Eric Dodds 04:45
Yeah. So much to dig into there. Let's go into the hill town in Mexico, though. You said from an early age you were enamored by this concept of there sort of being, you know, humanity almost going through, like, a stepwise change. But you also said you didn't have access to a ton of media. Where did the seeds of those ideas come from?

H.O. Maycotte 05:10
It's a really good question. I think, you know, I just couldn't handle the idea, as a little kid, that, you know, you broke your back, you're paralyzed, like, why can't you fix those nerves? It's just matter. You should be able to fix that, right? Like, why, if we have a heart condition, does the rest of the body fail with it? So those thoughts have always sort of upset me. Maybe because I didn't have media, I didn't realize it wasn't possible. Like, I thought it was possible. I always thought it was possible. And I think, as I've gotten older, you know, I continue to challenge why these things are not possible. And, you know, without a doubt, I think we're gonna solve them all now. That's a whole other subject, so we can get together for another episode. I think as long as we don't have a social revolution interrupting all of our advances, you know, I think we're gonna see a wild amount of innovation coming at us. We as humans, like I said in my purpose statement, right, we have to help machines help us. I think we're just gonna have to learn to adapt to very rapid change, right? Like, the impact that social media has had on kids and on society is only 15 years old, right? It just came at us so fast; all of these things are coming at us so fast. And, you know, the first two industrial revolutions were like 100 years each, the last one was 50 years, you know, this one's gonna be like 20. So anyways, it's gonna be fascinating. And I think maybe that idealistic, you know, up in the mountains, flowers on the hills background, you know, kept me from the paradigms that would have blocked my thinking. So, I love it. Yeah.

Eric Dodds 06:44
I love it. Yeah, the mind of a child, and sort of, you know, imagining what adults think is impossible. It seems like you've carried that through your life and career, which is truly inspiring. One quick question on the super evolution. You mentioned a couple of things that were, like, medical in nature, you know, sort of biotechnology, and I certainly think there's going to be a huge breakthrough in biotechnology for, you know, sort of medical things, but what are other characteristics of this super evolution? What do you see?

H.O. Maycotte 07:19
Yeah, I think a lot of people are asking questions right now like, you know, what is it going to take to bridge the gap between the general intelligence, the AGI, that we humans sort of dream is coming along, right, when does the machine wake up, and all of the narrow AI and the simple AI that we're doing today? And I always try to reframe that question. I actually think it's a much broader question, right? AI is on a journey, and it will get there. Again, unless we have social unrest that stops technology, we will be there sooner than we think. But I think the other end of the spectrum is really what's happening on the synthetic biology side, right? Like, we're trying to make little robots, and we're so impressed when one can, you know, jump from one box to the next box. But, you know, imagine if you could reprogram a cat. The cat is a million times more advanced than that robot. I always make the joke that if I could have a swarm of programmable chinchillas, right, I could create a lawn business and have these chinchillas come mow your lawn and fertilize it for free. And, you know, they'd have a blast doing it, and it would be entertainment at the same time. But that is probably closer to us in capability than, you know, creating an army of 100 little robots that come out and mow our lawn or, you know, do our landscaping. So there are some fascinating things. A friend of mine started a company called Colossal, and their big, audacious goal is to bring back the wooly mammoth. He partnered with George Church, a pioneer of CRISPR, and they are, and notice I say they are, 100% convinced that they're bringing back the wooly mammoth in about five years. And it's fascinating: they have hundreds of samples, and they've been able to put together the genome. For those of us that are technologists, what blows my mind is that they're starting on one end with the Asian elephant genome, and they've sort of mapped out, by putting together all these fuzzy genomes, what the wooly mammoth genome looks like. And there are only 60 genes that are different, right? So they flip, you know, they flip the bits, per se, on those 60 genes, and all of a sudden it has like four times the mass, it has long hair, it has, you know, different hemoglobin flowing through its body. Imagine if we had gotten around to building a robotic Asian elephant, and now we wanted to build a wooly mammoth robotic elephant. It would take us millions of lines of code and years of effort and engineering. To think that we can go from the Asian elephant to the mammoth with 60 genes is just mind blowing. So whatever programming language nature uses, it's several orders more efficient than the ones that we're using. If we can figure out how to harness that, that might be a faster path to some of the things that we're trying to solve. In any event, I'm still holding out on which path is gonna be faster, but I do think we need to consider both paths as we think about the future.

Eric Dodds 10:06
Fascinating. I could dig into that for hours, but let's steer it a little bit more towards data and technology and dig into AI, right? So ChatGPT is a huge topic of conversation, and I love your description of ChatGPT as the missionary for AI. We've actually had various discussions about AI on the show, you know, throughout our entire run, which has been really interesting. And I think ChatGPT is kind of the first thing that we've seen, or that we've discussed on the show, that has sort of mass practical appeal and utility, you know, whereas a lot of the previous iterations of AI sort of lived behind a shroud, or you consumed them with some level of obfuscation as an end user, or people just didn't understand them, right? But, you know, anyone can go into ChatGPT, ask it a question, and get an answer. It's like, wow, okay, this is real. But you didn't necessarily expect that, you know, me, as a marketer, I'm gonna use it to, like, get my new product pages for RudderStack all fancy, and then, you know, generate images for my blog posts. What did you think was going to happen? You know, if we rewind five years, what would you have said, okay, the first missionaries are going to be, like, X, Y, and Z in terms of the manifestations of AI?

H.O. Maycotte 11:29
Yeah, I mean, I think if I had a robot right now that could lift and stack boxes, like, I'd sell the heck out of them to Amazon. I think we thought, like, AI and some of this robotic stuff was going to be taking our jobs from the bottom, right? So just one insight I have with what's happening now is it looks like we're gonna go after the middle, right? Like, the legal profession, for example, has always sort of feared technology, because it couldn't quite do their job, right? For the first time, it sounds like this evolution, this revolution in search, which we call ChatGPT, which we call these large language models, can finally, you know, do that work really well. Or, in fact, it's probably really well suited for it. So, you know, one of the interesting manifestations is it's sort of going for our jobs in the middle. But I think more than anything, it relates to us, it relates with us, we relate to it. And so, like, we're convinced that it's real, and whether it's real or not, I don't really know, but, you know, we think it's real. There's a lot of questions about content, credibility, bias, and all of those things that get baked into it. But we as humans are kind of suckers, right? So as long as we think it's real, which I think we do, I think it's going to have really profound implications. Like, I think we all believe in AI now. You know, my little cousin, my grandmother, like, everybody talks about large language models. And I think it is gonna go after some jobs and change the way we operate. I think what I'm kind of obsessed with is how do we democratize that? It's really like three or four really big companies that have access to all of this. But for everybody else, AI sucks, right? Like, data still sucks, so AI really sucks. Like, how do we make this so that we can augment ourselves? Right? Every email, every text message I have, every time I buy a piece of clothing, like, I should be able to have my own version that's helping me every day, and it's my own personal co-pilot. So, you know, we have a lot of opportunities over the next few years to think through the different paths all of this can take, but certainly the wheels are in motion now, and it makes me so happy.

Eric Dodds 13:31
Yeah. Can you just take me, I want to go to a little sidebar here. You mentioned, like, is it real? Can you describe the tension there from your perspective? Because I think that's interesting, and I'll give you the context for where the question is coming from in my brain: I tend to think about it from the technology side, right? And so, I mean, this is a real model, like, taking real input and using real training to produce output, right? But there are a lot of people, you know, the education space is a good example, right? Like, is an essay that's produced by ChatGPT that's really good, like, is that real? You know, is that kind of what you're getting at?

H.O. Maycotte 14:15
Yeah, I mean, I think maybe even a little more existentially, like, what is real? I helped start a nonprofit media organization called The Texas Tribune a long time ago, you know, and one of the premises behind starting that organization is that democracy requires an unbiased source, you know, covering the facts. But as factual as we tried to be, there was probably some bias in the way we wrote it and the way we reported it; the media tends to lean left in general. But, you know, if it was on national TV 30 years ago, it was real, right? Like, for the most part, we all believed it, whether or not it was real. It got harder with newspapers and print publications. And, you know, I think we hit an inflection when the internet got created, and the inverse curve is like, critical thinking went straight down, and the proliferation of any version of reality that you wanted went straight up, right? And so now what GPT has done, what these large language models have done, is just obfuscated it further, turning all of that into a black box. So, like, fact-checking was already hard, critical thinking was already hard, and this stuff is really convincing. Like, you know, ChatGPT, tell me how I'm gonna go lift a 10,000-pound weight? Well, it'll tell me, and it'll be very convincing and exciting and enthusiastic about how I'm going to go about doing that. So, you know, is that reality? I don't know. But it's going to be a lot harder to fact-check, it's going to be a lot harder to, you know, figure out what's credible and what's not credible. But to some degree, it doesn't matter. We've been believing, you know, what media, the internet, and now these models are feeding us, and, for good or for bad, reality is going to be further distorted.

Eric Dodds 16:05
Yeah. How do you balance that? Sorry, I'm gonna continue on the sidebar. And thank you also for beginning your answer by saying "even more existentially," I love that. So how do you manage that? Because it's somewhat of an existential crisis, right? It's like, I'm an ambassador for the future where these things are coming to fruition; however, you also acknowledge that the distortion of reality poses, you know, serious questions for society.

H.O. Maycotte 16:38
Yeah, it's a good question. Maybe with a quick sidebar on some insights I've had about myself recently: I'm definitely an optimist. Everybody tends to ask me how I'm doing on a scale of one to 10, and it's always a 10. And people are like, how can you be a 10? Like, your house is on fire. And I'm like, yeah, but I'm a 10. You know, so I have this weird ability to not feel, to some degree. So, you know, don't ask my wife and children how having a non-empathetic husband goes, but

Eric Dodds 17:05
What would they say, on a scale of...

H.O. Maycotte 17:06
One to 10? That would be a good question. But I tend to think that, like, you know, it is what it is, right? This is evolution, this is life. I think, you know, Darwin will kick in there somewhere. But there are for sure gonna be some tremendously negative consequences that come from this distortion of reality, and those that have the power to create the content, whether they're doing it, you know, consciously or subconsciously, have some consequences on their hands, right? We've seen this with the social networks. Like, without going into the details, you know, these social networks have had a profound impact on my personal family. And so I think we're just sort of at the beginning of seeing the consequences of all of this. But it's evolution, right? Like, you know, we will evolve as a result. But, you know, let's go read the internet in Russia right now, and let's go read the internet in Mexico right now, and they're gonna have a very different version of the exact same current events. So, yeah.

Eric Dodds 18:02
Yeah, well, I always appreciate an optimist, and someone, I mean, in many ways, you're accepting the reality of the inevitable, you know, which I really appreciate. Okay. Let's talk about how all of that is packaged into, you know, or what pieces are packaged into you starting FeatureBase. You're a serial entrepreneur, and FeatureBase is your latest company. Can you give us a quick overview of, you know, sort of FeatureBase, like, what it is, what it's used for, and then circle back and tell us, like, why did you start it? Was it in response to some of those sorts of fundamental beliefs?

H.O. Maycotte 18:40
Yeah, I love that order. So yeah, at its core, FeatureBase is just a really fast analytical database. It's all in memory. It was inspired by the feature extraction and engineering process. And so we figured out that, like, most data was originally stored in records, and then people started storing it in columns to be able to analyze it, but every column they created was yet another copy. And so we sort of moved into this, like, let's move and copy data in order to analyze it. And when we had this penicillin moment, which we'll get to in just a moment, and created FeatureBase, we realized that if you stored data at the value, at the feature, you know, machines could process that information much more efficiently. It was a way of empathizing with the way the machine wanted to process data, and not the way that humans process data. And so it's a format that is far more computationally effective and efficient than doing it the way that that human construct has led us to do it with traditional analytical formats. And so we've invested about $30 million into the technology. As I like to say, it's kind of like the particle physics of data. I think our I/O underneath the hood probably belongs in the Guinness Book of World Records every single time we go after one of these bigger and bigger workloads. And so taming that has taken a lot of effort. We're absolutely maniacally obsessed with getting the developer experience right so that people will use it. I make the bad joke that it's like a flux capacitor: unless you have a DeLorean, you know, time travel is going to be pretty difficult. But we're making huge strides right now to, you know, make it adoptable and usable. And further, I'm starting to make some moves to perhaps, you know, build a service around it that makes it a lot more than just a database, because infrastructure is really hard. You know, we sell this amazing engine, and if I went to you and said, hey, you've got a car, what if I could give you an engine that went, you know, 10 times as fast with 10 times less fuel, would you like it? You'd probably tell me yes. But then the next day, I show up at Kostas's front door and I say, hey, here's your engine, good luck. Like, it's not gonna go very well, right? So I'm trying to think through, like, you know, how do we bolt on a steering wheel and some wheels? And then further, like, shouldn't it have a driver, right? Like, you know, I've got data in Snowflake and I want to run this model, right? That would be really nice. So I'm currently trying to go from, like, the unbelievably efficient analytical engine, you know, to what that can power to actually deliver a full experience to the end user. So we're sort of in those throes now.
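
To make that "store the data at the feature" idea concrete, here is a minimal sketch in plain Python (illustrative only, not FeatureBase's actual storage format; the record IDs and field names are made up). The same segmentation question is answered by scanning records versus intersecting per-feature bitmaps:

# Record-oriented: one dict per customer; answering "who is in Texas AND
# bought sneakers?" means scanning every record and every field.
records = [
    {"id": 0, "state": "TX", "bought": ["sneakers", "socks"]},
    {"id": 1, "state": "CA", "bought": ["sneakers"]},
    {"id": 2, "state": "TX", "bought": ["hats"]},
]

scan_hits = [r["id"] for r in records
             if r["state"] == "TX" and "sneakers" in r["bought"]]

# Feature-oriented: one bitmap (here, a set of record IDs) per feature value.
# The same question becomes a single intersection over two small bitmaps.
features = {
    "state=TX":        {0, 2},
    "state=CA":        {1},
    "bought=sneakers": {0, 1},
    "bought=hats":     {2},
}

bitmap_hits = features["state=TX"] & features["bought=sneakers"]

assert set(scan_hits) == bitmap_hits == {0}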

Eric Dodds 21:19
Yeah, absolutely. And you mentioned the penicillin moment. You may have mentioned it already, but can you reiterate it?

H.O. Maycotte 21:26
Yeah. So I'll give you two penicillin moments. It was like penicillin squared. The last company I had was called Umbel. It was a CDP for sports, media, and entertainment companies, so we had about 10% of the world's sports teams as clients. And the origin of that company was a little bit less commercial: I've always been obsessed with trying to democratize access to these things for consumers. So when we invented the technology that powers FeatureBase, we were trying to think through things like, what would a digital genome look like? How could we represent a human and all of their attributes, and those could be behavioral, medical, or otherwise, right? That led to the format that's underneath the hood of FeatureBase, right? It's the presence of an attribute or a behavior, and underneath, that gets stored as a bitmap. So we were like, hey, let's noodle on that. You could take, you know, my genome and your genome and find the pattern, and those aggregates could tell you how we would behave without knowing H.O. or Eric, right? So it was pretty powerful for analyzing audiences and consumers and behaviors. Very quickly, we realized that we really couldn't find a way to empower the consumer directly; there wasn't a business model to say, hey, let's go help you take back your data and we'll help you monetize it. So we just leveled up a step to sports, media, and entertainment companies who had these amazing fan bases. And we got quickly enamored with the idea that every customer would bring, like, hundreds of millions of people onto the platform, right? Like a TV network, a sports league, it was hundreds of millions of consumers every time, and we loved that, you know, because it worked really well with this data that we had. But before we fully got our technology ironed out, we were trying to use everything out there that we could think of, like Elasticsearch. We were trying to force it to do these, you know, these filters, aggregations, sorts, you know, rank sorts, and at that scale it was just very difficult. And so our most important queries were going from sub-second to seconds to minutes. I think, at one point, our 40-node Elasticsearch cluster was returning that query in about 20 seconds. So, you know, we were starting to cache things, and I was just obsessed with the idea, like, no, no caching, everything has to be real time. It was probably a stupid obsession at the time, because the end user didn't care, but I was obsessed with it. And so that's where this idea of the digital genome and the ultimate format that ensued came in. We had two engineers who had been doing quantitative stock market trading their entire careers, and they just said, hey, H.O., we've got this wild idea. You know, every time we prepared data for the models that we used to trade stocks, we'd essentially go select all the features that we wanted, and we'd store the data as features, which were essentially decision-ready data: it was a one or a zero, and it was there, and then the model would use these features as inputs. What happens if we just convert all of the data into features as it's getting created? And what if, instead of putting it in a database or a sort of human-centric format, we create a format specifically for features and store it in that native form? And so I gave them six months, and they came back with a two-node cluster of the technology.
And within weeks, we decommissioned the 40-node Elasticsearch cluster, because the new system could do our segmentation and aggregation queries in, you know, single-digit milliseconds. So that was one moment. It was like, wow, we just defied physics in a way that's so simple. But at the time, it could just do really high-cardinality workloads; everything was Boolean, so yes or no. So, over time, I eventually convinced my board to let me spin that out, about four and a half, five years ago, into its own company. Like I said, we put about $30 million into it. And so we started teaching it things, like integers: how would you store integers in a binary representation? We found this white paper on bit-sliced indexing, so you can store a 64-bit integer as a set of bitmap slices, and it would still have the performance of the underlying bitmaps, right? So you could do range queries on it and all of those kinds of things. And so eventually, we taught it floating point. And we used compression techniques to be able to handle dense, mixed-density, ultra-high-cardinality data: a bitmap compression technique called Roaring. We modified it and made a 64-bit version of Roaring; I think we were the first to do that. And then we stuck it in a B+ tree, so it could behave more like a regular database. And, you know, fast forward to today, we've got what mostly looks like an analytical database, but it's very different underneath the hood.
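
A rough sketch of the two ideas H.O. describes, using plain Python sets as stand-in bitmaps (real systems use compressed bitmaps such as Roaring, and the slice arithmetic here is simplified for clarity; none of this is FeatureBase code). Boolean features are intersected directly, and an integer field is stored as one bitmap per bit so range-style questions can be answered from the slices:

NUM_BITS = 8  # illustrative width; a real bit-sliced index might use 64 slices

class BitSlicedInt:
    """Store one small unsigned integer per record as per-bit bitmaps."""
    def __init__(self, num_bits=NUM_BITS):
        self.slices = [set() for _ in range(num_bits)]  # slices[b] = records with bit b set
        self.rows = set()                               # all records with a value

    def set(self, record_id, value):
        self.rows.add(record_id)
        for b in range(len(self.slices)):
            if value >> b & 1:
                self.slices[b].add(record_id)

    def value(self, record_id):
        return sum(1 << b for b, s in enumerate(self.slices) if record_id in s)

    def greater_equal(self, threshold, candidates=None):
        """Records whose stored value >= threshold (checked per candidate here;
        production bit-sliced indexes answer this with pure bitmap algebra)."""
        candidates = self.rows if candidates is None else candidates
        return {r for r in candidates if self.value(r) >= threshold}

# Boolean features: one bitmap per attribute value.
likes_soccer = {0, 2, 3}
in_texas     = {0, 1, 3}

# Integer feature: games attended, stored bit-sliced.
games = BitSlicedInt()
for rec, n in [(0, 12), (1, 3), (2, 40), (3, 7)]:
    games.set(rec, n)

# "Soccer fans in Texas who attended at least 5 games"
segment = games.greater_equal(5, candidates=likes_soccer & in_texas)
print(segment)  # {0, 3}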

Eric Dodds 26:01
Amazing. All right. Well, that is actually the perfect time, Kostas. I have tons more questions, but please, I know that there are so many technical questions that crop up based on H.O.'s description. So take it away.

Kostas Pardalis 26:15
Yeah. So let's start by talking a little bit about feature stores, right? The term feature store has been around for a while now, and it's probably past the peak of the hype cycle, let's say, right? But when I was trying to understand what a feature store is, I was confused, to be honest. It wasn't, and I guess it still isn't, in most of its implementations, a single data store, right? It's pretty much a whole architecture that tries to support both online use cases, or let's say real-time use cases, and batch use cases in terms of processing the data. Which makes sense, right? Like, you have your historical data, and obviously that's going to be a batch process, and then you also have the data that's coming in, and you want, as soon as possible, to create features and feed them to the models. So it never felt to me like we were talking about a database system. Yes, they were using various components, like Snowflake, Hive, Databricks, everything. But I think the most interesting part of the feature stores was Redis. There was always a Redis there that was storing the features and serving the features, right? So at least that's my understanding of feature stores. How have you experienced feature stores? And how do you compare them to FeatureBase?

H.O. Maycotte 27:57
Yeah. So I think, you know, this is a lesson for a lot of technologists and entrepreneurs. I'm not gonna lie, when we invented this, it solved our problem so well that we didn't have to do a whole lot more to it; it really solved the high-cardinality segmentation use case that we built it for. But I've always had this blind faith that it can solve a whole lot more than just that. And so we've been a bit of a proverbial solution chasing a problem for a while, and that's always a hard place to be. It takes a lot of blind faith, and it takes a lot of optimism and grit, and all of those things that make us crazy as founders. So we've gone through a journey of, like, what are we, you know, trying to meet the market where it is, trying to meet the chasms where they are. And so, as we were exploring a category change about three, four years ago, we were looking at the underlying process of turning data into features and what we were doing underneath the hood, right? There's one-hot encoding, there's a variety of things that you're doing. And so, you know, can you call it a one-hot database, what do you call it? And at the end of the day, we were storing features. So we were like, hey, it's a feature store, like a data store, but instead of data, it's features, right? We meant, like, a storage system for features. We didn't mean, like, a model lifecycle management system, right, that does versioning and lineage and all the other stuff that the modern sort of, quote unquote, feature stores do. And so we worked on this launch, and we relaunched as a feature store, and literally within weeks of changing our category, the Michelangelo project sort of spun out and you had Feast and Tecton. And, you know, they redefined it, and not even redefined, because we hadn't quite defined it yet, except for ourselves. They really defined what feature stores were going to become, and their versions of feature stores, which I think still align with the current definition, are more, in my mind, a model lifecycle management system. They're not a storage system for features. They're really helping you manage, you know, the creation and then management of features, the offline and online features, and most of them have at least three databases underneath the hood, right? So you've got a variety of databases that are going to come together to solve that problem, which is a very different problem than the one we set out to solve: that data at scale is very difficult, that you have to copy and move it, and that everything is batch, right? And yeah, if I go process my features in a batch and stick them in Redis, I can serve that really fast. But what if I could just compute those features on the fly? What if I didn't have to pre-process those features? What if my transformations, aggregations, and joins were happening in real time? What if those were in the model instead of in my pipelines and in my batch jobs? It would be so much easier to track lineage and versioning, right? So that was what we were trying to solve with our feature store. But it became a difficult sales process, because our top of funnel was full. Everybody was interested in feature stores, and we'd show up and we'd be like, well, we have a feature storage system, and they're like, well, we want to put this model in production, how are you going to help us? And we're like, we can't. So sadly, we had to sort of pivot out of it.
I do think at some point it's going to get redefined again. I feel like the category has sort of slowed in interest. But I think features are an unbelievably important part of the future. I mean, it's the way machines think, not the way humans think. We want filing cabinets; let's keep our filing cabinets for the humans. But machines love features, models love features, you know, CPUs, GPUs, they love features. So I do think features and a feature-first future are going to dominate the way that we scale data.
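
To make the contrast concrete, here is a minimal sketch (plain Python, hypothetical feature names, not any vendor's API) of the two patterns being described: a batch job that precomputes features into an online key-value store, versus computing the same features directly from the source events at request time:

import time
from collections import defaultdict

# Source of truth: raw events keyed by user.
events = defaultdict(list)
def track(user, amount):
    events[user].append({"amount": amount, "ts": time.time()})

# Pattern 1: offline batch job -> online store (a Redis-like cache).
online_store = {}
def nightly_batch_job():
    for user, evs in events.items():
        online_store[user] = {
            "purchase_count": len(evs),
            "total_spend": sum(e["amount"] for e in evs),
        }

def serve_precomputed(user):
    # Fast lookup, but only as fresh as the last batch run.
    return online_store.get(user, {"purchase_count": 0, "total_spend": 0.0})

# Pattern 2: compute the same features on the fly from the raw events.
def serve_on_the_fly(user):
    evs = events[user]
    return {
        "purchase_count": len(evs),
        "total_spend": sum(e["amount"] for e in evs),
    }

track("u1", 20.0)
nightly_batch_job()
track("u1", 80.0)                      # arrives after the batch ran
print(serve_precomputed("u1"))         # stale: {'purchase_count': 1, 'total_spend': 20.0}
print(serve_on_the_fly("u1"))          # fresh: {'purchase_count': 2, 'total_spend': 100.0}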

Kostas Pardalis 31:35
Yeah, that makes total sense. And how would you compare, like, FeatureBase to vector databases? Like, what's the difference?

H.O. Maycotte 31:46
It's a really good question. So we have floating point support in FeatureBase now, but we don't have native floating point. And by that I mean the same technique that we use for integers, we're now applying it, we're in development on it, we're now applying it to floating point as well, so being able to store it as a 64-bit float, and we'll see exactly where it ends up. We're pretty excited about that, because our core engine should be able to serve full feature vectors at a fraction of the cost, and at much more scale, than the ones that are currently out on the market. And I also believe that so much of AI today is batch. So much of AI is based on records, training a model and developing a score, but we avoid the analytical queries, like looking at an index and looking at a population when we're outputting those scores, because the queries are really expensive. So I think there's an entirely new paradigm when an analytical database can serve both a last-mile transactional workload and the core analytical workload and do the feature vectors all at the same time. We have all of that in preview internally, but we're very cognizant that being able to store a full feature vector efficiently is a pretty killer feature, and so we're very quickly developing it right now.
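
One simple way to reuse an integer-oriented index for floating-point values is fixed-point scaling. This is only an illustration of the general idea, not necessarily how FeatureBase implements native float support:

SCALE = 1_000  # keep three decimal places; the choice of scale is up to the schema

def to_fixed(x: float) -> int:
    """Quantize a float to an integer so it can live in an integer index."""
    return round(x * SCALE)

def from_fixed(n: int) -> float:
    return n / SCALE

# Hypothetical "price" feature stored as scaled integers per record ID.
price_fixed = {0: to_fixed(19.99), 1: to_fixed(4.5), 2: to_fixed(120.0)}

# A float range query becomes an integer range query on the scaled values.
lo, hi = to_fixed(5.0), to_fixed(100.0)
matches = {rec for rec, v in price_fixed.items() if lo <= v <= hi}
print(matches)                     # {0}
print(from_fixed(price_fixed[0]))  # 19.99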

Kostas Pardalis 33:08
That's super cool. So maybe the only, okay, it's an important difference, but the main difference is in the data, right? Like, what kind of data each system works with. So with the vector systems, you have float representations, primarily, right? While right now, with FeatureBase, you are working primarily with integers, and under the hood, what you have there is a bitmap, which is a series of zeros and ones, right? Exactly. So what kind of workload can someone run today with, like, FeatureBase? Like, what's, let's say, the best scenario for someone to go and try FeatureBase, and what will it be like to use it?

H.O. Maycotte 33:54
Yeah, I mean, for good or for bad, and I'm happy being transparent about all of my flaws and all of our challenges, FeatureBase has been a database of last resort for very large workloads, right? So when thousands of servers have not been able to solve the job, you know, our customers have been willing to invest the months that it takes to wrap their heads around the data model; you've experienced a little bit of this. It is a distributed system. We've got high-availability features, you can do replication factors, you know, all of those things are pretty important for the type of workloads that we serve. But for the most part, we're serving very high-ingest workloads that need rapid segmentation and filtering on that data. That is really the bread and butter for FeatureBase. Now, we're very quickly able to wrap a lot of other workloads around it, but until it's easier to adopt, which it is becoming, those are the workloads that we serve. And I'll give you one example, a company called Shimmer Video. When we first started working with them, they had about a 1,000-node Hadoop and Druid cluster, and they were storing somewhere around a million events in that cluster. Then they would run predictions on the consumers and the devices that were feeding data into this cluster, and it would take them about 24 hours to generate those predictions. So we came in, and we were able to reduce the 1,000 servers to 11, and we could do those same predictions in about a third of a second. A couple of things are important to mention here. It was 1,000 to 11 servers, so saving millions a year in compute. Now, don't believe everything I say: the 11 servers are a lot bigger than the 1,000 servers they were using previously. But they did the calculation, and it's about a 70% reduction in cost. I think what's more important is that those workloads, let's just call it a second instead of a third of a second, happen in that one query. So instead of taking a day, it would take a second, so you could run 86,400 one-second queries in a day on that same compute cluster that they had. It just absolutely changed the way they ran the business. They were now completely real time in their space, this is the advertising space, and they've now scaled that up to about a trillion events a day, tracking about 20 billion devices globally. And literally, there's just nothing else that could do that. I love those; those are great. Those are like trophies you can put on the wall. But that's not the everyday problem, right? That's not the problem that the masses have, and to build a really big company, we've got to find problems that most of the masses have. So that's why we've been maniacally obsessed with developer experience, which we realize is the key to that mass adoption.

Kostas Pardalis 36:37
Yeah, I have, like, a couple of questions there. So when there's, let's say, a CDP scenario, it's pretty much ideal, right? Like for marketers. Let's say we want to be able to segment and create audiences, and do all the standard things someone does with a CDP; you can do that at scale and with extremely low latency by using FeatureBase, right?

H.O. Maycotte 37:07
Exactly, exactly, exactly. We typically break it up into three buckets, right? Consumer experience, which includes personalization, segmentation, recommendation, all the things that you just talked about that are natural to a CDP. Anomaly detection is highly faceted as well, right? It's something that has to happen really quickly, and feature stores and the approaches today pre-process the data, right? If you're a credit card processor, you have to decide if a transaction is fraudulent or not in like 50 milliseconds, and they can do that. But you know why? Because the fraud vector they're using to make that decision was pre-computed. So it was probably being served out of Redis, but it might have been pre-computed 24 hours ahead. We see these companies pre-processing, you know, in days, and that's not okay, right? We've got to process that in the moment, based on the totality of all of that data. So there's another really great opportunity for this real-time workload. And then lastly, I'll say a lot of the stuff happening in AI is really interesting, especially around computer vision. You know, things like labels, once they get tokenized and transformed out of their sort of raw formats, end up being highly categorical, right? They look a lot like consumer behaviors and consumer insights. So at the end of the day, there's really unstructured search, which gets turned into structure, and then there's structured search, which is faceted search, right? It kind of all leads to the same place. So, you know, I am optimistic that this is going to serve a variety of important workloads as we keep innovating.
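
A tiny sketch of the fraud example (the thresholds, field names, and rule are hypothetical): a decision made from a feature vector precomputed hours earlier can miss a burst of activity that an aggregation over the raw events at decision time would catch:

from datetime import datetime, timedelta

now = datetime(2023, 4, 12, 12, 0)

# Raw card transactions for one account (timestamp, amount).
txns = [
    (now - timedelta(hours=30), 40.0),
    (now - timedelta(minutes=10), 900.0),   # burst in the last few minutes
    (now - timedelta(minutes=5),  950.0),
    (now - timedelta(minutes=1),  990.0),
]

# Precomputed "fraud vector" from a batch job that ran 24 hours ago:
# it only saw the first transaction.
precomputed = {"txn_count_1h": 0, "spend_1h": 0.0}

def features_at_decision_time(transactions, asof):
    recent = [(t, a) for t, a in transactions if asof - t <= timedelta(hours=1)]
    return {"txn_count_1h": len(recent), "spend_1h": sum(a for _, a in recent)}

def looks_fraudulent(f):
    # Toy rule standing in for a model score.
    return f["txn_count_1h"] >= 3 and f["spend_1h"] > 2000

print(looks_fraudulent(precomputed))                          # False (stale view)
print(looks_fraudulent(features_at_decision_time(txns, now))) # True (sees the burst)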

Kostas Pardalis 38:35
100%. Yeah, I totally agree with that. It's super, super interesting. So let's talk a little bit more about the developer experience now, right? You've been building the product for a while now, you have customers, and you've seen what it takes to take a new piece of technology and drive adoption; it's not easy, right? And it seems that more and more people are starting to believe that it's not just the user experience, it's also the developer experience that's pretty important. Making sure that you can help developers succeed in whatever they do is an important aspect of succeeding or not in bringing a product to market. So can you walk us through this process, like, what have you experienced?

H.O. Maycotte 39:27
Yeah, I mean, I think being in love with your own technology is a huge problem, you know, and empathizing with the end user is all that matters, right? What others think of you, your product, and your brand is what really matters. Obviously, that's like 101. I think, you know, it's important to explain a little bit about our journey. So we definitely were originally an open source project under the name Pilosa; that was the original name of the project. And by all measures, we were wildly successful. Investors sort of flocked to us and, you know, saw the stars going up, and as soon as we took on investor money, the investors said, well, this is so important that you've got to plan, you've got to turn your sales process into an enterprise sales process. Huge mistake number one, because if you look at the curve of innovation and the chasm that we all have to cross to get to the other side, analytics was in a prime spot at that point. Today, I would say analytics is way off on the other side. So at that time, you know, product-market fit was great, and we decided to start selling this from an enterprise perspective. We ended up going about as high in the organization as we could, and the sales cycles were long and the contracts were very large, you know, like half-a-million to million-dollar-a-year contracts, and we would sell before people would adopt it. So the developer experience seemingly didn't matter. And let's hang on that word, seemingly, because we'd go sign a contract, and then the teams were introduced to, you know, FeatureBase, and they were told, hey, we just signed this contract, go figure out how to use it. And so they kind of had no choice, and it was a painful process. But we had an army of customer success people and deployment people, and we would go help them get it implemented. About a year ago, I just looked at my board and I said, this is crazy. Like, we can go build a big business, maybe, you know, a $100 million business, but I don't want to build $100 million, and I'm talking about revenue, I don't want to build a $100 million business, I want to build a billion, multibillion-dollar business, and there was no way we were gonna get there. We had to recaptivate the hearts and minds of the developers. And so a year ago, we fired all of marketing, all of sales, and we went PLG. At the same time, we decided we were going to take a year to bring certain things to market, like, we're almost done with SQL, you know, I know you've been using the product a little bit over the last couple of months, and we have a lot more along those lines. We pushed out a whole new iteration of documentation yesterday. And so it's been a mad rush for the last year to consolidate five APIs, we had two ingest APIs, and everything has now gotten standardized around SQL. And SQL was difficult for us to get our minds around before, because we were like, there's a much better language for bitwise operations on a bitmap-oriented format. But it didn't matter, right? That was great in our own heads. So we've had to have a strong dose of reality over the last year as we've worked on this developer adoption, and I think we have another six to 12 months to go before we can say, hey, this is now adoptable.
In the meantime, I'm working on plans to acquire a few companies that are going to eliminate a lot of those challenges too, right? Like, why should someone have to buy or install yet another database to go run models on their Snowflake data? It's sitting in Snowflake or Databricks or Redshift; you should just be able to tell me where your data is and what model you want to run. So a lot of what we're working on now on the roadmap addresses that, making it even easier. You know, we've got cloud and cloud consumption out to market, and we've got SQL almost out to market. One of my favorite areas is user-defined functions: being able to register functions in the database and run them actually in the database, as opposed to having to move and copy data to the models. And then serverless is another big piece that we're finishing right now, you know, to bring costs down even further.

Kostas Pardalis 43:15
That's a pretty big roadmap. Sounds like fun. Sorry, go ahead.

H.O. Maycotte 43:22
I was gonna say, it is, and it's been a year's worth of work. We're really excited to start to see these things coming to fruition now, but they can never come fast enough.

Kostas Pardalis 43:30
Yeah, 100%. What was, like, one of the most, let's say, surprising learnings that you went through in this transition from high-touch, top-down kind of sales in the enterprise, to leaving that behind and trying to go after the developer? What surprised you?

H.O. Maycotte 43:54
Yeah, I mean, I think everything is about market timing and product-market fit, and just because you have it at one point doesn't mean that you have it at another. Like, we had it four years ago, but when we switched to enterprise, you know, we learned that motion, but it was inefficient. And then by the time we came back, like, you know, who really cares about very fast analytics, right? I mean, it matters, but it's not the problem that's on everybody's minds today, right? You know, everybody's trying to figure out the machine learning pipeline and paradigm. And further yet, now we have large language models; are they going to eat everything? They probably could. Like, we should be doing our analytics and our machine learning, you know, in a singular way. And so, you know, I just keep coming back to the idea that AI sucks, right? Like, for the average company and the average person, it's amazing on TV and in the movies and, you know, with all this ChatGPT stuff, but practical AI is very difficult and very distant. So I'm going to continue to move as fast and hard as I can to make that experience really easy. Like I said earlier, we're like an engine; I'm going to buy some wheels and a steering wheel, and I'm going to offer an Uber-like service so that you can get from A to B. And I might not be able to serve these trillion-event-a-day workloads as efficiently, but I'm going to serve the broad masses' needs more efficiently. So that's a long-winded way of saying I've worked hard to make this database more adoptable. We have more work to do, but I don't think it's enough; people don't want and need yet another database. What the market needs is a solution to real, tangible problems every day.

Kostas Pardalis 45:35
Yeah, 100%, totally agree with you. And by the way, I have to say it's very impressive, and I don't know what other word to use, talking with someone who has gone through the process of building a company, has reached the point of having product-market fit in the enterprise, and decides to leave that behind and, in a way, rebuild the company from scratch. That takes a huge amount of courage to do, so I mean, that's super, super impressive. I have to share that with you, because I know from my personal experience, and those of us who have worked in startups know, that it's super, super hard. If you think starting a company from zero to one is a leap of faith, taking a company from 100 back to zero to go back to one, wow, that says a lot about the person. So thank you for sharing that with us.

H.O. Maycotte 46:44
Well, yeah, of course. And I think, you know, at least in my case, I have pretty blind faith in features. Like, I really do believe that if we're gonna have the machines doing our work for us, we need to think like the machines, not like humans, and we're still stuck thinking like the human. So, you know, I have this blind faith that features will power the future, right, and that everything's gonna be feature-first. We haven't quite found the exact right approach to it, but we're going to, or we're going to die trying.

Kostas Pardalis 47:13
Yeah, it sounds like you are the right person to do that.

H.O. Maycotte 47:17
Well, thanks. I might have to call my board and tell them that. But yeah, thank you.

Kostas Pardalis 47:25
All right. So one last question from me, and then I'll give the microphone back to Eric. So is there, let's say, something exciting about FeatureBase that is coming out in the next couple of weeks or months, something that we should keep in mind and make sure that we go and check out when it comes out?

H.O. Maycotte 47:47
Yeah, I think the most exciting thing that we're working on right now is this user-defined function (UDF) work. I think we're not the only ones working on it; SingleStore is doing a really amazing job. We'll see how it ultimately manifests. We've got all the Wasm stuff happening as well. But I do very much believe, you know, within my passion for features, my personal obsession underneath that is to eliminate copying and moving of data, right? I believe that models and data are going to collapse. If we've learned nothing else from these large language models, it's that the data and the model are becoming, you know, pretty much the same thing. And so I think the only way to really scale the future is to think about this as, like, the working memory of AI. You made this point, you asked this question earlier that we didn't quite get to, but, you know, we as humans don't go back and analyze everything we've ever done, rewatch the videos of everything that we've ever done, read transcripts of everything. Like, you and I have had quite a few interactions; I didn't go reread all of those. We wouldn't have time to do that. But we have just enough knowledge about our prior interactions that we can bring to what's happening in this moment and make decisions, right? So I think the only way we're going to be able to scale the future is to think about it in that same way, right? Like a working memory of AI: being able to recall just what you need from the historical context, together with what's happening at this moment, to be able to make decisions. And to do that, we're going to have to bring models to the data, right? We need to stop copying and moving data to the models. One way or the other, it's going to happen. I hope we're one of the pioneers of it, because models love eating features, and we're a feature storage system, so bringing models to the feature storage system, in my brain, makes a lot of sense. But one way or the other, I'm excited to see that, both in our own product and in the world. I think it's going to make the world a lot more secure. I think it's going to shift innovation to creating value, not to the data engineering that's involved with all of the machine learning pipelines and the lineage and the versioning. And when we can start to network the outputs of these models, I think it's a wonderful future. So the very beginnings of this for us are simple. It's just models in Python, with SQL. You register them in the database, and you can run them as data arrives, you can run them on a cron job, you can run them as you do your query, but they run on the same compute engine. And further, they're going to run on the same serverless compute engine that we've created, so you can isolate the model from the query execution piece. Anyway, we're pretty excited about that, and we hope the world agrees that it's going to be a good new capability.
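
A minimal sketch of the "bring the model to the data" idea: registering a Python function with a toy in-process engine and running it where the data lives, instead of exporting the data to the model. This illustrates the pattern only; it is not FeatureBase's actual UDF API, which had not shipped at the time of this conversation:

class TinyEngine:
    """Toy engine that stores rows and runs registered functions next to them."""
    def __init__(self):
        self.rows = []
        self.udfs = {}

    def register_udf(self, name, fn):
        self.udfs[name] = fn

    def insert(self, row):
        self.rows.append(row)
        # "Run it as data arrives": apply every registered function on ingest.
        for name, fn in self.udfs.items():
            row[name] = fn(row)

    def query(self, predicate):
        # "Run it as you do your query": predicates can use UDF outputs.
        return [r for r in self.rows if predicate(r)]

# A "model" as a plain Python function over a row's features.
def churn_score(row):
    return 0.8 if row["days_since_purchase"] > 30 else 0.1

engine = TinyEngine()
engine.register_udf("churn_score", churn_score)
engine.insert({"user": "u1", "days_since_purchase": 45})
engine.insert({"user": "u2", "days_since_purchase": 3})

print(engine.query(lambda r: r["churn_score"] > 0.5))
# [{'user': 'u1', 'days_since_purchase': 45, 'churn_score': 0.8}]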

Kostas Pardalis 50:24
Yeah. And hopefully we'll have the chance to talk more about that when it's released, so I'm inviting you back. All right. Well, thank you.

H.O. Maycotte 50:32
Thank you. Well, we have a principal engineer, Matt Jaffe, who's been leading those efforts, and he is brilliant, far more technical, super articulate. I think you would love to have Matt Jaffe dive into how not only are features computationally far more effective and efficient than storing raw data, but now that we've got those serverless capabilities, like I say all the time, I think we're gonna cut the cost of analytical workloads by at least 99%. So whoever's making money right now on these workloads, they should tremble, because the world is about to shift quickly, right? Like, we're gonna move to computing faster and more regularly. We've got to figure out how to make these models continuously trainable; that's where we need to start to shift our energy.

Kostas Pardalis 51:13
100%. All right, Eric, the microphone is back to you.

Eric Dodds 51:20
As always happens, we could keep going and keep going, but we do have to respect our producer. Okay, H.O., this is more of a personal question, because you are highly optimistic, you seem extremely high-functioning, you understand technology on a deep level, but you also think existentially, you know, as evidenced by the earlier part of our conversation. Is there anything on a personal level, in terms of, sort of, productivity, or, like, how you operate in your day to day, that you could share with our listeners that's been particularly helpful? Because it seems like you have a lot of ideas flowing through the old gray matter up there.

H.O. Maycotte 52:03
Yeah, it's a really great question, and I wish I had a spectacular answer for you, so I'm going to try to give a good answer. I have a gentleman on my team, his name is Kord Campbell. He was one of the first employees at Splunk, and then he started a company called Loggly, if you remember it. He's been in search for 20 years. So as he sees all this large language model stuff, he's like, ah, yeah, that's just the next evolution of search. And I'm like, yeah, but it's worth like 30 billion now, so it's a little more than just the next evolution of search. But Kord is brilliant, and Kord helps us, not monetize, but think about how to democratize these technologies more so that we can use them every day as a co-pilot. And so as we work on this next iteration that I was telling you about, like, how do we create the sort of Uber that goes from data to the model, we want consumers to be at the forefront of it, right? Like, this isn't just about a company; it's about individuals. Maybe they belong to a company, maybe they belong to many companies, but we want the individual to have a free tier where they can index their email, they can index their texts, they can index all their files, they can index all of these things. And he does this every day. He's got a technology, he calls it a mirror, that he's constantly indexing everything into. So if you were having a conversation right now, all the things you were saying, he'd be feeding into it. It crawls URLs, it eats PDFs. And as he's working, he's asking it questions, but the biggest piece of it is you give it feedback. So if it comes back with a fact or some opinion that's not right, he just gives it that feedback loop. And so I think prompt engineering, prompt feedback, and being able to apply it to our daily life is gonna be really critical. So Kord is doing what you asked me about, what I should be doing. And I'm hoping Kord is going to help us productize this so that we can all, myself included, be doing this every day, right? We would be so much more productive if we just had that augmentation.

Eric Dodds 53:56
I love it. I love it. Well, H.O., this has been an absolutely fascinating conversation. Thank you so much for giving us some time. And we would love to have you back, because we only scratched the surface.

H.O. Maycotte 54:08
Well, thank you all so much. I love what you do.

Eric Dodds 54:10
Wow, Kostas, what an episode with H.O. Maycotte. I mean, his story was amazing, but FeatureBase seems like quite a technology. I think my biggest takeaway was actually his optimism, and I thought it was interesting that he had a lot of self-awareness about his own optimism. You know, he started by telling us his story about growing up in rural Mexico without a lot of technology and how that influenced his view of what was possible, and he's really carried that through. You heard it multiple times throughout the episode, you know, "I was so insistent that we wouldn't cache anything." He just has this persistence about why we shouldn't have to face these limitations. I think FeatureBase is a really interesting manifestation of those characteristics, because he really has overcome some amazing things with a pretty wild piece of technology.

Kostas Pardalis 55:11
Yeah, and okay, I have to say something which I found amazingly fascinating. Talking with a person, a human being, like H.O., it might sound like you have a very stubborn person, right? That's needed to go and, like, build something that hasn't been created before and get it to the point where it is adopted, where people use it. But at the same time, he's, I don't know, probably the only person I've seen demonstrate, like, an extreme level of flexibility. And what I mean by that is the story of how they started the company, went to the enterprise, had product-market fit, and then decided, we want to build something even bigger, and that required pretty much going back to zero and starting again. That's, wow, from a founder's perspective, being able to do that and take this amount of risk requires, of course, being very stubborn with your vision, but obviously also a lot of flexibility at the same time. And I think this whole episode, this whole conversation, is a demonstration of how important the vision and the belief of the humans behind the technology are for the success of the technology. Of course, we talked a lot about technical things, but this, I think, comes first: it's more important to understand these qualities and how important they are, and then the documentation is out there, like, we can just go and read it. So yeah, that's what I'm keeping from this episode, and for all of these reasons, I would encourage everyone to go and listen again.

Eric Dodds 57:06
Yeah, we also talked about the future, which was pretty wild, and he has some pretty exciting predictions about a super evolution that's coming upon us quickly. So yeah.

Kostas Pardalis 57:19
We also talked about biology. And he's an ambassador of the future, right?

Eric Dodds 57:23
Ambassador of the future. So yes, definitely check it out if you're interested at all in the next super evolution, bitmap features and super fast database technology, and just generally a really optimistic, engaging, brilliant person. We will catch you on the next one. We hope you enjoyed this episode of The Data Stack Show. Be sure to subscribe on your favorite podcast app to get notified about new episodes every week. We'd also love your feedback. You can email me, Eric Dodds, at eric@datastackshow.com. That's E-R-I-C at datastackshow.com. The show is brought to you by RudderStack, the CDP for developers. Learn how to build a CDP on your data warehouse at RudderStack.com.