This week on The Data Stack Show, Eric and Kostas chat with Max Beauchemin, the CEO and founder of preset.io. During the episode, Max discusses all things BI, real-time, data tooling, and more.
Highlights from this week’s conversation include:
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack, visit rudderstack.com.
Eric Dodds 0:05
Welcome to The Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You’ll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, the CDP for developers. You can learn more at RudderStack.com. And don’t forget, we’re hiring for all sorts of roles.
Welcome to The Data Stack Show. Kostas, the guests we have never cease to amaze me. We’re talking with Max from Preset today. Not only has he worked at some of the biggest, most successful tech companies (we’re talking Ubisoft, Facebook, Airbnb, Lyft), but he’s also the originator of several major open source projects in the data space: Airflow and Superset, both now part of the Apache Software Foundation. Pretty unbelievable. What a privilege to talk to someone like Max; I’m super excited. One of the things I really want to ask him about, and we’ll see if I can sneak it in because it’s maybe a bit of a personal question, is what it’s like to start two projects like that. I wonder what it feels like to be the inventor of something like Airflow. That’s just a really cool thing, and I tend to make it a really grandiose thing in my mind. Maybe it was, but I also think a lot of the time inventors are just trying to solve a problem that really interests them. So that’s what I’m going to ask. How about you?
Kostas Pardalis 1:36
Yeah, absolutely. I think that would be awesome to ask him. What was the process like? How do you lead that stuff? How do you end up building something that gets this kind of adoption from the industry or the community out there? So yeah, I’d love to hear that from him; I think it’s going to be super interesting. But I also want to ask him about BI. He’s also founded a company in the BI space, and BI is something that, I don’t know, lately we don’t talk about that much, but it’s one of the most fundamental parts of the data stack. So it would be awesome to hear from him: what’s the current state of the industry, what has happened over the past few years, and what’s next?
Eric Dodds 2:25
All right, let’s do it.
Max, welcome to The Data Stack Show. We can’t wait to chat with you.
Max Beauchemin 2:30
Hey, happy to be here. It’s an honor to be on the show. I just realized how many episodes there are on this show.
Eric Dodds 2:36
Yeah, it’s been super fun. Okay, I don’t know where to begin because your resume is just incredible, but I would love to hear how you got started in the world of tech, and then tell us how that transitioned specifically into working on data stuff.
Max Beauchemin 2:56
I started my career in early 2000, right after the dot-com bubble burst. I started early: I did not finish my program in college; I didn’t really go to college. I was lucky to land an internship at a company called Ubisoft, which is well known now. It’s a big video game company. I did a little bit of web development early on during my internship, and then I had an opportunity to work on their first data team. And of course, the data landscape was very different back then.
Eric Dodds 3:28
What timeframe was that?
Max Beauchemin 3:30
Like 2000, 2001.
Eric Dodds 3:32
Oh, wow. Yeah, so that’s a very different data landscape.
Max Beauchemin 3:36
Yeah, so one topic for today could be how we keep reinventing the same stuff over and over on different premises. At the time, we used the Microsoft SQL Server stack. There was SQL Server Analysis Services, SQL Server itself, Reporting Services (I think that came a little bit after Office Web Components), and I think Integration Services was the other one. So that was the stack we had selected at the time. I was lucky enough to be part of the team that created the first data warehouse there, the first business intelligence team. Before my time there, there were very few databases, just Excel files, right? Then I worked on financial reporting and supply chain, more retail-type stuff rather than game analytics or what you’d think of as modern analytics. So it was really counting dollars and units sold, inventories, all that reporting. A very specialized team. I worked there for quite a while, in three different offices: Montreal, Paris, and San Francisco. So I traveled the world over this first decade of my career. Then, soon after, I joined Yahoo, and it was the birth of Hadoop. Interesting times, right? The Hadoop team would meet in the very office where I was, so I had the opportunity to see some of that relatively early. I wrote Pig scripts, if people are familiar with the Pig language; it’s a little bit foreign, kind of SQL-like but not really SQL, a dataflow language that’s SQL-like in some ways. So I got to work on some of that early stuff.
And then the part that’s really interesting for me is when I joined Facebook. It seemed like they were really on the other side of what I would call “data modernity.” We were in a completely different phase at Facebook at the time, around 2012, where everything was getting rebuilt from the ground up on Hadoop and other things, right? There was this internal Cambrian explosion of data tools, a hackathon culture of “I’ll build it if it doesn’t exist.” So they had rebuilt internally, from scratch, a lot of the things that existed in the market at the time, but they were also building things that had never been built before. I’ll try to describe a little more what I mean by that, but essentially, people internally at Facebook had rebuilt a dashboarding tool, a data exploration tool, an in-memory real-time database, and something that was a big inspiration for Airflow, called Dataswarm. That was an internal tool, and there were multiple experiments in the DAG and data orchestration space, too. And there were all these little mutant data tools: some data quality stuff, some data dictionary stuff, graph metadata browser things. So, early versions of some of the things we see really emerging on the market today. It was a really inspiring time for me. I was going from being a BI engineer and data warehouse architect to being a software engineer, building tools to enable more people to play with data. Data had been democratized at Facebook; people were building all sorts of cool stuff. It was super fun to be there at that time, really inspiring.
Eric Dodds 7:21
Absolutely. What do you do today? You also went on to a couple of other really amazing companies. But what do you do today?
Max Beauchemin 7:32
Yeah, I can keep going; I think it makes sense to do the transition. Right after Facebook, I joined Airbnb, and that’s where I was missing a lot of the data tools we had internally at Facebook. Along with others, I brought over this mentality of “Let’s build some stuff, let’s solve these problems in a new way.” If I’m not going to have something like Dataswarm here, I’m going to build something that solves my problems and my team’s problems as a data engineer, and that’s what became Airflow. I wanted to get involved in open source. I was always an admirer of people who had built open source projects, looking at the Linux kernel and other things and being inspired by them. I thought, maybe I can give it a shot. The timing seemed good to build something like Airflow, so I decided to go with it. I actually started building it between the two jobs, before joining. I was so excited: I’m going to make it open source, I’m going to start working on it.
Eric Dodds 8:31
Oh, wow. So the initial birth was between jobs.
Max Beauchemin 8:37
It was in between jobs, but I knew I was joining Airbnb. I had spent a lot of time with the data team there, talking to people, and it was clear that they needed something like that and that I would be able to build it. So I thought, I’m just going to get started in that window between jobs. I think I missed a vesting cycle there, a three-month vesting cycle, by a few days. 2014 was a really good time, a pretty decent time, to join Airbnb. But as a result, I got to put my project, Airflow (I think it had a different name at first; it was called Flux), into open source under my GitHub, and when I joined it was like, well, this thing is already open, I’m working on it. So I started working on a data mart. My primary function was to do the data engineering for what we internally called customer experience, CX, and I was building Airflow at the same time, working with a small team of data engineers. We were building Airflow while solving these data engineering challenges. After that, I went to Lyft for a brief time; I spent a year there. But while I was at Airbnb, I also started Apache Superset, which is also very well known. Superset is very much in the data visualization, exploration, and dashboarding space. The general idea there was that we were investing heavily in Presto and Apache Druid (I think it was pre-Apache at the time), Druid being the in-memory, real-time database, and none of the tools on the market at the time, Looker, Tableau, and so on, worked, or worked well, with the databases we were investing in.
Eric Dodds 10:32
Let me ask you a specific question there, because I think it’s fun for our audience. They span a wide spectrum, and I think some of them probably hear that and say, “Yes, I get that, I had similar pain.” But for a lot of people it’s like, man, Looker and Tableau are so powerful, how could you ever reach their limits? What did it look and feel like inside the company to hit the limits of traditional BI? I know you mentioned it was database integration stuff, but could you explain that dynamic a little bit?
Max Beauchemin 11:04
Well, so one dynamic was that we had a large Presto cluster. We had hired people from the Presto team, or people who had used Presto in the past, and we were investing heavily in it; that was our ad hoc layer. I think we had Tableau at the time, and you had to load stuff into extracts, which is kind of a subpar database. At least at the time, there was this thing called Live Mode, which would defer the heavy lifting to the database itself, but that didn’t work very well, for a variety of reasons I could expand on but won’t. And Druid at the time didn’t speak SQL; it had this funky dimensional query interface, and it just wouldn’t talk to any BI tools at all. There was no front end for it. So the very premise for Apache Superset was: I’m going to build a quick three-day hackathon project to allow for exploration of Druid datasets. Druid is an in-memory, real-time, super fast, super fun database, heavily indexed, right? So it’s just blazing fast in real time. We had some real-time use cases internally at Airbnb where Tableau just wouldn’t work; there was no support for that. So I was like, I’m going to build this thing. I could go deeper into that story, but quickly: at some point, over a weekend, I thought, I’ve got to make this work with Presto too. This is fun, it’s a cool tool: you can explore the data, you can save charts, you can make dashboards. Let’s make it work with Presto too. Then I became much more ambitious over time because of internal adoption at Airbnb. People liked having a tool with very fast time to chart and very fast time to dashboard, provided the datasets had been curated. If the premise is that you have a dataset with all the metrics and dimensions you need to create a dashboard, then exploring, visualizing, creating a dashboard, and sharing it is really, really fast and efficient with Superset.
So whereas Tableau is about getting to the perfect visualization, with its visual grammar, and being able to do very sophisticated things (you could call it the Photoshop of data visualization, right, you can do powerful things), and Looker was all about the semantic layer and being able to write your business logic in it, for us it was just: visualize a dataset quickly. That worked really well for the team at Airbnb, and elsewhere.
Eric Dodds 13:49
Amazing.
Max Beauchemin 13:50
Yes.
Kostas Pardalis 13:52
Quite the story. You’ve been around for so long and have seen so many different things happening. So before we get deeper into what you’re doing today with Preset and Superset: is there a technology you’ve seen over all these years that really surprised you in terms of how it changed the landscape? Or, to put it a different way, of all the stuff you’ve seen, from Hadoop to whatever we have today, what do you think is the most influential technology up to today?
Max Beauchemin 14:31
Yeah. First, I’ve got the gray beard to show for all the years of data, from scratching my head over it for decades. But I would say a lot of what we’re doing now is reinventing things that already existed, based on new premises. There are these cycles in software where there are big shifts: the move to the cloud, the move to distributed systems, containerization, right? So once in a while you have a set of new premises and you have to rebuild everything on them, and the pyramid of needs is maybe flipped a little sideways or upside down. I think one thing that’s new in the last five to 10 years, that did not really exist before, is streaming use cases. It’s been cool to see different solutions emerge around data streaming: streaming queries, and streaming computation frameworks like Flink or Spark Streaming. On top of that, there are new semantics and new ideas around streaming that are interesting. There are people trying to bridge the two worlds with common languages that express both batch and streaming. So it’s been interesting to see brand new technology emerge there. There’s a question of what that means in terms of visualization, or in terms of use cases, and I have tons of thoughts there, too. I think it also relates to the chasm in analytics between operational data and business data, where operational data tends to be much more timely. Not always. So there are some things there that are really interesting to see; we’ve gotten really good at operational streaming analytics, if you look at things like Datadog and Elastic. And there’s some cool stuff there, too.
Kostas Pardalis 16:32
Yeah, that’s actually really interesting, because you mentioned streaming, and I never thought about the distinction for streaming visualization. I have to ask more now. We’ve seen all these very interesting platforms: we have Kafka, we have Spark Streaming. But how do they fit with BI and visualization? Is it needed, first of all? I’m thinking right now about the Lambda architecture from a couple of years ago, where you had the streaming layer that was more about notifications (something just broke and you have to go fix it, that kind of stuff), and then you had the batch layer, where you had reporting, and reporting is usually the most common use case for visualization. How do you see these two things merging, and how do you see visualization fitting in?
Max Beauchemin 17:33
Yeah, it’s interesting. One immediate thought is about data that really needs to be fresh. I would say there’s latency and there’s freshness. If we describe latency as how long it takes from the moment you run a query, or ask a question, until you get the answer, I think that’s incredibly important, right? Being able to dance with the data, slice and dice, and ask the next question as you go is transformative. An example I keep giving: Google gives you a result in a fraction of a second. Think about the implications if Google took five seconds, 15 seconds, 30 seconds, a minute, 10 minutes. If a Google search took 10 minutes to resolve, it would still be a wonder of the universe in terms of what it would allow people to do, but how you interact with it, how you engage with it without autocomplete, would be completely different. So what I’m trying to point to is that latency is super, super important. Freshness is different: it’s whether you really need to know what happened someplace in the past 30 seconds. If you’re looking at a chart, refreshing, and waiting for something to appear, you should be a bot; you’re not making good use of your time. No one should be looking at a dashboard nonstop, waiting for something to happen: “Now I’m going to refund this person, click, done, approved. Yes, I’ve done my job.” So I think that operational stuff is, in some cases, really great for automation, bots, and taking action. It’s also good for troubleshooting, right? There’s an alert, or something happens in Datadog, your number of 500 errors peaks, and you’re like, “What the heck is going on?” You need to know live, right now, what happened in the past five minutes.
So I think that stuff is super operational and very different from “Is my business doing well?” or “Is my product doing well?” There are some other interesting use cases around streaming. When you launch something, you want to make sure it’s good, right? You launch a new product change or release something, and you might want to get a little closer to real time to make sure the launch is going well. But other than that, for the bulk of analytics, the high-value questions are not about things that happened in the past five minutes.
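Max’s distinction between latency and freshness can be made concrete as two different time differences. The sketch below is illustrative only: the function names and timestamps are hypothetical, and it assumes freshness is measured as the age of the newest row an answer is based on. It shows how a query can return in a fraction of a second (low latency) while still reading data that is hours old (low freshness), which is the typical shape of interactive BI on top of a batch-loaded warehouse.

```python
from datetime import datetime, timedelta


def query_latency(started_at: datetime, finished_at: datetime) -> timedelta:
    """Latency: how long the user waits between asking a question and getting the answer."""
    return finished_at - started_at


def data_freshness(newest_event_at: datetime, now: datetime) -> timedelta:
    """Freshness (staleness): how old the newest row behind the answer is."""
    return now - newest_event_at


# Hypothetical example: a fast query over a table loaded by a nightly batch job.
now = datetime(2021, 6, 1, 9, 0, 0)
latency = query_latency(now, now + timedelta(milliseconds=300))
staleness = data_freshness(datetime(2021, 6, 1, 0, 0, 0), now)

print(latency.total_seconds())  # 0.3
print(staleness)                # 9:00:00
```

Improving one number does nothing for the other: a faster database shrinks `latency`, while only a faster pipeline (streaming, more frequent loads) shrinks `staleness`.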
Kostas Pardalis 20:03
Yeah, I like what you said about sitting on top of dashboards and reloading over and over, trying to see if something’s happening. I can picture myself as a first-time founder who just created a dashboard for sign-ups or something like that, going, “Okay, where are the sign-ups? Where are the sign-ups?” You just need to go out there and get them, not stare at the dashboard to see what’s going on with sign-ups. So many times we get really into “let’s make everything more real-time, faster, fresher.” But real time is very relative, and not all use cases share the same definition of real time. So it makes a lot of sense. So okay, let’s...
Max Beauchemin 20:58
One or two more thoughts I want to add on real time. I’ve been thinking about it because I’ve had a lot of conversations with, I call them “streamers,” people going streaming-first, who argue that we should just get rid of those mountains of SQL and all the batch stuff and rewrite everything in real time. That’s going to be easy, right? Let’s just do it. Wait a minute. When you think about freshness, what do you actually need visibility into for things that happened in the past minute, or two, or five, or 10 minutes? What are really the metrics and dimensions, and what level of accuracy do you need for them? When you really start looking into the use cases, and we did that internally at Lyft and Airbnb (Lyft was much more of a real-time business), and you ask people why they need freshness, why they need to know what happened in the past minute, you realize the requirements are not as complex. Maybe you just need a handful of dimensions and metrics, and maybe you don’t need to know the exact number of bookings, just the clicks on the booking button. Knowing this, I tend to say that in a lot of cases you don’t even need the Lambda architecture. You solve the real-time requirements with specialized tools, and you solve your business analytics with the right set of tools for that. If the two diverge and the numbers are not exactly the same, you explain the difference (we use different tools and different definitions for things) and you move on with your life, instead of trying to bring together two worlds that are very far apart in reality.
Eric Dodds 22:43
I think the other thing is that people lump a lot of stuff into the real-time versus batch debate. There’s a pretty clear separation between the analytics component and the customer experience side of things, right? There are certain things that need to be delivered in an application in real time because a user is performing some action and there needs to be some sort of response. It’s nontrivial to build that stuff. But even then, not everything needs to be real time, which is interesting. I’m thinking about some e-commerce companies that are running thousands of tests; for them, real time is, like, 15 minutes, right? A testing team can’t really process results that fast, so even 15 minutes is pretty unbelievable. Real time is definitely such a relative term.
Max Beauchemin 23:32
Right. And you have to really identify where there’s value in it, what you’re going to do to support the use cases, and what it’s worth to you. And maybe the tooling will converge, right? We’ve already seen some of that convergence: the chasm between business analytics and operational analytics is not as wide as it used to be, with the rise of next-generation databases that can serve both sides of the fence. You see things like Superset being used more and more for operational analytics, and things like Grafana being used more for business analytics. I think this chasm is getting narrower over time, and that’s a good thing. We used to have very specialized databases for real time; the time-series databases for real-time use cases were very different and didn’t really support OLAP use cases. Now that’s converging with things like Druid, ClickHouse, and Pinot, these next-generation databases.
Kostas Pardalis 24:32
Yep, very interesting, actually, what you said about how things overlap, with Grafana and Superset, for example. So I have a question. Let’s talk a little bit about BI and visualization, and let’s do some definitions. What’s BI?
Max Beauchemin 24:53
What’s BI? Business intelligence. It’s an aging term. It was around when I started my career, back when I had no gray hair and more hair up here; the term was already well established. That’s 20-something years ago. I think the word “intelligence” comes from the way the government thinks about intelligence, like intel, insight, that kind of stuff, applied to business. So I guess it means the set of tools and best practices around analyzing, organizing, and serving data. One huge trend, though I’m not sure exactly when it was most active, depending on where you look in the world, is data democratization, which caught on on top of business intelligence. The general idea is: let’s give more people access to more data. It’s usually coupled with the idea of data warehousing, which is this practice of hoarding data, right? I’m going to take all the data that has anything to do with my business, wherever it lives, and hoard it in this warehouse, which becomes a bit like the library for the data in the organization. Typically, you have a BI tool, a business intelligence tool, that sits on top of the data warehouse, and people can self-serve in it. I would call it a general-purpose but specialist tool. It’s generic in the sense that you can use a BI tool to query any type of data: healthcare, business, products, whatever it might be. And it’s generally geared toward specialists, people who are trained; we used to have many more specialists, business intelligence professionals. I think that’s changing, right?
We’re seeing the rise of data literacy now; more people are more sophisticated with data and use data every day. That means these tools are changing, and maybe we’ll talk about some of the trends. To make an imperfect analogy, I think BI was originally a little bit like a restaurant, where you’d come in and get a menu to order your report or your chart, and it would be served to you. Over time, maybe it became a bit more like a buffet, right? People can come in and self-serve, have access to a wider variety of things, and assemble a meal for themselves. That’s the general idea of how I’d describe BI, though there’s too much to unpack here.
Kostas Pardalis 27:45
So it’s more than just visualization, right? Visualization is one part of it.
Max Beauchemin 27:53
Yeah, I call it the database user interface. Somewhere you have all of your data, and somewhere you have people with their visual cortex and their brains, and somehow you need to get that data into people’s heads so that it becomes intelligence. So, to describe what these tools do: they expose your datasets in a way that, hopefully, lets people self-serve to explore and visualize their data. There’s usually a dashboarding component, where you’re able to assemble an interactive set of visualizations with some guardrails, so people can understand their data and interact with it safely, in a somewhat intuitive way. One thing that’s interesting about BI is that it’s any kind of data for any type of persona with any kind of background, so it becomes a tool that’s not very specialized. We don’t really have clear personas. It’s very general-purpose in that sense.
Eric Dodds 29:04
Max, I have a question. The goal of self-serve BI is so appealing, and I think it’s something many companies are working toward. I know you can’t actually estimate this, but what percentage of companies do you think actually achieve it? You’ve built these platforms inside really large companies, and even though we have all the tools to do this, it’s still pretty hard to achieve inside a company where a wide variety of stakeholders need to access datasets that contain information key to the business and that combine data from other functions. It still seems like a pretty big challenge for most companies.
Max Beauchemin 29:53
It’s huge, right? When you think about it, there’s what people call the data maturity curve, and how different companies or individuals are distributed along that curve. I have a particular view of the world, having worked in Silicon Valley at very data-forward companies, so the honest answer is that I probably don’t know. But what I want to point out is that the analytics process is extremely involved. Let me try to describe what I mean by the analytics process: it’s the process by which you instrument, store, organize, and transform your data so that it can be explored, visualized, consumed, and acted upon. If BI is that last layer (consume, visualize, act upon), so much needs to happen first for that to even be possible. Now we’re talking about data engineering: having data analysts and data engineers, having systems in place that actually store the data and make it available. There’s just so much that needs to happen for that to be possible. I would say that if we were to visualize companies on this data maturity lifecycle, we would see a huge number of companies that are very young, and I mean that respectfully; some companies are just really bad at it. But in general, over the past decade and the next, there’s a migration where everyone is going to become much better. It’s a matter of survival at this point. One thing is that people have been thinking, “Oh, but data should be easy. One day someone’s going to fix it all and figure it all out; we’re going to solve data engineering, we’re going to settle BI, and we’ll be all done.”
And what we’re realizing now is that the problems are at least as complex and intricate, and require specialists, in the same way software engineering does. We’ve accepted that software engineering is complicated and expensive, with a bunch of specialists and a bunch of sub-disciplines, and I think we’re realizing that data is just as important. We’re very far from the idea that a team of five to 10 people can do data for your large company. That’s just not going to cut it.
Kostas Pardalis 32:24
Yeah. Max, can you give us a description of what the BI architecture looks like today and where preset.io fits in?
Max Beauchemin 32:35
Yeah. I think it’s a gigantic market that’s extremely validated and still in motion. There are very big incumbents; I’m thinking of first-wave BI, and when I say that, I mean BusinessObjects, MicroStrategy, Cognos. Those are very much dinosaurs; you don’t hear about them as much unless you work at a company that made decisions about its technology and data stack a while ago. There’s one transformation that has not yet happened, that I think we’re going to see happen, and that we’re really interested in at Preset. I was talking about data democratization before, which is about bringing more people to this special place where you do data, right? Data democratization was: give more people access to more data. I think the real question ahead of us is the reverse: how do we bring the data to people? In the food analogy, how do we bring food everywhere in the world? How do we do Uber Eats, bringing the right meal to everyone where they sit, on top of the buffet? The buffet is great; it totally works for a bunch of use cases. But I think what we’re going to see is analytics transcending the BI tool, the special-purpose tool, coming out and becoming part of everyday experiences: in every app you use on your phone and every SaaS tool you buy, there’s going to be interactive analytics in context, where it’s most useful. So on top of the buffet, there are also people walking around asking, “Would you like a little side of analytics with whatever you’re doing right now?”
And we’re thinking very actively about this at Preset. Beyond that, there’s embedded analytics, which I could get into, which is how you bring a dashboard or charts into other contexts. But the problem we’re after is: how do we enable the next generation of application builders to easily bring interactive analytics into the things they’re building today? I think that’s a really interesting question. It’s still very, very hard: if you’re building a product today, bringing interactive analytics into the experiences you’re building is difficult, so we want to make it a lot easier for people to do.
Kostas Pardalis 35:10
Okay, so let’s say I’m building a product and I also need to expose some analytics to my customers, right? How can I use preset.io to do that?
Max Beauchemin 35:22
That’s a complex and intricate question that has multiple components. I think there should be multiple ways to do this, and there are a bunch of trade-offs in how you want to do it. The most obvious one is what I was referring to as embedded analytics. You build a dashboard in a no-code or low-code tool, right? By the way, BI is the original no-code tool when you think about it: you’re able to do very, very complex things by dragging and dropping things on the screen. So you go and build a dashboard, you style it, you parameterize it, and you embed this dashboard inside your application. Maybe you have an analytics portion of your SaaS product that shows a dashboard. With embedded analytics in Preset, you’re able to apply some row-level security to say, “I know it’s this customer, therefore the dashboard will apply these filters,” so that they see exactly the things that pertain to them, and there’s isolation. Beyond embedded analytics, there’s another idea that we cater to at Preset, which I call white-label BI. It’s also not necessarily a new idea, but it’s being able to have these pre-packaged BI environments, essentially a Superset instance that’s preloaded with the datasets, dashboards, charts, and queries that are relevant to that one customer. So you can say, for each one of my customers, I’m going to create a Superset sandbox with all the data assets that they need. And then they can come and self-serve, and if you allow it, they can write SQL against the data that you expose to them. So that’s the white-label use case. And then another use case is more the component library. We’re a little bit in the infancy of this, but we’re excited to expose the building blocks of Preset and Superset as component libraries for people to remix into the experiences they want to create.
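To make the row-level security idea concrete, here is a minimal sketch under stated assumptions: `query_for_customer` is a hypothetical helper (not a Preset or Superset API), the dashboard SQL is assumed to be trusted and to expose a `customer_id` column, and SQLite stands in for the warehouse. The embedding application knows which customer the viewer belongs to and applies that filter to every query the dashboard runs.

```python
import sqlite3


def query_for_customer(conn, base_sql, customer_id):
    """Run a dashboard query restricted to a single customer's rows.

    Hypothetical helper, not a Preset or Superset API. Assumes `base_sql` is
    trusted SQL authored by the dashboard builder and that its result set
    exposes a `customer_id` column.
    """
    # Wrap the original query and bind the customer ID as a parameter,
    # so the viewer's identity is never spliced into the SQL text.
    wrapped = f"SELECT * FROM ({base_sql}) AS q WHERE q.customer_id = ?"
    return conn.execute(wrapped, (customer_id,)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(1, 10.0), (2, 5.0), (1, 2.5)],
    )
    dashboard_sql = (
        "SELECT customer_id, SUM(amount) AS total "
        "FROM orders GROUP BY customer_id"
    )
    # The embedding app knows the logged-in user belongs to customer 1.
    print(query_for_customer(conn, dashboard_sql, 1))  # [(1, 12.5)]
```

Wrapping the whole query, rather than editing its `WHERE` clause, keeps the sketch independent of how the dashboard SQL was written; a real BI tool may inject filters differently.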
So there, you could have a React library: you bring in some charts, you bring in some controls, maybe a selection picker, a filtering control, a date range picker. And as the application developer, or as the engineer building the product, you create the exact experience that you want with these composable, rich components that enable cross-filtering and drill-down, rich experiences that would be really prohibitive to build from scratch. They’re really easy to build if you have the right framework.
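Cross-filtering, one of the behaviors such components provide, can be sketched in a few lines. This is an illustrative sketch only: it is written in Python rather than React, plain dictionaries stand in for query results, and `apply_filters` and `bar_chart` are hypothetical names. It assumes one common convention, where a chart ignores the filter on its own dimension; this is not necessarily how Superset implements it.

```python
from collections import defaultdict


def apply_filters(rows, filters, exclude=None):
    """Keep rows matching every active filter, optionally ignoring one dimension."""
    return [
        row
        for row in rows
        if all(row[dim] == value for dim, value in filters.items() if dim != exclude)
    ]


def bar_chart(rows, dim, measure, filters):
    """Aggregate `measure` by `dim` for one chart in a cross-filtered view.

    The chart ignores the filter on its own dimension, so the bar you clicked
    stays visible next to its siblings while every other chart narrows down.
    """
    totals = defaultdict(float)
    for row in apply_filters(rows, filters, exclude=dim):
        totals[row[dim]] += row[measure]
    return dict(totals)


if __name__ == "__main__":
    sessions = [
        {"country": "US", "device": "ios", "sessions": 3},
        {"country": "US", "device": "web", "sessions": 2},
        {"country": "FR", "device": "ios", "sessions": 4},
    ]
    # The user clicked the "US" bar in the country chart.
    selection = {"country": "US"}
    print(bar_chart(sessions, "device", "sessions", selection))   # {'ios': 3.0, 'web': 2.0}
    print(bar_chart(sessions, "country", "sessions", selection))  # {'US': 5.0, 'FR': 4.0}
```

The shared `selection` dictionary plays the role of the state a component library would pass between charts and controls.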
Kostas Pardalis
Yep, makes sense. And is this problem related only to the BI side, to something like Preset, or do we still need to do work with our data warehouses, on the storage layer and the query layer, to enable these use cases? Or is it enough to have Snowflake, push all the data into Snowflake, and then rely on Preset to do the rest? What’s your…?
Max Beauchemin
Yeah, so clearly this is part of what I talked about earlier, the analytics process: there’s no BI and no visualization happening unless you’ve gotten all these things right. In this specific case, where we’re talking about embedded analytics, white-labeling, or a component library, the premise in any case is that you’ve created the datasets that you need, with the dimensions and metrics that you want to expose, in order to build a dashboard. That might take us back to how BI is evolving. It used to be very monolithic, in the sense that you would have one tool that included the data munging, the data transformation, and the semantic layer, which is a super-loaded term; we can decide whether we want to unpack it or not. All of these things were part of one platform like MicroStrategy, Cognos, or BusinessObjects, and those tended to be very monolithic. What we’re seeing now is different. For me, the semantic layer belongs in the transformation layer, and that’s the space of dbt and Airflow. Superset and Preset don’t actively solve that problem; other people solve it much better than we could, and we just want to team up and integrate very well with them. Not sure if I’m answering the question, though. I know I’m digressing quite a bit.
Kostas Pardalis 40:15
Yeah, you did. I also made my question a little more specific to the storage layer, but as you said, it’s not just one thing that has to be in place for this to work; there are so many different things that need to happen. And you used the example of the Cognos monolith. I mean, we still have monoliths, but things are starting to break into smaller monoliths, at least. And I think we see that with BI, right? It’s probably one of the first markets where we see that. But I think we also see it even with data warehousing, right? The whole idea of having a data lake, where you have the storage on S3, then a different query engine on top of that, and now a table format on top of that. I mean, everything is breaking apart, and each piece gets its own label.
Max Beauchemin 41:16
Warehouse, lakehouse, special-purpose databases, right? Do you need a separate database for real-time or not? Is it going to be BigQuery with BI Engine and a real-time option, so it becomes a monolith that serves it all? Or, for real-time, are you going to use ClickHouse, Pinot, Druid, whatever, some very specialized database, and use something else for your warehouse? In the database space, with all the money we’ve seen flowing into the Snowflake space, people are now looking to say, “Oh, maybe there are some layers that we can delaminate out of that, and build some big businesses and some tools in some of these areas.” To me, exploration and visualization need to explode, right? As for the database, I’m not as sure. As a customer of a cloud data warehouse, I would like the same database to do it all: to tell BigQuery or Snowflake, “Hey, this table needs high availability; put it in memory because I need it to be fast,” but stay in the same warehouse. I don’t want to go and purchase more tools. And people feel the same way about BI: “I’m looking for the one that does it all.” But then you realize that the thing that does it all doesn’t do anything, right?
Going back a little bit to the question: what is the analytics process to power something like embedded analytics, or what I referred to as white-label BI? For embedded, it’s pretty simple, as long as the BI tool, Preset or Superset, can apply row-level security on the fly. All that’s required is to say, “This user is this customer ID; just apply a customer ID filter on all the queries that you run.” It can be more complicated than that, but it’s essentially a row-level security thing. For white-label, it’s a little bit more intricate: you might want to create a data mart for each customer. And really, what I mean by that can just be a view layer on top. Say you have five tables you want to expose to each one of your customers. What you can do is create schemas that have views that filter on a specific customer ID, and then give them access, with a service account, to that specific schema, which is limited to their data. Hopefully you have a universal schema that’s the same for all your customers and is all refreshed, perhaps atomically, right? You refresh the whole thing every night or every hour, but you have these little islands, or windows into the warehouse, that are filtered and isolated for them. And then you put a BI tool on top of that schema, provide canned reporting and canned dashboards, and they can knock themselves out and go and push that further.
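The per-customer “window into the warehouse” described above can be sketched as DDL generation. This is a minimal sketch under loudly stated assumptions: PostgreSQL-style SQL, a shared source schema named `analytics` in which every table has a `customer_id` column, and hypothetical naming conventions `customer_<id>` for schemas and `svc_customer_<id>` for service accounts. `customer_mart_ddl` is an illustrative helper, not part of any product.

```python
def customer_mart_ddl(customer_id, tables, source_schema="analytics"):
    """Generate DDL for a per-customer 'window' into a shared warehouse schema.

    Hypothetical helper. Assumes PostgreSQL-style SQL, that every shared
    table has a `customer_id` column, and the naming conventions
    customer_<id> (schema) and svc_customer_<id> (service account).
    """
    # Only accept a plain integer ID and valid identifiers, because these
    # values are interpolated into DDL text rather than bound as parameters.
    if type(customer_id) is not int:
        raise TypeError("customer_id must be an int")
    for name in [source_schema, *tables]:
        if not name.isidentifier():
            raise ValueError(f"invalid identifier: {name!r}")

    schema = f"customer_{customer_id}"
    role = f"svc_customer_{customer_id}"
    statements = [f"CREATE SCHEMA IF NOT EXISTS {schema};"]
    for table in tables:
        # Each view exposes one shared table, filtered to this customer.
        statements.append(
            f"CREATE OR REPLACE VIEW {schema}.{table} AS "
            f"SELECT * FROM {source_schema}.{table} "
            f"WHERE customer_id = {customer_id};"
        )
    # Grant the customer's service account read access to the views in this schema.
    statements.append(f"GRANT USAGE ON SCHEMA {schema} TO {role};")
    statements.append(f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO {role};")
    return statements


if __name__ == "__main__":
    for stmt in customer_mart_ddl(42, ["orders", "events"]):
        print(stmt)
```

Because the views only filter the shared tables, the nightly or hourly refresh still happens once, on the universal schema. In PostgreSQL, views check permissions on the underlying tables as the view owner by default, so the service account does not need direct access to the shared schema; other warehouses handle view privileges differently.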
Kostas Pardalis 44:20
Yeah, that’s super interesting. One last question from me, and then I’ll give the microphone to Eric to ask his questions. Can you share some opportunities that you think exist right now in the BI market, things that you would like to see happen or expect to see happening? Something to help our listeners, I don’t know, if they want to go and build products around them?
Max Beauchemin 44:52
Yeah, I really like the idea of bringing analytics everywhere, and the premise is that people are more data-literate than they used to be, right? Not only do people hope to find a dashboard in every SaaS application, they expect it, and it becomes a requirement. I think people also expect it in everything they do: if you post blog posts on Medium, or even me participating in this podcast, I would expect to see a dashboard on how this podcast is doing in real time as we release it, and be able to see who’s listening to the podcast and what the demographics are. We’re starting to expect more and more analytics everywhere, and people are trained; they want interactive analytics, not just static. It’s going to be really hard to build that with the building blocks that exist today, the building blocks being charting libraries and data warehouse drivers. So I think there’s a real opportunity in thinking about how we enable people to bring analytics into all the experiences they build every day. So how is BI, or analytics, going to come out of its shell and radiate out and be everywhere? We’re really interested in actually doing this at Preset. And then other trends, maybe beyond business intelligence: a big topic for me has been thinking about how we take the learnings from DevOps practices and the DevOps movement, and transform, reapply, and reinvent them for data people. And another big thing that’s really interesting is how the modern data team is evolving. What do they do? What are the roles? And how do they enable others to become better with data? The data team becomes this vector that enables everyone to self-serve.
Eric Dodds 47:04
Fascinating. Well, two more questions from me, and one of them may take us down a little bit of a rabbit hole, but hopefully Brooks doesn’t get too upset with us for going a little bit over.
Just thinking about what you were saying on the frontiers in BI: what do you think is going to be commoditized first, or what do you see being commoditized? The context behind that question is that you make such a good point that the amount of back-end work needed to enable self-serve BI is immense. But there are also patterns around that. If you think about visualization, a lot of businesses can conform to a particular data model for their type of business, say, a direct-to-consumer mobile app, even though everyone has their own KPIs. So do you think there will be a lot of commoditization in the actual visualizations as the data layer becomes more established and defined across business models?
Max Beauchemin 48:10
I think so. For me, I think we’re trying to accelerate that in some way, so we can really innovate too, right? As you commoditize things, there’s opportunity to go and shoot further. Open source is a tidal wave of commoditization: it’s free, it’s remixable. So clearly we’re doing that in the BI space with Apache Superset first, and we also have freemium. Preset has a freemium offering on Apache Superset today: you can go and sign up and have the open source project run for you, for up to five people, for free. And I think that price point is very competitive. So we’re trying to accelerate that. Our mission is to enable every data team, and everyone, to have the best tool to visualize and collaborate with data. So I think that’s clearly happening. But beyond that, as we commoditize the consumption and visualization layer, there are opportunities to go and innovate. For us, one theme is enabling people to bring analytics everywhere. One thing I’ve discovered in the data world, too, is that as you walk toward the horizon, the horizon gets further away.
Eric Dodds 49:33
Like the universe, right? Ever-expanding.
Max Beauchemin 49:36
And you’re like, “Oh, I’ll climb to the top of that tree and see where the horizon is, and by the time I get there, I’ll be at the edge of the world,” and then you walk there and the horizon kind of moves with you. So, on this front, one thing I would really like to see is all the data in the world, or in your company, being instantly queryable. That would be great. If everything was in memory, you could ask any question and get all the answers. But the moment we create amazing next-generation in-memory databases, people are like, “Well, I’m going to log more.” It’s like hoarding: the bigger the house, the more stuff you hoard. You never get to fully solve any of the problems, and maybe that’s just how it is. How are we going to solve software engineering? It won’t be solved; it will just continue to grow and evolve.
Eric Dodds 50:41
Yeah, I think that’s a great perspective. Okay, last question, because we really are close to time here, but I’m interested to know. You’ve started some projects that have become data tooling that is, at least in the subset of the world a lot of data engineers operate in, close to go-to tooling, which is pretty amazing. What did it feel like when you were creating those? Did it feel like you were just solving a problem that was right in front of you? I guess I’m asking you this almost as an inventor, right? Did you feel like you were inventing something, or was it just a problem in front of you that happened to be a pretty critical, pervasive pain point?
Max Beauchemin 51:31
Yeah, the history of innovation is made of people kind of remixing things. If you look at the great inventions in the history of humanity, not all of them, there are some really important exceptions to the rule, but look at what Isaac Newton had at the time. What was he reading? I think Newton is actually a bad example, because he did innovate quite a bit.
Eric Dodds 52:00
Yeah, like inventing calculus and all that crazy stuff.
Max Beauchemin 52:03
Yeah, so I think that’s a bad example. Nikola Tesla is also a bad example of that. But almost everywhere else, you look at who their contemporaries were, what they were reading at the time, what they were thinking about, who they were talking to and exchanging correspondence with through letters. Often the collective imagination was already on the cusp of what they discovered. And I think that’s very much the case here: Airflow was largely inspired by two or three products internally at Facebook, Dataswarm and some other things. Those were the things that emerged internally at Facebook from a bunch of experiments, right? Those were the things that people internally decided to put their pipelines and DAGs into. And then I took some of that and remixed it with some of the things I learned from Informatica and from using other tools, into something that I thought was going to be immediately useful for my team at Airbnb, and I knew that people coming out of Facebook were going to look for something like that. So there’s this idea: people talk about product-market fit in business, but there’s also project-community fit in open source. There’s a timing thing to it. If you had built Airflow five years earlier, people would probably have said, “Oh, it’s too early”; it doesn’t fit people’s mental model, it doesn’t resonate just yet. So there’s always this timing aspect, where you’ve got to be early, but not too early. There’s always some luck, some context, and some hard work, too. And then community building is a whole different topic: how to get people interested, involved, and excited, and get them to contribute. I think that’s something I figured out how to do in an okay way for both Airflow and Superset.
Eric Dodds 54:16
Well, thank you for sharing that. That’s just so fun to be able to talk to people who have conceived and built the tools that we use. But as you point out, we sort of stand on the shoulders of lots of people who’ve done lots of cool things. So Max, this has been an incredible conversation. The episode flew by, but thank you for joining us. We’d love to have you back on to dig into it. We could go for hours here, but thank you so much for giving us some of your time.
Max Beauchemin 54:40
Yeah, it was super fun. I think we scratched the surface and got a little deeper in some areas, which was good and fun, but there’s still so much more to talk about. So, happy to come back on the show anytime.
Eric Dodds 54:51
One thing that really struck me, and this may sound like a funny conclusion, is that Max has a very open mind about a lot of things, even BI, and he’s building a company in the BI space. I mean, he certainly has strong opinions. But I really loved his analogy: you think you’ll get to the horizon, you climb a tree to look over it, and you realize the horizon just keeps moving. And I think that’s really clear in the way he approaches problems. He’s looking for that frontier, and he keeps a very open mind, and I just appreciated that a ton. How about you?
Kostas Pardalis 55:27
Yeah, absolutely. The way you put it makes me think that’s probably a trait an inventor needs to have in order to be an inventor, right? Being in an environment that changes so rapidly, like what he described: think about Facebook and how the engineering was there, one project after the other, building new technologies and building everything from scratch until it was ready to use. So yeah, I think it makes total sense. It’s an amazing trait for an inventor and, I would say, also for founders. So I’m really, really looking forward to what’s next for Preset.
Eric Dodds 56:13
Me too. All right. Well, thanks for joining us on The Data Stack Show, and we will catch you on the next one.
We hope you enjoyed this episode of The Data Stack Show. Be sure to subscribe on your favorite podcast app to get notified about new episodes every week. We’d also love your feedback. You can email me, Eric Dodds, at eric@datastackshow.com. That’s E-R-I-C at datastackshow.com. The show is brought to you by RudderStack, the CDP for developers. Learn how to build a CDP on your data warehouse at RudderStack.com.