The PRQL: Does Lakehouse Architecture Really Mean the End of the Data Warehouse and Data Lake As We Know It?

August 5, 2022

In this bonus episode, Eric and Kostas preview their upcoming conversation with Vinoth Chandar of Apache Hudi.

Notes:

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.co

Transcription:

Eric Dodds 00:00
Hey, welcome to The Data Stack Show prequel. We just recorded a show with Vinoth, one of the creators of Apache Hudi, and we got to ask him about his new company. Well, maybe we should just leave that as a teaser for the listeners. Kostas, is that mean?

Kostas Pardalis 00:23
Oh, no, no, it’s great. I guess people will probably already know about the company anyway. I mean, they’re doing a great job with marketing so far, spreading the word out there about the company and its relationship with the project, Apache Hudi. It’s a good thing.

Eric Dodds 00:45
Yeah, I agree. I just had to start with a little teaser there, a cliffhanger. But I will say, I think one of the most interesting and helpful things to me about the conversation that we just had was better understanding the different components of data lakes and data warehouses, how they create value for different users, and how they optimize towards different goals, right? So usability versus cost, etc. And I feel like we just got such a good picture of how those things are converging, because traditionally a lot of those concerns have been very separated. I think they are converging way more rapidly than I realized, and creating the opportunity to do some really cool stuff. One thing we talked about, to give a little teaser, was bring your own query interface, which is a really interesting concept, because, to Vinoth’s point, a lot of these things are big, vertically integrated stacks. So that was fascinating. But what stuck out to you?

Kostas Pardalis 02:04
I mean, I think we focus a lot on, let’s say, the rivalry between the lakehouse architecture and the data warehouse, which, okay, has been created in a way because of, let’s say, the relationship between Databricks and Snowflake. But we forget that initially, at least, and I think it’s still the case, data lakes were created for different use cases, right? The data warehouse was, still is, and probably always will be the right environment when you want to do, let’s say, BI and analytics. Data lakes were not initially built for that. Now, the lakehouse says that you can also do that. But the most important part is that you also have the rest of the use cases, which usually involve a very heavy type of processing, like doing ML stuff, or working with very big and complex workloads and stuff like that. Right?

Eric Dodds 03:08
So

Kostas Pardalis 03:11
That’s what I see, because it’s very easy, let’s say, to forget about that, and we never mention it. At the end, lakehouses and data lakes are also enabling a set of use cases that you cannot do on the data warehouse. And that’s where the value is, right? That’s why we need them. It’s not just marketing, let’s say, to make people buy the same thing while they think they are buying something different. It’s something else. And that’s why you see, in many companies, the two solutions coexist: we have data warehouses together with data lakes. So that’s what I keep from the conversation. And outside of this, as always, it was an amazing technical conversation with someone who knows deeply what he’s talking about. That’s something that I always enjoy when I talk with him. So yeah, I’m looking forward to talking with him again in the future.

Eric Dodds 04:19
You had a great exchange about compaction, which was fascinating.

Kostas Pardalis 04:23
Yeah, we talked about compaction. We talked about the different services in general that we build on top of the data lake to bring it closer to the data warehouse. So if anyone would like to learn more about that stuff, I’m not going to disclose more.

Eric Dodds 04:41
We’ve already gone too long. Yep. That’s the prequel. Thanks for joining us. This is a great show. You won’t want to miss it. Subscribe if you haven’t, to get notified of the updates. Tell a friend about the show, and we will catch you on the next episode.