The PRQL: The Two Parallel Tracks of Development In Data Processing with Ryan Blue of Tabular

April 8, 2024

In this bonus episode, Eric and Kostas preview their upcoming conversation with Ryan Blue of Tabular.

Notes:

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Transcription:

Eric Dodds 00:04
Welcome to The Data Stack Show PRQL. This is a short bonus episode where we preview the upcoming show. You’ll get to meet our guests and hear about the topics we’re going to cover. If they’re interesting to you, you can catch the full-length show when it drops on Wednesday.
Welcome back to The Data Stack Show. Kostas, we’ve talked a lot about databases and database technology. You know, it’s been a common theme on the show. But today, we’re gonna dig really deep into that world, at high scale. So Ryan Blue is our guest. He helped create Iceberg, which is now part of the Apache Foundation. And it’s gonna be a great story. I mean, I am really interested in hearing the background of the challenges that they faced at Netflix, you know, where this was originally developed. And then, this is above my pay grade, but I am really interested in whether you would be willing to ask him about file formats, because that is actually another interesting thing that we haven’t covered in great detail. I mean, we’ve done it here or there. But, you know, that’s a huge topic when it comes to Iceberg, when we think about data lakes. So that’s another topic that I’ve been thinking about, just as it relates to all of Ryan’s experience. So hopefully I didn’t steal your thunder on the file formats question. But what do you want to ask about?

Kostas Pardalis 01:43
I mean, first of all, I know that most people, when they think about Ryan, think of Iceberg. But what is, I think, extremely interesting is that Ryan has been around for a very long time. He has been part of building some very foundational pieces of technology that we are using today, things like Avro, Parquet, and obviously table formats, like Iceberg is. So outside of anything technical that we will be talking about with him, one of the things that we’ll spend quite some time on is to do a little bit of history: why things actually happened the way that they happened. And what is, in my opinion, super interesting is how, when it comes to data processing, there are actually two parallel tracks of development that have happened in the past 10-15 years. One is coming primarily from the database folks that were building database systems. And another one is coming from people that were primarily distributed systems people. And that’s where things like MapReduce came from, stuff like Hadoop, and all these big data technologies that we are talking about. And we will see that there are some very interesting comments and points made about how we reinvented some things, or did some things differently, and why this happened. And Ryan gives a very interesting perspective into the evolution of these systems, how they happened and why. And outside of that, we’ll talk a lot about file formats, which is also quite a hot topic. Parquet, for example, has been out for a while, and there are a lot of conversations about how we need to update it. There are actually some new things coming out these days. So I think it’s a very good time to do a refresher on what file formats for storing data are, how they differ from each other, and how they differ from table formats like Iceberg, right?
And on top of that, we’ll also talk a little bit about Tabular, his company, and also about some other really interesting things that are happening right now in the space. So make sure you listen to the episode; it is very interesting. Ryan has a lot to share, and we have a lot to learn from him.

Eric Dodds 04:25
Great. Well, let’s dig in and talk about Iceberg and all the other things. All right, that’s a wrap for the PRQL. The full-length episode will drop Wednesday morning. Subscribe now so you don’t miss it.