The PRQL: Exploring the Evolution, Challenges, and Benefits of Composable Data Stacks Featuring Wes McKinney, Pedro Pedreira, Chris Riccomini, and Ryan Blue

January 29, 2024

In this bonus episode, Eric and Kostas preview their upcoming discussion with a panel of experts: Wes McKinney (Co-Founder, Voltron Data), Pedro Pedreira (Software Engineer, Meta), Chris Riccomini (Seed Investor, various startups), and Ryan Blue (Co-Founder and CEO, Tabular).

Notes:

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack, visit rudderstack.com.

Transcription:

Eric Dodds 00:05
Welcome to The Data Stack Show prequel. This is a short bonus episode where we preview the upcoming show. You'll get to meet our guests and hear about the topics we're going to cover. If they're interesting to you, you can catch the full-length show when it drops on Wednesday. Welcome to The Data Stack Show. We have a truly incredible panel here to discuss composable data stacks, and there are so many topics to cover today. So let's get right into introductions. I'm just going to go in the order people show up on my screen. Chris, do you want to start by giving us a quick background and intro?

Chris Riccomini 00:44
Sure, yeah. My name is Chris Riccomini. I've spent the last 20 years of my career mostly at two companies. The first was LinkedIn, where I spent a lot of time on streaming and stream processing and was the author of Apache Samza, an early stream processing system somewhat similar to Flink. Most recently, I was at a company called WePay, which was acquired by JPMorgan Chase, where I ran our payments infrastructure, data infrastructure, and data engineering teams for a stretch of time. I've also written a book for new software engineers, kind of a handbook, because I was tired of saying the same things in one-on-ones over and over again. I've been involved in open source: I was a mentor for the Airflow project and helped guide it through the Apache Incubator. I also do a little bit of investing, and that's where I spend a chunk of my time now. And I write a little newsletter on all things systems infrastructure. That's me in a nutshell.

Eric Dodds 01:33
Very cool. Wes, you’re up?

Wes McKinney 01:37
Yeah, I'm Wes McKinney. I'm a serial open source software developer. I've created or co-created a number of popular open source libraries: pandas and Ibis for Python, and Apache Arrow, an in-memory data infrastructure layer that's very relevant to the topic of today's show. I've been involved in a bunch of companies, most recently as a co-founder of Voltron Data, building accelerated computing software for the composable data stack, and at Posit, the data science platform company for R and Python. I'm the author of the book Python for Data Analysis, a popular reference book for the Python data science stack. I also do a fair bit of angel investing in and around next-generation data infrastructure startups.

Eric Dodds 02:37
Very cool. Ryan, you’re next on my screen.

Ryan Blue 02:40
Thanks. I'm Ryan Blue. I'm the co-creator of Apache Iceberg, which is one of the open table formats that I think is slowly but steadily making a big change to the way we architect big data systems, especially in object stores. I'm also a co-founder of Tabular, where we sell an Iceberg-based architecture with security and data management services baked in. I left Netflix to found Tabular. At Netflix, we were on the open source big data team, so I got to work on Parquet and Iceberg, replace the read and write paths in Spark, and various other things.

Eric Dodds 03:34
Cool. And Pedro.

Pedro Pedreira 03:37
All right, hello, everyone. I'm happy to be here. Once again, I'm Pedro Pedreira, a software engineer at Meta. I've been at Meta for a little bit over 10 years, always involved in projects around data infrastructure, a little closer to analytic engines and data processing engines. Most of my career has been spent developing databases and data processing engines. In the last five years or so, I started getting closer to this idea of composability: how can we make the development of those engines more efficient? So we started working on a variety of projects in this space. One of the projects we eventually open sourced, and that got a little more visibility in the industry, was Velox, which was recently open sourced and is built around this idea of making execution more composable for data management systems. Inside Meta, I work with a variety of teams, mostly on warehouse compute: large warehouse compute engines like Presto and Spark, this data processing area for analytics. Developing efficient query engines is sort of my thing.

Eric Dodds 04:41
All right, that’s a wrap for the prequel. The full length episode will drop Wednesday morning. Subscribe now so you don’t miss it.
