The PRQL: How Did Pandas Become a Data Science Powerhouse? Featuring Chang She of Eto Labs

October 23, 2023

In this bonus episode, Eric and Kostas preview their upcoming conversation with Chang She of Eto Labs.

Notes:

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com

Transcription:

Eric Dodds 00:05
Welcome to The Data Stack Show prequel, where we replay a snippet from the show we just recorded. Kostas, are you ready to give people a sneak peek? Let’s do it. Kostas, there’s one thing that is amazing about Chang. I mean, of course, in addition to the fact that he, you know, was a co-author of the pandas library, which is legendary, and the fact that he has built multiple high-impact technologies and is a multi-time, multi-exit founder building data tooling, you know, in sort of the data and MLOps space. I mean, all of those things are really incredible. But when you talk with him, you know, if you didn’t know who he was, you would just think this is one of those really curious, really passionate, really smart founders. And you said at the very beginning that he’s humble. I mean, that’s almost an understatement. You know, he would treat anyone on the same level as him, no matter their level of, you know, accomplishment or technical expertise. Yeah. That really stuck out to me. And I also think the other thing that was really great about this episode was, it wasn’t like he came out and said, “You know, I have an opinion about the way the world should be, and this is why we’re doing things the LanceDB way.” He just kind of had a very calm explanation of the problem, and a really good set of reasoning for why he needed to create a new file format, right? Which is shocking to hear, you know, because it’s like, well, Parquet exists, why do this? Right? So it sounds really shocking at face value, but then his description was really compelling. And the story of how they actually sort of almost backed into creating a vector database, you know, because they invented this file format. Just an incredible episode.

Kostas Pardalis 02:20
Yeah, I mean, Chang is one of these rare cases where you have both an innovator and a builder. I mean, it’s hard to find an innovator, it’s hard to find a builder, and it’s even harder to find someone who combines these two and at the same time is down to earth like him. I think this episode has pretty much everything. I mean, it has lessons from the past that can be super helpful for understanding how we should approach and solve problems today. And there are a lot of things to learn from the story of pandas that are applicable today for everyone who’s trying to build tooling around AI and ML. What I really enjoyed was that it was actually probably the first time that we talked about something I think is very important, which is how data infrastructure needs to evolve in order to accommodate these new use cases and actually accelerate innovation with AI and ML, which is still a work in progress. And I think Chang provided some amazing insight into what the right directions are to do that. He said some very interesting things about not creating silos. You know, he gave a very interesting example from mathematics, where he said that, you know, in mathematics, when you have a new problem, you try to reduce it to a known problem, right? And that’s how we should also build technology, which is an amazing insight, to be honest. And I think it’s something that builders especially tend to forget, and they either replicate or create bloated solutions and all that stuff. So there’s a lot of wisdom in this episode. I think anyone who’s a data engineer and wants to get a glimpse of the future, of what it means to work with the next generation of data platforms, should definitely tune in and listen to Chang.

Eric Dodds 04:16
I agree. Really an incredible episode. Subscribe if you haven’t, and you’ll get notified when this episode goes live on your podcast platform of choice. And of course, tell a friend. Many exciting guests are coming down the line for you, and we will catch you on the next one.