The PRQL: Data Warehouses on Steroids

August 26, 2022

In this bonus episode, Eric and Kostas preview their upcoming conversation with Kishore Gopalakrishna of StarTree.

Notes:

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.co

Transcription:

Eric Dodds 00:00
Hello, welcome to The Data Stack Show prequel. We just recorded a fascinating show with Kishore. He is one of the founders at StarTree and was one of the original creators of the Apache Pinot project, which can do user-facing analytics at an unbelievably massive scale. He developed that at LinkedIn. Kostas, it was such an interesting episode. What really stuck out to you from our conversation with Kishore? I’m going to let you go first this time, because I usually don’t do that.

Kostas Pardalis 00:36
I mean, okay, I really enjoyed the technical conversation that we had. Systems like Pinot have some very interesting, let’s say, technical choices, because they have to deliver, let’s say, very fast results, but make sure that they can also support, like, crazy concurrency. And it was super interesting for me to go through the choices that they’ve made to achieve that, and also what the trade-offs are. You know, sometimes you hear about it and you say, okay, we have Snowflake, we have BigQuery out there. And then you have something like BEAM or ClickHouse, and you’re like, okay, why not just use these? They sound like, okay, a data warehouse on steroids, right? Like, why do we need both? Yeah, and there’s a lot of noise out there. It’s not always easy to fathom the differences. And for me, it was extremely, extremely enlightening today to understand exactly what the differences are and why, in the end, we probably need both of them, because they have different use cases and they make slightly different trade-offs. So yeah, that’s what I keep from the conversation today. And I’m really looking forward to talking with more vendors in this space about how we build these systems.

Eric Dodds 01:58
Yeah, absolutely. I agree. I think one of the other things that was interesting was, you know, Pinot is used by really large enterprises, a lot of them, right? But Kishore said, and it was fascinating, that his opinion has changed on sort of the scale you need in order for a system like Pinot to be useful to you. He said really, it’s becoming more valuable and easier to run for smaller organizations and smaller data teams as well, which was super interesting. So definitely listen to the upcoming episode with Kishore about that, you know, and StarTree, which is built on Pinot, and we will catch you on the next one.