The PRQL: What is Data Discovery?

October 21, 2022

In this bonus episode, Eric and Kostas preview their upcoming conversation with Shinji Kim of Select Star.

Notes:

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.co

Transcription:

Eric Dodds 00:05
Welcome to The Data Stack Show prequel, where we talk about the show that we just recorded to give you a teaser. Kostas, one of the phrases that was used towards the end of the show, which was really interesting, was “a lot of our customers initially find this tool to act like a Google for their own data.” What was your take on that?

Kostas Pardalis 00:34
I mean, I think the complexity of working with data explodes really, really fast. If you start collecting data from a couple of sources, and then you start joining the data together and creating pipelines and all that stuff, it’s super, super hard for someone who hasn’t been there since the data started to get collected to figure out what data is there, what each piece of data means, what to trust and what not to trust, what’s used by which tool, and all that stuff. And I think this is true even in small companies. You don’t need to have thousands of tables for this to happen. But yeah, it grows, I don’t know, probably in a very exponential kind of way. So having the ability to just go there and search for something and come up with a table that might be helpful for what you’re looking for, I think, makes a lot of sense. I love this idea of a Google for searching the data of the company and then figuring out how to connect it together. So I think that is very, very good feedback to get for your product.

Eric Dodds 01:52
Yeah, I totally agree. And I think one of the things that was apparent in the conversation is that it’s easy to think about the steps of the data flow as distinct things that happen in a particular order, as very clear steps, right? So it’s like, okay, we’ve got to ingest data, we need to transform it somehow, then you model it, and then you have BI, right? It’s like one, two, three, four. And the reality is that transforming, modeling, and BI are all highly iterative processes, right? You have to go through a long life cycle until you get things that are durable enough to where they don’t change a lot, even in a small company. And I think it was just a helpful reminder, and you mentioned this actually in the introduction, that understanding your data, knowing what data you have, really is a key first step in actually enacting governance, which is ultimately what Shinji of Select Star is helping people do. She dug into that on the show, and you got pretty technical, which was really interesting: how do you do that and automate lineage and metadata and all that sort of stuff? So if you are interested in anything around lineage, data governance, metadata, and data discovery, you’ll definitely want to check this one out, and we will catch you on the next one.