The PRQL: A Methodology for Better DAGs with Stefan Krawczyk of DAGWorks

July 24, 2023

In this bonus conversation, Eric and Kostas preview their upcoming conversation with Stefan Krawczyk of DAGWorks.

Notes:

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com

Transcription:

Eric Dodds 00:05
Welcome to The Data Stack Show prequel, where we replay a snippet from the show we just recorded. Kostas, are you ready to give people a sneak peek? Let’s do it. Kostas, I loved the show, because we covered a variety of topics with Stefan from DAGWorks and Hamilton. You know, I think one of the most fascinating things about the show to me was that we kind of started out thinking we were going to talk a lot about DAGs, right, because DAGWorks, you know, the name of the company is focused on DAGs. But really, what’s interesting is that it’s not necessarily a tool for DAGs in the way you would think about Airflow. It’s actually a tool for writing clean, testable ML code that produces a DAG. And so the DAG is almost a consequence of an entire methodology, you know, which is Hamilton, which was absolutely fascinating. And so I really appreciated the way that Stefan got at the heart of the problem. It’s not like we need another DAG tool, right? We actually need a tool that solves the problems with complex, growing code bases at the core. And a DAG is a natural consequence of that, and a way to view the solution, but not the only one. So I think that was my big takeaway. I think it’s a very interesting, elegant solution, or way to approach the problem.
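
To make the idea of code that "produces a DAG" concrete, here is a minimal sketch in plain Python. It does not use Hamilton's actual API. It only illustrates the general idea that a function's name can stand for an output and its parameter names for the inputs it depends on, so the graph falls out of ordinary, testable functions. The feature functions, `build_dag`, and `execute` are all hypothetical names made up for this illustration.

```python
import inspect

# Hypothetical feature functions: each function name is an output,
# and each parameter name is an input that output depends on.
def spend_per_signup(spend: float, signups: int) -> float:
    return spend / signups

def spend_is_high(spend_per_signup: float) -> bool:
    return spend_per_signup > 10.0

def build_dag(functions):
    """Map each output name to the list of names it depends on."""
    return {fn.__name__: list(inspect.signature(fn).parameters) for fn in functions}

def execute(target, functions, inputs):
    """Compute `target` by resolving its dependencies recursively."""
    by_name = {fn.__name__: fn for fn in functions}
    cache = dict(inputs)  # raw inputs are the leaves of the graph

    def resolve(name):
        if name not in cache:
            fn = by_name[name]
            args = {p: resolve(p) for p in inspect.signature(fn).parameters}
            cache[name] = fn(**args)
        return cache[name]

    return resolve(target)

functions = [spend_per_signup, spend_is_high]
dag = build_dag(functions)
result = execute("spend_is_high", functions, {"spend": 100.0, "signups": 5})
```

Because each node is an ordinary function, it can be unit tested on its own, for example by calling `spend_per_signup(100.0, 5)` directly, and the dependency graph in `dag` is derived from the code rather than declared separately.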

Kostas Pardalis 01:35
Yeah, DAGs are everywhere with these kinds of problems, right? Anything that’s close to a workflow, or where there is some kind of dependency, there’s always a DAG somewhere, right? And it’s the same way if you think about dbt, right? dbt is also a DAG. Every dbt project is a graph that connects models with each other. The difference, of course, is that we have dbt, which is in the SQL world, and then we have Hamilton, which is in the Python world, and it’s also targeting a different audience, right? So in the end, what Hamilton is trying to do is bring the value of, let’s say, the guardrails that a framework like dbt offers to the BI and analytics professionals out there, to the ML community, right? Because they also have that problem, and they probably have it at a deeper level of complexity compared to, let’s say, the BI world, just because by nature ML models and features have deeper dependencies on each other. So it’s very interesting to see how the patterns emerge, you know, in different sides of the field, of the industry, but at their core they remain the same. Right, right. So yeah, I think everyone should go and take a look at Hamilton. They also have a sandbox, like a playground, where you can try it online if you want, and they’ve started building a company on top of that, and any feedback is going to be super useful for the Hamilton folks. So I would encourage everyone to go and do that.

Eric Dodds 03:35
Definitely. And while you’re checking out Hamilton, I think it’s tryhamilton.dev, head over to The Data Stack Show, click on your favorite podcast app, and subscribe to The Data Stack Show. Tell a friend if you haven’t, and we will catch you on the next one.