In this bonus episode, Eric and Kostas preview their upcoming conversation with Chad Sanderson of Gable.ai.
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack, visit rudderstack.com.
Eric Dodds 00:05
Welcome to The Data Stack Show PRQL. This is a short bonus episode where we preview the upcoming show. You’ll get to meet our guests and hear about the topics we’re going to cover. If they’re interesting to you, you can catch the full-length show when it drops on Wednesday. We are here with Chad Sanderson. Chad, you have a really long history working in data quality and have actually even founded a company, Gable.ai. So we have so much to talk about. But of course, we want to start at the beginning. Tell us how you got into data in the beginning.
Chad Sanderson 00:44
Yeah, well, great to be here with you folks. Thanks for having me on again. It’s been a while, but I really enjoyed the last conversation. In terms of where I got started in data, I’ve been doing this for a pretty long time. I started as an analyst, working at a very small company in northern Georgia that produced grill parts, and then ended up working as a data scientist within Oracle. From there, I kind of fell in love with the infrastructure side of the house. I felt like building things for other people to use was more validating and rewarding than trying to be a smart scientist myself, and I ended up doing that at a few big companies. So I worked on the data platform team at Sephora and Subway, the AI platform team over at Microsoft, and then most recently, I led data infrastructure for a freight tech company called Convoy.
Kostas Pardalis 01:45
Boy, that’s awesome. By the way, I mean, it’s not the first time that we have you here, Chad. So I’m very excited to continue the conversation where we left off. Many things have happened since then, but one of the things that I really want to talk with you about is the supply chain around data and data infrastructure. There’s always a lot of focus either on the people who are managing the infrastructure or on the people who are the downstream consumers, right, the analysts or the data scientists. But one of the parts of the supply chain that we don’t talk about much is going more and more upstream, where the data is actually captured, generated, and transferred into the data infrastructure. And apparently, many of the issues that we deal with stem from that. There are organizational issues. We’re talking about very different engineering teams involved there, with different goals and needs. But in the end, all these people and the systems need to work together. We want to have data that we can rely on, right? So I’d love to get a little bit deeper into that and spend some time together talking about the importance, the issues there, and what we can do to make things better. So that’s one of the things that I’d love to hear your thoughts on. What’s on your mind? What would you like to talk about?
Chad Sanderson 03:16
Well, I think that’s a great topic, first of all, and it’s very timely and topical. The modern data stack is still, I think, on the tip of everybody’s tongue, but it’s become a bit of a sour word these days. I think there was a belief, maybe five to eight years ago, that by adopting the modern data stack, you would be able to get all of this utility and value from data. And I think to some degree, that was true. The modern data stack did allow teams to get started with their data implementations very quickly, to move off of their old legacy infrastructure very quickly, to get a dashboard spun up fast to answer some questions about their product. But maintaining the system over time became challenging. And that’s where the phrase that you used, the data supply chain, comes into play. This is the idea that data is not just a pipeline, it’s also people. And it’s people focusing on different aspects of the data. An application developer who is emitting events to a transactional database is using data for one thing. A data engineering team that is extracting that data and potentially transforming it into some core table in the warehouse is using it for something different. A front-end engineer who is using, you know, RudderStack to emit events is doing something totally different, and analysts are doing something totally different. And yet, all of these people are fundamentally interconnected with each other. And that is a supply chain. This is very different, I think, from the way that software engineers on the application side think about their work. In fact, they try to become as modular and as decoupled from the rest of the organization as possible so that they can move faster. Whereas in the data world, if you take this supply chain view, decoupling is actually impossible, right? It’s just not feasible to do, because we’re so reliant on transformations by other people within the company.
And if you start looking at the pipeline as more of a supply chain, then you can begin to make comparisons to other supply chains in the real world and see where they put their focus. So as a very quick example, McDonald’s is obviously a massive supply chain, and they’ve spent billions of dollars optimizing that supply chain over the years. One of the most interesting things that I found is that when we talk about quality, McDonald’s tries to put the primary burden of quality onto the producers, not the consumers. Meaning, if you’re a manufacturer of the beef patties that are used in their sandwiches, you are the one that’s doing quality at the patty creation layer. It’s not the responsibility of the individual retailers and the stores that are putting the patties on the buns to individually inspect every patty for quality. You can imagine the type of cost and efficiency issues that would lead to, where the focus is speed. And so the patty suppliers, the stores, and McDonald’s corporate have to be in a really tight feedback loop with each other, communicating about compliance and regulations and governance and quality, so that the end retailer doesn’t have to worry about a lot of these issues. The last thing I’ll say about McDonald’s, because I think it’s such a fascinating use case, is that the suppliers actually track on their own the patty needs, like the volume requirements, for each individual store. So when those numbers get low, they can automatically push more patties to each store when it’s needed. So it’s a very different way of doing things, having these tight feedback loops, versus the way that I think most data teams operate today.
Kostas Pardalis 07:24
Yeah, makes sense. Okay, I think we have a lot to talk about already. What do you think?
Eric Dodds 07:30
Let’s do it. Let’s do it. All right. That’s a wrap for the PRQL. The full-length episode will drop Wednesday morning. Subscribe now so you don’t miss it.
To keep up to date with our future episodes, subscribe to our podcast on Apple, Spotify, Google, or the player of your choice.
Get a monthly newsletter from The Data Stack Show team with a TL;DR of the previous month’s shows, a sneak peek at upcoming episodes, and curated links from Eric, John, & show guests. Follow on our Substack.