Episode 230:

The Cynical Data Guy: Data Tech Debt, Data Mesh, and Dashboard Directives

February 26, 2025

This week on The Data Stack Show, Eric and John welcome back Matt Kelliher-Gibson for another edition of the Cynical Data Guy. The three go through their hot-takes lightning round as they discuss the challenges faced by data teams, focusing on technical debt and the concept of data mesh. They critique the illusion of productivity in data teams, the pitfalls of complex SQL queries, and the difficulties of implementing data mesh in organizations. The conversation emphasizes the importance of clarity, governance, and practical strategies in managing data while also exploring the potential of AI to optimize SQL queries and reduce technical debt, highlighting the need for balanced human oversight. Stay cynical! 

Notes:

Highlights from this week’s conversation include:

  • The Return of the Cynical Data Guy (0:14)
  • Risks of SQL Complexity (2:16)
  • Technical Debt in Data (4:34)
  • Data Mesh Critique (6:38)
  • Governance vs. Decentralization (9:55)
  • Never Let a Stakeholder Tell You They Need a Dashboard (12:05)
  • Dashboard vs. Table (13:34)
  • Organizational Dynamics in Data Requests (16:35)
  • AI and Prompt Writing (19:43)
  • Search Techniques and User Behavior (21:20)
  • Discussion on Code Optimization Tools (23:19)
  • Final Thoughts and Takeaways (24:47)

 

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Transcription:

Eric Dodds  00:03

Welcome to The Data Stack Show. The Data Stack Show is a podcast where we talk about the technical, business and human challenges involved in data work. Join our casual conversations with innovators and data professionals to learn about new data technologies and how data teams are run at top companies. Welcome back to The Data Stack Show for one of our favorite monthly installments: our time with the Cynical Data Guy. Matt, welcome back to the show.

Matthew Kelliher-Gibson  00:37

Thanks for having me.

Eric Dodds  00:41

From the bowels of corporate data America, we are going to do three rounds today, possibly with a fourth round. I have some tasty LinkedIn selections here for us to go through. As always, we will only mention the name when appropriate. And this first one actually is from our good friend Ben Rogojan, who’s been on the show multiple times. Great guy, great thinker. But Ben, we’re gonna put you on the dock here. Data teams can often portray the illusion of high productivity while accumulating a devastating amount of technical debt. Just write another dozen dbt models or build a few new Tableau dashboards with internally calculated metrics that only a single analyst knows how or why they were developed. Speed is the goal, right? Just because the number of data pipelines your team manages is growing and your data team’s head count is getting larger, doesn’t mean you’re more impactful or even providing high quality data products. Cynical Data Guy? Yeah, be careful. Especially

Matthew Kelliher-Gibson  01:41

there’s this corporate thing, like, the more people I have, the more important I am. That is not the case. If you’re a data engineering team, you are setting yourself up for some pain if you’re not careful on that one. There is some that’s just, you’re setting yourself up really badly. But it’s not just data pipelines, either. I have known people that, like they thought, if they got really long SQL queries, it meant they were really smart and not just horribly inefficient and obscuring everything so no one else can try to take their job.

John Wessel  02:16

Yeah, there’s, I’ve seen some of those queries, and I think it’s hard for me to understand. Like, is it like, intentional, like, there’s, there’s like, very long, like, like, any sort of SQL included, of like, okay, like, they just laid up their doing, they’re figuring it out, they’re learning and it’s just really long. And then I guess there’s another version of like, I want to be fancy, I want to do this in a cool way that just ends up with that, like, crazy amount of complexity.

Matthew Kelliher-Gibson  02:44

I actually worked at a place where we had someone that had two opinions on this person. There were people who worked with him, really, just to see his SQL queries and it was just amazing. And everybody else who had to deal with them was like, there’s no one he’s doing. It was completely complicated for no reason.

John Wessel  03:04

So I had a conversation this morning, which I thought really applies here, where we were talking about sales commissions. So if you, if you’ve ever done any reporting on sales commissions, like scary stuff, like, you don’t want to be the one to make a mistake, yes, so they got to tell me, yeah, I got my sales commission query. And for the query, there is more English writing, explaining what each section is doing to, like, justify each business rule inline, than there is actual code, over iteration after iteration. And for the analyst, I was like, I think that’s the right thing. I think you’re doing it right. Yeah. So painful, yeah.

Matthew Kelliher-Gibson  03:46

Well, in the number of data pipelines, going back specifically to this, like, realistically, that should never be what you’re focused on is the number you have, right? Another quick story was, I worked with someone else where they were making the transition from on prem to cloud, and they announced that everyone knew about it at the last minute, one of the Power BI reporters was like, You can’t do this. I have 120 queries. I don’t have time to do that. We all just heard like, wait, how many queries? And it turned out, every time you needed to add another column, he built another query and model in Power BI. So it was really only, like, three queries he needed, right? So bloat is a sign you’re probably doing something wrong.

Eric Dodds  04:31

Yeah. You know, one thing I would that’s interesting about this is, and, like, data is exploratory, right? And so I think one of the things is, like, planning on how you deal with, how you deal with that technical debt. Because if you think about software like, I mean, whenever you’re working in a technical environment, you have technical debt. I mean, yeah, there is literally just no way around, right, right? Yeah. But there should actually be a lot of technical debt in data, because you’re trying to figure out the best way to solve something, or, you know, things change, whatever. It’s not like that doesn’t happen in software. But you can’t, like, build technical specifications for exploratory problems in data quite like you can if you are, like, doing a very detailed spec on, you know, certain things in software, which is interesting. And I was thinking about, like, as a data leader, like a really good data leader is probably, like, planning on, okay, how are we actually going to deal with this? Because it’s inevitable number one. But also, I would argue, like, if you are trying to produce a lot of value, you’re going to create a lot of technical debt, you know.

Matthew Kelliher-Gibson  05:44

data. So are they planning on how to deal with it, or is that just the reason we tell ourselves to make our six feet? So I think

John Wessel  05:52

there’s a product of you here. It’s like, what is the equivalent? These are black, like, long start because, like, that’s a way, like, in a product that you can that you can keep a lot of the complexity lower, it only turns on your set component, percent, yeah, like customer

Matthew Kelliher-Gibson  06:09

That’s called comments.

Eric Dodds  06:13

That’s called more comments than

Matthew Kelliher-Gibson  06:15

Dash. Dash. No,

Eric Dodds  06:18

Actually, I was looking at Datafold recently, and it looks like they’re focusing on data migrations. Maybe we should get them back on to talk about that. Yeah, because it’s not quite Yeah. Okay, that was, that was round one. Let me pull up the next label, the next one. Oh, man, this is so good. So excited about this, I’m just going to read it. The data mesh is dead. Here’s why. In the last couple years, data mesh has been the next big thing. It promised to fix bottlenecks, decentralize ownership and scale data driven organizations, but today, most companies failed to make it work. What went wrong? Decentralization led to chaos. Governance has become a mess. More silos, not fewer. Too complex to maintain. What actually works: a hybrid model, real accountability, practical governance. The takeaway: data mesh sounded great in theory; in reality, few made it work. Time to focus on scalable, manageable data strategies. What do you think? Was the data mesh always flawed, or did we just implement it wrong?

Matthew Kelliher-Gibson  07:24

The communism of data. I think if you take data mesh on its own terms, so regardless of what I think about it, I think there’s a point here that there’s an operational model that needs to be different for it, but on its own terms, we have two things that are competing against each other that are just, I don’t see a solution to. One is, it only makes sense if you’re huge. It doesn’t make sense if you’re just a small team, right? But it’s a way of life or a way of thinking, or something like that. So you have to completely have a cultural revolution and operational model changes and stuff like that to work, which we’re really bad at in, oh, giant organizations. So you’ve got these two forces that are just going to compete against each other that I don’t know that’s ever going to really square that circle.

John Wessel  08:21

Yeah, it is such an odd fact of life of all right, who can benefit best from extra layers of abstraction? It’s like when I have more people and more teams in a larger context, like, who’s the least likely to change their processes to take advantage of, said, like, extra layers of abstraction. Yeah, the same people. So because it’s like, the people most likely like could implement once they SQL mesh a small, like, small ish company, people that don’t probably need SQL mesh, like a small ish company, where

Matthew Kelliher-Gibson  08:55

we’re just gonna, like, pilot wrap on what we’re doing,

John Wessel  08:59

right? We had a conversation before the podcast today. Like Kobe, is that you don’t really need even, like, a dbt-like layer. It’s just not that much data that there’s like one person like a company working with one person in accounting who I actually like, I think, will be very diligent with whatever process we put in place. And they can get away with a couple of views in a database that essentially, like, that’s what they’ll pull data from. That does not deserve, like, an entire abstraction and, like, pipeline of builds and tests and what all these things

Matthew Kelliher-Gibson  09:31

well, it’s also its size. Even if you get to, like, multi magic and things like that, right? It’s hierarchical, right? Like, we haven’t figured out how to organize humans at that scale in a decentralized way. I don’t think you probably can do it and have it all going in the same direction. So to be like, Well, if you’re really big, if you just decentralize, it’ll work great. It’s like, maybe, I don’t know, you got a couple thousand years of human history fighting. Well,

John Wessel  09:59

they. And you have the, like, the leaders that are concerned about data governance, security, yeah, you know, etc, etc. And it’s like sharing, sure, like, to some extent, like the data sharing problems with the problem, but we also want, like, symptom governance and all these other

Matthew Kelliher-Gibson  10:16

Oh, and there’s also inefficiencies, and the CFO is going to come down on you for that. Why do we have 12 different vendors to do the same thing? Yeah.

John Wessel  10:24

I mean, if you Why does Microsoft work for us for the solution? First question? Yeah, yeah, yes,

Matthew Kelliher-Gibson  10:30

exactly. I mean, if you are a chief data officer and you’re like, I’m gonna build my clout by building the number of people under me, this is probably a great way to do that, but it’s probably gonna be temporary, and then come crashing down.

Eric Dodds  10:44

My hot take on this is that I don’t think that data mesh was ever alive too and I’ll tell you the reason wasn’t like, show nurse, well, like, okay, so I know this from the show. I remember when data mesh started to become a hot topic, and we had multiple discussions about it on the show, and it was really hard for multiple very smart people to really pin down what it was and what it meant practically, for day to day, you know, data teams. And after going through that multiple times, you know, just I was like, Okay, if these are, like, we have smart people, like, on trying to, like, parse through it, and they it’s like, I don’t really know what this is, right? And so that, to me, was a major warning sign from the very beginning. You

Matthew Kelliher-Gibson  11:43

just end every interview and you’re like, I just have one more question. What is data mesh?

Eric Dodds  11:49

I mean, that was kind of

Matthew Kelliher-Gibson  11:53

That’s why I say it’s like the communism of data. In a sense, it’s like, it sounds good, but when you really get into, like, the practical, how is it going to work? You’re like, oh, wait, no, this is never going to work where I am, yeah,

Eric Dodds  12:05

That’s been an interesting one. I’m sure there’s some great takes out there. Okay, round number three, never let a stakeholder tell you they need a dashboard. I almost just want to stop there, because I’m running some failure, but I’ll read the whole thing well, sort of Never let a stakeholder tell you they need a dashboard. They shouldn’t even be asking. A dashboard is a solution, and we, the data professionals, need to be the ones to determine the best solution for a specific problem. It’s our job, not theirs. If a stakeholder is coming to you asking for a dashboard, they are not coming to you with a problem. They are coming to you with a solution. Don’t let them do this. Stand up for yourself. It is the domain of the data expert to determine if a dashboard is the correct solution, not the stakeholder. After all, who is the data expert here? Do you agree? Or am I totally wrong?

Matthew Kelliher-Gibson  13:01

This feels like you’re getting up in the morning. I am a strong, powerful data professional. I read that, and kind of my first thought is like, oh, sweet summer child, you all are like, I don’t know. Yeah, they come. They say they want a dashboard. Maybe they actually need it. You could just ask follow up questions and then go like, Oh, well, you really just need this report for the next two, two months. Cool, we’re good. We don’t need to be putting our foot down on this.

John Wessel  13:34

My Yeah, my take on it is, sure, maybe they’ll need a dashboard. But for the average users, like, here’s our video. We’re going to make a dashboard. We’re going to send you that dashboard on a regular basis, and it’s not going to have any graphics in it. Is that okay? Yeah, that’s what I wanted.

Matthew Kelliher-Gibson  13:58

Dash. But it looks like it

John Wessel  14:02

looks like a team, but that conversation, but yeah, probably half a dozen times over the years. Yeah, that’s what I wanted, like a table. Yeah, this is like discovery in a physical Excel file, right? We need to

Matthew Kelliher-Gibson  14:19

do it. Empower Me. Just put it in a work, ah, but this is part of the product discovery that, you know, you just do it. It’s part of it. I mean, also, like, I remember, I was the one data person on the marketing team, and our data wasn’t great, and I would meet with people on the team and they would have this, what I want, these outlandish things, and what do you say? You go, Okay, that’s great. So why don’t we marine out of this? Oh, that. Okay, so we can’t do X, and here’s the reasons why, but I can give you y and z until you go talk to them about how to make this work, right? Like there’s ways of doing things. Is an out kind of being like, No, you don’t say dashboard. I say dash.

Eric Dodds  15:06

Yeah, it’s, I just read this and I think, how long would you last with that attitude? Yeah, you know, I know he’s, I know it’s meant to be a provocative

Matthew Kelliher-Gibson  15:20

I would say, No, not a year, because I’ve seen this. Like, though we’re not doing that, it doesn’t last a year. Yeah,

John Wessel  15:26

yeah. Well, and it gets worse, right? Because now there are more options. If it is your mindset, you’re like, I’m a data professional. Like, I will recommend the solution, and then you come up with solutions. And like, you know what? You don’t need a dashboard, you need a report. You need an LLM to talk about, you know, this thing like, it’s just gonna get worse, because the solutions will look even wider. Well,

Matthew Kelliher-Gibson  15:50

it also reaches a good point there. And the idea of, yes, you may know something about how it looks best when you’re gonna do it, but you need to understand how they’re gonna be consuming too, right? So if you’re like me, I have a better idea. We’re going to do this thing over here, or the other one, which is like, someone asks for a report. People decide they’re going to make a dashboard, and it’s like, no, they live in Excel. That’s all they ever look at. You want to put them in another system they have to log into and check they’re never going to use, versus something that just emails them in Excel work like every record or whatever. So there is a give and take to that. There’s what’s going to work best from a data professional standpoint, but there’s also, how are they using it? What are they going to use?

Eric Dodds  16:33

Right? What do you think about this? Again, I understand it’s a hot take on LinkedIn, but it sort of applies this rigid rule to everyone. But, man, it really depends on who’s asking. Yeah, you know, I mean, it’s like, okay, some, someone who’s, you know, pretty low on the org chart, and, you know, it’s like, okay, I mean, yeah, okay, no, we have other priorities, right? Like, if your boss says they need a dashboard, you make them a dashboard.

Matthew Kelliher-Gibson  17:09

That discussion,

John Wessel  17:12

right? Exactly? Or making a dash Exactly? Yeah, yeah, I was gonna say it is so organization dependent, because one like, if there’s a really tight prioritization process and everything gets Dev and which is not very many companies, but some companies, then it really is like, then that kind of shows the power where, like, the technical realm, we get to dictate what happens, because we, like, have this very strict method of, like, prioritizing and whatever we use, you know, business council meetings and stuff to set like, a strategy for those companies. Like, there’s a lot more kind of power swing of like it or technical or data people get to do the label, to do it. And the other way, if it’s like, a really sales driven organization or marketing driven organization, everything’s there and everybody else is there to, like, their best purposes, so they can have their goals or ops, yeah, or well, and

Matthew Kelliher-Gibson  18:04

then you can see people who try to do stuff like, everything must be submitted in the format of user story or as a blank i and it’s like, right? I’m gonna do that. That. Gonna do that,

John Wessel  18:15

right? But now your boss and complain, yeah, yeah. But I think both of those, if it’s like, you know, essentially all right, sales has this audacious goal to grow to x in the room dollars this year, like everybody else, like, better support them. But if you don’t support that, you’re gonna get blamed for them by having a goal, right? Like, that’s one culture, versus, like, the opposite. Yeah, yeah, yeah, yeah. I was just thinking about a user story of as your boss, I wanted, that’s the whole, yeah, that’s the there is no dash, right? User story, okay, do we have, let me ask Rich, do we have time for an AI bonus round. Yes, we have time for an AI bonus round. Okay, this is hilarious. Okay, this is someone quoting a post on X, and I’ll read the commentary that they added, and then I’ll read the post that they quoted. So Matt Novak, we’re going to give you a shout out, because this is gold, says AI folks have now discovered thinking quote, unquote, and the post that he puts is sometimes in the process of writing enough prompt, writing a good enough prompt for ChatGPT, I end up solving my own problem without even needing to submit. I mean, that is just pure gold. What’s the rubber duck? Yeah, that senses your rubber duck in it, right? Yes, you always have little boxes of Python, yes, rubber duck, yes, that’s exactly right. Matt, I love it. Thank you for that wonderful commentary on a topic I’m actually interested in. Have you had that experience? Either of you writing prompts? No, no, I don’t think I’ve ever typed something out. We thought, oh, like, I know what to do.

Matthew Kelliher-Gibson  20:10

Usually I could. Usually I’m asking for something factual or like to summarize anything or don’t really need it. At that point, I understand

John Wessel  20:18

the general like thought. And do you think it’s valuable to, like, have a thought partner and, like, shoot, you know, send me back and forth. Hey, ask me questions about this. Hey, help me clarify that. Like, I think that’s why you Yeah, but you might as well do it in an interactive format. Like, I don’t think I’ve ever put enough effort into, like, a really large, you know, like, initial prompt for that to happen, but I didn’t happen. That’s

Matthew Kelliher-Gibson  20:43

true. That feels a little bit like I’m gonna write my entire code for the app, and, yeah, I’ll do it, right? No, I’m just gonna throw something in there

Eric Dodds  20:52

and then kind of, right. It’s by nature, like, very iterative, right? Yeah, yeah. Super interesting.

Matthew Kelliher-Gibson  20:58

That could also be contained a person who tries to figure out the perfect Google search terms never met that person before

Eric Dodds  21:08

would be interesting. Well, that’s like all of the options you have when googling. I mean, you can build some, like, pretty robust, you know, queries that. I mean, it’s so powerful, but almost no one uses them. Yeah, right, yeah, yeah. Like, not, like, operate your terms, yeah, totally. Or my favorite, like, search for PDFs. Like, there’s some real one like, like, putting quotes around, like, one word. I mean, there are all sorts of really helpful things that you can do. You know, my question is, 20 How long has Google been around? Like, primary stuff? Like, 20 years at least. Yeah, yeah. Like, what component of like, G tier, AI is going to be that way? We’re essentially, like, out in the open. You’ve been able to do this for like, 20 years with that and, like, nobody knows it, I wonder, yeah, for sure. Well, it’s

Matthew Kelliher-Gibson  21:54

like, as they’re searched out, better handling your natural rise of doing it that became

Eric Dodds  22:00

less necessary, yeah, yeah, that’s very true, yeah, and

Matthew Kelliher-Gibson  22:04

yeah, but they leave the features in there for when you need them, yeah, if you can even remember them, yes,

Eric Dodds  22:08

but you can, but you have to Google the Bible. You totally do Google. You Google, you know, Google.

Matthew Kelliher-Gibson  22:19

I do that with ChatGPT, where I’m like, how best should I ask you to do this? Totally Sure,

Eric Dodds  22:24

totally Yeah, which feels weird, but is really helpful. It seems very logical at the same time, yeah, yeah.

Matthew Kelliher-Gibson  22:30

Or open up a new prompt and you gave me this error under

Eric Dodds  22:35

clearly context, yeah. Actually, I do have an AI question related to the first post about technical debt. Do you think that? Do you think that AI will help in those scenarios you mentioned, where there’s, like, all these, you know, there’s, like, an early query, or whatever, I mean, it is really good at, like, iteratively saying, you could say, like, you know, make this more efficient. Like, it’s, it is actually very helpful for that. I mean, I know, you know, people probably aren’t going to just, like, write production code and deploy it, right, but at the same time, like, I don’t know. I mean, do you think it’s going to haven’t, do you think it’s going to be a core way that people write SQL? We have, we had somebody on the show a couple weeks ago that essentially their work gave us, like, code AI space. Misha, yeah, yeah, yeah. Which I really like their how they’re thinking about it. I think it applies to, like, do the simple stuff too, where, essentially, like, if we can be focused on some of the quote, tech debt. So we can point this AI at our SQL code base, for example, and then create pull requests of, like, optimizations, like for humans to review, and they’re good enough where, let’s say 80% or 70% of the PRs get merged, you bring a little then, like, that’s how you add, yeah. But if we’re like, 40% gets merged, then it’s like, then it’s just gonna be a mess, right? Yep. So that was a solution they’re working on that makes sense to me, for SQL as well. And I have seen startups that’ll, like, analyze, like, every single that runs, like, a shared database, and, like, optimize for whatever database engine. When that makes sense. I think racing was that, yeah, for sure.

Matthew Kelliher-Gibson  24:22

I mean, I could see that even just in the editor, where you write something and kind of be like,

Eric Dodds  24:29

Yeah, a little real. Do you really want to do that? So totally

Matthew Kelliher-Gibson  24:33

It’s like, well, you know, you’re a big group. You say select star from table, limit five. It goes, there are 130 columns, and that’s right, and even you just want,

John Wessel  24:45

right? Yeah, yeah,

Eric Dodds  24:47

All right, well, we are at the buzzer. That concludes this month’s Cynical Data Guy. Matt, as always, thank you for joining us, and we will catch you next month. Thanks. Stay cynical. The Data Stack Show is brought to you by RudderStack, the warehouse-native customer data platform. RudderStack is purpose-built to help data teams turn customer data into competitive advantage. Learn more at rudderstack.com.