Episode 193:

Introducing the Cynical Data Guy: Is Data-Driven a Myth?

June 12, 2024

This week on The Data Stack Show, Eric and John chat with Matthew Kelliher-Gibson, dubbed the “cynical data guy,” in a candid discussion about the realities of data work in large organizations. They explore the skepticism surrounding metadata’s value, the myth of a data-driven culture, and the challenges of shifting executive mindsets to trust data over intuition. Additionally, the group delves into the practicality of no-code and low-code solutions for data operations, emphasizing the importance of discipline and understanding the limitations of these tools. The conversation also covers the misuse of tools like Jupyter notebooks in production and the need for clear guidelines to prevent inefficiencies and manage tool usage at enterprise scale. Don’t miss the battle of the cynical and the agreeable on this week’s episode!

Notes:

Highlights from this week’s conversation include:

  • Introducing a special edition of the show with the cynical data guy (0:19)
  • Metadata and LLMs (2:32)
  • Data-driven culture (8:44)
  • No-code orchestration tools (17:09)
  • No code vs. low code (19:58)
  • Enterprise challenges with no-code solutions (20:08)
  • No-code tools for small companies (21:40)
  • Inappropriate use of tools (23:06)
  • Final thoughts and takeaways (24:05)


The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Transcription:

Eric Dodds 00:05
Welcome to The Data Stack Show.

John Wessel 00:07
The Data Stack Show is a podcast where we talk about the technical, business, and human challenges involved in data work.

Eric Dodds 00:13
Join our casual conversations with innovators and data professionals to learn about new data technologies and how data teams are run at top companies. Welcome back to the show. We have a very special kind of episode that we’re gonna try and start doing monthly, and you’re gonna love the name for the show. It’s called The Cynical Data Guy. And it’s because we have a special guest, the cynical data guy himself: Matthew Kelliher-Gibson is joining John and me. So, cynical data guy, welcome to the show.

Matthew Kelliher-Gibson 00:54
Thanks for having me.

Eric Dodds 00:58
I’m so excited. I can’t wait to dig into it. But we do need to give the listeners some context here. So how does a cynical data guy come about? Well, first of all, John’s data consulting practice is called Agreeable Data. So we kind of jokingly call him, you know, the agreeable data guy. Yeah. And Matt joined RudderStack from deep in the bowels of corporate America, doing all sorts of data stuff. And you have tons of horror stories and tons of scars from trying to do data at very large organizations. And so when we talk about topics, you often have, let’s say, a salty view that’s been dashed against the rocks of corporate reality.

Matthew Kelliher-Gibson 01:47
I prefer to think of it as a realistic view, but others disagree.

Eric Dodds 01:56
Well, of course, when John and Matt and I chat in the office about topics or LinkedIn posts that we see, we just enjoyed Matt’s hot takes so much that we started jokingly calling him the cynical data guy. And then, of course, one day we stopped and said, this has to go on the podcast. We gotta do it. Okay, so here we are. Here’s the format. I’m going to act as the moderator. We have the cynical data guy, we have the agreeable data guy, and I just pulled some LinkedIn posts that I think are interesting topics to discuss. Okay, so, Matt, I’m gonna read some snippets here. Some of them I’m gonna keep anonymous; others I won’t, for obvious reasons. Yeah. Okay. Are we ready? We’re gonna try and do three lightning rounds here. So if I interrupt you and move on, sorry, not sorry.

John Wessel 02:50
I guess. Sorry, not sorry.

Eric Dodds 02:52
Definitely not sorry.

John Wessel 02:54
Let’s do it.

Eric Dodds 02:54
I’m definitely not sorry. And neither is Brooks. Okay, I’m gonna read. Here’s the first one. Are you ready? Go. Yeah. Metadata will have a profound impact on the success of modern LLMs. With better assets, developers can leverage APIs to access and utilize their organization’s data more efficiently in their applications, enhancing functionality and capabilities and streamlining the development of AI models. Okay, metadata and LLMs.

Matthew Kelliher-Gibson 03:25
I’m sure that’s true. Once CEOs actually start caring about metadata.

Eric Dodds 03:32
Do CTOs even care about metadata?

Matthew Kelliher-Gibson 03:34
I don’t think I’ve ever met a person who cares about metadata, like in an actual corporation.

Eric Dodds 03:42
So is this just trying to sell software? Is this a pipe dream?

Matthew Kelliher-Gibson 03:47
Oh, I mean, everyone’s trying to sell software. So is it a pipe dream? I don’t know if AI can do it for them, but I don’t know that that’s gonna happen.

John Wessel 03:59
What about the data quality aspect?

Matthew Kelliher-Gibson 04:00
Yeah. So how much data quality do you see corporate America really investing in? Not a lot.

Eric Dodds 04:08
In your most recent job prior to RudderStack, you were at a publicly traded company. Did you ever hear the word metadata used in a meeting?

Matthew Kelliher-Gibson 04:20
Like with a business user, or wherever? It might have come up two or three times in a year and a half, truthfully. Yeah, it doesn’t come up that much. Yeah. I remember the first time that I heard metadata, and I was like, what is that?

Eric Dodds 04:41
John? So give us an agreeable take here.

John Wessel 04:43
I think that is the right answer. But, like, what percentage of companies have metadata in place that AI would find useful today? 0.1%, maybe? Right? Right.

Matthew Kelliher-Gibson 05:01
I want to be there when you meet with another company and say, let me tell you about metadata. And I want to see how quickly their eyes glaze over.

John Wessel 05:13
Well, I think it’s an easier sell than data quality. That’s my take on it, right? Because data quality was the thing. It was like, data quality: get better-quality data that you can trust. That’s still a thing. But now it’s like, you know, really, it’s about the metadata. And people understand metadata even less than they understand data quality.

Matthew Kelliher-Gibson 05:31
So does that mean metadata is just going to become like, we’re just going to shove all of data quality under metadata? It’s like, 100%, now it’s about the metadata, like making sure your pipelines don’t break.

John Wessel 05:43
I think I’m gonna sound cynical.

Eric Dodds 05:49
Oh, you’ve corrupted someone in the first minutes of the show?

John Wessel 05:53
No, it’s been longer. It’s just been offline. Right? Yeah, I think when people sell this, metadata has the benefit of being less clear, right? Because with data quality, it’s like, oh, yeah, the data is wrong or the data is right. With metadata, people will use the word and not know what it means.

Matthew Kelliher-Gibson 06:07
I think everyone’s gonna think you’re talking about the metaverse. Is that data in the metaverse, or Meta, the company?

John Wessel 06:12
Right. Yeah. All right.

Eric Dodds 06:15
Okay. So, man, co-host corruption in six minutes. That’s a record.

Matthew Kelliher-Gibson 06:23
Technically, John and I have known each other for, like, eight years. Yes. Yeah, so I’ve been working on it a long time.

Eric Dodds 06:30
Oh, that is true. That is very true. Yeah.

John Wessel 06:32
He’s been undermining it for a long time.

Eric Dodds 06:36
Okay, so the future, the very future of LLMs, is based on an ambiguous concept that no one cares about. Is that what I’m getting from this?

Matthew Kelliher-Gibson 06:48
The key to LLMs, and to selling them, is picking terms that nobody understands.

Eric Dodds 06:55
That’s probably true for a lot of things.

John Wessel 06:57
Yeah, it is true. But my agreeable take on it is that there is some progress, I think, in the BI space, with people doing some neat stuff with LLMs. Zenlytic, for example, has a pretty neat semantic layer that you can put on top of your data. And then the LLM interacts with the semantic layer, which does work better than, like, hey, generate SQL from GPT. Yeah, that is the right answer. People are doing it. But I think there’s a lot of overhead. In some ways, if you’re a small company without that much data, the semantic layer is like, well, all my data came from Shopify and my ERP, so somebody could do that for you, right? And you could have something reasonably usable. I think where you’re going into, like, corporate, it’s like, man, this is just an impossible amount of work.

Matthew Kelliher-Gibson 07:49
Scalability, like everything. Yeah, yeah.

John Wessel 07:53
But you can have those early proofs of concept, like, hey, it actually worked for this smaller company. And then people will extrapolate: oh, yeah, it’s gonna take over the world.

Matthew Kelliher-Gibson 08:02
That’s the dream every time: look, we did it with one data source. How hard could it be? Right, right.

Eric Dodds 08:08
Okay. Okay. Dang. Moving on to the next round here. I can’t wait for this one. Yeah, I’m just gonna read a snippet here. This author will also remain anonymous. I’m gonna choose a couple of pieces here. Data-driven people do not equal people looking at dashboards. You don’t achieve data centricity through the wide adoption of a BI tool. Skipping down a little bit: While access to business data is a crucial first step in achieving a data-centric outcome, it is only a small and early step in the overall journey. And then, where’s the zinger here? True data centricity is data-driven. This is achieved when there are tangible commercial and operational outcomes stemming from the use of data at all decision-making levels in the business. Are you using data to effectively generate more value for the business? Are the top leaders openly asking, what does the data say, or have we tested this assumption yet? Okay.

Matthew Kelliher-Gibson 09:08
So, data-driven culture is a myth. No, they just don’t know it. Everyone says they want it, but when it really comes down to it, you’re fighting against, usually, a VP or someone who spent 30 years fighting their way to the top of that corporate structure. And they haven’t been using data. The idea that they’re going to suddenly care about what the data says now is kind of naive.

John Wessel 09:37
Yeah.

Matthew Kelliher-Gibson 09:43
Stephen the correct responses. Why? Well, that’s correct.

John Wessel 09:49
What comes to mind is the whole data-informed thing. It was like, we need to be data-driven. And then it’s like, let’s pump the brakes a little bit and go back to data-informed, because there’s this space for intuition to be involved. I gotta hear your take on that before I move on. What’s your take on the phrasing?

Matthew Kelliher-Gibson 10:10
Well, I mean, data-informed. What does that mean? I gave you data. Yep. I’m doing whatever I want anyways. Okay, you were informed, right?

John Wessel 10:21
So I think the take on it for me is that data is absolutely helpful. From a forensic standpoint, like, I need to find out what happened, it’s super helpful to have. It’s helpful from a behavior standpoint: almost all of us have watches now that track steps, and if you track steps, you walk more, if you care about it, right? So I think that type of data is super useful. And that is a form of data-driven, like, we have a goal, we’re clearly tracking this activity, and we need to get here every single day. That’s a wonderful use of data. The stuff beyond that, where it’s predictive, or it’s, you know, recommendations, where you get into the more AI/ML stuff, I think your mileage may vary, right?

Matthew Kelliher-Gibson 11:08
Looking back on it, part of it... I mean, I’ve been in plenty of meetings where it was, you know, essentially pick your metric. Oh, we just did a big campaign. How did it do? I don’t know, pick the top three metrics that showed the best results. That’s what we’ll say the data said.

John Wessel 11:26
Look at the number of views.

Matthew Kelliher-Gibson 11:30
I mean, I’ve been in situations where, you know, literally, you’re looking at stuff like year over year or whatever, and it’s down. It’s like, well, it’s down, but it could have been down more. So, you know, really, this is a success, and we should roll this out everywhere. And then that argument won.

Eric Dodds 11:51
Okay, so let’s talk about the executive who battled their way, you know, 30 years through the valley to emerge on the other side. And they don’t use any data. Why aren’t they using data?

Matthew Kelliher-Gibson 12:06
They probably started out that way partially just because it wasn’t as available when they were coming up, right? Yeah. I mean, this isn’t like, oh, they’ve just always thought data was for nerds. They most likely had to do stuff without it. They either didn’t know it was available, or it wasn’t available. Yeah. And they had to make decisions. And one of the things we do as we’re successful is we reinforce it and say, well, this is what got me here. Yeah. So then you’re going to a person and saying, hey, I know your gut, or however you’ve been going about it, has been working for the last 20 years, but I have some numbers that say you should do the opposite.

Eric Dodds 12:45
Yeah, they’re gonna hate you, right? Yeah. Yeah. One other thing that I’ve noticed is that a lot of really good executives... You know, if you break a business down into its most basic building blocks, there aren’t a lot of numbers that actually drive the business forward. There really are only a couple that are mission-critical, right, to move in the right direction from the executive standpoint. Now there’s, of course, a ton of data and a ton of stuff that ladders up to that. But I think there can often be this idea that everyone needs to be really data-driven, meaning there needs to be this mass, democratized access to drill-down reports and all that sort of stuff. When in reality, that person’s probably been successful because they know which two numbers matter. Yeah. And they push aside literally everything except for the stuff that moves those numbers in the right direction. Right.

Matthew Kelliher-Gibson 13:48
Yeah, I think also, if you are in a position where, you know, you’re in a company and, just to be honest, the VPs are probably only going to be on your side if the numbers agree with what they want to do, the long-term play is to start going after people who are still early in their managerial careers, where they’re still forming these habits and deciding what they trust, and work with them. They’re more likely to be open, and they’re more likely to work with you and see opportunities and ways that they can make better decisions with that. I mean, you know, there’s also probably a chance they’re going to move on to another company in three years, but it’s still a better approach than trying to really convince that 65-year-old CEO, you really gotta trust my numbers right here.

John Wessel 14:41
Yeah, I think tying things to financials is the best way to be data-driven in most companies, because the numbers that matter are profit and loss. If you’re a VP over something, whatever your profit and loss is, it’s going to show up at the end of the month, at the end of the quarter. Yep. So if you can say, hey, these are drivers that impact the P&L, then that’s, I think, a conversation you can have, and you can get a VP on your side: oh, okay, cool. Yeah, we should work on this, we should track this. But what I think Matt’s referring to, yeah, I’ve sat in those meetings too, where, I’ll pick on marketing, marketing is not doing well again this quarter. And they just rotate through vanity metric after vanity metric: views, switching up views and sessions and ROAS, high-ROAS campaigns that were like $100, and high views on campaigns with awful ROAS, like this shuffle.

Eric Dodds 15:46
Totally. Just say it, Matt. You shied away from the mic.

Matthew Kelliher-Gibson 15:51
Well, I will say it also matters when you catch them. Because I’ve literally sat in meetings where we were giving a very, you know, financially based analysis; it had to do with a pricing thing. And the meeting started with the CEO saying, well, you know, we’ve got to do whatever the data tells us. We showed them why raising prices was not going to be a good idea. And literally the decision was, well, but we put it in the budget at the beginning of the year. So if you don’t catch it at the right time, it’s like, yeah, but the budget says... we’re not gonna hit those numbers without it. Yeah, but that’s what the budget is. I mean, we’ve got to do it, it’s in the budget. Like, you know, I don’t make the rules. I just think them up and write them down.

John Wessel 16:31
Yeah, I agree. Timing matters.

Eric Dodds 16:34
Yeah, yeah. Okay. Lightning round three. Are you ready? Go for it. Yeah. Okay. I’m going to read this one. Actually, this is great, this is great. So I’m going to mention, this is Adam Lenning, who’s a data platform engineer at Ben labs. And I’m just going to read this. It’s kind of a short post, but it’s great. And it ends on my question for the cynical data guy. Ever heard of a tar pit idea? Basically, a tar pit idea is one that seems very appealing, and many people have tried to make it work, but ultimately no one has really achieved product-market fit. Already so good, huh? So as I’ve been thinking about it, I’m starting to believe that no-code orchestration tools and no-code ETL tools may be tar pit ideas. Tools like Fivetran, Airbyte, gather, and literally a billion others all claim to handle moving data from A to B with low or no code. And they work in 90% of use cases. The issue lies in the last 10%, which will almost always need a code solution. While these tools may be useful, if we always need two tools to get data into our warehouses, are they really worth it? I think many people would argue yes, but what’s your take, cynical data guy?

Matthew Kelliher-Gibson 17:46
Is it a tar pit idea? Yes. Yeah, I think there’s a lot of those out there. I mean, you know, anything that’s text-to-something, I feel like a lot of times it’s got that siren call for a lot of technology people. And when you actually sit there, you go, who actually cares about this? Yeah. So there’s a lot of those out there, and they just get recycled over and over again.

Eric Dodds 18:15
Break down the no-code part. Are you skeptical of no-code? I mean, you’ve built tons of data pipelines in your career, but you’ve also interacted with, you know, non-technical or semi-technical users. Is the no-code thing a myth?

Matthew Kelliher-Gibson 18:30
Yeah, I mean, let’s be honest, when you’re like, well, we’ve got non-technical users, and they can do it... Do you really want non-technical users building this stuff with a bunch of building blocks? I don’t think you actually want that most of the time. Eventually, you need someone with more expertise. I mean, you know, it has the feeling of basically saying, hey, we’re going to build data like McDonald’s: we’re just gonna have a handbook, and anyone can cook a burger. I just don’t think that works.

John Wessel 19:03
Agreeable data guy on the no-code side: I really like it for, like, embedded analysts. I can think back to a role I had where we had an analyst who sat on the floor with ops and was at the company for probably seven years, and did so much really incredible stuff with, I’d say no code, but probably low code would be the right term, and by the end he could kind of code. He did some really incredible operational stuff. From an IT and governance and quality perspective... well, quality was decent, but from an IT governance perspective, or scalability, I mean, awful, right? It just doesn’t work. But as business knowledge capture, turning it into something IT people could take and then go do the right way and scale, I think that’s a great use case. What companies end up doing is they let that get out of control. They hire a bunch of analysts, they all have their own things, and that’s where it becomes a big problem.

Matthew Kelliher-Gibson 19:57
As soon as you do no code... Low code gives you some ability in there. No code is like, just trust us, it’s all going to work. Try it.

John Wessel 20:06
That’s a problem, which I would argue... There are actually very few no-code solutions out there. Alteryx is a great example: it has a GUI, but it is not a no-code solution. They’ve got all sorts of, you know...

Matthew Kelliher-Gibson 20:21
But to go back to it, it also comes down to your discipline with it, right? So, as you said, if you have one person who’s really there, and they can do these things, and then it gets handed off, that’s great. The way this stuff typically ends up happening, and the way it gets sold to people, to a certain extent, is, well, you can give this to everybody, you can get it everywhere right now. Oh, and we’ll sell you the tool that will help orchestrate and collect all of these, which are all just out of control. And they’re just duplicates of the same four things with slight changes in them. And it just becomes this mess that they then try to hand off to the data team or IT, like, you can just fix that.

John Wessel 20:58
I think, specifically in the orchestration space, Fivetran does a really nice job of no code. They have the ability to do, like, it’s not even low code, custom connectors on their platform. I think they’re really good examples of... again, things break down at scale, for them mainly at cost, right? It’s just very expensive to run them at scale. But for small companies, they do a nice job of providing the connectors that a lot of these small companies need, all together. Yeah, it can get a little expensive, but it really is pretty much a no-code platform.

Matthew Kelliher-Gibson 21:40
And I think it’s also kind of like knowing your level, too. It’s perfectly fine if you’re a small company or a startup and you just need these tools in order to get going. It’s when you insist on trying to use them as you get bigger and more complex. Or, you know, the other one is, well, we bought it for one small part of the organization. I think that’s a big trap you can get into as well: we just need it right here, right? But eventually it’s gonna spread outside of that, if it’s gotta be there. And with a lot of enterprise stuff, there’s a reason they’ve hired teams for a lot of this. They have a lot of edge cases; it’s got to be, you know, fit to their exact specifications. And you’re just not gonna get that with no code. Yep. Yeah.

Eric Dodds 22:23
I agree with Adam’s post. Like many topics, it’s very dependent on the context, but completely no code, I agree, I don’t think it’s realistic for this stuff. Right. But the one thing I would say is about this: if we always need two tools to get our data into warehouses, are they really worth it? The reality in the enterprise is you’re gonna have far more than two different tools. I mean, I think that is actually one of the fallacies: that you should have one single tool that handles everything. I just don’t... I mean, is it possible? Yes. Is it reality? I don’t think so.

Matthew Kelliher-Gibson 23:06
Yeah, I think it’s less like, oh, we have to have one tool that does this. And it’s more the idea of like, people are going to want to use this stuff in inappropriate ways. And can you contain that, right? You’re like, well, we just used it for this recently.

Eric Dodds 23:21
I mean, so like,

Matthew Kelliher-Gibson 23:23
The biggest example of this to me is every Jupyter Notebook-ish type thing, where they’re like, oh, well, you’re not supposed to use this for production. Really? Then why does it say “schedule notebook”? Like, oh, it’s really good in this isolated situation, but we made it so that you can use it. Hogwash.

John Wessel 23:47
Like this is just for development. We’ll put no restrictions on it to use in production.

Eric Dodds 23:51
All right, well, I think we’re going to schedule this episode to go into production and end on a high note, cynical data guy. Thanks for joining us, and we’ll see you again in a couple of weeks. Great to be here. The Data Stack Show is brought to you by RudderStack, the warehouse-native customer data platform. RudderStack is purpose-built to help data teams turn customer data into competitive advantage. Learn more at rudderstack.com.