Episode 209:

Storytime with Cynical Data Guy: Data Projects, $50K Web Scraping Fails, and the Role of CDOs

October 2, 2024

This week on The Data Stack Show, it’s another edition of the Cynical Data Guy as Eric and John welcome back Matthew Kelliher-Gibson. The group shares personal anecdotes about their experiences with data projects in corporate settings. They discuss the challenges and successes of working with pricing data and web scraping, emphasizing the importance of understanding manual processes before implementing automation. Matthew recounts a project where his team built a neural network to match scraped pricing data, while John highlights the benefits of manual data review. The episode balances cynical and optimistic perspectives, offering valuable insights into the technical, business, and human aspects of data work. Don’t miss this edition of the Cynical Data Guy.

Notes:

Highlights from this week’s conversation include:

  • Previewing the Next Cynical Data Guy Episode (0:13)
  • Story Time: Coolest Data Project You’ve Worked On (1:13)
  • Failed Web Scraping Project (3:40)
  • Building a Neural Net for Matching (5:22)
  • Rebuilding the Project Strategy (7:04)
  • Project Completion and Politics (9:35)
  • Agreeable Data Guy’s Pricing Story (11:00)
  • Balancing Advanced and Simple Solutions (14:15)
  • Insights from Pricing Team Meetings (16:19)
  • Building for Scale vs. Immediate Needs (18:29)
  • Open Source Data Formats (19:46)
  • Disaster Recovery Experiences (22:34)
  • Reflections on Chief Data Officers (25:01)
  • Cynicism in Data Projects (28:19)
  • Final Thoughts and Takeaways (30:20)

 

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Transcription:

Eric Dodds 00:06
Welcome to the Data Stack Show.

John Wessel 00:07
The Data Stack Show is a podcast where we talk about the technical, business and human challenges involved in data work.

Eric Dodds 00:13
Join our casual conversations with innovators and data professionals to learn about new data technologies and how data teams are run at top companies. Welcome back to the Data Stack Show. It is officially October, and so we’re going to start out Halloween month with a Cynical Data Guy episode, but a special edition Cynical Data Guy episode, which we’re calling Story Time with Cynical Data Guy and, actually, Agreeable Data Guy. Yeah, I got some positive stories. Yeah, we have to counterbalance. And then, of course, we’ll end with just a single round of the LinkedIn lightning round, and I’ve got a good one. All right, so let’s start deep in the bowels of corporate America, which is usually where we like to start on these shows. Okay, here’s the story time question: What’s the coolest data project you’ve ever worked on?

Matthew Kelliher-Gibson 01:18
Okay, so I’ll go first. So probably the coolest project I ever worked on also ended in me no longer being at that company.

Eric Dodds 01:30
Well, this is like a therapy session where you’re starting to understand some of the formative experiences that shaped the cynic. Exactly. So the really short

Matthew Kelliher-Gibson 01:42
background on this, without going into many details, is I was running a team that was kind of doing, like, internal consulting data science work. And then they brought in someone to be the VP of innovation, which, by the way, if you ever go to a company that has a department of innovation, run away. Is that innovation? Cool lead, XYZ, yeah. And he had, early on, tried to recruit me into his group, but then I saw what he was telling the executives, and his timelines and requirements for how we would have to build, which were, like, 10 years old. And I rebuffed him. He did not like that. So, without going too much into the details about this company, one of the big things we did was we collected a lot of pricing data. Okay. It was all manually done at the time. Wow. So it was literally a team of, like, 12 people who would either call up suppliers or go online and search for specific things. That is brutal. It was really bad. It caused a lot of problems. It made some of the claims that we told customers a little questionable, stuff like that. So there had been a project in the innovation team to turn all that into web scraping. Yep. Okay,

John Wessel 02:55
you’ve said enough with web scraping.

Matthew Kelliher-Gibson 02:59
So, so they made this big promise, which,

Eric Dodds 03:03
yeah, because going from manual to web scraping isn’t manual problems,

John Wessel 03:09
It’s manual with more complexity. Yes,

Matthew Kelliher-Gibson 03:11
Well, yeah. So there’s so many things that were wrong with this. But so he made a promise that he was going to put, you know, it was some ridiculous number, like 100,000 prices into our database in, like, six months. Six months later, I got a phone call from my boss: Hey, can you help us? And it turns out that in that time they had claimed, oh, we’ve scraped 2 million prices. They had matched and gotten into our internal database exactly five.

Eric Dodds 03:39
Oh, my, oh, that’s like, yes. And then

Matthew Kelliher-Gibson 03:44
like, and this wasn’t a project where it was like, oh, they had a little bit. Like, this full group had gotten, like, over a million dollars to fund them, type thing. So that’s

Eric Dodds 03:55
one of those, you know, I call these moments, oh, yeah, like, movies are based on reality, you know? Like, it’s just hard to believe that. So

Matthew Kelliher-Gibson 04:09
We did some work, some really preliminary work, to just get the number up, because there was some KPI that they were kind of under the gun to hit at that point. And then it became a, this project isn’t working over here, we want you and your team to take it over, right? So my, like, you know, half a million dollar team in total is taking it over from this, like, $7 million team. We got into the details of it, and it was one of these things where, like, they weren’t targeting prices. They were literally paying a company to scrape entire websites, like every price from the website. And the person kind of played it off like, look, I’ve done the hard part. I have a million prices now. All we have to do is match them to our database. Which, if you’ve ever done this before, is like, web scraping is relatively simple. Matching that is insanely hard, especially when you don’t have good names in your database. So we had gone through a couple, and this was one of these where, like, at every turn they kind of tried to cancel us and stop the project, or the guy who had had it before tried to convince the executive that our method would never work. So we had started with this pre-GPT and all this stuff. We were going to build an unsupervised language model using, you know, just neural nets and stuff like that. We then had to prove that our method would work, which ended up being, we got a group, you know, we had a labeled set. So we built a neural net, a pretty simple one, not huge, that could match at superhuman ability on this stuff. A couple of turns later, we’re still trying to figure it out, and I’m negotiating with the web scraping company we have, because they were spending, I mean, they were literally spending, like, $50,000 per site, per scrape. Wow. Because every one was a custom project, right? Jeez, so

Eric Dodds 06:01
you’re probably still getting great margins, yeah, I’m

06:04
sure, yeah, yeah.

Eric Dodds 06:05
I mean, we’re on the cynical data guy, yeah. So

Matthew Kelliher-Gibson 06:08
I was talking with the guy, and I had talked with other people, and it was one of those things where, like, our use case really isn’t what web scraping is for. You’re supposed to know the product, know the page, and we’re like, we want to discover more stuff that we know is out there, but no one’s found it yet, right? And so it was one of those, I’m in a meeting with this guy, and he’s trying to convince me to stay with them, because I’m getting ready to fire him. Yep. And he’s like, well, you know, what you could do is search for it and pull the top, like, five or 10 results. And it was one of those moments where you go, that’s useless unless I had some way to, like, automatically match it, like a model that’s superhuman at matching this stuff. And it clicked at that point. And so we rebuilt, like, our whole thing around this idea. We also had some KPIs. It was one of those, like, you know, this was an OKR at the beginning of the year, so therefore it had to be done, even though we’ve got eight months of information that says this is really stupid and we shouldn’t do this. We still had to hit that one. So we had convinced the team to, basically, in the process we also proved this team was super inefficient, because they claimed it took, like, four months to do all this research. We got them to redo all of their online research in, like, six weeks, because we were like, we need the web page, we need a better description, blah, blah, blah. And that was all just going to feed into this thing. But it was also one where we realized we could use APIs for them. Because you could set up agents on websites, basically, right? So that we could say, hey, we’ve got this new thing we need to get pricing data on. Okay, what is it? Oh, it falls into X category. Oh, now we can hit these APIs on them, right, and now we can have it then automatically use this.
Because it’s, like, computationally expensive to try to match it against everything in your database, sure, but it’s still faster than doing it by hand, right? And there were things we did, like move from Windows machines to Linux machines because Windows doesn’t fork memory, and all sorts of stuff like this, which was a big deal because they were a Microsoft shop through and through. But we got it to the point where we were on the cusp of, basically, we could go find stuff on demand, and it was more robust than web scraping, because by searching, yeah, products are going to drop off of these websites, but other things are going to come back on. So we’re not running into broken links, because we’re always re-searching. Oh,

Eric Dodds 08:24
yeah, that’s a great point. Yeah, broken links.

John Wessel 08:28
When you say searching, are you actually using, like, the search engine of the Yeah, website that’s made for finding products? Yeah,

Matthew Kelliher-Gibson 08:35
It’s brilliant, yeah. And, like, the scraping broken-links thing is gone. You don’t have to worry about it. And on top of that was the thing of, every time we did it, if it got accepted, it would go into our database of, like, acceptable descriptions of these different products, because they had slightly different names. Yeah, sure. We’re dealing with stuff that, like, you know, had measurements to them, and we could kind of group them all there. And so it was the type of thing that, when you put it in place, it’s going to get better over time, right? And we were planning on basically having it where, if it was in a certain threshold, it gets kicked over to one of the people to actually say, okay, is this a match, and everything. So it should be robust. It should get better over time. It had all these great things to it. So I got the project completely back on track, so we hit every KPI by the end of that year, and then two months later, that VP got me pushed out. They eliminated my position, disbanded my team, and he took over the project. But he took it over two months too soon, so it wasn’t ready yet. Oh man, timing is everything. Yeah. So I got pushed out, which was an interesting soul-searching experience to go through.
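
The search-then-match loop Matthew describes (pull the top few search results, score them against known descriptions, auto-accept the confident ones, send the borderline ones to a person, and remember accepted descriptions) could look roughly like the sketch below. This is an illustration only, not the team's system: `difflib.SequenceMatcher` stands in for their trained neural net, and the `Matcher` class, its thresholds, and the bucket names are all hypothetical.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Stand-in scorer; the team in the episode used a trained neural net."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


@dataclass
class Matcher:
    # Known descriptions per product ID; grows as matches are accepted.
    aliases: dict[str, list[str]] = field(default_factory=dict)
    accept_at: float = 0.90  # hypothetical auto-accept threshold
    review_at: float = 0.60  # hypothetical lower bound for human review

    def score(self, product_id: str, candidate: str) -> float:
        # Best similarity against any description we already trust.
        return max(similarity(candidate, known) for known in self.aliases[product_id])

    def triage(self, product_id: str, search_results: list[str]) -> dict[str, list[str]]:
        """Sort the top search results into accepted / review / rejected."""
        buckets: dict[str, list[str]] = {"accepted": [], "review": [], "rejected": []}
        for text in search_results:
            s = self.score(product_id, text)
            if s >= self.accept_at:
                buckets["accepted"].append(text)
            elif s >= self.review_at:
                buckets["review"].append(text)  # kicked over to a person
            else:
                buckets["rejected"].append(text)
        # Accepted descriptions become new aliases, so later runs match more variants.
        known = self.aliases[product_id]
        known.extend(t for t in buckets["accepted"] if t not in known)
        return buckets
```

The middle band is the design choice that matters here: only results the scorer is unsure about cost human time, and every accepted result widens the set of descriptions the next run can match against.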

John Wessel 09:44
So why are you cynical? Yeah, please explain. I

Eric Dodds 09:47
I don’t understand. You know, you do have to give the guy credit, though, that guy was kind of innovative, yeah, just in a very nefarious way. Yeah?

Matthew Kelliher-Gibson 09:58
Like, what do they call it? The coda, or whatever. The epilogue of that story is, because I ended up going to another place and learned some important lessons for myself, just on my own ego and things like that. So it made me better. The experience made me better in the long run. That particular individual was unceremoniously fired almost one year later. Okay. With cause, if the story I heard was correct. Wow, yeah, yeah. And so that’s probably one of the coolest things. We didn’t get it completely to production, but that was mostly because of politics pushing me out, and which

Eric Dodds 10:33
is probably true way more than any of us would like to believe.

Matthew Kelliher-Gibson 10:40
Yeah. And then when he took over that project, because all the people who were on my team were still doing things in there, and one of them got placed under him, they would send me updates, and it’d be like, yeah, he completely scrapped everything you did, and he’s trying to do it the old way that doesn’t work. Oh no. So I don’t think that project ever got off the ground again. So

John Wessel 10:59
I actually have my own pricing story. This wasn’t what I was going to say. So, like, a cool project.

Eric Dodds 11:04
I got two categories, okay? So you have the embittered, yeah, the cynical data guy. And so Story Time With agreeable data

John Wessel 11:13
guy, yeah. So I’ve got, so there’s two categories of cool projects. Number one category, which is probably the most important, is like, hey, we had this outsized business impact. Number two category is, like, this tech is really cool. So I got one of each. Okay, great. My first one is a pricing project, yeah, ironically enough. We didn’t talk about this beforehand. But round one: there’s an e-commerce company managing tens of thousands of SKUs. Yep. And it was like, okay, we need a way to stay up with the competition. A lot of it was distribution, which means you have multiple people selling the same thing, so your price is, like, the thing that you win on or lose on. It’s like, okay, so we go down that road and, like, oh, look, there’s this company that does it, and this company that does it, that can scrape the price and, like, sync it to your database. And, like, okay, great. So we go down the line with several of those companies. We trial with one, and it was like, all right, that didn’t work at all. We trial with another one and, like, see some early signs of it working. But it was one of those nebulous, like, we take in all of this signal from your page to, you know, help guide the price. And we did it. It was like, okay, yeah, we got some early results, rolled it out far broader than was wise, and then really screwed up our pricing from that one. So that was more, like, signal based. And then we went with a more, like, web scraper based thing, yeah. And that only got, like, marginally better. So I was like, what are we gonna do? This is a problem. So what we ended up doing was just very simple: hired an analyst for pricing, and then basically had three or four sources. Like, Amazon was a source, we had a web scrape thing that was a source, and we had a couple other sources. And then we’d have, like, basically reviews of a couple thousand at a time.
We had, like, queries and things that ran to produce these data sets, and we made it very efficient, but an analyst would scan through it and then decide to set prices. And it worked fairly well. I think we ended up increasing, there’s a couple other factors, but this was a big driver to increasing margin, like, 11 or 12%. Wow. Over the course of a year or two. That’s crazy. Yeah, it was crazy. Some of it was a product mix thing, where we’re moving to more private labels, which helped. And some of it was just pricing, because it was, you know, not systematically managed. And when you get to, like, 20 and 30,000 plus SKUs, you can have a huge impact. Yeah. So that’s my cool business outcome story. Cool on the tech side, which is just as fun: working with a client right now,
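
As a rough illustration of the "queries that produce a review batch for an analyst" workflow John describes, here is a minimal sketch. It assumes positive prices held in plain dictionaries; the function name, the 5% tolerance, and the median-of-sources comparison are all hypothetical choices, not details from the episode.

```python
from statistics import median


def build_review_batch(our_prices, source_prices, tolerance=0.05, batch_size=2000):
    """Return SKUs whose price differs from the competitor median by more than `tolerance`.

    our_prices:    {sku: price}
    source_prices: {source_name: {sku: price}}  (e.g. a marketplace feed, a scraper)
    Assumes all prices are positive numbers.
    """
    rows = []
    for sku, ours in our_prices.items():
        seen = [prices[sku] for prices in source_prices.values() if sku in prices]
        if not seen:
            continue  # no competitor signal; nothing for the analyst to compare
        market = median(seen)
        gap = (ours - market) / market
        if abs(gap) > tolerance:
            rows.append({"sku": sku, "ours": ours, "market": market, "gap": round(gap, 4)})
    # Largest gaps first, capped to a batch an analyst can scan in one sitting.
    rows.sort(key=lambda r: abs(r["gap"]), reverse=True)
    return rows[:batch_size]
```

The code only narrows tens of thousands of SKUs down to the ones worth a look; the pricing decision itself stays with the analyst, which is the point of the story.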

Matthew Kelliher-Gibson 13:55
it’s not as fun for that analyst, no,

John Wessel 13:57
I mean, the poor analyst. It was somebody that had their first job out of school, so they didn’t have a lot of, like, I don’t know what working is like, maybe this is what all jobs are like. We were smart with the hire. Yeah, we actually hired somebody with a math degree, which was interesting, and they weren’t there for, like, a super long time. But

Eric Dodds 14:15
I think that is a good example there. Like, even just thinking about those two stories, that’s a really nice contrast to highlight the fact that sometimes you do need to build something pretty advanced, yeah, and sometimes you just need a really simple solution for some subset that’s going to have the highest impact, right? And a lot of people overlook the simple solution. Yeah, can we just have a person that’s, yeah, at least at first, let’s just, like, figure out how this works. Yep. It’s brute force work that is hard to scale from a human capital standpoint, because it’s hard. But yeah, that was interesting. That was a good, but

Matthew Kelliher-Gibson 14:54
at the very least, if you do that, even if it’s something you’re like, we’ll never be able to scale this, you will learn intel, totally, that’s great, about the process, about what it is, rather than just doing the blind, we’re gonna throw data in this pool of machine learning and stir it around and see what comes out. Yep. Because that’s when you get, like, oh, it worked for a month.

Eric Dodds 15:13
Yeah, yeah. Well,

John Wessel 15:14
and I think it’s important too. Like, there was a push, I mean, this is from even the start of my career, around automation, which is a big thing. But I feel like now a lot of times projects get scoped around, like, hey, let’s do this thing, and nobody even knows what the manual process is, yeah. And that is a tough place to start. If you want to start right into, like, oh, we want this, like, ML thing, it’s like, we wouldn’t even know how to do it in a spreadsheet.

Matthew Kelliher-Gibson 15:40
Yep. Yeah, yeah. How are you going to do that if you’ve never done it in a spreadsheet? And I think that’s also, depending on the analyst or the data scientist or whoever you have, there can be a little bit of where they want to jump right into that. They’re like, oh, let’s make a decision tree or a neural net. And it’s like, how about we actually go through what this process is, and what it looks like to try to do this, right? I mean, you know, I ran a pricing team at an auto finance place, and one of the things we would do every month was we would actually sit in a room and look at loan apps. So you’d see the application, you’d see, you know, what they’re buying, like, the car, and what they’re buying it for. And you had to use that to try to predict, like, we were very deep subprime, so we’re predicting how many payments you’d make out of the total. Yeah. But it does help give you some intuition over, like, okay, these are things that matter, these are things that don’t. And it does make a difference. Compared to, we had people who were more on the modeling side, and they would do things, and we would look at them and go, like, you can’t use that. Like, that variable you’re using is something that gets backfilled later. You can’t do that, right? Or, you know, where we would say, the things you’re looking at are not available at the time you’re making this loan decision. Yeah. But they didn’t know that, because they’re not sitting in these meetings to see that. Yeah,
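
The backfilled-variable problem Matthew raises is a point-in-time check: a feature is only usable if its value existed when the decision was made. A minimal sketch, assuming each value carries the date it actually landed in the system; the `FeatureValue` type and `features_known_at` helper are hypothetical names for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FeatureValue:
    name: str
    value: float
    recorded_on: date  # when the value actually landed in the system


def features_known_at(values, decision_date):
    """Split values into those known on the decision date and those backfilled later."""
    usable, leaked = {}, []
    for fv in values:
        if fv.recorded_on <= decision_date:
            usable[fv.name] = fv.value
        else:
            leaked.append(fv.name)  # recorded after the decision: would leak the future
    return usable, leaked
```

A check like this only works if the data keeps a recorded-on timestamp, which is exactly the kind of detail the people sitting in the loan review meetings know to ask about.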

Eric Dodds 17:04
I was working on an old car, and I had a new windshield installed in it, and I don’t think they did it correctly. And I thought that I had a slight leak, but it was hard to tell. So I thought, okay, well, I’ll just find someone who will pull it back out and, like, you know, sort of reseal it or whatever. Anyway, it’s an old car, and so not a lot of people want to work on it, right? Is there, like, yeah, you can’t get the warranty bubble, yeah. So you need to find someone who can do this. Anyways, I called this guy who’s, you know, local, and we had a very short conversation, because I explained, like, hey, I don’t think this is right. Can you pull it out and seal it? And he’s like, are you getting water in your car? And I was like, not really, but I don’t think this is done correctly. And, this is great, I love it when people say, like, son, you know, it’s like, I’m about to get some good advice. He’s like, son, if you don’t have a problem, don’t make a problem. And I was quiet for a second. I was like, oh, yeah, you know what? That’s exactly right. But I feel like that with tech, right? Where, yeah, you can invent a problem by throwing technology at it unnecessarily, right? When the simple solution, in many ways, can show you whether or not you actually have a problem. Yeah. Or you can

Matthew Kelliher-Gibson 18:29
do the thing where you’re trying to, like, build it for scale, when you’re like, yes, this is for an implementation of an ERP, right? So we need it to work for three months. Yeah, yeah, totally. So I don’t need all your fault tolerances. I don’t need the pipeline. Yeah, I just need you to take this spreadsheet and turn it into this table that we’re then gonna match against another table and be done with. Yeah. Okay, sorry, we interrupted another. Okay, cool tech, yeah.

John Wessel 18:56
Cool tech segment, yeah. So one of the cool projects I’m working on right now: client has almost all their data in Snowflake, and they have some fairly rigorous requirements around disaster recovery, one of which is, like, if you have data in a SaaS system, you can’t just trust whatever the SaaS system’s DR is. It needs to get out of that SaaS system, yeah. Which, the policy is probably reasonable, but most people would be like, I don’t know, Snowflake probably has that covered. Yeah, right, yeah. Or you just buy the enterprise plan and you get, like, you know, the extra stuff. So I’m sure they do have it covered, but for their internal requirements, this is what they do. So, like, okay, we’re taking backups from Snowflake to Iceberg, stored in S3, so the data is landing there. And then, if you’ve been following the space, Snowflake has a data catalog that sits on top of Iceberg. If you’re familiar with databases, the way this works is a traditional database comes with storage, tables, a schema, which is basically a catalog, and then all the security and stuff wrapped around that. Yep. So as you get into these newer, open source formats, everything’s kind of broken apart. So you’ve got, like, for Iceberg, Iceberg is a table format, but the files are actually stored underneath in Parquet, usually. So you’ve got your Iceberg table, and then the layer above that is the catalog layer. So we’re doing all this, which has been really cool. Snowflake has some neat support where you can sync tables with the Polaris catalog and then basically access the data through Polaris from Spark or from Trino or similar. The thing we’re trying is the hosted version of Trino, which Starburst has support for.
So in essence, you’ve got this, potentially read-write, but at least for our case a read path that goes from, let’s say, Starburst all the way into the data in the S3 bucket, either using the Polaris hosted catalog or, I think Starburst is the one that supports you querying the tables just directly. So, A, for DR it’s kind of cool. B, for data sharing: say you’re on, like, Snowflake, and somebody else has some other tech, like Databricks or some other technology. It’s kind of neat, and that’s an application we’re going to look into, and create these catalogs. Well, sure, yeah. Where somebody can just use Spark or some other dialect that they want to use to query the data. But yeah, that’s been one that’s been really interesting. And I see, like, a future where people go to, like, oh, how about we just park the data here and govern who has access to it, instead of just moving it all. Yeah,
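
The layering John describes (data files at the bottom, a table layer that says which files make up the table, and a catalog on top that any engine goes through) can be pictured with a toy model. This is a conceptual sketch only: it is not Iceberg's actual metadata layout or the Polaris API, and every class, field, and path below is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TableMetadata:
    """Table layer: the schema plus the data files in the current snapshot."""
    schema: dict[str, str]
    data_files: list[str] = field(default_factory=list)  # e.g. Parquet objects in a bucket


@dataclass
class Catalog:
    """Catalog layer: maps table names to table metadata and governs access."""
    tables: dict[str, TableMetadata] = field(default_factory=dict)
    grants: dict[str, set[str]] = field(default_factory=dict)  # table name -> allowed principals

    def load_table(self, name: str, principal: str) -> TableMetadata:
        if principal not in self.grants.get(name, set()):
            raise PermissionError(f"{principal} may not read {name}")
        return self.tables[name]


def plan_scan(catalog: Catalog, table: str, principal: str) -> list[str]:
    """What any engine does first: ask the catalog, then list the files to read."""
    return catalog.load_table(table, principal).data_files
```

Because the files and metadata sit in storage and access runs through the catalog, two different engines can read the same table without copying it, which is the "park the data and govern access" idea at the end of John's turn.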

Eric Dodds 21:40
Okay, two interesting comments on that. One, Andrew Lamb from Influx talked about this. I can’t remember, Brooks, we can put it in the show notes, but this was several months ago. He talked about this exact thing. He was saying that you’re going to see this shift, and all sorts of interesting technology proliferate around it, because of the advantages of flexibility, cost, everything, right? Also, trivia: Kostas, yeah, I believe, helped name Starburst’s cloud product. Oh, Galaxy, yeah,

John Wessel 22:26
That’s cool.

Eric Dodds 22:27
I think that’s true. Kostas, I know you’re listening, because I know you still listen to every episode, so we’ll have you back on. So one of the things that just made me

Matthew Kelliher-Gibson 22:35
think of, because you’re talking about, like, DR, or disaster recovery. So my dad worked in state government for about 30 years. To give you some background, when my dad got his computer science degree, it was punch cards. He lived through punch cards. He moved to mainframes. By the end of his time, because state governments weren’t going cloud at the time, he was in charge of, like, provisioning VM servers, which was also databases and all that stuff. So they had to do disaster recovery every year and test it. In case you don’t know, in most states it’s a nightmare, I’m sure, because it all involves, if there’s a disaster, you basically dump everything into a third-party site. So it’s not like we’re proactively doing it, because, I mean, this is, you know, tax returns, social security numbers, all sorts of things, you know, unemployment benefits, stuff like that, yeah. So they have a whole process, you know, at the time at least, where it was like, all right, this is up in, like, Massachusetts, they would have to drive out to, like, Connecticut. It would be, all right, we’re gonna go through this idea of, some disaster’s hit and all this stuff has to basically be pushed to Connecticut and then rebuilt again.

John Wessel 23:49
Did they not, like, back up to tape, store it in a vault, and then drive the tape to Connecticut? Nope. Okay. Because I’ve heard those stories, like, where people, yeah, even more recently, people are still backing up to tape and then driving it to a vault. Yeah, that’s where their backup is. So, I guess the joke’s on us if something major happens, right? Like, those people have their data, right?

Matthew Kelliher-Gibson 24:09
One other fun fact I’ll say about this, just to brag on my own dad here. So when he retired a couple years ago, they had hired two people to shadow him for a year. So two people are replacing him in this job. So each of them had about half of his job. Within three months, one of them quit because the stress was too much to do half of his job. Wow. And he literally would do the crossword puzzle in the middle of his work day.

Eric Dodds 24:37
That’s awesome. Okay, I think we had a great story time. Yeah, maybe, you know, we did that to kick off Halloween, but maybe we need to do that more often. Okay, LinkedIn hot take. Since we are close to time here, I’m just going to read an excerpt from the article that this LinkedIn post linked to. The author will go unnamed, and I’m just going to dive in here. Considering that as recently as five years ago the Chief Data Officer role was hailed as a critical addition to the C-suite, what went wrong? How about that? First, this issue goes beyond budget cuts and raises important questions about the definition of the CDO role, the management of data within organizations, the future of executive committees, and the impact of generative AI. As former Chief Data Officers ourselves, we aim to explore these topics through a series of articles, with a goal of catalyzing a broader discussion given the significance of the shift. Clearly, a key driver has been the financial pressure organizations face over the last 12 to 18 months. As executive teams have scrutinized budgets, frustration has mounted regarding the lack of impact of data initiatives on operating margins. This is concerning, given that companies have invested tens, if not hundreds, of millions into building data capabilities and acquiring platforms. As one retail executive remarked to Ryan, all we have to show for our efforts are prettier... Sorry, I couldn’t get that out without laughing. All we have to show for our efforts are prettier dashboards. While few Chief Data Officers have considered profitability one of their KPIs, they are certainly being challenged.

Matthew Kelliher-Gibson 26:31
Okay, time out. Few considered profitability one of their KPIs? What did you think your job was? Hi, we’d like you to spend $20 million as fast as possible? Like, I’m sorry, if that’s really the attitude of a bunch of CDOs, that position deserves to go away. Well, that’s

Eric Dodds 26:54
what Edward talked about recently on the show, right? Yeah. Like, P&L, yeah. He was just like, if you don’t own a P&L, you know, if you don’t own something that’s directly contributing to the bottom line, I think his words were, you’re in the kiddie pool. Like, your job, it’s a fake executive job. Yeah, and then they, then

Matthew Kelliher-Gibson 27:14
you try to make these, you know, okay, well, we’re not contributing to profit or anything, but we’re slowing the growth of our cloud computing bills and stuff like that. It’s, I don’t know, like, I was early on, because one of the kind of, like, Chief Data Officers I worked for was a guy who came from the sales and product side. He’s probably the most effective CDO I ever worked for. So I have it in my head, when I’m here, like, how am I contributing? How am I contributing? And so I’ve been in that situation where you’re sitting there and you’re looking around and you’re like, I don’t see where this is going towards the bottom line. I’m getting nervous about this. The answers I’m getting from the people above me are not making me feel good right now, yeah,

John Wessel 27:59
well, that’s a tough situation to be in, especially if you’re a couple, like, rungs down and you can see that, and especially if you’re young, right? It’s like, I probably just don’t see it. Like, I don’t think this is working, I don’t think this makes sense, but, like, I don’t really know. Like, maybe I just don’t have the full picture. And then, like, when you’ve been around long enough, you’re like, oh yeah, no, that’s the thing. Like, yeah, you

Matthew Kelliher-Gibson 28:18
are, you were you came up much more humbly than I did. I

Eric Dodds 28:21
was in my first job, and I tried and I went, this all looks stupid. Why are we doing this? Which is why we call you the cynical guy. Yeah, I will say probably it’s easy to be cynical about these CDOs, but my guess would be a lot of them were brought in to do some major data projects. They got a ton of funding based on a number of factors, like low interest rates, easy money, a lot of margin and budget to, like, do cool things and take advantage of new technology.

John Wessel 28:51
So in essence, they were kind of like Innovation Officer

Matthew Kelliher-Gibson 28:55
coming full circle. Wow. But most likely the company did not actually have a plan. It’s more like, hey, we have this pool of money over here because, data. It’s the underpants gnomes problem, right? Okay, we’ve got $50 million for data, question mark, big profits, right? Like, then they’re like, oh, you’re gonna help fill that in, yeah. But I will say, I think the other thing that hurts them in that situation is, a lot of times, in order to do those jobs effectively, you’re going to need someone who’s going to, like, drive a certain amount of change, sure. But the problem is, a lot of times you get in there and everyone says, we really want to change, we really want to do this. Okay, cool. You get in here and you’re like, okay, we need to change X, Y or Z, or we need to not do these types of things. And then you get that, yeah, I’m totally with you on changing, but don’t actually change anything. Yeah, and it’s tough. I mean, I almost feel like, yeah, CDOs should probably get, like, six-year guaranteed contracts the way that, like, college football coaches do, yeah? Because at some point I’m going to push back on something, and you’re going to be like, well, I don’t really want

Eric Dodds 29:56
to do that. That’s, yeah, that’s a great point. Yeah, that’s a great point. All right, so we’ve heard sad stories, happy stories, and cynical stories. That is true, that is true. Although you did tell a great story about your dad. That was awesome. Yeah, it’s

John Wessel 30:14
A happy story.

Eric Dodds 30:15
We talked about disaster recovery, yeah. All in a day’s work. All right. Happy Halloween, early, from the Data Stack Show. Plenty more great episodes coming your way. Subscribe if you haven’t, so you get notified about new episodes, and we’ll catch you on the next one. Stay cynical. The Data Stack Show is brought to you by RudderStack, the warehouse-native customer data platform. RudderStack is purpose-built to help data teams turn customer data into competitive advantage. Learn more at rudderstack.com.