This week on The Data Stack Show, Eric and John welcome Evan Wimpey, Director of Analytics Strategy at Elder Research. During the episode, Evan shares his diverse background, including his Marine Corps experience, and delves into the concept of synthetic controls in data science. Evan explains how synthetic controls create theoretical models to measure the impact of new competitors on existing businesses. The conversation also covers the importance of qualitative context in data analysis, challenges in communicating data insights, and Evan’s unique venture into data-related comedy, highlighting his book “Predictable Jokes.” Trust us, you won’t want to miss this episode!
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Eric Dodds 00:06
Welcome to the data stack show.
John Wessel 00:07
The data stack show is a podcast where we talk about the technical, business and human challenges involved in data
Eric Dodds 00:13
work. Join our casual conversations with innovators and data professionals to learn about new data technologies and how data teams are run at top companies. Welcome back to the show. We are here today with Evan Wimpey from Elder Research. Evan, welcome to The Data Stack Show. We are super excited to chat with you.
Evan Wimpey 00:39
Awesome. Yeah, thanks guys. Happy to be on.
Eric Dodds 00:40
Okay, give us a quick background.
Evan Wimpey 00:42
Sure. So yeah, you mentioned Elder Research. It's where I'm Director of Analytics Strategy. I've been there for about five years doing analytics consulting work as a data scientist, and now I tell other data scientists what to do. And whenever I get a chance, I try to make them laugh as well and do a little data comedy. Sometimes that's at Elder Research, and sometimes it's for anybody in the data space who could use a bit of a laugh.
John Wessel 01:04
That's awesome. Evan, so one of the topics we spoke about before we started the show was synthetic controls, kind of a data science topic, so I'm excited to dig in more on that. Anything else you want to cover on the show?
Evan Wimpey 01:18
Yeah, I won't be able to stop myself from telling a couple of jokes, but synthetic controls are great, and diving into some of the technical stuff.
John Wessel 01:27
Great. Well, let's do it. Yeah, looking forward to it.
Eric Dodds 01:30
Okay, Evan, I think you are the only professional data comedian that I’ve met, and so, which is a special thing for the show here. We’re very excited, but I’m going to give the option, do you want to tell a couple jokes now, or do you want to make people wait on the edge of their seats until the end of the show?
Evan Wimpey 01:49
I think we want to build their hopes up all the way until the end. Definitely. Yeah, and then let them down in one giant crash. Okay, great, yeah.
Eric Dodds 01:59
So let's start earlier in your data history. You have two master's degrees, is this correct?
Evan Wimpey 02:05
It is, yeah. You have to make a decision at some point.
Eric Dodds 02:09
And yeah, well, have you decided?
Evan Wimpey 02:11
Have I? No, I'm going for another one sometime soon.
Eric Dodds 02:15
Yeah.
John Wessel 02:16
Round three, the triple crown.
Evan Wimpey 02:19
The audience will learn quickly how little I know, and they'll think, wow, you can get two master's degrees without doing anything.
Eric Dodds 02:25
What's wrong with our education system? What are the degrees in?
Evan Wimpey 02:29
Yeah, the first master's is in applied economics, which I think is pretty close to the data and analytics space. I got that one from East Carolina a dozen or so years ago, just before I went into the Marine Corps. I went into the Marine Corps as a finance officer, so don't picture kicking down doors; picture sitting behind a computer kicking down spreadsheets. Yeah, exactly. I was not issued an M16, I was issued an Excel spreadsheet.
Eric Dodds 03:01
That's basically the same, yeah. Disassemble and clean your weapon, please.
Evan Wimpey 03:07
Yeah, data cleaning, reboot. That was my boot camp. So yeah, I would say that I still use that degree some. Studying econometrics, you learn a lot about linear regression and all the assumptions behind it, and you learn a lot about how people make decisions. But I got into the Marine Corps and had been working in finance for about 10 minutes when I started seeing opportunities for using analytics where I worked. My wife was actually in a master's program in analytics at the time, and I thought, wow, we could be doing this at the bank. And there wasn't a whole lot of appetite for it. I said, you know what, I want to go somewhere where there is an appetite for it, but I need to learn a little more first. Plus, I couldn't stand my wife knowing so much more than me, so I actually followed her to the same master's in analytics program at NC State. And so that was the second master's.
Eric Dodds 04:01
And how does you and your wife both having a master's in analytics affect your decision-making process together? That's the big question I have here.
Evan Wimpey 04:14
Yeah, that's a great question. When you think about the different weights that you give to different input variables: her weight is one, mine is zero.
Eric Dodds 04:24
Okay, so it's actually pretty easy from a modeling perspective. Pretty binary, yeah.
Evan Wimpey 04:27
Very simple model, yeah. A very, very simple model. Parsimonious is what we are.
Eric Dodds 04:33
That's great. Okay, so did you go to work for Elder Research after that degree?
Evan Wimpey 04:38
I did, yeah. Elder Research actually recruits pretty heavily out of that program at NC State. My boss is a graduate; the CEO of the company is a graduate. And I should clarify real quick: Elder Research is an analytics consulting firm. They don't do research on elders; the firm was just founded by a guy named John Elder. If you haven't heard of John Elder, as soon as this episode's over, go look up John Elder, read his books, and listen to his talks. He's great. He really motivated me when he came and spoke to our graduating class. So yeah, I was really excited to get a foot in the door, and I've been there for a little over five years since then.
Eric Dodds 05:17
And analytics is super broad. What does Elder Research do? Do you have specialties at Elder?
Evan Wimpey 05:24
Yeah, this turns into a long answer. Analytics is super broad, and Elder Research is also very broad. When I talk to consulting friends in management consulting, I say we're very specific, very focused; but when I talk to data and analytics people, we're really generalists within data and analytics. If there's a specialty, we have a pretty big footprint with the U.S. government and the Department of Defense, and that turns into a lot of, if you can believe it, fraud detection, threat detection, and network analysis. I think that's where we probably got our biggest start and our biggest growth, 25 or so years ago. I don't work with government clients, though. They have higher standards.
Eric Dodds 06:15
Is that a quality or a security risk?
Evan Wimpey 06:19
That's security. Look, that was a long time ago. Okay, yes, I work with commercial clients, and there it's very broad in the types of problems we tackle. Think of any big corporation, sometimes a small startup, more often bigger Fortune 1000 companies, that has marketing teams that need to know who to market to and how to market to them. They have operations and logistics teams that are building things and moving them around. So they have a lot of data, they need to make a lot of decisions, and we help with that.
Eric Dodds 06:46
Cool. Well, John, I think we should just go for the jugular and bite into the juiciest topic. I cannot wait for this topic, but I think you're more excited than me.
John Wessel 06:58
The data jokes? Oh, well,
Eric Dodds 07:01
we know we're waiting until the end for those, of course.
John Wessel 07:04
Yeah, I want to dig into the synthetic controls. We talked about this a little bit before the show, but I was just sharing with Evan: I was on vacation recently and got off at an exit in the middle of nowhere. There's a gas station under construction, and then there's one that looks like it's been there for 20 years, right next to each other. Same exit, same side of the road, and let's say there's less than a quarter mile between them, so pretty much next to each other. And the old gas station is doing great. The pumps are full, people are going in and out. But you walk into it and it's kind of dirty, you know? It's got that this-place-has-been-around-for-20-years feel, not very kept up, the restrooms are gross, that type of thing, right? And then driving back out, I paid more attention to the new one. It's a QuikTrip, a QT, which is popular in this area, and they have this business model: always really fast, they keep it so clean it's unreal, they've got a kitchen, full food service, the whole deal. Just a different experience. So I'm thinking, in my mind, from a data background: surely the general manager, or whoever owns that particular location, knows what's coming. They know the other place has the strategic location, slightly closer to the interstate exit, and bigger. They have that gut feeling of, man, things are gonna get bad for us. But they have no way of quantifying that.
And let's say that general manager is thinking: do I want to make $250,000 or $500,000 worth of improvements to try to compete? Or should we go ahead and shut this place down instead of letting it bleed out for a year or two? But there's no way to quantify that, and I highly doubt anybody there is even trying to quantify it. So that's the background for what we were talking about with synthetic controls. I'd love to pass it to you: how would you handle that situation, or a similar one?
Evan Wimpey 09:05
Yeah, I think that's great. And John, I sort of share that with you; I just think about things in terms of data and analytics, and what they would mean for data and analytics out in the real world. So I loved that that's what came to your mind in the dingy old gas station.
John Wessel 09:21
It looked like it had bars over the windows.
Eric Dodds 09:23
You're taking your kid in and thinking, my wife is not gonna like it if I tell her how dirty this place was when I took our kid to the bathroom. So how could you quantify the potential failure here?
John Wessel 09:33
You walk in and you're like, don't touch anything.
Evan Wimpey 09:36
Totally, yeah. The young kids don't know: don't touch that, nope. We've got hand sanitizer in the car, use that. Exactly, yeah. So I'll take it one step further before I set up synthetic controls. Let's say the QuikTrip opens, and then you've got six months' worth of data with that new gas station open. Even then, it's really hard to quantify what happened, even though almost certainly there's going to be an impact from the QuikTrip opening. Now, this sounds like a very extreme scenario: okay, sales are going to plummet, because everybody's going to go to the nice clean QuikTrip. But what if there were already five gas stations there, and now another one is opening? It's really hard to measure, even after the fact, what happened. And this is where synthetic controls come in. We've used this with a few clients at Elder Research. The method has been around for several years, but I've only known about it for about a year (I'm pretty slow), and man, it's been super powerful; we've loved it. So I'll try to set it up in the context of this gas station. If it's just a single gas station, there's not a lot you can do. But let's say it's a franchise that has a few hundred locations. Ideally, when you want to measure the impact of something, you've got a test and a control, but you can't do that here. You can't have a test scenario where you say, okay, open the QuikTrip; now go back in time and don't open the QuikTrip, and then we measure what the impact was. But what you can do is try to synthesize that control. You're going to have the test scenario, and if you've got 100 other gas stations, or maybe not even 100, maybe 10,000 other gas stations, then you can find some combination of those gas stations that look similar: similar demographics, similar sales numbers, similar locations.
And you can say, these stores A, B, C, and D, they map, maybe with inferred weights, maybe with nonlinearities, and they can predict store E pretty well. So if we can predict store E based on these other stores, and those other stores are not going to have a QuikTrip open right next door at the same time, then we have a baseline for comparison.
John Wessel 11:44
So you're synthesizing a new theoretical store from a combination of other stores that closely maps the one that's about to be impacted?
Evan Wimpey 11:53
Exactly, spot on. And then, as long as you've got some historical data, you can go validate those stores to say, hey, does it actually work? You know, train it and then validate it out of sample, before the test event. So you can say, hey, it does a pretty good job, a very close job, of mapping the gas station in question.
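For readers who want to see the mechanics, here is a minimal sketch of fitting synthetic-control weights on simulated data. The four-donor setup, the sales numbers, and the hidden blend weights are all invented for illustration; real projects use much larger donor pools and richer features. The classic formulation constrains the weights to be non-negative and sum to one:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# 24 months of pre-event sales for four donor stores (columns A-D), all trending up
months, n_donors = 24, 4
trend = np.linspace(100, 140, months)[:, None]
donors = trend + rng.normal(0, 3, size=(months, n_donors))

# Target store E is secretly a blend of the donors plus noise, so a good fit exists
target = donors @ np.array([0.5, 0.3, 0.15, 0.05]) + rng.normal(0, 1, months)

def sse(w):
    # Squared error between the weighted donor blend and the target's history
    return np.sum((donors @ w - target) ** 2)

# Classic synthetic-control constraints: weights non-negative and summing to 1
res = minimize(
    sse,
    x0=np.full(n_donors, 1 / n_donors),
    bounds=[(0, 1)] * n_donors,
    constraints={"type": "eq", "fun": lambda w: w.sum() - 1.0},
)
weights = res.x
synthetic = donors @ weights  # the counterfactual "store E"
rmse = np.sqrt(np.mean((synthetic - target) ** 2))
```

After the event (the QuikTrip opening), you keep projecting `synthetic` forward from the donors' actual sales; the gap between store E's real sales and the synthetic series is the estimated impact.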
Eric Dodds 12:13
One question quickly: how intensive is the process of building the control group, of actually synthesizing it? You mentioned, you know, if it's 100 stores (we'll go back to the kicking-down-spreadsheets thing), if it's 100 stores and I have them listed with maybe 50 columns of data in an Excel sheet, you could sit down in an afternoon and do a human synthesis: okay, I can sort of filter down to a list I think is good, right? If you have 10,000 stores, what kind of process are you using to build the synthetic control?
Evan Wimpey 12:54
Yeah, that's a great question. And I think the biggest risk in this is that if you've got 10,000 stores, you can almost certainly fit the training data perfectly; you can find some combination of those stores that makes the fit perfect. So you've got to be careful. This is where there are some technical aspects, like keeping a holdout set that's still in the past, but there's a human element to this as well: you're not going to choose a store in New York to map to this gas station in South Carolina, where the roads are very different and the competitors are very different, even if the sales numbers match and you can map the sales exactly, and that's ultimately what you care about. There's some qualitative feature engineering that goes into it: okay, the market conditions are going to be so different in this location or this type of store than for the store we care about. But ultimately it turns into basically a feature engineering problem where you're trying to model this new store, and you've got 10,000 features that are all of the other stores.
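That overfitting risk is easy to reproduce. In this hedged sketch (all numbers simulated), there are far more candidate donor stores than months of history, so an unconstrained least-squares fit matches the training window essentially perfectly, while a held-out pre-event window exposes the problem:

```python
import numpy as np

rng = np.random.default_rng(1)
months, n_donors = 24, 50  # many more candidate stores than months of data
donors = rng.normal(0, 1, size=(months, n_donors))
# The target really depends on only three of the donors
target = donors[:, :3] @ np.array([0.5, 0.3, 0.2]) + rng.normal(0, 0.5, months)

# Fit on the first 18 months; hold out the last 6 (still before the event)
split = 18
w, *_ = np.linalg.lstsq(donors[:split], target[:split], rcond=None)

train_rmse = np.sqrt(np.mean((donors[:split] @ w - target[:split]) ** 2))
holdout_rmse = np.sqrt(np.mean((donors[split:] @ w - target[split:]) ** 2))
# train_rmse is essentially zero: 50 free weights can fit 18 points exactly.
# holdout_rmse reveals that the fitted weights do not generalize.
```

This is why the constrained weights and the pre-event holdout that Evan describes matter: they keep the "synthetic store" honest before you trust it as a counterfactual.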
Eric Dodds 14:04
And as you go through this process (sorry, I want to keep going on this, but it's just so interesting), the source selection: how much time do you spend studying the industry, studying those contextual aspects, like you said? If we just write a model to go find the perfect fit, it's going to lack the context that would actually make the synthetic control meaningful, that would imbue it with its deepest levels of meaning, right? So are you spending a lot of time studying? Because I'm assuming you haven't managed many gas stations.
Evan Wimpey 14:40
It's been too long.
Eric Dodds 14:43
So is there a lot of that from an Elder Research perspective? Like, you're digging in, studying the industry, all that stuff?
Evan Wimpey 14:50
Yeah, this is a great question, and I think it's at the crux of what differentiates a consulting-as-a-service firm like Elder from a SaaS or product firm. A product firm is going to come in and say, hey, here's a solution you can plug in to do this. We're going to come in and we're going to charge you money, but we're going to charge you time too, because we're going to ask a lot of questions. We're going to ask what makes this store what it is. We're going to try to talk to store operators, and we're certainly going to talk to the people who are forecasting the sales or the impact. A lot of that you can do almost in an academic setting, doing research to find what you can about managing a gas station. But much more of it is, John, to your point: if you talked to the manager of that gas station, he or she knows the QuikTrip is coming, they know what is about to happen, and they talk to their customers all the time. The context they have is exactly the qualitative context you need when you're building that tool.
John Wessel 15:56
And depending on the company and industry, they may even talk to other operators and have a good idea, like, oh, I typically trend with these other stores. Some of them even have an intuition for stuff like that.
Evan Wimpey 16:09
Yeah, exactly. Very spot on.
Eric Dodds 16:12
Okay, so I derailed us for a minute there. So we have 10,000 stores, we've dug in from a qualitative standpoint, we have a model that takes in that context, and we have a good fit, so now we have our synthetic control. Where do we go from there?
Evan Wimpey 16:29
Yeah, so now we're able to measure impact. It doesn't necessarily answer the question of what happens when a QuikTrip opens right next door; if we've never observed that in the past, then sure, it's going to be hard. But it lets us measure it going forward. And gas stations aren't a new industry; new ones have opened up next to old grungy ones all the time. So probably we've got something close to this that has happened in the past, and previously there was no counterfactual, so we couldn't measure what the impact was. We could say, oh look, sales kept going up after that; yeah, but maybe they would have gone up way more if the QuikTrip hadn't opened. Now, applying this to the past, we can say: when a gas station opening meets some objective, quantitative criteria, like a new build in the last six months from a top-10 gas station franchise, whatever the criteria are, here's the impact we have seen in the past. And that comes, hopefully, with some uncertainty bounds, which this method is pretty good at, because as long as your model has some prediction interval or confidence window, you can be pretty confident: hey, it's going to impact sales directionally, and here's a measured interval of what we think the impact will be on sales, or number of customers, or whatever it is. I don't want to get too specific with the gas station example. Elder has a pretty big footprint in consumer packaged goods and the hospitality space, where there are a lot of franchises and a lot of products that have competing products, and that makes a great use case for synthetic controls, because out of all the tens of thousands of products that some manufacturer makes and puts on a shelf at Walmart, you can see what happens when a Walmart store closes, or when they expand a parking lot, or when they put a feature promotion at the end of an aisle.
So there are all these events that take place, and synthetic controls give you a way to retroactively measure what the impact of those events was.
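One common way to attach uncertainty to that kind of retroactive estimate is a placebo test: run the same gap calculation for stores that had no event and see where the treated store falls in that distribution. Here is a hedged sketch on simulated data (the store count, the 10-unit sales drop, and the simple average-of-other-stores control are all invented; a real analysis would use fitted synthetic-control weights rather than a plain average):

```python
import numpy as np

rng = np.random.default_rng(2)
pre, post, n_stores = 18, 6, 20
sales = rng.normal(100, 2, size=(pre + post, n_stores))
sales[pre:, 0] -= 10  # store 0 takes a real hit after the event

def effect(idx):
    # Post-event gap vs. the average of all other stores, net of the pre-event gap
    others = np.delete(sales, idx, axis=1).mean(axis=1)
    diff = sales[:, idx] - others
    return diff[pre:].mean() - diff[:pre].mean()

treated = effect(0)
placebos = np.array([effect(i) for i in range(1, n_stores)])
# Fraction of untouched stores whose gap is at least as extreme as the treated store's
rank = (np.abs(placebos) >= abs(treated)).mean()
```

If `rank` is near zero, the treated store's drop is extreme relative to normal store-to-store noise, which is roughly the kind of confidence statement Evan describes.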
Eric Dodds 18:32
Yeah, when you start one of these projects, do your clients generally have a really good idea of the question they want to answer, or is it more general, along the lines of, we want to understand the risk factors for this product line? Is it more general or more specific?
Evan Wimpey 18:56
It sort of runs the gamut. I'm trying to decide if I have a favorite; maybe I like the mix, and that's why I like being in the consulting industry. You know, we've got some clients that have good data, and they've got a prioritized backlog of analytics projects, and they run agile, and they work with their IT, and they deploy great stuff. They know: hey, we want to measure the impact of new store openings on our stores. Maybe they've never heard of synthetic controls, but they know exactly what they're trying to measure, and they're asking us to do it. That's more rare.
John Wessel 19:32
I would suspect that. Yeah, that's what I was gonna ask: who does that, and how often does that happen?
Evan Wimpey 19:39
Very small sample size. So, you know, I'm not saying this generalizes everywhere. We work with some pretty analytically mature clients, which surprises me a little bit, because you'd think: they're analytically mature, they've got their own data scientists and engineers to do that. But having an outside perspective come in can often be helpful. More often, though, it's: hey, the general manager at this gas station has been complaining that this QT is about to put them out of business, and we don't know what to do. We don't know what's going to happen. Can you help us figure out what's going to happen? And that usually doesn't start with us saying, yes, we'll build synthetic controls for you and we'll quantify the impact with uncertainty estimates. No: okay, well, what are you trying to do? What are your goals? It becomes a much more consultative engagement.
Eric Dodds 20:30
So I was actually going to ask a question of both of you as consultants, and I'm genuinely interested. Let's take a franchise, for example; we'll keep the gas station thing, but say it's a big national franchise of gas stations. Of course they have a lot of analytical horsepower at the company, because they need to understand the real estate market, fluctuations in the cost of oil, food, operating expenses, all of it. The firepower is huge, right? So you would think, okay, wow, we actually do have a ton of analytical horsepower. But in every company there's a point at which having an internal resource specialize in something really specific means that resource isn't working on the actual product anymore. Now your analytics team is embarking on something that is helpful to the business but divergent from the core work that's actually driving it day to day, in order to answer bigger questions. So how do you think about when to make that decision? As you think about your clients, what should you keep in house, and what are the situations where you need to outsource?
John Wessel 21:49
So I think the most interesting thing with large clients is, say I've got 500 horsepower, and that could be 500 people or some other unit of whatever, but 500 horsepower, and it's all allocated. Some of it is allocated to something very specific; we've talked before with somebody who was doing forecasting for a benefits call center, I think for Amazon. Very, very specific, in Amazon's case. But it's all deployed: some of it specifically to marketing or finance, or maybe even a subset of HR. And then there's some that may be under more generic corporate roles: maybe they do data engineering, or, more generically, reporting that rolls up to investors or the board.
Eric Dodds 22:34
Executive reporting, yeah, right.
John Wessel 22:36
But if you have a project that spans those groups: A, there are going to be gaps in knowledge between the groups, even if they have the right skill sets. And B, as far as redeploying horsepower, most companies do not have flexible resources, even if they have the right resources. So that's what it comes down to. Well, you know, it's a big company, why can't they just do it? Well: A, this person is just focused on this; B, they need to keep doing the job they were hired to do; and C, they don't have the broader context of all these moving pieces. So even if you technically have the horsepower, redeploying it is really challenging. I don't know if you've had that same experience, Evan.
Evan Wimpey 23:18
I think that's a great point. I was going to mention something else, and I will, but I'll touch on that first, because it brings an example to mind, one that happens all the time but was really amplified during COVID. We work with a bunch of consumer goods manufacturing groups, and they've got marketing analytics folks and operations analytics folks. If you went shopping during the pandemic, you might have noticed the shelves weren't always full of everything. And marketing analytics folks running specials and promotions on goods that you can't even keep on the shelf is an absolute waste; it's just marketing waste, right? So you've got your operations folks trying to supply the right stuff at the right time, and your marketing folks trying to push the right products to the right people, and there's not that cohesion between them: don't market the stuff we can't keep on the shelves right now, or do start to market the stuff that's taking up too much inventory and not moving fast enough. That's a pretty specific example, but those analytics folks on each side are doing great work, and with an outside perspective it becomes easier to connect those dots.
John Wessel 24:29
Yeah, because they're each optimizing for their own thing. Like, I'm trying to sell more in my category; that's what I'm goaled on, that's where my incentives are. Of course I'm going to keep trying that, when in actuality, like you said, you may have not enough inventory in one category and too much in another. So that makes a lot of sense.
Evan Wimpey 24:47
The first thing I was going to mention, which is not as good an answer, is this. You know, this comes with working where John Elder founded the company about 30 years ago, and he's a well-regarded name in the space. A lot of times, just on the change-management side, bringing in an outside team can be really helpful for driving a project. An internal analytics team that is peer to peer with a sales team can't get traction, can't get a sales director or manager to listen and act. But hey, we brought in these outside experts who specialize in this thing, and they say we can do this. And now a sales director is, not always, but maybe, more inclined to work with that, to listen, or to institute some change.
Eric Dodds 25:39
Yeah, that makes sense. Makes total sense. It's always an interesting question, you know, build versus buy. Changing gears a little bit: one of the other topics that I'm really interested in, which we discussed briefly before the show, is understanding model performance. It's easy for people who don't know a lot about the space to think, ooh, there's a data scientist, they're just going to throw this model at something and, wow, it does all this crazy stuff, and we get something at the end. But there's this huge element of understanding whether what it's doing is actually working the way you want it to. And we don't even need to get into the question of AI, because it sort of highlights the insanity of what's going on. How do you deal with that at Elder, and how do you think about it?
Evan Wimpey 26:36
Yeah, I would say, you know, you asked earlier what our specialty is, and I sort of punted and said, well, we're more generalist. If anything, that's where we've built most of our credentials: validating models, validating findings. And it's tough to talk to clients or prospective clients, because the flashy stuff sells: we can forecast this, we can tell you exactly what this is going to be. What Elder Research likes to do is come in and say, well, this is how well we can forecast, this is how much error you should expect, and these are the places where it's going to generalize well and where it's not. So a lot of the work we end up doing is actually model validation, for models that teams have inherited or built internally. Sometimes that starts as, we just want a stamp of approval: somebody sanity-check this work, make sure it's good. But often it turns into: you're reporting overall performance, but it's performing well in certain subsets and poorly in others. Or you've not validated it well out of sample and it's not going to generalize, because you've got a leak from the future. Or conditions have changed as of this time period, and it's not going to generalize well. I think that's becoming more important as there are more and more tools and vendors out there that let you plug in modeling. Some of it is super powerful, but it's really tempting to use a lot of really powerful tools and then not know how well they're actually performing. So we try to keep folks grounded in being able to measure how well a model actually performs.
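The "leak from the future" failure mode Evan mentions is easy to demonstrate. In this sketch (all data simulated; the drifting feature-outcome relationship is invented for illustration), a shuffled train/test split reports better error than an honest train-on-past, test-on-future split, because shuffling lets future rows inform the fit:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
t = np.arange(n)
x = rng.normal(0, 1, n)
# The relationship between feature and outcome drifts over time
y = (1 + 2 * t / n) * x + rng.normal(0, 0.2, n)

def fit_eval(train, test):
    # Simple no-intercept linear fit on the training rows, scored on the test rows
    slope = (x[train] @ y[train]) / (x[train] @ x[train])
    return np.sqrt(np.mean((y[test] - slope * x[test]) ** 2))

shuffled = rng.permutation(n)
rmse_random = fit_eval(shuffled[:150], shuffled[150:])  # future leaks into training
rmse_time = fit_eval(t[:150], t[150:])                  # train on past, test on future
# rmse_random understates the error you would actually see in production
```

The random split looks flattering because training rows come from the same time range as the test rows; only the time-ordered split measures what the model will do on genuinely new data.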
John Wessel 28:27
So I think this is a huge issue in the customer data and marketing tech space, a huge issue, because almost all of the customer data tools, the marketing-focused customer data tooling, are introducing predictive models: on churn, on predicted next order, things like that. And I have yet to see one that provides any sort of metrics around, with what level of confidence do we think this? It just spits the number out. Now, the persona is marketing people, so, no disrespect,
Eric Dodds 29:03
To a former marketer? Yes, because we are easy prey.
John Wessel 29:09
I'm just saying that the persona is marketing people, and they wouldn't know what to do with confidence numbers. Benchmarks, model metrics — what are they going to do with that? So from a product perspective, I understand why that information isn't shown. But as more data teams get involved in this space, as things get more complex and you end up with a warehouse at the center of your data stack tied into all these marketing tools, I think it's becoming more relevant. The data teams are asking: oh, you're sending all these emails or messages around churn — where did that come from? Oh, it came from this marketing tool. Okay, great. How do we know that's even close to being right? And the answer is, we don't. I mean, you want to trust your vendors, right? You want to trust that somebody at one of those marketing companies validated the thing. But you don't really know. And there are at least a few tools where people in the data science community have come out and said, hey, we've done this research, and these numbers are not very good.
Eric Dodds 30:16
Have you ever implemented any of those in a past life — turned something on that was predictive, but you had no idea what was going on?
John Wessel 30:25
For sure. Email was the easiest one, right? My theory of email is that it's about touch points. You can get really into A/B testing everything, but a lot of email is: the more touch points, the more it converts. It doesn't really matter what the email says. So as a way in, it's like, let's do a campaign around prediction of the next purchase. We did that, and probably gained some incremental benefit from it. From that standpoint, it's like, great — who really cares? But once you get past that initial stage, you're trying to refine things. And then maybe we're seeing higher churn because, you know, it's 2020 and your inbox is more full of email marketing than it's ever been in your life, right? So it's like, okay, we're seeing this high churn, we need to cut back on these things. How do we make this a lot more precise? And then it's like, well, what does this model do? Is it accurate?
Eric Dodds 31:20
I don't know, yeah.
Evan Wimpey 31:23
I'm gonna mention one thing a little serious and one thing a little silly here. Even a model that does really well making point predictions is so much more limited than a model that gives some type of prediction interval or window. We've done forecasting work where there's already a model in place that doesn't natively have any type of prediction interval. And think of it like this: if you're in charge of making sure there are enough french fries at a restaurant, and your model predicts you're going to sell 100 cases this week, and another restaurant's model also predicts 100 — well, you'd treat those the same. But if one says the 90% confidence interval is between 90 and 110, versus between zero and 1,000, that really changes how you want to manage your inventory. So I think measuring uncertainty and reporting on uncertainty matters, but it's not a language most people speak. When you start talking about distributional estimates or distributional predictions, people's eyes glaze over. But you can frame it the right way: the model doesn't know what it's talking about over here, but it does know what it's talking about over here.
John Wessel 32:38
But I think that's really useful, because in that use case, if I told you in common English, hey, you need between 95 and 102 cases of fries — that's super easy, and I can make a decision. Whereas it's not as clear if I say you need 100 cases of fries with 89% confidence. It's much better to say: we're almost 100% confident it's in this range. And I think even that phrasing is way more helpful to people.
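The range-instead-of-point framing the two of them describe can be produced mechanically. A minimal sketch — all numbers invented, not a production method: take the model's point forecast, look at the empirical spread of its past errors, and report the middle 90% as a plain-English range:

```python
import numpy as np

rng = np.random.default_rng(7)
# Pretend history: the forecast's miss (actual minus predicted) for the
# last 52 weeks. In practice you'd log these from the live model.
past_errors = rng.normal(0, 5, 52)
point_forecast = 100.0  # "you will sell 100 cases of fries this week"

# Empirical 90% prediction interval around the point forecast.
lo, hi = point_forecast + np.quantile(past_errors, [0.05, 0.95])
print(f"We're about 90% confident you'll need between "
      f"{lo:.0f} and {hi:.0f} cases.")
```

The width of the interval is the honest part: a model with wildly scattered past errors produces a wide, appropriately alarming range instead of a falsely confident single number.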
Evan Wimpey 33:09
I also have a story about what we're talking about, if we're bashing marketers.
Eric Dodds 33:14
Marketers — and I actually like it even more now that I'm not in marketing anymore. I moved from marketing to
John Wessel 33:22
product, yeah, recently moved to product. So even more fun, good promotion
Evan Wimpey 33:25
there.
John Wessel 33:26
He can wash his hands. Yeah,
Evan Wimpey 33:28
This is why I'm hesitant to tell it, because it's not even my story — I'm borrowing it from one of my colleagues at Elder Research. This was several years ago, working with a client who had segmented their customers into what they called cubes: hey, the 18-to-24 upper-middle-class suburban cube really likes this product, and the retired 65-plus widower cube really likes this type of product. And we're trying to convey: you've not tested that at all, that could just be by chance. If you break the data up small enough, you're going to find random things. And they argued about it; they didn't appreciate it. They said, no, look, the data shows that this is what it is. This was pre-COVID, so it was in person, and their floor was this black-and-white tiled, checkered floor. So the presenter — very senior, so he could get away with this — picked up the candy bowl that was on the table and just threw it all on the floor, then went and looked and said, oh, look, the Snickers all landed on the black tiles, and the Skittles all landed on the white tiles. So we should sell Skittles to the white-tile customers.
Eric Dodds 34:38
That is unreal.
Evan Wimpey 34:40
I mean, it hits home pretty hard. They said, well, yeah, but that's just by chance. He said, yes, exactly — it's just by chance.
Eric Dodds 34:48
wow. That is an amazing story. Yeah,
Evan Wimpey 34:52
It got the point across. We were not invited back to that client, but I think they're doing fine.
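The candy-bowl stunt is the multiple-comparisons problem, and it's easy to reproduce in a few lines. In this sketch (all numbers invented), product preference is a pure coin flip with no real segments at all — yet carving customers into enough arbitrary "cubes" still surfaces one with a striking preference, by chance alone:

```python
import numpy as np

rng = np.random.default_rng(1)
n_customers, n_cubes = 2000, 40
likes_product = rng.random(n_customers) < 0.5  # truly 50/50, no signal
cube = rng.integers(0, n_cubes, n_customers)   # arbitrary segmentation

rates = [likes_product[cube == c].mean() for c in range(n_cubes)]
print(f"overall rate: {likes_product.mean():.2f}")
# Expect the most "striking" cube to sit well above 50% purely by chance.
print(f"best cube:    {max(rates):.2f}")
```

A holdout test, or a correction for the number of segments examined, is what separates a real cube from candy on tiles.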
Eric Dodds 34:59
Somebody learned something that day. So here's a question for both of you. One of the things that's really interesting about this is setting expectations — around technology generally, but especially around data science as a discipline, and modeling that's predictive, which is way different from historical look-back analytics: this is my time series, and I'm just understanding what happened. How do you think about setting expectations around the technology? I love that you said, okay, at Elder we say this is how accurate we can be — which sets a really different expectation for data science than, I'm going to give you a tidy packaged answer.
Evan Wimpey 36:09
I mean, that's a constant struggle for us, and probably for everybody in the field, especially in the post-ChatGPT world, where there's so much noise out there: oh, look at what you can do with this new AI tool. And the demos look great — in very specific circumstances these tools can be super powerful. We come in as the voice of skepticism. Even if you've been in the space for 30 years, you come in as a skeptic and it's, okay, well, this guy just doesn't believe; he doesn't think this is going to work, and we're going to jump to it and it's going to work. So I think it's tough to balance that excitement about new capabilities with staying grounded in realistic expectations. And it's hard — as a consultant, when you're trying to sell services, the flashy stuff sells. So I don't know, maybe our strategy is to wait for the flashy stuff to break and then say, see, we told you so. Want us to help fix it?
Eric Dodds 37:15
That's actually a great strategy, yeah.
John Wessel 37:18
Honestly, there's a consultant we worked with who worked with a very specialty, niche piece of software. He'd been doing it 30-plus years, and his whole thing was: I'm the cleanup guy. It was a very small operation. The big operations would come in and kind of know the software, but not really, and he would basically be the cleanup guy — come in behind each one of them and fix what they screwed up. It was a 30-year business model, and he did quite well.
Evan Wimpey 37:50
Yeah — I mentioned model validation earlier, and broadly that's it: in analytics specifically, there's absolutely a niche for model validation, for sure. The cleanup guy — I like it.
Eric Dodds 38:05
Yeah, and that dynamic exists inside of a company as well. I'm just thinking about some of the stakeholders you had, John, even at the executive level, and it seems like a really similar dynamic. You're selling services as a consultant, but inside a company you have to sell as well, right? Would you say that's easier or harder?
John Wessel 38:31
I think it depends. It can be easier in the sense that, if you've been there a little while, you have a track record, and that matters. It can be harder if you're getting into something more niche — we got into some very specific types of forecasting, and I'm not an expert in that type of forecasting, right? So that's where it can be harder. But either way, it's a sales job. And I think data people are maybe skewed more this way than a lot of disciplines: they do not want to sell, they want to present the data. This is the data. Evan's nodding — yep, for sure. That's how I feel too; I just want to present the data. But the sale becomes clarity and communication and simplification, without losing the core of it. And that's,
Evan Wimpey 39:34
I think, the hard part. Yeah, spot on — I would agree.
Eric Dodds 39:38
Okay, enough with the serious talk. Yeah, it’s time. Gosh, how did you get into data comedy? Evan, this is fascinating to me.
Evan Wimpey 39:48
I majored in it in college. You can get a four-year degree
Eric Dodds 39:52
in data. That’s sort of the byproduct of two master’s degrees.
Evan Wimpey 39:59
Yeah — no, sorry, Dad, if you're listening, I didn't. I majored in business administration, so probably equally useful. So, this was shortly into the pandemic. At Elder Research we're a Slack company — I don't get paid by them to say this, but if you're on Teams, maybe just go on Slack for a little bit and see how nice it is. Slack is great for our internal communication. And since we're pretty involved academically — we've got people who do research and write books — somebody was going to a conference on undergraduate statistics education: basically a bunch of stats professors. They had a call for papers, and I don't have any stats papers to my name, but the call said, hey, we're also looking for fun stuff. If you've written a poem, if you've written a song, performed music — a little talent
Eric Dodds 40:53
show, a joke, yeah, yeah. Elicited
Evan Wimpey 40:57
That, yeah. So — don't tell my boss, but I spent like 40 billable hours just thinking up statistics jokes. I came up with three submissions, and one of them won first place. So I was invited to this conference — virtual, but I was invited — got an award, read the joke. I could practically hear people's eyes roll over the virtual conference. But I got a $50 prize, so that made me a professional statistics comedian.
Eric Dodds 41:29
So you have been paid — an award-winning professional!
Evan Wimpey 41:36
Yeah, exactly. And I absolutely wore that joke out. I told it to my colleagues hundreds of times over; my poor wife has heard it so many times. You start to tell it enough, and start to think of more jokes, and you become the guy with the stats jokes. And we do teaching engagements with clients — hey, before we teach this forecasting class, why don't you open with a few jokes? So I'd tell a few jokes, and it started getting a pretty warm reception, so I kept thinking of more. Eventually I did a little market research: if you want to write a joke book, you have to have 101 or 1,001, and there was no way I was getting to 1,001. So I had 101 jokes and published a joke
Eric Dodds 42:18
book, and that’s a lot of jokes, yeah, it’s a
Evan Wimpey 42:21
A lot, yeah — the quality drops off a lot there at the end. Well,
Eric Dodds 42:26
statistics is like a bell curve, yeah?
Evan Wimpey 42:29
Yeah — I'd say this one is very right-skewed. Or, I guess, left-skewed. Then you have a book, and some conference organizers get a hold of it and say, do you want to tell jokes at this conference? And some university programs want you to come along. So now — I still tell some jokes for Elder Research clients, and I'm not quitting my day job, but every chance I get, if somebody wants me to come tell jokes for their data or analytics or tech team, I've been doing stand-up for folks.
Eric Dodds 43:01
So I want to hear the jokes — but there's a big difference between writing a funny joke and delivering it in front of people, especially live. Can you talk about that a little bit? Because that's a big transition.
Evan Wimpey 43:20
That's a really great point, and I think it mirrors our earlier conversation: you've got data science people who can build great technical tools, but they can't sell them, can't communicate them in a simple way. You could almost get technical about joke writing, but joke delivery is very much a human endeavor — how do you inflect, how do you stay engaged with an audience? I've done a few virtual shows, and it's infinitely harder than in person, where you can make eye contact and engage with folks. When I do stand-up comedy, I don't tell a single joke out of the book unless somebody requests it — the jokes from the book are standalone jokes. The stand-up is much more storytelling around data and analytics and finding the humor in it. So yeah, it's a very different discipline.
Eric Dodds 44:15
Yeah, man, that is amazing. I could keep going down that path, because that’s fascinating to me.
John Wessel 44:21
We could do a whole nother episode.
Eric Dodds 44:23
We could — a data comedy episode, a stand-up comedy episode. Okay, we're close to the buzzer here. All right: if you've listened this long, good — we've made you wait this long. That was a great call, making people wait until the end.
Evan Wimpey 44:39
Yeah, I’m sorry for all the angry emails you guys are gonna get. Yeah, we’ll take it. So the joke that I’ve told more than any other is the one that made me an award winning professional. Yes,
Eric Dodds 44:52
I was gonna ask. If you weren't going to share it, I was going to ask for it at the end.
Evan Wimpey 44:55
I gotta. I'll tell it first, but I've got a funny story about it too. Did you hear about the Bayesian who built a model to tell her when she needs to go to the dentist? Well, it said she didn't need to go for six months — but that's probably just because she'd been a week prior.
Eric Dodds 45:16
That's so great. That is great.
Evan Wimpey 45:19
My boss is way funnier than me. I'm glad he has a successful consulting firm so he doesn't need to get into data comedy. But every time he hears me tell that joke — which has been way too many times — he counters with: Evan, did they ever tell you if anybody else submitted a joke to that contest? Thanks, John.
John Wessel 45:41
Wow. That actually makes it even more funny.
Eric Dodds 45:44
That is great, Yep, yeah, oh, man. This one.
Evan Wimpey 45:47
This one's from the back of the book cover. Did you hear about the 12th-grade student who failed his machine learning exam so badly that he had to go back to 11th grade? A classic case of scholastic grade descent. One of the best things about telling very nerdy jokes is the built-in defense mechanism: if people don't laugh, I just assume they don't get it. You can feel good about it either way.
Eric Dodds 46:17
What's your process for writing these? Or are they more like lightning strikes?
Evan Wimpey 46:22
Yeah, that's a great question, and it's tough. There are a few in the book that end up being sort of formulaic, and I don't like those as much. But most of the time — I've got about a 30-minute commute on days when I go into the office, and when I was writing the joke book, and now when I'm trying to make new stand-up material, I just ride in silence. No radio on, windows up. It feels so bizarre in today's world to not have some sort of input — normally there'd be a podcast playing in the background. But I realized there's basically no time in my life with no input. So I started taking walks, driving with no radio on — things where there's the freedom to be bored and let your mind wander. There have been a ton of dead-end chases, but then I come up with something where I think, okay, that could stick, that could make it.
Eric Dodds 47:24
yeah, yeah. I love it. Awesome. All right. Well, do you have one more for us before we close it out today? Yeah, I
Evan Wimpey 47:32
will tell another one. This one's objectively worse — it didn't win the prize, but it was submitted to the same competition. So you've got a guy who recently graduated, studied data science, and is looking for a job as a data scientist. He can't find one — tough job market — so he applies for a logistics analyst job. He doesn't know anything about logistics, but hey, it's got analyst in the title. He goes in, he's interviewing, and the interviewer has a big map behind him. The boss says, say you had to place a new distribution center — here's our network right now, where do you think you'd put it? He thinks about it for a second, then puts a pin right in the middle of a divided highway. The boss just laughs: you're going to put our distribution center in the middle of a highway? That's not going to work. He says, Sir, a normal distribution center is always on the median.
Eric Dodds 48:36
I love it.
Evan Wimpey 48:38
I love the Bayesian one. It's, at best, the second-best joke I've got.
Eric Dodds 48:41
It's amazing. All right — where can listeners get the book? Predictablejokes.com? Yeah, it's on
Evan Wimpey 48:50
Amazon — you can find it there, but I'm getting buried by all the people who know how to do good SEO and keywords. At predictablejokes.com you've got the stand-up, a few clips, and you can get the book. The book is called Predictable Jokes.
Eric Dodds 49:03
Go to Evan's site. You'd rather give him the money than — yeah, exactly.
Evan Wimpey 49:09
you. Thank you. I’m coming for you, Jeff.
John Wessel 49:14
He started with the bookstore. You’re starting with a
Eric Dodds 49:16
book. Exactly, exactly, yeah.
Evan Wimpey 49:18
If we were doing synthetic controls, you could compare his trajectory to mine.
Eric Dodds 49:25
I don't know if you're actually going to be able to produce a good synthetic control there. Alrighty — well, Evan, so great to have you on the show. This has been a great time. Excited to check out the book. Thanks for joining us.
Evan Wimpey 49:35
Absolutely great conversation. Appreciate y’all having me.
Eric Dodds 49:38
The Data Stack Show is brought to you by RudderStack, the warehouse-native customer data platform. RudderStack is purpose-built to help data teams turn customer data into competitive advantage. Learn more at rudderstack.com.
Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
To keep up to date with our future episodes, subscribe to our podcast on Apple, Spotify, Google, or the player of your choice.
Get a monthly newsletter from The Data Stack Show team with a TL;DR of the previous month's shows, a sneak peek at upcoming episodes, and curated links from Eric, John, & show guests. Follow on our Substack below.