This week on The Data Stack Show, Eric and John chat with David McCandless, Founder of McCandless Consulting and former data analyst at Amazon. During the episode, David shares his journey from chemical engineering to analytics, leading to his role at Amazon. He also discusses the complexities of time series forecasting for workforce management, emphasizing the importance of explainability and accuracy, especially during the pandemic. The conversation also touches on the practical applications of forecasting in business, the need for clear communication with leaders, and Amazon’s culture of detailed documentation for decision-making. David’s insights offer a deep dive into the real-world challenges of data analytics in corporate settings, and more.
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Eric Dodds 00:05
Welcome to The Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You’ll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, the CDP for developers. You can learn more at RudderStack.com. We’re here on the show with David McCandless, David, welcome to The Data Stack Show.
David McCandless 00:29
Thanks so much, Eric.
Eric Dodds 00:30
All right, well, so much to talk about. And I love your story of going from sort of the biggest of the big to small; we’ll unpack what that means throughout the course of the show. But give us a little bit of your background: how’d you get into data, and what do you do today?
David McCandless 00:47
Yeah, yeah. So I studied chemical engineering at Georgia Tech and started my career in oil and gas, and then basically got married and needed to change jobs when the oil market was down. So I kind of had to reinvent myself. And I ended up in this budding field called analytics, which I thought would suit me well. I started a master’s degree online at Georgia Tech, actually in the beta cohort of GT OMSA, and graduated in May 2020. So around 2017 I started in different analytics positions, first as an analyst and then as a manager. And then my last full-time position for a corporation was with Amazon, from 2020 to 2022: first year there, forecasting; second year, kind of a data engineering role. And towards the end of my time at Amazon, I started working for myself and really enjoyed that. And then, toward the very end of my time at Amazon, they offered a voluntary severance package, and I saw that as a way to make my side hustle my full-time job. So that’s what I’ve been doing since early 2023.
John Wessel 02:03
Nice. David, one of the topics I’d really like to talk more about is time series forecasting. We were talking before the show: that’s an area that gets ignored, and it has so many practical applications for a business. So I’m excited to jump into that topic. Is there anything you want to discuss?
David McCandless 02:22
Yeah, I’d be happy to jump into the topic of time series forecasting. Nice.
John Wessel 02:28
All right,
Eric Dodds 02:29
let’s do it.
John Wessel 02:30
Let’s do it. David,
Eric Dodds 02:33
I want to start by giving some context for the types of things that you worked on at Amazon. You had a number of different analytics positions, and Amazon is such a big company, so I’m interested in any specific projects that you worked on.
David McCandless 02:54
Yeah, yeah. So I worked for a part of Amazon called the Employee Resource Center. Amazon is much different than most corporations I’ve worked for, in that they really like to insource things rather than outsource things. Part of that is philosophical: they have a very particular way that they like to run teams. And part of that is practical: a lot of vendors have a hard time keeping up with their scale of growth. So where a lot of corporations I’ve worked for choose to outsource their highly transactional HR stuff, like, “Hey, where’s my W-2?” or, “Hey, I think you messed up my timecard,” Amazon chose to insource that. So I was responsible for time series forecasting for a team of about 500 agents distributed around the world. And those agents served the roughly 1.2 million people that worked for Amazon at that time. That meant forecasting demand, when and how many employees were going to call in, down to a 30-minute interval, so the capacity planning team could plan shifts, and out to the next two years, so they could think more strategically about how big a budget we need to expand to another location to hire agents. That’s crazy. How did you get into that? Had you worked with time series data previously, or? Yeah, good question. So prior to that position, I had taken one course on time series forecasting in my master’s degree. And the experience on the job was actually pretty different from the experience in the classroom. I felt like the classroom was very theory heavy, very much, “Let’s run some statistical tests to test the stationarity of this data.” Whereas when I got to Amazon and was developing forecasts, I won’t say that theory went out the window, but we took a much different approach to time series forecasting.
John Wessel 05:28
So I’m curious about roles like that. Did you take over for someone else? I don’t imagine that would be a greenfield project at that scale. So how was that transition? That seems like a big step from the classroom to, like, wow, there are 500 agents supporting over a million employees. Yeah,
David McCandless 05:52
yeah. So I worked for a really smart guy named Remender. And, oh my, I laugh about this. So like I said, Amazon really likes to insource things. There’s this part of HR called Disability and Leave Services. And this group would handle everything from really run-of-the-mill stuff like, “Hey, I’m having a baby, I need to set up my maternity leave,” to really complex, gnarly stuff like, “Hey, I got in a car accident, I was severely injured, I can’t do my job like I used to. Can you accommodate me so I can still work for Amazon?” So Amazon had outsourced that service, and then they brought it in house. Literally, the week that they brought it in house and went live was March 1, 2020. And so not only is the world turned upside down for this team, but this team was told, “Oh, by the way, this whole pandemic COVID-19 thing, y’all got to handle this. Y’all are tasked with dealing with the HR-related needs of this.” So from the get-go, whenever you forecast with little history, you’re asking for a challenge. But then you add COVID-19 into the mix, and you just want to pull your hair out. Fortunately, I’m bald, so I don’t have much hair to pull out. So anyway, Remender had been wearing this hat along with forecasting for other teams. So a position was created, my position, on forecasting just for a part of Disability and Leave Services. I didn’t invent the wheel. Fortunately, I got to benefit from a lot of knowledge transfer from Remender, and then really focus on forecasting for that team and building out some of our framework. So that makes total sense. That’s pretty wild timing. I’m sure that was.
John Wessel 07:55
I’m sure that was a heck of a ride. That’s like a 100-year low for times to be in forecasting. I can’t think of any worse. Or, depending on how you look at it, maybe a high, because you could develop some skills quickly. But that’s asking for magic. Yeah. To get anything half accurate, it’s like literally magic. That’s fascinating.
Eric Dodds 08:16
David, I’m interested in the tools and methods that you use to take time series data from analysis to forecast. Time series data is interesting. It really can be so helpful, but it’s pretty challenging to wrangle and to get right. Of course, at RudderStack, time series data is one of the core pieces of what we capture and deliver, so we’re really familiar with the use cases. And rearward-looking time series analytics is the most common first use case, right? Like, how is this changing over time? How many users did X or did not do X over the past 30 days? Right? Yeah.
David McCandless 09:06
Describe going from time series analytics to time series forecasting, and what all goes into that from a methodology standpoint, and then even a tooling standpoint. Yeah, sure. I’ll try to work backwards a little bit from the impact: why did we even have people trying to forecast? So we had 500 agents. That’s a lot of money. And Amazon also had very high standards for how long they wanted employees to wait on the phone. They wanted 98% of employees to wait less than one minute before a human got on the line and helped them. Now, I know Amazon catches some shade about how they treat their employees, and, you know, there are gonna be balls dropped. But I really admired the way that they made the budget available to treat employees right in that way. I can remember being on hold for 30 minutes with AT&T and thinking, like, I pay 200 bucks a month, and it didn’t make much sense. So anyway, to have that kind of service level, with 98% of your customers having to wait one minute max, you basically can do one of two things. One, you can staff way too many people. Or two, you can have really accurate forecasts of demand and then translate those accurate forecasts into a really efficient staffing plan. So Amazon did their best to take the latter approach.
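As a rough illustration of the service-level target David describes here (98% of callers waiting less than one minute), the check can be sketched in Python. This is not Amazon’s tooling; the function name `service_level` and the sample wait times are assumptions made up for the example.

```python
def service_level(wait_seconds, threshold_seconds=60):
    """Return the share of calls answered in under `threshold_seconds`."""
    if not wait_seconds:
        raise ValueError("need at least one call to compute a service level")
    answered_in_time = sum(1 for w in wait_seconds if w < threshold_seconds)
    return answered_in_time / len(wait_seconds)


# Hypothetical wait times (in seconds) for the calls in one 30-minute interval.
waits = [5, 12, 48, 59, 75, 20, 31, 8, 15, 40]

level = service_level(waits)   # 9 of the 10 calls waited under a minute
meets_target = level >= 0.98   # the 98% goal mentioned in the conversation
```

Tracking this per 30-minute interval is what makes the demand forecast matter: an interval that misses the target was either under-staffed or under-forecast.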
John Wessel 10:50
How often, then, did you update the plan?
David McCandless 10:54
Yeah, yeah, yeah. So good question. The short answer is, at a minimum, weekly. I would create a forecast every week for the next two weeks at 30-minute increments. And then my friends in workforce management would take that and translate it into how many people we need staffed in each 30-minute increment. But every week we would also refresh what we called a long-term forecast, which was at a weekly granularity, through at least the end of the year. So that was kind of the next level of planning, like, okay, do we need to start recruiting? Or, hey, maybe volume is going down, maybe we need to let some of these temporary employees not renew their contracts. And then at least quarterly, we’re looking a year or two out at demand, and then trying to translate that into a headcount-slash-cost figure. But back to your question of what that actually looks like. Again, I’m trying to focus on the business problem, which is basically trying to provide an excellent experience to customers while doing that in a cost-efficient manner. So our stack was pretty simple. For most of the time that I was there, I would run R scripts on my laptop, and the R scripts would query our Redshift database and get some aggregated time series data. If I’m going to forecast weekly, I aggregate it weekly. That data is coming out of our telephony platform. And then, once it’s in R, I do some minimal cleaning and formatting, and then use a really common library called the forecast package, by the godfather of forecasting, this guy named Rob Hyndman. He has a great free book; if you want to know anything about time series forecasting, look it up. So we could use his package and use a model that’s been around for decades: we could use ARIMA.
And then, voilà, we get a forecast for the next, whatever it is, 14 days at 30-minute increments, or a longer-term forecast at a weekly granularity through the end of the year. And then we would load that back to Redshift. And from there, my friends in workforce management would pick it up and use it to plan shifts. More so for the long-term forecasting, there was also a pretty high-touch process of taking the output, interpreting why there was change, and then sharing that with the business leaders who were responsible for the costs of running the business and the service level goals. And that was honestly the hardest part of the job: explaining to pretty sharp people, leaders at Amazon, who are not maestros in time series forecasting, why there is movement. Like, why do we think demand is going to increase? Or why do we think demand is going to decrease? Or, and this is my least favorite question, why do you think demand is going to increase in October when three weeks ago you told me that it was going to decrease? And like I said, I was forecasting in the middle of a pandemic. COVID-19 was very unpredictable, and how people responded to COVID-19 was very unpredictable. And that was probably the hardest part of the job for me: just giving the best answer I could to the decision makers, but knowing that as soon as it’s released, it’s wrong. It’s a forecast. Yeah.
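The “aggregate it weekly” step in the workflow above can be sketched as follows. David describes doing this with a query against Redshift and then modeling in R; the Python below only illustrates the aggregation idea, and the function name `weekly_totals`, the Monday week start, and the sample rows are all assumptions for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def weekly_totals(interval_counts):
    """Sum (interval_start, call_count) pairs into totals keyed by the Monday of each week."""
    totals = defaultdict(int)
    for start, calls in interval_counts:
        # weekday() is 0 for Monday, so this rolls any timestamp back to its week's Monday.
        week_start = (start - timedelta(days=start.weekday())).date()
        totals[week_start] += calls
    return dict(sorted(totals.items()))


# Hypothetical 30-minute call counts from a telephony export.
rows = [
    (datetime(2021, 2, 15, 9, 0), 40),    # Monday
    (datetime(2021, 2, 15, 9, 30), 55),
    (datetime(2021, 2, 21, 23, 30), 10),  # Sunday of the same week
    (datetime(2021, 2, 22, 0, 0), 30),    # the following Monday
]

weekly = weekly_totals(rows)
```

The resulting weekly series is the kind of input a model such as ARIMA would be fit on for the long-term forecast; the short-term forecast would keep the 30-minute grain instead.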
John Wessel 15:13
Was there a goal? I mean, during the pandemic, I don’t know how you could measure this. But what was an accuracy goal? What was good accuracy during the pandemic, and maybe good accuracy pre-pandemic? Yeah. Post-pandemic? Yeah, yeah, yeah. So
David McCandless 15:30
for our short-term forecasting, so that’s for the next 14 days, if we were off by more than 5%, then we had to give an explanation of why there was a variance of more than 5%. So, you know, if I forecasted 100 calls and there were 94 or fewer, or there were 105 or more, then I had to give an explanation of why that happened. And then same thing with the long-term forecasting: you’re only allowed 5% of unexplained variance. Wow,
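The 5% rule described here is simple enough to sketch in Python. The function name `needs_explanation` is hypothetical, and the conversation does not say whether a miss of exactly 5% counted, so this sketch assumes only misses strictly greater than 5% are flagged.

```python
def needs_explanation(forecast, actual, tolerance=0.05):
    """Return True when the actual misses the forecast by more than `tolerance`."""
    if forecast <= 0:
        raise ValueError("forecast must be positive to compute a percentage variance")
    variance = abs(actual - forecast) / forecast
    return variance > tolerance


# Using the numbers from the conversation: a forecast of 100 calls.
flag_low = needs_explanation(100, 94)   # 6% under forecast
flag_ok = needs_explanation(100, 98)    # 2% under forecast
```

Running a check like this over every interval gives the list of misses that then have to be traced back to a driver.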
John Wessel 16:12
interesting. That’s why. So that must have been the majority of your time. Like, you probably spent a very small fraction of time forecasting and the majority of the time investigating. Is
David McCandless 16:24
that fair? Yeah. So I mean, the actual refreshing of the forecast, that was like a 15-minute exercise. And if I was more savvy, I would have automated all that anyway. I’ve gotten to that point now. But anyway, yeah, the variance explanation took the majority of the time. The good news about that was, you could feed it back into the forecast to make it more accurate. Whenever there’s a big driver of variance, there’s one of two outcomes. One, it’s a black swan event that’s never gonna happen again, like the snowpocalypse of February 2021. And you can at least encode into your training data a dummy variable to say, like, “Hey, there’s this one-time event, we never expect it to happen again.” Or, if you’re really smart, you can file that away somewhere, like in a table of events. And so the next time you think that there’s a snowpocalypse coming, you could say, “Well, hey, back in February 2021, here’s what happened.” That’s most frequently what happens. Sometimes, in a minority of cases, you find a new driver of your time series. You find, like, “Hey, there’s a strong correlation between X metric and my time series.” So then at that point you can say, “Well, great, maybe I should incorporate this into my model, as it makes my model more accurate.” But then you have the challenge of, well, great, for this other thing, is there a forecast available? If it’s something like world population, yeah, there are a lot of forecasts available for world population. But if it’s something like how many thunderstorms there are going to be in Texas in the next three months, your guess is as good as mine. Yeah. One,
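The “table of events” and dummy-variable idea described above can be sketched in Python. Everything named here is an assumption for illustration: the `EVENTS` table, the `event_dummy` helper, and especially the event dates, since the conversation only says “February 2021.”

```python
from datetime import date, timedelta

# Hypothetical events table: name -> (first day, last day), inclusive.
# The exact dates are placeholders, not taken from the conversation.
EVENTS = {
    "snowpocalypse_2021": (date(2021, 2, 14), date(2021, 2, 20)),
}


def event_dummy(days, event_window):
    """Return a 0/1 indicator per day: 1 inside the event window, 0 outside."""
    start, end = event_window
    return [1 if start <= d <= end else 0 for d in days]


# Five consecutive days of training data, Feb 12 through Feb 16, 2021.
days = [date(2021, 2, 12) + timedelta(days=i) for i in range(5)]
flags = event_dummy(days, EVENTS["snowpocalypse_2021"])
```

A column like `flags` would be passed to the model as an external regressor, so the one-time spike is attributed to the event rather than learned as part of the normal pattern. Keeping the windows in a table also leaves a record to consult the next time a similar event looks likely.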
Eric Dodds 18:37
I wanted to get a little bit more into the process of explaining variance. And I want to tie that to one of the topics we’ve covered on the show recently, which is tying data to business results. I couldn’t think of a more direct way of doing that: a huge part of your specific job role was actually explaining variance. You talked through understanding whether a particular variable had a strong correlation with the forecast. But what are some of the big takeaways after repeatedly having to explain this variance to people? Which is often the case, right? If you’re a business leader, you’re probably smart and driven, and that’s why you’re successful, and that’s why you have the responsibility of looking at data and making decisions. But you’re not an analyst, right? You don’t know the ins and outs of the R script that’s querying Redshift, and you don’t know, and shouldn’t necessarily know, what all the individual variables are that are inputs into the forecast. Right.
David McCandless 19:57
So what are some of the lessons that you took away from having to do that repeatedly, or maybe some of the ways that you grew in that over time? Sure, yeah, definitely. A couple of the ways that I grew over time come to mind. One, keeping the narrative as concise and as consistent as possible. Like I said, I worked for a really smart guy, Remender. Remender had a great way of basically saying, “Look, David, whatever drives our volume, it boils down to one of three things. Which of the three things is driving demand here?” Yeah, that’s a little bit of an oversimplification. But 80% of the time, that kind of approach works. And those are encoded in the model: your time series is a function of its own history, plus these other three variables that influence it. So that was one thing, just making the narrative more concise and consistent for consumers of the forecast over time. And they had a really hard job. They were the ones that had to go to clients and say, “Hey, I need more budget to hire more people, and these are the reasons why.” So the more consistent you can make that narrative, the more they’re going to understand it. When you start adding in all these one-off things, it could just confuse them and make it harder for them to do their jobs. So that’s one thing. The other thing was understanding who is going to be the downstream consumer of a forecast. Is this just you, my direct customer, wanting to know what the latest is? Or are you gonna take this to finance and argue for a budget? Because depending on how you answer that, I’m going to treat this forecast differently. Is this a one-time, 30-minute effort, or do I need to spend the next week thinking really hard about this?
John Wessel 22:13
Yeah, I call that a fidelity question, right? Is this a lo-fi, directionally correct thing? Or is this, like you said, “We’re gonna make financial decisions”?
Eric Dodds 22:26
And, you know,
John Wessel 22:27
This is gonna be a problem if we’re wrong. Yeah. Yeah. Do
David McCandless 22:32
you feel like it took some learning to sort of grow there? Like, one thing you said that stuck out to me was, they have a really hard job because they have to go to finance. Is that something you grew in empathy for over time? Or did you sort of know that from the beginning? No, that was something I definitely grew in empathy for over time, especially when I started getting dragged to the meetings with finance. Like, “Okay, well, here’s the guy who made the forecast,” my business leader would say. Yeah, yeah. “David, explain yourself.” Yeah. Right. Yes, especially when finance started holding my toes to the fire. Yeah, I definitely grew in empathy for the leaders over time. And I think that even tied into how I forecasted, in that there’s this trade-off between explainability and accuracy. Now, maybe we could have included another two variables that might have made the model 2% more accurate. But then there are two other things that the business leader has to be able to basically vouch for and explain to finance. So, yeah, the simpler we could keep the model, the easier it was going to be for everybody to work together.
John Wessel 24:04
I’ve got to ask this question, because Amazon’s famous, you know, for the memo. Yeah. Oh, yeah. Is that a whole-company thing? Like, how would you prepare for a meeting? Did you use visuals? Did you have Excel or something? I’m really curious about some really practical things, like what that looks like.
David McCandless 24:25
Yeah. Yeah. So prior to working at Amazon, I worked in AT&T Business. And I don’t know if I could have had any sharper culture shock, going from, like, uber-bureaucracy (I worked at the corporate headquarters on the 15th floor) to Amazon. They say it’s always day one, and they want leaders to be single-threaded owners of their fate. So even though I worked for a company that had more than a million employees, it felt like I worked for a company that had 2,000 employees, because my director, Janelle, had so much power and moved so nimbly. Now, yeah, we had dependencies on other organizations. But it felt like I worked for a company of 2,000, given how fast it moved and how decentralized decision making was. So, back to your question about the memos: yeah, we called them PR FAQs, or press releases and frequently asked questions. And I can remember being frustrated initially. I would have an idea that I wanted to get legs, like, “I think this is a good idea. Let me go talk to some people, maybe write an email about it.” And they would say, “Hey, this is great. But you need to write a doc.” And enough people told me that, that I realized, if I want to get anything done here, I need to write a doc. And so I guess I assimilated into that culture. And then once I did, I was like, “Oh, this is amazing.” If you can just take the time to put pen to paper, and, granted, also get the right people to read your doc, you can innovate, like Amazon has done. It’s not a matter of who makes the prettiest slides or who is the most compelling public speaker in front of a room. So yeah, a very peculiar thing that Amazon does. And even though I’ve left Amazon, I continue writing PR FAQs for different parts of my life.
John Wessel 26:47
Yeah, that’s impressive. So when you were talking about the culture shift, I think you just listed two really practical things: who has the best public speaking skills, and who’s the best PowerPoint slide creator with the best graphic designer behind them, versus the doc, the, you know, memo. I would have thought, sure, that probably makes people flesh ideas out more. But I didn’t at all think about the equalizing factor, right? Yeah. Now, there’s probably some difference in writing quality between people. But that stands out less than a professional, really well done presentation and graphics versus just writing. So yeah. David,
Eric Dodds 27:29
one more question about Amazon. And then I want to switch gears and talk about what we said at the beginning of the call, going from very big to very small, and your work with small businesses. But before we go there, I want to ask one last question on taking data and speaking to business results or business stakeholders. I totally agree with you, and that’s been a learning for me: the more concise the better, right? And I loved your description of accuracy versus explainability, and that there’s a trade-off there. Did you ever face a situation where you had enough conviction to say, “Okay, we actually do need to dig into the details here,” because optimizing for explainability would obfuscate something that was really important? And if so, how did you handle that with the business stakeholder? Because I have to think that’s a really tricky path to walk.
David McCandless 28:34
Yeah, that’s a great question. Yes. So I guess I go back to this: when you’re trying to interpret why a number is moving, it boils down to either it’s a black swan event that’s not going to happen again, or it’s a recurring thing. And then you can try to move in response to that recurring thing moving. But if that happens, then you have to have a forecast for that thing, for that predicting variable. So there was a point where our demand was moving so much in response to something we were aware of, which was basically a change in staffing. And there was so much movement that I don’t think I had to do much arm twisting, because the business really valued accuracy. I had to say, “We need to start predicting based on this variable. Whatever it is, we need to make decisions based on this variable. But the only way this is going to work is if we have a forecast for it moving forward, and I kinda need to get that from y’all.” So there was some shared ownership of the problem and the solution, and that worked well. Yeah, yeah, I love it.
Eric Dodds 30:06
Okay, let’s switch gears. Amazon: over a million employees, you’re forecasting for all of that, gigantic datasets, big decisions. The clients you work with now look very different.
David McCandless 30:20
So tell us about your average customer as a consultant doing data and analytics. Yeah, so my average or median customer has fewer than 100 employees and more than 10, and, I’ll say, has been in business for like 20 or 30 years. For the vast majority of them, I am their only data resource. So you can think of me like a fractional data team for them. Typically, they have some question that they want to be able to answer better in order to better serve their customers, and/or they have some really tedious processes that they want automated. Sometimes those things go hand in hand. And so I’ll work with them for engagements that sometimes last a couple of weeks, sometimes a couple of months, to build an AI solution and get it up and running. And then I’m available afterwards for little tweaks, enhancements, and maintenance.
John Wessel 31:45
Yeah. So on the maintenance side, we talked about this a little bit before the show. I think that’s a really tricky problem in this SMB space, where there’s tons of value in automation. When it works, it’s great. But then somebody will change their API, or something will happen, and it doesn’t mean that you architected anything wrong. You could have made all the right decisions. How does that typically work for you? And do you have a philosophical approach to that problem? Yeah,
David McCandless 32:18
That is a great question. Yeah. I’m thinking about one project that was an automation project and was in testing. It kind of hinged on this one API. We tested with a variety of organizations to see that they played nicely with that API, and we faced no issues. And then as soon as the customer tried to start onboarding real clients to use the solution, the organizations did not play nicely with the API. Sounds right. Yeah, those are lovely. And I don’t know if that completely stopped them from using the solution or just kind of hampered the rollout of the solution. I’m not sure. But I think that a way around that problem is to pursue gainsharing models, instead of the customer just paying for time and materials or paying a fixed bid, because that aligns the incentives of the customer and the consultant to see the solution through to the finish line, close the loop, and make sure there are results. And if the results are not what’s expected, and the consultant is financially incentivized, then I want to do what I can, to tweak what I can, to make sure this delivers results for my customer. I haven’t done any gainsharing models yet as a consultant. I’ve been on the other side of the table as a client. But if there were a silver bullet, that’s the one that comes to mind right now.
John Wessel 34:23
Interesting. Yeah, that
Eric Dodds 34:25
is super interesting. Thinking of gainsharing, what are the types of questions? You mentioned customers with between 10 and 100 employees who have been in business for a couple of decades. What types of questions are they asking? I’m interested in the contrast between the questions they’re asking and the types of questions that are being asked at Amazon.
David McCandless 34:51
Yeah, yeah. So one of the common questions is, what is the profitability of customers? Or really just any kind of financial metrics around a customer. How profitable is this customer? How many resources does it require to service this customer? What’s the lifetime value of this customer? And then, of course, for more SaaS or e-commerce, more startup-oriented customers, you can probably guess: they want to know ARR, MRR, et cetera. But, yeah, that can also start to get into the automation space. Like, okay, great, we’ve built a very robust calculation of all these sales metrics. Now, what if you started automating more of your sales commission process?
John Wessel 35:51
Oh, that’s a good one. Yeah.
Eric Dodds 35:52
Yeah. So it’s like, yes, the insight leads to actual process change or process automation. And so do you do a lot of that work too? I don’t think I can say a lot. But I’ve done that work.
John Wessel 36:08
Yeah, the sales commission one, that’s a sticky one, right? Because that is not just about the data plumbing. If you get into automating something like that, there are gonna be so many opinions. People are like, “Oh, this is a great time to change it.” Yep. That’s sticky.
Eric Dodds 36:24
Yeah, yeah. How are these businesses answering these questions today? Or are a lot of them just not answering them? It seems like it could be on a binary level, like, “We’re not losing money on this customer, so that’s fine, and we’re okay with that even if we don’t know the specific margin.”
David McCandless 36:45
Yeah. Yeah. That’s a great question. It makes me think of something I saw recently: these higher-ups from Tableau, Power BI, and Oracle were showing off their data viz products, and they said, “But the world’s preeminent and favorite BI solution is not on stage right now, and that is Microsoft Excel.” That was at a Gartner conference. And I think that’s true for businesses of any scale. So, yeah, typically, people have got Excel as their glue between systems. And what that means, practically, is they don’t have the reporting that they want as often as they want it. Or maybe they get it at the frequency they want, but it’s riddled with errors, because with Excel it’s just hard to maintain something scalable. Or, for that one employee who knows how to make that Excel file work, their life is like hell, because, you know, what if they have a baby or they want to go on vacation? Right? Let’s say, “Don’t touch the macro.” Yeah.
John Wessel 38:22
It, like, blocks it. You can't send the file anymore because it has a macro in it.
David McCandless 38:27
Yeah, yeah. I find it's common that customers have their SaaS platforms, and maybe their sales platform will give them a figure that they're looking for, like an aggregate. Maybe the SaaS platform gives them MRR, but they want to be able to get more granular, to be able to slice and dice. And if they want to ask questions at that kind of grain, then they might be calling somebody for help.
John Wessel 39:04
And you kind of have to do that, right? Like, if you want to get to drivers and improve, you can't just have a rolled-up number. That just doesn't work from a business standpoint. Yeah, sure. Yeah.
Eric Dodds 39:17
And so can you give us just an example walkthrough of a typical client? Let's say they're 50 employees, and just pick maybe an industry or something that they do. What does the typical engagement look like for you? And one of the things I'd love to hear about is, from a tooling standpoint, if they're using Excel, where do you go from there? You know, SaaS can get really expensive really quickly when you start throwing a subscription service at, you know, that data problem. So how do you approach that for a typical client?
David McCandless 39:47
Yeah, that's a great question. Given that most of my customers are small, it automatically knocks out a lot of enterprise solutions, just because the minimum contract values are so high for those solutions, like Teradata, for example. So I've gotten comfortable with a number of products that offer freemium or pay-as-you-go. One great example of a freemium solution is Retool. So Retool is free, and they have a free Postgres database, I think up to like four or five users. I have one customer who had this quarterly reporting they had to give to the state. Like, it had to get done. This was not just, oh, you know, the board wants some shiny metrics. This had to get done. And it just made the end of the quarter hellacious for them, trying to do all this Excel wrangling. So I built them a Retool database solution that really streamlines and organizes the data entry. And then getting the reports that they need for the state is like a one- or two-minute exercise. So Retool has been great. And then, like I said, pay-as-you-go solutions: my most common data warehouse of choice is Snowflake, especially since they offer a one-month trial up front. I can tell customers, hey, based off of your data volume, I think this is what your bill is gonna be, but we're gonna get one month in, and then I can give you a more refined estimate of how much it's actually going to cost. And then, kind of in the freemium space, it's great that dbt offers one license for dbt Cloud. So if I'm basically the data team, then that model works, and we can have great transformation, orchestration, and even some documentation tools. And then for data viz, most of these customers are on Microsoft, and so the cheapest option for them is just to pay an extra 10 bucks per month, per user, for Power BI.
But if they're not on Microsoft, then there are a lot of other great options where, you know, you just pay for a license, like Tableau. So that's some of the common tooling that I'll use on a project.
Eric Dodds 42:49
Yeah, I love it. I love keeping it simple and scalable. And I think one thing that's interesting about all of the tools you mentioned is that, and maybe this is just my own bias, certainly my own bias, being in the data industry, you don't really think about a 30-person or 50-person company using Snowflake or dbt, necessarily, especially if they've been around for 30 years and they don't have a data team. And so it is really cool to see those tools, and David, you bringing that tool set. Do they end up taking over management of some of those things? I mean, you mentioned maintenance, John, but do you eventually do a handoff to someone internally to run those processes?
David McCandless 43:49
Yeah, I do a handoff to somebody internally, and I also develop such that I don't need to be a part of the loop for the show to go on, like David doing XYZ commands once a month or something like that.
Eric Dodds
Yeah. All right. Well, time for like one or two more questions. John, I'm gonna let you take us out because I've been dominating the conversation.
John Wessel 44:26
Yeah, no, the tooling is interesting, and something you said, Eric, prompted this thought: 10 or 15 years ago (okay, maybe more like 15 years), think about the tools that these enterprise companies were using, like Oracle, Informatica, tools like that. They were not licensed or structured in a way where a small business could use them at all, not even a fraction of it, nothing. It was like a core corporate implementation. And then it's interesting, because you have this cloud computing revolution, right? Yep. And that opens up this really interesting new thing, where you've got this Fortune 500 level, you know, what the top-tier companies are using, and you can actually use it at a small business, whether you're using kind of a fraction of an instance or you have your own little micro instance, basically. That's actually a pretty unique time in history, that that's even possible. And I mean, this is the last five years for data. Data was never done this way. Like, if you were a top-tier company, you bought Oracle, and you paid Oracle more and more money. When your developers made mistakes and wrote bad code, you just paid them more money and the database ran faster, and you could keep it going forever. Yeah, you have equity and it was our problem. But I mean, it's just a really cool time in history where that is all possible.
Eric Dodds
Yeah, I agree. I agree. David, thoughts?
David McCandless 45:59
Yeah, I agree. I really appreciate players that I'm sure make most of their money off of enterprise, like Snowflake. I'm sure that, you know, 80% of their income comes from the biggest 20% of their customers. But I appreciate that they have pay-as-you-go models, so that even if your bill is only $10 per month, you've got the same resources available to you that, we'll say, Apple does. And I think that can act as a great equalizer for smaller businesses. I live in a small town, just 30,000 people. And often, the way I try to explain what I do to people, I'll say, "You like baseball?" and they'll say, "Oh yeah, I love baseball." And I'll ask, "Have you seen the movie Moneyball or read the book?" And most will say, "Oh yeah." And if they haven't, I'll say, well, in short, in the early 2000s, the Oakland Athletics had the third-lowest budget in the MLB, but they finished the regular season with the second-best record. They even had a better record than the New York Yankees, who had a budget three times larger than theirs. And the way that they did that was they were really scrappy in the way that they allocated their small budget. They basically used data to make better decisions. And I think that's encouraging for small businesses, particularly in regions like mine in Louisiana, where we're always losing talent and losing opportunity to our giant neighbor, Texas. So the solution for us to grow is not, you know, to just pour more money and resources on it, because we're never gonna win that way. We've got to do more with less. And I think data, and some of these solutions that we've been talking about, are a way for smaller businesses to do more with less and win the underdog battle.
Eric Dodds 48:23
I love it. Well, David, this has been such a fun episode, talking about insane forecasting at Amazon and the ways that small businesses can use very similar, or many times the same, tooling. So it really has been great. Thanks so much for giving us some of your time today.
John Wessel 48:42
Yeah. Thanks, David.
David McCandless 48:44
Thanks so much for your time.
Eric Dodds 48:45
We hope you enjoyed this episode of The Data Stack Show. Be sure to subscribe on your favorite podcast app to get notified about new episodes every week. We'd also love your feedback. You can email me, Eric Dodds, at eric@datastackshow.com. That's E-R-I-C at datastackshow.com. The show is brought to you by RudderStack, the CDP for developers. Learn how to build a CDP on your data warehouse at RudderStack.com.