Episode 114:

Solving Data Infrastructure Problems at Startups and Enterprises with Max Werner of Obsessive Analytics Consulting

November 23, 2022

This week on The Data Stack Show, Eric and Kostas chat with Max Werner, the owner of Obsessive Analytics Consulting. During the episode, Max gives an on-the-ground view of data tooling by discussing switchboard operators, startups versus big enterprise companies, and the evolution of CDP.


Highlights from this week’s conversation include:

  • Max’s career journey (2:54)
  • Going from a small startup to a big enterprise (11:15)
  • Dynamics of a switchboard operator (17:09)
  • Common threads through different companies (20:53)
  • When data is not the answer (26:57)
  • The evolution of CDP (29:38)
  • Data sources to include in a CDP (35:16)
  • Working with event data (37:19)
  • Max’s take on other tools (41:18)
  • The cutting edge in data (43:09)
  • Building your data company in an evolving environment (49:28)


Find Max: https://www.obsessiveanalytics.com/


The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.


Eric Dodds 0:03
Welcome to The Data Stack Show. Each week, we explore the world of data by talking to the people shaping its future. You’ll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, the CDP for developers. You can learn more at RudderStack.com.

Welcome back to The Data Stack Show. Kostas, today we’re talking with someone we’ve actually known for quite some time, Max Werner. We met him way back in the RudderStack days, and I think they were an early RudderStack customer, but it’s been quite some time now. Anyway, he’s done all sorts of data engineering type work, both at small startups and at huge companies, and he generally has a very smart approach to working with data and systems in general. And he’s not afraid to share his opinion. He currently runs a consultancy. Our listeners have heard past shows with consultants. I love the breadth of view that consultants get, because they get to talk with a lot of different companies, and they tend to have a much more objective, let’s say realistic, on-the-ground view of data tooling, right? Because they’re not building it, they’re actually implementing it. So I’m really interested to know what he sees on the ground and, in particular, what’s most exciting to him in terms of new technologies that are coming out. So that’s what I’m going to ask. How about you?

Kostas Pardalis 1:35
Yeah, absolutely. I think what is especially interesting with Max is that he’s a person who has experienced how things get done at a startup, in a huge company like Warner Bros., and also as a consultant to many different companies of many different sizes. So he’s the ideal person to help us find the common patterns, and also the things that are different because of the size or the type of company. So I think we’re going to have a super interesting conversation with him, and we’re going to learn a lot.

Eric Dodds 2:14
All right, well, let’s dive in and talk with Max.

Max, welcome to The Data Stack Show. This is a great treat for me, because I guess we’ve actually known each other for multiple years now, which is pretty cool. We’ve talked data stuff for a very long time, so I’m excited to have a dear friend here on the show. And yeah, why don’t we start out where we always start out, which is you giving us a background. So you’ve been a practitioner, and now you’re doing your own thing, so give us the story.

Max Werner 2:51
Yeah, I mean, thank you very much. It’s a pleasure to be here. Yeah, it’s been a number of years, like 2017 or ’18, something like that, since we were working together. Back at that time, I was just working at recompete their customer data infrastructure, which at that point included Segment and then included RudderStack. Funny how these things happen.

Eric Dodds 3:18
I do have a quick question, and I know I’m interrupting you, but what was your title back then? I’m interested to know because a theme we’ve had on the show has actually been how roles are changing. And I know what you were doing back then, which we would now call basically a mix between data engineering and analytics engineering, and maybe even marketing ops, but I don’t remember what your actual title was.

Max Werner 3:46
Oh, that’s a good question. So when I started at that company, started there. And segmentation. I think it was performance marketing manager or something to that effect. Wow. I know, right? Because these things tend to always start up under marketing. Yeah. Right. Because marketing articles people to data usually. They’re the ones dashboards and analytics to do their things. It then eventually, as that position grew, and it was your analysts, and you’re selling the T, data engineer, specifically, I became the data operations pitcher.

Eric Dodds 4:29
Okay. That makes perfect sense. Sorry for the interruption. We can return to that. But then I said, we’ll walk back in history. Okay.

Max Werner 4:39
Yeah, yeah. So I did that for them. And as that kind of became a stable status quo, I mean, I was happy to be there, but I got approached by a recruiter to do another Segment implementation. And that was about two weeks after I had just closed on a house. And I tried to tell him it was not a great time to switch jobs. Or when she replied, it’s like, Well, you got your five-year mortgage, schmuck didn’t have that waste the time. So I talked with him a bit more. And that ended up being about a year and a half long, kind of almost like a contractor position to do a second case because that was kind of at the start of COVID. So a lot of companies were hiring, but contractors were okay. And so, I kind of went out there on my own, but it was 40 hours a week, so it wasn’t really kind of doing my own thing.

Eric Dodds 5:48
Sure. And that contract, what kind of company was it?

Max Werner 5:54
That was Warner Brothers.

Eric Dodds 5:55
Oh, wow. Okay. So we’re talking about a serious, enterprise-level implementation.

Max Werner 6:04
Exactly. Right. So whereas before, it was a small-to-medium business, a B2B SaaS company with 100 employees, the work for Warner Brothers was specifically for their video game division. So your Batman, Mortal Kombat, and those kinds of things. Because obviously, those games generate a whole lot of telemetry data, right? Oh, gaming is crazy. Yep. Potential that can happen. Or it can also just be UX, like which are the levels that people keep struggling with, or whatever. And as that contract was starting to near its end, time kind of freed up a little bit there. I was either gonna look for full-time work somewhere again, or just continue doing this kind of thing as a freelancer or as a consultant, whatever you want to call it. And, well, that’s what I ended up doing. So that was about the beginning of last year that I really went fully in on it. And, yeah, business is good. It’s growing. I have a lot to do. I have two employees now, which is a crazy concept. Paying people with your own money is a scary thought. Yeah, it’s all good. And I kind of found, somewhat accidentally, a fit or a niche in the types of clients. Oh, interesting. Yeah. Because, I mean, having done startup and enterprise, where it’s very much B2C, right, directly to the consumer, I found I really liked that the underlying problems really didn’t change all that much. It’s about data collection, about data modeling, and about what you give the end users, right, the marketing team or whoever else, whether it’s entire divisions or your product analytics or whatever. Yeah, exactly. But it’s kind of all the same underlying problems, right? People interact with a property of some sort, be that a website or an application.
And there’s some sort of user funnel there, a way to get them to convert, and you want to see what happens to them after they convert, so are they coming back. So my client base is all over the place. I’ve done used car sales with a company called Canada Drives. An interesting, great project there. And a couple of marketing agencies, or agencies in general, I’ve worked with that build websites or mobile applications for their clients, and that’s about sports, or tours, or hotel booking. It’s completely all over the place. But still, the underlying problems are the same. What kind of stood out with all of them, and that’s kind of the niche that they fall into, is that for the most part they didn’t really have a data stack. Canada Drives is the exception; they had a really big data stack to start with. For the most part, companies tend to not have customer data infrastructure in the way that the four of us here would think about it, right? Yeah, some data infrastructure is there. There’s a SQL database somewhere that powers their application, and then there’s a whole bunch of spreadsheets. They export things from a tool to a spreadsheet, into more spreadsheets. And it’s about kind of moving them from that over to something a little more sophisticated, or as you would call it, kind of the growth stack or starter stack.

Eric Dodds 10:14
Yep. Getting that data layer in super cool. Okay. So one thing that you said, I think is really a really helpful insight, which is that you sort of went from like a small medium, like b2b SaaS, startup, implementing a data layer, getting customer data infrastructure set up, etc. And then you went to Warner Brothers, and you were doing direct-to-consumer gaming, and then you’ve worked with a number of other companies. And there’s this common thread of a similar problem, which I think is a really helpful insight, right? Like when it comes down to data and sort of what you’re doing with that data, the initial problem set is very similar, right, the context may be different. And the particular data points may be different. But you have users interacting, and they need to do X, Y, and Z. And you need to track that to optimize it. I’m interested to know, though, outside of that sort of similar problem set, what were some of the biggest things you noticed, as a data practitioner going from like a small to medium-sized startup, to a company like Warner Brothers? And not even necessarily I mean, in general? Sure. But not even necessarily in terms of the data stack. But like, what were the big things that you took away from making that jump because that’s a pretty big jump?

Max Werner 11:37
Yeah, I’d say the biggest thing that kind of stood out for me is that people in a startup do 10 different things, right? Like, when I was performance marketing manager, I had to do that implementation of Segment, but also manage the Redshift instance and make sure that it doesn’t run out of space.

Eric Dodds 11:59
Performance marketing manager, managing Redshift is hilarious, by the way.

Max Werner 12:03
Yeah, and then you jump over to something like Warner Brothers, and now all of a sudden, you have very specialized people, where there’s an entire team of people, multiple people, whose only job is to maintain their Redshift clusters. And it’s clusters, right, with different kinds of nodes in them for more compute or more storage, and they customize the, what’s it called, the work queue scheduler, or whatever, and tune it to work for their particular kind of workload. The skill set is not as broad, but it’s extremely deep. So there are multiple people whose sole job is to do email marketing for, like, one game. And another team’s sole job is to do in-app purchases and seeing what people are going to buy, or what users are interacting with in game. It’s very specialized. And there’s more red tape, because once you get to a certain size, there’s just more process.

Eric Dodds 13:17
Totally. Okay. So, for you as a data practitioner, though, with the work that you were doing, I would guess, at least by the sound of it, you were probably sending data from the game usage to multiple destinations, to enable email marketing, or to feed a Redshift cluster, or whatever. So did you have to interact with all those teams? What was that like from your vantage point? Because you were sort of working on a layer of the data that touched multiple things that you mentioned.

Max Werner 13:53
Yeah. Yes, it was certainly a lot more communicating and scheduling and coordinating things with each other than it would be at the startup. Even a lot less, it turns often. I wasn’t designing tracking plans, because for the most part, those things were planned, and obviously the game developers are working on these things, so they know what is happening where. So there was some coordination with these kinds of folks, and obviously the Redshift people and the VPN people, because there are, you know, security considerations for this kind of pipeline. And the main focus of the pipelines and stuff that I built there was to integrate the existing different systems that they had with what they teach. Right and then say goodbye to part of that. As you also know, there was Apache Airflow, like multiple axis, sequential tasks, that were building data sets and exporting from Redshift or importing to Redshift. And there was really kind of this almost switchboard operator-type thing: okay, we’re grabbing this from here and this from there, and this needs to go here. Because when you have so many people that are so specialized, the biggest problem that you often encounter is that people don’t know what is available as far as data or even pipelines. Even at a company that size, you could blow people’s minds when you’d say, oh, yeah, we could totally tell you if this person has played this other game. And they were like, you can? It’s amazing, right? It’s like, yeah, we can. Because they are so focused on their own area or niche. And so having people like me there, whose sole job was basically to see, okay, what are your data problems and how do we resolve them with what you have, was interesting.

Eric Dodds 16:12
Super interesting. Okay. I’m gonna ask one more question. And this is going to be kind of a lead-in to hand the mic off to Kostas. The concept of a switchboard operator is really interesting to me for two reasons. Number one, I don’t think we’ve ever heard that on the show. But I think it’s a very helpful analogy because I think that, depending on the role, like if we think about the data team, like a switchboard operator is certainly like a very imperfect analogy to describe like the function of the data team in general. But I do think it’s actually interesting to describe, like part of the value they can provide, as you just expressed. Also, I think it’s interesting because switchboard operator is, from a technical perspective, is you get into the area of like, orchestration. Can you speak a little bit more to that, like, when you think of switchboard operator, like, Where does the human end? And where does the like orchestration begin?

Max Werner 17:16
Well, in an ideal world, the human part ends where it becomes wasteful for a human to do the thing. So that would be actually doing the data exports and your own kind of manual data processing and importing. That’s where you would want to go into the orchestration side of things. Basically, anything where you’d traditionally say, oh, a computer is probably really good at that, because it’s doing the same kind of things based on the same rule sets very, very often, over and over again, that’s where you should rely on orchestration. The human part should really be used for what humans are good at, which is creative problem solving and kind of making these connections. So we have payment data here in Stripe, and things like Intercom for messaging, and people going into this game, or this app, or this website. You really just kind of work with all those people that want to do things and really find out, okay, what do you want to do that you can’t right now? They say, like, oh, okay, I don’t know when my customers are renewing their plans. And so I always have to manually check, or I have to ask our developers and they pull it out of the application database, or I go into Stripe or whatever myself. And I say, like, okay, and where do you work? I live and breathe Salesforce. Okay. So, in other words, if I can get the renewal date, up to date, automatically into your Salesforce account object for the customer, would that help? Yes, that would be great. So that’s the way that switchboard comes in, right? You say, okay, I have all these different kinds of data sources.
And the destination tools tend to be both, and you say, okay, we can bring that over here, helping them do their jobs. Because then they start thinking about other things they can do with it, like, oh, if I have this, then I can build workflows off of that. I can suddenly get that renewal date into my marketing automation to say, hey, you’re three months out from renewal, and we put them into a certain cadence that goes out saying, hey, our prices are going up, renew now and you lock it in, or whatever the case may be. And that’s, I think, where the human part adds value, because Ctrl+C, Ctrl+V in spreadsheets might be necessary at times, but it shouldn’t be people’s entire job all day, every day.
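
To make Max’s renewal-date example a bit more concrete, here is a minimal Python sketch of the kind of rule a computer can apply over and over. It is only an illustration under stated assumptions: the record shape, the field names, and the 90-day notice window are all hypothetical, and no real Stripe or Salesforce API is called.

```python
from datetime import date

# Hypothetical billing records, shaped loosely like a payments export.
subscriptions = [
    {"customer_id": "acct_1", "current_period_end": date(2023, 1, 15)},
    {"customer_id": "acct_2", "current_period_end": date(2023, 6, 30)},
]

def build_crm_updates(subs, today, notice_days=90):
    """Turn billing records into CRM account updates (field names are made up)."""
    updates = []
    for sub in subs:
        days_left = (sub["current_period_end"] - today).days
        updates.append({
            "account_id": sub["customer_id"],
            "renewal_date": sub["current_period_end"].isoformat(),
            # Flag accounts inside the notice window so marketing automation can act.
            "renewal_soon": 0 <= days_left <= notice_days,
        })
    return updates

updates = build_crm_updates(subscriptions, today=date(2022, 11, 23))
for update in updates:
    print(update)
```

In a real pipeline, an orchestration tool would run something like this on a schedule and push each update to the CRM, so nobody has to copy renewal dates by hand.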

Kostas Pardalis 20:03
Yeah, absolutely. By the way, Eric, I think there’s at least one data-related company with the name Switchboard.

Eric Dodds 20:12
Is there really? I didn’t—

Kostas Pardalis 20:13
I mean, I don’t know if it’s still out there, but there was like a couple of years.

Eric Dodds 20:19
Oh, interesting. We should start domain squatting either way.

Kostas Pardalis 20:26
Oh, yeah. I think we can make money. 100%

Eric Dodds 20:31
Probably one of our listeners is already doing that.

Max Werner 20:35
I mean, even, you know, if you look at the main overview of something like the RudderStack or mParticle or Snowflake dashboard, what’s the first thing you see? You have sources, you have destinations, you’ve got all kinds of blinking lights going. That’s your switchboard.

Kostas Pardalis 20:50
Yeah. 100%. So, Max, you have seen many different environments, right, from a startup to Warner Bros. to running your own business and helping other businesses implement their data strategies. What was surprisingly common for you across all these different sizes of companies?

Max Werner 21:19
Yeah, surprisingly common. Okay, especially with the clients that I’ve been working with since being a full-time consultant, I’m kind of dealing with these, I tend to call them, free-to-paid migrations: we have spreadsheets and Google Analytics. The most common thing, which was not surprising, was people saying, well, but it costs a lot of money to run this, the tools cost money, and what I have right now is working, and it’s free. Because for some reason, a lot of people tend to think that way. But the three people that you have, or four, or 10, who spend a non-insignificant amount of time importing and exporting spreadsheets, you’re paying those people, and they could be doing something that adds value, because moving data from A to B doesn’t add value. Making decisions based on it being in that state, that can. So that was really very common, and it didn’t surprise me as much. What really surprised me was that a lot of companies make incredibly important decisions, for everything from product development to marketing spend to even hiring decisions, like, oh, we need more support people, on incredibly, not even imperfect data, but basically data that has the accuracy level of throwing a dart at a board. Right? The underlying problem there is, to some degree, data literacy and web literacy, where they go, like, oh, what do you mean ad blockers make it so that Mixpanel, or Google Analytics, or whatever isn’t working, or that privacy features mean I don’t see that this person did this thing? And they think they make good, data-driven decisions, like you’re supposed to. Then they move on to things like a CDP and take some precautions to make sure that they actually are capturing data accurately and completely. And all of a sudden, their funnels look drastically different.
They’re like, oh my God, we’re going to spend a million dollars over the next six months in this area to develop this out, and nobody’s using it, whereas everybody is over here. Or the product decisions are based on anecdotal evidence from the support reps. That’s the vocal minority problem: a few people complaining very loudly, saying, oh, this is the worst thing, it’s the most important thing to fix right now. And you look objectively into the data and, yeah, okay, only a small percentage of my revenue accounts have expressed that as a problem.

Kostas Pardalis 24:28
Yeah. And how long does it take for a company to start figuring out that something is wrong? Because there is something wrong, right? I mean, okay, we are using data to make decisions, but if the data is wrong, I guess, for how long can you go on like that? At some point something will go wrong. So how long does it take, and what do you find is companies’ first reaction?

Max Werner 24:58
Good question. It takes longer than you’d want or than you’d like, because even when they start feeling a pain point, right, and it’s like, oh, my performance marketing analysts are so busy all the time, they say, okay, yeah, that’s just the way it is. It usually takes them even longer to realize that, well, maybe instead of hiring another analyst and another analyst, which is really just kind of cloning people to do the same thing, there may be a better way. That is something that takes very, very long for companies to realize. You’d think it’d be this big thing, oh, we did something horribly wrong, right, this big catastrophic event, and now we need to reevaluate. No, it tends to be, at least from my experience more on the SMB side for the clients that I work with, just this long, drawn-out process where at some point they say, we’re doing things with data, we’re collecting data, but I might be doing it wrong, because it feels a lot more painful than it should, especially when they look at the marketing materials for analytics tools. Or even if you look at a demo data set that Mixpanel gives, right, it’s this detailed, beautiful thing, and they’re like, oh, we don’t have that. Why not? Those tend to be the click moments. So when they start to look at potential solutions and then compare the demos, or kind of the good state of things, with what they currently have, they realize they have an issue.

Kostas Pardalis 26:57
Have you ever had the experience of going to a customer and reaching the point where you have to tell them that data is not going to help with what they’re doing? Because we tend, especially those of us who are working with data, that’s our job, we take it for granted that data is something super important, but data is not always able to give you the answer that you’re looking for. Right? So have you ever experienced that as part of consulting, having to make the customer realize that, no, data is not going to give you the exact thing that you’re looking for?

Max Werner 27:49
Yes and no. I mean, on the one hand, no, because my consulting practice is built around customer data infrastructure, so telling people data is not the answer is not exactly the best sales pitch for me. From what I have seen, where that kind of comes in is when people want to prove negatives, right, which is inherently impossible. So that’s the thing that data can’t help you do. And really, when it’s, oh, we want to see, especially, the why behind something. It’s like, oh, I have a great customer funnel, and we want to know not just the where, which is the easy thing, where do they drop off, but why are people dropping off there? And how can our event tracking tell us that? Those kinds of whys are something that data really can’t tell you. You could get experts in, from UX or whatever, who look at that step and say, that’s probably where you’re losing them. And you can run a test to confirm whether or not that’s the case. But that’s kind of the closest, as far as where data doesn’t give you the why something is happening.

Kostas Pardalis 29:03
Yeah, it makes total sense. All right. So you’ve been mentioning customer data platforms. The concept of the CDP has been around for a while, right? And it has gone through a few iterations, and from what it seems, there are more coming. Can you take us through the evolution of the CDP? Let’s start with that, and then I have a few more questions specifically about CDP infrastructure.

Max Werner 29:37
Yeah, I mean, if we go really far back, let’s say the early 2010s, customer data platform really wasn’t a big term yet. But what you had a lot of was these third-party data platforms, right, where you have your tracking pixel or whatever on the website. And then, with third-party cookies and everything, it tries to get other data about your visitors, right, they buy, say, the MasterCard data set and tie it in, and then it will tell you, like, oh, the income range or whatever for your customers or visitors is in this range, and you try to kind of do advertising based on that. I think that’s kind of where that really started. So it was really third-party based, and it moved slowly into more first-party platforms, which were usually just a SaaS tool of some sort that all of a sudden has a concept of your user data and user attributes. So instead of just tracking events in Google Analytics, a click or a conversion, now all of a sudden I work with users individually: this specific user has actually done this, and I change my website or my app accordingly. So that’s kind of where it moved to. And I think now there are kind of two schools of thought. One is a little more open approach, where we say a customer data platform is more of a really good way to ingest and export data. We talked before about the orchestration side of things. And the more closed approach would be: give us the data, we’re a silo that contains all your data, and we can do all of these computations for you. You can build these user lists inside our platform, based on your first-party data, and do everything inside our tool, be it product analytics or advertising, and the CDP itself stores your data.

Kostas Pardalis 32:07
And what would you describe as, let’s say, a reference architecture for the CDP today, like an ideal CDP infrastructure, for an SMB, let’s say? Because I guess these things probably also change with the size and the complexity of the organization that has to implement them. But let’s consider an SMB that’s pretty agile. They’re just starting, so they don’t have something already in there, and there’s a lot of freedom in choosing what to use. Tell us a little bit about that.

Max Werner 32:44
Yeah, the core requirement for that kind of company, for a CDP specifically, would be, let’s say, your standard instrument-once approach, right? Data collection is instrumented via the CDP, and it handles your destinations, right? Perhaps it also handles client-side delivery of things, so if we’re talking in the context of a website, you don’t have to make sure yourself that your Intercom knows that this person has logged in and is, you know, Joe Smith, as opposed to random anonymous user X. That’s kind of the core piece of it. Beyond that, it really depends on the complexity requirements: do you need a data storage solution like a data lake or a data warehouse, or do you need to have complex data transformation abilities inside the CDP? Especially if you’re just building something new from scratch, odds are that data can influence the architecture of that product, or whatever that SMB is building or selling. Which means that you can have it built with data in mind. And not from a perspective of data-driven decision making, but making sure that the way the databases are structured makes it easier for you to get access to more data. Then you don’t need that much, actually, in the CDP, right? Because if you build it right from the source, your CDP needs to do less work, whereas at an existing company, the CDP might have to do a little more heavy lifting, because you get what you get based on how the application is built.

Kostas Pardalis 34:51
Yeah, yeah. So outside of, okay, the most basic and important piece of information, which is the user interactions themselves that you need to track, what other data sources do you think are really important to include in a CDP setup as soon as possible?

Max Werner 35:16
As soon as possible, I’d say data storage. Because you can do that as dirt cheap as you need it to be, right? With something like an Amazon S3 data lake, even for hundreds of thousands of events per week, it’s going to cost you a few dollars. So that’s not a big investment, and it’s easy to set up. And that way, you can immediately start collecting this nicely standardized-schema data for all your event data collection. I think that’s one of the first things, because you can always grow from that later if you want or have to. Like, I have complex data modeling requirements? Okay, put Redshift on top of your S3 data lake, or Snowflake, or whatever. It’s the same if you’re on Google Cloud; BigQuery can support bigger things as it grows. But I think that’s kind of the first thing that you really want to get into once your core user interaction tracking is covered. The next thing after that is what I would call catalog data, which is setting up your ETL pipelines for, well, your catalogs of data in the other tools that you use: your CRM, your billing system, your ticketing system. Usually, unless it’s a very homebrew or obscure tool, there’s a variety of ETL providers that can just take whatever the state of that tool is and move it into that warehouse or data lake. And then you can perform some analysis on your tickets, or whatever, that you couldn’t do before.
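
As a rough illustration of what “collecting standardized event data into cheap storage” can look like, here is a small Python sketch that groups events into date-partitioned, newline-delimited JSON files. Everything in it is an assumption made for illustration: the event fields, the `events/<type>/dt=<date>/part.jsonl` key layout, and the function names are hypothetical, and the actual upload to a bucket (for example with an S3 client) is deliberately left out.

```python
import json
from collections import defaultdict
from datetime import datetime

def object_key(event, prefix="events"):
    """Build a date-partitioned key such as events/track/dt=2022-11-23/part.jsonl."""
    ts = datetime.fromisoformat(event["timestamp"])
    return f"{prefix}/{event['type']}/dt={ts.date().isoformat()}/part.jsonl"

def to_partitions(events):
    """Group events by key and serialize each group as newline-delimited JSON."""
    groups = defaultdict(list)
    for event in events:
        groups[object_key(event)].append(json.dumps(event, sort_keys=True))
    return {key: "\n".join(lines) + "\n" for key, lines in groups.items()}

# Hypothetical events in a fixed, standardized shape.
events = [
    {"type": "track", "event": "Signed Up", "user_id": "u1", "timestamp": "2022-11-23T10:00:00"},
    {"type": "track", "event": "Logged In", "user_id": "u1", "timestamp": "2022-11-23T11:30:00"},
    {"type": "track", "event": "Logged In", "user_id": "u2", "timestamp": "2022-11-24T09:15:00"},
]

partitions = to_partitions(events)
for key in sorted(partitions):
    print(key)
```

Partitioning by date is one common design choice for a data lake, because a warehouse or query engine layered on top later can then skip the days it does not need.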

Kostas Pardalis 37:00
Yep. Makes total sense. And you mentioned the term complex modeling a couple of times. So what kind of processing and what kind of modeling do we usually have to do over this event data? Because, okay, event data is not the most complex type of data out there; it doesn’t sound like you have a schema that’s super complex. But I’m assuming that there are a couple of things that need to happen on top of this data that require quite some processing. So what are the first steps in turning this raw event data into something that is more, let’s say, useful for someone who wants to do analysis?

Max Werner 37:54
Yeah, for sure. I mean, I tend to classify these into three buckets, and I call it bronze-, silver-, and gold-level data. Bronze is your raw sewage. That’s the data that you’re pulling out of some legacy system, or if you’re just ingesting a webhook from some tool, right, where you don’t do any processing. Silver would be more standardized things. So that’s something that RudderStack, for example, throws into your data warehouse, right? It’s very standardized: every property of my track call becomes a column in the table of the same name. And that is good. And ETL data falls under the same category: you just get a certain fixed schema. Well, you can run analysis on it, but generally it’s extremely long-winded. The gold data is what I would call things that you can directly take: you can present that to the board, you can hand that to the executives, right? Those are things that power the dashboards and scorecards for leadership. They don’t care where it comes from; they want to know, “Okay, last week, I got this many people into the funnel from these different dimensions.” So the modeling there is not necessarily that complex, right? It’s often just doing some basic trait computation. Has a user interacted with a certain feature that you’re tracking in the last seven days? Have they logged in in the last seven days? It’s like making these booleans. Or, what’s the average dollar amount that the user spent on their purchases? Those are not complex questions to answer, but they’re not necessarily readily available from just your silver data. So it’s these kinds of queries that you build. That’s the bulk of where your more complex data modeling is. I mean, it goes way up from there if you have complex identity resolution problems, but usually what you start out with is just this:
yes, this user has interacted with this feature. Then take that to the account: what’s the feature adoption rate for feature X on this account? Easy math for SQL.
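
Editor’s note: the trait computation Max describes (a per-user yes/no flag over the last seven days, rolled up to an account-level adoption rate) can be sketched roughly as below. This is only an illustration, not Max’s actual setup: the `events` table, its columns, the `feature_x_used` event name, and the sample rows are all made up, and SQLite stands in for whatever warehouse is actually in use.

```python
import sqlite3

# Hypothetical "silver" table: one standardized row per tracked event.
# All names and rows here are assumptions for illustration only.
EVENTS = [
    # (account_id, user_id, event, event_date)
    ("acme", "u1", "feature_x_used", "2022-11-20"),
    ("acme", "u2", "login", "2022-11-21"),
    ("acme", "u3", "feature_x_used", "2022-10-01"),  # outside the 7-day window
    ("beta", "u4", "feature_x_used", "2022-11-22"),
]

GOLD_QUERY = """
WITH user_traits AS (
    -- One row per user with a 0/1 trait: used feature X in the last 7 days?
    SELECT
        account_id,
        user_id,
        MAX(CASE
                WHEN event = 'feature_x_used'
                 AND event_date > date(:as_of, '-7 days')
                 AND event_date <= :as_of
                THEN 1 ELSE 0
            END) AS used_feature_x_7d
    FROM events
    GROUP BY account_id, user_id
)
-- Roll the user-level trait up to an account-level adoption rate.
SELECT
    account_id,
    COUNT(*) AS users,
    SUM(used_feature_x_7d) AS adopters,
    ROUND(1.0 * SUM(used_feature_x_7d) / COUNT(*), 2) AS adoption_rate
FROM user_traits
GROUP BY account_id
ORDER BY account_id
"""


def feature_adoption(as_of):
    """Return (account_id, users, adopters, adoption_rate) rows as of a date."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events "
        "(account_id TEXT, user_id TEXT, event TEXT, event_date TEXT)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", EVENTS)
    rows = conn.execute(GOLD_QUERY, {"as_of": as_of}).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in feature_adoption("2022-11-23"):
        print(row)
```

The resulting rows are the kind of “gold” output that could feed a dashboard directly, without anyone downstream needing to know about the raw events.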

Kostas Pardalis 40:40
Do you see the need for using something outside of SQL for that, like using Python, for example? What’s the language that you prefer to work with? We also see these days more and more tools that are a little bit more hybrid, like notebooks where you can mix Python together with SQL. So you have a lot of flexibility there. But from your experience, what’s your preference? Because at the end, it’s also a matter of preference.

Max Werner 41:16
Yeah, for sure. I tend to try and stay away from notebooks. Not that I dislike them; they absolutely have their use case. For my customer base, it’s often SQL, because at the end of the day, we want to have these finished datasets that we can plug into visualizations, or that we can hook up to reverse ETL to send them to other CRM or ticketing tools, and output as standard tables and stuff like that. That’s something that is just super, super easy in SQL. And as I just said, the complexity is generally not immeasurably high. So taking a Python notebook and going into pandas DataFrames is just overkill at that scale.

Kostas Pardalis 42:04
Yeah, yeah. Makes sense. All right. My last question (before I give it back to Eric): we used in our conversation quite a few terms that are usually associated with what we call the modern data stack, and this idea has been around for a while now. So I’m sure you’ve seen many different products, different techniques, methodologies, best practices. I’d like to hear from you what, based on your experience, is working and here to stay. And in addition to that, where are you excited that more stuff is coming? Where are you expecting to see more innovation happening?

Max Werner 43:03
Right. I mean, on the modern data stack? Yeah, the buzzword terror. I think the worst aspect of the marketing associated with the modern data stack is this: the modern data stack is a holistic approach to handling your data. It’s not a tool. It’s not RudderStack; RudderStack alone doesn’t make a modern data stack. It is part of one, because you need to have good event data ingestion and transmission capabilities. But it’s not just the tools. So when tools present themselves as, like, “Oh, our tool is the modern data stack, it does everything with data that you could possibly need,” I get a little doubtful there. Because the more complex the toolkit, the more use cases it tries to cover, the more the “Jack of all trades, master of none” problem tends to happen, where your use case doesn’t work because the graphical user interface of this tool doesn’t let you click this certain thing. So here’s where I want to see things going. I see a lot of things going toward using existing standards and open practices. If it’s for data modeling, SQL; or if it’s a little more heavy on cleaning and sanitation, yeah, notebooks, open Python libraries. That’s great, because there’s a swath of people that have the skill. They know JavaScript or whatever your tool of choice is, and you’re not necessarily locking people into a proprietary tool or, worst case, the vendor’s own programming language. If we look at something like Salesforce, where you have Apex: yeah, it’s Java. Okay, you may not call it that, but it’s Java with some extra bits; it’s still its own thing, and it’s very closed off. Whereas dbt on the other end, yeah, you build your data models with SQL, because it’s data and you can transform it with SQL. That’s what makes sense.

Kostas Pardalis 45:38
Right. I agree with you. Actually, I was thinking there is a part of, let’s say, the data industry and its technologies that is super open, like anything that has to do with databases, usually. Okay, outside of Snowflake, most of the database systems out there are open source, right? There’s this tradition that, okay, you can’t really go to market with your database without going through open source at the beginning. But I think there are also many other parts of the data stack that feel extremely closed source, let’s say. You see so many products out there that are not open source, and there are a couple of whole categories of products that are actually closed source. And I’d love to see more openness too. I think it’s also important as the developer experience becomes more and more of an important aspect of building a product, because it’s not just users that you have interacting with the product; you have developers also. So yeah, I hear you, and I totally agree with you: the modern data stack needs to be a little bit more open, I think, at least in terms of the source code.

Max Werner 46:55
Open tooling and open standards. Not to bash Salesforce too much; it’s a great tool. But a very common, standard concept for interacting is webhooks, which are super, super useful. Salesforce can do webhooks, but it’s very complex to set up, and by default, unless you add extra third-party apps into your Salesforce, it’s XML only, which was great 10, 15 years ago, but not great when all your other tooling these days either requires or just works better with JSON. So yeah: openness, interoperability between tools. I mean, that’s my thing, right? Customer data infrastructure. That’s what my company does: making sure data goes from A to B. So the more open these tools are, especially for getting data in and out of these tools, the more likely it is that they’re easy to use. This is my biggest wish for the industry.

Kostas Pardalis 47:56
Yeah, makes a lot of sense. Eric, what do you think?

Eric Dodds 48:01
Well, I’m going to shift the focus for the last question, because what we are discussing is the tip of the iceberg in the conversation around how data companies make money off of artificial scarcity. And that is a whole other episode, and we are really close to the buzzer, but I have a lot of very strong feelings about that. Right? And that’s why Salesforce’s API hasn’t changed, right? There are very, very lucrative reasons for that. But that subject aside, which we definitely need to tackle on a future episode because it’s such a good one.

Max, in our last few minutes: this has been such a helpful conversation. Let’s look to the future. So you’ve given us a really good picture of the current state, the challenges, and ways that you’re solving them. You mentioned open standards, but could you get a little bit more specific and talk about maybe some new technologies that you’ve seen, or even ideas that you’ve seen, that you think are going to create not the next SQL optimization, but the next stepwise change in the way that you would advise companies to build their customer data infrastructure, something that’s pretty different from what you’re doing today?

Max Werner 49:27
Especially for customers, okay. I mean, there are two parts. So the first thing, as far as a comparison of more closed versus more open standards, would be if you take something like Zapier and compare it to a company called Pipedream. In principle they do similar things, right? A trigger, some data changes, sending something off somewhere, right, this kind of stuff.

Eric Dodds 49:51
That is interesting. Conceptually, they’re both almost identical in functionality, as like a pipeline.

Max Werner 49:59
Yeah, but they work very differently. In Zapier, you have extremely little control, and you can’t really change your data. It’s, “Oh yeah, you can create a conversation in this tool when this incoming thing from the other tool happens,” and it’s very one-to-one. And Pipedream (yes, Pipedrive, that’s a CRM), Pipedream works similarly, except that you can just define as many steps as you want, and you can mix it all and have pre-built data transformations. “Oh, now I want to run a Node.js function. Now I want to run a Python function after this.” Based on certain outcomes, it can also branch off. So it’s a similar concept, but an extremely different approach when actually working with it, with one being a little more open, because you can just do a lot more with basic things, whether you’re doing JavaScript, Python, whatever your choice is.

What I would advise clients to do is build your thing, your application, your site, your web app, whatever, with data in mind. And then, I mean, it’s easier said than done, but basically, don’t mix and match things into the same database table; do try and separate out your concerns, right? Because there’s going to be someone that has to work with that data that isn’t the application code itself, right, be that an ETL process that has to get some things out of there, or some other kind of data processing or data consumption. It might make your application a little more complex, because you might have to work with it in terms of calling an API, like my own API for my tool, but then that’s a standard thing that can be reused.

So one example for that (I know we’re getting to the buzzer, and I’ll make it short) is, let’s say a company has a very manual process of doing sales, right? Somebody emails them and says, “Hey, I want to place an order for this.” They go into their payment processor, create an invoice, and email that off to the customer. At some point, it’s marked as paid in Stripe or whatever. And they go into their admin back end and put in a shipping order or whatever it is. Well, if you build it so that creating the shipping order or making the invoices works on this kind of internal API, you can use something like Pipedream, where Stripe can send a webhook: you listen for invoices being marked as paid, and you send that on to your own company’s API. And now this person that had to manually go into five different tools to make a purchase order happen doesn’t, and they can focus on, I don’t know, better customer service or upsells, where they can potentially get you 10 times the revenue, because they don’t have to manually process data so much.
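
Editor’s note: the flow Max outlines (the payment tool sends a webhook when an invoice is paid, and a small handler forwards that to an internal API that creates the shipping order) might look roughly like the sketch below. Everything in it is hypothetical: the payload shape, the `invoice_paid` event name, and the `create_shipping_order` function are stand-ins, not Stripe’s or Pipedream’s actual formats or APIs, and an in-memory dictionary replaces the real admin back end.

```python
import json

# In-memory stand-in for the company's admin back end (hypothetical).
SHIPPING_ORDERS = {}


def create_shipping_order(invoice_id, customer_email):
    """Hypothetical internal API: create one shipping order per paid invoice."""
    order = {
        "invoice_id": invoice_id,
        "customer_email": customer_email,
        "status": "queued",
    }
    SHIPPING_ORDERS[invoice_id] = order
    return order


def handle_payment_webhook(raw_body):
    """Handle one webhook delivery from the payment tool.

    The payload shape ({"type": ..., "invoice": {...}}) is assumed for this
    sketch and is not any real provider's format.
    """
    event = json.loads(raw_body)
    if event.get("type") != "invoice_paid":
        return None  # not an event this workflow cares about
    invoice = event["invoice"]
    if invoice["id"] in SHIPPING_ORDERS:
        return None  # the same webhook may arrive twice; do not duplicate orders
    return create_shipping_order(invoice["id"], invoice["customer_email"])


if __name__ == "__main__":
    body = json.dumps(
        {
            "type": "invoice_paid",
            "invoice": {"id": "inv_1", "customer_email": "a@example.com"},
        }
    )
    print(handle_payment_webhook(body))
```

The point of the design is the one Max makes: once the shipping order sits behind an internal API, any workflow tool or script can call it, and nobody has to re-key the order by hand.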

Eric Dodds 53:39
Yeah, yeah. That’s so helpful, Max. And I think, really, when we think about building some sort of digital experience, if I had to summarize what you’re saying, it’s that you’re building it for an end user, who’s your user and ultimately your customer, but you’re adding in a layer of thought that you’re building for an additional user, which is the internal consumer of the data that the application produces. Right? And really, there are probably, no, not probably, there are definitely multiple internal consumers of that data: marketing, or product, or customer success.

Max Werner 54:20
That’s a great way of putting it, right? Because the business models have changed, right, over the last few years. That’s what we talked about before, Kostas: you were going with third-party data, then it slowly transitioned into first-party. Now a lot of companies just have a lot more first-party data, where consumers of that data are not just the customers themselves, but these internal folks. You can arguably only care about making it work for your customers and have a horrible or time-consuming experience at work that’s inefficient, or that leads to staff retention problems, or onboarding takes six months because somebody has to explain three pages of, like, “To make this work, you’re supposed to log into this tool, and then you export this, and you take out A, B, and six.”

Eric Dodds 55:22
AABB42. But the other thing I would argue, and this is really a great point to end on, to use your words, at the buzzer, is that this ultimately results in a bad experience for the end user. Right? There may be initial gains there. But ultimately, the people who are trying to make that experience better are pretty limited because they have limited visibility. Max, this has been such a great show, as always. Tons of insights. Congratulations on having employees and running your own thing. Which, actually, before we sign off: if we have listeners who want to talk with you about anything, maybe some work, but also anything you talked about in terms of data flows or solving problems, where can they find you?

Max Werner 56:09
Well, you can find me at obsessiveanalytics.com. You can get in touch with me there; most things exist there as well. Obsessive Analytics: it’s what I do.

Eric Dodds 56:25
Right, obsessiveanalytics.com. Well, Max, thank you so much for your time, and we’ll have you back on again soon.

Max Werner 56:31
Sounds great. Can’t wait.

Eric Dodds 56:33
Kostas, I think one of my big takeaways is the switchboard analogy. I know we talked about that on the show, and I know I said during the show that I liked it. But I just thought that was a really helpful analogy in terms of describing a unique way that someone on a data team can provide value. Sure, they’re making connections between technologies. But the intuition for the right connection to make, depending on the circumstance, can play a big role. All analogies are imperfect, and I’m sure that one’s been used before, but I hadn’t heard it before. And for me, it was just a really great description of one of the value-added parts of a data role.

Kostas Pardalis 57:23
Yeah, well, from my side, the super interesting thing is how common some terms and some issues are, regardless of the size of the company. And I think that’s very exciting for the industry, because it means there’s probably a way to address problems that is scalable, from a startup up to a large enterprise, in a way. So that’s what I’m going to keep from this: something that obviously excites me personally, a lot, as part of this industry and working in it. So yeah, that’s what I’ll keep. I’m looking forward to having him back. And to be honest, I’m pretty sure that’s the good thing with consultants: they always have new stories to share. So let’s have him back on the show again.

Eric Dodds 58:21
Let’s do it. And yeah, thanks for listening. As always, subscribe if you haven’t, and we will catch you on the next one.

We hope you enjoyed this episode of The Data Stack Show. Be sure to subscribe on your favorite podcast app to get notified about new episodes every week. We’d also love your feedback. You can email me, Eric Dodds, at eric@datastackshow.com. That’s E-R-I-C at datastackshow.com. The show is brought to you by RudderStack, the CDP for developers. Learn how to build a CDP on your data warehouse at RudderStack.com.