In this bonus episode, Eric and Kostas talk shop around the wide world of data.
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack, visit rudderstack.co
Eric Dodds 00:06
Welcome to The Data Stack Show Shop Talk, where Kostas and I talk shop. I know, it’s an extremely creative name. And this week, Kostas, I think it’s your turn to bring the question, which I have not heard in advance. So what’s been going through your head?
Kostas Pardalis 00:25
Yeah. I’d like to ask you, Eric, about your thoughts regarding the future of the customer data platform. Wow. Yeah. I want to know what’s going to happen.
Kostas Pardalis 00:43
You are the perfect person for this. And,
Eric Dodds 00:48
Okay, I’m gonna respond. Initially, I’m going to respond with a question. Do you have a definition in your mind for customer data platform? Like, what does that mean to you? Or is that part of your question to me?
Kostas Pardalis 01:08
I mean, I have something in mind. Do you want me to share it first? Like, is this your way to deflect?
Eric Dodds 01:12
No, I’m happy to answer. But I’m trying to use the Socratic method, because it seems very appropriate.
Kostas Pardalis 01:22
Yeah. Yeah, I think when we are talking about a customer data platform, we are talking about some kind of data infrastructure that has two main components. One is, let’s say, more like a database system that manipulates, with a good developer experience, the data associated with user activities. That data has some unique dimensions, some unique ways of working with it, and some very specific processing that you want to do on it, right. So that’s one thing, which is the more technical side, the more data engineering side of things. And then you have everything that has to do with: okay, sure, we did that, then what? How do we use this data? How do we expose this data to the right audience, and how can the marketing teams, or the product teams in general, work with it? That’s a completely different set of functionality. I don’t think a CDP, in the general case, needs to implement both to be considered a CDP, in my mind at least. You can have different flavors of a CDP. But in the end, I think a company will need both sets of functionality if they want to extract value out of their customer data. So that’s what I have in mind when I think about a CDP.
Eric Dodds 03:06
Yeah. Yeah, that’s a great question, and something I think about a lot. Obviously, at RudderStack, we’re building products that, you know, sort of directly answer that question. But I’ll try as much as possible to put my objective hat on and answer as a user of these products, because I’ve used a lot of them over the years, and I’m fairly familiar with them. So I think there are a couple of things. I think we’re already at the very early stages of the first phase, which is moving to some sort of central data store as your primary repository for all of this data, right. And I say we’re really early because there are still tons of customer data platforms that are doing really well as businesses. And there are really good products that manage this customer profile store in their own systems, right. They basically have their own set of databases, they store a copy of the data, and they operate on it and can do a bunch of different stuff. But I think increasingly we will see that customer profile store move to some sort of data store owned by the company that’s trying to collect all of its customer data. Currently, the data warehouse seems to be the predominant pattern here, which makes a lot of sense. I mean, there are a lot of really great things about data warehouses, but I think we also have to keep in mind the sheer practicality of it: they were already there. They were already being used for a lot of different things. And so, literally from a convenience standpoint, it is sort of the path of least resistance, right? That’s the initial tool you’re going to use for that. You’re probably already replicating your product database into your warehouse to do some sort of analytics, or pulling transactional data in from this system or that system, right?
So, okay, if you’re trying to collect all of your data in one place, you’ve probably already started that process in the warehouse, and so it doesn’t make sense to try to create something really different. I think that trend will continue, though I don’t know for sure; it’ll be interesting to see what happens. I think it probably has a lot of legs. But I also think we’ll see interesting, different kinds of architectures emerge around enabling that, ones that aren’t necessarily just the data lake or just the data warehouse. But conceptually, I think it makes sense. And I would say, as a user, I’m a big fan of that, just because it gives you ultimate flexibility, which is really nice. That tends to be the challenge with packaged products, at least the ones that I’ve used before: you kind of have, like,
Eric Dodds 06:19
actually, most SaaS is like this intentionally, right? You build your products to cover a set of use cases for a set of users. And even then, it is a bell curve, right? You can cover the primary set of use cases for a certain number of users, but when you get out to the edges, it’s really hard to serve all of those particular use cases, especially when it comes to what you can do with your customer data, manipulating it and all that sort of stuff. With the data in your own data store, you can almost think about it as flattening the bell curve a little bit in terms of the things you can do: calculations, whatever, right? I mean, it’s your data warehouse. So there’s a certain limit. I also think one of the big things is: okay, this new architecture comes in, and you have tools that are enabling it, and that’s sort of the David and Goliath story. But I absolutely think we’ll start to see some of the really, really big players move down market towards that architecture, right? So you have the Salesforces, the Adobes, the Oracles, right? All these gigantic companies, even Microsoft, I think, have some version of a large enterprise CDP. And what’s really interesting when you think about that is that a lot of these companies also have data stores of their own. Google, for example, has a marketing cloud with much more emphasis on advertising, but it also has data stores itself, right? And so it is pretty interesting to think about those companies also moving towards that architecture. It’s a lot harder for them, because they’ve spent decades building products that are not in that architecture, but that’s really interesting to think about.
Eric Dodds 08:16
Here’s another one. So that, I think, is probably pretty established; I don’t think it’s a huge revelation for a lot of people. Here’s a thought that I think will be really interesting to see play out, and that maybe isn’t discussed as much: I actually think we’ll see a lot of the logic around acting on the data move further down in the stack. So you said, well, you collect all this data. Great, what do you actually do with it? Marketing needs to drive more pageviews, create more leads, or whatever, right? Customer success needs to upgrade accounts, mitigate tickets, blah, blah, blah. Product needs to optimize towards feature adoption, etc. And one of the challenges there is that you have a huge amount of business logic that stays siloed, even if your data is not siloed anymore because you have it all in a centralized data store. So let’s assume you have all of your customer data, and let’s say you’re activating this data, getting it out to all your downstream tools, etc., right? Those are pipelines that exist. Those patterns are available, and they’re not rocket science, right? There’s nothing novel there. But even still, you have a ton of logic that lives in these downstream tools. For marketing, you say, if these certain conditions are met, then a status changes, right? Which is actually a pretty big deal. It’s kind of an event, but also kind of a user trait. Status changes are really interesting. That happens a ton in sales, in CRM software, whatever. In product, for example, you have a bunch of logic around participation in experiments or stage of onboarding, that sort of stuff.
You even have a ton of it in analytics, which is the classic business logic silo, right, where you have a large bunch of complex logic that lives in reports and stuff like that, which is probably the lowest level in the stack.
Eric Dodds 10:36
That actually creates a lot of challenges, because sharing logic across downstream tools, especially across teams, is very challenging, right? I think it’s become a lot easier to solve the data silo problem, but business logic being siloed is pretty difficult. Yeah.
Kostas Pardalis 10:54
So, let’s say, to take Marketo or Braze or whatever: do you think we are going towards a future where more of the business logic that is implemented there is going to move to the data warehouse, and these tools will become thinner in terms of the functionality they have, and lean more into the distribution and the marketing execution side only?
Eric Dodds 11:26
Great question. I mean, the honest answer is, I don’t know exactly what this will look like. But in short, my answer would be yes. I think that, in a way, those downstream tools like Marketo will get thinner. Now, Marketo is an Adobe product, right? And when you think about the primary use case being to drive revenue for those companies, this change isn’t going to happen quickly, especially for those gigantic incumbents, right? We’re talking about decades and decades of established ways of working, processes, and ways of doing business logic, and unwinding that is no small thing. But if you remove all of those practical barriers and think about having a shared logic layer, it really makes a lot of sense, right? Because you can have multiple downstream tools access that logic layer, which means they get thinner from a logic standpoint, and ultimately they become more of a last-mile mechanism, as opposed to a keeper of business logic. Hmm. I don’t know if that’s the warehouse itself. It almost has to be a layer on top of the warehouse. Is it something that Snowflake will build into their ecosystem? Like, really?
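A minimal sketch of the "shared logic layer" idea, assuming a hypothetical `Customer` record and made-up status rules: the status logic lives in one function, and each downstream sync is a thin, last-mile mapper that only reshapes its result. None of these names or field keys come from Marketo, Braze, or any real product.

```python
from dataclasses import dataclass


# Hypothetical customer record, as it might be read from a central warehouse table.
@dataclass
class Customer:
    email: str
    sessions_last_30d: int
    has_paid_plan: bool
    open_tickets: int


def lifecycle_status(c: Customer) -> str:
    """Single shared definition of a customer's status (illustrative rules only)."""
    if c.has_paid_plan and c.open_tickets > 2:
        return "at_risk"
    if c.has_paid_plan:
        return "customer"
    if c.sessions_last_30d >= 5:
        return "product_qualified"
    return "lead"


# Two "thin" downstream syncs: neither re-implements the rules,
# they only map the shared result into each tool's (hypothetical) payload shape.
def to_marketing_payload(c: Customer) -> dict:
    return {"email": c.email, "lifecycle_stage": lifecycle_status(c)}


def to_crm_payload(c: Customer) -> dict:
    return {"Email": c.email, "crm_status": lifecycle_status(c)}
```

Changing a rule in `lifecycle_status` changes what every downstream tool receives, which is the property a shared logic layer is meant to provide.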
Kostas Pardalis 13:11
I think in a way they already do, with this whole thing of data applications. I think that’s actually what they’re targeting, or at least it’s a good use case for it: how you can build the business logic for these kinds of tools. Because if you think about it, in a way that’s already happening, right, with reverse ETL. Part of the work that you would do in the downstream application, like creating a new audience, you’re now doing in the data warehouse. You just create a new audience using the data there, and the audience is created. You don’t have to do any querying or filtering of the data in Marketo, or whatever, to create the audience, right? It’s already done in the data warehouse. Now, can you do this with all the complex cases that you have there, with the signals and all the stuff that you talked about? With reverse ETL?
Kostas Pardalis 14:23
Probably not. But I think, I mean, the database, let’s say, not the data warehouse, just to make it more generic, has pretty much, or at least 99% of, the expressivity required for these kinds of jobs, specifically for the stuff that you’re doing with marketing, right? Yeah. At least from the use cases that I have seen. So instead of pulling data out, enriching it, and pushing it back to the application so the marketer can go there and, in the end, apply a filter to narrow down the audience based on some attributes, why not do that in the data warehouse, right? It’s a complete waste of resources, all this going back and forth. So that’s what I’m wondering. And I would love to learn more about what is missing from the Databricks and Snowflakes of the world to close this gap, because we’re not there yet, right?
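As a rough illustration of defining the audience where the data already lives, here is a sketch that uses Python's built-in `sqlite3` as a stand-in for a warehouse. The `users` table, its columns, and the filter thresholds are hypothetical; only the resulting list of user IDs would need to be sent to the downstream tool.

```python
import sqlite3

# sqlite3 stands in for a cloud data warehouse; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id TEXT, country TEXT, plan TEXT, sessions_30d INTEGER)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?)",
    [
        ("u1", "DE", "free", 12),
        ("u2", "US", "free", 1),
        ("u3", "DE", "pro", 30),
        ("u4", "DE", "free", 6),
    ],
)

# The audience is defined once, as a query, next to the data.
AUDIENCE_SQL = """
    SELECT user_id
    FROM users
    WHERE country = ? AND plan = 'free' AND sessions_30d >= ?
    ORDER BY user_id
"""


def build_audience(country: str, min_sessions: int) -> list[str]:
    """Return the IDs of users matching the audience filter."""
    rows = conn.execute(AUDIENCE_SQL, (country, min_sessions)).fetchall()
    return [user_id for (user_id,) in rows]


print(build_audience("DE", 5))  # ['u1', 'u4']
```

The filtering happens once in the database, so no downstream tool has to re-derive the segment from a full copy of the data.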
Eric Dodds 15:27
This is what I think is interesting. I agree 100% that the functionality, from an expressivity standpoint, is definitely there. When I asked whether it’s the data warehouse, I guess what I meant is that it’s more of a question around the interface. You’ve actually talked about this on a couple of recent shows, right? What does the actual experience of the end user look like? We were talking about developer experience and how people interface with whatever, or something like that. And that is a really tricky question, right? Because one of the big reasons that a lot of this logic has remained in these downstream tools is just that it is really easy for those downstream teams, like marketing, to interact with an interface that ultimately produces some sort of logic without being super technical. To your point, though, it is becoming increasingly technical, right? And the place where a lot of those base operations are done is increasingly some sort of database. And, yeah, it is interesting to think about. Take reverse ETL as an example, which is a weird example, because companies have been doing this pattern for years and years, right? It’s not necessarily something new. Really, what reverse ETL is, is just an interface that makes it easier to interact with a database and then orchestrate it, right? Scheduling jobs and all that sort of stuff you’d otherwise have to do manually. It really is just a basic interface layer for pipelines that have existed for a long time. I will say, though, there are two things that come to mind that I think are really helpful examples of patterns. We’ll use reverse ETL as one, and then I’ll use Great Expectations as the other example, because they show this dynamic happening in two directions.
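To make the idea of an interface layer that produces SQL under the hood concrete, here is a minimal sketch, assuming a hypothetical filter builder that emits `Condition` objects. It compiles that definition into a parameterized SQL query; it is an illustration of the pattern, not how any particular reverse ETL product works internally.

```python
from dataclasses import dataclass


# Hypothetical shape of what a point-and-click filter builder might emit.
@dataclass
class Condition:
    column: str
    op: str  # one of the keys in OPERATORS
    value: object


# Interface-level operator names mapped to SQL comparison operators.
OPERATORS = {"equals": "=", "at_least": ">=", "less_than": "<"}


def compile_to_sql(table: str, columns: list[str], conditions: list[Condition]) -> tuple[str, list]:
    """Turn an interface-level definition into a parameterized SQL query.

    Table and column names are assumed to come from a fixed schema picker,
    not free text; values are passed as parameters, never spliced into the string.
    """
    clauses, params = [], []
    for cond in conditions:
        clauses.append(f"{cond.column} {OPERATORS[cond.op]} ?")
        params.append(cond.value)
    sql = f"SELECT {', '.join(columns)} FROM {table}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params
```

For example, `compile_to_sql("users", ["user_id", "email"], [Condition("plan", "equals", "free"), Condition("sessions_30d", "at_least", 5)])` returns the query `SELECT user_id, email FROM users WHERE plan = ? AND sessions_30d >= ?` with parameters `["free", 5]`.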
So if you think about reverse ETL, one interesting pattern there is that ultimately you’re building logic in some sort of interface, and then it’s producing that logic as code under the hood, right? Essentially, it’s SQL at this point. Like, okay, I want to get some data from the warehouse and pull it into Salesforce, or Marketo, or whatever, right? And so I use this interface to interact with it, and ultimately what it’s doing is producing SQL under the hood that runs an operation and executes it. So that’s interesting, right? You have an interface that’s producing logic as code under the hood. Hmm. Great Expectations is interesting because it’s the same concept, but in the other direction, right? You define your data definitions in Great Expectations. I don’t know if you remember this, but it automatically produces human-readable documentation for those definitions, even though it’s a Python library, right? So you’re making definitions in a Python library using Python, and it literally produces human-readable documentation. So those two patterns, I think, are really instructive for thinking about the ways this could potentially look in the future. But, you know, I don’t have a crystal ball, so we’ll have to see. Indeed, indeed. Is that the buzzer? That went by really fast. Did I talk too much?
Kostas Pardalis 19:18
It’s okay. That’s the right way of doing this.
Eric Dodds 19:22
Indeed. All right. Well, thank you for joining us for shop talk and we will have more good, tantalizing conversation for you on the next one.