This week on The Data Stack Show, Eric and Kostas chat with Jason Davis, Co-Founder & CEO of Simon Data. During the episode, the group discusses all things CDPs including the importance of the customer journey in marketing and how data teams can support marketing teams in achieving their goals. The conversation also includes mapping back to data terms that data teams can understand, the next frontier of innovation in data infrastructure for marketers, and more.
Highlights from this week’s conversation include:
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Eric Dodds 00:03
Welcome to The Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You’ll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, the CDP for developers. You can learn more at RudderStack.com. Welcome back to The Data Stack Show, Kostas. We have a special episode today, because we’re going to talk with someone who is building a tool that has a lot of really interesting data componentry to it, but it’s ultimately intended for marketers. So I think this is a little bit of uncharted waters for you.
Kostas Pardalis 00:44
You know, I mean, it’ll be interesting, I think, to see what happens. We talked about this before, but it’s, like, the first time, I think, that we’ve talked about CDPs.
Eric Dodds 00:52
I think it was in a shop talk. But today, we’re going to talk with Jason from Simon Data, and they call themselves a CDP. So I think this is the first sort of official marketing-flavored CDP. And he has a background actually in machine learning at a PhD level. So that’s even more interesting to me, because he obviously understands data on a deep level, but he’s building a tool for marketers. So I’m gonna ask him about that, and his background, and some of the, you know, ways that he thinks about marketing tools and the way they interact with data teams, because I think he has a unique perspective.
Kostas Pardalis 01:28
Yeah, absolutely. And I’m very interested in talking with him and, like, learning more about what exactly a CDP has to do with the data to offer its services, right? It’s very easy with, like, you know, SaaS, to focus on the user interface and talk only in terms of, like, oh, okay, we just create the golden series. But this might be, like, in the end, a very complicated process that is not only complicated to describe well, but also something that has to be driven by someone who has no idea about, like, data, or, like, writing SQL or writing code. So I think that makes the problem even more challenging, and it will be great to chat with him about all these challenges and see exactly how they can be addressed by a platform like Simon Data.
Eric Dodds 02:24
Yeah, I agree. Well, let’s dig in and chat with Jason.
Kostas Pardalis
Yeah, let’s do it.
Eric Dodds
Jason, welcome to The Data Stack Show. Super excited to chat today.
Jason Davis
Thanks, Eric. Pleasure to be here.
Eric Dodds
All right, well, give us your background. So we want to hear about Simon Data, but you actually have a background in working with data. So tell us, you know, give us your history, and then what led you to start Simon?
Jason Davis 02:49
Yeah, I’m a reformed machine learning researcher. In my previous life, you know, I completed a PhD in machine learning. It’s actually how I met my co-founder, Matt Walker, CTO at Simon Data. And today, we’ve been working together for over 19 years now. It’s pretty hard to believe, you know, anniversary 20 will be coming next fall. But, you know, I always joke that it took me about five years into my PhD to realize the value in data isn’t in the algorithms for machine learning, but in how the data is actually used in practice. You know, my business was an ad tech product that was acquired by Etsy. And through that experience, I really just saw the power of enterprise data, centralized data, and how big data can really be a disruptive force. And the core thesis behind Simon really brings that, you know, to today’s cloud-enabled environment. Cloud-enabled data is a huge force. I certainly was not expecting this seven, eight years ago, when we first started the business. And today, our thesis at Simon is really, you know, about being the application layer for the next generation of data-driven marketing, to really rethink what a CDP is and what its data requirements are, to effect better lifetime value, better ROAS, and better conversion rates.
Eric Dodds 04:01
Yeah, super helpful. Okay, you used a couple of terms in there that I think would be super helpful to unpack. So let’s start by breaking down what a CDP is, you know, because this show is all about data. And customer data platform is a term that is not new. It’s been around for quite some time. But it’s really easy for people, you know, when the term CDP comes up, to think of different things, right? On one extreme end of the spectrum, people may think about this as a tool that sends marketing messages to users, right? Like a push notification or an email. On the other end of the spectrum, people may think this is just infrastructure that processes customer data. And then of course, there’s a huge spectrum in between. Can you help provide some clarity to our listeners on the term CDP and maybe even help us understand, like, Simon’s philosophy and where you fit into this?
Jason Davis 05:00
Yeah, it’s a great question. At the end of the day, the category is undoubtedly wide. You know, I was talking today with the RudderStack CEO, and we were just talking about, you know, how our joint strategy and vision are actually fairly complementary, which is unusual for two vendors in the category to get together. I’m very close with Michael Katz, mParticle’s CEO, again, another vendor in this category, where we actually share quite a few customers in common. When we look at CDP, you know, it really starts with asking, how do you enable business stakeholders, marketers in particular, to be data driven? And with this, what are the marketing activities that require deep, bespoke, and specific access to an evolving world of data? That starts with segmentation. But for us at Simon, that also includes personalization, that includes experimentation. And that finally includes thinking about all the marketing channels that exist today, and how do you optimize across them in an asynchronous way? You know, this is something that marketers call orchestration, which is very different from data orchestration.
Eric Dodds 06:06
Yeah, absolutely. And so how do you think about, you know, the data side? Or how does Simon operate on the data side, right? Or maybe actually a better way to ask the question would be: let’s say I’m on a data team, and one of my internal customers at my company is marketing, right? And let’s say they’re using Simon. What does my relationship with Simon look like? And how do I interact with you? Break that down for us a little bit.
Jason Davis 06:39
Yeah. So, you know, really the way to think about Simon, for you as a data practitioner, data engineer, data analyst, data scientist, who has, you know, a cloud data warehouse, Redshift, BigQuery, Snowflake, set up, is that we provide the infrastructure, you know, in the core ETL terrain to help your marketing team get started around the problem that you develop. That starts with holistic modeling around identity. That starts with thinking about treating batch data in your warehouse and real-time data separately. And that ends with building a customer 360, not for your warehouse, but for your marketing team, to really, you know, have that view of the customer relative to the application they need to affect as a marketing organization.
Eric Dodds 07:23
Yep, got it. That makes total sense. And so, Simon’s actually doing the building or augmenting the build of that 360 degree view of the customer on behalf of the marketer.
Jason Davis 07:39
That’s 100% right.
Eric Dodds 07:41
Yep. Got it. Super interesting. Okay, I have a question for you here. Because you’re so familiar with, and are building your products for, the use cases that these marketers want, and, you know, to your point, data is not, in and of itself, valuable, right? Like, what do you actually do with it that drives value, right? For our listeners who work on a data team, and maybe even the ones that serve marketing teams as an internal customer, but maybe aren’t as familiar with, like, what is happening at the end of the line with the data, you know, because maybe their role is more around, like, modeling, packaging, cleaning, whatever those pieces are, and then, you know, sort of delivering this data product to a team, to an endpoint, to a tool: what are the top things that you think are important for someone in that role on a data team to know about what’s happening at the end of the line, or sort of the last mile, as marketing teams are using this data?
Jason Davis 08:47
100%. And I’ll answer this question, Eric, by throwing out a term that marketing talks about all the time and then mapping it back into data terms that the listeners of the show probably have a bit more familiarity with. You know, what marketers care about is something called the customer journey. The customer journey, you know, is the interactions that an individual has with the brand and business. From that first touch point, you see an ad on Facebook, or you see an ad on the open web. A month later, you might click through a different ad, and you interface with a website and you read about it for, you know, 15 minutes. You then listen to the company’s podcast for an hour. And then maybe a few weeks later, you finally dive into some of the documentation or material. And then eventually, you might sign up and be a paying customer. And then there’s sort of the, you know, entire engagement path downstream from there. You know, the first problem that we saw from a data perspective is really thinking about how data is modeled. Sorry, how the user identity is modeled. You know, those first interactions are second party data. These are interactions, you know, from anonymous users. They’re not users. They happen completely outside the realm of data as you have it today for most folks who are running a data warehouse. At that first touch point, you know, on the website, you would be a fully anonymous customer. At the second touch point, you might be a fully anonymous customer from a different device, so a different cookie. And then at some point, these paths may converge together, you know, and those anonymous browsers may link to a single known user. And then there are all sorts of considerations downstream from there on householding and beyond.
And what Simon does is it stitches together the customer journey from a data perspective. It builds those identity associations, and it does so in a way that’s actually modeled directly in your warehouse. You know, we are built natively on top of Snowflake, and those models are made available for our customers to have full visibility. And then our platform is deployed directly on top of that, you know, to enable marketing teams to see it and to have that full continuity across the entire journey.
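As a rough illustration of the stitching idea described here, the sketch below uses Python rather than warehouse SQL. The event names, the `cookie_to_user` mapping, and the `stitch_journeys` helper are all hypothetical and are not Simon Data’s actual model. It only shows how events from different cookies could collapse into one ordered journey once an identity association links those cookies to a known user.

```python
from collections import defaultdict

# Hypothetical touch points, keyed by the browser cookie that produced them.
events = [
    {"cookie": "laptop", "ts": "2024-01-05", "name": "facebook_ad_view"},
    {"cookie": "phone", "ts": "2024-02-03", "name": "ad_click"},
    {"cookie": "phone", "ts": "2024-02-04", "name": "podcast_listen"},
    {"cookie": "tablet", "ts": "2024-02-10", "name": "site_visit"},
    {"cookie": "laptop", "ts": "2024-02-20", "name": "docs_read"},
    {"cookie": "laptop", "ts": "2024-03-01", "name": "signup"},
]

# Assumed output of identity resolution: which cookies belong to which known user.
cookie_to_user = {"laptop": "user:42", "phone": "user:42"}


def stitch_journeys(events, cookie_to_user):
    """Group events by resolved identity and order them in time."""
    journeys = defaultdict(list)
    # ISO-8601 date strings sort chronologically as plain strings.
    for event in sorted(events, key=lambda e: e["ts"]):
        # Cookies with no association stay anonymous under their own key.
        owner = cookie_to_user.get(event["cookie"], event["cookie"])
        journeys[owner].append(event["name"])
    return dict(journeys)


journeys = stitch_journeys(events, cookie_to_user)
```

In this toy data the laptop and phone cookies end up in one journey for `user:42`, while the unlinked tablet cookie remains its own anonymous journey until some later association ties it to a user.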
Eric Dodds 10:49
Oh, fascinating. Okay, so I’m just gonna say this back, so I make sure I understand it, because this is really interesting. Well, actually, let me step back and say, you know, having been involved in this kind of work, you know, hand rolling it: generally, when you talk about building an identity in a warehouse, which, you know, I’m a huge fan of, because you have visibility, and you can manage the edge cases for your specific business and, you know, patterns of customer usage or whatever, ultimately you’re talking about an unbelievable amount of SQL that ends up being really hard to maintain over time. And, you know, the trap that I’ve fallen into multiple times is that, inevitably, there’s someone in the organization or a group of people who have tribal knowledge about, you know, how this thing works, and, you know, the output and, you know, whatever. And so it sounds like you actually sort of remove the need for data teams to, you know, hammer through all that SQL that’s so difficult to maintain, but you maintain the visibility, like, in Snowflake, so I can see all of that.
Jason Davis 11:57
That’s 100% right. And the way we view the world is we break data problems down into one of two buckets. And this directly comes from my experience building data teams and dealing with all sorts of complex data challenges that, you know, anyone who’s listening to the show is seeing, either on the front lines or managing teams or data functions. There are the problems that are bespoke to the business, around data collection, around aggregation, around core metric definition, and then there are the problems that, you know, really have a degree of consistency across any brand. And our strategy is to leverage the latter, and to sit on top of all the work that’s happening that, you know, as a CDP we couldn’t possibly learn, while bringing efficiencies and generalization capabilities to light. So, you know, that’s really sort of how we think about things at a high level. And, you know, the number one point of value that we bring to our customers is speed. You have data in your warehouse. It may not be perfect, you know, but guess what? You can get to value in a few days, maybe a few weeks. It works. Let’s not build out, you know, all your infrastructure. Let’s look at where you are today. And look, data is not an end state; it’s a journey. No matter where you are today, you know, let’s get to value. And then as you improve, you know, as you bring on a platform like RudderStack and have better granularity around the data that you’re collecting from your website and mobile application, that’s another level up in your journey that we can integrate into our model. For us, it’s all about staying locked in with the data capabilities that our customers have, while providing better use cases all along to our end users.
Eric Dodds 13:47
Super interesting. I want to dig into the difference between the sort of bespoke business needs versus commonalities across businesses. And Kostas knows that for some time I’ve brought this up, and he may be tired of hearing it: I have this theory that, you know, when it comes to data models, there’s probably, like, less than 10 that every business could use, right? With, like, light modification, right? So, like, e-commerce or B2B SaaS, or whatever. And, of course, there’s differences, right, but when you talk about, you know, just sort of the core model that you use as a starting point, there actually is a lot of commonality, right? And a lot of the differences that businesses create when they get bespoke are actually more around syntax. Is that kind of what you’re getting at, in that you can provide, like, time to value, or, like, help teams move faster, because you’re acting on some of those commonalities?
Jason Davis 14:46
Yeah, and look, I think you’re 100% right, and I’ll add a couple of caveats to that. You know, the other dimension isn’t even business-model specific. It’s just, you know, marketing-dynamics specific. Look at all the funnels across each of your core marketing channels, you know, across paid and owned, and direct mail and email and push. You know, there’s a degree of consistency coming out here, especially across, you know, tools as well. And look, if nine out of 10 customers are extracting data into their Snowflake with Fivetran, it all looks the same. Piecing that together, you know, and interestingly, Fivetran is building canonical ad tech views, but, you know, our view is, you know, we’re very hyper-focused on mapping back to our end application, to get to, you know, end value as fast as possible. You know, the other dimension to this, which I think, you know, I would push back on a little bit, Eric, in terms of consistency across business models, is when you look at a lot of enterprises today, you know, they have data which is complex. And, you know, for anyone who follows Chad Sanderson on LinkedIn, there’s a huge movement around data collection. And, you know, for RudderStack customers, it’s a clean slate, you’re recollecting data. But the fact of the matter is, if you have a system which is 25 years old, okay, you’re not actually going to be able to go and reformat the code. It just ain’t happening. You can talk about it on LinkedIn all you want. But it’s a multi-year build out. And quite frankly, I think reverse engineering the data as it comes in is sort of state of the art for many of these larger businesses today. And it really is the only path.
Yeah, so I think, you know, when we sort of look at it, and to respond to your point, I think, in some sense, our strategy is to think about how data teams in large enterprises have done all that work. The data is certainly not perfect, you know, but there are large teams of data people who are dedicated to it, specifically tasked with making it one step better every single day. Let’s take that as an input. And then apply a lot of our standard transformations in a way that, you know, aligns directly with the core applications that, you know, our end users need to affect.
Eric Dodds 17:01
Yep. Super helpful. One last question from me on the dimension of transformation, because Kostas has a lot of questions about the data model. But one question I have is: do you loop back into the warehouse? Because one interesting thing about, you know, sort of last mile tooling is that it’s actually creating touch points on the customer journey. But a lot of times those can be a terminal destination. So do you move back into the warehouse and feed the model in a loop?
Jason Davis 17:30
Yeah, 100%. I mean, look, I think, you know, ultimately, as someone who’s run data teams in the past, I couldn’t imagine building an application that wasn’t anything but a good citizen of data on both sides. So let’s make sure that the modeling integration paths are out of the box, and straightforward and extensible. And let’s make sure that any and all data that the platform collects or creates, or reports on for that matter, is then shared as data into the environment.
Eric Dodds 18:00
Very cool. All right, Kostas, all yours.
Kostas Pardalis 18:05
Thank you, Eric. Alright, let’s start the conversation by talking a little bit about the data that is used by a CDP, right? If I want to build, like, a CDP, what kind of data are we looking for? You mentioned earlier that, at the end, what the marketer really cares about is, like, the journey that the user has with the brand and, like, the company. But how is this represented in data, right? Like, everyone understands what, like, a journey is, but, like, what kind of data do we need to digitally recreate this journey?
Jason Davis 18:45
Look, I think when we think about data for marketing applications, there are two types of data. There’s data that joins, you know, that directly has a customer identifier, and there’s data that does not. I think one of the poorly understood points, which marketers understand but don’t really communicate back to data teams, is how critical that non-customer data is in the broader marketing journey. Ultimately, it’s not about what the customer does, it’s how the customer interacts with inventory, and it’s about all the metadata around that inventory. Whether the customer is browsing homes on the web or buying widgets in an e-commerce context, you know, what’s the property? What category is the widget in? You know, what’s the price point of the home? What geo is the home in? You know, is the home in a geo that’s, you know, primarily vacation homes, or is it in a large suburban development with grades? You know, that kind of data is critical to understand the journey. And when it comes to segmenting and, you know, identifying your audiences, it’s critical for that as well. And I think, when I sort of think about a lot of the rich data that drives, you know, some of the really interesting use cases for Simon’s customers, it’s the data that actually doesn’t even originate on the customer. It can be joined to the customer. And then I think, you know, the challenge is how do you build, you know, a UI that allows the end user to have access to that data in a way that’s, you know, nine out of 10 times no code, and low code when necessary? Because ultimately, I think, you know, the name of the game today, with so many rich cloud data warehouse enabled environments, is speed. The question isn’t, you know, can a data team build a segment? Because, like, the answer is yes. The question is, how long?
And furthermore, who’s actually responsible for building the segment? And are they able to do it in a way that can take a few minutes instead of a few weeks?
Kostas Pardalis 20:52
All right. First of all, let’s talk a little bit about some definitions, because you used the term inventory, right? And a big part of our audience is, like, engineers and data engineers, and they might not, you know, know all that marketing terminology. So what is inventory? Like, when we are talking about inventory, what is this?
Jason Davis 21:11
It’s any database object that doesn’t key into a customer, and anything that can ultimately have an interaction with a customer. So that’s really what it is in generality. In practicality, it’s what the customer can buy, what the customer can browse, what the customer can view, the content that a customer might read.
Kostas Pardalis
And here are we looking at, let’s say, assets that are only digital, right? Or is there also, like, data that might be coming, let’s say, from, I don’t know, like, physical stores and the interactions that the user might have there? Like, is this also something that is happening?
Jason Davis
1,000%. I mean, you know, to give a marketing application, and I know you’re trying to bring love to the data use case, you can identify a set of customers who have an outstanding support ticket in the last month. Or you can ask, you know, find everyone who has an outstanding support ticket of type X, where type X is something that you really messed up on and you want to be able to remediate quickly. The support ticket might have a category or classification, you know, and X is the classification. And the business requirement is identifying every user who, you know, through a join, satisfies that condition.
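The support-ticket example can be sketched as a small join in Python. This is only an illustration under assumed names: the `tickets` rows, the `customers` lookup, the `classification` field, and the `open_ticket_segment` helper are all hypothetical, and in practice the same logic would more likely be a SQL join in the warehouse.

```python
from datetime import date, timedelta

# Hypothetical ticket rows; "classification" plays the role of "type X".
tickets = [
    {"customer_id": "c1", "classification": "X", "status": "open", "opened": date(2024, 5, 20)},
    {"customer_id": "c2", "classification": "Y", "status": "open", "opened": date(2024, 5, 25)},
    {"customer_id": "c3", "classification": "X", "status": "closed", "opened": date(2024, 5, 28)},
    {"customer_id": "c4", "classification": "X", "status": "open", "opened": date(2024, 3, 1)},
]

# Hypothetical customer table: id -> contact address.
customers = {
    "c1": "ana@example.com",
    "c2": "ben@example.com",
    "c3": "cat@example.com",
    "c4": "dev@example.com",
}


def open_ticket_segment(tickets, customers, ticket_type, as_of, window_days=30):
    """Return contacts of customers with an open ticket of the given type in the window."""
    cutoff = as_of - timedelta(days=window_days)
    matching_ids = {
        t["customer_id"]
        for t in tickets
        if t["status"] == "open"
        and t["classification"] == ticket_type
        and t["opened"] >= cutoff
    }
    # The join step: keep only ids that exist in the customer table.
    return sorted(customers[cid] for cid in matching_ids if cid in customers)


segment = open_ticket_segment(tickets, customers, "X", as_of=date(2024, 6, 1))
```

With this sample data only `c1` qualifies: `c2` has the wrong classification, `c3` is already closed, and `c4` falls outside the 30-day window.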
Kostas Pardalis 22:25
And you also mentioned something else. You talked about, like, a great distinction: there is anonymous data, and there is data that has an identity, right? Like, data that we can attribute to a specific user that we know some information about. Can you tell us a little bit more about how each one of these two categories of data is used, and if there is some difference there? And do these data ever, let’s say, merge? Like, is it part of the process to connect an identity to the anonymous data? But we...
Jason Davis 23:00
That’s the hardest part of the whole process, actually: thinking about, you know, how identity merges and evolves. Look, ultimately, it’s not a linear process. You can have two objects of identity type anonymous that merge into an identity of type known. And then you have a third identity of type known that can merge into that as well. And the identities can change. And then there’s all sorts of corner cases that have to be dealt with. And there are all sorts of generalized cases that are required to actually do the problem properly. But 100%, you know, there’s real complexity here. And the bookkeeping, you know, requires some meticulous, domain-specific understanding.
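One common way to do this kind of merge bookkeeping is a union-find (disjoint-set) structure. The sketch below is a generic illustration of that technique, not a description of how Simon Data resolves identity. The `IdentityGraph` class, the `anon:` prefix convention, and the rule that a known identifier wins as the canonical root are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class IdentityGraph:
    """Toy identity resolver: union-find over anonymous and known identifiers."""

    parent: dict = field(default_factory=dict)

    def find(self, identifier: str) -> str:
        # Register unseen identifiers as their own root.
        self.parent.setdefault(identifier, identifier)
        root = identifier
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the walk straight at the root.
        while self.parent[identifier] != root:
            self.parent[identifier], identifier = root, self.parent[identifier]
        return root

    def link(self, a: str, b: str) -> None:
        """Record that two identifiers belong to the same person."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a == root_b:
            return
        # Hypothetical rule: prefer a known identifier over an anonymous one as root.
        if root_a.startswith("anon:") and not root_b.startswith("anon:"):
            root_a, root_b = root_b, root_a
        self.parent[root_b] = root_a


graph = IdentityGraph()
graph.link("anon:cookie-laptop", "user:42")    # first anonymous browser logs in
graph.link("anon:cookie-phone", "user:42")     # second anonymous browser logs in
graph.link("user:42", "email:jo@example.com")  # a third, known identifier merges in
```

After these three links, every identifier resolves to the same root, which mirrors the case of two anonymous identities and a third known identity all merging into one. Real systems also have to handle un-merging and conflicting evidence, which this sketch does not attempt.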
Kostas Pardalis 23:44
And is this, like, part of the CDP’s responsibility, to reconcile that identity and, like, create this identity, I don’t know, database or graph or, like, I don’t know, we’ll talk more about how it looks. But whose responsibility is it to construct and maintain these identities?
Jason Davis 24:01
So this is the million dollar question, Kostas. And I think if you ask the reverse ETL guys, you know, Kashish, I saw him the other day at a conference in the Bay Area, they’ll talk about this at length. You know, look, I think five years from now, the world is gonna look a lot different. But let me tell you how it is today. Today, you know, I’d imagine nine out of 10 of your listeners, if not 49 out of 50, have the data in the warehouse. It’s relatively clean, they probably have some reasonable metric definitions, and they’re probably outgrowing their Looker models and trying to move it upstream. And they’re adopting dbt and, you know, using best practices. The fact of the matter is, maturity around identity modeling today, and by the way, if you can’t do identity, you can’t build a customer 360, because that, by definition, is an integrated view of your data plus your identity to enable marketing. If the identity isn’t done properly, then in the marketing application, the customer pieces can’t happen. And maturity today across data engineers, data analysts, data scientists, and certainly open source tooling, along with any sort of dedicated providers that do this, is incredibly low. And this doesn’t mean that, you know, a motivated data engineering team can’t take this on as their H1 project, and devote a set of folks, and figure it out and ship it at some point next year. But what it does mean is that it’s a science experiment, and there’s a lot of risk.
At the end of the day, there’s a lot of detail, you know, the unknown unknowns that lie ahead for so many folks who roll this on their own. So our strategy is, look, you know, we understand this. We want to get everyone from zero to one. And, you know, one can be a small step or a big step, depending on the eyes of the beholder. And then the key there is extensibility. Look, every one of these corner cases can change from business to business. The general approach, you know, has a high degree of consistency. But when you really get into how things work, you know, there is a level... But if you can’t go from zero to one, don’t try to go from nine to 10.
Kostas Pardalis 26:10
Yeah, no, makes total sense. All right. So, okay, we have talked a little bit about the data and, like, the identity. And you mentioned the data warehouse. So my question is, like, all this data and the identity, how is it represented inside the data warehouse? Like, let’s say I set up a data warehouse and put Simon Data on top of it. If we look inside the data warehouse, what am I going to see there?
Jason Davis 26:42
I mean, it does all matter what you have. And what you have is most likely a reflection of what’s important for the year, and where you’re going is going to be a reflection of how your business teams put pressure and align strategy with your initiatives to further build that data warehouse. So again, I think, you know, I’d turn the question around and ask, you know, what should the data journey be for you as a business that is looking to evolve from where they are today? Which is probably, you know, all the data is there, you know, some metrics are defined, but some of the aggregates, some of the nuances, the specifics around, you know, various areas of the business are still on the one- or two-year roadmap. How do you prioritize that roadmap? How do you take what you have, you know, and drive value today? And how do you align the interests of the business stakeholders, you know, with the strategic priorities across the data team, to make sure that they’re being implemented? You know, certainly going into next year, in this macroeconomic climate, I think there’s gonna be very little patience, you know, for big science experiments and wandering strategies that don’t align with what actually needs to happen to, you know, show clear revenue.
Eric Dodds 27:54
Are there some minimal requirements or best practices in terms of, like, what data should exist in the data warehouse before someone starts the journey of building a CDP on top of that data?
Jason Davis 28:07
I mean, so the first thing to say is, there’s our strategy, and other CDPs have other strategies as well. Our strategy starts with our customers looking at the warehouse as a source of truth for what they’re trying to do. You know, if you are a Salesforce shop, and, you know, this is relevant probably to nine out of 10 people on the podcast, but if you’re a Salesforce shop, you know, you’re gonna buy Salesforce CDP and collect other Salesforce data. You know, if you have big gaps around data collection from web and mobile, then you’re gonna look at a solution like RudderStack, and that will populate data in your warehouse. And while RudderStack activates, you know, our perspective is to, in some sense, take a view of data as well outside of what RudderStack might be collecting, and look at a broader view of data that exists within the warehouse that might touch offline context and beyond. If you come to us and you have nothing today, you have no cloud data warehouse, no cloud data warehouse strategy, then, you know, it’s not a fit. You’re gonna want something else, something that’s out of the box and just provides end-to-end value, you know, that starts with data collection and ends with activation. But if building a data strategy that’s extensible is core to what you’re trying to effect, then, you know, we have a story that can at least be considered.
Kostas Pardalis 29:32
Alright. And okay, let’s assume here that I have my data in the data warehouse. The data looks good, clean. We make sure that we have all the identities there. Like, we can act upon this data. Now we have to do something with this data, right? So what’s next? Like, what does the lifecycle of a CDP look like after we have the data that we need and we can access it with, like, Simon Data, right? Like, what are we going to do next? What is marketing going to do next with this data?
Jason Davis 30:06
100%. So, in marketing terms, there is a buzzword, and I know you guys are going to beat me up for even going here: one-to-one personalization. I bet that actually 10 out of 10 people on the podcast are familiar with all the marketing mumbo jumbo. We like to use a term that we call one-to-one data. What does this mean? It means that if I’m a marketer, I want to have access to the data that I need to build segments and to personalize, and I want it in a one-to-one context. I want an application that is actually designed to integrate, ingest, and effect segmentation on the data at the granularity at which it exists. One of the challenges with approaches like reverse ETL is this: you have your data in Snowflake. Congratulations, it’s high fidelity and it’s fully clean. You have rich schemas that represent event history, online, offline, object metadata, inventory, you name it. And suddenly you need to reverse ETL that data into your marketing tool. But guess what: your marketing tools are built on MongoDB. And suddenly you’re faced with a set of pretty difficult design trade-offs around what data you are now throwing out. Then you go to your marketing team and you have these lengthy conversations about, well, what are you trying to do? The marketer doesn’t know. They need to be agile, and they’re also not data engineers, so it’s incredibly hard for them to have a productive conversation. So, to sum it up and answer your question directly, Kostas, our vision is to put the data in front of the end business stakeholder.
And we’ve invested materially in incredibly flexible schemas and powerful segmentation capabilities that allow our end users to access the data and use it at the finest granularity. Here is an example of what this loss of data fidelity, or throwing out data, can actually look like. Say you have a segmentation layer that can only represent the number of purchases, or the dollar value of the purchases, that you’ve made over the last year. But then you have a question like, let’s say, I want to identify anyone who’s bought a full-price item over $100 in the last 12 months. Well, certainly, that’s going to require going back to the source and doing that analysis. But if instead you have interfaces in the application that allow that data to be queried directly by the end business stakeholder without SQL, then suddenly you’ve saved an entire round trip. If you have a functioning feedback loop between marketing and data, that round trip can be within a day. But for most enterprises, it’s a sprint period, which is a couple of weeks or a month. And then you have to ask, how often does this happen? And the answer is, it happens all the time. This is really where a lot of the friction comes in. We believe in a world where your data teams and marketing teams collaborate in a very productive way. But we also believe in a world where roles, responsibilities, and workflows are separated in a way that allows each of them to do their jobs independently.
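To make the fidelity point concrete, here is a minimal Python sketch. It is only an illustration: the `Purchase` record, the field names, the sample customers, and the reference date are all assumptions made up for this example, not Simon Data’s schema or API. It shows two customers whose aggregated profiles are identical even though only one of them bought a full-price item over $100 in the last 12 months.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Purchase:
    """Hypothetical event-level purchase record."""
    customer_id: str
    amount: float
    full_price: bool
    purchased_on: date


def aggregate_profile(purchases):
    """Collapse events into the per-customer summary a limited tool might keep."""
    profiles = {}
    for p in purchases:
        profile = profiles.setdefault(
            p.customer_id, {"purchase_count": 0, "total_spend": 0.0}
        )
        profile["purchase_count"] += 1
        profile["total_spend"] += p.amount
    return profiles


def bought_full_price_over(purchases, min_amount, since):
    """Answer the question directly from event-level data."""
    return {
        p.customer_id
        for p in purchases
        if p.full_price and p.amount > min_amount and p.purchased_on >= since
    }


# Illustrative data only.
TODAY = date(2023, 1, 15)
purchases = [
    Purchase("alice", 120.0, True, date(2022, 11, 1)),
    Purchase("alice", 30.0, False, date(2022, 12, 5)),
    Purchase("bob", 75.0, True, date(2022, 10, 1)),
    Purchase("bob", 75.0, False, date(2022, 12, 1)),
]

profiles = aggregate_profile(purchases)
since = TODAY - timedelta(days=365)
qualifying = bought_full_price_over(purchases, 100.0, since)
```

In this sketch `profiles["alice"]` and `profiles["bob"]` are the same summary (two purchases, $150 total), so no filter over the summary can tell them apart, while the event-level query still can.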
Kostas Pardalis 33:23
Okay, that makes total sense. That’s a very good example. What is segmentation? I want to make sure that we talk about segmentation first from the marketing perspective, and then I’ll ask the same question from the data engineering perspective, so we communicate with both audiences out there. But let’s start with the marketer. What does it mean when I’m a marketer and I want to segment my data?
Jason Davis 33:56
Yeah, so the inputs to a segmentation interface are properties on the customer, or fields that relate to the customer. We’ve gone through enough examples over the last, I guess, 35 minutes here that I won’t rehash them. And the outputs are a subset of your customers that display a set of properties. The magic behind segmentation manifests in a powerful UI that allows end business stakeholders to filter and refine the set of 100% of your customers, across their behaviors, across how they purchase inventory, across any of the objects and entities in your data warehouse, and to whittle that down to the 2.3% of customers who exhibited behaviors X, Y, or Z, or any conditions that are specifiable within the UI. The basic optimization problem around segmentation is to provide a data model and an interface that are as powerful as possible. Out of the last 100 questions that the marketing team has tried to answer in the segmentation UI, how many were they able to actually figure out and do on their own? For how many did they say, “Oh well, I give up, it’s too hard”? And for how many did they have to escalate to the data team to add new fields to get it done? Because the degenerate case of segmentation is to have a segmentation UI with one field, which is the latest field that your data team put in. When you want to segment, you ask your data team to build some 1,000-line query, they build the 1,000-line query, it’s ready two weeks later, hopefully it’s right, you create a new segment with condition X, and then you’re done.
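As a rough sketch of that input/output shape, the Python below treats a segment as a set of composable conditions over customer fields and returns the matching subset plus its share of all customers. The helper names (`field_equals`, `field_at_least`, `all_of`, `segment`) and the sample fields are hypothetical, chosen only for illustration; a real segmentation UI would generate something equivalent behind the scenes.

```python
from typing import Callable, Dict, List

Customer = Dict[str, object]
Condition = Callable[[Customer], bool]


def field_equals(field: str, value: object) -> Condition:
    """Condition: the customer field equals the given value."""
    return lambda c: c.get(field) == value


def field_at_least(field: str, minimum: float) -> Condition:
    """Condition: the numeric customer field is at least `minimum`."""
    return lambda c: c.get(field, 0) >= minimum


def all_of(*conditions: Condition) -> Condition:
    """Combine conditions so that every one of them must hold."""
    return lambda c: all(cond(c) for cond in conditions)


def segment(customers: List[Customer], condition: Condition) -> List[Customer]:
    """Whittle the full customer set down to those matching the condition."""
    return [c for c in customers if condition(c)]


# Illustrative customer fields only.
customers = [
    {"id": "c1", "country": "US", "orders": 5},
    {"id": "c2", "country": "US", "orders": 1},
    {"id": "c3", "country": "CA", "orders": 7},
    {"id": "c4", "country": "US", "orders": 3},
]

repeat_us_buyers = all_of(
    field_equals("country", "US"),
    field_at_least("orders", 3),
)
matched = segment(customers, repeat_us_buyers)
share = len(matched) / len(customers)
```

The design choice worth noting is that conditions are small, reusable building blocks, which is what lets a non-technical user combine them without anyone writing a new query for each question.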
Kostas Pardalis 35:32
Yeah. And why is this such a hard problem? Why do we need a user interface that’s so sophisticated for the marketer to create these segments?
Jason Davis 35:48
Look, I mean, it all comes down to use cases. Especially in today’s macro environment, customer behaviors change all the time. When COVID first hit, everyone went indoors. Then everyone thought it was over. Then Delta came, then Omicron came, and now everyone has RSV, apparently, and the hospitals are overflowing. On top of that, the economy is going south. All the data and the assumptions around the 12 fields of customer data that existed in the fall of 2018, those assumptions are gone. They’re violated, I should say. Today’s world requires much, much deeper access to data to really better understand and respond to the needs of customers. And if you look at the composition of 99% of marketers today, they’re non-technical; they don’t know SQL. They need a composable and reusable construct that’s easy to use, and that the rest of the team can have visibility into, so their CMO can look and ask, “What are we actually doing here?” Fundamentally, they’re non-technical.
Kostas Pardalis 37:02
Yeah, it makes a lot of sense. I heard both you and Eric talk about complex queries during the conversation today. Just a few moments ago, you talked about the data team that will go and build a query of, let’s say, 1,000 lines, which will take a week. What makes the process of creating these queries on the data warehouse, for the particular workloads that a CDP has, so complicated and hard? And I’m talking about a technical person whose task it is to do this job. We’re not talking about marketers, because obviously non-technical personas shouldn’t have to write any code. But still, it seems that there is intrinsic complexity in representing this processing for the data warehouse and executing the logic over there. So why is this happening? Why is it challenging?
Jason Davis 38:05
Yeah, I mean, maybe I’ll give you a mini segmentation 101 tutorial over the next two minutes. Yes, you can build a segment of anyone who bought in the last year, but in reality, that’s not how it works. Marketers, for one, have a notion of personas. A persona might be early adopters of your product, and the definition might be anyone who’s bought a product within seven days of launch. Personas might be longtime customers. They might be high-margin customers: anyone who’s purchased over $300 across a set of high-margin products, where you have margins of about 40%. And every business is different. BarkBox was one of our first customers, actually, and a customer we share with RudderStack. They have a core persona of heavy chewers, people whose dogs very aggressively chew their toys. There aren’t many businesses out there with that kind of segment, but that’s a core persona for them that defines their brand, and everything they do should, in some sense, consider it. So the first layer is what we call base segments in our platform: core personas. On top of that, there are exclusions. These are people you don’t want to market to. If someone is actively engaging your support team and they’re really not happy with the business, you don’t want to send them a promotion. There can also be compliance issues or legal issues where you need to exclude people from the audience as well. So there are two sets of segments, and on top of those, anything else you might want to do needs to be considered an overlay. And on top of all the examples I just went through, segments are required to consider behavioral data, non-customer objects, and bespoke customer behavior, either in the last few minutes or the last few years.
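The layering described here (a base segment, a set of exclusions, and a campaign-specific overlay) can be sketched with plain set operations. This is a simplified illustration under assumed names: `build_audience`, the customer IDs, and the example segments are hypothetical, and a real platform would evaluate these layers against the warehouse rather than against in-memory sets.

```python
def build_audience(base, exclusions, overlay):
    """Combine the three layers: keep base-segment members who also match the
    overlay, then drop anyone who appears in any exclusion segment."""
    excluded = set().union(*exclusions)
    return (base & overlay) - excluded


# Base segment: a core persona, e.g. "heavy chewers".
heavy_chewers = {"c1", "c2", "c3", "c4"}

# Exclusions: people you should not market to right now.
open_support_ticket = {"c2"}
legal_hold = {"c4"}

# Overlay: the campaign-specific behavior, e.g. purchased in the last 30 days.
bought_last_30_days = {"c1", "c2", "c5"}

audience = build_audience(
    base=heavy_chewers,
    exclusions=[open_support_ticket, legal_hold],
    overlay=bought_last_30_days,
)
```

Here `c2` matches both the base segment and the overlay but is removed because of the open support ticket, and `c5` matches the overlay but is not part of the base persona, so only `c1` remains.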
Kostas Pardalis 39:53
Okay, that makes total sense. All right, one last question from me and then I’ll give the mic back to Eric, because we’re getting close to the end of this episode. You mentioned that a foundational part of the architecture you’re operating on is the data warehouse, right? So from the data warehouses that we have today, like BigQuery, Snowflake, Redshift, et cetera, what would you like to see them implement in the future that would make you happy for the stuff you’re doing at Simon Data, as a CDP that has to work on top of these technologies?
Jason Davis 40:37
Real time. I think technologies like Kafka and Confluent have had good adoption in certain pockets of large businesses that have massive throughput requirements. But SQL, standard SQL as a language, doesn’t really map very well to real-time data. I think as a category we have real work to do. And it’s not just an infrastructure problem; it’s a SQL abstraction problem. For us, when we think about the world of data, we can work around the data warehouse for real time, but it comes with real problems. Every year those problems get better, but again, the basic abstraction problems in SQL aren’t getting any better. So when I look at what you’re asking about the future, the big question we always ask is, what does cloud-enabled real-time data look like? You could have another set of players that are equal in scale to a Snowflake or BigQuery, but that instead bring a similar set of capabilities to real time in the cloud. And it could very well be RudderStack. But we’re certainly not there today. So while I think the cloud data warehouse represents a generalized and extensible platform for us to run a lot of core operations on, real time is still sort of this end-around that we fully support, but that doesn’t have nearly the type of elegant solution that I would expect to evolve in the coming years.
Kostas Pardalis 42:19
Yeah, makes total sense. All right, Eric, all yours.
Eric Dodds
Okay, time for one more question, although I often break that rule. Jason, I’m interested to know: we’ve talked a ton about Simon Data and all the use cases there, but you are a recovering machine learning algorithm builder at a PhD level. If we just step back and look at the data landscape, as someone who has built data teams and works with data tools, is there anything out there that just excites you in the data space in general, whether or not it’s related to Simon or any of the other technologies we talked about?
Jason Davis 43:02
100%. I mean, look, I’ll answer your question indirectly, and hopefully, when I get to the end, you can tell me whether it’s a satisfactory answer. When I look at machine learning problems, I see two camps. There are problems where the inputs are fully describable: machine translation, computer vision, self-driving cars. All the information that a human has, the machine has. Then there are other problems that are not fully describable. I’m a customer of BarkBox. Am I having a good experience? Whatever happened last night with my dog, BarkBox is never going to figure that out, and marketing teams are never going to figure that out. Maybe the support team will figure it out. But at the end of the day, there are a lot of clues and context that can be used to understand some of the generalizations and the broader macros, and to zoom in as specifically as possible. When I look at the future of AI and machine learning, it’s about taking all the clues that we have, in a depiction of a world that is inherently incomplete, and then filling in the gaps. So, mapping that back onto trends, I think the way ChatGPT is interactive is interesting. Obviously, ChatGPT has no idea what my intentions are or what my questions are, so it’s a back-and-forth, interactive context. But by and large, what’s most exciting to me is anything that adds a human interaction element to machine learning. In some sense, that’s the problem we’ve been talking about on the show: the feedback loop of developing hypotheses, leveraging the data and the AI that might drive them, then testing in the market, and then iterating.
Eric Dodds 44:47
Yep, I love it. Yes, indeed. We need to do a whole episode on ChatGPT. But that’s...
Jason Davis 44:55
Moreover, if you guys want downloads, I think that’s the way to do it. Yeah.
Eric Dodds 45:00
All right. Well, Jason, this has been wonderful. I learned a ton. I know our listeners learned a ton as well. So thank you for joining us.
Jason Davis 45:09
Thanks for having me on guys.
Eric Dodds 45:11
All right, Kostas, one of my big takeaways is that I’m so glad to finally hear about a marketing tool that, as Jason described it, is a good citizen on either end of the data pipeline, both in terms of ingestion and in terms of pushing data back in. The challenge with so many marketing tools is that they’re terminal destinations, which has been a huge pain point for me over the years, specifically in terms of data infrastructure. So that was great. And I’m super excited to hear that kind of thinking is being done, even for tools that are built specifically for marketers.
Kostas Pardalis 45:53
Yeah, I’ll keep the last part of the conversation, the part about real time and streaming data, and the idea that this is, let’s say, the next frontier of innovation when it comes to data infrastructure for marketers. And in a way it’s also the next frontier for data infrastructure in general, because, as he said, the technology is not there yet. Yes, we can ingest real-time data into the data warehouse, but how we do it, how fast we do it, how hard it is to do, and what kind of tools we have to work with real-time data all still have a lot of room for improvement. I’ll keep that, and I’ll be looking around to see what the industry is doing to address it.
Eric Dodds 46:42
All right, well, we will keep an eye out and we will catch you in the next one. Subscribe if you haven’t, and of course, tell a friend. We hope you enjoyed this episode of The Data Stack Show. Be sure to subscribe on your favorite podcast app to get notified about new episodes every week. We’d also love your feedback. You can email me, Eric Dodds, at eric@datastackshow.com. That’s E-R-I-C at datastackshow.com. The show is brought to you by RudderStack, the CDP for developers. Learn how to build a CDP on your data warehouse at RudderStack.com.