Episode 146:

What Is a Customer Data Platform? Featuring Soumyadeb Mitra of RudderStack

July 12, 2023

This week on The Data Stack Show, Eric and Kostas chat with Soumyadeb Mitra, the Founder & CEO of RudderStack. During the episode, the group discusses how to define customer data, why data collection is so complex, bridging the gap between data collection and useful analytics, building a 360-degree customer profile, how RudderStack translates data in its new Profiles feature, how future technologies will impact the data space, and more.


Highlights from this week’s conversation include:

  • Soumyadeb’s background and journey in data (5:49)
  • Defining customer data (8:10)
  • The complexity of customer data collection (10:04)
  • What is a CDP and how it is properly deployed (17:12)
  • Bridging the gap between data collection and useful analytics for marketing (21:46)
  • How RudderStack translates data and the new Profiles feature (25:30)
  • The foundations of data in building a 360-degree customer profile (30:30)
  • Solutions for the intersection between engineering and business users (34:35)
  • How AI and other future technologies will impact data (41:14)
  • Final thoughts and takeaways (46:30)


The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.


Eric Dodds 00:03
Welcome to The Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You’ll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack. They’ve been helping us put on the show for years, and they just launched an awesome new product called Profiles. It makes it easy to build an identity graph and complete customer profiles right in your warehouse or data lake. You should go check it out at rudderstack.com. And today, welcome back to The Data Stack Show. Kostas, we have a pretty special guest today, someone who you and I have both worked for, and I still do: Soumyadeb, who is the founder and CEO of RudderStack, which actually helps us put the show on, which is really great. So it’s going to be really fun to talk with him about all sorts of things. One thing that I’m excited to chat about, which I’ve chatted with Soumyadeb about a bunch over the years, though it’ll be nice just to have a casual conversation about it, is that the history of tooling around customer data has been very marketer-centric. And the weight really seems to be shifting back towards the data team, because data is a fundamentally technical problem. So I think it’ll be interesting to get his perspective on that, because he’s done a lot of work over the years around data teams, ML teams, etc., with a focus on customer data. So I think that’ll be a great topic to cover. But what do you think?

Kostas Pardalis 01:41
Yeah, I mean, outside of the very interesting personal things that we can chat about, because we’ve worked together with Soumyadeb, and we worked with him from a very early stage of the company, right? So there are many interesting stories to talk about there. But outside of this, I think he’s probably the most appropriate person to help us understand a little bit better what this whole thing about customer data is. Why do we have customer data? Why do we need to differentiate it from the other data that we have? Why do we need almost a completely different category for managing this data? Why do we have CDPs? What is a CDP, right? All these terms have been used, and maybe also abused a little bit, in the market over the past couple of years, and we’re still struggling to figure out exactly what all these things are. I think we have the right person to give us some very concrete definitions. And because we are addressing primarily data practitioners: what does this mean for a data engineer, or a data analyst, or all these people who get requests to work with this data? And yeah, Soumyadeb is the person who can give us the whole spectrum, from the data engineering side of things to how the marketer is actually working with this data and why. So we’re definitely going to talk a lot about that stuff.

Eric Dodds 03:31
Yeah, that’d be great. I don’t know if we’ve had a sort of customer data platform type conversation on the show yet. We’ve maybe mentioned it, but I think we haven’t done a deep dive, especially as it relates to data infrastructure. So that’ll be great. All right. Well, let’s dig in.

Kostas Pardalis 03:45
All right. So hello, everyone, and welcome to another episode of The Data Stack Show. This is going to be a very special one. First of all, because as you have probably noticed, I’m the one doing the introduction to the episode, which means that I’m going to be alone, it seems, as the host. But we also have an extremely special guest today, who, first of all, I can call a friend. Also, on a personal note, I spent a little more than two years working at RudderStack, seeing the amazing journey of the company, starting from almost zero to become the company it is today. So we have Soumyadeb, the CEO of RudderStack. By the way, and that’s probably something that not many people know, the first idea around having a podcast actually came from him. He had this idea when we were working together at RudderStack, and I took it and started working on it. And the rest is obviously history. But RudderStack has also been supporting us with a lot of backing, right? Because as you know, The Data Stack Show is very independent. So we have the support of RudderStack to keep the show like it is today. So, welcome, and thank you so much, Soumyadeb. How are you?

Soumyadeb Mitra 05:32
I’m very well. Firstly, Kostas, it’s really great to be doing this show with you. Thanks a lot for the very kind words about me and RudderStack. I have been a follower of the show. I mean, being a sponsor is the easy thing. You built an amazing show, getting to have those in-depth conversations. Again, kudos to you for setting this up for success. But yeah, I’m very glad to finally have the opportunity to be talking to you in this forum. So thanks.

Kostas Pardalis 06:05
Yeah, it actually took us a while to make it happen, right? It’s been around three years now for the show, and it’s the first time that you are here. So that’s super exciting for me. But before we start, let’s do what we usually do with all of our guests, and I’ll ask you to give us a brief background of yourself. Who are you, what have you done, and what led you into building RudderStack?

Soumyadeb Mitra 06:39
Yeah, that’s a great question. So I can maybe go in reverse chronological order. I’ve been doing RudderStack for the last four years. I’m the founder and CEO; we started in 2019. Before RudderStack, I spent a year at a company called 8x8 as part of the data team, which included some of their machine learning and customer data initiatives and building use cases on top of customer data. Some of the experiences and challenges I ran into at that company prompted me to start RudderStack. Before that, I was the co-founder of a company called MarianaIQ. We were building almost the next-generation, AI-driven marketing automation system. It was the early days of deep learning, and we thought maybe this was an opportunity to transform the way marketing is done. We were definitely early, both in terms of the tech and, more importantly, in our customers’ data completeness. Even though we worked with really large brands, none of them had good customer data. And if you don’t have data about your own customers, there is not much you can do. So we learned a lot of lessons at that company. We ended up selling to 8x8 and tried to do a similar thing inside 8x8, and then again ran into very similar problems around collecting data and unifying data. And that’s what we have hopefully built something for at RudderStack, and we will hopefully solve this someday. And yeah, prior to that, I worked at a company called Data Domain. I don’t know how many of you have heard of it, but Frank Slootman, who is now a big shot, that was his first company as a CEO. So I learned a lot from that company. That was my first startup experience and my first real job experience, and probably the only one. That’s a quick overview. And I have a PhD in data, so I’ve been working in the data space pretty much all my life.

Kostas Pardalis 08:40
That’s amazing. And, okay, I have a question from my side. I think I know what the answer is, but I think it’s very important to hear your definition and share it with our audience, because we talk a lot about it, but I’m not 100% sure that people share the same semantics or deeply understand what it is. So the question is simple: what is customer data? And why is it not just like any other data, so that we need to treat it differently?

Soumyadeb Mitra 09:17
Yeah, that’s a great question. In fact, even on that, I don’t think there is a consistent definition; everyone has a different view of it. But in the simplest form, one way to think about it is this: if you are a B2C company, let’s say a company that is selling stuff to consumers, those consumers interact with your brand, right? And they do that over many different channels. They’re probably coming to your website and doing things on the website. They’re probably using your mobile app and taking actions in the app. Or they may be calling your call center. They might be going to your store. They may be making purchases, and so on, right? Now, each of these interactions with your brand produces some data. A transaction produces transaction data. Similarly, somebody coming to your website, clicking on some products, and browsing your catalog produces data: what products they looked at, what products they clicked on, what they added to the cart. Of course, different data has different value; your website click data may not be as valuable as transaction data. But in a loose way, all of this data can be called customer data.

Kostas Pardalis 10:35
All right, that makes total sense. So we have these, let’s call them behavioral breadcrumbs, right, of our users or customers out there, and we want to collect them. Let’s start with that, because I think that’s one of the first big challenges that we have to go through: what it means to collect this data. You mentioned that there are many different channels, right? And from what I understand, these channels can be very diverse. We’re talking about physical channels, like someone who enters your shop and makes a transaction through a POS machine over there. And at the same time, they forget something, they go out, and they make an online transaction to buy from you again, right? It’s really diverse. So there are actually two things that I want from you on that. Give us a little more color on how complex this process of collecting is, and the history behind it. I know that for the past 10 years, starting with Segment, there’s been a lot of innovation in the industry. But how many things have changed from back then to today? So I’d love to hear you share your experience on these two fronts.

Soumyadeb Mitra 12:11
Yeah, great question. So let me start with the complexity, and then I can comment on the history. As you pointed out already, the complexity comes from literally three things. One is the diversity of sources. As you mentioned, you have your Android app, your website, your point-of-sale devices, your back-end systems, transactional systems, and so on. So you need to collect data from all of these places. You have to have SDKs that your developers can embed, and all that stuff, right? It’s not rocket science, but it does require a lot of engineering effort to build this out. That’s number one: the variety of data sources. The second is around the volume of data, right? If you are a reasonably sized consumer company, you’re talking about anywhere from millions to billions of events per day. We are working with some customers who are, at peak, sending about a million events per second, so you have to set up the back-end infrastructure to handle this volume, and so on. Again, this is not rocket science, but these are engineering problems, and you have to set up a team to solve them. I mean, this data is extremely important, but setting up this infrastructure is grunt work that is not core to most companies, and most don’t want to do it. The third, and I think the most important, problem is consistency. And I think that is what is often overlooked. What I mean by that is: what is the goal of collecting all this data, right? You want to collect all this data, and you want to, firstly, send it to the downstream users of the data. You want to send it to, let’s say, a product analytics tool like Amplitude or Mixpanel, or you want to send it to a marketing tool like Braze, or Salesforce, or some other marketing cloud.
Now, each one of them expects this data in a slightly different format. They have their own API and their own standardization. So if you have to build this infrastructure from the ground up, you have to handle that, right? You have to make sure that whatever schema and structure you’re using for your behavioral events can be translated for all these downstream destinations, and you have to manage those translations. Alternatively, you can embed their SDKs, but even the SDKs expect a standardized event format, right? So you have to manage that yourself, and each one is slightly different. The same goes for your user identities, right? Identity management is hard. People come to the website anonymously and browse things anonymously, but you can still track that activity by setting a cookie wherever possible, and the same goes for the mobile device. And then once they log in, you may have an email, or an address, or a phone number. You have to stitch all these identities together, and you have to manage those identities. So this standardization of what I should be calling the events, what events I should be collecting, and what properties I should be collecting with those events so that I can send them to downstream destinations is a lot of work that a lot of vendors have to do from scratch, again and again. And those are the three main pillars: the variety of sources, the volume of data, and the standardization that comes primarily because there are all these downstream users of this data. Now, if you look at the space, right, if you look at the early 2000s, the number of channels through which somebody would interact with a brand was fairly low, right? You would go to a store and buy things. I mean, this was pre-mobile, the early days of the web. So this was not really a problem.
I think the explosion happened after the iPhone, or maybe slightly earlier, when the web became an important channel. And then, over time, mobile became an important channel. So your sources exploded, and so did the volume, right? That’s one end. On the other side, the number of destinations also exploded, right? People had a specific tool for running email campaigns and another tool for running push campaigns. So you have to get the data to all these different destinations, right? That complexity suddenly exploded in the late 2000s and, I would say, early 2010s. And that’s when the space, and the technical problem that customer data platforms address, came into being. Segment was the early leader in the space; they built this multiplexer, right? You collect from all these places, send it to Segment, and they fan it out to all the destinations. So Segment was the early mover. But then other companies came, like mParticle and Tealium. They all handled some version of this problem. And RudderStack is in the same space, hopefully the last company in this space.
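
The multiplexer idea described above, one canonical event translated into each destination's expected shape, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the destination names and payload fields below are hypothetical and do not reproduce the real API of Segment, RudderStack, or any analytics or marketing tool.

```python
from typing import Any, Callable, Dict

Event = Dict[str, Any]

def to_analytics_format(event: Event) -> Event:
    # Hypothetical payload shape for a product analytics destination.
    return {
        "event_type": event["event"],
        "user_id": event["user_id"],
        "event_properties": event["properties"],
    }

def to_marketing_format(event: Event) -> Event:
    # Hypothetical payload shape for a marketing destination.
    return {
        "name": event["event"],
        "external_id": event["user_id"],
        "attributes": event["properties"],
    }

# One translation function per downstream destination (names are made up).
DESTINATIONS: Dict[str, Callable[[Event], Event]] = {
    "analytics_tool": to_analytics_format,
    "marketing_tool": to_marketing_format,
}

def fan_out(event: Event) -> Dict[str, Event]:
    """Translate one canonical event into every destination's format."""
    return {name: transform(event) for name, transform in DESTINATIONS.items()}

# A single canonical event collected once from a source such as a website.
event = {"event": "Product Viewed", "user_id": "u1", "properties": {"sku": "A-1"}}
payloads = fan_out(event)
```

The point of the design is that sources only ever emit the canonical shape; adding a new destination means adding one translation function rather than touching every source.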

Kostas Pardalis 17:39
All right. So that was very insightful. My next question is a little bit similar to how we opened the conversation, when I asked you at the beginning what the definition of customer data is. Now that we’ve talked a little bit about the problem, and we’ve also mentioned a couple of vendors in this space, there is another concept that has many different definitions, and that’s the concept of the CDP, right? Customer data platforms. I’d like to hear your take on that. What is a CDP, and is RudderStack a CDP?

Soumyadeb Mitra 18:21
Yeah, CDP is probably the most wrongly used term, because everybody and anyone is a CDP. Now, anyone who touches customer data is calling themselves a CDP. So yeah, let me take a stab at defining CDP the way we see it right now. At a fundamental level, the problem is what I described earlier, right? You have all these sources generating all this data, and you have to get that data to all the downstream destinations, right? Something that does that you can call a customer data platform, right? But then the space evolved, with these initial data multiplexers realizing: okay, we have all the data, everything is flowing through us, so why do we have to just multiplex the data? We can provide more value-added services, right? We can stitch all of these different identities and create what is called a customer 360, like a golden customer record. And then we can let our customers come in and run marketing campaigns on top of that. They can come in and create audiences, where an audience is, let’s say, a list of people who have come to the checkout page but did not purchase. That’s a cart drop-off audience that you want to run your marketing campaign against. So all these initial data pipeline companies realized: okay, now we can provide these value-added services. We can create this customer 360. We can provide this audience tool. We can provide the activation tool, where activation means taking that audience and sending it to something like Facebook so that you can show them ads. So that was one evolution of the CDP: data multiplexers providing more customer 360 and audience capabilities. At the other end of the spectrum, there were traditional marketing automation companies, which had that golden customer record, right? Think of CRMs like Salesforce, or marketing tools like Pardot and Marketo.
They all had customer records. They had the emails, they had the phone numbers, and all that stuff, and those were traditionally used for running email campaigns and other sales and marketing campaigns, right? Now, because they had the customer record, they figured: why don’t I layer on this behavioral data, so I can provide more insights to personalize these campaigns? So they also evolved into collecting this first-party data, augmenting their capabilities to provide this customer 360, segmentation capabilities, and so on. So they also call themselves CDPs. So now you have a space with a mishmash of data pipeline companies providing some audience capabilities, and traditional audience companies providing data pipeline or data collection capabilities. And this whole space is now called CDP. In that sense, RudderStack is also a CDP: we help our customers collect data, we help our customers unify it into that customer 360, and we help them activate it in all the downstream tools. Where we differentiate is that all of this happens on top of the data warehouse, right? We don’t store any data; this happens on top of the customer’s Snowflake. I mean, I can go on and on, but that might be a topic for a separate conversation. But that’s how we position ourselves in this market.
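
The cart drop-off audience mentioned in this answer (people who reached the checkout page but did not purchase) reduces to a set difference. The sketch below is a minimal illustration in Python; the event names `checkout_viewed` and `order_completed` and the sample events are assumptions made up for the example, not any vendor's schema.

```python
from typing import Dict, Iterable, Set

def cart_dropoff_audience(events: Iterable[Dict[str, str]]) -> Set[str]:
    """Return users who viewed checkout but never completed an order."""
    reached_checkout = set()
    purchased = set()
    for e in events:
        if e["event"] == "checkout_viewed":      # hypothetical event name
            reached_checkout.add(e["user_id"])
        elif e["event"] == "order_completed":    # hypothetical event name
            purchased.add(e["user_id"])
    return reached_checkout - purchased

# Made-up sample data: u1 buys, u2 abandons the cart, u3 never reaches checkout.
sample_events = [
    {"user_id": "u1", "event": "checkout_viewed"},
    {"user_id": "u1", "event": "order_completed"},
    {"user_id": "u2", "event": "checkout_viewed"},
    {"user_id": "u3", "event": "product_viewed"},
]
audience = cart_dropoff_audience(sample_events)
```

Activation, in the sense used above, would then mean sending the resulting list of user IDs to an ad or messaging tool.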

Kostas Pardalis 22:15
Yeah, that makes total sense. And, okay, let’s get a little bit deeper into the data warehouse part. What is the added value to the organization of delivering their own data, right, their own customer data, into the data warehouse, and then building on top of that the different layers that the organization needs to reach the point of having these customer 360 views, or the golden record, the customer golden record? And how do we also bridge the gap in going from that data result, or let’s say data product result, to actually having, not the analyst, but let’s say the marketer, use this information to go and do marketing? Because I would assume that if I’m a marketer, the last thing that I want to do is play around with databases. What I want to do is probably focus on my marketing tools and be able to run my campaigns, generate revenue, and all that stuff, right? So how do we bridge that gap there?

Soumyadeb Mitra 23:35
Yeah, that’s a great question. In fact, that is the reason a lot of the initial customer data platforms came into existence. You had these traditional marketing tools, right? I mean, the marketer would use something like Salesforce, or something else. And they would complain: okay, we don’t have the data we need. We need web data and mobile data to personalize the experience, and I don’t have that data. They would go to IT and say, okay, can you set up these data pipelines to collect data from the website and from the mobile app, and send the data into my downstream tool? IT would say: oh, this is like the tenth project on my list. Plus, I don’t even have the capability to write these SDKs, create these data pipelines, and manage the pipelines. So that led to this whole space, where marketers thought: okay, we need some tools, and I don’t want to wait for IT. So they would go and buy from these vendors. And these vendors, like Segment, which was one of the early players in the space, would say: oh, here’s the SDK; you just embed this SDK, and that’s all. Your marketing people will be off your back; they will not come and bother you anymore. All the data will magically start flowing into their tools, and so on. So, a happy ending. And I think that market materialized quite a bit, right? I mean, by no means was it a failure. Segment got acquired by Twilio, and there was a huge customer base. I think where it started failing was around completeness of data. I mean, IT was always a laggard, and IT could not set up this infrastructure to collect data, store data, and process data. That changed by the late 2010s, the latter part of the last decade, right, when people started buying data warehouses and investing in data warehouse technologies, and they started centralizing all of that data. So that is what is triggering the new wave of CDPs.
But the traditional CDPs tried to address the exact problem you’re talking about. Does that make sense?

Kostas Pardalis 26:00
Yep. Makes total sense. All right. So, in the past decade, a lot of data-related infrastructure has come into the market and has changed a lot of the dynamics around what can be built on top of the data that a company has. We have tools to collect the data and load it into data warehouses. Data warehouses are quite easy to manage, because they’re in the cloud. We have tools for modeling, and for managing models, with dbt and the like. But even if we completely solve the problem of delivering the data into a data warehouse, there is still this process of taking these raw, noisy, verbose events about the user, all these breadcrumbs that we put into a basket in a way, and transforming them into something that can be digested by an analyst or even a marketer, right? So how can we do that? How can we do that with RudderStack?

Soumyadeb Mitra 27:13
Yeah. So before I answer how you can do that with RudderStack, let me briefly explain how you can do that without RudderStack, and what the pain points are. I think that will help in understanding the value of what RudderStack does. So you’re 100% right, that is the problem. You get all these data streams. You use a tool like RudderStack, Segment, or your homegrown thing, and you collect all this behavioral data into your data warehouse, right? You have 20 different tables for 20 different events. Then you bring in your ETL data, again through RudderStack, Fivetran, or whatever other ETL tool you name, and you end up with another 20 or 30 different tables. Now, what you are trying to get out of all of this is a clean customer view. Think of it as one row per customer, with a bunch of attributes computed for the customer. That’s all a consumer of the data, whether it’s an analyst or a marketer, cares about. And when I say attributes, these are things like the total revenue for the customer, how many times they have come to the checkout page, what are the seven products they have looked at. You can think of all these as features, including funnel features, right? Have they come to the checkout page? These are all features that we are computing for the user, which the downstream users of the data care about. So yeah, you have all this raw data on the left, and you want to get this clean customer view on top of your data warehouse. Now, how do you do that? Traditionally, you would go and hire a team of data analysts to come in and write the SQL. Some of these would run to hundreds and hundreds of lines of SQL. dbt was a big force in this space, right? You no longer had to manage all that SQL in a poorly organized way.
Now you can apply software engineering to this practice with dbt, where you can organize these into projects and check them into GitHub. So dbt got a lot of traction in the space. But you still have to go and write those transformations. You have to figure out how to stitch identities, which is a hard problem to solve in SQL. You have to figure out how to create features or funnels, which is, again, manual and hard to do in SQL. So you still have to go and write this with a team of data analysts. But the biggest problem here is that it’s not just one time that you hire a team and they come up with a clean customer 360, right? Every time your marketer wants a new feature, let’s say they want total revenue in the last seven days, or they want something else in the last 15 days, you have to go back to the data team. They will go and update the models, and then they have to push to production. It probably takes anywhere from one sprint to a couple of sprints to even get this rolled out. So that is the state of the art: it’s very hard to hire for, and then you go through this slow, painful process. Now, that’s what we’re trying to solve with RudderStack. RudderStack’s vision was to enable this end-to-end workflow on top of a data warehouse, but with something that does not require this painful process, right? So we are launching this product called Profiles, which simplifies that. Number one, you can define all these things in a very high-level language, and you don’t have to write complex SQL. We also have a UI around it, so even a non-technical person can come in and define the features. Everything goes into a config, which can be checked into Git, so you still have software engineering best practices. But then it exposes this process to non-technical people, or non-SQL experts. That’s what Profiles helps our customers with.
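
To give a feel for the hand-written SQL approach described in this answer, here is a minimal sketch that builds a "one row per customer" view with two features, total revenue and checkout page views. It uses Python's built-in `sqlite3` as a stand-in for a data warehouse; the table names, columns, and rows are invented for illustration, and the sketch assumes identities are already resolved to a single `user_id`.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse; schema and rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (user_id TEXT, amount REAL);
CREATE TABLE page_views (user_id TEXT, page TEXT);
INSERT INTO orders VALUES ('u1', 30.0), ('u1', 20.0), ('u2', 15.0);
INSERT INTO page_views VALUES
  ('u1', 'checkout'), ('u2', 'checkout'), ('u2', 'checkout'), ('u3', 'home');
""")

CUSTOMER_360_SQL = """
WITH users AS (
    -- every user seen in any source table
    SELECT user_id FROM orders
    UNION
    SELECT user_id FROM page_views
),
revenue AS (
    SELECT user_id, SUM(amount) AS total_revenue
    FROM orders GROUP BY user_id
),
checkout AS (
    SELECT user_id, COUNT(*) AS checkout_views
    FROM page_views WHERE page = 'checkout' GROUP BY user_id
)
SELECT u.user_id,
       COALESCE(r.total_revenue, 0)  AS total_revenue,
       COALESCE(c.checkout_views, 0) AS checkout_views
FROM users u
LEFT JOIN revenue  r ON r.user_id = u.user_id
LEFT JOIN checkout c ON c.user_id = u.user_id
ORDER BY u.user_id
"""

# One row per customer, one column per feature.
rows = conn.execute(CUSTOMER_360_SQL).fetchall()
```

Even in this toy form, each new feature means another CTE and another join, which is the maintenance burden the answer describes when marketers keep asking for new attributes.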

Kostas Pardalis 31:03
Okay, that sounds awesome. I have a couple of questions here. First of all, let’s talk a little bit about the foundations of building a profile, right? You mentioned a couple of things, like identity stitching and a couple of others. But is there a minimum set of operations that there’s no way you can avoid when you have to go and build this customer 360 table, right? This table where you have one row per user, per customer, and then a number of columns, each one representing something; it doesn’t matter what. If we want to define the minimum set of problems and operations that somebody needs to handle there, what would you say are the fundamentals?

Soumyadeb Mitra 31:56
Yeah. So there are literally three things that need to happen to get to this customer 360 from your dirty data. On the left, you have all these events and ETL sources, and you want this clean, transformed data on the right, right? The first step is identity stitching, or ID resolution. As I was mentioning earlier, you have all these different identities for a user. If somebody comes to your website, you assign a cookie ID, and then they provide their email, so you have their email. Similarly, the same person comes to the mobile app, they have a device ID, and then they provide their email. Now you can stitch all of these identities into the same user, right? So when you’re computing a feature, like the total number of times somebody has come to a particular product page, you have to combine the mobile activity and the web activity based on these identities, right? So that’s step zero that needs to happen: stitching all these identities. And it’s actually a hard problem, because it’s not just one level of IDs, right? You can have multiple levels. You have an email linked with a phone number, and that phone number is linked with an address. And some of these could be non-deterministic; address is a good example. So you have to create this ID graph and stitch all of them into one ID. That’s step one. Step two is that you now have to define these features, right? Total number of times somebody has come to the login page, total number of times somebody has viewed a product, total revenue: these are all interesting features. But every business may care about features that are important to them, right? So you can’t have a static set of features. You want the flexibility where anyone can come in and define those features.
And what I mean by anyone is not necessarily just a data engineer or data analyst. You want a marketing person who is using that feature to also be able to come in and define features wherever it makes sense. So you need an additional layer where multiple people can define and contribute features. That’s the second step. The third step is what is called some version of time travel. It’s not enough to just compute the features as of today. There are use cases like training a machine learning model: say you’re trying to train a churn model. To do that, you need to compute the features that go into the model at the point of churn, not today. If a user churned six days ago, you want their features as of that point. So you need what you could call a lightweight feature store, where you’re not just computing and keeping track of today’s features; you should be able to go back to any point in time and compute the feature as of that time. And this is a hard problem. So you need these three things to happen to create a usable customer 360.
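The "time travel" idea above can be sketched in a few lines of Python: a feature function that takes an explicit `as_of` timestamp and only looks at events up to that moment. The function name `page_views_as_of`, the seven-day window, and the sample events are assumptions made for illustration, not part of any product discussed here.

```python
from datetime import datetime, timedelta


def page_views_as_of(events, user_id, as_of, window_days=7):
    """Count a user's page views in the `window_days` ending at `as_of`."""
    start = as_of - timedelta(days=window_days)
    return sum(
        1
        for e in events
        if e["user_id"] == user_id
        and e["event"] == "page_view"
        and start < e["timestamp"] <= as_of
    )


# Hypothetical event log.
events = [
    {"user_id": "u1", "event": "page_view", "timestamp": datetime(2023, 6, 1)},
    {"user_id": "u1", "event": "page_view", "timestamp": datetime(2023, 6, 3)},
    {"user_id": "u2", "event": "page_view", "timestamp": datetime(2023, 6, 20)},
]

churned_at = datetime(2023, 6, 5)  # when u1 churned (assumed label time)
today = datetime(2023, 6, 21)

# Feature value to use when training a churn model: computed at churn time.
views_at_churn = page_views_as_of(events, "u1", as_of=churned_at)

# The same feature computed "today" no longer reflects the pre-churn behavior.
views_today = page_views_as_of(events, "u1", as_of=today)
```

Here `views_at_churn` sees the two page views that preceded the churn, while `views_today` sees none, which is why training on today's values would feed the model the wrong picture of the user.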

Kostas Pardalis 35:05
Okay, that was super, super interesting. And you mentioned something that I find very fascinating as a problem, and I’d love to hear how you at RudderStack are dealing with it. By the way, it’s one of the reasons that I’m personally attracted to these types of products: we don’t have just one single persona that we can consider as our user. We are always at this intersection between engineering and business users. And no matter what, we somehow need to make these people, who are very different even in the language they use for their work, work together in harmony. You put it very well. You mentioned that we have a data engineer on one side, but we also want to allow the domain experts, and the domain expert here is obviously the marketer, to define and express what they need. So how do you do that? How do you deliver a product experience that resonates with both the engineering persona and the marketing persona?

Soumyadeb Mitra 36:25
Yeah, that’s a great question. And by no means can I claim that we have solved this problem. Anyone who solves it will get whatever the equivalent of a Nobel Prize for data engineers is. But Profiles is an attempt to do that. The way to think about this is, as you rightly pointed out, there are all these different personas that need to come together to work on top of this customer 360. If you take that example, there are data engineers who are responsible for producing the data and cleanly modeling the data. And then there are marketing people who are using that data, and they might want to define their own set of features, like their own funnels, and you want them to come together. Now, yes, the product experience has to bring them together. But there are also boundaries: there are things that a marketing person does not want to do or cannot do. A good example is identity stitching. Often, as a marketer, you know the data sources: this is my website, this is my mobile app. But you don’t really know the nitty-gritty details of how IDs are generated on these apps, how they are stitched together, and all that stuff. So that is a problem that is best left to the engineering team. Similarly, there are things like what to call an event: should it be called something like product_purchased? It’s a simpler problem, but somebody still has to take care of it, so that, again, can be left to the engineering team. So there are things that the engineering team has to contribute, like the stitching rules and how stitching happens, to create some version of an initial customer 360. And then you want your marketer to come in and build on top of that. What does a marketing persona care about?
They want to create funnels. They want to say, give me all the people who have done X but not Y. That funnel step should not require going back to engineering every time. You want a user experience where they can define funnels. Now, the funnels are defined on events, which are defined by engineering: engineering defines the properties and makes sure that the events are clean. But the ability to create funnels should be exposed to the marketing person. And there are other simple features, like the total number of times somebody has done a page view in the last six days. That is a feature that, again, should be exposed to a marketing person, who shouldn’t have to go back to engineering. So the Profiles product enables this use case, where a data engineer can come in and define these stitching rules and these complex features in a config, commit it to a repo, or push it to RudderStack. And then, in the RudderStack UI, a non-technical person can come in and build on top. They can build funnels on top of the features defined by the data persona, and they can define other simple features, and everything goes back into the same config, linked to the whole config that was built by the data team. So that’s what we have done. There are problems we have not solved. There are things that don’t have that clean boundary. A good example is which events should be tracked. A marketing person may be interested in a new feature; maybe they are saying, okay, I’m interested in how many times somebody has come to a specific page, but the event for that may not even be present today. So somebody has to go back and instrument the event, which, again, the marketing person cannot do. Somebody else now has to go and instrument the event.
And then make sure that the right properties are captured. That’s a complex workflow that, again, I don’t think we have solved, but hopefully at some point we’ll get to it.
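To make the funnel-style segment mentioned above concrete (reading it as "everyone who has done X but not Y"), here is a small Python sketch over an event list. The function name `did_x_but_not_y` and the event names are hypothetical; in the workflow described, the events themselves would be defined and kept clean by engineering, and only this kind of selection would be exposed to the marketing user.

```python
def did_x_but_not_y(events, did, did_not):
    """Return user IDs with at least one `did` event and no `did_not` event."""
    did_users = {e["user_id"] for e in events if e["event"] == did}
    excluded = {e["user_id"] for e in events if e["event"] == did_not}
    return did_users - excluded


# Hypothetical, already-stitched events (one user_id per person).
events = [
    {"user_id": "u1", "event": "product_viewed"},
    {"user_id": "u1", "event": "order_completed"},
    {"user_id": "u2", "event": "product_viewed"},
    {"user_id": "u3", "event": "page_view"},
]

# Everyone who viewed a product but never completed an order.
segment = did_x_but_not_y(
    events, did="product_viewed", did_not="order_completed"
)
print(sorted(segment))
```

The sketch assumes identities are already stitched, which reflects the boundary described in the conversation: the segment logic is simple only because the harder stitching and event modeling happened upstream.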

Kostas Pardalis 40:29
Yeah, 100%. I think that’s a great way to describe, in a pragmatic way, the kind of problem that has to be solved here and how hard it is. And I find it very refreshing to hear someone talk about boundaries, because many times, especially with products that start from more of an engineering mindset, they are very absolute in terms of how things should work. They try to impose a way of doing things, which of course might work for people who are like-minded, engineers, your peers. But you can’t really go out there and ask someone who’s a marketer to change the way they think. There is a good reason that they think the way they do: it’s what helps them deliver the maximum in whatever they have to do. So having these boundaries, and using them to develop a well-defined user experience in a product, I think is key. So I’m very curious to see how this has been implemented as part of this new

Soumyadeb Mitra 41:44
product in RudderStack.

Kostas Pardalis 41:47
But we are close to the end here. And before we close, I’d like to ask you, Soumyadeb, about something that relates to a term that you used. You mentioned the term feature store, which is obviously very related to machine learning. But we are also living in very interesting times. There’s AI out there. There are some very new ways of interacting with machines through interfaces like ChatGPT and all that stuff. And of course, all these new technologies are data-related technologies; they are based on data. If we didn’t have the data, we wouldn’t have the models. Based on your experience, and I’m talking here about your whole experience, everything that you have done in your career so far: how do you see the future? What do you see next? And how do you see these new technologies and paradigms affecting customer data and the space you are in?

Soumyadeb Mitra 43:06
That’s a great question, and it’s something we talk about quite often in our company. If we take a step back, we always wanted this holy grail of one-to-one personalization. Anything that you hear from a brand should be perfectly tuned for you, based on what your interests are, what your desires are, and so on. Now, there were two problems in making that happen. Number one was having all the data about the customer: unless I know you, I can’t even personalize. So having all the data was the first step. The second step was, even if you had all the data, how do you personalize? If I have a million users and I know everything about them, their likes and their dislikes, what they do while interacting with the brand and outside of the brand, and so on, then if you ask a human to come in and craft the perfect message, they can do that. But how do you make a machine do that? So neither the data problem nor the ML problem was solved. That’s what we tried to do in my previous company, and we kind of struggled on both fronts. Now, what ChatGPT has done is hopefully solve the second problem. Somehow, magically, if you can feed in all the data, if I tell it, okay, this is Kostas, these are all the products he has looked at in the past, this is where he lives, these are his interests, now craft the perfect marketing message. You’ll have to do some prompt engineering, but I think ChatGPT can give a good enough answer. It can generate answers that are personalized to you. You could not do that earlier. That’s why you had to do this broad, segment-based marketing. You had to create segments: for all people in San Francisco I’ll do something, for all people in New York I’ll do something else. Those days will be gone in five years.
Everything will be personalized, and all the generative AI techniques will make that happen. Now, we are not going to build generative AI ourselves, but we will be using it, and a lot of other brands will be using it. But the data problem still has to be solved: you still have to get everything that you know about a customer to feed into these generative techniques. Historically, that was not a big problem, because you couldn’t do much with the data anyway. But this problem will explode over the next 10 years, and hopefully we will have a role to play in solving that data problem, if that makes sense.

Kostas Pardalis 45:55
Yeah, absolutely. All right. That was an awesome conversation that we had. I hope we are going to repeat it much earlier, not after another three years. So I’m looking forward to having you back, Soumyadeb. But before we go, where can our listeners learn more about both RudderStack, of course, and also the new product?

Soumyadeb Mitra 46:24
The best way to do that is to go to our website. We will be launching it on our website, where you can request a demo. We will also do a Show HN on Hacker News. So that’s one channel. The other is to hit me up on LinkedIn. My first name is fairly unique, Soumyadeb, so there aren’t too many Soumyadebs in the world, and it should be easy to find me on LinkedIn. So hit me up. I’d love to get feedback from anyone who’s interested.

Kostas Pardalis 46:57
All right, thank you so much.

Soumyadeb Mitra 47:00
Thanks for having me. I really enjoyed chatting with you.

Eric Dodds 47:04
All right, Kostas. What were your big takeaways from this? And the reason I’m so interested is because you worked with Soumyadeb at RudderStack, and you’ve built tooling that had a pretty heavy emphasis on customer data and getting it into the warehouse. So what are your takeaways? What do you think about his thoughts on CDPs, the landscape, etc.?

Kostas Pardalis 47:32
Yeah, there were plenty of really interesting insights in the conversation that we had with Soumyadeb. First of all, it was very interesting to go through the history of this category: how things started more than 10 years ago and how they are still evolving. And how, every time you have a cycle in the market, it feels like the problem has been solved, but it is actually the beginning of another iteration of getting closer to the solution. So it was very interesting to hear all these things about how the first iteration of these platforms started, with Segment and even before that, and where we are today: how we work with this data today, and how much we still have to build out there. That’s one thing. The other thing that I found extremely interesting, and I think it’s one of the most interesting challenges in these types of products that are very data oriented, is that you never have only one persona involved. And I think CDPs, or let’s say customer data related infrastructure, are probably one of the most exaggerated cases of that, because you have all the data infrastructure people that you need, and you even have the application developers. But at the end, you have the marketer, and the marketer is the one who is actually going to turn all this work that has happened before into actual value. And the marketer is a very different persona compared to the rest. So we had a very interesting conversation about the difficulty of building products that can satisfy all these different personas. And of course, as part of that, we also had the opportunity to see what RudderStack is doing today: the new products and new solutions that RudderStack brings to solve all these problems.
It was a very interesting conversation. Soumyadeb does not talk that often, or as often as he should, in my opinion, because he’s really good at helping us understand these complex concepts. So I would suggest everyone tune in and listen to the conversation. There is also a very interesting fact shared about the origin of the show. I’m not going to say more about that, but that’s the teaser. Yep.

Eric Dodds 50:36
Sneaky. All right, well, tune in for some insider information and a complete breakdown of customer data platforms, customer data, the whole nine yards. Subscribe if you haven’t, tell a friend, and we will catch you on the next one. We hope you enjoyed this episode of The Data Stack Show. Be sure to subscribe on your favorite podcast app to get notified about new episodes every week. We’d also love your feedback. You can email me, Eric Dodds, at eric@datastackshow.com. That’s E-R-I-C at datastackshow.com. The show is brought to you by RudderStack, the CDP for developers. Learn how to build a CDP on your data warehouse at RudderStack.com.