Episode 168:

Decoding Data Mesh: Principles, Practices, and Real-World Applications Featuring Paolo Platter, Zhamak Dehghani, and Melissa Logan

December 13, 2023

This week on The Data Stack Show, Eric and Kostas chat with Paolo Platter, Zhamak Dehghani, and Melissa Logan about the concept of data mesh, a complex topic that encompasses both technical and organizational aspects. During this conversation, the group discusses the transition from a monolithic IT team to a data mesh model, the implementation and adoption of data mesh, and the concept of data products within the data mesh framework. The episode emphasizes the importance of empowering domain practitioners and building platforms that support the creation and sharing of data products to enable the implementation of data mesh, and more.

Notes:

Highlights from this week’s conversation include:

  • Defining data mesh (6:37)
  • Addressing the scale of organizational complexity and usage (9:04)
  • The shift from monolithic to microservices (12:24)
  • The sociological structure in data mesh (13:59)
  • Data product generation and sharing in data mesh (17:27)
  • Data Mesh: Simplifying Data Work (24:09)
  • Getting Started with Data Mesh (29:14)
  • Building products for Data Mesh (36:42)
  • Building a customizable and extensible platform to shape data practice (39:28)
  • The characteristics of a data product (48:40)
  • Defining what a data product is not (50:45)
  • The origin of the term “mesh” in data mesh (53:32)

 

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Transcription:

Eric Dodds 00:05
Welcome to The Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You’ll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, the CDP for developers. You can learn more at RudderStack.com. Kostas, we love covering topics that we have not covered on the show before, and we’ve done over 150 shows at this point. It’s kind of crazy that we haven’t covered the topic of data mesh. But at dbt’s conference they announced a data mesh product, and so we worked with Brooks to get literally the author of the data mesh book on the show, so that we can get the straight-line story on what data mesh is. Which is great. This is a topic that a lot of people are talking about; there’s certainly a lot of conversation around it. What I’m going to try to do is just put a sharp definition on it. Data mesh means a lot of things to a lot of people, and since we have the author of the book, the person who coined the term, we can sort of level set on what data mesh means. That’s what I’m going to do. But I’m sure you’re going to have technical questions, because data mesh is fundamentally technical. So what are you going to ask about?

Kostas Pardalis 01:33
Yeah, I want to ask about the products that support the data mesh. Data mesh, like similar movements that change the way we think and operate in a business environment, has many parts, right? It has the people part and also the technology part. I think there has been a lot of focus on the people part, the change of culture, and the way that organizations need to change for data mesh to be implemented and deliver its value. But we also have the right people today to talk about products: what kind of products someone could use that either support or enable the implementation of data mesh. That’s what I would like to focus on. And probably, if we have the time, I’d like to focus a little bit more on some terminology too, like what is a data product, for example, and how do you implement a data product? These kinds of things are very fundamental to data mesh as an architecture. So we have the right people to understand data mesh from both the organizational, people aspect and also the technology aspect. I’m very excited. Let’s go and talk with them.

Eric Dodds 02:50
Let’s do it. Welcome back to The Data Stack Show. What an exciting episode we have, because we’re going to dig into a topic that is really important in the data industry but that we really haven’t covered in depth on the show yet. We’re going to solve that. Today we’re going to talk about data mesh, and we have an amazing lineup of guests here. So, Zhamak, why don’t you start with an intro and a quick background, and then we’ll go down the line.

Zhamak Dehghani 03:21
All right. I’m Zhamak, the creator of data mesh and founder and CEO of Nextdata, which is a data mesh technology startup. Excited to be here.

Eric Dodds 03:30
Great. Thank you so much. Paolo, how about you?

Paolo Platter 03:35
Hi, everybody. I’m Paolo Platter, CTO and co-founder of Agile Lab, and I’ve been in the data mesh story since the beginning.

Eric Dodds 03:45
Wonderful. And Melissa.

Melissa Logan 03:47
Hi, everyone. I’m Melissa Logan. I’m the director of the data mesh learning community, very excited to be here.

Eric Dodds 03:54
Wonderful. Well, Melissa, why don’t we start with you? Because data mesh, I think, in some ways is a big topic, but in many other ways is a simple topic in terms of what we’re trying to accomplish. Can you tell us about the community? I think for our listeners, for anyone who wants to learn more about any of the topics we cover, the community is going to be the go-to place. Can you tell us about the community and how to engage, just so our listeners have some orientation on where to go next if they have questions?

Melissa Logan 04:24
Yeah, absolutely, happy to. The data mesh learning community is a group of over 8,000 data pros who are somewhere in their data mesh journey, whether they’re just getting started or they’ve been at this for a number of years. We have some of the pioneers of data mesh in the community, answering questions and sharing their insights. Our mission for the community is to share resources, increase awareness of data mesh, and help people understand how to get started. We have a website at datameshlearning.com with a bunch of use cases, case studies, articles, and podcasts. There is a Slack channel where we have conversations and share insights about data mesh; people ask all kinds of questions there, and we have so many great folks who are willing to share their experiences. We host a range of different virtual events; I think they’re even happening weekly right now, but it’s at least monthly, and there are quite a few topics on the list. We held our first in-person event at Big Data London very recently, in September, where we shared the results from our very first community survey about getting buy-in for data mesh, and we had some great presenters. We also host virtual half-day events called Data Mesh Days; we had one focused on the life sciences vertical earlier this year, and we have one focused on the financial services vertical in Q1 of 2024. So quite a lot of different resources for the community, and we have a white paper coming out soon about the survey we recently ran. Truly, all resources are by and for the community; we really exist to help people as they go on their data mesh journey.

Eric Dodds 06:20
Awesome. Well, if you have any questions, definitely check out the learning community, and Melissa and team will be there to support you. I’d love to actually start, as we like to do on the show, by getting down to the root of things and defining what data mesh means. Zhamak, would you help us define what data mesh means? I think a lot of people have a lot of different ideas of what it is and what it could be, and a lot of people interpret it in their own context. But give us a level set: what is data mesh?

Zhamak Dehghani 06:56
Sure. You know, there are two hard problems in technology. One is naming things, and the second one is defining them. So I’ll have a go at it. I included the definition in the book, and I’m just going to say it word by word, perhaps, and then we can dive into what’s really behind those words. Data mesh is a decentralized, socio-technical approach to managing, accessing, and sharing data at scale for analytical and ML workloads. There are a few things to unpack in that definition that might be worth double-clicking into. One is that I coined this term and I call it socio-technical because, for a shift to happen in how we share data, how we discover and produce data in a decentralized, distributed way, we not only need to change technology, but we also need to change behavior, our relationship with data, and the relationships among teams. So it’s both a paradigm and an approach that includes technological changes as well as, I guess, the social changes in organizations. Then there is the phrase “at scale.” I think it’s really key to recognize that the data warehouse, maybe the lake, and the approaches we’ve had so far really address the problem of technical scale. Over the course of the last 10 to 15 years, we have addressed the scale of the volume of the data with distributed storage and parallel processing; we have addressed the scale of the velocity of the data with streaming backbones; we’ve addressed the scale of diversity of the data with all sorts of databases, from vector databases to time series and others. What we haven’t addressed is really the scale of organizational complexity, the scale of usage and use cases, and the scale of diversity of the sources of the data, which really put a pressure point on this centralized approach to managing data. We want to get to the point where data sharing for analytics and ML can happen the same way that application API sharing happens at a global scale; data mesh actually tries to push a data sharing method for analytics and ML that can scale out to a global network of analytical data sharing. And the last point is that, with operational, transactional databases, the data sharing problem is largely solved through the transactional APIs that services or microservices expose, which let applications share changes to the data, or the current state of the data, in a way that is suitable for transactional applications. What we hadn’t solved was data sharing at scale, where you need to train machine learning models across many dimensions of data from many sources, or run statistical or other kinds of analytics, reading a large volume of data and correlating it. That adds an interesting twist to how data can be managed at that scale, and that’s why all of those components exist in the definition of data mesh. Of course, behind that definition there are a set of first principles and then a set of technologies to support them, and we can get into that later.

Eric Dodds 10:29
That’s super helpful. I have several questions, but one question I want to ask: socio-technical, I think, is a really fascinating term. As you think about data mesh as a manifestation of a socio-technical shift, and as you look back over technology over the last several decades, is there another major socio-technical shift that you would say was industry-defining in the way that you believe data mesh is also potentially industry-defining?

Zhamak Dehghani 11:06
Yeah, absolutely. In fact, I took inspiration from one. I wish I could claim that I was a creative person who made this all up out of thin air, but it was really the observation of how we went from monolithic, single-stack, big IT team application development to microservices: domain-oriented, two-pizza teams integrating applications and solutions through APIs. I think that migration or transformation in digital organizations, from monolithic, big IT team application development to microservices, smaller domain-oriented team services, and API-oriented capability development, is absolutely a parallel scenario. In fact, my hope was that data mesh could piggyback on that transformation that has already happened in a lot of digitally native organizations and follow that trend: we kind of had biz-dev-ops teams, and we want to get to biz-dev-ops-data teams and continue that trend.

Eric Dodds 12:24
Let’s pull on that thread a little bit. So if we think about the monolithic IT team, and of course we’re generalizing here, so no offense to anyone whose job title is DBA, but let’s just talk about a DBA as a job title. You sort of have this concept of a gatekeeper who owns a monolithic system, and a gatekeeper not in the sense that they don’t want to help people, but just that the technology means there’s maybe a single point, or a couple of points, of ingress and egress. And so it tends to become a bottleneck when people need data around the company, right? That’s the monolithic state. Then you move to microservices, and maybe you have a team who’s responsible for building an API that delivers a certain type of data. A couple of examples could be a core KPI dashboard for executives, or some sort of marketing performance metrics for the CMO, and the various flavors of this, right? But you have a team that wraps around that; it could be a data team, it could be an ops team, those lines are blurred. So that’s the first socio-technical shift. Can you walk us further down that path? Okay, so now we have a team, and you can choose the example; it could be the BI dashboard, the performance dashboard for the CMO. Walk us further down the path to data mesh: what does the sociological aspect of that look like from a team perspective? When we go from monoliths to microservices and dedicated teams, and then to data mesh, what does the sociological structure look like inside the org?

Zhamak Dehghani 14:13
Wonderful question. So yeah, let’s imagine we are at the point in time when an organization has gone through that initial transformation to domain-oriented teams. Let’s say you’re a retailer and you have various teams: you have an e-commerce team that takes care of the e-commerce apps. I’m sorry, I’m diverting from your exact example.

Eric Dodds 14:38
No, this is totally fine. This is great.

Zhamak Dehghani 14:42
So you have an e-commerce team taking care of a bunch of e-commerce applications and services, and it has the transactional database for that application, which captures basically all the events that happen as the user interacts with the digital channel. You might have a logistics and routing team whose job is optimizing how items get across the different warehouses, or shipped from the warehouse to the store. And then you have a sales team taking care of the actual sales transactions or credit cards. So you’ve got all these domains or entities; they have their own services, and they are providing essentially application-oriented or transaction-oriented APIs to the rest of the organization. But when it comes to the data for analytics and ML, that’s where things actually don’t quite look the same as the rest of the organization. What has happened, at this point in time, pre–data mesh, is that those teams, e-commerce, finance, logistics, provide or somehow externalize their data, let’s say they’re pretty modern, as domain events on some sort of streaming backbone that lands that data into the warehouse. And in the middle, between those teams and the rest of the organization that wants to do analytics and ML, sits the data team, which is often ingesting those events and then trying to model them or semi-structure them and put them into a warehouse, a lake, and a few other places. Then other teams layer on top of that data to try to define the actual semantics, provide policies and access control, and there is a governance team probably sitting in the corner defining some sort of taxonomy over this data. So you see this whole machinery sitting in the middle, trying to take those domain events that came from upstream, that they have no visibility into, get their heads and arms around them, and then store them in a way that is suitable for analytics. And then you sometimes have business domains on the other side of this pipe, or centralized data scientist teams that get borrowed by those domains, to generate value from that data, whether that value is as simple as some dashboards that help a CFO or a CMO make decisions, or something more sophisticated turned into machine learning models that then get embedded into those applications to make recommendations. But nonetheless, you’ve got this centralized team with a centralized set of responsibilities: make sense of upstream data they have no control over, structure it, and make it available in a way that is suitable for the types of workloads we just described, right? So when data mesh happens, incrementally, that middleman goes away. The responsibility of sharing data in a way that can be discovered, understood, trusted, accessed, and used, by dashboards, by machine learning training models and pipelines, by analysts and data scientists, is shifted right to the source, or maybe to some newly formed domain-oriented teams, so that in a peer-to-peer fashion you can now have analytical consumers of the data talking to the producers directly. And they share data through this concept of a data product, which we can get into the definition of later.
And there is no middleman ingesting somebody else’s data to produce data for somebody else who is not part of a particular business domain. That leads to existing business domains taking on the responsibilities around data product generation and sharing, and it perhaps leads to creating new domains that didn’t exist before, whose job is purely providing data products. Let’s say the recommendations domain might be purely a data products domain, because they’re just providing recommendation data. But you don’t have a centralized data team under an immense amount of pressure, consuming data that they have no control over and providing data that they don’t really understand the use case for, acting as mechanical couriers in the middle, just trying to move data along without really having the knowledge. That’s a really, really tall ask for any centralized team, and I think that’s where the complexity of an organization reaches a pivot point, a tipping point, where that model fails to perform and you’ve got to start making the shift.
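To make the peer-to-peer idea above a bit more concrete, here is a minimal, purely illustrative sketch in Python. Nothing here comes from the conversation or from any real product; the catalog, product names, and addresses are all hypothetical. It shows a consuming domain looking up a producing domain’s data product directly, with no central ingestion team in the middle.

```python
from typing import Dict

# Hypothetical catalog mapping a product's addressable name to its output port.
# In a real mesh this would be a governed discovery service, not a dict.
CATALOG: Dict[str, str] = {
    "ecommerce.orders.daily": "s3://ecommerce-domain/orders/daily/",
    "logistics.shipments.events": "kafka://logistics-domain/shipments",
}

def discover(product_name: str) -> str:
    """A consumer (analyst, ML pipeline) looks up the producer's output port directly."""
    return CATALOG[product_name]

# The recommendations domain consumes the e-commerce domain's product as-is,
# rather than waiting on a central team to ingest and remodel it.
orders_location = discover("ecommerce.orders.daily")
print(f"Reading orders directly from the producing domain: {orders_location}")
```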

Eric Dodds 19:27
Yeah,

Paolo Platter 19:28
Can I add one thing? Because you mentioned a really important topic: the gatekeeper. In the old paradigm, obviously, there were multiple gatekeepers. In the decentralization process, where we want to make these domains really autonomous, it’s super important to introduce self-service capabilities. So all the duties that were managed by the gatekeeper must, in the decentralization process, be provided as a service. In this picture, we need to insert a platform that provides self-service capabilities. And you mentioned the previous transformation that inspired data mesh; I guess platform engineering practice is also something that is present in the data mesh concept. The concept of having a platform team that takes care of providing services to all the other teams is crucial for data mesh. This is coming from platform engineering and Team Topologies, which defines the different kinds of interaction among teams: who is a stream-aligned team, and who is providing collaboration or a capability as a service.
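As a rough illustration of what Paolo means by turning the gatekeeper’s duties into self-service capabilities, here is a hedged sketch with entirely hypothetical class and method names, not any real platform API: a domain team provisions, registers, and secures a data product on its own instead of filing a ticket with a central team.

```python
class SelfServePlatform:
    """Illustrative only: gatekeeper duties exposed as services to domain teams."""

    def provision_storage(self, domain: str, product: str) -> str:
        # e.g. create storage the domain owns end to end
        return f"storage://{domain}/{product}"

    def register_product(self, domain: str, product: str, address: str) -> None:
        # e.g. publish to a shared catalog so the product is discoverable
        print(f"Registered {domain}.{product} at {address}")

    def apply_access_policy(self, product: str, policy: str) -> None:
        # e.g. apply organization-wide governance rules automatically
        print(f"Policy '{policy}' applied to {product}")


# A domain team serves itself, end to end, with no gatekeeper in the loop.
platform = SelfServePlatform()
address = platform.provision_storage("logistics", "shipment-events")
platform.register_product("logistics", "shipment-events", address)
platform.apply_access_policy("shipment-events", "internal-analysts-only")
```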

Eric Dodds 21:07
Paolo, let’s dig into that a little bit. Because, you know, I’m sure that our listeners have varying thoughts on this, but I think one of the big questions that comes to people’s minds, Zhamak, because you talked about this, is what I’m just going to call data literacy. One of the roles of a central team, even though of course you described the challenges and the bottleneck, is that they serve as somewhat of a data literacy translator. And so, Paolo or Zhamak, help us understand: are there different requirements around data literacy when you move to a mesh model? Specifically, what I mean by that is, there’s a very strong sense of democratization, but at the same time, anyone who’s working in an enterprise as a downstream consumer of data knows very clearly that it can be pretty hard to become data literate when you try to go upstream, simply because that’s not your area of expertise. So if someone wants to move to a more data mesh–type team structure and technological architecture, what does that look like from a data literacy standpoint? And is that the sociological aspect of what we’re talking about?

Zhamak Dehghani 22:41
I think I can have a stab at it, and Paolo, please jump in. When you say data literacy, I think of three things, and they are very different aspects of data literacy. One is data infrastructure literacy, the things that data engineers know and are in fact very good at: knowing the data platform and tooling that is available to them to do ingestion and processing and cleansing and all of that. The other aspect is actually understanding the domain data. Let’s say you’re a pharmaceutical company and you’re doing drug research: understanding what a disease is, what the different kinds of diseases are, how you actually discover medicine for those diseases, what’s considered clinical research. Understanding the domains of the data, that knowledge and literacy exists in the organization, and in fact it exists in the domains of the business. And the third one is how the data can be used in all possible scenarios, from generative AI, to good old ML and statistical models, to dashboards and various kinds of analytical usage. Those are, I think, three classes of literacy we can look at. Then, if you think about how data mesh relates to that: data mesh, as Paolo mentioned, tries to remove the requirement of having three PhDs in data engineering and data infrastructure before you can work with and share your data. That’s where the concept of a self-serve platform, a new set of tools, comes in: remove the complexity of infrastructure out of the cognitive load of the domain data person, so that they can just do the data work that is related to that domain, right? Discovering medicine, looking at a variety of genes, a variety of clinical trials, with tools that are well suited for doing that data work. The tools are not necessarily tools for moving data around or storing data at a large scale; that level of complexity and literacy has to be pushed down to the platform team, and they take care of it and provide a nicer developer experience to the data folks in the domain. I think what data mesh tries to do is actually embrace the fact that the people who are closest to the data in a domain are best suited to be responsible for that data, because they know about drug discovery better than anybody else, and they know what constitutes a disease better than any data engineer, so that literacy remains in the domain, and it is embraced by data mesh. And the last part is the range of analytical usage and application of the data, from sophisticated machine learning to maybe more basic statistical analysis. I think this is actually orthogonal to data mesh; that literacy has traditionally been in the domain, because for you to do that sort of work, you really need to understand the business domain, and data mesh embraces that and doesn’t impact it. It’s an orthogonal class of concern. But again, data mesh is more aligned with keeping the people who get usage out of the data, who understand how to use the data within the domain, as close to the business as possible, so they can innovate as fast as the business, as fast as the market moves, not as fast as a centralized data team.
I hope that helps demystify data literacy a bit, but I’m curious what Paolo thinks.

Paolo Platter 26:38
Not much to add, really. Anyway, there is some kind of shift in terms of the data literacy or competencies that are needed; it’s a different paradigm, and to decentralize, some skills and competencies must be moved toward the domain and embedded into the domain. How big that shift is obviously depends on the level of automation and abstraction that you are able to obtain: in some cases it’s a huge shift, in some cases it could be less. Obviously, every company should try to minimize the shift of these competencies, because otherwise it becomes a huge transformation. And anyway, I usually say that data mesh is a huge transformation; we should not minimize this, because it’s a journey, like the digital transformation 20 years ago. It’s a journey, a discovery journey. It’s not only a matter of skills; it’s also a matter of mindset and culture regarding how we manage and treat the data.

Kostas Pardalis 28:02
So I have a question about the technical side of things when it comes to data mesh, and I’d like to start with that and then talk a little bit more about the people side, because I think it is clear that you can’t have data mesh with only one of them; you need both, right? And my first question, which I’d love to hear an answer to from all three of you, is: what should come first? In this transition, is it the technology that needs to come in and, let’s say, become the spark that starts the change, or do people need to change first and then you introduce the technology? Melissa, I’d like to start with you, actually, because you’re coming from, let’s say, more of the pure people side of things, with the community there. And then I’ll finish with Paolo, just because he’s the engineer here, so hopefully he’s going to talk about the machines and the tools and all that stuff. But let’s start with you first.

Melissa Logan 29:14
Yeah, it’s a good question, and it’s something that we get asked quite a bit: where do you start? Where do you kind of get started with data mesh? In fact, we just ran a survey in the community to ask these types of questions, to say, okay, you all have been there, done that, so where did you start? What would you recommend for people who are starting this process? There are kind of two parts to it. One, there are four pillars of data mesh; if you haven’t read the book, it explains all of them. What folks say is don’t start with all of them at once; understand it, really try to dig in and see how this is going to work for your organization. In the survey it was resoundingly clear that people said incremental adoption in stages, don’t do some kind of big bang thing and try to do it all at once, and start with a proof of concept in a specific area. So essentially, start small and grow your data mesh; that was the recommendation from community members. And of the four pillars, what folks recommended, or what they typically start with, is data as a product, followed by domain ownership, and then the rest. That’s where the entry point for a lot of people is. It isn’t just the technical part or the social part; it’s a mix of all of the socio-technical bits. But that’s the typical recommendation from the community members.

Kostas Pardalis 30:39
Let’s move to Zhamak.

Zhamak Dehghani 30:41
I’m going to answer without answering your question. Start, like Melissa said, start moving, and then do both the technology and people sides at the same time. There is a notion, I think, of movement-based change, introduced by other folks, which looks at historical social revolutions where a movement actually creates change at scale. As Melissa said, start with understanding your readiness to do data mesh: do a readiness assessment and really understand whether this is the right thing for you at this moment in time, given the maturity of the technology and given the maturity of understanding in the industry as a whole; we’re still in the early days. Once you do that assessment, start moving by finding a particular domain or a particular area that could move toward this pattern, and make shifts on the people side, on the social side, as in having the actual domain people engaged in the conversation, even if it’s just producing simple data products, and empowering them and providing the tooling. At the same time, of course, you can’t just start telling people to change behavior and do something different without having the tools, because tools reshape behavior. And we can’t just throw technology at people without having their incentives aligned and having them be part of the conversation. So it’s really hard to say do this first, then the other; it’s really both at the same time. I’ve been part of many transformations, or at least a few I can talk about, that started without considering the domains first. So in some ways I understand the point Melissa made, that domain ownership is the hard part and maybe you don’t start there just yet. But I’ve been part of many transformations that didn’t include the domains; they came from a centralized data team trying to think about a platform that they could throw at the domains later, and guess what, it just stayed within the data team, and you had a sophisticated data mesh run by the data team and done by the data team, so nothing really changed. And on the other hand, if you engage the domains, let’s say you engage the analyst folks in the domains, but you have no technology to actually support them in generating data products, that doesn’t work either. So both are needed, but you’ve got to start moving and iterate over both angles.

Kostas Pardalis 33:30
Yeah, that makes a lot of sense. Okay, Paolo, this is your last chance here to say that it’s all about the computers.

Paolo Platter 33:37
Telling me that, now I will not talk about technology. So, data mesh adoption is a data strategy, and it must be part of a data strategy. Like all data strategies, you must start by defining your goals, assessing your as-is, and assessing the risks you are taking in adopting data mesh, because data mesh, anyway, is a practice, and every practice has pros and cons, and they must be evaluated. After that, you go into data strategy planning, so you need to take care of people, processes, planning, budget, and take care of adoption from the beginning. And finally, you can jump into architecture. When I talk about architecture, I’m not talking about specific technologies; I’m talking about standards: how do we decouple layers, capabilities, and technologies, and how do we plan to create a platform that enables all the principles of data mesh? How do we boost cross-functionality between technical people and business people? It must be a platform that accommodates all the different skills, levels, and backgrounds that we have in the company; otherwise, adoption will never grow. Then, after that, you can start to implement something, but it must be approached really carefully.

Kostas Pardalis 35:39
Okay, that makes total sense. And again, let’s say you’re someone who sees data mesh as an opportunity to go and build a company around it, right? We have both you, Paolo, and Zhamak here trying to build products around data mesh. How can you do that? By definition, building a product requires something that has value, obviously, but is also repeatable, right? You don’t go and implement it differently everywhere, because then it ends up being a service, not a product. So how do you productize data mesh from the point of view of a vendor? Now, I’m not talking about the user who wants to go and implement data mesh, which is fine, it’s great. But we’ve seen, I would say, similar cases, like Agile, right? It also started more as a way to structure your work and go and create deliverables and create value, but there are plenty of tools that were built in the end to support the implementation of Agile. Of course, you would never have Agile just by having the tools; you need the people to go and be agile, right? And correct me if I’m wrong, obviously it’s not the same thing, but there are some similarities in the socio-technical side of things there, in how you need both. So please tell me a little bit more about that: how do you build your products? And give some ideas to the people out there who are builders and would love to go and build something.

Zhamak Dehghani 37:31
Do you want us to share our secret sauce? Is that what you’re asking? The short answer is: you don’t. There is no such thing as a data mesh product or data mesh in a box. I know a lot of vendors, and rightly so, as a new paradigm comes along, we as vendors think about, okay, how am I going to enable this, and data mesh becomes a feature or an addition to an existing or a new product line. But as you said, just like Agile, there is no Agile in a box or Agile as a product, and there is no data mesh as a product. I think every vendor has to choose their own battles and think about what is relevant to them: what problem, what angle of enablement, what friction or bottleneck on the way to data mesh they choose to fight and remove, right? Who are the users they’re trying to enable? What shifts are they trying to create? What will the before and after look like given their products? So I think Paolo and I both probably have our own perspectives on what we want to do, who we want to enable, and how we want to enable the movement toward data mesh, but I don’t see either of us building “data mesh” products, and that’s a funny way of framing it.

Kostas Pardalis 39:02
Yeah, 100%, and I totally get that. But let’s talk about what you’re building, right? Just to, let’s say, reverse engineer it a little bit: what do tools look like that promote and support the concept of data mesh, instead of creating the opposite of what we’re hoping for out there, for data mesh to be implemented?

Paolo Platter 39:28
If I can make another analogy, maybe with DevOps and GitOps: those are practices as well, so you can’t buy DevOps or GitOps, but GitLab or GitHub are products that enable the creation of such practices in a company. So, for example, my vision is to build a platform that is customizable and extensible and that helps companies define their standards and best practices and shape their data practice. Specifically, it is not only data mesh; it can be whatever data practice you want to adopt. Basically, it helps you define it, not just provide guidelines that maybe someone will follow, because, you know, even if you think about a branching model, it’s very easy, but nobody is going to follow it if you don’t enforce it in some way.

Kostas Pardalis 40:41
So, the platform that you’re building…

Paolo Platter 40:42
We are helping companies to shape their practice, with a better time to market, without reinventing the wheel every time, and creating a place where different data practitioners can exchange value, like what happens in GitLab, for example: you have someone taking care of the pipeline, someone taking care of the code, someone writing the issues. Different personas have a single place where they can interact, exchange value, and co-create value that, in the end, is the artifact, the product. That’s the vision.
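To illustrate the difference Paolo draws between publishing guidelines and actually enforcing a practice, here is a small, assumption-laden sketch. The field names and rules are hypothetical and are not how Agile Lab’s product works; the point is simply that a platform-side check can block a data product declaration that doesn’t meet the company’s standards, instead of hoping people follow them.

```python
from typing import Dict, List

# Hypothetical company standard: every data product declaration must carry these.
REQUIRED_FIELDS = ["owner", "domain", "description", "output_port", "access_policy"]

def validate_declaration(declaration: Dict[str, str]) -> List[str]:
    """Return the list of standard violations; an empty list means it can ship."""
    errors = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in declaration]
    if "owner" in declaration and "@" not in declaration["owner"]:
        errors.append("owner must be a reachable contact address")
    return errors

# A declaration missing an access policy is blocked, not merely frowned upon.
draft = {
    "owner": "logistics-team@example.com",
    "domain": "logistics",
    "description": "Daily shipment events",
    "output_port": "kafka://logistics-domain/shipments",
}
problems = validate_declaration(draft)
print("OK to publish" if not problems else f"Blocked: {problems}")
```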

Kostas Pardalis 41:29
Okay, that makes a lot of sense. So, Melissa, I’ll go back to you, because for some reason, in my mind, you’re the voice of the people out there; that’s why I keep asking you these things. From your experience with all these thousands of people in the data mesh community, do you see people associating data mesh with specific vendors or technologies out there? Or do they tend to focus primarily, or even only, on the organizational side of things, the strategy and the design?

Melissa Logan 42:10
That’s a good question. I’m not sure I can quantify it, since that’s not something we’ve measured. I hear people talk about some of the things they use to implement data mesh, but a lot of the conversations in the community are in fact much less about technology. It’s all about how did you get buy-in, or how did you think about federated governance, things like this. We do get asked sometimes, what vendors can I turn to and what consultancies can I turn to? We plan to add a landscape page to our website that showcases, you know, here are some of the people you can turn to, but we haven’t done it yet. So stay tuned, because that will be coming.

Kostas Pardalis 42:48
Okay, that’s awesome. I’d love to see that. And going back to you, Zhamak: do you want to share a little bit more about the product you’re building? Not the secret sauce, of course, because if you slip and say anything about the secret sauce, it will have to be removed, I’m telling you, okay?

Zhamak Dehghani 43:04
Well, we won’t share the secret sauce; we’ll keep you curious about that a little bit longer. But yeah, I think we really started when I went deeper into what the inhibitors are to making data mesh a reality. Gartner makes all sorts of predictions that data mesh will be defeated and will be crossed out in the next five years, and I really needed to take that to heart: what is going to stop us from making this shift that has resonated industry-wide, where everybody raises their hand and says, we want to do data mesh? What is going to stop it is the lack of empowerment of domain practitioners, right? These data practitioners and hackers who are discovering drugs or calculating ROI, but who need to package that work as a product, share it with the rest of the organization, and be incentivized to do that. That lack of empowerment of those folks is going to make data mesh fail, because it’s going to stay limited to a centralized data engineering or data platform team. So with the product we are building, we had to work a little bit from the ground up. First of all, we had to codify this concept of a data product; we had to abstract away a lot of the complexity that goes into what constitutes a data product. For us, a data product is a lot of things encapsulated as one: data model, data contracts, transformation, data storage, all of the above. And once that’s abstracted away into a build-time and a runtime concept, you can define a developer experience for that peer-to-peer data product sharing that is designed for data workers, not necessarily data engineers; I think data engineers already have a lot of tools serving them today. So that’s what we’re working on. Hopefully we can nudge the needle toward that distribution of ownership to people who know the data and work with the data but are not necessarily data engineers. And then, going back to the beginning of this conversation, we can confidently remove the gatekeepers, who have the best intentions at heart and worry that people will mess up data availability or data integrity. You have this guided developer experience, so people can safely share data products, or safely discover and use data products. So that’s what we’re working on, but this is a big, hairy problem, and as a product company, I’m sure Paolo knows this, being able to get your arms around a product that is feasible to build and can achieve this is a difficult problem in itself, right? It’s a new category that doesn’t exist; it’s not as if we can look at other mature products and just build them a little bit better. We’re creating almost a new category for this. So we have quite a lot of challenges to get these products off the ground and make an impact, but there’s plenty of opportunity for innovation. I know you posed the question for other makers who might be inspired to do something: there’s plenty of opportunity here to build enablers.

Kostas Pardalis 46:35
Yeah, 100%. Okay, let’s talk a little bit more about the concept of a data product. You both mentioned it as part of describing what you’re working on, but what is a data product? The reason I’m asking is that it’s one of the first pillars out there, Melissa also talked about it, and I think it’s probably one of the first things people will naturally focus on when we talk about data mesh. So what is a data product, and how is it different, depending on where you come from, from a dashboard, or an ML model, or a table in a database, or a file on my hard drive? What makes a data product a data product in the context of data mesh?

Zhamak Dehghani 47:36
Yeah, this is a loaded question, and I’d love to hear Paolo’s reflection on it as well. When I wrote the definition in the book, again, I started conceptually: if we were going to share data as a product, what would it look like as a successful product? Start from the first principles of the definition of a successful product, which is something that is usable by its users, that they love, that is feasible to build, and that is valuable to the user. When you start from those first principles and work backwards, you look at the users that you have, analysts and data scientists, right? Then you work backward. I define data as a product essentially as the unit of exchange of value between producer and consumer, with a set of characteristics: it’s a unit of exchange of value in terms of data that is discoverable on its own, autonomously; it’s addressable; it’s valuable; it’s secure; it’s natively accessible, which is important to data scientists and analysts, because no matter how you want to access it, you can access the same thing and use it; there’s trustworthiness; there are eight characteristics like that. And then, when you peel the onion a bit further and say, okay, if you want to build something that has all of those characteristics, that just like a product can be shipped to a wide spectrum of users, what is it? What are the bits and bytes that are in it? Is it a file with metadata? I think that’s where the divergence of opinion and execution exists within the industry; there is no standard. Our definition of a data product is similar to the architecture I put around it in the book, which is the smallest unit of your architecture that structurally has everything that is needed to make the data accessible, usable, and so on: basically the pipeline and code that generates the data, as well as the data itself, as well as its metadata, as well as the APIs that serve the data, the policies that control access to it, and the APIs through which it gets discovered. So for us it’s a lot more than just the bits and bytes; it’s the bits and bytes as well as the ways of getting to them. It’s the metadata that lets people discover and understand it; it’s the compute that keeps making this data possible; and it’s the APIs to get data in and out of it. It’s more than views with metadata; it’s more than catalog entries of metadata. And unless we have a reference model for this, one that will at some point maybe get mass adoption, we are at a point where the actual implementation looks very different from person to person. I’m curious about Paolo’s definition of the data product and its technical orientation at this point in time.

Paolo Platter 50:46
Your definition is perfect, obviously. So I will tell you what a data product is not. I always start from this, because it removes interpretation. For me, a data product is not a table, not a dashboard; it is not a monolithic system; it is not just a set of APIs; it is not an operational system; it is not a logical or physical portion of a data warehouse or a data lake, because we need to change the practice behind that; it is not a logical view on top of some pre-existing data; and it is not a bronze layer or something like that. All the things we are used to thinking of under the term “data product” were unfortunately already present in the data management space, and the term was used to identify whatever kind of data asset was out there. Data as a product, instead, is a totally different thing. And I would also say that a data product is a composition of layers, infrastructure, data and metadata, and code, in terms of the deliverables that must be included in a data product.
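Pulling together the two descriptions above, here is one possible, purely illustrative way to model a data product as a single unit that bundles code, data, metadata, output ports, and policies. All names and fields are hypothetical; neither guest’s product necessarily works like this, and a real implementation would involve far more than a single data structure.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class OutputPort:
    """A native access mode for consumers (e.g. a table, a file set, a stream)."""
    name: str
    format: str   # e.g. "parquet", "delta", "kafka-topic"
    address: str  # stable, addressable location

@dataclass
class DataProduct:
    """Code, data, metadata, and policy encapsulated as one deployable unit."""
    domain: str                          # owning business domain
    name: str                            # discoverable, addressable identity
    description: str                     # semantics, for discovery and trust
    transformation: Callable[[], None]   # code that produces/refreshes the data
    output_ports: List[OutputPort] = field(default_factory=list)
    policies: Dict[str, str] = field(default_factory=dict)           # access, retention
    quality_metrics: Dict[str, float] = field(default_factory=dict)  # trustworthiness

def refresh_recommendations() -> None:
    # placeholder for the owning domain's own pipeline code
    pass

recommendations = DataProduct(
    domain="recommendations",
    name="product-recommendations.daily",
    description="Daily per-customer product recommendations",
    transformation=refresh_recommendations,
    output_ports=[OutputPort("table", "parquet", "s3://recs-domain/daily/")],
    policies={"access": "analysts,ml-engineers", "pii": "none"},
    quality_metrics={"completeness": 0.98},
)
print(f"{recommendations.domain}.{recommendations.name} exposes "
      f"{len(recommendations.output_ports)} output port(s)")
```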

Kostas Pardalis 52:28
Okay, we would probably need a couple of hours to go through the data product concept on its own, so hopefully we’ll be able to do that in another episode in the future, because we are close to the end here on this one. So, Eric, the microphone is yours again.

Eric Dodds 52:52
I think this question is for you, Zhamak. Mesh is an interesting sort of analogy for how to describe this. Did you consider any other terms or terminology when you were thinking about this concept? I just want to know, where did the term mesh come from? It sort of makes logical sense when you look at it now, but the names of things, the etymology, where they come from, are often different stories. So, genuinely curious, what other terminology did you explore as part of this naming process?

Zhamak Dehghani 53:32
Not many. Fabric was already taken; fabric is a great word as well, but it was already taken, so I couldn’t use it when I did a Google search. But yeah, of course, the networking: I did a lot of work on distributed systems, networking, and protocol design, so mesh was in my vocabulary. And in fact, data mesh has multiple meshes in one, right? It’s a mesh of the flow of data between data products; it’s a mesh of the relationships between data products referencing and linking data; so there are multiple actual mesh layers in there. But yeah, I think my networking background influenced it, and the fact that my Google search didn’t show many, many examples of it. Perhaps the word was out there and maybe I just didn’t find it, but I’m glad I used it. It seems to be catchy, which is why it stuck.

Eric Dodds 54:28
Yeah, for sure. You know, if we think about even the fundamental principles of software engineering, naming things is the hardest part. That’s part of why I asked.

Zhamak Dehghani 54:48
I have a funny story, in fact. The very first public talk that I gave about data mesh was at an O’Reilly conference in New York, I think in early 2019, and the subject of the talk was going beyond the data lake. I didn’t have a name for this thing, and I put a call out at the end of the talk: if people have a name for this, please come and talk to me. Nobody showed up. As you said, right, we all get shy about being judged for our choice of names.

Eric Dodds 55:24
Beyond data lake, it could have been data canal, right? Like, you have a way to access all these things. Yeah, that’s great. Well, thank you to all three of you for joining the show. This has been very helpful. Thank you for helping us put a definition to data mesh; you know, after over 150 episodes, I can’t believe we hadn’t covered it. So thank you for bringing all of this to light, and we’d love to have you back to dig into some of the things that we didn’t have time to cover.

Zhamak Dehghani 55:58
Thank you for hosting us. Yeah.

Paolo Platter 56:00
Thank you. We appreciate it. Thank you.

Eric Dodds 56:02
We hope you enjoyed this episode of The Data Stack Show. Be sure to subscribe to your favorite podcast app to get notified about new episodes every week. We’d also love your feedback. You can email me, Eric Dodds, at eric@datastackshow.com. That’s E-R-I-C at datastackshow.com. The show is brought to you by RudderStack, the CDP for developers. Learn how to build a CDP on your data warehouse at RudderStack.com.