Episode 77:

Standardizing Unstructured Data with Verl Allen of Claravine

March 2, 2022

This week on The Data Stack Show, Eric and Kostas chat with Verl Allen, the CEO of Claravine. During the episode, Verl discusses data standardization, specifically within marketing, but also how it ripples into other areas.


Notes:

Highlights from this week’s conversation include:

Verl’s career journey (2:46)
M&A data evaluation criteria (7:12)
What Claravine does (10:48)
The breadth of data (15:03)
Adding to content and advertising data (18:22)
How Claravine standardizes data (23:53)
Designing a data model (25:40)
The underlying technologies of building a product (33:43)
The main consumer (35:02)
Maintaining quality (39:06)
Helping solidify definitions (41:37)
Implementing Claravine’s model across various companies (44:54)
Effect of internal changes on the model (46:47)
Connection brought about by structure (49:19)
Applying unstructured context to structured stamping (52:36)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Transcription:

Eric Dodds 0:06
Welcome to The Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You’ll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, the CDP for developers. You can learn more at RudderStack.com.

Welcome to The Data Stack Show. Today we’re going to talk with Verl, and, Kostas, they are doing some really interesting things for some really large companies, some of the largest companies in the world, actually, which is fascinating. Just to give a little preview here, Verl is part of Claravine. What they do is basically take the unstructured context around data internal to a company, as it exists in the form of things like marketing assets, and essentially apply a schema to it, so that there’s standardization across this massive multinational organization, which is really interesting. We talked about the concept of a schema as you and I were covering this, and I want to ask him: if you think about a schema across a large multinational organization for creative content, what does that even look like? I mean, what are those data points? And what kind of data are they populating that schema with? How about you?

Kostas Pardalis 1:38
Yeah, I want to ask him how you can build one schema to rule them all. You know, it sounds very powerful. So, yeah, I really want to see how you can build something like this: what the approaches are, which stakeholders are involved, how different it looks from organization to organization, and, at the end, how much we can standardize things. So it’s going to be a very interesting conversation. I think it’s one of these problems that you don’t hear about too often, but in the future we will have to deal with it even in smaller organizations. So it’s interesting to have this conversation with him and see what tomorrow will look like.

Eric Dodds 2:22
Yeah. All right. Well, let’s dig in with Verl.

Verl, welcome to The Data Stack Show.

Verl Allen 2:27
Great to be here. Thanks for having me.

Eric Dodds 2:30
Okay, so much to talk about with data standardization, especially in the context of marketing data, which is going to be a treat for me. But before we get there, just give us a brief background on yourself and how you ended up doing what you’re doing today at Claravine.

Verl Allen 2:44
Yeah. So I joined Claravine in 2018. Prior to Claravine, I spent about 12 years, first at a company called Omniture, which was kind of a leader in the web analytics space. The company was acquired by Adobe, I think in 2010. Then I spent about another eight years at Adobe. In my role there, I was leading strategy around what is now the Experience Cloud, and also corporate M&A, so corp dev and M&A. If you think about what the Experience Cloud at Adobe really is, it’s a compilation of about 11 or 12 acquisitions done over a 10-12 year period that ultimately resulted in what is now the Experience Cloud. So I spent a long period of time helping build that business, if you think about it that way.

In 2018, I ran into a friend who had started a small company, which is now Claravine. As we were talking, he was at a point where he was like, I’m not sure what I would do with this. We’ve got some great customers, but we’re kind of stalled out as far as growth, we have product issues, and I need to raise capital. So as we were talking, I said, listen, I can introduce you to some people. I’m happy, I love what I’m doing, and I’ve got a four-year plan to retire. But as we kept talking, I saw what he was doing in relation to the challenges I had seen during my time at Adobe, where we had spent all this time acquiring all these technologies and solutions. I’d done a lot of work around integrating those solutions at the workflow layer and in other ways. What hadn’t happened, though, is that there was no standardized data model underneath it. Now there’s Adobe’s data offering and other things, and there are more CDPs. But at that point, there wasn’t any focus on the data side of the integrations and on standardizing that.

So as I started looking at what they were doing here at Claravine (again, it was like four or five people at that time), it really struck me that there’s a need in the marketplace. I think of the 2010s as the decade of SaaS applications and the explosion of that marketplace, to the point where you now have 50 to 100 point solutions in the marketing organization of any enterprise. The problem they’re running into is that those applications were never really architected to work well together until after you got the data. And with the emergence of cloud-based enterprise data infrastructure, which has exploded in the last couple of years and is becoming more readily available even in the functional areas, it became clear to me, at least, that there’s going to be a need to standardize and create a common language, or taxonomy, or data dictionary, whatever you want to call it, across these applications. And it has to have context, especially as you collapse that data into these single instances in the cloud. So as I was thinking about the problems they were solving and the problems I was seeing in our own business, inside the applications we had acquired, it became clear to me that if we were struggling with this at Adobe, the brands themselves, our customers, had a bigger problem, because the number of applications they’re dealing with is multiple times what we were dealing with just from a solution perspective, to take the Experience Cloud to market.

Eric Dodds 6:13
Fascinating. Okay, I want to take a quick detour here. What a fascinating experience, being involved in building out a product suite from the M&A side. That’s so interesting to me on so many levels. But I was going to ask you: when we think especially about marketing tooling or customer engagement tooling, and the suite of infrastructure that surrounds it, from analytics to the tools that are actually sending messages, did you have evaluation criteria on the data side? I know you said there was a struggle around that. But as you were building a product suite around customer experience, was evaluating the ability to layer those products in from a data standpoint part of the rubric, even from an M&A standpoint? How did you think about that?

Verl Allen 7:07
Yeah, it’s interesting, and I think the thinking about this has evolved dramatically. It’s much more true today than it was five, six, seven years ago, and a lot of that has been driven by changes in the data ecosystem and in the ways companies look at their business. Back then, I think it was more of a siloed approach: you thought about applications in terms of how to get efficiency and scale in a channel. I think the world has turned more holistic. I saw a data point the other day saying that before the pandemic, about 30% of interactions with brands were digital, and now it’s like 55-60%. So there’s been this huge push forward. When I was at Adobe, we were thinking about it as: there’s this application and that application, and there’s data here that we need to get over there. So it was more about how we push specific pieces of data between the applications. It wasn’t stepping back and asking, holistically, what the operating data model should look like for the marketing organization. Even back in the 2016-2017 timeframe, the industry was just starting to talk about a single view of the customer, unified profiles, and all this stuff. But the reality is that you had ad tech solutions on one side, martech solutions, and CX solutions, and they sat in different groups. What you’re seeing now is the convergence of this stuff around the customer experience. And it’s driving a really different way of thinking about the data necessary to operate the business, not the data to run the application and do my job in a channel, if that makes sense.

Eric Dodds 9:08
Yeah, super interesting. I’ve referred to that before as the daisy chain paradigm, where it’s like, okay, well, I have data here, and then I need to get it there. And then I have it there, but I also need to get it somewhere else. So you end up with this daisy-chain architecture that degrades over time, almost like a game of telephone, because every system has its own flavor of database and data definitions and all that sort of stuff.

Verl Allen 9:33
Yeah, and I saw that even within the cloud that Adobe had built, forget about integrating other applications that aren’t owned under the Adobe brand. There was a daisy chain even within those applications. And when you put the lens on it from the brand’s perspective, it’s much more complex than it looks from Adobe’s perspective, or from Salesforce’s perspective, or from the other large enterprise software companies out there.

Eric Dodds 10:07
Yeah, for sure. Okay, so thank you for humoring me with that little detour, because it’s fascinating. Could you use a specific example of the type of brand and the user who’s like, I am facing this problem every day in my job, and it’s really painful, and then Claravine comes in and they’re like, this is so much better? Describe that for us. I’d love for you to get specific if you can: I was doing things this way, there were data problems because of X, Y, and Z, and this is the new way that we’re doing it.

Verl Allen 10:45
Yeah, it might be helpful for me to back up a little bit and explain. When I came to Claravine, we were really helping analytics teams within the marketing organization address a data problem. If you think about it, the analytics teams are publishing reports and doing analysis. What we see with a lot of customers is they do the analysis, they’ve got the report, and at the bottom there’s this “other” bucket that 25, 35, 40, 50, 60, even 70% of the data drops into. The share varies depending on how complicated or how integrated the thing they’re trying to report on is. What we were initially helping solve was taking data out of “other,” actually specifying it, and putting context around it, so you could attribute it in some way.

Eric Dodds 11:47
Just to put a sharper point on that, and I’m thinking about our data engineers and analysts who are listening, and even my own experience with reports I’ve seen or helped build data for: okay, we have paid search campaigns as a bucket, we have, you know, sort of... I guess, is that what you’re getting at?

Verl Allen 12:07
That’s one type of report, just by channel or whatever. We’ve all sat in those meetings. I have a finance background, but in ’99 I switched over to digital marketing, because I thought it was more of an analytical problem than a creative problem, and an interesting problem to solve. So I’ve sat in these meetings with teams where everybody comes in, you read out channel by channel by channel, you start rolling it up, and the numbers do not roll up and they do not roll down. You aggregate the individual numbers from the channel people sitting in the room and it’s X; you look at the aggregated reports and it’s like 0.4X, or 0.6X, or 0.7X. So who’s been successful? And it’s still a problem today, because even though a lot of that reporting is done through centralized applications, the foundation of that data is broken if there aren’t what we think of as data standards. So one of the things we’re helping solve is creating more specificity, more detail, and more context in that data, which improves that reporting. But that’s just on the reporting side. If you take it out even further, in a lot of cases the same data you use for reporting is also used to do other things, like optimizing spend across channels. I was talking to one of the largest consumer electronics companies out there, whose brand is associated with a fruit. We sat with one of their larger teams, and they said, listen, we’ve got a problem. One out of every seven days, we cannot optimize ad spend, because we’re having to rebuild all the models and clean all the data.
And so there’s literally one day out of every seven where we’re still spending, you know, 50 to 100 million dollars a day, and we’re flying blind, and then everything is delayed. So with that organization, what we really helped them do was reduce the time to insight dramatically by reducing the amount of data that had to be cleaned up on the operations side. Think about marketing ops, data ops, and ad ops: they spend a lot of time cleaning data between execution and optimization. That’s what we’re trying to help them eliminate, by adding context and creating standards in the way that data is captured.

Eric Dodds 14:41
Okay, so when we say standards and context, I want to dig into those terms. But first, could you give us a sense of the types of data that are the inputs there? Ad spend, which you mentioned, is an obvious one for marketing, but there’s also clickstream data. What is the breadth of data? Because certainly a huge contributor to the problem is that you have a wide variety of different types of data coming from a bunch of different places.

Verl Allen 15:13
Absolutely. And to be clear, we are not a data collection application, like an analytics tool. The way I describe it: there’s a company, I think in the ’80s or ’90s, it was BASF, whose line was, “We don’t make the products you use every day, we make them better.” So our opportunity is to improve the ROI and the value our customers are getting from other applications where they’re spending dollars, whether it’s analytics, CDPs, ad serving, or whatever else. So when we think about the types of data, yes, it is clickstream data, but we’re not collecting the clickstream data or replacing it. We’re appending context about a campaign or an experience onto that clickstream data. We’re appending standardized data to content, about the content itself, in relation to all the other content in the organization, across different DAMs and different CMSs. So it’s really trying to take the complexity that exists in marketing organizations today (and I think there are applications outside of marketing as well) and create a layer underneath it, or alongside it, that has standards around it and is attached, or can be attached, to that data, to enrich it, extend it, and create meaning between pieces of data that right now don’t have great ways to be associated.

Eric Dodds 16:51
This may not be the right way to think about it. So tell me if I’m off here. This sounds amazing. It almost sounds like you’re taking a schema designed for full visibility and stamping it on the data across every sort of data repository.

Verl Allen 17:18
I actually think that’s a great way to describe it. It’s a simple way to describe it, because it is almost like an imprint. It’s not that we’re collecting the behavioral or streaming data; it’s a set of data that gets appended, or stamped, onto that data.
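A minimal sketch of the “stamping” idea, assuming standardized campaign context is kept in a lookup keyed by campaign ID. The field names, the `ctx_` prefix, and the `stamp_event` helper are all hypothetical illustrations, not Claravine’s actual data model or API; the point is only that context is appended to collected events without replacing them.

```python
from typing import Any

# Hypothetical standardized context, keyed by campaign ID (illustrative values).
CAMPAIGN_CONTEXT: dict[str, dict[str, str]] = {
    "cmp-001": {"funnel_stage": "awareness", "geo": "EMEA", "business_unit": "consumer"},
}


def stamp_event(event: dict[str, Any], context: dict[str, dict[str, str]]) -> dict[str, Any]:
    """Return a copy of a raw clickstream event with standardized context appended.

    The collected behavioral fields are left untouched; events whose campaign
    ID has no context come back unchanged.
    """
    stamped = dict(event)
    extra = context.get(event.get("campaign_id", ""), {})
    for key, value in extra.items():
        # Prefix appended keys so they can never overwrite collected fields.
        stamped[f"ctx_{key}"] = value
    return stamped


event = {"event": "page_view", "url": "/offer", "campaign_id": "cmp-001"}
print(stamp_event(event, CAMPAIGN_CONTEXT))
```

Keeping the context in a separate lookup, rather than asking every collection tool to capture it, is what lets the same standard be stamped onto data from many different sources.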

Eric Dodds 17:35
Yeah, because if you think about stamping a schema onto a certain set of data, there’s a delta if the data doesn’t contain all the pieces of the schema. Maybe I’m extending that metaphor a little too far; Kostas, let me know. But okay, I have two more questions, and I know Kostas has some as well.

The first one is, could you give some examples of the context you add to a specific type of data, or a couple of types of data? You mentioned advertising performance, or even content, which is pretty fascinating in the context of digital asset management and things like that. What does that look like? You have a piece of content: what are you adding to it? You have advertising data: what are you adding to it?

Verl Allen 18:20
Yeah, so think about content. In a lot of situations you have people creating content, which I’ll call the creative side of things, and then you’ve got the content side of things, where all of a sudden it gets loaded up into a CMS. What typically happens is that the people creating it have a creative brief and all this context around it: this piece of content was created for this purpose, for this business unit, for this stage of the funnel, for this geography, for this demographic. All of that information sits in the creator’s head. The problem is that once that stuff goes into the DAM, there’s not a great way to... Think of a large multinational organization: you have creators in agencies, you have creators internally, and you have people all over the globe speaking different languages. How do you create a standardized language across all those teams, people, geographies, and business units? That’s really what we help them provide. So instead of having the people who load the content into the CMS add that context, the ID for that piece of creative is associated with a bunch of context in our application, which gives them a different way of solving this problem.

Eric Dodds 19:47
That’s really interesting, because anyone who has worked with data knows that, and I don’t want to be too incriminating here, relying on human input for critical data is never a good idea. People are always going to get it wrong or fat-finger it; in many ways it’s the least reliable way to capture data.

Verl Allen 20:19
Yes. It’s interesting, though: if you think about it, those individuals have a lot of the context around what is actually happening. It’s funny, I was talking with one of our customers about this, and they said that we create the illusion of choice inside our application for the end user, in a way that really forces them down a path that limits the errors that can creep in. So you’re almost forcing a decision from a much smaller set of options, and depending on who they are, what channel they manage, and so on, there are all sorts of controls and logic you can build around what level of choice you create. And through integrations, there’s a lot of context you can get about where other data is coming from, which helps inform what options we should give them.
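The “illusion of choice” Verl describes can be sketched as a controlled picklist whose options depend on who the user is. This is a minimal illustration assuming the only context is the channel a user manages; `ALLOWED_OPTIONS`, `options_for`, and `accept_choice` are hypothetical names, not part of any real product.

```python
# Hypothetical controlled vocabularies: the values offered for a field depend
# on the channel the user manages. All names and values are illustrative.
ALLOWED_OPTIONS: dict[str, dict[str, list[str]]] = {
    "paid_search": {
        "funnel_stage": ["awareness", "consideration"],
        "match_type": ["exact", "phrase", "broad"],
    },
    "social": {
        "funnel_stage": ["awareness", "consideration", "conversion"],
    },
}


def options_for(channel: str, field: str) -> list[str]:
    """Return the picklist shown for a field, or [] if the field is not offered."""
    return ALLOWED_OPTIONS.get(channel, {}).get(field, [])


def accept_choice(channel: str, field: str, value: str) -> str:
    """Accept a value only if it appears on the user's picklist."""
    if value not in options_for(channel, field):
        raise ValueError(f"{value!r} is not an allowed {field} for {channel}")
    return value
```

Because free text is never accepted, the errors a human can introduce shrink to choosing the wrong item from a short list, which is the trade-off the customer quote points at.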

Eric Dodds 21:26
Yeah, that’s interesting. It’s kind of taking a consumer app optimization mindset, where you define very clear pathways to success for the user, and applying it to internal creators inside a business, which is super interesting. But could you give one example, maybe on the paid side, just so we have another example of the context you layer onto a particular type of data, like advertising data? Is that performance data?

Verl Allen 21:57
We work really closely with a lot of our customers, both internally and with their agencies, around that performance data. In some cases, there are data fields collected in one application that aren’t available in other applications, or fields named in one application in a way that isn’t consistent with other applications. So we help solve those naming differences, and we can do some mapping for them. The other way to think about it is, similar to a creative brief, think about a campaign brief. There’s all sorts of context inside the campaign, typically managed in spreadsheets, that we help onboard into the enterprise data warehouse, or the data model, if you want to call it that. It’s things like what stage of the funnel a campaign is in, what the segment was, what the creative was, and mapping that creative back to standards. It’s almost like creating a way to map and standardize data across all the elements of an experience, not just the campaign itself.
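The field-name mapping mentioned above can be sketched as a per-application alias table that renames each source’s fields to one canonical name. The application names and field aliases below are hypothetical examples chosen for illustration; real mappings would come from the organization’s own data model.

```python
# Hypothetical per-application aliases mapped to one canonical field name.
FIELD_MAP: dict[str, dict[str, str]] = {
    "ad_server": {"cmpgn_nm": "campaign_name", "crtv_id": "creative_id"},
    "analytics": {"campaign": "campaign_name", "creativeId": "creative_id"},
}


def to_canonical(app: str, record: dict[str, object]) -> dict[str, object]:
    """Rename an application's fields to canonical names.

    Fields with no mapping are kept under their original name so no data is lost.
    """
    mapping = FIELD_MAP.get(app, {})
    return {mapping.get(name, name): value for name, value in record.items()}


ad_row = to_canonical("ad_server", {"cmpgn_nm": "spring", "crtv_id": "c1", "spend": 10})
web_row = to_canonical("analytics", {"campaign": "spring", "creativeId": "c1"})
```

After mapping, both records share `campaign_name` and `creative_id`, so they can be joined or compared even though the source tools name those fields differently.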

Eric Dodds 23:14
Super interesting. Okay, I have one more question, and then I’m going to hand the mic to Kostas, because I’ve been monopolizing this conversation. My next question is: how do you do that? It sounds very complex, especially when you’re talking about Fortune 500 or Fortune 100 companies, massive, complex organizations producing content across who knows how many vectors, business units, product lines, and all that sort of stuff. So how do you do it?

Verl Allen 23:49
Yeah, it’s interesting. Where we thrive, and where I think the opportunity for this exists, is in organizations with more complexity, and you’re hitting on it. One of our customers has about 700 users across the globe, both internal and agency users, specifically around standardizing taxonomy for content. In that situation, we are baked into the workflow: when you’re going through the creative process of submitting creative briefs and so on, we are integrated into that workflow and capture data out of it. So it’s through integrations that we get access to data, and as users add data into fields in other applications, we pull that information into our solution. At that point, we can compare what was input to the available standards and identify differences between what was maybe entered manually in another application and what the organization defines as the standard for that field or attribute in the data. Where there’s breakage, we can either, A, automate the correction through our solution, or, B, let individuals or organizations surface the areas where there are problems, fix them in our application, and then identify ways to enforce the standard upstream.
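The compare-then-correct-or-surface loop Verl describes can be sketched for a single attribute. This assumes a set of standard values plus a table of known variants that are safe to auto-correct; anything else is flagged for a person to fix. The attribute, its values, and the helper names are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical standard for one attribute (region), plus known variants that
# can be corrected automatically. Everything else is surfaced for review.
STANDARD_VALUES = {"North America", "EMEA", "APAC"}
KNOWN_VARIANTS = {"NA": "North America", "N. America": "North America", "Europe": "EMEA"}


@dataclass
class CheckReport:
    corrected: dict[str, str] = field(default_factory=dict)  # raw input -> standard value
    flagged: list[str] = field(default_factory=list)  # inputs needing human review


def check_values(values: list[str]) -> CheckReport:
    report = CheckReport()
    for raw in values:
        if raw in STANDARD_VALUES:
            continue  # already conforms to the standard
        candidate = raw.strip()
        # Whitespace-only problems and known variants are auto-corrected.
        fix = candidate if candidate in STANDARD_VALUES else KNOWN_VARIANTS.get(candidate)
        if fix is not None:
            report.corrected[raw] = fix
        else:
            report.flagged.append(raw)
    return report


report = check_values(["EMEA", "NA", " APAC", "Latam"])
```

Here `"NA"` and `" APAC"` land in `report.corrected`, while `"Latam"` lands in `report.flagged`; the flagged list is what would be surfaced so the standard can be enforced upstream.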

Kostas Pardalis 25:22
This is very interesting. I don’t know if I’m going to be a little bit too technical, but I cannot stop thinking about how you design a data model like this. Where do you start? In my mind, data modeling is like modeling in general: we’re trying to create, let’s say, an abstraction of the real world. There are two ways I can think of to do that. You can go high level and say, okay, what I want to do is model the marketing domain: what are the main concepts and the main processes we have there? You create a data model around that, and then take all the instances that come from different applications and try to connect them to, and align them with, this data model. The other way is to go from the application level, which is the other extreme: I have these applications, they have 10 different data models, so let’s align these 10 data models and see what happens at the end. But my feeling, at least, is that neither of them is that successful in the end. You probably mean something else, something in between? So how did you do that?

Verl Allen 26:40
By the way, I agree with what you’re saying about those options. The way we think about it, it’s not necessarily an either/or; it should be much more of an “and.” It’s interesting: the way we come into organizations is increasingly like this as companies become more mature about it. Specifically in the last 18 months, we’re now in situations where it’s not just the marketing organization sitting in the room talking about the data model and the data taxonomy. What’s happening now is they’re bringing in the enterprise. I didn’t know these people existed, but they’re all over out there: enterprise data taxonomists, enterprise architects, and data architects who come in and work with the marketing organization to do exactly what you’re saying. First, some of this comes from the application side: how do we connect data across the different applications? Part of what we help them do is create relationships in the data that don’t natively exist in the applications themselves. Second, it comes from the top down: hey, there are other attributes and elements specific to the enterprise, which have nothing to do with the applications, that we want to incorporate into that model. A simple example is associations. If you think of a car manufacturer like Toyota, there’s the association between make, model, and then trim packages and things like that. Just think about a data model for a car manufacturer from that perspective: those are all related.
And in some situations they’re mutually exclusive: if I have this manufacturer, like Lexus versus Toyota, I only have certain models available, so you can exclude things. Some of that is work we do with them internally, and we also work with some of the largest systems integrators (SIs) out there, who help them work through a lot of this: coming up with the right data model and identifying the relationships that are missing and that they want to enforce in this data model, because those relationships don’t exist in the applications themselves. So it’s a little bit of meeting both ways, if you want to think about it like that.
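The make/model/trim example can be sketched as a hierarchy of allowed child values, where choosing a parent value excludes every child not listed under it. The catalog below is illustrative only, not a real manufacturer’s lineup, and `validate_vehicle` is a hypothetical helper.

```python
from typing import Optional

# Hypothetical hierarchy: each make allows only certain models, and each model
# allows only certain trims. Values are illustrative, not a real catalog.
MODELS_BY_MAKE: dict[str, set[str]] = {
    "Toyota": {"Camry", "Corolla"},
    "Lexus": {"ES", "RX"},
}
TRIMS_BY_MODEL: dict[str, set[str]] = {
    "Camry": {"LE", "XSE"},
    "Corolla": {"LE"},
    "ES": {"350"},
    "RX": {"350", "500h"},
}


def validate_vehicle(make: str, model: str, trim: Optional[str] = None) -> list[str]:
    """Return a list of relationship violations (empty if the combination is valid)."""
    errors = []
    if model not in MODELS_BY_MAKE.get(make, set()):
        errors.append(f"model {model!r} is not offered under make {make!r}")
    elif trim is not None and trim not in TRIMS_BY_MODEL.get(model, set()):
        errors.append(f"trim {trim!r} is not offered for model {model!r}")
    return errors
```

Neither an ad server nor a DAM typically knows these relationships, which is why, in this framing, they come from the top-down side of the model and are enforced on data arriving from the application side.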

Kostas Pardalis 29:07
Mm-hmm. Can you give us an example of such a relationship?

Verl Allen 29:11
Yeah, I think a great example is creative. One of the elements of an ad campaign is the actual creative. In a lot of cases, the way that creative gets named inside the ad serving solution (in some cases it may be a dynamic ad that gets generated) has nothing to do with the way the asset or creative is named or identified in the digital asset management solution or other applications you have. So if you’re managing assets inside your DAM, each with a bunch of metadata describing attributes of that piece of creative, how do you associate that creative with that campaign when there’s no linkage between the IDs? That’s where we come in. And then how do you associate that campaign and that creative with all these other attributes? That’s one simple example of how we help them create those relationships and stitch together things that don’t naturally connect within the applications, because the applications weren’t necessarily architected or meant to work that way.
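One way to picture this stitching is a crosswalk table that maps the ad server’s creative name to the DAM’s asset ID, so ad performance rows can pick up the asset’s metadata. The record layouts, the crosswalk, and `enrich_ad_rows` are hypothetical; how such a crosswalk gets populated (for example, by issuing a standardized ID at brief time) is an assumption, not something described in the conversation.

```python
from typing import Any, Optional

# Hypothetical records: the ad server and the DAM identify the same creative
# differently, so a crosswalk supplies the missing linkage.
ad_rows = [{"ad_name": "SPRING_BANNER_v3_300x250", "impressions": 1200}]
dam_assets = {"asset-42": {"product": "running shoes", "geo": "EMEA"}}
crosswalk = {"SPRING_BANNER_v3_300x250": "asset-42"}  # ad server name -> DAM asset ID


def enrich_ad_rows(
    rows: list[dict[str, Any]],
    crosswalk: dict[str, str],
    assets: dict[str, dict[str, Any]],
) -> list[dict[str, Any]]:
    """Attach the DAM asset ID and its metadata to each ad row.

    Rows with no crosswalk entry get asset_id=None and no extra metadata.
    Note: metadata keys would overwrite same-named row keys in this sketch.
    """
    enriched = []
    for row in rows:
        asset_id: Optional[str] = crosswalk.get(row["ad_name"])
        metadata = assets.get(asset_id, {}) if asset_id is not None else {}
        enriched.append({**row, "asset_id": asset_id, **metadata})
    return enriched


enriched = enrich_ad_rows(ad_rows, crosswalk, dam_assets)
```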

Kostas Pardalis 30:26
Yeah, of course. And if I understand correctly, this model we’re talking about is a combination of, let’s say, what we’d traditionally call a schema, which is more about the shape of the data, and taxonomies? Or is it also something else, or just these two?

Verl Allen 30:43
Yeah, I mean, the other way to think about it is, you know, a lot of our customers work with agencies. I’m thinking of one of our customers right now that works with, I think, almost 100 agencies around the globe on the media side. Each one of those agencies is executing and trafficking media, so they have trafficking sheets, and in those trafficking sheets there are naming conventions for how they name certain things. Well, agency A, agency B, et cetera: if there is not a way to take them out of the spreadsheets and enforce the way they’re creating naming conventions and naming data fields, then you can’t actually extend your data model out to them. It’s very difficult to extend the data model on the enterprise side out into the agencies when you have that much complexity across the agency teams. So if you think about it, we’re helping the brand extend the enforcement and use of that data model, not just within their own teams, but even outside the organization, because execution happens within those agencies as well. So you may have naming convention sheets around trafficking, or trafficking sheets. And if you’ve ever seen one of these things, they’re spreadsheets that are on version 137, they’re hundreds of columns wide, they’ve got multiple layers to them, they’re trying to pull in the creative, they’re trying to pull in data about the audience, and they’re fraught with errors. Our point is, there’s actually a lot of valuable data in there that should be part of the enterprise data model, the data ecosystem, and the data store.
So that’s one way to think about it: we’re helping in those situations where information workers are really working in spreadsheets, by pulling them into an application.
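To make the naming-convention idea concrete, here is a minimal sketch assuming a hypothetical convention: a fixed field order, an underscore separator, and a controlled list of values per field. Building names from fields, rather than typing them freehand into a trafficking sheet, means the same names can be parsed back into structured data downstream. The field names, brands, and values are illustrative only.

```python
# Hypothetical naming convention: fixed field order, "_" separator, controlled values.
FIELDS = ("brand", "region", "channel", "audience")
ALLOWED = {
    "brand": {"acme", "zenith"},
    "region": {"na", "emea", "apac"},
    "channel": {"display", "social", "video"},
    "audience": {"prospect", "retarget"},
}
SEP = "_"


def _check(field: str, value: str) -> None:
    """Raise ValueError if a value is not on the controlled list for its field."""
    if value not in ALLOWED[field]:
        raise ValueError(f"{field}={value!r} is not an allowed value")


def build_name(values: dict[str, str]) -> str:
    """Compose a placement name from field values, rejecting anything off-list."""
    parts = []
    for field in FIELDS:
        value = values[field].strip().lower()  # normalize casing and stray spaces
        _check(field, value)
        parts.append(value)
    return SEP.join(parts)


def parse_name(name: str) -> dict[str, str]:
    """Split a conforming name back into structured fields."""
    parts = name.split(SEP)
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}: {name!r}")
    values = dict(zip(FIELDS, parts))
    for field, value in values.items():
        _check(field, value)
    return values


name = build_name({"brand": "Acme", "region": "EMEA", "channel": "video", "audience": "prospect"})
print(name)  # acme_emea_video_prospect
print(parse_name(name)["region"])  # emea
```

A freehand name such as `acme_europe_video_prospect` would raise `ValueError` here instead of silently entering the data.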

Kostas Pardalis 32:50
From what I understand, there are relationships and probably also constraints. So you can constrain the way things get associated, or what values a field can take, all that stuff. How do you represent that? I mean, that’s a bit more of a technical question, but what does the technology look like to represent something in a way that the machine can understand and enforce? What technologies are used? How do you build your product to do that?

Verl Allen 33:27
A lot of this comes down to... I don’t want to get into the technology, like what programming language and that kind of stuff, but what I would say is that when we think about this problem, it’s very similar to what you’re talking about. There are known relationships in the data model, and there are relationships that need to exist, or fields of data that need to be part of the data model, that aren’t natively in there, so we can help them bring those in. We also have all sorts of logic in the application that says, hey, if this is the user ID... again, it’s coming back to knowing who the user is, what functions they’re responsible for, what campaigns they have. For example, on the campaign side, what geography, what brands. There are all sorts of ways to limit the number of available fields and the number of available options that you expose to a user, based on what you know about the business, about the application or area of execution, and what you know about the user themselves.
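The context-based filtering Verl mentions can be sketched like this, assuming each user is tagged with the regions and brands they work on and each campaign carries the same tags. The `User` class and `CAMPAIGNS` list are hypothetical; a real system would pull them from its own user and campaign stores.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    """Hypothetical user profile: the regions and brands this person works on."""
    user_id: str
    regions: set[str] = field(default_factory=set)
    brands: set[str] = field(default_factory=set)


# Hypothetical campaign list, each campaign tagged with a region and a brand.
CAMPAIGNS = [
    {"name": "spring_launch_na", "region": "na", "brand": "acme"},
    {"name": "spring_launch_emea", "region": "emea", "brand": "acme"},
    {"name": "holiday_apac", "region": "apac", "brand": "zenith"},
]


def campaign_options(user: User) -> list[str]:
    """Expose only the campaigns that match the user's regions and brands."""
    return [c["name"] for c in CAMPAIGNS
            if c["region"] in user.regions and c["brand"] in user.brands]


print(campaign_options(User("u1", {"emea"}, {"acme"})))  # ['spring_launch_emea']
```

Shrinking the option list this way is what makes a correct selection the easy one: the user never sees values that would be wrong for their context.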

Kostas Pardalis 34:42
Mm-hmm. Okay, so let’s say we get all the architects, the taxonomists, and you together, and we generate this unified model for a big corporation. Who is consuming it inside the organization once it’s done? Who is the main consumer?

Verl Allen 35:01
Yeah, so what’s interesting is, if you had asked me that question a year and a half or two years ago, I would have said the primary consumers of this data are the analytics teams within the marketing organization. What’s happened over the last year and a half (again, some of this is not necessarily a result of what we’re doing; I think it’s a bigger trend within the enterprise) is that as more of the functional teams have access to large-scale, cloud-based data infrastructure, they’re moving more of that work and payload into it, whether that’s Snowflake or others. So we’re now pushing data downstream: for some of our customers into their BI teams, into their CDPs, in some cases into Snowflake or other applications, and into their machine learning and AI infrastructure. What they’re realizing is that data which meets the quality standards to point machines at, so they can make better decisions on behalf of humans than humans can, is really valuable if there’s scale to the data. And scale is in some ways a byproduct of the quality of the data and the relationships that exist in it. So creating those relationships, as I talked about, and improving the quality of the data through standards: those are all areas where our customers are pushing the data. We have data transformation capabilities in our application that allow them to either integrate directly, or push data out to an AWS bucket in a certain format and then capture it and pull it in. But more and more customers want native integrations so that changes happen in real time.
And it’s not just about us feeding downstream. The other way to think about it: I’m thinking of one of our customers who has a very quickly changing set of inventory. It’s a large athletic shoe manufacturer, and they are constantly releasing products. How do you keep an up-to-date product catalog available to your marketing organization and the other users who are creating campaigns, creating content, all those other things? How do you expose it to them in a way that’s up to date, and limited to their geography or the channels they’re selling into, because of channel conflict and other things? So it’s managing things like that as well: being able to expose upstream product catalog data, if you want to call it that, and other data to the marketing organization and other parts of the business, with logic associated with it that, again, limits the number of selections, the variety of choices they need to select from, to actually get data into this model, if you will.

Kostas Pardalis 38:26
Yeah, makes a lot of sense. So if I understand correctly, and correct me if I’m wrong, I see at least two main ways that value is delivered in the organization. One is that you make data easier for people to interpret, because of the standardization that comes with the model. The other is data quality, right? When you have a reference schema and the taxonomies, which also feed into the quality aspects, you can increase and monitor quality a lot. But when it comes to quality, there’s always something that can go wrong. Someone will mess with the data, let’s say. So what happens then? How do the tools you have in place, the product you have, help with addressing the issues that get created?

Verl Allen 39:30
One way to think about it: traditionally, data quality has been solved reactively, in the data pipeline, with ETL downstream. There are lots of ways it’s been solved downstream, because when you get ready to actually use the data at runtime, you realize you’ve got problems, so a lot of it gets fixed downstream. We see it differently. We say, listen, if you put data standards in place on the front end, there’s a proactive way to solve a lot of the data quality issues. We’re not going to solve world hunger. But it’s about solving a set of problems, kind of type two problems, around context, and things that enable the organization to bridge between the creative side (the creatives, if you want to call them that) and the quants, and to create a more holistic way of thinking about data quality. Typically the data quality problem gets shoved downstream, and it’s data engineers, data analysts, and data scientists who are dealing with it. We think there’s a better way to solve it: if you can put tools in place that enable the information workers on the front end to solve some of this, then it benefits everyone downstream. You’re not going to solve everything, so we also have situations where customers are revalidating data. They’re reading data back through our application to validate it against the standard again and see where their problems are. That’s how we think about it. It’s a very different way of thinking about solving data quality, rather than fixing data after the fact.
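The revalidation step (reading existing data back and checking it against the standard) might look roughly like this in Python. The `STANDARD` definition, with required fields and optional allowed-value sets, is an assumption made for illustration, not a description of Claravine’s product.

```python
# Hypothetical standard: every field is required; a set lists the allowed values,
# and None means any non-empty value is accepted.
STANDARD = {
    "campaign_id": None,
    "region": {"na", "emea", "apac"},
    "channel": {"display", "social", "video"},
}


def check_record(record: dict) -> list[str]:
    """List every way a single record departs from the standard."""
    issues = []
    for name, allowed in STANDARD.items():
        value = record.get(name)
        if value in (None, ""):
            issues.append(f"missing {name}")
        elif allowed is not None and value not in allowed:
            issues.append(f"{name}={value!r} not in standard")
    return issues


def revalidate(records: list[dict]) -> dict[int, list[str]]:
    """Re-check existing rows; return {row index: issues} for rows that fail."""
    report = {}
    for index, record in enumerate(records):
        issues = check_record(record)
        if issues:
            report[index] = issues
    return report


rows = [
    {"campaign_id": "C-1", "region": "emea", "channel": "video"},
    {"campaign_id": "", "region": "Europe", "channel": "video"},
]
print(revalidate(rows))  # {1: ['missing campaign_id', "region='Europe' not in standard"]}
```

Because the same standard drives both entry-time enforcement and this after-the-fact check, the report points at specific rows and fields rather than at a vague "bad data" symptom downstream.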

Kostas Pardalis 41:15
Makes sense, makes sense. That’s very interesting. Okay, we talked about how we make the data easier for machines to manage, with the validation and all that stuff. So what about the people side? How can this model, once it has been created, be communicated to a huge organization? Because from what I understand, we’re talking about organizations that are really, really big; you might have hundreds of stakeholders who have to agree on these definitions, or on how this schema works. So how does this work? How can technology help, and how can the organization help? Because I assume the solution is probably somewhere in between; it doesn’t sound like just a technical problem.

Verl Allen 41:57
Yeah, it’s interesting, because what we’ve seen (and we’re seeing it less now; I think there’s more thought being put into this proactively within organizations) is that organizations really have to come to an agreement on the data model they’re going to use to operate. Not that it can’t change, but we typically get involved in situations where they’ve sort of solved that, or they’re close to solving it, and we may help them. Because we have so much context from all of our other customers, we may come in and say, hey, you may want to think about these other fields of data that may be important to your business. There’s Company X over there that’s in a similar field, and there are other attributes you should be collecting that you’re not identifying and standardizing. So some of that we come in and help them with, but a lot of times it’s either being done internally, like we talked about earlier, with the architects and the business users in marketing, or there are situations where we get brought in where they’ve already engaged with a Deloitte Digital or an Accenture, and the question is: how do we deploy and activate this model we’ve built and make it work in the organization? What we’ve seen is that organizations have data taxonomies, they have these taxonomy solutions, but they’re not connected to anything. So it’s like, that’s great, but it’s sort of shelfware. It’s about making it active, making it actually usable and functional, for the information workers. That’s what we do: we connect the architects and the taxonomists with the business people. And that’s, hopefully, who needs to solve it.
But there are a lot of cases where it has to be facilitated by third parties that have been through this many times before. Still, it’s surprising how many organizations themselves know this stuff better than anybody. It’s just about making it a priority and getting the right people in the room to actually have those debates and those discussions.

Kostas Pardalis 44:14
Makes a lot of sense. One last question from me. You have implemented this process in many, many different companies. How different is the implementation from company to company, and how far can you go toward saying, okay, there is one distilled model for marketing that makes sense for pretty much everyone? Sure, there are edge cases and some specializations on top of it, but if you take this model, you pretty much understand, let’s say, 80% of what marketing is in an organization.

Verl Allen 44:52
Yeah. The way I think about it is that there are, I’ll call them, elements of those models that are fairly standard. There are also big portions that are unique, in the sense that, if you think about context, each business is organized and structured differently. I saw this with one of our customers, and we’ve seen more and more that our data is now ending up in the finance organization, because what they’re realizing is: wait a second, we’re trying to do profitability analysis and all sorts of analysis from a finance perspective, but we don’t necessarily have a way of looking at the data as it relates to the way we think about the business from a P&L perspective, around business units, contribution margins by business unit, by product line, and so on. So when you think about product lines, business units, the structure of the organization, that’s the part that becomes unique. It’s not like every single one is different, but those elements really have to be customized and aligned with the organization itself. So there are pieces that I’d say are standard, and there are elements that are unique to that organization, which is really where you think about organizational context versus channel or application context.

Kostas Pardalis 46:21
Mm-hmm. Okay, one more question before I give it to Eric. The last one, I promise. How much and how often have you seen these models change inside an organization? And is there some indication, some dimension of the organization, that whenever it changes, the change also reflects back into the model? Is there some kind of connection there?

Verl Allen 46:46
Yeah, I mean, these models change as they’re adding... well, definitely, there’s a difference between the model changing and the underlying data itself, the available attributes in those fields, if you want to call it that. For some customers, like I mentioned earlier, that stuff is constantly changing, and there’s a bunch of logic around it. The models themselves do change, but that is much more of an organizational discussion, and there are controls around it. There are even ways for particular functional areas or users or geographies to, you know, there’s no reason they can’t modify their model. But it’s being able to say: hey, this is the standard model we’ve agreed on as an organization, and these are attributes that were added by this geography, and here’s how we treat them differently when we do analysis, understanding that they were built in for a specific purpose that has nothing to do with the way we look at it holistically from an organizational perspective. So there’s some of that nuance, and we help them manage and understand it, and put a lot of controls around who, where, and when these models can be changed, because you have to be very specific about who has that ability, who has the rights to do that. So some of it is around governance and access. But yes, they do change. For the most part, though, between the core, capital-D data model and the lowercase-d data, there’s a difference in how and when those change, and how frequently.

Kostas Pardalis 48:23
Yep, super interesting. Eric, all yours.

Eric Dodds 48:28
I’m loving this. Verl, what’s so interesting to me is that this kind of data is so rarely talked about within an organization, yet it’s the kind of data that makes existing data so much more valuable. I may be wrong with this parallel, but one thing that really strikes me as I think about what you’re doing is that, in many ways, you talked about, let’s say, the creator, who has a ton of context, right? That feels very similar to taking unstructured data and applying structure to it. And when I think about that (this is going to sound very buzzwordy) I start to think about things like machine learning, or graph networks, where you can discover connections that were previously undiscoverable because everything was unstructured. Is that something you’re exploring, or something you already do? I’d just love to hear about that.

Verl Allen 49:42
What you’re hitting on goes back to what I was talking about, the connection between quants and creatives, which is where we’re laser focused. And I think, as you step back, every organization out there, whether they know it or not, is actually dealing with this problem, and most of them recognize it but aren’t sure how to solve it. By and large, we come into situations where we’ll be brought in by one group, they’ll bring people from other groups to the meeting, and people sit down and within five minutes they’re like, oh my gosh, yes, it’s huge. We know this is a problem, we’ve not talked about it, and we’ve sort of pretended it didn’t exist, because there aren’t a lot of people here who really understand it. So we just keep shoving it under the carpet. But it’s becoming a bigger and bigger problem as the scale gets bigger, right? And the gaps are getting wider. As for what you’re talking about, almost like a neural network: in some ways I’m not sure that’s our problem to solve. I really think what we’re trying to do is enable our customers, their machine learning teams, their data science teams, to use the technology they’re already using, but augmented with this data, this information, this context. That’s really how we see the world. We look at it as playing Switzerland: we want to make this available wherever it’s needed, wherever it adds value. And we’ll integrate upstream wherever we know there are challenges in getting context into campaign data, or performance data, or whatever.

Eric Dodds 51:27
That’s super interesting, and it totally makes sense. I’m just thinking about use cases where, let’s say, you have a pretty wide set of product lines. You might discover something, with the added context, about the relationship between product lines and a particular subset of consumers who meet certain demographics, that would be hard to discover with just your basic clickstream and purchase data, for example. So you’re essentially providing a data set that a machine learning or data science team could use to draw that conclusion. Super interesting.

Verl Allen 52:07
Yeah, and again, it’s that data. Like you said earlier, it gets stamped onto the behavioral data, or that other data, in a sense.

Eric Dodds 52:15
Yeah. Okay, one last question, because we’re getting close to the buzzer here. Could you give us one example? The marketing and content asset use case makes total sense. You mentioned earlier in the show that there are some other contexts where it makes sense, and some of those pop to mind for me, but I’d love to hear from your perspective: what are other areas of the organization where this unstructured-context-to-structured stamping, if you will, makes a lot of sense?

Verl Allen 52:45
Yeah, it’s interesting. We’ve really been pretty laser focused on marketing. But we had a situation recently where we were brought in, and it goes back to the Adobe situation, where you’re talking about integrating multiple solutions that have been acquired. It was a company that had acquired a number of demand-side platform (DSP) solutions, solutions for programmatic buying and selling of media. They were trying to use, I think it was Informatica or some other solution, to map data together so that their finance teams could appropriately bill clients across platforms, and to expose inventory across the different platforms, with one team selling it and one buyer. The problem they were running into, and the reason they came to us, is that it got exposed by auditors. As a massive public company, it was something they were going to have to disclose in their financials; it was a couple-hundred-million-dollar problem. And ultimately they said, there’s a billion-dollar opportunity here that we can’t actually get access to, because we’ve got this underlying data problem. So think about more than just campaign creation or content creation.
If you’ve got a sales team creating sales orders around inventory that’s being bought through different DSPs or different platforms, how do you standardize that so you’re able to associate the inventory and the fulfillment together? That’s really what they brought us in for, to help map that. And that was one of the situations where I thought, okay, I would never have thought of us as a solution for that. So it kind of opened my mind. Again, we’ve been so focused on where we’re going, but there are other areas where you have people interacting with applications. Outside of marketing, you’ve got people in supply chain, you’ve got people in sales and other places, and there’s a similar opportunity in all of those situations. We’ve chosen to start here mainly because that’s our DNA, and because our customers see the problem as a big challenge, so it feels like a natural place for us to start. And we get pulled into other things: the content use case came up through another customer that pulled us in, saying, hey, we’ve got this problem, I think you guys can help us solve it. So that’s how it came about, through a customer.

Eric Dodds 55:31
Yeah. Fascinating. It makes total sense. Going to your example with finance: reconciling transactions that relate to inventory across distinct, siloed platforms is essentially a mass reconciliation problem. Super interesting. Well, Verl, this has been such a fun episode. I love talking about data, uses of data, and context for data outside the standard stuff we usually talk about. And it sounds like you’re doing some really fascinating things at Claravine to bring that to light. So thank you for your time, and thanks for sharing with us.

Verl Allen 56:13
Thank you. It’s my pleasure to be here. Thanks a lot, Eric. Kostas, thank you.

Eric Dodds 56:17
I think my big takeaway from the episode is that I really started thinking more about context. Verl mentioned context, and he talked a lot about people who are doing certain types of work, right, they’re producing work. In this case it was marketing assets: a piece of content or a campaign. But I just thought about all the touch points across an organization where people are producing work, and the amount of context that’s in their heads is unbelievable. In many ways, that’s what brings value to the work. So that whole concept is fascinating to me: how can you mine that context and actually turn it into physical data, in a defined schema? I think I’ll be thinking about that all week, because from a philosophical standpoint it’s a pretty interesting paradigm.

Kostas Pardalis 57:22
Yeah. One of the things I’ll keep from this conversation is that, first of all, spreadsheets are still king. They’re never going away. They’re like cockroaches, man, you cannot get rid of them.

Eric Dodds 57:36
Is that why they named it CockroachDB? CockroachDB.

Kostas Pardalis 57:39
Ah, no, I mean, that’s... yeah. We had an episode about that, but we didn’t discuss the name, because it’s a little bit controversial, but...

Eric Dodds 57:47
Oh, that’s right. That’s right.

Kostas Pardalis 57:49
For me, what was super interesting is that there are roles inside organizations, really big ones, that we didn’t even think about: people who have to build and maintain data taxonomies, for example, which is pretty amazing. Together with the data architects, these are the people creating, let’s say, the data representation of the whole organization, which then needs to be communicated to everyone. What I’ll keep from the conversation is that with these problems (and I think this isn’t specific to data; it’s true in general) success in the end comes from figuring out the right balance between how much technology can do and how much humans have to agree, and how we can do both, and do both well. And I’m really looking forward to seeing similar products hit the market for smaller and medium-sized organizations.

Eric Dodds 58:54
Yeah, I mean, certainly in a multinational corporation the pain point is severe, simply due to size and complexity, so the problem is exacerbated. But we’ve had similar problems at every company I’ve ever been at, even really small ones.

Kostas Pardalis 59:14
Yeah. 100%. I agree. I don’t think this is a problem only for very large corporations; it’s just that they cannot survive without solving it. That’s the difference. That’s why I’m saying we’re going to see products at some point that try to address these problems for startups, or for smaller and medium-sized companies.

Eric Dodds 59:38
I agree. All right. Well, thanks for joining us on The Data Stack Show. Fun topic for you today, and we’ll catch you on the next one.

We hope you enjoyed this episode of The Data Stack Show. Be sure to subscribe on your favorite podcast app to get notified about new episodes every week. We’d also love your feedback. You can email me, Eric Dodds, at eric@datastackshow.com. That’s E-R-I-C at datastackshow.com. The show is brought to you by RudderStack, the CDP for developers. Learn how to build a CDP on your data warehouse at RudderStack.com.
