Episode 227:

The Art & Science of Marketing Attribution: From UTMs to Machine Learning with Lew Dawson of Momentum Consulting

February 5, 2025

This week on The Data Stack Show, Eric and John welcome back Lewis Dawson from Momentum Consulting as the group delves into the complexities of data attribution. They discuss the limitations of traditional UTM parameters and the advantages of using hashing methodologies to enhance data quality and attribution accuracy in marketing campaigns. The conversation highlights the importance of stable identifiers, the integration of dynamic parameters from advertising platforms, and best practices for managing metadata in data systems. They also explore various attribution models, including first touch, last touch, and multi-touch, and discuss the challenges of identity resolution and maintaining data integrity across different platforms. Don’t miss this great conversation. 

Notes:

Highlights from this week’s conversation include:

  • Welcome Back, Lew (0:14)
  • Recap of Previous Discussion (1:03)
  • Benefits of Hashing Information (2:33)  
  • Using Hashes for Data Context (4:24)
  • Hashing and Query Parameters (7:24)
  • Static Values for Hashing (11:10)
  • Identity Resolution in Data Attribution (14:36)
  • Methodologies for User Tracking (16:37)
  • Combining Data Sources for Attribution (21:13)
  • Understanding Data Gaps (25:25)
  • Defining Objectives and KPIs (27:50)
  • Identity Resolution Challenges (28:46)
  • User and Session Stitching (32:01)
  • Trusting Ad Platforms (35:23)
  • Defining Attribution (38:09)
  • The Credit Dilemma (40:18)
  • First Touch Attribution Explained (41:47)
  • Linear Attribution Model (43:21)
  • B2C and B2B Attribution Scenarios (45:22)
  • Timeframes in Attribution (47:29)
  • Understanding Lookback Windows (49:34)
  • Google Analytics Changes (51:20)
  • Attribution After Conversion (53:26)
  • Online vs. Offline Attribution (55:49)
  • Discipline in Tracking (58:52)
  • Challenges in Coordination (1:00:12)
  • QR Codes and Data Integration (1:01:55)

 

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Transcription:

John Wessel  00:03

Welcome to The Data Stack Show. The Data Stack Show is a podcast where we talk about the technical, business and human challenges involved in data work. Join our casual conversations with innovators and data professionals to learn about new data technologies and how data teams are run at top companies.

Eric Dodds  00:13

Lew, welcome back to The Data Stack Show. We ran out of time last time talking about attribution, although we did make a lot of progress, so we’re going to dive right back in. Thanks for giving us even more of your time.

Lew Dawson  00:42

Thanks, it’s good to see you again. And

John Wessel  00:45

This is the first three-part show for The Data Stack Show. This is the first ever,

Eric Dodds  00:49

Yes, this is the first-ever three-part show, right? We knew it was going to be a big multi-part show, which is super exciting. So yeah, congratulations, Lew.

Lew Dawson  01:00

So long.

Eric Dodds  01:03

So last time we walked through, and I was reflecting on this a little bit, really an immense amount of work to get to the point where we have these various data sources coming in. We talked about data structure: data from advertising platforms that have information about your campaigns, your ad groups, and your ads, all of which contain UTM parameters. We talked about behavioral data coming in so that you can see when a user lands on your website or mobile app and all of the actions they perform, ideally culminating in some sort of conversion event. We also talked about how UTM parameters are what now feels like a fairly primitive way of packaging information about your campaigns into a URL so that that metadata is observable by other systems, and we talked about a really clever methodology for hashing information so that you can overcome some of the limitations of that system. So why don’t we start there? We had just started to dip into the world of the hash. Give us a quick refresher on why that is so useful compared with the traditional taxonomy of the five UTM parameters.

Lew Dawson  02:33

Absolutely. In short, if you recall, there were a number of challenges we highlighted with the traditional UTM params, like utm_campaign. Campaign is a big offender because it’s free-form. So you could have a scenario where your campaign taxonomy has a space, or some sort of character in it that browsers at times mangle or handle differently. Spaces, for example, are sometimes represented as %20 and sometimes as pluses. Different ecosystems do different things. So first of all, you run into the scenario where your UTM parameters get mangled, shall we say. And now, just to have full, proper attribution, you have to go to the trouble of standardizing those names. There are two, three, four, five, ten, fifteen, twenty variations you’ll sometimes see on a particular campaign name, and you have to go through and figure out how you’re going to standardize them so they actually point to a single identity for that campaign. So that’s a big problem right there. One of the primary ways that’s solved is with an ID: you roll up all those distinct values, so UTM source, campaign, term, etc., into a single unique identifier, which also isn’t easily mangled by any platform. That gives you two benefits. One I just described: it’s less likely to be mangled. The other is that it’s now your join key too. Instead of having to do that resolution and figure out the standardization as you back into your join key, your join key just comes in as part of your data, which is much easier. So that’s the main reason why you’d want to look into that.
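To make the mangling problem concrete, here is a small Python sketch using the standard library’s `urllib.parse`. The campaign name and the hex ID are hypothetical; the point is only that the same free-form name can arrive as `%20` or `+`, while an opaque hex identifier passes through every encoder unchanged.

```python
from urllib.parse import quote, quote_plus, unquote

campaign = "Spring Sale 2025"  # hypothetical free-form campaign name

# Two common encodings of the same campaign name.
as_percent = quote(campaign)      # 'Spring%20Sale%202025'
as_plus = quote_plus(campaign)    # 'Spring+Sale+2025'

# A downstream system that decodes with plain unquote (which does not
# treat '+' as a space) ends up with two different "campaign names".
decoded = sorted({unquote(as_percent), unquote(as_plus)})
print(decoded)  # ['Spring Sale 2025', 'Spring+Sale+2025']

# An opaque lowercase hex ID uses only URL-safe characters, so every
# encoder and decoder above leaves it untouched.
campaign_id = "3f9a1c0b7e2d"  # hypothetical hashed identifier
assert quote(campaign_id) == quote_plus(campaign_id) == unquote(campaign_id) == campaign_id
```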

Eric Dodds  04:23

Okay, I have a couple of questions here. Actually, there’s one point I realized we did not get to last time, which is another major benefit of using a hash: you can package a bunch of information into it. Your mileage may vary using a spreadsheet to do this, though most companies actually do use a spreadsheet. There are more stable ways to do this, and a lot of great tools out there that you can build a hashing system with that are a little easier to govern than a spreadsheet. But let’s say you have this in a spreadsheet: you can add as much information as you want, or, as we talked about last time, as much information as is helpful based on what you’re trying to discover about attribution. So you’re not limited to those five UTM parameters. Theoretically, you could just have the hash if you wanted to, but regardless, you can use it to create a context that frees up one or more of the UTM parameters for other things. One of those is that a lot of ad platforms support dynamically pulling in the ID of the individual ad itself when the ad is clicked, which can be super handy. You could just append that as another UTM parameter, but then it wouldn’t necessarily be packaged in the hash.
And so what we’ve seen a lot is that you’ll use something like utm_content, for example, to pull in the ad ID using curly brackets, because you can package the actual content you want in other columns in the spreadsheet that aren’t represented as values in the UTM keys but are in the hash. That can actually speed up ad-level reporting downstream, because you have the hash and you also have the actual ad ID in the UTM itself, captured on the click. I forgot that we didn’t talk about that, Lew, but that’s another really clever thing that can speed up some downstream modeling.
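A minimal sketch of the pattern Eric describes, assuming the hash lives in `utm_campaign` and `utm_content` carries the ad platform’s click-time macro. The metadata table, the hash value, and the ad ID are hypothetical, and `{{ad.id}}` is shown only as an example of Meta-style macro syntax; check each platform’s documentation for its own placeholders.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical metadata "spreadsheet": one row per hash. Columns such as
# audience or creative_theme never appear in the URL itself.
CAMPAIGN_METADATA = {
    "3f9a1c0b7e2d": {
        "source": "facebook",
        "medium": "paid_social",
        "campaign": "Spring Sale 2025",
        "audience": "lapsed_customers",
        "creative_theme": "free_shipping",
    },
}


def tracking_template(base_url: str, campaign_hash: str, ad_id_macro: str) -> str:
    """Build the URL you would paste into the ad platform.

    The macro stays literally in the string; the platform is expected to
    replace it with the real ad ID at click time.
    """
    return f"{base_url}?utm_campaign={campaign_hash}&utm_content={ad_id_macro}"


def resolve_click(landing_url: str) -> dict:
    """Join a landed URL back to the metadata table using the hash."""
    params = parse_qs(urlsplit(landing_url).query)
    campaign_hash = params["utm_campaign"][0]
    ad_id = params.get("utm_content", [None])[0]
    return {"ad_id": ad_id, **CAMPAIGN_METADATA[campaign_hash]}


template = tracking_template("https://example.com/sale", "3f9a1c0b7e2d", "{{ad.id}}")
# Simulate the platform substituting its macro when the ad is clicked.
landed = template.replace("{{ad.id}}", "120210000000001")
row = resolve_click(landed)
print(row["ad_id"], row["campaign"])  # 120210000000001 Spring Sale 2025
```

Because the descriptive columns live in the table rather than the URL, new columns can be added later without touching any live ads.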

Lew Dawson  06:41

I mean, that’s a great call-out. And there’s another one: it makes it much easier to have stable identifiers for a particular campaign, ad set, and ad, because that also allows you to make changes to that ad in your metadata table, where you do that mapping, while keeping the same stable identifier. So that’s another huge one we didn’t touch on yet. There are multiple benefits, for sure. That’s a great call.

Eric Dodds  07:13

Yeah, the quality control on the input is awesome, like being able to control a stable campaign name with a sequence number, for example. That can be really helpful.

John Wessel  07:24

So I have a question. When I think of a hash here, I’m thinking of taking some arbitrary amount of information and creating a unique ID that’s specifically linked to that information. So in this case, are we doing that in a metadata table and then putting it into a query parameter, or are we doing it somehow on the front end, like through a tool?

Eric Dodds  07:50

Yeah, do you want to speak to that, Lew?

Lew Dawson  07:52

Yeah, sure. It’s kind of twofold; let me know if I don’t completely address your question. Basically, you pick the parameters up front that you want to hash. That’s generally going to be your identifier, some sort of unique identifier, whether you created it from scratch or, you know, we can get into solutions later. You generate that unique identifier, like a SHA-256 or even an MD5 of a set of parameters. Then you take that and put it into the ad: in Facebook Ads, for example, you put it in as a UTM param on that particular ad. Whenever the user clicks on that ad, that identifier is automatically sent as part of the query parameters, to answer your question.
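A sketch of generating the identifier, assuming you pick the parameters up front and serialize them deterministically before hashing. Lew mentions SHA-256 or MD5; both are available through Python’s `hashlib`. Truncating to 12 hex characters is an illustrative choice to keep URLs short, not something from the conversation; with truncation, it’s worth checking each new ID against the metadata table for collisions. The parameter values are hypothetical.

```python
import hashlib
import json


def campaign_hash(params: dict, algorithm: str = "sha256", length: int = 12) -> str:
    """Derive a short, URL-safe identifier from a chosen set of parameters.

    Keys are sorted before serializing, so the same values always yield
    the same digest regardless of dict ordering.
    """
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    digest = hashlib.new(algorithm, canonical.encode("utf-8")).hexdigest()
    return digest[:length]


params = {
    "utm_source": "facebook",
    "utm_medium": "paid_social",
    "utm_campaign": "Spring Sale 2025",
    "sequence": 1,
}
cid = campaign_hash(params)             # 12 lowercase hex characters
md5_cid = campaign_hash(params, "md5")  # same idea using MD5

# This value is what gets pasted into the ad's UTM parameter.
print(f"utm_campaign={cid}")
```

Since the metadata table stores the mapping, the hash never needs to be reversed; it only has to be unique and stable.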

John Wessel  08:44

Yeah. So we’re not also somehow... because I was thinking, how do you get dynamic parameters into the MD5 hash?

Eric Dodds  08:51

No, sorry. The dynamic parameters you’d actually get from the advertising platform through its own mechanism: on click, it can insert dynamic parameters. The point there was that, for the purposes of attribution (Lew, tell me what you think about this, we’re probably aligned), if you can stick within the five UTM parameters, there are a lot of benefits. You don’t really want to go way outside of that, because then those parameters aren’t honored by every single system, and you’re adding more complexity, right? So the way you can pull in dynamic parameters without adding a bunch of additional parameters is to reuse existing UTM values, because they’re not required anymore once you hash all of the metadata.

John Wessel  09:44

So in value one I’ve got a bunch of stuff crammed into a hash, and in value two I may have some dynamic stuff, as well as three, four, potentially five?

Lew Dawson  09:54

Yep. And it doesn’t really... well, it does, but it doesn’t really matter what you put in the hash, because, as you both pointed out, it’s static, it’s going to be stable, and it’s chosen by you in your system, where you track all of these as your campaigns. And then, again as you both pointed out, there’s the aspect of dynamic parameters, which the ad platform fills in. You configure it in your ad: Google Ads, for example, has ValueTrack, where you put whatever dynamic parameter you want, in brackets, associated with that UTM param. At the end of the day, all those dynamic params resolve in addition to your ad, and you also have that identifier, right? So you use that identifier as your join key, and all those dynamic parameters come in on your tracking pixel, your clickstream. If you use RudderStack, for example, you’ll be able to get those at runtime, because the ad platform substitutes them in at runtime.

John Wessel  10:55

There’s one other question, and again, this is probably just from a software background. A lot of times, if you generate a hash, it changes if you change the underlying data. In this case, you generate it one time, and if you made some change, you’d want to keep it static, right?

Lew Dawson  11:10

That’s exactly it; you’re spot on. That’s what I was referring to earlier as another benefit of using the hash: you pick a static set of values, and you never change those values. Since it’s your metadata and you control it, you can keep those static while adding all sorts of other metadata that you can change, and still have a stable ID. So exactly.
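One way to get the behavior Lew describes is to hash only a small set of key fields that are frozen when the row is created, and to keep every other column as editable metadata. The field names and values below are hypothetical.

```python
import hashlib
import json

# Hypothetical key fields, chosen once when the campaign row is created
# and never edited afterwards.
KEY_FIELDS = ("platform", "campaign_slug", "sequence")


def stable_id(row: dict) -> str:
    """Hash only the frozen key fields, ignoring mutable metadata."""
    key = json.dumps([row[field] for field in KEY_FIELDS])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]


row = {
    "platform": "meta",
    "campaign_slug": "spring-sale-2025",
    "sequence": 1,
    # Mutable metadata: safe to edit without changing the ID.
    "display_name": "Spring Sale 2025",
    "audience": "lapsed_customers",
}
original_id = stable_id(row)

row["display_name"] = "Spring Sale 2025 (Extended)"
row["audience"] = "all_customers"
assert stable_id(row) == original_id  # labels changed, join key did not
```

In practice the ID would be computed once and stored; the check at the end just shows that editing labels leaves it unchanged.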

Eric Dodds  11:31

Two other quick thoughts on that, and then I want to move on to identity resolution, because that’s a lot juicier. It sounds so simple, but the nuances here, the URL tricks, are fascinating to me. Generally, I think it can be a good practice to use an arbitrary URL parameter for the hash. Lew, we haven’t discussed this specifically, but in the past I’ve used an arbitrary URL parameter for the hash itself, so that there’s more flexibility in using the ones that most systems honor out of the box.

Lew Dawson  12:10

Yeah, I think it’s wise, wherever possible, to use a non-standard one so a platform doesn’t step on it, right? If you’re putting it in, say, utm_campaign, what if the platform you put it in steps on it? It’s like, yeah.
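A small simulation of why a dedicated parameter helps, assuming some tool or platform rewrites `utm_campaign` on its own. The parameter name `cmp_id` is hypothetical; any non-standard key that your systems pass through unchanged would work.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def platform_overwrites(url: str, overrides: dict) -> str:
    """Simulate a tool or ad platform that sets its own UTM values on a URL."""
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query))
    params.update(overrides)
    return urlunsplit(parts._replace(query=urlencode(params)))


def extract_param(url: str, name: str) -> Optional[str]:
    """Read a single query parameter from a URL, or None if absent."""
    return dict(parse_qsl(urlsplit(url).query)).get(name)


# Hash stored in a standard UTM key vs. a dedicated, hypothetical "cmp_id" key.
in_utm = "https://example.com/sale?utm_campaign=3f9a1c0b7e2d"
in_custom = "https://example.com/sale?cmp_id=3f9a1c0b7e2d&utm_campaign=spring_sale"

auto_tag = {"utm_campaign": "auto_tagged_campaign"}
print(extract_param(platform_overwrites(in_utm, auto_tag), "utm_campaign"))  # auto_tagged_campaign
print(extract_param(platform_overwrites(in_custom, auto_tag), "cmp_id"))     # 3f9a1c0b7e2d
```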

Eric Dodds  12:29

Right, exactly. One other nice thing, and I think this is the last point about hashing in the URLs. This is something I didn’t think about a ton until we dug into it: URL length can become an issue in certain cases. A really long URL can get truncated, or certain browser applications may capture only a truncated version of it, and that can create challenges. So the other thing a hash allows you to do is keep a pretty trim URL, so you don’t run into string length issues. That’s not a big deal if you’re capturing the URL as a string in a RudderStack payload, for example. But it can be if the URL is going into other systems, or if a website or application is parsing or interacting with it. Another thing I didn’t think about a ton: if you have an application or a website that appends a bunch of additional parameters to the URL to do things like filtering, you get these really long, gnarly e-commerce search strings, right? So again, the hash lets you keep the URL as long as you need but as short as possible, to mitigate that. Okay, I think we have unhashed all of the hash. Lew, let’s talk about identity resolution. We started to touch on this last time, but I want to dig a little deeper. If we think about where we are in this journey at this point, let’s say we have all of our data and we’re using all the URL tricks to enforce quality control.
We have good URLs, we have our join keys, and we pick up extra bonus information, if you will, that can be inserted dynamically by the ad platforms. So we have all this data in our warehouse, and what we arrive at is that before we can really start doing attribution (and we’ll talk about what that means, because it means a lot of different things; I’m so excited to chat about that and hear your definitions), there are actually multiple identity resolution problems we have to solve in order to produce, let’s call it, a baseline data set. Or how would you describe the end point of prep and the starting point of actually beginning the work of generating insights around attribution? Is that a baseline table, or data set, or something else?

Lew Dawson  15:26

There are two things you effectively need, and sometimes they’re packaged into one. You basically need a session: something that tells you here’s a session that occurred and how the user came in on that session. And then you need to know whether that session converted or not; that’s effectively the other question you need to answer, and it can be in the same data set. Using RudderStack, for example, with their e-commerce spec, you have a page call, which is your initial page view event, and then Order Completed would be your conversion event, if it’s an e-commerce company selling stuff. So you need those two at a minimum. That’s not to say they have to come from the same place. You could have a RudderStack page view and join that with Shopify orders, as long as there’s some sort of join key to join them on. But basically, at a minimum, those are the two things you effectively need to attribute.

Eric Dodds  17:01

Can I ask a question there? John, I’m interested in your opinion too, because both of you have done an immense amount of that type of joining in the warehouse. There are two ways to do this, and I want to know the best way, because I’ve tried multiple ways in the past and I don’t have as much experience with e-commerce. Maybe there are other ways to think about this, but you have, let’s call it, a session-based methodology, where I have some way of following the same user across multiple sessions, and I persist the attribution data, the hash and whatever else, across those sessions. I could store it somehow, or have some other way to tie the user’s behavior together; RudderStack provides an anonymous ID, for example. Maybe you want to do both, but essentially you follow that user, and perhaps the attribution data itself, through the sessions until there’s a conversion. But there’s another methodology, which relies on user-level identity resolution: as long as I can get the attribution data on whatever session, I don’t necessarily have to persist it through if I have a way to tie the user to a Shopify order. Then I’m looking for some instance of attribution data in a session, and I can see there’s an order at some point downstream with its own timestamp, and I can say, okay, that user came in here and eventually made this order. But in that case, I’d be tying the attribution data to the user from the page view event, or whatever that behavioral event is, and then I need to tie the user to the Shopify orders table, which means I’m using email or some other trait of the user and running the join that way. Does that make sense? Those are the two broad methodologies: either I follow a session through, or I capture the attribution data at some point in time, know that it’s a user, and tie the user to some conversion data, like an orders table downstream. Am I thinking about the broad methodologies right, or is there another way to do it?

John Wessel  19:14

I can speak a little bit to Shopify, and I’m sure this changes on a regular basis. We were using it fairly early on, when Shopify was first bringing on large businesses with lots of traffic, so five or seven years ago, something like that. It was interesting, and this might be different now, but Shopify will attempt to do this for you. It pulls UTM parameters and gives you some session information, but it always felt pretty incomplete. Lew, I don’t know if that’s been your experience too. So there’s that part of it. And there’s the other part: I can do my own RudderStack-type thing, grabbing the anonymous ID, etc., and do some SQL gymnastics to get it to work. We actually did both for a while. First it was, okay, we’ll just pull the data out of Shopify and see how it attributed things. There were gaps, and we weren’t sure how to fill them. Then we went the other way: fire the anonymous ID, collect the email address in RudderStack at checkout, which associates it with the anonymous ID. Then we ran pretty simple attribution models, first or last click, usually first, and just used that information.
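A minimal sketch of the user-level approach John describes: page views carry the campaign hash, an identify call at checkout links the anonymous ID to an email, and orders are joined on that email. The table shapes, field names, and lower-case email normalization are assumptions, and a real pipeline would typically do this in SQL in the warehouse rather than over Python lists.

```python
from datetime import datetime

# Hypothetical clickstream page views, carrying the campaign hash
# when the landing URL had one.
page_views = [
    {"anonymous_id": "anon-1", "ts": datetime(2025, 3, 1, 9, 0), "campaign_hash": "3f9a1c0b7e2d"},
    {"anonymous_id": "anon-1", "ts": datetime(2025, 3, 3, 20, 0), "campaign_hash": None},
    {"anonymous_id": "anon-1", "ts": datetime(2025, 3, 4, 12, 0), "campaign_hash": "b71e0c44d2a9"},
]
# Identify calls captured at checkout, linking the anonymous ID to an email.
identifies = [{"anonymous_id": "anon-1", "email": "Pat@Example.com"}]
# Orders exported from the commerce platform, keyed by email.
orders = [
    {"order_id": "1001", "email": "pat@example.com", "ts": datetime(2025, 3, 4, 12, 30)},
    {"order_id": "1002", "email": "lee@example.com", "ts": datetime(2025, 3, 5, 8, 0)},
]


def attribute_orders(page_views, identifies, orders, model="first"):
    """Credit each order to the first or last campaign touch before it."""
    # Email -> every anonymous ID seen with it (lower-casing is an assumed normalization).
    anon_ids_by_email = {}
    for row in identifies:
        anon_ids_by_email.setdefault(row["email"].lower(), set()).add(row["anonymous_id"])

    results = {}
    for order in orders:
        anon_ids = anon_ids_by_email.get(order["email"].lower(), set())
        touches = sorted(
            (pv for pv in page_views
             if pv["anonymous_id"] in anon_ids and pv["campaign_hash"] and pv["ts"] <= order["ts"]),
            key=lambda pv: pv["ts"],
        )
        if not touches:
            results[order["order_id"]] = None  # no identity link or no campaign touch
        else:
            credited = touches[0] if model == "first" else touches[-1]
            results[order["order_id"]] = credited["campaign_hash"]
    return results


print(attribute_orders(page_views, identifies, orders))          # {'1001': '3f9a1c0b7e2d', '1002': None}
print(attribute_orders(page_views, identifies, orders, "last"))  # {'1001': 'b71e0c44d2a9', '1002': None}
```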

Eric Dodds  20:36

So kind of both?

John Wessel  20:38

Yeah, yeah, honestly. Just kind of.

Eric Dodds  20:41

That was my hypothesis. Lew?

Lew Dawson  20:43

Yeah. So you definitely don’t have to connect user data, right? It doesn’t have to be user-level identity resolution. As you pointed out with your first one, it can be just session and, let’s say, order. But you’re correct that it can also be session, user, and order, depending on what kind of metrics you’re trying to derive, which is a topic we could talk about later, or in a separate Data Stack conversation around, you know, your customer feature table. So that’s one thing. The other thing I’ll point out, John, you kind of alluded to it: it feels incomplete if you also want to do attribution on sessions that don’t convert. At the end of the day, you want to include direct traffic, right? You want to know how much direct traffic you’re getting, both for attribution, since you can attribute conversions to direct traffic, and to see how much direct traffic is coming through overall, including traffic that’s not converting, for your ROAS calculations or whatever. You can’t get that through Shopify alone, because you can really only tie sessions to conversions rather than seeing all of your traffic come through. So that’s one area where Shopify sometimes falls over. I’ll also point out that Shopify attribution, as I recently discovered, is different depending on where you get the attribution data from. The landing site entry for order attribution in the Shopify REST API is different from what you get if you look at the equivalent data in the GraphQL API. So you do need to keep an eye on that too: the attribution actually is different, and it’s not clear what attribution models are being used. It’s not clearly documented.
So that’s the pitfall you run into when you want to get more advanced: it’s unclear how things are being calculated in different scenarios. And lastly, if you try to combine or compare them, they’re always going to be different. The same is true for things like Google Analytics, or even Facebook Ads: the conversion metrics in those platforms are going to be very different from what you get if you properly calculate them on your own, which is another thing we can chat about.
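To illustrate Lew’s point about non-converting traffic, here is a sketch that counts sessions per channel from clickstream data and compares it with a view built only from converted sessions. The channel rules (hash means paid, external referrer means referral, neither means direct) and the session records are simplifying assumptions.

```python
from collections import Counter


def classify_channel(session: dict) -> str:
    """Assumed rules: a campaign hash means paid, an external referrer
    means referral, and neither means direct."""
    if session.get("campaign_hash"):
        return "paid"
    if session.get("referrer"):
        return "referral"
    return "direct"


def channel_report(sessions):
    """Sessions, conversions, and conversion rate per channel."""
    totals, conversions = Counter(), Counter()
    for session in sessions:
        channel = classify_channel(session)
        totals[channel] += 1
        conversions[channel] += int(session["converted"])
    return {
        channel: {"sessions": n, "conversions": conversions[channel], "conversion_rate": conversions[channel] / n}
        for channel, n in totals.items()
    }


sessions = [
    {"campaign_hash": None, "referrer": None, "converted": True},
    {"campaign_hash": None, "referrer": None, "converted": False},
    {"campaign_hash": None, "referrer": None, "converted": False},
    {"campaign_hash": "3f9a1c0b7e2d", "referrer": None, "converted": True},
    {"campaign_hash": "3f9a1c0b7e2d", "referrer": None, "converted": False},
    {"campaign_hash": None, "referrer": "news.example.org", "converted": False},
]

full_view = channel_report(sessions)
orders_only_view = channel_report([s for s in sessions if s["converted"]])
print(full_view["direct"])         # {'sessions': 3, 'conversions': 1, 'conversion_rate': 0.3333333333333333}
print(orders_only_view["direct"])  # {'sessions': 1, 'conversions': 1, 'conversion_rate': 1.0}
```

The orders-only view reports a 100% conversion rate for every channel it can see and drops the non-converting referral session entirely, which is why it can’t supply denominators like total direct traffic.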

Eric Dodds  23:19

Yeah, we’ll definitely chat about that.

John Wessel  23:22

So I just thought of something. I think we skipped this in our URL discussion, but since you’re talking about direct traffic: we didn’t talk at all about ad blockers, or other reasons we might be missing attribution. We talked about mangled URLs, but the ad blocker thing comes up a lot, especially if you’re in tech or advertising something to very technical users. So let’s apply it to ID resolution. Any thoughts around that? Shopify isn’t going to be immune to that, however they attribute, and neither are most solutions.

Lew Dawson  24:02

It’s a really good point, and this actually plays into the comment I just made: it’s a little challenging to use multiple sources of data, like Shopify combined with clickstream, because, again, they attribute differently. But to your point, Shopify will capture more data. There’s absolutely a level of data you’re going to lose with clickstream. To your point, with ad blockers, pixels get dropped: they just won’t fire, or they don’t render. Or they fire too late in the page life cycle, as the user is leaving, because developers didn’t know there’s a pixel API that will fire in the background if you use it properly. Things like that. So if you truly want the most accurate picture, yes, you really do need to meld those data sets. You’ll see a subset of Shopify orders that don’t have clickstream events, like Order Completed. And you should generally be able to get the attribution data for those, because that data generally will be in the UTM params, which go to Shopify; if the request goes through Shopify, they’ll be able to capture those. But yeah, it’s a really good point. That’s an even bigger challenge on top. Go ahead.

John Wessel  25:18

Because that’s essentially what we ended up doing. One, we resigned ourselves to the fact that we don’t know how Shopify is doing this; we don’t know what model they’re using. On the other side, to your point, we’ve got more data with RudderStack, especially on non-conversions. So the question was: if there are gaps that Shopify can fill that we don’t have from RudderStack, would we rather it be blank, or would we rather have what Shopify said? And that was pretty clear: even though we don’t know the exact model Shopify is using, we’d rather have Shopify’s data than nothing.
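A sketch of the fallback John describes: prefer clickstream attribution, and fill gaps from the commerce platform’s own record of the landing URL. This assumes the order record keeps the landing URL with its query string (Shopify’s REST Admin API exposes a `landing_site` field on orders) and that the hash sits in `utm_campaign`; both are assumptions to verify against your own data. The order IDs and hashes are hypothetical.

```python
from urllib.parse import parse_qsl, urlsplit


def hash_from_landing_site(landing_site):
    """Pull the campaign hash out of a landing URL or path, if present."""
    if not landing_site:
        return None
    return dict(parse_qsl(urlsplit(landing_site).query)).get("utm_campaign")


def merge_attribution(orders, clickstream_by_order):
    """Prefer clickstream attribution; fall back to the commerce platform's
    landing URL; otherwise mark the order as unattributed."""
    merged = []
    for order in orders:
        campaign_hash = clickstream_by_order.get(order["order_id"])
        source = "clickstream"
        if campaign_hash is None:
            campaign_hash = hash_from_landing_site(order.get("landing_site"))
            source = "shopify_landing_site" if campaign_hash else "unattributed"
        merged.append({"order_id": order["order_id"], "campaign_hash": campaign_hash, "source": source})
    return merged


orders = [
    {"order_id": "1001", "landing_site": "/sale?utm_campaign=3f9a1c0b7e2d"},
    {"order_id": "1002", "landing_site": "/sale?utm_campaign=b71e0c44d2a9"},  # pixel was blocked
    {"order_id": "1003", "landing_site": "/"},
]
clickstream_by_order = {"1001": "3f9a1c0b7e2d"}

for row in merge_attribution(orders, clickstream_by_order):
    print(row)
```

Keeping the `source` column makes it easy to report how much of the attribution comes from each system, which matters because, as Lew notes, the two attribute differently.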

Eric Dodds  25:52

I think, Lew, I so appreciate how, throughout this whole conversation, you’ve returned us to the wonderful reminder that it depends on what metrics you’re trying to produce. I’ll give two examples. One: let’s say you’re a business that doesn’t have a lot of repeat purchasers. People tend to come in and buy one item, and you’re not necessarily building a relationship with them, because it’s highly transactional. There are businesses like that at one end of the spectrum. The other end of the spectrum might be a game, where sessions really matter, because you want to understand what was unique about a session in which someone clicked on an ad and then eventually made a purchase. So the difference in how necessary it is to persist all that stuff throughout the sessions, and to get really tight on multiple sessions over time, is really important. It’s also a lot heavier-duty modeling to do user-level session reporting that includes attribution data, and there are considerations on the front end around persisting that data across sessions. That gets heavy-handed. You don’t necessarily have to do that, but there are situations where it can be extremely helpful, because it reflects the insight you actually want to uncover about your particular business: the context in which someone clicks something or converts.

Lew Dawson  27:23

Yeah. And, you know, one other thing, which is small and which we didn’t really talk about, and we don’t need to highlight it too much: you might have multiple properties that funnel into this data too. You might have a mobile app, a web app, multiple web apps. As you said, there are so many variables. So this goes back to the people part again: before people do all of this, my strong urging is to define what you’re trying to accomplish and how deep you want to go, but most importantly, define the KPIs you’re trying to measure against, and then you can back into the best solution to measure those KPIs.

Eric Dodds  28:09

Yep. And last time we talked about this concept of altitude, which I think is really helpful: determine your cruising altitude before you start barreling down the runway.

John Wessel  28:23

I think that’s the problem with all of this. I could totally picture somebody jumping into this and getting really deep on, “We’re going to solve device stitching between desktop and the mobile app,” and really myopically focusing on that, and missing an end-to-end solution for an attribution engine.

Eric Dodds  28:46

Okay, I don’t want to dig too deep into identity resolution, because we could do a three-hour show just on that, which actually isn’t a bad idea, because it’s a really fascinating topic. That gets back to the customer feature table mentioned earlier, which might be a good segment altogether. So, so that we don’t go down a rabbit hole, because I want to make sure we dig into attribution models (we haven’t even gotten to them, and we’ve got to go there), let’s get to the baseline data set first. Give us just a quick rundown, Lew. We have all these disparate data sets. Not only do we need to join them using the join key, which, at a very high level, means you have a behavioral event tied to a user with a hash value, and you have the hash value in your data from the ad platforms, so you have a join key to pull it all together. But the reason identity resolution is a big deal, and I’d say this is the most immediately apparent reason, is that you have the initial visit from the user that contains the hash, which represents that they clicked on an ad or came from some source. And then the distinct, timestamped behavioral event that represents a conversion is often separate from that: a purchase event, add to cart, subscribe, or whatever that downstream event is. So you need to be able to say this is actually the same user, both to associate the conversion’s value with that campaign and to avoid double counting and all that sort of stuff. The other thing is less apparent, but I would still classify it as identity resolution.
A related identity resolution problem: if you are running a campaign across multiple different platforms, the concept of the campaign usually transcends any single platform. Let’s just say my campaign is Spring Sale 2025, and I want to push that campaign out across multiple different channels. You have to build an identity for that campaign from multiple different data sets, right? That’s one of those things you don’t think about going in. You think, okay, I need to tie these user events together. But in a lot of cases you also have to tie disparate campaign data sets together to create, let’s call it, a campaign entity that includes data from multiple different platforms. And that has to be normalized, because, let’s say, you want to look at what our return on ad spend was across every single platform for the Spring 2025 sale. You have to aggregate that. So that’s my conception. What am I missing? And give us a high-level view of how you begin to approach this, again without taking us down another three-episode rabbit hole, if that’s possible.
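To make the campaign-entity idea concrete, here is a minimal Python sketch. It assumes each platform’s export has already been pulled into lists of dicts and that every row carries the same campaign hash. The field names (`campaign_hash`, `utm_hash`, `cost`, `spend`) and the `FIELD_MAP` are hypothetical, not any platform’s real schema. The point is only that spend gets normalized per campaign hash across platforms before ROAS is computed.

```python
from collections import defaultdict

# Hypothetical exports: each ad platform names its fields differently.
# These field names are illustrative, not the platforms' real API schemas.
google_rows = [{"campaign_hash": "a1b2c3", "cost": 1200.0}]
meta_rows = [{"utm_hash": "a1b2c3", "spend": 800.0}]
# Conversions already joined to a campaign hash via identity resolution.
conversions = [
    {"campaign_hash": "a1b2c3", "revenue": 3000.0},
    {"campaign_hash": "a1b2c3", "revenue": 1000.0},
]

# Map each platform onto one shared shape: (hash field, spend field).
FIELD_MAP = {
    "google_ads": ("campaign_hash", "cost"),
    "meta_ads": ("utm_hash", "spend"),
}


def build_campaign_entities(platform_rows, conversions):
    """Sum spend across platforms and revenue per campaign hash, then compute ROAS."""
    entities = defaultdict(lambda: {"spend": 0.0, "revenue": 0.0, "platforms": set()})
    for platform, rows in platform_rows.items():
        hash_field, spend_field = FIELD_MAP[platform]
        for row in rows:
            entity = entities[row[hash_field]]
            entity["spend"] += row[spend_field]
            entity["platforms"].add(platform)
    for conv in conversions:
        entities[conv["campaign_hash"]]["revenue"] += conv["revenue"]
    for entity in entities.values():
        entity["roas"] = entity["revenue"] / entity["spend"] if entity["spend"] else None
    return dict(entities)


result = build_campaign_entities({"google_ads": google_rows, "meta_ads": meta_rows}, conversions)
print(result["a1b2c3"]["roas"])  # 2.0
```

In a warehouse this would more likely be a SQL model that unions the platform tables into one normalized shape. The dict version just shows the mapping step that makes the cross-platform aggregate possible.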

Lew Dawson  32:01

Yeah, totally. No, it’s totally possible. Great observation. There are a couple of things I’ll clarify here. For a complex user, yes, you’re right, stitching multiple data sets definitely starts becoming a challenge. A more advanced user, like you said, is going to want to see a campaign across multiple platforms, which could span retention, acquisition, engagement, all of those.

Eric Dodds  32:26

Yes. I first come in on the web, and then purchase later on mobile, and all those different paths.

Lew Dawson  32:31

Yeah, challenging. Yeah, exactly. To give a concrete example, you’re going to want to know how many emails you sent in a month, how much ad spend you had on that campaign, etc., right? So at the end of the day, you’re right: you’d want to stitch multiple data sets together, and that is challenging. But I would say for simpler users, this is less of a challenge. And this, again, goes back to, and we will beat a dead horse here, what you are trying to accomplish. For simpler users, I don’t think you necessarily need to stitch together all of those channels. In most cases, it can be mainly orders, clickstream, and possibly, depending on what exactly you’re trying to measure, a couple of ad channels to look at what you’re spending. One other thing I’ll point out: you’re absolutely correct that this problem is one or more identity stitchings. You talked about stitching a user, and in some cases, yes, you’re stitching a user and a session together. But you don’t always have to stitch a user together. It could just be a session. To your point, though, it’s still an identity resolution problem, even for sessions. And then it’s a temporal problem: you’re stitching one to N sessions together over time. So you’re effectively asking, what are all the sessions that point to a single person? That’s your node, the thing you’re resolving. So it definitely is still an identity resolution problem, but it’s a somewhat different one depending on what you’re looking to measure, as we would say. Go ahead.

Eric Dodds  34:21

I was just gonna say, the way you describe that is great, because you have the campaign, let’s say campaign/platform ID, resolution problem. You have the user ID resolution problem. Then you introduce the idea of sessions. You could look at just sessions or just users, but in some cases you may want to look at both, and that’s when things can get really gnarly, because then you’re not only tying sessions to the attribution data and to a conversion, but also tying users to sessions themselves, right? You’re potentially getting into some pretty serious modeling, which, if you zoom out, is why it’s easier said than done to just say, oh, well, tell the marketing team they can switch over to using the data we have in the warehouse, right? Because those tools are doing some really helpful things under the hood. We could argue about the accuracy of that, but the session-level, user-level, campaign-level stuff you get out of the box is very hard to hand-roll.
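One common way to approach the “which sessions point to a single person” question is to treat identifiers seen together as edges in a graph and merge them with a union-find (disjoint-set) structure. The sketch below illustrates that general technique; it is not the method the speakers describe. The event fields (`session_id`, `anonymous_id`, `user_id`) are assumptions, and real identity graphs need extra rules to avoid over-merging (shared devices, for example).

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set structure: identifiers that end up in one set belong to one person."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving keeps trees shallow
            x = self.parent[x]
        return x

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def stitch_sessions(events):
    """Group session IDs that resolve to the same person via shared anonymous or user IDs."""
    uf = UnionFind()
    for e in events:
        uf.union("anon:" + e["anonymous_id"], "session:" + e["session_id"])
        if e["user_id"]:  # a login links the anonymous ID to a known user
            uf.union("user:" + e["user_id"], "anon:" + e["anonymous_id"])
    groups = defaultdict(set)
    for e in events:
        groups[uf.find("session:" + e["session_id"])].add(e["session_id"])
    return list(groups.values())


events = [
    {"session_id": "s1", "anonymous_id": "anon-1", "user_id": None},    # ad click on the web
    {"session_id": "s2", "anonymous_id": "anon-2", "user_id": "u-42"},  # purchase on mobile
    {"session_id": "s3", "anonymous_id": "anon-1", "user_id": "u-42"},  # later web login
    {"session_id": "s4", "anonymous_id": "anon-3", "user_id": None},    # unrelated visitor
]
print(sorted(sorted(g) for g in stitch_sessions(events)))
# [['s1', 's2', 's3'], ['s4']]
```

The web ad-click session (`s1`) only joins the mobile purchase (`s2`) because a later login (`s3`) ties both anonymous IDs back to `u-42`. That is the “first on the web, purchase later on mobile” case Eric mentions.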

Lew Dawson  35:23

I think that’s part of the reason why people a lot of times fall back to the platform for conversions, which I think is okay for a user who’s just starting out. There’s a point in the life cycle of a business where that’s fine, for sure: you broadly care about how much you’re spending and how much you’re converting, and your business is super small. But there’s a point pretty early on where it’s like, okay, I can’t trust the ad platforms anymore, because I don’t know if Facebook is attributing over the last year. We’ll talk more about that in a second with attribution models. But if it’s attributing over the last year, and that user ever came to my site, it’s counting as a conversion, right? You just don’t know. So that’s a very easy pitfall to fall into. You’re like, oh, this is too challenging. We have the idea, but it’s too challenging, so let’s just fall back to the platforms, right? So, go ahead.

Eric Dodds  36:16

Identity resolution is hard. We’ll do a separate episode on that. By the way, amazing job threading the needle and not getting us down a 30-minute rabbit hole there. Okay, I’m so glad we’re here. It only took us two hours to get to the point where we have, let’s call it, a baseline data set for attribution. So we have joined campaign data from a platform with some user-level data and/or some session-level data, and we’ve done the appropriate level of identity resolution across the different areas we talked about, appropriate to our cruising altitude for the metrics we want to produce. So now we have a table, or maybe more accurately a couple of tables, joined to produce different metrics and different reporting for attribution. But now I think we have a bunch of decisions to make, because attribution can mean many different things. I have a ton of questions and thoughts here, and this question is for both of you: where do you start once you have this data set? Of course, it depends on what you want to measure, but you mentioned first and last touch, right? We haven’t even really talked about multi-touch. There’s a machine learning aspect. Okay, so actually, maybe we start here. Lew, can you give us a breakdown of what attribution models are? I know that may sound silly, but especially for listeners who haven’t done a lot of research on this or built a lot of it, what are attribution models? Take us from very basic to the more extreme end of the spectrum in terms of complexity.

Lew Dawson  38:08

Absolutely. To recap real quickly: attribution, at the end of the day, is trying to figure out which channel or channels in my marketing ecosystem contributed to the conversion. In e-commerce, for example, which channels contributed to the sale of a product to a user? You converted them from a prospect to an actual customer. Now let’s set up a scenario where we have campaigns running on multiple different channels. Let’s say we have Google search ads, we also have Facebook ads, and maybe we’re using Klaviyo. Setting the scenario: a user searches for my company’s product and sees a Google ad. Google ads are super prominent these days and somewhat hard not to click, so you unintentionally click on one, right? So now you go to that website, and, as this anonymous user, you’ve come to the website having clicked on the Google ad. And you’re like, ah, crap, and you go back. I didn’t mean to click on an ad. Later, you’re on Facebook, and you see my company’s ad again for the same campaign, which they’re also running on Facebook. And you click on that. Now you’ve come to the website again, but this time, instead of coming from Google Ads, you’ve come from Facebook Ads, and you’re like, okay, actually, maybe this product is cool. I’m gonna buy it, right? And so you do go and buy it. Well, now who gets the credit? That’s the nutshell behind attribution models. There’s a single conversion, but there have been two distinct events on two platforms that contributed to the sale of this product. So who gets the credit? That’s where the various attribution models come in.

Eric Dodds  40:17

Yep. I just wanted to say, oh, of course, marketing gets the credit for that.

John Wessel  40:24

Yeah, that’s cute. But, I mean, think about it: it can get really wild if you have a sales team involved too. If we’re not talking about e-commerce anymore but, say, SaaS, then it’s well, sales talked to them, and marketing did this. You can go wild with an attribution model.

Lew Dawson  40:40

So that goes back to the people problem I alluded to. People are defensive about their KPIs. Everybody wants credit, especially when it’s tied to their budget and their bonus.

Eric Dodds  40:51

So, first touch. What are the basic levels? First and last touch, which are sort of the most basic. Go ahead: can you unpack those in the context of the scenario you just gave?

Lew Dawson  41:08

Yeah, exactly. Last touch is probably one of the most common. In the scenario I laid out, the user first clicked on a Google ad, then, right before the conversion, clicked on a Facebook ad. So in a last-touch paradigm, Facebook Ads would get 100% of the credit for that conversion, for that sale, because that was the last thing the user clicked. Conversely, with first touch, Google Ads was the first thing they clicked on, so it would get 100% of the credit for the conversion.
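Later in the conversation, Lew boils single-touch models down to looking for a min or max timestamp. A minimal Python sketch of that idea, assuming `touches` holds one user’s pre-conversion touches, each with a channel name and a timestamp (both field names are illustrative):

```python
from datetime import datetime

# One user's touches before a single conversion (hypothetical data from the scenario above).
touches = [
    {"channel": "google_ads", "ts": datetime(2025, 3, 1, 10, 0)},
    {"channel": "facebook_ads", "ts": datetime(2025, 3, 3, 18, 30)},
]


def first_touch(touches):
    """All credit to the earliest touch (min timestamp)."""
    return {min(touches, key=lambda t: t["ts"])["channel"]: 1.0}


def last_touch(touches):
    """All credit to the most recent touch (max timestamp)."""
    return {max(touches, key=lambda t: t["ts"])["channel"]: 1.0}


print(first_touch(touches))  # {'google_ads': 1.0}
print(last_touch(touches))   # {'facebook_ads': 1.0}
```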

Eric Dodds  41:47

So just to play that out: when we’re calculating return on ad spend, or ROAS, in last touch, you would basically say, okay, Facebook has a really good ROAS but Google doesn’t, because we are running a last-touch model and Facebook’s getting 100% of the credit.

Lew Dawson  42:05

Yeah, so in that particular scenario, if you were just looking at those two touches for that one user, exactly: Facebook would have 100% and Google Ads would have 0%.

Eric Dodds  42:19

Okay, now multi-touch.

John Wessel  42:21

Here’s a funny question I’ve never heard any stats on. You know how, back in the day, almost everybody had the little “How did you hear about us?” question, right? So what do you think the stats are? Say that user saw Google, saw Facebook, clicked on Facebook, and then we asked, “How’d you hear about us?” Google’s a choice, Facebook’s a choice. And maybe you could be fancy and dynamically populate only those two choices. What do you think the stats are on something like that? Do you think most people are gonna go with Facebook, or will they not know and pick “other”? That’s just laziness. I’m saying you make it dynamic.

Eric Dodds  43:00

Yes, yes, yes, yes. This is maybe a product

John Wessel  43:03

that we’ve just invented here.

Eric Dodds  43:05

This is a good product.

Lew Dawson  43:07

That’s really interesting. I’m willing to bet money it would not be accurate to what actually happened.

John Wessel  43:15

Right. People are notoriously inaccurate about that, even when they’re trying to be accurate. Yeah, totally. Multi-touch.

Lew Dawson  43:22

So linear attribution is probably the most common of the less common models, and with linear attribution, everything that was touched gets equal credit. In this case, Google Ads would receive 50% and Facebook Ads would receive 50%. The thing I’ll add is that, on the surface, this seems like the way to go: oh, well, that’s way better, right? But I had a conversation with Eric a long time ago about this and asked him which one he recommends. I think it was you, Eric, and you said, we don’t recommend linear attribution. I’ll just throw that out up front, because it ultimately leads to infighting among business people.
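Linear attribution in its simplest form, as a Python sketch that assumes the same hypothetical `touches` shape as the first/last-touch example (a channel name and a timestamp per touch):

```python
from collections import defaultdict
from datetime import datetime

touches = [
    {"channel": "google_ads", "ts": datetime(2025, 3, 1, 10, 0)},
    {"channel": "facebook_ads", "ts": datetime(2025, 3, 3, 18, 30)},
]


def linear(touches):
    """Split one conversion's credit equally across every touch that preceded it."""
    if not touches:
        return {}
    share = 1.0 / len(touches)
    credit = defaultdict(float)
    for t in touches:
        credit[t["channel"]] += share  # a channel touched twice collects two shares
    return dict(credit)


print(linear(touches))  # {'google_ads': 0.5, 'facebook_ads': 0.5}
```

This version splits credit per touch, so a channel that appears twice gets a double share. Splitting per distinct channel instead is an equally defensible choice, and it is one more of the decisions that feed the infighting Lew describes.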

Eric Dodds  44:12

Yes, I do remember this conversation.

John Wessel  44:17

Right, as in the other ones don’t lead to infighting. Bold.

Lew Dawson  44:21

Right? They all do, but this one in particular, because people start thinking they don’t get the proper credit, even more than before, and people start fighting over it. And sure enough, I have seen that happen, where even though it seemed good on the surface, at the end of the day it’s not such a good idea. And it’s more complex to calculate too. Go ahead.

Eric Dodds  44:47

Well, I want to get into that, but first a couple of examples I remember from that conversation. Let’s take a B2C and a B2B example. In B2C, let’s say you have a paid search team, a team doing paid social, and an email team, right? Imagine a sequence where the paid search team is getting a bunch of initial clicks, following what you said. Maybe paid social is actually driving sign-ups for the newsletter or for a coupon, and then the lifecycle team, or the email team, is sending messages to this user to stay top of mind, and they eventually click on a link in an email and make a purchase, right? The challenge is that the Google team says, well, they wouldn’t have purchased if they didn’t know about us; we’re creating all this awareness, and we gave them the first brand experience. And the email team says, well, we’re optimizing to the point where they actually convert, and if we weren’t doing that, they wouldn’t make a purchase, right? Both of those things are technically true, but if you have different teams optimizing toward different KPIs within that framework, that’s hard. On the B2B side, it can be tricky too, especially with a sales-supported motion where maybe you’re serving a bunch of ads, maybe you have a free trial in your product experience driven by the product team, but then an SDR reaches out and books the meeting with the salesperson who closes it, right? So it’s the exact same scenario. But let’s talk about calculating.
Like you said, it’s really hard to calculate, so dig into that a little bit for us.

Lew Dawson  46:30

Yeah, it definitely creates a lot more work, it’s a lot easier to get wrong, and it requires a lot more testing to do multi-touch, because, at a high level, if you boil it down, you’re no longer just looking for a min timestamp or a max timestamp, right? Effectively.

Eric Dodds  46:56

That’s such a good way to describe how it gets more complex. Yeah, exactly.

Lew Dawson  47:02

So now you’re looking for a distinct set of attribution points over time, and then you have to aggregate them all together. And remember, there’s a temporal problem, so you’re doing that over time. It really is just a lot more complicated to calculate.

Eric Dodds  47:31

And you introduce a lot of decisions. I hear you talk about that, and you say you have to pull together a sequence of distinct timestamps over some period of time, right? The immediate question that comes to my mind is, what period of time? Is that a day? An hour? A year? Talk through that a little bit, because it’s non-trivial, both in terms of the actual reporting you’re going to produce, and because over longer time periods you could have an immense number of touchpoints, which means large data volumes and all that. So walk us through those questions, Lew, in terms of the established time periods in the ad platforms themselves, which can be initially helpful but generally become problematic pretty quickly.

Lew Dawson  48:34

Yeah. The biggest one, which I believe you were kind enough to set up for me, is the lookback window for a particular model, right? As Eric was alluding to, it’s a time-based problem. So how far do you look back? Say I converted from the Facebook ad at a specific point in time. How far back do I look to attribute? Let’s say, for example, I clicked that Google ad eight days ago, and I obviously clicked the Facebook ad right when I converted. In linear, do you include or exclude that Facebook attribution? You have to answer the question of what your time frame is, because you include it if it’s within the time frame and exclude it if it’s outside the time frame. And I think I said Facebook there, but I meant Google, sorry. My apologies. So that’s the biggest issue: the lookback. Now think about that in terms of linear attribution. You have to figure out all the distinct points of attribution within that time window, and you have to take a snapshot at each conversion: for every conversion, you look back that many days and apply the time window for that conversion. So it becomes computationally pretty complex very quickly.
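The lookback decision is easy to see in code. A minimal Python sketch, assuming one user’s touches and a single conversion timestamp, where the window is `[conversion - lookback, conversion]`. Whether the boundaries are inclusive is itself a choice you would need to make explicitly:

```python
from datetime import datetime, timedelta


def touches_in_window(touches, conversion_ts, lookback_days):
    """Keep only touches inside [conversion_ts - lookback_days, conversion_ts]."""
    window_start = conversion_ts - timedelta(days=lookback_days)
    return [t for t in touches if window_start <= t["ts"] <= conversion_ts]


conversion_ts = datetime(2025, 3, 9, 12, 0)
touches = [
    {"channel": "google_ads", "ts": conversion_ts - timedelta(days=8)},       # clicked eight days ago
    {"channel": "facebook_ads", "ts": conversion_ts - timedelta(minutes=5)},  # clicked just before buying
]

for days in (7, 14):
    kept = touches_in_window(touches, conversion_ts, days)
    print(days, [t["channel"] for t in kept])
# 7 ['facebook_ads']
# 14 ['google_ads', 'facebook_ads']
```

With a 7-day window, linear attribution gives Facebook everything. With 14 days, it splits the credit. Running this per conversion is the “snapshot at each conversion” Lew describes, and that repetition is where the compute cost comes from.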

Eric Dodds  50:06

And so, with the ad platforms, like you said, you can go in and look at conversion data in the ad platforms themselves. Maybe this is a good opportunity to talk through two things: one, there are built-in lookback windows; and two, why would you eventually not want to rely on the conversion data in the ad platform?

Lew Dawson  50:26

Yes, great call. Glad you mentioned that; I haven’t touched on it yet.

Eric Dodds  50:30

I packed a bunch of stuff into that one question, and you had trouble going back and doing attribution on the original first-touch question.

Lew Dawson  50:37

Yes. Most... there are some more common ones: 7, 14, and 30 days are the more common ones I believe I’ve seen. I think 14 or 15 days is usually what I’ve seen most people settle on, so the last couple of weeks. There are benefits and pitfalls to each one of those. So the further back you go... One side note real quickly: this was one of the reasons Google Universal Analytics, so GA3, Google Analytics 360, was terrible at it. By default, it was six months, right? So it was basically covering everything by default. They changed that in GA4.

John Wessel  51:22

Wow, yeah, six months.

Lew Dawson  51:24

I’m pretty sure it was pretty long. I think GA4 went to 30 days, if I remember correctly. So it’s better. Everyone has a different opinion on this, but you could argue there’s a certain point where you should not be attributing a visit from three, six, nine months back to a purchase, right? So you have to make that decision and calculate for it. Those are some of the more common ones.

John Wessel  51:51

And then in e-commerce you can have multiple conversions, right? If you’re set to first touch, and you got a first impression or first click from Google, and then they buy 10 things over six months, you’re just racking up on that one Google impression as far as your return on investment, right?

Lew Dawson  52:11

Right, yeah, that’s a great point.

Eric Dodds  52:15

I mean, actually, it’s an interesting point. You may actually want to have that view. Maybe I’m getting a little ahead here, but think about answering a question like: okay, which channels bring in users who are high-lifetime-value users over a longer period of time? We’re not trying to answer what’s driving the conversion. We’re just saying, okay, when someone first experiences our brand, which channels tend to produce high-lifetime-value users over time? You actually do have to look over a long window for that. That can be a challenge in the ad platform itself, right? If you’re trying to look over a longer period, or even get the lifetime value data, you really have to do that in your own data store. But again, I’m probably jumping the gun on metrics and reporting.
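Eric’s lifetime-value question uses first touch differently. Instead of crediting one conversion, it groups all of a user’s revenue under the channel that first brought them in, over a long window. A minimal sketch, assuming first-touch channels have already been resolved per user in your own data store (the data below is made up):

```python
from collections import defaultdict

# Hypothetical inputs: each user's first-touch channel and every order they placed.
first_touch_channel = {"u1": "google_ads", "u2": "facebook_ads", "u3": "google_ads"}
orders = [("u1", 50.0), ("u1", 70.0), ("u2", 40.0), ("u3", 30.0)]


def avg_ltv_by_first_touch(first_touch_channel, orders):
    """Average lifetime revenue per user, grouped by the channel that acquired them."""
    revenue_per_user = defaultdict(float)
    for user_id, amount in orders:
        revenue_per_user[user_id] += amount
    totals, counts = defaultdict(float), defaultdict(int)
    for user_id, channel in first_touch_channel.items():
        totals[channel] += revenue_per_user[user_id]  # users with no orders count as 0
        counts[channel] += 1
    return {channel: totals[channel] / counts[channel] for channel in totals}


print(avg_ltv_by_first_touch(first_touch_channel, orders))
# {'google_ads': 75.0, 'facebook_ads': 40.0}
```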

Lew Dawson  53:08

Yeah, for sure. I’m glad you brought that up, John. I didn’t even highlight that one. That’s actually another decision point right there: you have the option to include or exclude attribution once a conversion has occurred, right? That’s yet another decision point.

Eric Dodds  53:25

Deciding what the distinct timestamps are, right?

Lew Dawson  53:29

Right, yeah. So, to John’s point again, if I convert and then buy another product a day or two later, and I technically have no new touches in there, do my old attribution points count if they’re still within the window? Am I still attributing that to the Google ads and the Facebook ads? Or, once I get a conversion, would the next one count as direct, because there was nothing new in there? You also have to make that decision when writing your model. The last point, real quickly: I’ve generally seen it where, once a user converts, you don’t attribute past touches to later conversions again, but you can, right? It depends on the business. But go ahead, John.
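The “reset after a conversion” choice Lew describes can be sketched like this, assuming one user’s touches and conversion timestamps are already stitched together. Crediting each touch only once, and labeling a conversion with no new touches as `direct`, is just one of the options he mentions. The 14-day default is illustrative.

```python
from datetime import datetime, timedelta


def touches_per_conversion(touches, conversions, lookback_days=14):
    """For each conversion, return the channels touched inside the lookback window
    and after the user's previous conversion; no eligible touch means 'direct'."""
    ordered = sorted(touches, key=lambda t: t["ts"])
    results = []
    previous = None
    for conv_ts in sorted(conversions):
        window_start = conv_ts - timedelta(days=lookback_days)
        eligible = [
            t["channel"]
            for t in ordered
            if window_start <= t["ts"] <= conv_ts
            and (previous is None or t["ts"] > previous)  # reset after each conversion
        ]
        results.append((conv_ts, eligible or ["direct"]))
        previous = conv_ts
    return results


base = datetime(2025, 3, 1)
touches = [
    {"channel": "google_ads", "ts": base},
    {"channel": "facebook_ads", "ts": base + timedelta(days=2)},
]
conversions = [base + timedelta(days=2, hours=2), base + timedelta(days=4)]
for conv_ts, channels in touches_per_conversion(touches, conversions):
    print(conv_ts.date(), channels)
# 2025-03-03 ['google_ads', 'facebook_ads']
# 2025-03-05 ['direct']
```

Dropping the `previous` condition gives the other behavior Lew mentions, where old touches still inside the window keep collecting credit for repeat purchases.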

John Wessel  54:10

Yeah, that’s super interesting, because I was thinking by channel. I guess I’m just wondering out loud: have either of you seen any robust studies on multi-touch attribution, where somebody’s actually trying to study consumer behavior and understand, per channel or per time frame, what actually makes more of a difference, versus in aggregate? I just don’t know if there are any models out there that claim, we studied consumer behavior, and this model is more accurate because of that.

Lew Dawson  54:49

I’ve... I don’t know. I don’t know, but I feel like RudderStack is in a pretty good position to study that, if they can work with enough of their customers to look at that data. You could start figuring that out. Get 20, 30, 50, 100 customers on board to study it; that’d be interesting.

Eric Dodds  55:07

Yeah, that is really interesting. We’re just coming up with product ideas; that’s number two here. I will say it also gets interesting because, generally, if it’s worth it for a company to understand that at a fairly detailed level, they tend to be a larger company with a lot of channels. And then you introduce a host of other challenges around things like television advertising, right? You start layering in those components, and the situation gets even more complex.

John Wessel  55:38

Well, and then at that point, from a consumer behavior standpoint, do you care? Or do you just go into ML and AI stuff?

Eric Dodds  55:43

Yes, okay, that’s a great segue.

Lew Dawson  55:48

Or, actually, before you go off that, real quickly: that’s a good point. Now you’re getting into online versus offline attribution, which would be the official term. And you’re right, there’s marketing mix modeling, which tries to model some of that, right? That’s a whole other paradigm companies potentially try to get into, too, if they do print ads, or try to keep track of customer walk-ins at their physical stores. That adds a whole other layer of complexity to the whole paradigm, for sure.

Eric Dodds  56:23

So we talked about linear multi-touch attribution. Let’s quickly talk about weighted multi-touch, and then dig into machine learning and more probabilistic approaches.

Lew Dawson  56:42

Yeah, so with weighted, what I’ve generally seen is that you weight the more recent touches higher, giving them a higher percentage. The last click wouldn’t get 100%, but it would get a larger, higher-weighted percentage than the earlier ones. So, going back to our example, Facebook Ads would potentially get a higher weighted percentage than Google Ads. And that becomes challenging once again if you have two, three, four, five, six different channels or campaigns, right? People will get angry if they were earlier in the cycle but get less credit. So that just highlights one of the challenges of some of these more exotic, shall we say, calculations, in addition to the fact that it’s even more complex to calculate, because now, how do you choose the percentage for each one over time, right? You have to come up with some sort of mathematical model, or buy one.
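Lew leaves the weighting scheme open (“come up with some sort of mathematical model, or buy one”). One widely used option is exponential time decay. The sketch below assumes a 7-day half-life purely for illustration. Picking the half-life is exactly the kind of decision he’s describing.

```python
from datetime import datetime, timedelta


def time_decay(touches, conversion_ts, half_life_days=7.0):
    """Weight each touch by 0.5 ** (age / half-life), then normalize so credit sums to 1."""
    weights = {}
    for t in touches:
        age_days = (conversion_ts - t["ts"]).total_seconds() / 86400
        weight = 0.5 ** (age_days / half_life_days)
        weights[t["channel"]] = weights.get(t["channel"], 0.0) + weight
    total = sum(weights.values())
    return {channel: w / total for channel, w in weights.items()} if total else {}


conversion_ts = datetime(2025, 3, 9, 12, 0)
touches = [
    {"channel": "google_ads", "ts": conversion_ts - timedelta(days=8)},
    {"channel": "facebook_ads", "ts": conversion_ts - timedelta(minutes=5)},
]
credit = time_decay(touches, conversion_ts)
print({channel: round(share, 2) for channel, share in credit.items()})
# {'google_ads': 0.31, 'facebook_ads': 0.69}
```

The last click no longer gets 100%, but it gets the larger share, which matches the behavior Lew describes. A shorter half-life pushes the result toward last touch, and a very long one pushes it toward linear.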

Eric Dodds  57:51

Okay, one brief side note: I did think of another URL tip. I guess it’s not necessarily a tip, but we talked about the ad platforms because that’s the most straightforward use case on the surface: you have to put a URL and the parameters into the ad platform so that you can track when someone clicks an ad. But there’s also a huge benefit to being disciplined about doing that on all of your own channels, right? Two main ones are email and SMS, where you’re sending a message to a user through your own platform. A lot of those tools have some level of attribution built in, but if you want to do multi-touch attribution or explore machine learning, having the same join key makes things way easier. Another big one that’s so easy to miss is in-app stuff, where you may consider an experience or an experiment a touchpoint and include it as well. It’s not just a push notification going out from the app itself; maybe it’s a section of the app that’s promotional, or whatever that is. Ubiquitous tagging, I guess, would be the concept there.

Lew Dawson  59:07

Yeah, that’s a really good point. I didn’t touch on that at all. Fantastic point. When I was talking earlier, I mentioned making the hash unique at the ad campaign and ad set level. I guess I briefly touched on it with the campaign level, right? Pretty much. Say I have a unified campaign, like new product X, that I want to advertise across both retention (email, SMS, etc.) and new customer acquisition for prospects. You’re right: you might want to track that as a single identifier across multiple channels and join on it later.
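A minimal sketch of “ubiquitous tagging” with one identifier per campaign. It assumes the hash is derived from a static campaign key, in the spirit of the static-value hashing discussed earlier in the episode, and carried in a query parameter. The parameter name `cid`, the key, and the truncated SHA-256 are illustrative choices, not a standard. They only need to be identical everywhere the link goes out: ads, email, SMS, in-app, affiliate, or a QR code.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def campaign_hash(campaign_key: str, length: int = 12) -> str:
    """Stable short hash of a static campaign key: same input, same hash, on every channel."""
    return hashlib.sha256(campaign_key.encode("utf-8")).hexdigest()[:length]


def tag_url(url: str, campaign_key: str, param: str = "cid") -> str:
    """Add (or overwrite) one query parameter carrying the campaign hash, keeping the others."""
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != param]
    query.append((param, campaign_hash(campaign_key)))
    return urlunsplit(parts._replace(query=urlencode(query)))


key = "new-product-x"  # hypothetical static campaign key shared by every channel
email_link = tag_url("https://example.com/launch?utm_source=newsletter", key)
sms_link = tag_url("https://example.com/launch", key)
print(email_link)
print(sms_link)
```

Both links end up with the same `cid` value, so clicks from email and SMS join to one campaign entity later, while channel-specific parameters such as `utm_source` are preserved.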

Eric Dodds  59:50

Yep. Affiliate can be helpful as well, right? Because, again, it goes back to thinking about a campaign as abstracted across channels, as channel-agnostic. Having the hash join key is really helpful, but it’s easy to forget and it takes a lot of discipline. If you are disciplined about it, though, it can be really good.

John Wessel  1:00:13

And the challenge, too, is that in a smaller scenario, like Lew was saying, you probably just start with the platforms. But then you get to a larger scenario, and you have more teams. Now you’re trying to coordinate this across teams. You’re not just standardizing one team, like your email team; you’re standardizing a bunch of teams to all do it the same way, and that itself is a challenge.

Eric Dodds  1:00:35

I’ll tell you one thing I’ve done in the past, and this is probably a good insight into me as a person, and probably into both of you as well, because I know you both pretty well. Events are actually pretty tricky, because an event is something that happens at a distinct point in time, but it’s essentially manual data, even if you digitally scan someone’s badge, or they put their name into an iPad, or whatever it is. I’ve actually generated synthetic events to send into the data store, with a tagged link with a hash, because it’s so much easier to represent that as a timestamped event, right? Think about the multi-touch attribution we just talked about: they click on an ad, maybe they get an email, maybe they come to an event, right? So synthetic events can be really useful for representing things that are hard to timestamp, or offline data that doesn’t come in a format that’s easy to timestamp.
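Eric’s synthetic-event trick might look something like this, assuming badge scans arrive as rows with an email and a scan time. The event name, property names, and the truncated SHA-256 of a campaign key are all hypothetical. What matters is that the offline touch gets a timestamp and the same join key your digital links carry, so it can sit in the same multi-touch sequence as ad clicks and emails.

```python
import hashlib
from datetime import datetime, timezone


def synthetic_events(badge_scans, campaign_key, event_name="offline_event_attended"):
    """Turn offline badge scans into timestamped events that share the campaign's join key."""
    join_key = hashlib.sha256(campaign_key.encode("utf-8")).hexdigest()[:12]
    return [
        {
            "event": event_name,
            "email": scan["email"].strip().lower(),  # normalize so it can match the identity graph
            "timestamp": scan["scanned_at"].isoformat(),
            "properties": {"campaign_hash": join_key, "channel": "in_person_event"},
        }
        for scan in badge_scans
    ]


scans = [
    {"email": " Jane.Doe@Example.com ", "scanned_at": datetime(2025, 4, 10, 15, 30, tzinfo=timezone.utc)},
]
for event in synthetic_events(scans, "spring-sale-2025"):
    print(event["event"], event["email"], event["timestamp"])
# offline_event_attended jane.doe@example.com 2025-04-10T15:30:00+00:00
```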

John Wessel  1:01:47

That’s the beauty of QR codes that everybody discovered in 2020, right?

Eric Dodds  1:01:51

That is true. I mean, it’s so funny. Yeah, QR codes.

John Wessel  1:01:55

QR codes can also have hashes and URL parameters added to them.

Eric Dodds  1:02:00

The Data Stack Show is brought to you by RudderStack, the warehouse-native customer data platform. RudderStack is purpose-built to help data teams turn customer data into competitive advantage. Learn more at rudderstack.com.