45: Open Source and Attribution with Ophir Prusak of Codesmith

On The Data Stack Show this week, Eric and Kostas are joined by Ophir Prusak, head of growth at startup Codesmith, a boot camp for software engineers. For nearly a decade, Ophir has been the initial marketing hire for a number of startups and has a variety of experience working with open source tools. He contributes his thoughts on the open source ecosystem and offers perspective on data attribution and the resulting conversations between engineers and marketing.

Highlights from today’s conversation include:

  • Ophir’s decision to switch from software engineering to marketing and riding the startup train (2:39)
  • Open sourcing in the world of software (5:55)
  • How open source has changed Ophir’s life as a marketeer working at startups (10:28)
  • Chartio’s sunsetting drove Ophir to search for a data tooling replacement (27:27)
  • Discussing trends in adoption of tools for small scale and large scale companies (35:01)
  • Data challenges related to attribution: how wrong do you want to be? (44:07)

The Data Stack Show is a weekly podcast powered by RudderStack. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Transcription

Eric Dodds  00:06

Welcome to The Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You’ll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, the CDP for developers. You can learn more at RudderStack.com. Welcome back to the show. We have another interesting guest this week, Ophir Prusak. And he started his career as an engineer, but has worked in scrappy marketing roles, doing data-driven growth at early stage startups. And I love that role, because I have played it myself. And I know how much you have to really have a hybrid approach, both in terms of actually doing the tactics of marketing, and getting very technical with the data. So I think he’ll have a really great perspective. I think the thing that I really want to talk to Ophir about is that he has a lot of experience with attribution. So if we have time in the show, I would like him to give us just a little primer on marketing attribution, because I think about working with engineering and just the discussions around attribution, and so many things that I know now that I wish I could have articulated to the engineering teams I was collaborating with. So that is my burning question. Kostas, what do you want to talk to Ophir about?

Kostas Pardalis  01:32

I know that Ophir has done a lot of research around data related tools. And it would be great to hear about this experience, and the reason that he did that, especially because he’s a marketer, right? Like his background is in marketing. Of course, he has started as an engineer. But I really want to see the perspective of someone who is not, let’s say, involved in maintaining, or having to set up and all that stuff like the actual software, but he’s the main user. What’s his experience around that? And how does he see the landscape today, and what changes have been happening these past couple of years?

Eric Dodds  02:16

Great. Well, let’s dive in and talk to Ophir.

Kostas Pardalis  02:19

Well, let’s do it.

Eric Dodds  02:21

Ophir, thank you so much for joining us on the show. In prep, we had way too many topics to cover that were interesting. So why don’t we just start at the beginning, where we always start. I’d love a brief background on you, your history, working with data, and then what you’re up to today in your day job?

Ophir Prusak  02:39

Sure. Thanks a lot, Eric. So I got into computers at a relatively young age; I got a computer when I was 13. I studied computer science in college, and started my career as a software engineer. I worked for a few years, and then I made the interesting switch to marketing back in 2005. I simply wanted something more creative. So I’ve always considered myself kind of more of a technical-oriented marketer and looked at marketing problems from more of an analytical perspective. And then around nine years ago, I joined a small startup as head of marketing, and really, I’ve been riding the startup train ever since. So for the past, I want to say nine years, I’ve really been working as kind of head of marketing or initial marketing hire at a lot of startups. And as part of that process, it’s always been about, well, how do we make sure that we have in place the infrastructure to be able to track what’s going on, measure what’s going on, and ultimately become data-driven. Especially for me as a computer science person, it’s always important to be data-driven. And along the way, I’ve just had to solve so many problems about, how do I make this organization data-driven? It’s obviously different for different types of companies, and learning the different options in terms of tools out there. Currently, I am head of growth at Codesmith, which is a software engineering boot camp. And what’s led me more recently to really get more into this is that we have been using a tool called Chartio, which is going to be sunset in a few months. So while looking for a replacement, I simply found there are just so many new tools and options out there. And it’s a little overwhelming. So I created the website DataToolReview.com, and it’s just been kind of a fun ride.

Eric Dodds  04:32

Very, very cool. There’s so much to talk about, and I cannot wait to talk about your search for data tooling just because it is a crazy world. And I think hearing your perspective as someone who’s recently gone through an evaluation process, it will be really cool. One thing I want to hit before that though, is something we chatted about before we hit record and that is a trend that you’ve seen around the progression of software being open-sourced in different parts of the toolset. And I think your perspective’s really interesting, as a marketer doing growth at early stage startups, especially in technical marketing, or some people might call it growth hacking, etc. It’s all about being scrappy, and using tools to get the data in and have visibility into what your experiments are doing. And you sort of have to play the role of data engineer and marketing ops and the one leading marketing, because you have to be small and scrappy. And I just think that’s such a wonderful experience. But getting back to the topic of open sourcing in the world of software, tell us about the trend. I think this is such an interesting conversation.

Ophir Prusak  05:55

Sure. So, I am thinking back to 2005, actually, when I got into marketing, and at the time, the only open source tools that I remember were definitely kind of for developers. And it’s always been, I feel, the case where developers help each other, and it’s really the more people using the same tool, the better it is for everyone. But on the marketing side, and even more so on the sales side, I want to say it’s somewhat of a zero sum game, as I like to call it, where if one person is getting a sale, that means another company isn’t making the sale. So there’s always been, I feel, more competition, and a lot less sharing of, well, we should all use the same tool. But there has been more of a looking for the differentiators and kind of what’s different and what I can do to get ahead as opposed to the other person. But what we’ve seen, or what I think I’ve seen, at least over the years is more and more tools, which ultimately are becoming open source. And I think the natural progression of purely developer tools to kind of analyst tools or data stack tools makes a lot of sense for two reasons. First of all, I think more people want to help each other in terms of having great options. And I think looking back, I remember Redash and Metabase being two of the first tools that I remember being open source, which were actually really good in terms of solutions. And at one company I even used Redash for a bit. It’s great that it’s open source, and anybody can use it. And what I think we’ve seen is that over time, there’s kind of a shift, more and more into the world of let’s say, the marketing or I don’t know any sales tools that are right now open source, but really, there’s a shift more of let’s try to help each other. 
And the other aspect, I think also is in terms of purely a distribution model, from a sales perspective, if I’m going up against some established players, in terms of whatever software it is I’m trying to sell, having an open source solution allows, I think, a way to get into the market much easier than just being another competitor.

Ophir Prusak  08:05

So in some ways, if there’s let’s say, maybe look at Looker as an example, which I think is an example of a very established closed source tool. And there is a “Looker open source alternative” I know that came out recently, again, in order to kind of gain more momentum, and it just makes so much sense from the people trying to create these new tools both from … it’s a win win for the developers or for the analysts, and also for companies trying to kind of get into the market, it’s just a great way for people to break in. They can start small. Usually, it’s a service, which you can pay a little and there’s a cloud version. Once you get a little bigger, you can use the open source version. And once you get really big, you’ll usually use the enterprise version of it. So it just makes sense to me that there’s this kind of movement. And I think over time, we’re going to see more and more tools having the open source model, because it’s kind of one of those win-win situations. It just makes so much more sense.

Kostas Pardalis  08:58

At the start of the conversation, you mentioned Chartio and that the tool is like right now in like a period where it’s going to sunset pretty soon. So I think that another benefit of having something open source out there is also what happens when the team behind or the company behind the product decides to take a different course, right? Companies have the opportunity to use the open source self-hosted and even keep maintaining it. There are a couple of cases like this, especially in the database space, because databases are like pretty hard pieces of software to build. But a very good example of this is like RethinkDB, for example. That’s exactly what happened. The company decided that they cannot move forward. They open sourced the code and the community decided to continue supporting it. CouchDB, it’s the same thing. So that’s another thing and I think Eric, you have also seen especially large enterprises, which sounds a little bit controversial, to actually be interested in a company that has an open source project because they know that they can pay for the product and there will always be continuity in the product if something goes wrong with the company. So I think that’s another important reason why open source is important. But Ophir, you as a marketeer, and you’ve been in this space for quite a while now, how do you feel that your life as a marketeer has changed because of open source?

Ophir Prusak  10:28

That’s an excellent question. So, I want to say as a marketeer, or I should say as a marketeer who has been working at startups. So for me, it’s not just a question of being a marketeer, but also having to be scrappy, and find solutions where either me or me and kind of one other developer can really, you know, lift this off the ground, I want to say, a lot of the more advanced tools really have two options, you can either pay for an enterprise tool, and I’ve seen a lot of tools, which do some really cool things. But the entry level for paying for them is, you know, its annual contracts only starting at 10k a year and above, which is just out of the budget for a lot of the smaller startups. So your option is either to go with a very simplistic tool, there’s a lot of things which might cost you, say, $99 a month or something. But again, you’re limited. And so, in many ways, going down the open source route is kind of your only solution of getting a more flexible or more mature product without having to pay an arm and a leg. And because it’s also open source, having a model where there is a cloud version, which is relatively inexpensive, makes a lot of sense. They’re not going to charge a ton of money if there’s an open source version, because simply they can’t. So I do think it’s definitely and I think Redash and Metabase are two tools that I’ve used in the past, which are good examples of this. I know even though Redash was acquired by Databricks, it’s still a great solution where you can kind of start easy, you can play around with it yourself. And also, as you’d mentioned, because there’s none of that worry about, well, what happens if they raise their prices or something, I can always use the open source version. So I definitely think it’s helped me a lot in terms of seeing the different things out there. And definitely for me, I do see more and more over time, seeing open source, especially I want to say open source projects, which have a cloud version. 
I think another good example is Preset and Superset. Apache Superset, you know, that’s an open source project. Preset, which is a relatively new service, is basically Superset as a service. But it gives you that “I want to upgrade” path: you can play around with it, starting with relative ease, to see if it works for you. And then if you do grow, you can always have the choice either to pay a little more, or to host it yourself. So I think that’s another thing in terms of new tools being out there, that you always have the option. You simply have a lot more options, I think, when you go down the open source route.

Kostas Pardalis  12:59

Yeah, it makes sense. I think you put it very well. I think one of the main values of open source in general is choice, either as you described it, or as the choice as a developer from the other side, right, to have access to the source code and even extend it if you want. Like what a big part of the value that open source brings is this kind of flexibility and the choice and options. Eric, you as a marketeer also, do you remember what was the first open source project that you used?

Eric Dodds  13:34

That’s a great question. Let me see here, any sort of technology would probably be WordPress, but that’s one of the most pervasive open source projects in the world and drives a huge amount of the internet. In the data space actually, it had to have been Analytics.js from Segment way back in the day, when they open sourced it and were doing data integrations. And they obviously built a very large successful company in a really short amount of time. But there was certainly a period there where Analytics.js was a very interesting, useful technical tool for data integrations back when they were Segment.io.

Kostas Pardalis  14:22

That’s interesting. So it was a data related product, which I think makes sense. I think, in general marketing, let’s say, it’s somehow the function that drives a lot of innovation or it’s like let’s say the first function to adopt and try things, right, like even with new things like reverse ETL. I would say that the most common use case around it is marketing and then might be like sales ops, right? So that’s something very beneficial that marketing is bringing to the industry and I don’t think we recognize it enough.

Ophir Prusak  14:56

If I may, Kostas, I do want to add a little point to what you’re saying. Going back to what I was saying beforehand, I think marketing has to try new things. I think the whole point of marketing is that it’s so crowded today that if you’re just doing what everybody else is doing, you’re not going to get awesome results. And I think the only way is to try something new. And very often to try something new, you do need those new technologies. Which is the opposite, I want to say, for developers, where the more people using a technology, the more stable it is, the less risk it is. So definitely marketing is always going to be at the forefront of let’s try something new, because it’s different. And definitely, whatever new tools are out there we’re gonna try.

Kostas Pardalis  15:39

Yeah, absolutely. But Ophir, I have a question. We are talking about open source. And we’re talking about marketing being like in the, let’s say, the forefront of like trying new things and all that stuff. Are there any marketing platforms that are open source right now? Does this thing even exist?

Ophir Prusak  16:03

The short answer is, yes, it does exist. But the fact that I don’t remember the name of the platform just goes to show you. Unfortunately, I think for an open source project to really succeed, you need a lot of people whose day to day job depends on that technology, who are themselves highly technical. So if you’re a developer, then yeah, it’s fine if I do some of my development on this open source project. If I am a data analyst, but I know Python, or I know software development, then sure, if I work on that project, that’s fine. If I’m a marketer, then I’m not going to be contributing to that project. So I think that’s the big problem with that, and I forget the name of it, because I looked into it, and it just was not very mature, and it didn’t seem to really be taking off. It’s because of exactly that problem. So I don’t believe we’ll ever see a truly open source solution which is going to replace a HubSpot or a Salesforce, simply because there’s just not enough people whose day to day job is working on that.

Eric Dodds  17:08

Yeah, I think Pimcore is one of the big ones that comes to mind. And I know that off the top of my head, just because I recently did some research on open source, like marketing and sales type SaaS tools. And there are a handful of other ones out there. But I think you’re right, Ophir, one way that you frame this, and I really liked the way that you described the arc of what’s happened is, a lot of open source stuff was initially tooling for developers, right? So you think about things like Git and the whole ecosystem around that. And it was really developers working together to figure out, okay, we’re all doing very similar things here. How can we create some sort of standard and the frameworks and then tools that result from that, that benefit everyone and make everyone’s job easier, so that we can, you know, focus on stuff that matters more. And then you start to see that happening in the data space. And I think it makes a lot of sense in the data space, because by nature, data requires a lot of integrations, there’s a big ecosystem around data, which I think lends itself to open source. And then also, especially when it comes to analytics, everyone’s reporting is a little bit different, depending on the business, but people are trying to build the same fundamental reports to understand how their business is operating. And I think those conditions create a healthy environment for open source. Whereas if you think about an email marketing tool, it just doesn’t seem to me that there’s ever going to be an open source tool that achieves the level of commercial success that Salesforce or Marketo, or other sort of traditional marketing and sales SaaS tools have. But what do you think, Ophir? I mean, you seem to think that that’s never gonna happen?

Ophir Prusak  18:57

Yeah, I think this kind of goes back to what I was saying, where, in certain fields, differentiation is a competitive advantage, which I think sales and marketing are those fields, while in something like software development, or even your data stack, I don’t believe that’s a competitive advantage, in terms of the way the people working on it actually see it. From their perspective, the more people working on it, the better. So I think that’s why we’re not going to see something like Salesforce … let me rephrase that, there is SugarCRM, and there have been people who have tried this, but I don’t think it’ll ever be able to compete with kind of the elephants of the industry, the way that other tools have been able to compete.

Eric Dodds  19:41

Yeah, it’s really interesting. And I think your point about marketers and salespeople not being able to contribute back to the core product, I think, is a really defining characteristic there. Okay, one more question on open source. And I’m going to direct this towards both you, Ophir, and Kostas. Kostas, I think you have some strong opinions here about some patterns that we’ve seen, but when you have an open source tool that also commercializes, at scale, it can kind of become controversial. So a couple of examples that come to mind of late: you have Elasticsearch, and then MongoDB changed their licensing. Both products have huge adoption, but both also experienced some turbulent times trying to navigate being a large commercial entity, or connected to a large commercial entity, and also open source. So it makes total sense at the small end, like you said, especially in the startup world, but it’s not always an easy path to navigate when you’re at scale. So what say you, Ophir and Kostas, on the challenges of being open source as a huge commercial enterprise?

Ophir Prusak  20:53

Yeah, that’s a great question. And coming more from just the startup world, I want to say of marketing and sales and product development in general, I personally do think having, and it depends I want to say to your point, the whole MongoDB and changing of different models, I do believe in having what I want to call a “core product”, which is free and will always be free, but differentiating, I want to say, on the functionality side. I mean, I think that’s totally legit. I mean, I think it makes sense. Will there be problems sometimes? Sure. But if the core product is always free, and you can do whatever you want with it, and then adding on top of it, and this is what I almost always see the case is, things like Single Sign On, or granular access levels to who can do what, as well as scalability to some extent, that makes sense. Ultimately, the people who are working on the project, they do want to be able to make money. And I think it’s fine, as long as the core product is still open source. And I do feel it’s kind of the best of both worlds. Nobody’s forcing you to use the kind of enterprise version; you could figure things out yourself. I’ve even seen some open source projects which actually tried to replicate some of the stuff that kind of the enterprises are doing, or kind of the enterprise version. So I still think it’s a win win, to have open source even for enterprises and for companies to ultimately take funding and try to contribute to the product.

Kostas Pardalis  22:24

Yeah, I agree. Eric, it’s a bit of a complex situation, to be honest.

Eric Dodds  22:32

Yeah, of course. That’s why I asked you and Ophir.

Kostas Pardalis  22:38

Yeah, I mean, it heavily depends on the product itself, and also on the monetization path the company wants to have, right, like, if you think for example, about both the cases that you mentioned, Elastic and MongoDB. The problem these two companies have is not that you or me, we will start like a company, and we use MongoDB as the back end of our system, right? And use the free one. That wasn’t like the problem that they had. The problem they had was that Amazon could come and be like, okay, now I’m giving Elasticsearch as a service, right. And that creates a conflict with the business model that Elastic has. So it’s a bit like, let’s say, a game in the battle between, like, the big companies and the bigger companies in a way. I don’t think that like any startup right now starting and open sourcing something, they are going to have to face that problem. On the other hand, we have cases of companies like Databricks, for example, or Confluent, that both of them have open source projects. And the core project is open source, they are offered as a service from big cloud providers, but at the same time, they also want to be successful, right? Databricks is super successful. Okay, they haven’t gone public yet, but they are on track to do that pretty soon. And they are one of the main rivals of companies like Snowflake. Confluent just IPO’d. So it depends. I mean, I don’t think at the end that these kinds of behaviors that the big cloud providers might have are going to hurt the companies that much. Maybe they would like to change their business model a little bit or their licenses, which is fair. I mean, it’s not going to hurt you that you’re going to use the product for your back end at the end.

Kostas Pardalis  24:31

So yeah, I think that things are a little bit better than we tend to think about that. And don’t forget that open source is literally like the core of the internet and all this. It’s that revolution, right, like Linux is open source. Like without that we wouldn’t have servers. Yeah, right. Yeah, nothing. I just remembered, okay, it’s a little bit irrelevant, but I find it funny. So, Linus Torvalds, okay, the guy who started Linux, right? He’s famous for being very aggressive, and almost, let’s say a little bit abusive towards developers. He’s very opinionated and very protective of his child. And I was reading lately that at some point, he decided to go and do therapy for that reason. And now he has changed his mind completely and tries to have more empathy. By the way, the guy’s the inventor of Linux, and the main maintainer of the kernel of Linux, and also of Git, right? I mean, we’re talking about a person who has contributed like, a lot using open source.

Eric Dodds  25:47

That is unbelievable.

Ophir Prusak  25:48

I want to add one other thing. Kostas, you raise an excellent example of kind of different licensing. In a lot of different worlds, and even in the software world to some extent, when I was working at different SaaS companies, we had a different pricing model for if you use a product in house, or you’re an agency, and you were basically reselling it in some way. The same thing is applicable for media rights, if you’re in the media world, if you’re selling a picture, it’s different pricing, where if you use it yourself, or you’re reselling it, so I think it’s totally fair to say, hey, just because it’s open source doesn’t mean you can do anything you want with it. And all companies are equal in that sense. So I think it’s fair to say, if I’m creating technology and using it in house, then the rules apply to you, but if you want to actually resell it, then it’s different rules. And I think that’s totally legit.

Eric Dodds  26:38

I was gonna say, Kostas and Ophir, that was a very, I think, thoughtful and balanced response to a complex question, maybe a little bit more so than the most impassioned commenters on Hacker News when some sort of open source news, like Elastic or Mongo, hits the press.

Eric Dodds  27:03

Okay. And one thing I did want to mention, there’s a really interesting site out there called OpenSource.builders, and you can go see alternatives to tons of different types of tools from CRMs, and email tools, analytics, and you name it, that came to mind Ophir, when you were talking about open source email marketing tools.

Eric Dodds  27:27

Let’s switch gears a little bit, Ophir, you recently, with the announcement about Chartio sunsetting, went on a search for data tooling to sort of rebuild your go-to stack and ultimately concluded that it’s kind of complex, and you ended up putting a website together that collected a lot of your findings. Can you tell us what were the requirements around your search? And then you have a very fresh set of eyes, looking at all sorts of components of the data stack from pipelining type solutions all the way through to BI solutions, and would just love to know, what did you learn in that process?

Ophir Prusak  28:06

Yeah, no, thanks a lot, Eric. So yeah, I’ve been a user of Chartio since literally they launched and have just been a huge fan of the product, not to mention seeing how it grew over the years. And I think for me, it really solved quite a few different parts of the puzzle that I needed. Its ability to pull data from different places, and blending the data or federated queries as it’s called. And being also very easily within kind of a nice GUI interface to be able to really kind of do the key part of ETL not through queries per se, but ultimately give you a really nice solution. And it’s funny, because I’m a SQL guy, and I love writing SQL, but what I found was actually having kind of a GUI front end to do the SQL manipulation, just giving me the ability to make modifications really quickly and easily. And even though Chartio is a closed tool, I still was able to do probably 80% of what I needed to do, like, as long as the data was in some SQL database. So for me, it was in terms of stuff I needed to do, first of all, to make sure the data is in the SQL database. Chartio does also pull from Google Analytics in terms of one of the very few kinds of non-SQL based as well as CSV files and Google Sheets. And so the only other part I really needed to take care of was to get data into a SQL database. We use HubSpot. So for us, we were using Segment–and I played around, there’s a lot of tools which do what I was using Segment for though just to pull the data into a single database–but really Chartio gave me the ability to do most of what I need to do. And I do want to bring up one other thing that I found is that within the realm of BI tools, and Chartio is definitely in that realm, I find there are two different types of problems that people are looking to solve. One is, I want to say the kind of the day-to-day, week-to-week reporting, and that is, I just need to see, okay, how many people have signed up to my service, kind of where are they in the pipeline? 
What’s happening with all the internal database I have, and it’s really kind of just to get an idea of what’s going on and still be able to slice and dice the data to some extent, I want to do in certain segments or certain timeframes, but it is about that kind of ongoing reporting. And then there’s the discovery/ad hoc. Well, I have a question that nobody’s asked before, or I’ve never asked analytical people before, and I want to answer that question. So Chartio is definitely in the first group where it’s great for reporting but I do agree, very often people have come to me and said, well, I have a question, and I’d kind of just do a one-off report for that simply to answer it, because it doesn’t give the analysts kind of truly an easy way to just go in and ask any question. So just a little more about the kind of tool and what I was looking for. I was definitely looking for a solution for more of that reporting side. And what I found when I started looking for other tools is, first of all, I didn’t find any other tool that was kind of a one to one solution that I could easily do. And I really probably would need to now put together a few different things. I did see some specific tools. I even talked to Holistics, which I remember from one of your other episodes, and which actually looked very close to what Chartio does. But I was also a little concerned to go down the path of, well, this is another tool which is not open source. So I was definitely looking more towards, can we solve this with just open source tools? For what it’s worth, I’m still looking for a perfect solution. And I haven’t decided yet. But definitely, to your point, it’s just really taken off the past couple years. I haven’t really looked at new tools so recently, because I’ve been using Chartio. There’s the whole explosion of, like, reverse ETL tools, and just, I want to say, a lot of tools which each do a really, really good job of solving one specific part of the problem. 
I mean, there are some tools that just pull data from whatever data source you want and put it into a file. There are some tools which just take care of pulling the data from whatever sources you want, like a Fivetran or whatever it may be, and push it to whatever other solution you want. And then there are CDPs, like RudderStack, and reverse ETLs. You know, like I was saying, I feel like there are so many tools which are really trying to solve one specific part of the problem, which is great when you have a slightly bigger team. But as a team of one, for me specifically, it’s been a little more challenging, because I’ve realized, okay, maybe I need now to actually look at a few different tools and see how we can put them all together. So, I think we mentioned this also before the recording, it’s a little more challenging when you need to kind of do everything open source, because it is a little more complex, and it does require a little more work to kind of get things up and running. So I would say you have a lot more choices than before and a lot more specialization in specific tools. But in some ways, we don’t have that kind of one-thing-that-does-everything solution that a lot of smaller companies need in order to get started. So that’s kind of where we are today.

Eric Dodds  33:17

No, that makes total sense. And it is really interesting to think about the data space, and Kostas, I would love your thoughts on this as well. As tools have progressed, it makes a lot of sense that there has been some specialization, right? Data pipelines are really hard. Pulling data in is a non-trivial problem. And there’s always new sources and everything to maintain. And then doing analytics really well is also hard in its own right. And so it makes sense that there’s a specialization. But to your point, when you’re a really small data and analytics organization, having tools that can accomplish multiple things in one system is way more convenient, potentially, because you’re not dealing with multiple vendors, multiple processes, etc. But Kostas, what do you think about that? I mean, do you see any trends? Of course there’s specialization with different pipelines, etc. But we’d love your thoughts on that as well.

Kostas Pardalis  34:17

Yeah, that’s a very good question. First of all, I have to say to Ophir that he made me really happy with what he said about Chartio. I’m very good friends with Dave, the CEO of the company, and I’m pretty sure that he’s going to be very happy to hear what you said about the product. He’s one of the most obsessed, in a good way, product-driven people that I have met, and he put a lot of energy into building this product. And so it’s good to hear that what his vision was, at least for the experience outside of the company, he managed to deliver in the end. So that was great, and I’m sure you will make Dave really happy if he listens to the episode.

Kostas Pardalis  35:01

Now, going back to your question, Eric, you know, they say about software that a common pattern to build a new product and a new company is to start with small enterprises or medium enterprises, iterate the product on them, and then go to enterprise, right? That’s like a very common pattern of innovating on an existing problem, creating something that’s better as an experience, using the smaller companies as a vehicle, let’s say, to figure out what’s the right way of solving the problem today. And then at some point, go and sell to the enterprise and increase your margins and all that stuff. Now, that might be true for the SaaS space, right, where we are building a CRM or marketing platform. Now in the data space, I think that what is going to happen is the opposite. And there is a reason behind that. And the reason is that building technology around data is really, really hard. You have to scale from day one. And it’s very, very expensive. Going out there and building a new database system, for example, it’s crazy hard. And the big companies have both the scale and the money to fund this product. So my feeling is that in data, we are going to see the opposite, actually. We are going to see products that are going to be built primarily for the enterprises, and then they are going to scale down in a way to the smaller companies. And I think we see that happening in a way, especially with companies like Databricks and Confluent. Right, they started first of all with an on-prem solution, with the traditional enterprise sales, the kind of thing where a small company would never pick up the phone and call them for a quote for the price, right. And then you see them going down market instead of going up market. And they open something like a product as a SaaS solution. And then you have tiers that are consumption based. And it’s very easy for someone, even a small company, to afford to use the solution. There is one exception there. 
And this is Snowflake, which started with smaller companies and then started penetrating the larger enterprises. But I think that this is a kind of pattern that we are going to see a lot happening in the data space. At least that’s my opinion.

Ophir Prusak  37:44

Interesting thought, Kostas. I understand where you’re coming from, though I want to say something in terms of just product growth in general. At least in the SaaS world, what’s the difference between products that really, really kind of take off and go viral versus products that just never make it to that kind of super large adoption? It’s whether it’s something which I can just go in and, within 30 seconds, start playing around with. I don’t need to talk to anybody, especially if you’re talking about developers or analysts who don’t want to talk to a salesperson. I have to say, if I need to actually talk to somebody to even play around with a product, then I don’t think it’s going to kind of really gain huge adoption. And again, that’s at least in the SaaS world, for developers and for analysts. And what I’ve seen is if you start by creating a product solving for enterprise, you’re not thinking about the self-service model first and foremost. And at least when I’ve talked to a lot of companies, they actually do what you’re saying, Kostas: they solve problems for these enterprises. But they don’t have an actual demo you can play around with; it’s like, oh, I need to set it up for you. So I actually want to say, at least from a product perspective, if you’re not able to accommodate the self-service model, I think you’re gonna have problems with growth, even if you are an enterprise product, because it’s the individual developer, the individual analyst, who wants to go in and play around with it, and who doesn’t want to or doesn’t have the resources to install it themselves, but wants to play around with the cloud version. That ultimately is what causes a lot of products to go viral.

Kostas Pardalis  39:38

100%. Ophir, I’m 100% with you on that. And I think that that’s where open source is also extremely important. The companies that I mentioned, like Confluent and Databricks, started first of all as open source projects. And that’s because these are tools that are used by developers. Right, like Databricks is not something that, I don’t know, a marketeer or salesperson will take, although the output of the work done there might be used by them. Developers are fine to try solutions that are not that easy to use, right? Like they can take it, set it up, play around, see how it works, make it work. It’s all part of the developer experience, which is different from that of a SaaS user. And yeah, I totally agree with you. This kind of experience is important. The difference is that it is a little bit different with data-related products, especially infrastructure products, because these are going to be used and maintained and viewed by developers, right, by engineers. So the experience there is a little bit different. So that’s at least my experience so far. And I think that that’s another added value of open source in the end, on the side of the business models and how you can use it to actually build a company and the product’s experience.

Eric Dodds  41:02

It’s interesting, there’s also another way that innovation at the enterprise impacts technology. And that is when a problem is solved in the enterprise, and then either the pattern for the solution or the actual solution itself is published, oftentimes as open source. So if you think about it, I mean, Netflix is a classic example of this. Really interesting technologies have come out of Netflix, from the ways that they’ve solved data infrastructure problems at scale, and they’ve open sourced some of those. And I think it’s also interesting to think about how, per what you said, Kostas, open source is a way that you drive adoption at the bottom of the market, which is also your point, Ophir, and how some of the patterns for that can actually emerge from the way that enterprises are solving data infrastructure problems. So what a fascinating ecosystem.

Kostas Pardalis  41:57

Yeah, and also Eric, adding to the open source point, because you mentioned Netflix: for Netflix, open source is also a way to recruit the best possible talent. And that’s another value. Because if you think about it, Netflix is not a software company, right? Software is not their primary product. Their primary product is content, they are content creators, but they operate at such a huge scale that they need the best people out there to build and maintain their infrastructure. And open source gives you a path to go and get these people, which is another benefit of open source. I feel like I’m evangelizing open source a lot today.

Ophir Prusak  42:44

The Open Source show.

Eric Dodds  42:45

Yes, very eloquent evangelism. Well, we’re getting close to time here. There’s one more subject I wanted to cover. And Ophir, you have a lot of experience doing attribution in marketing. And attribution in marketing is a tricky thing, right? It’s basically trying to answer, in any number of ways, the question: I try something and I get these results, and then how can I tie the results back to this specific effort, right? A classic example is paid advertising, right? When I spend money on paid advertising, I want to see, whatever it is, how many customers actually came from that. And attribution is a classic challenge when it comes to data for a number of reasons. Paid advertising is just the tip of the iceberg there. But I would love to know, and I’m thinking especially about our listeners who are on the technical side, who probably work with a marketing team or have projects related to marketing, but who maybe just aren’t as familiar with attribution on a tactical day-to-day basis. Since you have played the roles of marketing and marketing ops and data engineering, could you just give us a breakdown? Give us your basic definition of attribution? And then what are the data challenges related to different types of attribution?

Ophir Prusak  44:07

Sure, I’ll start with just a quote that I heard that I simply love about attribution. Attribution is simply a question of how wrong do you want to be? And the reason I say that is I actually learned about attribution the hard way, almost a decade ago, I want to say, when I was running a campaign. At the time, I wasn’t doing multi-touch attribution, it was very straightforward, just what Google Analytics was telling me. There was one campaign where we were spending a lot of money, and we were seeing activity, but we were just not seeing conversions. And ultimately, we made the decision just to drop it. And two months later, our sales dropped drastically. And that was the only explanation. And looking back, it was clear that it was simply the multi-touch attribution part of it. But I think attribution at the end of the day comes down to this, and there’s an analogy used very often with soccer players: when you score a goal, you can say it’s the person who technically hit the ball into the net who made the goal. But if you look at who gets credit for making that goal, it’s not just the person who actually kicked it in. It’s the whole team and everything leading up to it. But it’s really hard to say, well, what percentage of a role did each one of those people ultimately play? Another way to look at it is, what if I had taken out one of those people from the series? Or if you look at it in the world of marketing, if somebody had, let’s say, five different touch points, and I were to take one of them out, what can I say about how much it would have impacted the ultimate attribution? How much would it have impacted the actual revenue? So I think one of a few things which are important to understand is, first of all, you’re never going to get 100% attribution.
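
To make the soccer analogy concrete for technical listeners, here is a minimal Python sketch (an illustration added to the transcript, not something shown on the show). It compares two common credit-assignment rules, last-touch and linear multi-touch, on a few made-up customer journeys. The channel names, revenue figures, and function names are all assumptions chosen for the example.

```python
from collections import defaultdict


def last_touch(journey):
    """Give all of the credit to the final touchpoint before the conversion."""
    return {journey[-1]: 1.0}


def linear(journey):
    """Split the credit equally across every touchpoint in the journey."""
    share = 1.0 / len(journey)
    credit = defaultdict(float)
    for channel in journey:
        credit[channel] += share
    return dict(credit)


def attribute(journeys, model):
    """Sum revenue credit per channel over (touchpoints, revenue) pairs."""
    totals = defaultdict(float)
    for touchpoints, revenue in journeys:
        for channel, weight in model(touchpoints).items():
            totals[channel] += weight * revenue
    return dict(totals)


# Hypothetical converting journeys: ordered touchpoints and the revenue earned.
JOURNEYS = [
    (["display", "search", "email"], 100.0),
    (["display", "search"], 200.0),
    (["search"], 50.0),
]

if __name__ == "__main__":
    print("last touch:", attribute(JOURNEYS, last_touch))
    print("linear:    ", attribute(JOURNEYS, linear))
```

In this made-up data the display channel never appears as the last touch, so a last-touch report gives it no credit at all, while the linear rule gives it a sizable share. That is roughly the situation in the campaign story above: a channel can look like it produces no conversions and still sit early in many journeys that do convert. Neither rule is "correct"; they are different ways of being wrong.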

Ophir Prusak  45:54

In other words, you’re never going to know for sure exactly where somebody came from. Or even if you’re doing multi-touch attribution, you’re never going to know for sure the impact of each touch point. Attribution is something which is really directional. If you think about marketing in general, if you’re tracking people on your website, a lot of people are going to have blockers that prevent tracking. So I think attribution is good for understanding not the exact numbers of how many people are coming from this ad versus that ad, but maybe, when I compare this channel to that channel, what do I see? When I compare first touch to last touch, what do I see? So I think it’s really kind of directional, and that’s a great way to use it. And the other thing I want to say, which is something that I’m seeing more and more people do recently in terms of attribution, is something called incrementality, which is, you know, similar to just split testing, or A/B testing in general. But instead of having version A versus version B of a specific type of copy, you basically kind of have a control group which doesn’t get the ad at all. And that way, you can kind of say, well, all things being equal, if I didn’t serve a specific ad up to a group of people, how much did it impact the percentage of those people who ultimately made a purchase or made a conversion? So, a few things. Ultimately, attribution is hard. I will say there are definitely companies which, you know, are doing a decent job. And if anything, it’s better than nothing. Especially today, in the world of AI and machine learning, there are a lot of companies which are able to put together the data using things like linear regression, and are able to give you more than just what a tool like Google Analytics is going to give you. 
And I would even say, if you’re spending a million dollars or more on paid advertising a year, you should definitely look into kind of a dedicated solution and not just to pay, don’t do it yourself with just SQL and try to figure this out or Google Analytics, you definitely want a dedicated attribution solution. If you’re doing less than a million dollars a year, I feel like you’re gonna get some benefit, but it’s just not going to be as much of an impact. And also there’s a big question of how many channels, what we found is once you go beyond just Facebook and Google, and you start doing things like maybe TV advertising, or OTT advertising, or you have a lot of coupon codes, or whatever, then using a third-party tool definitely helps.
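
The incrementality idea described above, a holdout group that never sees the ad, can be sketched in a few lines of Python. This is an illustrative sketch only: the group sizes and conversion counts are invented, the function names are hypothetical, and a real test would also need random assignment and a significance check before acting on the result.

```python
def conversion_rate(conversions, users):
    """Fraction of users in a group who converted."""
    if users <= 0:
        raise ValueError("group must contain at least one user")
    return conversions / users


def incrementality(exposed_conv, exposed_users, holdout_conv, holdout_users):
    """Compare a group that saw the ad with a holdout group that did not.

    Returns the absolute lift in conversion rate, the relative lift over the
    holdout baseline, and the estimated conversions the ad actually caused.
    """
    exposed_rate = conversion_rate(exposed_conv, exposed_users)
    holdout_rate = conversion_rate(holdout_conv, holdout_users)
    absolute_lift = exposed_rate - holdout_rate
    relative_lift = absolute_lift / holdout_rate if holdout_rate > 0 else None
    return {
        "absolute_lift": absolute_lift,
        "relative_lift": relative_lift,
        "incremental_conversions": absolute_lift * exposed_users,
    }


if __name__ == "__main__":
    # Hypothetical numbers: 10,000 users per group, 300 vs. 200 conversions.
    result = incrementality(300, 10_000, 200, 10_000)
    for name, value in result.items():
        print(name, value)
```

With these made-up numbers, the exposed group converts at 3% and the holdout at 2%, so only about a third of the conversions in the exposed group would be credited to the ad; the rest would likely have happened anyway. That is the part a click-based report cannot tell you.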

Eric Dodds  48:29

Sure. Yeah, I think back on my background in marketing, and I don’t know if a lot of people explain it this way. But marketing and engineering have, you know, historically had somewhat of a tenuous relationship, in large part due to marketing’s demands. Ophir, you have the benefit of both being the one making the demands, and the one that needs to deliver them. So, the expectations are always clear, which is definitely not always the case, especially as companies scale. But I’m just thinking back on times when I’m running marketing, and I’m getting together with the head of engineering to talk about data. And attribution really in many ways was a large part I think of what I was trying to accomplish with just a lot of asks around data from the engineering team, because you’re trying to triangulate what’s going wrong. And because marketing is so dynamic, and you’re constantly trying new things, and you’re constantly running tests, and your requests are always changing, and your needs around data, on the sharp end of things are always changing, which is just interesting. I never really looked back at my interactions with engineering around marketing data through the lens of attribution, but I think that’s a huge driver.

Ophir Prusak  49:48

Yeah. One other thing I’ll add is, we say, what, as in, what happened? That’s a relatively straightforward thing to answer. I wouldn’t say it’s easy, but it’s pretty, you know, straightforward. Why it happened, that’s where the fun is at. And that’s really where it kind of gets a lot more complex. And you need to be also thinking not just about data. One of the biggest mistakes I’ve seen a lot of companies make is they’re looking just at the data, but not looking at the context of what’s happening. So you might be looking at, let’s say, people who are clicking on ads, and you might say, okay, well, I see this ad versus that ad, and this ad is doing better than that ad. But what you might not realize is that one of those ads is from a display campaign and the other one is from a search campaign. So the whole context of where the person is in the user journey can be totally different. And that’s why I think you need data with the entire context of things like what happened before, what happened after, and what segments these people are from. That’s where I find a lot of people also make mistakes, deciding based purely on data. And that’s why I think marketing and data together really is both science and art. It’s not just one or the other.

Eric Dodds  51:04

Yeah, absolutely. And I think one thing that we’ve seen over the course of doing the show for, I guess, a year (wow, that’s amazing, I hadn’t thought about that) is that we’ve heard about more and more really cool structures of teams where there’s a very tight relationship between engineering and marketing, or data engineering and marketing, where it’s very collaborative. Because you see so many times that problems arise when marketing gives a vague specification for something that they need, and the engineering team will deliver that to spec, and it lacks context, which to your point is so key in trying to understand why things are happening. And I think the more you can have a really robust collaboration, where marketing, trying to drive a customer journey or explain things, pushes that context to engineering, and engineering also gives marketing the context of here’s what’s going on under the hood as far as the data, maybe there’s limitations or decisions that need to be made, the more that really can create a powerful dynamic for figuring out what’s actually working and continuing to invest in those things for growth. Yeah, definitely.

Eric Dodds  52:18

Well, we are at the buzzer. This has been a great conversation. We got to hear a lot about open source and Kostas’s evangelism about open source, both from the startup and enterprise levels. And Ophir, we’ve learned a ton from you just about your unique perspective on data tooling, and especially in marketing, and thanks for the quick crash course on attribution. And that was really helpful for me, and I hope helpful for our listeners as well.

Ophir Prusak  52:44

My pleasure, my pleasure.

Eric Dodds  52:46

My big takeaway, and I’m still processing through this, is that the conversation around open source spreading to different parts of the tech stack is just such an interesting conversation. And I think Ophir’s observation about open source starting out with really heavy influence in developer tooling, that being a huge wave of adoption, and then that spreading to data tools makes a ton of sense. And then I’m still ruminating on whether a sort of marketing or sales SaaS tool could make a run at it as an open source tool. And I’ll probably be thinking about that a lot over the next week.

Kostas Pardalis  53:33

Yeah, my main takeaways are that I might have to change career paths, Eric, and become an open source evangelist or something.

Eric Dodds  53:42

That really was … we had a little aside there and you gave us a very passionate speech on multiple levels of open source.

Kostas Pardalis  53:53

Yeah, yeah. It’s probably the effect of jet lag, from what it seems. But outside of this, actually it was very, very interesting to have a conversation with someone who is not traditionally exposed to open source talking about open source with the passion that Ophir had, and how important he thinks it is. Again, I know that he has an engineering background and that this might affect it, but still, seeing a marketeer talking about the importance of open source today in 2021, I think there are good signs that in a couple of years, we might see maybe a successful open source CRM, who knows?

Eric Dodds  54:38

Yeah, I agree. I think it was a really, really interesting perspective. Maybe we can get Marc Benioff on the show to give us his perspective on whether he thinks an open source company will disrupt Salesforce.

Kostas Pardalis  54:51

Am I going to be part of this episode? Maybe I’ll start preaching to him that he should open source Salesforce.

Eric Dodds  55:03

I think he’d be very receptive to that.

Kostas Pardalis  55:07

Absolutely, let’s do it.

Eric Dodds  55:10

Alrighty, well, Kostas and I are gonna go try to figure out how to get Marc Benioff on the show. And until next time, we will catch you later.

Eric Dodds  55:21

We hope you enjoyed this episode of The Data Stack Show. Be sure to subscribe on your favorite podcast app to get notified about new episodes every week. We’d also love your feedback. You can email me, Eric Dodds at Eric@datastackshow.com. The show is brought to you by RudderStack, the CDP for developers. Learn how to build a CDP on your data warehouse at RudderStack.com.