This week on The Data Stack Show, Eric and Kostas talk with Ruben Ugarte, principal at Practico Analytics. As a consultant, Ruben has worked with a wide variety of companies that are emerging from the pandemic with wildly different data landscapes, whether they’ve seen tremendous growth or have seen their business struggle. Ruben has helped both kinds of organizations make sense of their data. Eric and Kostas also discuss Ruben’s new book, The Data Mirage.
Highlights from this week’s episode:
- Ruben’s background (2:36)
- Massive shifts in data caused by COVID (4:47)
- Big Tech is no longer untouchable (9:54)
- Accelerations in the BI space (15:17)
- A focus on people and on trust (23:43)
- Numbers are filtered by the biases of the people viewing them (28:46)
- AI trends and adoption (38:06)
- Using qualitative data for insights, particularly at early stages (40:56)
- Recommendations for taking stock of who is using the data and assessing what their skills are (50:06)
The Data Stack Show is a weekly podcast powered by RudderStack. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Eric Dodds 00:06
Welcome to The Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You’ll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, the CDP for developers. You can learn more at RudderStack.com.
Eric Dodds 00:26
Welcome back to the show. Today we’re going to talk with Ruben Ugarte. He is a data professional who has worked in the data space for many years and does all sorts of projects. One thing that I think is so interesting is that, along with doing the technical side of things and helping companies understand how to build out their stack, he also helps companies and data teams learn how to make decisions with data and operationalize data across teams within an organization. So tons to talk about there. One thing that I think we should ask Ruben about, and that I’m really interested in, is what he’s seen as a result of the pandemic. That’s not a subject we’ve covered extensively on the show, but Ruben has dealt with companies in industries that experienced unbelievable growth, and in industries like travel and tourism that faced some really, really hard times due to the pandemic, and there are massive data implications in both areas. So I’m really excited to chat with Ruben about that. How about you, Kostas?
Kostas Pardalis 01:31
Yeah, that’s going to be super interesting, I think. One of the things that’s unique about people like him is that, because he consults with different companies, he has a much broader, let’s say, view of what’s going on in the industry. So that’s something that I want to discuss with him: what kinds of patterns he sees in what other companies are doing, what the overall problems are in adopting data-related technologies, and possibly also what the solutions are, based on his experience. So I think my questions are going to focus on a couple of different areas.
Eric Dodds 02:07
Alright, well, let’s dive in and chat with Ruben.
Kostas Pardalis 02:09
Let’s do it.
Eric Dodds 02:12
Ruben, welcome to the show. We are so excited to chat with you about all sorts of different data-related things, especially your new book, The Data Mirage. So thanks for taking the time to join us on the show.
Ruben Ugarte 02:24
It’s a pleasure to be here, Eric.
Eric Dodds 02:25
All right. Well, I really enjoyed getting to know you a bit just chatting before we hit record. But I’d love for you to just give a background to the audience on who you are and what you do.
Ruben Ugarte 02:36
Yeah, my background is, of course, in data, and decision making in particular. I work with companies of all sizes, from startups to public companies, and at its core we’re typically trying to figure out how to use data to make better decisions. That may mean that we need to select technology, we need to implement it, or we need to design a strategy around how to use data. And of course, it means working with the data itself to come up with insights and answers and next steps. This is something I’ve been doing now for a little over six years, and I’ve seen how different use cases and things come up as a company tries to be more data-driven.
Eric Dodds 03:20
Very cool. I love this topic, because there are just a lot of … it’s really hard to be data-driven. If you think about customer 360, or moving towards warehouse-based analytics, or some of these trends in digital transformation, just doing those things is really hard. And then you oftentimes see companies get to a point where it’s like, okay, we have all the data in the warehouse. And then you say, well, now what do we do with it? You accomplish this really, really difficult task technologically, and then you realize, okay, the real work is actually just beginning. So I’m really excited to get your perspective on that. But one thing we were chatting about before we hit record was that COVID regulations are lifting in various regions, and we saw just an unbelievable change in the digital landscape in so many areas. You work with so many different companies: tell us what you are seeing on the ground, across companies and across industries, that has resulted from COVID. And now that we’re coming out of COVID, what types of things are you seeing related to data and what companies are doing?
Ruben Ugarte 04:47
Yeah, so I think there is an even faster push to get data adopted and to make sure it works, especially for remote teams; that is, it’s accessible by anyone wherever they are, and it doesn’t require you to go to someone in person, like a data analyst. So lots of companies are asking, hey, how do we build dashboards, how do we build reports? How do we make sure data doesn’t have a lag, right? So that we can look at numbers in somewhat real time, perhaps the last 24 hours, the last day, and not have to wait two weeks to get the performance from the last month. So that’s one thing.
Ruben Ugarte 05:27
The second is, of course, this massive digital transformation of businesses. Some of them did it by force, because their industry was hard hit, like tourism. Others realized that there’s this huge potential to go ecommerce, to do more digital products, to really take advantage of digital channels. And I think the best companies used the pandemic as a great opportunity to undertake this. And the third thing, where it remains to be seen how it plays out, is how communication takes place in a more remote environment. This is now perhaps the big debate in a lot of companies: do you go 100% remote? Do you do two days, three days? And then, whatever you decide, it’s a matter of, okay, how do we make sure we share data and information with everyone to keep everyone in sync, and perhaps not overload people with meetings just because they are remote? So across all these different things, I think there’s an undertone of data, and of how it might play out or not play out for companies as this next period of what seems to be really high consumer demand takes place across probably every industry.
Kostas Pardalis 06:50
So I have a question about that. You mentioned that it’s across, like, every industry. Do you think that there are specific industries that are already adopting these new trends around data? And if yes, why do you think this is happening? Is it because, like, COVID affected them differently? Or are there some other reasons?
Ruben Ugarte 07:13
That’s a good question. I mean, the ones that come to mind, of course, are the ones we might expect: technology companies and ecommerce companies, for example, that weren’t really affected by COVID. They just sort of ran through it. But I think even low-tech industries are starting to change. Look at car dealerships right now: there’s no supply of cars in countries like the US and Canada. So how do they adjust their approach? Do they go to a more digital way of selling cars? Restaurants, for example, I think are really interesting, right? They all had to implement booking systems. Here in Vancouver, there were a lot of restaurants that didn’t take reservations, right? You showed up, and if there was a line, you waited. That’s what you did. But a lot of businesses had to implement some kind of reservation system to make the social distancing regulations work. And those things will probably stay. And then once they have that, I think they can layer on things like takeout and other digital ways of interacting with the restaurant without having to be physically present. So those are the ones that come to mind. I would say it’s those industries that were either really hard hit by regulations, or hard hit by success: there was just so much demand for their products, as people were at home and shopping online, that they had to completely change. And the change that we’re seeing today isn’t something that just happened last month; it’s been going on for a year, you know, a year and a half. So there’s no going back for larger businesses. And I’m sure you might be seeing similar things where you’re coming from.
Kostas Pardalis 08:51
Yeah, absolutely. And I can’t stop thinking about some industries that were traditionally lagging behind in terms of adopting technology. Two that come to my mind are shipping and anything that has to do with supply chains. And I think that during this COVID crisis they really had to catch up, and do it really fast. And we are talking about some very critical industries, right? I mean, I think everyone hears about all the issues that we have with supply chains right now, and of course shipping is part of that. So it will be very interesting to see the next couple of years, when we will be more or less in a position to evaluate the impact that COVID had. And I think in most cases it’s going to be positive; it’s actually going to be an accelerator. Do you think that there is also a negative impact in some cases when it comes to the adoption of technology, and anything that has to do with data in particular?
Ruben Ugarte 09:54
Well, we were coming off 2020, right, where data was playing this big role, and The Social Dilemma came out on Netflix, which talks about the usage of data for marketing, especially around the US presidential campaign. And now we’re seeing this backlash, I guess, against big tech, right? Big tech used to be untouchable. And now they’re almost on the other side, where everyone wants to regulate them, no matter what. So I think individuals in particular became more aware of the data that’s going around about them, and especially of the more sensitive data, such as health data, right? We now have debates around vaccine passports. Do you do them or not? Are they a sort of breach of privacy? A tricky thing for businesses is that every business knows they need to track data, they need to store it, they need to use it. But they’re not at the sophistication level needed to protect that data correctly from any threat. It’s really hard. It’s not really something that I think businesses are completely familiar with yet. I was reading in the Wall Street Journal today or yesterday about the major hack that’s going on with hundreds of companies. It’s just really hard to protect data. So businesses of all kinds have been put into that position; it’s no longer just the government or Fortune 500 companies. And it’s gonna take some time, I think, for companies to really get a good grasp of what that looks like, and how to protect data properly among their employees, which then naturally transitions to consumers. Do you trust this company that you’re giving your data to, whether it’s just an ecommerce store that you’re providing your credit card, your billing address, and your name to, or more involved companies, perhaps health companies that you’re providing blood tests or health markers to for personalized diets? So it’s this entire world where we now know what our data is worth, or what it could look like in the wrong hands.
And it’s not quite clear how ready we all are to protect that data.
Kostas Pardalis 12:06
Yeah, absolutely. I think something 2020 and 2021 taught me is that data technology and infosec technology probably go hand in hand. If we see progress in one, we will definitely also see progress and change in the other. I usually say that the decade of the ’20s is going to be all about data. But actually, I think I should change that a little bit: it’s going to be about data, and it’s also going to be about security. I think these two are both going to be super important. And I think there are also going to be very interesting ethical conversations, and conversations on a collective level; anyway, it’s going to be very, very interesting. But let’s try not to make the whole conversation too philosophical. Cool. Let’s go back to technology, Ruben. Let’s talk a little bit more about the parts of the data stack, or the technology, that you think got boosted by this whole adoption of data and digitization that COVID brought. Which technologies do you think benefited from this?
Ruben Ugarte 13:19
Customer systems, so anything like a CRM, of course; email marketing tools; I think digital advertising got a boost out of this, and Facebook and Google, of course, take the big chunk, but any kind of digital advertising; and BI tools in general. I was listening to a podcast the other day about Databox as a business intelligence tool, and how well they were doing and how well they were growing, and I think it’s just a reflection of companies looking for easier ways to visualize their ever-growing volume of data, which doesn’t stop; it just keeps growing, in some cases exponentially. And trying to visualize that in some kind of usable format is a really big challenge. So I think all those things got a boost from the pandemic, because companies couldn’t reach customers in person. They had to reach them where they were, which was their phones, their email, their social media, and figure out how to get their message to customers in those channels.
Kostas Pardalis 14:28
Makes sense. In terms of BI, because you mentioned BI tools, my feeling is that before the pandemic, or at the beginning of the pandemic, we were pretty much at the end of an innovation cycle in the space. We saw the acquisition of Tableau by Salesforce. We had the merger of Sisense with Periscope Data. Do you see this market as one that has room for more innovation? Do you feel like there’s new stuff that we need there? And, of course, do you think that this whole situation with COVID, and all this obsession around data, is going to accelerate innovation, particularly in the BI space?
Ruben Ugarte 15:16
I think that’s a good question. And you’re right that a lot of major players were acquired. There’s always space for smaller companies, and the SME market might be one where companies are switching quickly between tools, but also coming across new tools. To me, the most interesting element of BI isn’t so much the building of dashboards and reports; I think that is a problem that has been effectively solved, right? There are great ways of building charts, great ways of doing it that are user-friendly and don’t require SQL. And if you do know SQL, or something more advanced, you can use that. So that problem seems to be solved by pretty much every player. I think the remaining problem is still how easily you can integrate data sources into your BI tool, right? You might have five or 10 advertising channels, your CRM, your email provider, maybe some custom sources. How many clicks does it take to bring all that either into a central data warehouse, or just into a kind of virtual space, and then visualize it? And that seems to be the trend if you look at Domo, right? A lot of integrations. That could probably be the future of the BI world: more and more integrations, so it’s point and click, and anyone can just plug in their data sources and get up and running with reports and dashboards. And in some cases, maybe even just templates, right? Because if you know what the data schema looks like when you bring in data from Shopify and from Facebook, it’s easy to create pre-built templates that you can set up in a few clicks. That to me is the future of the BI world. Whether it’s new players or some of the older players who take this on will be interesting. I’m not sure about that.
Kostas Pardalis 17:18
Yeah, that’s an interesting point. Do you think solving all these data accessibility problems, by creating these connectors and getting access to all the different sources companies have, is a BI problem? Or do you see a different category out there that’s going to focus mainly on that? I’m a little biased because I started a company around this, Blendo, but in this particular category we’re seeing more and more companies appear right now. We have Fivetran, which has become a pretty big company. We have many open source solutions that are appearing. And we keep seeing more and more companies trying to solve the data connectivity and accessibility problem. So what do you think about this? Do you see a consolidation there? Do you think this category is eventually going to merge with the BI category? Or do you see two completely different categories that keep growing?
Ruben Ugarte 18:21
I think from a technology perspective, I could see the separate categories going … we might be talking about different buyers here, right? In a company, it’s more the data engineering side, or just the engineering department in general, that handles moving data from point A to point B. I work with clients where I’m usually brought in by the marketing team or a sales team, and they just tell the engineering team, this is the data we want, and this is where we want it, just move it. Don’t worry about the data schema, don’t worry about what we’re gonna do with it, just move it from point A to point B. And that’s the data engineering, moving-data side, right, the Fivetran world. The BI tool, I typically find, is on the non-technical side: we want to build dashboards, we want an executive dashboard to summarize our KPIs. So I see those worlds being somewhat separate. There may be some overlap, right, if more companies take that sort of Domo approach where they build the connectors into one tool. But my second, more general point is that I think a lot of these issues are becoming less technical as time goes on, and more people related. That is, when I look at data technology over even just the past five years, it is significantly easier today to take data from common data sources, like Salesforce, Marketo, Facebook Ads, Google Ads. If you use a common data stack, it can be really quite straightforward to plug things in and get data into a warehouse or a BI tool. And I think that trend will continue. More vendors, more SaaS companies are making data visibility really easy.
Ruben Ugarte 20:05
But what’s not getting easier is what companies do with it. So they collect it all. What do you do with it? Right? How do you analyze it? How do you turn it into insights? How do you deal with political issues? I’ve worked with a few finance companies and crypto companies, and I was working with a client once where we had this great plan: we were going to have a CDP and this entire data stack, very modern, very advanced. And the entire plan was vetoed by the legal department. They just said, you cannot have data in a cloud environment that we don’t control. And the whole plan fell apart. So these are political issues that can affect how data flows from one place to the other. And those, I think, will continue to be trickier, especially as companies and their legal or compliance departments realize that if some of this stuff leaks out, it’s gonna be a big issue; we’re gonna have fines, reputation damage. So the safest thing possible is to not let the data flow freely and to have really tight restrictions, and then that limits how companies use it. And that’s a tricky problem. I don’t think it’s something you can solve with technology as easily. But nonetheless, companies will have to find a way to get their heads around it.
Eric Dodds 21:24
Yeah, yeah, I think that’s super interesting. And jumping back just a couple of points, it’s interesting to see players in the space approaching the same outcome from different angles, right? So you have, like you said, Domo, which is building the connectors in. Meltano recently spun out of GitLab, and they are building connectors for the ingestion piece, so they’re starting to dabble in adding pipelines, and not just being a BI tool that sits on top of the warehouse. And then you have companies approaching it from the other side, right? Kostas mentioned Fivetran. And of course, RudderStack sends data. In that regard, a lot of times you have just the raw pipeline piece, but then close partnerships with companies like dbt, where you’re crossing the bridge between just being a delivery pipeline and enabling analytics, but not delivering the last mile of actual dashboards. So it’ll be interesting to see how the dynamics within an organization change depending on what tool they’re using and where it comes from. Because if you start from BI and move towards the pipeline, as opposed to starting with the pipeline and moving towards BI, there are very different team dynamics involved, depending on whether the marketing or analytics org is the key leader of the project, versus maybe someone more on the engineering side. So yeah, it’ll be really interesting. And I think the owner of the project internally has a really big impact on the political implications and on what the final outcome is within the organization.
Ruben Ugarte 23:12
You mentioned something very interesting. I guess, from your perspective, what do you think are some of the toughest challenges around data pipelines in the future, based on the context that we’re discussing? What are some of the things that are just going to get harder and harder, despite the improvements in technology that we’re debating here?
Eric Dodds 23:34
Kostas, I’m gonna let you handle that, because you work on pipeline products every day, from a product perspective.
Kostas Pardalis 23:43
Yeah, that’s an excellent question. I think that as we solve, let’s say, the problem of accessibility when it comes to data, the next big question around data is going to be: can we trust the data? And how do we answer this question? That doesn’t have a simple answer; actually, in my opinion, there are answers on very different levels. So I see that a great effort from now on is going to be how we can separate the noise from the signal in all these huge amounts of data that we can collect today, very cheaply put in a data warehouse, and ask questions of. So I see a lot of space for innovation in anything that has to do with data quality and data exploration, and not just in terms of the BI reports that we typically see for business decisions, but also alerts and data governance: who accesses the data, why they access it, what the lifecycle of the data was, how it moved from one place to the other, and how it has been transformed. And what’s the lineage of the data?
Kostas Pardalis 25:01
I think that we are going to see … by the way, these are not new problems, right? They’re problems that large enterprises have been dealing with for quite a while. But I think the problems that a lot of enterprises are dealing with, especially in heavily regulated spaces like banking, we are going to see happening at pretty much every company, and with everyone who wants to do business out there and be data-driven, as we said at the beginning. So that’s how I see it, at least what I expect to see happen out there over the next two or three years. What do you think, Ruben?
Ruben Ugarte 25:35
Yeah, I mean, I agree with one of the very first points you made, about trust. I remember the first time I came across a trust issue. I was working with a team, and we went through, we checked the data, and we made some fixes to it. And then we presented the numbers, and the executive team was like, those numbers make no sense, this data is completely incorrect. And we had triple-checked the numbers; we knew they made sense. We had gone column by column and made sure things were adding up. And I realized very soon that they were not having a technical issue, they were having a trust issue. The data had been incorrect for so long, in this case a year plus, that they had very little trust in anything that came from it. And we had to build the trust back up. I learned that you lose trust one report at a time, and you have to build trust back one report at a time. And trust, to me, is a fascinating problem that I talk about in the book, because to me it’s fundamentally psychological. It’s something where you have to work with people and understand where they’re coming from: how much data expertise they have or don’t have, and whether their expectations match what this number is supposed to be. One of the most common questions I get is when companies are comparing their paid spend and conversions in something like Facebook attribution versus some external provider, like a mobile attribution provider, or even just a web attribution provider. And they’re not the same, right? They almost never match. And that can cause a lot of stress: which numbers are correct? Which numbers should we trust? And it’s a matter of training, re-shifting expectations, getting people comfortable with the data they have, and making sure they’re using it in the right context.
Kostas Pardalis 27:21
Yeah, 100%, I agree with you. And I think it’s something that you also mentioned a little bit earlier, and it’s a very good opportunity to start talking a little bit more about your book. Because I know that one of the things you deal with a lot in your book is people, and how important a dimension that is when it comes to data and how we use data. And of course, that’s where trust comes into play. I think that’s something that we forget, especially people coming from more of an engineering background. But I think this is also kind of a perception that, let’s say, society as a whole has built up out there: numbers are something objective, right? You come up with the numbers and that’s it. Your work ends there. I mean, everything has to be told through the numbers. But actually, I don’t think this is true. Because you have the numbers, you might have your visualizations, you might have built whatever you want to build, but in the end you need the people there to tell the story of the data, right? And this story is super important. It’s also what is going to build or rebuild the trust. And that’s my take as a person who has worked in this space for a bit more than 10 years now. What do you think about this? And can you tell us a little bit more about the importance of people?
Ruben Ugarte 28:46
Yeah, we, of course, started by talking about the pandemic, and I think it’s a fantastic case study for how numbers get interpreted or misinterpreted. Pretty much all over the world, we were all seeing COVID case numbers and things like that. But it became very clear that not everyone was interpreting the numbers in the same way. Here in Vancouver, we had protests, anti-lockdown protests, as many countries did. And there was a clear distinction between people who, you know, would see the daily or weekly COVID numbers and think, okay, this is what we should do, and an entirely different group of people who had the same numbers and made a different decision. And that’s the same thing that happens in companies, right? Any number, any report, gets interpreted through the biases and preferences of the people who are viewing it. This element of people then becomes the biggest unknown, perhaps the most volatile variable in data. We can get the right technology. We can build the right pipelines, we can get the best ways to build dashboards and things like that. But then how do those numbers get interpreted? That’s where the people element comes in. And when I wrote the book, I realized that most of the books on the market about data, of which there aren’t that many, maybe five or 10, were really focused on the technical side of things: how you build reports, how you run queries, how you analyze numbers, and statistical models for analyzing numbers. And I thought, you know what, books like those are useful, but I think they’re missing the huge element that if you teach someone basic probability and statistics, but they have a bias in some way or another, the results they get will be completely different from what you might expect. And, you know, we were talking about this before we started recording here. You’re an engineer, Kostas, and I have a slight engineering background as a front-end engineer.
I work with engineers, and I realize engineers can sometimes see the world as very mathematical. It’s like step one, step two, step three, and you take the numbers through a clear logical calculation. And there’s only one answer, right? This is like grade three math: there’s only one answer, and only one way to get to it. That’s not really the case for a lot of data, especially for the toughest decisions around strategy: what a company should do, what products they should develop, what markets they should go into, what features to build into a product. And that needs to be recognized and dealt with, right? It doesn’t have to be a complete disrupter to a company’s approach to data; you just have to understand it and deal with it. So in the book, I talk about trust, I talk about expectations, I talk about training, and how to make sure people have the basic skills needed to work with numbers and understand them. And that has to provide a really good foundation if companies are going to do all the other technology stuff really well.
Eric Dodds 31:54
Ruben, question for you. We talked about patterns, and the idea that very common data integration problems are going to be solved in an elegant and very accessible way, which I think is accurate, right? We’re seeing commoditization there. So if you remove that element, have you seen similar patterns on the people side as it relates to data? Almost like, if you think about architectures from a data stack standpoint, you also have a constellation of people in the company working with data. Are you seeing patterns that are proving to be really successful across teams, between engineers and those consuming the data? I’d love to know what you’re seeing there.
Ruben Ugarte 32:44
Yeah, one of the most interesting patterns, or perhaps trends, relates to machine learning and AI, but not in the way that companies perhaps think about it, where you’re building it out yourself. I think we’re starting to see AI and machine learning being built into specific SaaS tools, right? So you have an email marketing tool, like Salesforce’s Pardot or Marketing Cloud, any of those, and it has a built-in way of running A/B tests. You take two subject lines, you test them, and it’ll tell you which was the best one. But what’s interesting to see now is that a lot of that has been taken to the next level. Machine learning is doing all this analysis behind the scenes and then giving some kind of insight to the user. So instead of asking users to look through the past 100 emails and see what kind of patterns exist among those 100 emails in terms of subject lines, content, open rate and so on, the tools just do that automatically and spit out some kind of insight, right? Say, you know what, typically when you send an email around 8AM, and you include this in the subject line, and you have two images, those emails tend to do better than your other ones. We see it in Google Analytics, right? It surfaces little insights. Not all of them are useful; sometimes they’re just really random. But I think that’s the interesting pattern for the people component, because instead of expecting people to be able to run very sophisticated pattern analysis and take it into Excel, the software will do it for them. And all they have to do is try a bunch of stuff, right? Try a bunch of subject lines, try a bunch of types of emails. Maybe they have to do some kind of setup to get the A/B test going or make the test work properly. But a lot of the heavy lifting is going to be done for them.
And I think the same thing applies when looking at a field like product analytics, right? The world of Mixpanel and Amplitude and Snowplow and all that. You see a lot of companies investing really heavily in machine learning in their reports. So say you’re a product company, for example, and you really want to know what features tend to correlate with conversions, like signups or people becoming paid subscribers. That’s an analysis you can run, and you can run it in different ways to get the entire picture. Or the software vendor can just build it in, build the algorithm, and you feed it the data and it does it for you, for the most part. It’s not perfect yet, but those are things I think we’ll continue to see going forward. And if we go back to the BI tools, perhaps there’ll be an element of BI tools where it’s not just about displaying data. It’s about doing something with it and trying to highlight insights around segments or specific attributes, something that you missed but that the software is able to surface automatically.
Eric Dodds 35:47
Yeah, super interesting. It’s almost like, if you think about it as, okay, we have all this data across the company, now we want to do AI, right? That’s a very ambiguous, challenging problem: what are the inputs, how are you defining things, all that sort of stuff. But if you think about AI almost as a localized service within a particular tool that a specific team is using to accomplish something specific, or to drive or understand a specific part of the customer journey, it makes total sense. And I agree, it’s not perfect, but it’s definitely getting better. Kostas, I’d love to know what you think about this.
Kostas Pardalis 36:21
Yeah, my approach to that stuff is a little bit more influenced by engineering in general, to be honest, in the sense that we should always start from the problem, try to solve the problem, and find the right tools. And AI or machine learning might be the right tool, or it might not be, right? This is a journey that, as engineers, we always have to take when we try to solve a new problem. And I think that’s how we should approach everything when we’re building something, like a company or a product. I understand that as humans we never want to miss an opportunity, right? Or we want to work with the latest shiny toy out there. But in the end, we might just not need it, or it might not be suitable. And it goes back again to trust: we tend to trust new things more easily than we should. And we have much more elementary problems to solve when it comes to data and the culture around using data. I think Ruben said that in a very good way by talking about people, about how we can educate all the stakeholders inside the company to become more data literate, with stuff like elementary statistics or understanding what bias means. I think there’s a huge gap between doing this, which is a necessity, and putting an AI black box in place that is magically going to solve everything. So yeah, that’s how I see things. I don’t know, what do you think, Ruben?
Ruben Ugarte 38:06
It’s funny you mention that. I think we had a period, which perhaps has ended now, maybe from 2010 to 2020, where it seemed like any company just added AI to its name or product description, and that all of a sudden made it really interesting. But it turns out that, one, it wasn’t really AI; it was perhaps at most machine learning. And most of the algorithms were just being reused, right? They were built for other purposes, and companies would just find a different use case for them. So I think it became a bit of a crutch for a lot of companies. I saw all kinds of companies that were going to write copy for you, or write Facebook ads for you, and all this kind of stuff using AI. And I’m not sure how many of those are actually useful. So we’ll see what happens. I think SaaS companies will continue to add machine learning, and they might call it AI even if it’s not truly AI, as an easy way to make the product more useful. But as you mentioned, when we look at the best companies out there when it comes to being data driven, you likely have your own examples. For me, I think about companies like Spotify and, of course, Amazon. What a lot of the companies that have done really well have done, which is hard, is really consistent training, education or coaching at a company level, at a cultural level. And people in general have a high level of comfort with data, whatever that looks like. It may not even be fancy; it may just be very simple Excel spreadsheets and things like that. But they have the ability to work with it, get insights, and then try things and innovate on them. And that’s hard, right? AI might help a little bit with the insights, maybe with experimentation, making sure you can experiment faster, but there are still things you can’t really get around. You have to really train the people, or hire the right people, and build the right company culture.
Eric Dodds 40:06
Company size, I think, is a really interesting component of this conversation, because when speaking in general terms, you really exclude large swathes of the market depending on company size, right? For example, a really early stage startup may not even have enough data for AI or ML to be applicable. There’s just not enough there. They’re too early, and the product is changing really quickly. So I think that’s a really interesting insight around the needs of the particular company, and it really comes back to what you said, Kostas, right? What’s the problem, and what are you trying to solve? And that can vary significantly depending on the stage of the company, the complexity of the stack, the size of the data set. There are just so many variables in there.
Ruben Ugarte 40:56
Yeah, you know, you make a good point. I have a lot of startups who reach out to me very early, pre-product-market fit. Three people, pre-beta even; there might not even be a product out there. And they reach out because they want to set up this entire data stack, right? They say, hey, we know we want a CDP and we want product analytics and we want a BI tool and we want something for surveys, five or six tools. And they have almost no users. It’s just too early. Even if they had those things, there wouldn’t be enough data to make them useful. They might be looking at, literally, 100 users, trying to understand how those 100 users are using the product. But instead of looking at those users through funnels, line charts and things like that, it’s probably more valuable to just talk to them: just have a phone call, do an interview. And this is where I think companies, especially executives, need to understand the context of data. If you’re at that stage, it’s not really a waste of time, but it’s not a very good use of your time to try to set up these very advanced ways of visualizing data when you don’t have any, instead of just talking to people and going low tech, low data, right? More qualitative than quantitative. Now, at the other extreme, if you’re a public company with millions of users, there might still be a role for interviews and qualitative data, but you want to make sure you have the quantitative component as well. So it depends on the situation and where you’re coming from, but you’re trying to find the best use of your data for your purpose. It’s what Kostas was saying: what’s the real problem here? And what’s the most effective way of getting there, not the most sophisticated or the most exciting?
Eric Dodds 42:47
Yeah, I was talking with someone who has been a data engineer at some really incredible companies, like Heroku, ranging from very large companies to startups. He was talking about the stages you go through in terms of data engineering, and he talked about the three-people-in-a-garage startup that wants to go out and buy fancy analytics tools. He said, just query your production Postgres database; you just don’t have enough data for it to be meaningful. And that actually reminds me, Kostas, do you remember when we were talking with Alex from the Pool app? He’s a founder who went through YC. We asked him about analytics at a very early stage, because he had, you know, 100 users, I don’t know exactly how many, but in the hundreds, so very, very early. And we asked him how he was leveraging analytics, and his response was so great. He said he used his product analytics tool (I don’t remember which one) literally just to figure out who he should talk to: anomalies, or people who adopted really quickly. I thought that was really, really interesting, and it aligns exactly with what you said.
Kostas Pardalis 44:00
Yeah, 100%. At an early stage, you’re also at the stage where, as a founder, you’re educating yourself. You need to build your intuition, right? And you’re not going to build that intuition fast enough if you just look at the screen and try to figure out what’s going on with the numbers. You’re probably just going to waste your time, to be honest, because there’s too much noise in this data and not enough signal. It’s much, much better to go out there, pick up your phone, and talk with someone. And I think this stays relevant for much longer with B2B companies, because the numbers grow much more slowly. But of course, with B2C, you reach a point where doing analytics and trying to aggregate the data in an automated way is necessary, because you just have too much data, right? Imagine a company like DoorDash. Of course you need analytics there, and of course you need advanced analytics, because otherwise you have so much data that a human being cannot interpret it. So yeah, I think anyone who has started a company and gone through that has come to the realization that you can build whatever model you want with the early stage data you have, but 90% of the time it just fails you.
Ruben Ugarte 45:27
Yeah, I’ll give you an example of where this plays out. Last year I worked with tourism agencies here in British Columbia, in Canada. Of course, tourism was the hardest-hit industry in the pandemic, with unprecedented drops in volume, in visitors to the country, and so on. I worked with one in particular that was really quite distraught by all this. It was a tough situation to be in. And they really wanted to look to the numbers to figure out what to do next. At some point, I remember telling them, the numbers are not going to change. We know they’re not going to change; regulations are not going to change that quickly. We know you’re about 90% down from historical averages. We know this. So the question is just, what are we going to do about it? Is there local travel we can encourage? Is there a financial decision that has to be made at a company level? These were hard decisions, but I noticed that they felt quite stuck and paralyzed, and they wanted numbers and reports and dashboards, specifically quantitative numbers, to tell them where to go and where to move. I think that was one of the weaknesses they had to deal with, and it was hard to overcome. The same thing can happen to early stage companies: wanting numbers, wanting machine learning to tell you something about those 100 users, instead of just talking to them. And in tough decisions, that can be a bit of a crutch. I think that’s one of the challenges I see with companies that really want to be data driven, or even individuals who want to be data driven: at some point, whether in a crisis or not, they’ll run into situations where they don’t have enough data. There just isn’t enough data for any kind of statistical validity. But you still have to make decisions nonetheless. You can’t just wait until all the numbers are in.
Ruben Ugarte 47:17
And I think having that comfort and being able to go both ways effectively, right, being able to use data to analyze patterns and make decisions, and being able to make decisions despite a lack of data, is what builds resilience among companies and individuals.
Eric Dodds 47:36
Yeah, I agree. Something I was talking to someone about recently: as I’ve had the opportunity to work at companies at various stages of maturity, but really with teams that have deep experience as entrepreneurs, taking companies to market and, of course, leveraging data to do that, I’ve noticed that when it comes to decision making, the people who are really good at it have built an incredible amount of muscle at breaking numbers down into their simplest form and only looking at the necessary components, as opposed to saying, let’s try to build some very complex model, which is necessary sometimes, right? But in many cases, you’re trying to make a high-level, almost directional decision. And it’s really interesting to see really smart people break numeric things down into pretty simple stuff that makes decisions a lot clearer, right? Because, you know, I think clarity is a huge deal when it comes to making decisions based on numbers. I cannot believe that we have run through the entire time and did not get to the third topic we wanted to discuss, which is CDPs. So Ruben, we’ll have to do that on another show, because, boy, that’s a loaded term, and you see so much of it. We’re going to have to end the show soon, but before we go, I’m thinking about our listeners who are considering the people side of the equation, which you mentioned. As I’ve reflected on the conversation throughout the recording, I really think there’s this element of building technical discipline around getting the data correct in the systems. And if I could describe it this way, we need to put a much bigger emphasis on building muscle around the people side: interacting with and actually using data, and making decisions with data in the organization.
So for our listeners who are actively working on the technical piece and maybe want to get started with some really practical things they could work on this week to build muscle on the people side, what are the top two or three things you would recommend they do?
Ruben Ugarte 50:06
Yeah, first, I think they need to take stock of who is going to be using the data and what their skills are. That is, do they know SQL? Are they comfortable with technical topics? Or are they going to have lots of questions around what may seem like basic things: how does the number work, how is the data collected, what’s the formula here, things like that. Second, based on that, you have different paths, right? If you have a highly technical team, then your rollout, how you get this into the hands of people, will be slightly different. You’re likely going to want to make sure that you allow people to run their own SQL, that you have a lot of diagnostic information so they can explore the data on their own, and that everyone has the right permissions and access for it. If you have a less technical team, then you need to make sure that there are nice interfaces for interacting with the data that don’t require knowledge of SQL or similar things. And you want to get a gauge of what skills might be best suited for training, whether it’s, again, basic statistics, basic probability, how to make decisions, how to read charts, the difference between a line chart and a bar chart, how chart design might completely change the meaning of a KPI or a number, things like that. And then the third step is to figure out ways to close those gaps. The topics we’re talking about here are not university PhD-level topics, so they can be covered in informal workshops, one-on-ones and things like that. But you have to know what you want to teach, and identifying some of those skills may require some discovery or research.
And I mean that as in talking to people, because if you ask someone whether they’re comfortable with numbers, they might just say yes, but if you dig a little deeper, you might find out that, you know what, probability is not something that comes naturally to them. Statistics was not their favorite class in college. So you start to figure out which of those skills you need to work on as a team or as a company.
Eric Dodds 52:23
I think that’s incredible advice. And I’ll actually say, I really did enjoy statistics in college. Math isn’t my strongest subject, but it’s been really helpful. And Kostas, I will say some nice things about you, because I’ve had the opportunity to work with you on some internal reporting. With your engineering background, your ability to explain concepts like cohort analysis to me in a very practical way as we’re looking at a data set together has been really helpful, especially for me, since I don’t have a technical background. So if you’re listening to the show, and there’s someone non-technical who interacts with the data you’re producing, please take the time to help them, because it’s been hugely helpful for me.
Kostas Pardalis 53:15
Thank you so much, Eric. I really enjoyed doing it. So I’m happy to do it anytime.
Eric Dodds 53:21
Great. Well, Ruben. Thank you so much for joining us. And if people are interested in your book, where should they go to get more information on it?
Ruben Ugarte 53:28
Yeah, The Data Mirage is available everywhere you can buy books: Amazon, Barnes & Noble, Chapters if you’re in Canada, Google Play. You can also go to my website at RubenUgarte.com, where you’ll find links to the book, plus blog posts, videos, and other free resources.
Eric Dodds 53:46
Great. Well, thank you so much for joining us. And we’ll talk again soon.
Ruben Ugarte
Thank you for having me.
Eric Dodds
I really loved the part of the discussion where we were talking about the ways that the pandemic accelerated so many things. Obviously, it was a hugely tragic and challenging event in so many ways. In terms of data and digital transformation, though, it really forced a lot of companies to do a lot of different things to update the way they deal with data and create customer experiences. And as I was reflecting on the conversation you were having with Ruben about that, Kostas, which was really, really enjoyable to listen to, there’s this phrase that every company is becoming a software company, and I’m not going to trademark this, but in many ways, the pandemic forced every company to act like an e-commerce company in the way it deals with data. E-commerce is often on the sharp end of trying to figure out how to leverage data to grow and really drive the customer experience. So that was my big takeaway, and it’s something I’ll be thinking about in the upcoming week. What stuck out to you?
Kostas Pardalis 54:57
Oh, that’s a good point, Eric. For me, it’s the validation of a recurrent theme we see in our conversations, which is the relationship between data and people. Ruben said that a big part of his book is dedicated to how important people are when we’re trying to be a data-driven company, and how many things are missing there. And I think this is, as I said, another validation of the concept that data is not here to substitute for people, right? It’s here to be another tool for people. I know we’ve said that many times before, especially with guests coming from the ML or AI space. Even in the most advanced use cases, where everyone is afraid the AI overlords will come and take our jobs, what becomes more and more obvious in the end is that data is just another tool, right? It’s a tool that augments the capabilities humans have. And that happens at every stage and with almost every problem out there. We don’t only need to build new technologies; we also need to educate people on how to use these technologies if we want the technologies to succeed. So that’s what I take away from our discussion. And I’m looking forward to chatting with him again in the future.
Eric Dodds 56:23
Kostas, that was a very succinct and elegant summary of a philosophical perspective on data, and I always appreciate your ability to do that at the end. So if you want more concise philosophical predictions about the future of data from Kostas, and maybe me, join us on the next show. We have tons of exciting episodes coming up through the rest of the summer. Thanks for joining us.
Eric Dodds 56:54
We hope you enjoyed this episode of The Data Stack Show. Be sure to subscribe on your favorite podcast app to get notified about new episodes every week. We’d also love your feedback. You can email me, Eric Dodds, at Eric@datastackshow.com. The show is brought to you by RudderStack, the CDP for developers. Learn how to build a CDP on your data warehouse at RudderStack.com.