Episode 54:

The Center of the Modern Data Stack with Neil Rahilly of Mixpanel

September 22, 2021

This week on The Data Stack Show, Eric and Kostas are joined by Neil Rahilly, VP of product and design at Mixpanel. Rahilly has been with Mixpanel since 2012 when he started there as a software engineer. Their conversation dives into product analytics and takeaways from his time at Mixpanel.

Notes:

Highlights from this week’s conversation include:

Neil’s programming hobby turned into a career and how he cold-contacted Mixpanel for a job (2:28)
Lessons learned from nine years at Mixpanel (5:05)
Defining product analytics (8:06)
How Mixpanel has evolved into the product it is today (10:56)
The importance of Mixpanel’s real-time analysis (19:52)
Looking at Arb, Mixpanel’s own arbitrary segmentation database (23:44)
The business impact that the rise of the cloud data warehouse had on Mixpanel (34:56)
Sub-second latencies and real-time use cases (49:05)
Career advice from Neil (1:02:02)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Transcription:

Eric Dodds 00:06

Welcome to The Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You’ll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, the CDP for developers. You can learn more at RudderStack.com.

Eric Dodds 00:27

We’re back on The Data Stack Show. And today we’re going to talk with a company who has been in the data space for a decade. And that’s Mixpanel. And we’re going to talk with Neil Rahilly, who has done a variety of things there, but now works for the product and customer experience teams. And he has been at the company for almost a decade, which is going to be a really interesting conversation. And that’s what my burning question is about. When you think about Silicon Valley companies and startups, there tends to be a shorter tenure than a decade. And so I think his perspective on staying there for that long and seeing the market, the products and the company changes is going to be fascinating to hear about. How about you Kostas?

Kostas Pardalis 01:16

Yeah, I think I used Mixpanel for the first time in 2011 or 2012, something like that. So I’m super excited to hear from him about the evolution of the product this past decade. Many things have happened in terms of products around data. And yeah, I’d love to hear the stories from him about how the product’s changed, how the company’s changed, and all that stuff. And that’s one thing. The other thing is, he’s a very, very experienced person when it comes to products. And I want to see what kind of advice we can extract from him around product strategy and product execution.

Eric Dodds 01:58

Great. Well, let’s dive in and talk with Neil.

Kostas Pardalis 02:01

Let’s do it.

Eric Dodds 02:03

Welcome back to the show. Neil, we’re super excited to talk with you. You have a lot of experience at Mixpanel, and a lot of experience with data. And we are just super pumped to chat about all sorts of things with you.

Neil Rahilly 02:17

It’s great to be here. Love the show.

Eric Dodds 02:20

All right, well, give us your background. I would love to know kind of where you came from and what you do today at Mixpanel.

Neil Rahilly 02:28

So I joined Mixpanel about nine years ago. Prior to that I was actually in law school, and had just started programming as a hobby and because I was really interested in sort of creative change happening on the web with web 2.0. And it was the very beginning of mobile. And then I made this kind of sharp turn, where I decided this is something I actually really love. And so I reached out to a few companies at that time, you know, I was reading Hacker News all the time. And I’d actually started to use Mixpanel for an app that I had built, and really loved it. And then I know that Mixpanel initially just sort of dismissed my email, but eventually was searching enough for engineers that they were like, all right, go back to the inbound pile.

Eric Dodds 03:21

Such a good story!

Neil Rahilly 03:24

And so I came out and interviewed and, thankfully, got the job. And then for the first few years I was at Mixpanel, I worked as an engineer. And at that time, I was just really learning from the founders and learning from the CTO at the time, getting sort of a real education in software development, software engineering, and eventually started managing the infrastructure team and then became the director of infrastructure and then eventually the VP of engineering. And then two or three years ago switched over to lead product and design. And then more recently, I also have the support team and the customer success teams report to me, so we run product and design, support and customer success as one thing that’s really focused on the user experience, the customer experience, at Mixpanel. And I do that in a close partnership with the VP of engineering. And yeah, that’s what I do today.

Eric Dodds 04:29

An incredible journey. Okay. Lots of things to discuss. But I’d love to know, nine years at a formidable company like Mixpanel is incredible. Could you just give us a little look back at sort of the arc of your experience there. Is there anything that sticks out? Because a lot of times, people do a couple years of tenure and I think the perspective you have being at the same company for nine years is just really interesting. Is there anything that sticks out?

Neil Rahilly 05:05

Yeah, I mean, lots of things stick out, I think. And yeah, that’s been part of the great thing about the experience, that I’ve been part of Mixpanel going from six or seven people to hundreds of people. And I think that along the way, companies change a lot. And so I’ve had to be intentional about wanting to be adaptable enough to not get frustrated by change and embrace what the problems are and learn from those at each sort of stage.

Neil Rahilly 05:44

Some things have been consistent, though, I think. One is that it’s just always been a company that’s attracted a really eclectic and awesome group of people. And so I stick around for that. And then the other is that it attracts a really amazing group of customers. So we often are helping startups and so seeing so many interesting companies that many of which become huge companies, when they were just a handful of people and so our customers are often these really innovative, creative founders or teams doing new and interesting stuff. And I also think that Mixpanel is lucky, it’s at this sort of intersection of a bunch of really big trends. And so I still think it has a lot of unrealized potential left to make even more of an impact and be a bigger company than it is today.

Neil Rahilly 06:40

So for those reasons, I’ve just kind of stuck around. And some people definitely have the reaction of like, What? You stayed at the same company for nine years? But it’s been great. And I’ve learned a lot. So no regrets.

Eric Dodds 06:57

Yeah, I think that’s awesome. And I think just over a nine year period, just responding to changes in the market and changes in technology, going through those changes repeatedly, in the context of a startup is just hard. And you sort of have to rebuild things that you spent a lot of time building, a lot of times because of outside forces, right? And so that’s pretty hard. So good on you. That is really awesome.

Eric Dodds 07:25

Let’s step back a little bit. So Mixpanel is a product analytics tool. And one thing that we’ve been doing on the show is sort of revisiting the 101 definitions, but we did this with the data warehouse of all things recently, on an episode. And it’s really helpful for me, and I think our listeners as well, just to get a crisp definition of what a subject is that we all kind of think we know, in the back of our minds. And product analytics I think can mean a lot of different things to a lot of different people. So I’d just love to know from you… Could you just give us the 101? What is product analytics in five minutes?

Neil Rahilly 08:06

Sure, yeah. So product analytics, the purpose of the tool is to help product development teams, so engineers, product managers, designers, people who are making changes to the product every day and are trying to improve the product, to help them inform their decisions, which are primarily prioritization and design decisions, with data. And the primary input is user engagement data collected as an event stream, so all the interactions that your users are having with the product. And then it makes it really fast and easy to explore that data, and to visualize the types of metrics that are really useful for products. So growth metrics, retention metrics, conversion funnels. And we’re really trying to deliver a productivity gain to the product development team. So helping them figure out what are the most important problems to work on, to be able to measure how effective the solutions you’re building are, and actually catch when you make a mistake and actually make things worse for users. And to really understand all the changes that you’re making to the product to try to improve it: are they having the intended effect? Because you want to get out of this sort of mindset of shipping and celebrating shipping features and really start measuring what are the implications for users and the business, right? So you make changes to the product, but is that actually making users happier and making them come back more often and driving more revenue? And so that’s what product analytics is for.

Kostas Pardalis 10:08

Neil, I have a question for you. I was listening to the previous question that Eric asked you. And you were saying about the change that you have experienced in Mixpanel. You’ve been there, like for nine years, pretty sure that you have experienced a lot of change. And usually we, when we talk about companies and change, we focus more on the people, right, in the organization. But products also change. And they change a lot. And I think outside of like some very core people in the product teams or the leadership who stay, the rest are forgetting about that. So would you like to take us through the journey of the Mixpanel product in this past nine years, how it was at the beginning, and how it changed until it became the product that it is today.

Neil Rahilly 10:56

So Mixpanel started when Suhail, the founder, had been using something at Slide, which was a gaming company that was acquired by Google, and he had been an intern there. And they had really great analytics, and I only know this secondhand from him, but it let them sort of understand users in the game and how they were interacting with the games. And I think he had that sort of moment of, hey, this would be really broadly useful. And so he started building Mixpanel. And the early product really, it actually had very early on, I think in that first year or two, when it was really just the founders and one or two people … so Mixpanel was started in 2009. But pretty early on it had the key reports: event segmentation, which lets you sort of do general analysis of the event stream that’s coming in; funnels, to understand conversion; and retention, to understand retention. And then in the early running, I think the issue was, the thing about user engagement data is it tends to be one or two orders of magnitude bigger than, probably at least, your application data, right? So if you have an application, you’re storing, like, your users table. And let’s say you’ve got a million users, but then you want to start tracking every time a user does something, then you’re actually now gonna be collecting millions of data points per day. And so that rapidly becomes a really big data set. And then the thing about product analytics is that you want to collect all that data, but then you want to look back over it, and really slice and dice it kind of arbitrarily and decide, hey, I want to create a funnel that goes from signup to payment, or actually, now I want to make a thing that goes from signup to these three things, and then payment. And then the query workload is really complex and unpredictable. 
And so in that early time, I think the challenge was, the company went through a bunch of different options. I think initially it was on MySQL, and then there was some Mongo and Redis, and then tried building it on Cassandra, and kind of kept coming up against these walls from a performance and flexibility standpoint. And so then turned to building our own database, which we call Arb, which is an arbitrary name for arbitrary segmentation. And that had just shipped right around when I joined the company. And I think that was a real inflection point for Mixpanel. Because it really allowed the company to handle much, much higher scale data volumes and then also deliver this really, really flexible interface. And it really had that kind of magic feel at that time in sort of 2011-2012. And so Mixpanel just took off. And then I think another interesting moment is when mobile really started gaining steam. And I can think of a time when we went to OpenTable, for example, and this was like a small company at the time. And most companies were using Google Analytics or Adobe Analytics, Omniture. We’d go to pitch them on Mixpanel and they were sort of entrenched in those products, or would ask, how are you different from Google? Of course, we were different from Google, but it was sort of hard to explain. And as mobile came along, we said, well, what about mobile? And it was, oh, well, yeah, I think there’s like a couple people down the hall working on an iOS app or something, go talk to them. And then you’d ask them, hey, do you have analytics? And no. And because with Mixpanel, that event model I was talking about, there’s nothing platform specific about it, you can track an event from a mobile app just the same way you can track one from a website. And that was really a way for us to get a foothold in a lot of places. 
We sort of repositioned as mobile analytics, even though really we were completely sort of cross platform. We’ve since positioned back to just product analytics. But that gave us this kind of explosive growth with the iOS App Store opening up and just all those apps coming online.

Neil Rahilly 15:37

And then I think our product evolved a bit based on the fact that we had an SDK that was in basically all apps. And because of the scale we could handle, because of the interface we could build, we were at that point by far the most popular product analytics tool, and we got into all sorts of high growth apps. And so from there it’s like, okay, well, we’ve got this SDK, we should just add more value. And so why don’t we help our customers send push notifications, and why don’t we help them send surveys, and why don’t we help them run A/B tests? And so we offered more and more of these products, and it was cool, too, because you could kind of configure them in our website, and then boom, it would be on because our SDK was already installed in your app. And I think what happened there was that we inadvertently spread out into a lot of different use cases, different end users. We’d now sort of spread from just serving product teams and engineering teams trying to make product decisions, and now we’re helping marketing teams send notifications and surveys, and then we got into some more sort of data infrastructure type stuff. And we got spread a bit too thin. And we were in too many different types of products for too many different types of people, which made it hard for the company to focus and get to all the stuff that customers wanted across that wider surface area. And so the next phase was to really refocus and go back to our core, which is, as I described, product analytics and helping engineering, product, and design teams at leading digital companies and startups. And so that was a tough transition. Because of course, we had both people who had been working on those products and customers who love those products. And refocusing has been huge for us, because I mean, it’s that old saying, like just do one thing really, really well. And then you see it, like our NPS has, like, tripled and our customer retention has gone up by 35%. 
And, and so I think it was a good decision, but it was hard. And that’s probably the most dramatic part of that journey, at least in the last four or five years. And Arb, just on the infrastructure side, we’re on like version three of that. There’s kind of some interesting stories on the infrastructure side as well.

Kostas Pardalis 18:28

That’s super interesting. And actually, I think it’s amazing and very, very useful advice for everyone who is into product. And what I take from what you say like product management and product leadership in general, it’s not just about building, it’s also about making the decision to kill products, or features, right. And this is like a very important part of the work that product managers are doing or like product owners, like everyone who is involved in product, and we shouldn’t forget that. And I really appreciate that you’re sharing this with us.

Kostas Pardalis 19:04

I remember the first time that I used Mixpanel, I think it was in 2013 or something like that. I was impressed by the real time nature of the product. I remember for example, I was very interested to figure out what the users of my web application were doing. But when I figured out that not only can I see that but I can see it in real time. Like someone was doing something and I could see an event popping up there. And I was super impressed and especially when you’re at the beginning of a new company and a new product where you don’t have that much traffic. It’s an incredible feeling to see someone interacting with what you’re building, right? Yeah. How important is the real time nature of the product for Mixpanel?

Neil Rahilly 19:52

I think on the sort of ingestion side, like, that as events get sent, they immediately show up in the UX, I think it’s really important for sure in the case that you’re talking about when you’re a smaller company. And really you don’t have the kind of data volume or the user traffic to … you may as well look at what’s happening to every user that signs up because there’s not so many of them that you can’t. And I think that just from like a product experience standpoint, there’s like an emotional component there, that part of what product analytics tools do is they give people who are making digital products sort of visibility into what’s happening with what they’ve built. And so I say, like, it’s different from a restaurant. At a restaurant you make the food, you sit there and you watch people eat the food, and you can see if they liked it or not. Whereas you put an app in the App Store, without analytics, like you got no idea what’s happening. And so when you turn on a product analytics tool, it’s kind of a “sight to the blind” moment. And it’s thrilling, it’s thrilling to watch people all over the world signing up and using it and you think this thing that I just built at my desk is … there’s someone in Russia using it. And so I think it’s important for that. I also think it’s important for actually the implementation side, like so, when you are the one who’s connecting tools to Mixpanel or instrumenting your application or your servers and sending events to Mixpanel, we can really tighten that kind of feedback loop. If you curl an event to us, it immediately shows up, and you can see if it’s the way you want it. And then you can keep editing your code until you’re tracking things correctly. And so for that workflow, the real time ingestion is key.

Neil Rahilly 21:50

Once you’re a bigger app, or you’re making decisions about product features and that sort of thing, I think it starts to matter less, and actually being too impatient will start to work against you. You need to sort of let some data get collected and see some trends over time and then look at the sort of aggregate stats. And so in that case, you’re probably not making moment to moment decisions. Really, product feature prioritization and design decisions are not super urgent. Prioritization is probably happening weekly, monthly, quarterly kind of thing. And design decisions will be getting made every day. But you don’t need instantaneous ingestion latency, like real time. On the query side, I think it’s always super important that the queries are at an interactive speed, because that’s what encourages people to explore the data and makes getting answers really quick and easy. Which means people will do it because it’s not painful.

Kostas Pardalis 23:03

Absolutely, makes total sense. I think it’s a very important part of the experience that someone gets from Mixpanel, at least based on my experience working with it. You mentioned that the custom database that you built is called Arb, right? Can you tell us a little bit more about it? I mean, you mentioned that the existing storage engines and query engines out there couldn’t scale, so you decided to build your own database system, in a way. Like, what is it, first of all? Is it a database like Postgres, like MySQL? Or is it something else? Like, how does it look?

Neil Rahilly 23:44

Yeah, so. So I think Arb fundamentally is one of these, it’s more purpose built for our kind of workload, and that lets us make a bunch of trade offs. The key thing is that we know that the data is coming in as a user event stream. And so when you set up an instance for a customer in Arb, the event table is the sort of core table. And so we can make an assumption that there’s going to be a timestamp column and, it’s actually technically optional, but a user ID column. And I think the first thing to sort of understand about it is that it lets us partition the data by user ID and by time. So we can distribute the data across a lot of different shards, and those are distributed by the user ID, which in a typical application means it’s a pretty even distribution. And then it also lets us make this very critical assumption that all events for any given user are going to be in a single shard. And then second, with a timestamp, within the shards we can partition the data by time. And that really works for our query load. Because if you think about wanting to do sort of behavioral analytics, right, where you want to, say, look at a funnel, and you want to see what percentage of users went through step one, step two, step three, what you really need to do at the query level is go look at each individual user’s journey and see which steps each individual user made it through, and how many users made it to each step. And that means that we can do that completely distributed, like you can do that independently in parallel on each shard, because you know that all the steps for a given user are in that one shard. And then what you pass back up to the aggregation step is the aggregate from each shard, like this many users made it to step one, this many to step two, which is a very small piece of data to send over the wire. And if it weren’t that way, then you would have to scan every shard to put together each user’s history.
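
Editor's sketch (not from the conversation): the per-shard funnel idea Neil describes can be illustrated in a few lines of Python. This is only a toy model under stated assumptions, not how Arb is actually implemented. The names `NUM_SHARDS`, `shard_for`, `build_shards`, `funnel_on_shard`, and `funnel`, the CRC32 hash, and the sample events are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from zlib import crc32

NUM_SHARDS = 4  # hypothetical shard count, chosen only for the example


def shard_for(user_id):
    """Route a user to a shard by hashing the user ID (assumed scheme)."""
    return crc32(user_id.encode("utf-8")) % NUM_SHARDS


def build_shards(events):
    """Place every event in its user's shard, so one user never spans shards."""
    shards = [defaultdict(list) for _ in range(NUM_SHARDS)]
    for user_id, timestamp, name in events:
        shards[shard_for(user_id)][user_id].append((timestamp, name))
    return shards


def funnel_on_shard(shard, steps):
    """Count, within one shard, how many users reached each funnel step in order."""
    counts = [0] * len(steps)
    for user_events in shard.values():
        reached = 0
        for _, name in sorted(user_events):  # walk the user's journey by time
            if reached < len(steps) and name == steps[reached]:
                reached += 1
        for i in range(reached):
            counts[i] += 1
    return counts


def funnel(shards, steps):
    """Aggregation step: each shard sends back only a short list of counts."""
    per_shard = [funnel_on_shard(shard, steps) for shard in shards]
    return [sum(column) for column in zip(*per_shard)]


EVENTS = [
    ("alice", 1, "signup"), ("alice", 2, "view_item"), ("alice", 3, "payment"),
    ("bob", 1, "signup"), ("bob", 5, "view_item"),
    ("carol", 1, "view_item"), ("carol", 2, "signup"),  # viewed before signing up
    ("dave", 1, "payment"),                             # never signed up
]
STEPS = ["signup", "view_item", "payment"]

SHARDS = build_shards(EVENTS)
RESULT = funnel(SHARDS, STEPS)
print(RESULT)  # [3, 2, 1]
```

Because a user's whole history lives in one shard, the per-shard counts can simply be added together without double counting anyone, which is the property Neil points to.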

Neil Rahilly 26:13

And the same is true of retention, the same is true of a uniques query, if you’re just looking at totals and uniques kind of thing. The key is that by sharding on user ID, you can just kind of sum up the values from each shard, and you’re not double counting anybody. And then, of course, a lot of the time in our kind of analytics, you’re looking over the last 30 days, looking over the last week, and it’s not as often that people want to look back two or three years or whatever. So by being sharded by date, it means that we don’t need to process all the data each time. And then from there, the rest of the properties you send with the event, you can send whatever you like. And that’s actually a really nice thing about Arb, I think, is that it’s schema on read. So there’s this tension, I think, in data where you want to have governance, and you want to have schemas, and so the dream is to have like a tracking plan and a schema that’s tightly enforced and tightly managed, but it’s kind of at odds with, on the other hand, the delight of this user experience of just being able to write one line of code and start tracking your user events. And really just being able to get tracking going quickly and send more data when you need it without a lot of overhead of managing the schemas is actually like a feature in some ways. And so that it’s schema-less or schema on read allows us to be very user friendly in how you set up tracking. And then, as I said, Arb’s gone through a few iterations. The first one was to switch the storage format for each of those event files to be column oriented. And then the third was to separate compute and storage, which we did. We had this kind of circuitous journey, where we started in Rackspace in the cloud, had too many noisy neighbor type problems back in the day, switched to SoftLayer, where we were on our own hardware for a while. 
And then four or five years ago, switched to Google Cloud for, you know, developer productivity reasons. And when we made that switch to Google Cloud, we really re-architected Arb in a lot of ways so now that the storage and compute are completely separate and we’ve gotten rid of a lot of the like old scaling problems we used to have when we had sort of clusters running on our own hardware. I think, for our users, what’s happening as well, that’s really important is that we have 1000s and 1000s of cores deployed, and then that compute is shared. So you come to Mixpanel and you might do a query over a trillion events, right. And we might use 1000 cores to do that query. And you’re using them for three seconds and it’s pooled across all of our users. And so that allows us to also keep costs down and latencies low.
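
Editor's sketch (not from the conversation): two of the ideas above, partitioning by date so a query only touches the days it asks about, and schema on read so events can carry whatever properties the sender likes, can be shown with a small in-memory model. Everything here is an assumption for illustration: the `PARTITIONS` store, the `track` and `segment` functions, and the "(not set)" label are hypothetical and are not Mixpanel's API or Arb's storage format.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta

# Hypothetical store: one list of events per calendar day (the "partition").
PARTITIONS = defaultdict(list)


def track(day, user_id, name, **properties):
    """Accept any properties at write time; no schema is declared up front."""
    PARTITIONS[day].append(
        {"user_id": user_id, "event": name, "properties": properties}
    )


def segment(name, prop, start, end):
    """Count events by a property, reading only partitions in [start, end]."""
    counts = Counter()
    day = start
    while day <= end:
        for event in PARTITIONS.get(day, []):  # older days are never scanned
            if event["event"] == name:
                # The property is interpreted at query time (schema on read).
                counts[event["properties"].get(prop, "(not set)")] += 1
        day += timedelta(days=1)
    return dict(counts)


track(date(2021, 9, 1), "u1", "signup", plan="free")
track(date(2021, 9, 2), "u2", "signup", plan="pro", referrer="ad")
track(date(2021, 9, 2), "u3", "signup")                # no properties at all
track(date(2019, 1, 1), "u4", "signup", plan="free")   # outside the query window

LAST_WEEK = segment("signup", "plan", date(2021, 9, 1), date(2021, 9, 7))
print(LAST_WEEK)
```

The 2019 event sits in a partition the query never opens, and the event without a `plan` property is still counted, just under a placeholder bucket.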

Kostas Pardalis 29:24

Nice, nice. They’ve done some amazing engineering work behind the scenes at Mixpanel. So, so far, I mean, we started the conversation talking about Mixpanel delivering product analytics, and that’s like the definition of the product. And so far we have talked about innovation that has been done on a visualization level, or like at the point where the user interacts with the application. There is huge innovation that you’ve done on the storage layer. You pretty much had to build your own database system with, of course, your own assumptions there and trade offs to make it work. And you also have an ingestion layer, right? Like you can collect data. Let’s say from the data stack, we are talking about database systems, ingestion layers, processing engines. In the end, you had to build pretty much a whole dedicated data stack to drive the experience of Mixpanel. Does this sound right?

Neil Rahilly 30:35

Yeah, yeah. I mean, I think things have changed over the years too. Like, so for example, in 2011, when Arb was first conceived, maybe Redshift existed, I’m not sure if it did. Certainly BigQuery and Snowflake didn’t. And now, we do run benchmarks still, like just recently, in the last six months, and for our specific workload, Arb is still 10, 20, 100 times faster, cheaper than running it on like a generic data warehouse, or like a cloud data warehouse. But basically, we try to avoid the kind of not invented here syndrome, right? Like we’re not trying to build for our own sake. If things come into view so that we don’t need to build it ourselves, we adopt them, which is what we did when we moved from SoftLayer back to the cloud. But yes, same on the ingestion side. We have a bunch of servers kind of at the edge around the world, so there’s low latency from client devices to Mixpanel, and then those all independently queue data. And then that’s ingested, we use Kafka, into our core data centers, one of which is in the US, and one of which is in the EU. And so yeah, we’ve built the full kind of event client data collection, ingestion system, and our own storage layer, our own query system, and the UI. So we’ve been trying to leverage services in the cloud where they make sense for us. So we’ve been using Spanner for things and some of the other tools that Google has, but for the most part, the core system is still custom built.

Kostas Pardalis 32:37

Do you think if you started building Mixpanel today that this would have changed? Do you think that you would be using some of the technologies that exist right now, that are out there for each one of these layers that we talked about?

Neil Rahilly 32:53

Yeah, probably. I mean, as I said, in terms of the storage and query layer, I think that the cloud data warehouses are really great. And if you’re really just a small startup, you probably just start there. At the same time, for the kind of scale a lot of our customers see, they still can’t do interactive queries over the types of workloads that we see. So for now we continue to invest in Arb on that basis. Ingestion, again, it’s like you were saying earlier, part of product is deprecating things. And also part of product is recognizing when things that you chose to build, you can now use off the shelf stuff to replace them with, and you can get leverage from that. So I’m always looking for that. Again, the issue on the ingestion side is, I don’t see some off the shelf thing that actually has all the features and a sort of schema-less approach like we have, and if we did, we’d use it. But for now, I think the things we do have are things that we need to build ourselves to deliver the experience that we want to give our users.

Kostas Pardalis 34:06

Yeah. You mentioned data warehouses a couple of times. I mean, Mixpanel has been around for quite a while. And I think at the beginning, there was just Redshift around, and probably Redshift wasn’t that popular yet. We hadn’t entered the era of the cloud data warehouse yet. When this happened, the explosion of users of Redshift, and then Snowflake and BigQuery, how did this affect Mixpanel? Both from a business perspective, like what was the experience that you had? And also from a product perspective, like, did it shift things around your perception of the product, like if something changed there because you saw how things could work with a data warehouse or how customers were interacting with it?

Neil Rahilly 34:56

Yeah, so I think cloud data warehouses and then kind of the associated rise of the data engineer and analytics engineers, and now everything that’s going on with dbt and reverse ETL, is just basically this whole kind of modern data stack movement. I think it’s just like a seismic shift in the world of data. And it certainly affected us. I think one thing that we saw was, we would have larger customers … It’s funny, if I go back far enough, before Redshift and those types of tools existed, we had larger customers, and if they said, hey, we have some engineers, we think we’re going to take this stuff in house, they’d often come back a year or so later and be like, woof, that was harder than we expected. But once cloud data warehouses were available, then, you know, it became much more viable to build really big event ingestion systems and storage and query systems for this kind of data. But then we saw people coming back again, and for a different reason. And the reason was more of the UX, the sort of end user experience, right? For the teams that were product managers, product designers, engineers, the experience of doing exploratory analysis, setting up a funnel, understanding time to convert, understanding nuances of retention, being able to segment that by preceding user behavior, that stuff was basically either extremely difficult and slow or impossible for them to achieve in like a BI tool or in writing SQL. And so they become reliant on analysts to do it for them. But even that is kind of slow, like the equivalent of what’s a few clicks, a simple Mixpanel funnel, might actually be like 300 or 400 lines of SQL. And even if you can do that, doing it in a sort of consistent way and doing it in a way that enables it to be self-serve for everyone, and lets people really interact with it in this very sort of exploratory iterative way, teams really missed that. 
And so I think it’s a case where you have this sort of workflow, vertical kind of product development, specific types of analysis and types of questions. And even going back to the kind of experience you were talking about, just being able to, from all of those aggregate reports, be able to drill all the way down to the granular user experience that’s underlying those reports to really understand the causes of things, that was missing.
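As an illustration of the funnel analysis Neil describes (not something shown in the episode), here is a minimal Python sketch of an ordered funnel count over a raw event stream. The event names, the tuple layout, and the `funnel_counts` helper are all hypothetical, and a real product analytics tool would also handle conversion windows, segmentation, and far larger data volumes.

```python
from collections import defaultdict


def funnel_counts(events, steps):
    """Count how many users reach each funnel step, in order.

    events: iterable of (user_id, event_name, timestamp) tuples (assumed layout).
    steps: ordered list of event names that define the funnel.
    """
    # Group each user's events so they can be replayed in time order.
    by_user = defaultdict(list)
    for user_id, name, ts in events:
        by_user[user_id].append((ts, name))

    counts = [0] * len(steps)
    for user_events in by_user.values():
        user_events.sort()  # order by timestamp
        next_step = 0
        for _, name in user_events:
            # A user advances only when they perform the next expected step.
            if next_step < len(steps) and name == steps[next_step]:
                counts[next_step] += 1
                next_step += 1
    return counts


# Hypothetical sample data for illustration only.
sample_events = [
    ("u1", "signup", 1), ("u1", "add_to_cart", 2), ("u1", "purchase", 3),
    ("u2", "signup", 1), ("u2", "purchase", 2),
    ("u3", "add_to_cart", 1), ("u3", "signup", 2), ("u3", "add_to_cart", 5),
]
sample_steps = ["signup", "add_to_cart", "purchase"]
print(funnel_counts(sample_events, sample_steps))
```

Even this toy version needs per-user ordering and step tracking, which hints at why the same logic expressed in SQL grows long once time windows and segmentation are added.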

Neil Rahilly 37:37

On the other hand, the data in the cloud data warehouse was, and this is a big motivator, I think, for companies moving there, richer, right? You don’t just have the user event stream from your product, but you also have your billing data, and you have your support ticket data, and you have your CRM data, and so you have all this additional rich information that you can join to more deeply understand the user experience. And it’s also better governed, because usually, there’s a data engineering team that’s really focused on that and focused on producing these cleaned up, reliable, simplified tables for driving analysis or automation in the company. And I mentioned DBT earlier, but this is more and more trying to bring these sorts of software development kinds of practices like testing and so on to data. And so I think that, for us now, what we’ve been really working on for a year or so is what we internally call modern data stack compatibility. And this is trying to bring product analytics to the modern data stack. How do we marry those two things? How do we give product teams the UI that they love, but point that at the data that they trust, and that’s rich, and that’s in the cloud data warehouse?

Neil Rahilly 39:13

And for us, what that’s been is, talking about the sort of hub and spoke model where you put the cloud data warehouse in the hub, and then you have all your other tools in your company as spokes, and you pull the data in from them. And then you can push out this sort of aggregated data to make all those tools more useful. And I think for product analytics, the approach we’re taking is to really be a great spoke in that modern data stack. So if you’ve collected events to Mixpanel, they can be ETLed really easily into your cloud data warehouse. That’s been true forever. And then now what we’re really doing is the other direction, where you can use reverse ETL to push dimension data like users and accounts tables into Mixpanel to join with your event stream. And also modeled events, where there are events that are occurring in other systems that you model as a table in your data warehouse, and then you can pull that into Mixpanel and have that be available in your analysis in Mixpanel. And I think it’s a case where, when I go back to those times where we would see customers churn to a cloud data warehouse, and we felt a bit threatened by it, I now see it’s this amazing tailwind for us, because if you stand back a little further, there’s just this huge movement going on, that’s been enabled by cloud data warehouses, to really invest in centralizing and validating and cleaning and joining companies’ data. And as an analysis tool, data is our input, our grist for the mill. So the better and richer and more trusted the data that gets loaded into Mixpanel, the more value we can create for our customers. And so I’m really excited about the whole modern data stack and cloud data warehouse change. But it’s certainly like a big shift in our product strategy, and sort of how we think about where we sit in the stack.
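To make the idea of joining warehouse dimension data with an event stream concrete, here is a small Python sketch. It is an illustration under assumed shapes, not Mixpanel’s API: the `enrich_events` helper, the `user_id` key, and the `user_` prefix are all hypothetical.

```python
def enrich_events(events, users):
    """Attach user dimension attributes to each event by user_id.

    events: list of dicts, each with a "user_id" key (assumed shape).
    users: dict mapping user_id -> dict of attributes from a warehouse table.
    """
    enriched = []
    for event in events:
        profile = users.get(event["user_id"], {})
        # Prefix dimension columns so they cannot collide with event properties.
        extra = {f"user_{key}": value for key, value in profile.items()}
        enriched.append({**event, **extra})
    return enriched


# Hypothetical sample rows for illustration only.
sample_events = [
    {"user_id": "u1", "name": "purchase"},
    {"user_id": "u2", "name": "signup"},
]
sample_users = {"u1": {"plan": "pro"}}
print(enrich_events(sample_events, sample_users))
```

Events for users missing from the dimension table pass through unchanged, which mirrors a left join from the event stream to the users table.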

Eric Dodds 41:30

Yeah, I love this subject. And I’d like to dig a little bit deeper into it. And one thing we’ve talked about before is that when you think about the value of a product analytics tool, and you can really apply this to any team, right? So marketing analytics, or analytics around customer success. The point solution is really beneficial for that team. Because to your point about what you’re doing at Mixpanel, you can really focus on helping a very specific set of users of your product accomplish a very specific set of things with your tool. And that creates a really good experience. And I think one thing that’s exciting about the cloud data warehouse, and what you just talked about, is that value in a point solution, especially in the analytics space, can often be trapped in that tool, right? Where it’s like, well, I uncovered this great insight, and then you run into this challenge of like, okay, how do I take that, and then operationalize that insight across these other pieces of the stack? And so two questions for you, Neil. First of all, the cloud data warehouse, for sure, for all the reasons that you mentioned, I think is really exciting. I think one of the current limitations in some ways, which will lead to the second question, is that it is hard to create that loop in real time, from a technical standpoint and a cost standpoint. You can create this amazing feedback loop, but it’s pretty expensive. And you have to sort of piece together several pieces, pipelines, in order to create that feedback loop. So first, when you think about the real time use cases where maybe routing something through the data warehouse loop doesn’t make sense, how do you view the role of a product analytics tool like Mixpanel, sort of serving those? And so one, just off the cuff example would be your product team uncovers insights around churn. 
And so you need to sort of enroll those people in like a win back campaign of some sort that’s happening in a completely separate tool, and maybe that’s run by marketing or some other team. How do you view that piece of the architecture?

Neil Rahilly 44:01

Yeah, I think we’re in this kind of transitional, or I don’t know how long it persists, sort of mode, though, where there are really two ways you can come at that. I mean, one is that we do have integrations directly to marketing tools. And so the cohorts that you create in Mixpanel, you can hook up to those tools and push them there and act on them, just kind of a direct point to point integration between Mixpanel and the marketing tool, and that’s kind of the way it’s been for a long time. The other way is that you pull those things, so you pull the tables, like the events or the cohorts that you’ve built in Mixpanel. You pull those tables into your data warehouse, right, and you use a reverse ETL tool to push them on to that marketing tool from there. And in the first case, it can be lower latency. I think with many of the tools, we can push those cohorts like every 15 minutes. And on the other hand, I think that if they’re going more places, at some point, it makes more sense. I think, if you really are going to do this kind of hub and spoke modern data stack architecture, what I’m seeing is more and more where it makes sense for things that you want to federate everywhere for those to belong in the hub, and you’re probably going to find that more manageable and less error prone versus having like a kind of … it’s great to have one point to point connection between two tools, it’s going to work great, but like once you’ve got 25 tools, and they all have point to point connections, it can start to become really messy and hard to kind of reason through. So if I were building my own stack kind of thing from scratch, like I would probably just say, okay, I’m gonna have my data warehouse be at the center of things. And I’ll pull data from all my different tools and tables from all my different tools, and then have it managed there with like, DBT or something, and then push it back out with reverse ETL.
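The scaling argument here can be shown with a little arithmetic. The Python sketch below compares the worst case, where every pair of tools has its own integration, with a hub and spoke layout where each tool connects only to the warehouse. The function names are hypothetical, and real stacks usually sit somewhere in between, since not every pair of tools is actually connected.

```python
def point_to_point_links(n_tools):
    """Upper bound on integrations if every pair of tools is connected directly."""
    return n_tools * (n_tools - 1) // 2


def hub_and_spoke_links(n_tools):
    """Integrations needed when each tool connects only to a central warehouse."""
    return n_tools


# Compare the two layouts for a few stack sizes, including the 25 tools mentioned.
for n in (2, 5, 25):
    print(n, point_to_point_links(n), hub_and_spoke_links(n))
```

With 25 tools, the fully connected case has 300 possible pairwise integrations to reason about, against 25 connections to the hub.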

Eric Dodds 46:22

Yeah, it makes total sense. And it’s interesting. One thing we’ve talked about before is this daisy chain problem. And it’s interesting because the daisy chain problem in the boom of sort of marketing technology tools, or whatever it is like oh, direct integration with like HubSpot and Salesforce, and like Marketo, and Salesforce, and then you kind of like you would daisy chain, right, where it’s like, okay, Salesforce connected to Marketo, Marketo is connected to whatever other thing like ad platform or whatever it’s doing analytics tool, then you sort of create this daisy chain. And then it was like, Okay, well now like we’re collecting sort of the raw data. But then when you think about analytics, or tools, or sort of behavior-based tools that send the behavioral data, it’s like, Yeah, well, you’re sort of getting the raw behavior. And you’re sending that, but then you run into another daisy chain problem, which, for sort of the data engineering side of things, is pretty challenging.

Eric Dodds 47:20

Okay, second, second question is a follow on to that. And I’m gonna flavor this with my own, my own take on it a little bit. But so you see this trend, especially as it relates to the cloud data warehouse, where things are getting cheaper, and faster, right? So you can sort of imagine this future world, imagine a world where you remove all the artificial scarcity from the equation around moving data, and you sort of have just the lowest cost ability to move data wherever you want, right? Who knows if we actually get to that point. But in terms of, because you do this every day, and you’re thinking about these architectures, when do you think we’ll hit a point where the latencies, around the product analytics tool like Mixpanel, where you can complete the loop of saying, let’s dump a cohort into the warehouse, run processes on it? And then get it back out of the warehouse? When do you think that loop will make the real time point to point connection sort of unnecessary? And I’m interested enough from your perspective, like, obviously, with what I work with every day, I think it’s a very relevant topic, for me, but you in many ways, if you’re building for an architecture like that, you have to think about when those latencies are going to drop. So I just love to know, when do you think we’ll see a world where the point to point solution isn’t necessary because you can complete the loop as fast as you want at a reasonable cost on the warehouse?

Neil Rahilly 49:05

That’s a good question. I think the reality of these connections right now is that they’re not like real time in the sense that they’re like event streams, right? Like, we’re not collecting the event streams and then streaming them to engagement marketing tools in real time. It’s still a batch system where it’s just like on some schedule, it’s like pushing, here are the latest users in this targeted cohort, to whatever marketing tool it is. And there’s no real reason that it can’t be just as fast to like, push that same thing to the warehouse. I think at scale where it can get … the thing is, right now most of those connections are pretty naive, like they just re-compute the whole cohort and push the whole cohort, and of course, past a certain scale, you have some big gains by pushing like a delta of some kind. But today it’d be fine. From our perspective, if whatever ETL service was hitting Mixpanel could hit us at that same interval, it would be no different than the way that we kind of hit ourselves to push cohorts to downstream tools.
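The difference between re-pushing a whole cohort and pushing a delta can be sketched in a few lines of Python. This is an illustration only: the `cohort_delta` helper and the user IDs are hypothetical, and nothing here describes how Mixpanel’s cohort syncs are actually implemented.

```python
def cohort_delta(previous, current):
    """Return the membership changes between two cohort snapshots.

    previous, current: iterables of user IDs from consecutive scheduled runs.
    """
    previous, current = set(previous), set(current)
    return {
        "added": sorted(current - previous),    # users who entered the cohort
        "removed": sorted(previous - current),  # users who left the cohort
    }


# Hypothetical snapshots from two consecutive sync runs.
last_run = ["u1", "u2", "u3"]
this_run = ["u2", "u3", "u4"]
print(cohort_delta(last_run, this_run))
```

A naive sync would send all of `this_run` on every schedule tick. The delta version sends only the two changed memberships, which matters more as cohorts grow.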

Eric Dodds 50:22

Yeah, that’s super interesting. And I think one thing, hearing you say that, that I’ve thought about a lot recently is when you say real time, and you actually drill down into that with a company who’s trying to do real time, there are absolutely mission critical things that you do want to do in real time or near real time, right? Where a user does x, and you want y to happen immediately. So for example, a new user completes the signup process for an app and you want to send them a congratulations message or push notification or something, right, like sure. I mean, of course, you want them to have that gratifying experience, that’s actually part of the onboarding flow. But when you think about cohorts, I mean, there aren’t a lot of situations where … well, I say that, but I’m sure I could conceive of many … where, like, immediate sub-second latency is an absolute requirement. Right?

Neil Rahilly 51:36

Yeah, I would say that there are lots of ways that you can use a cohort to simulate a transactional process. But you’re usually better off just using a transactional process, like having your signup code just trigger an email to SendGrid or whatever. That’s, yeah, like a lot simpler than this kind of roundabout thing of like going into a product analytics tool and setting up a cohort of users who’ve signed up in the last five minutes and then pushing that to like an engagement marketing tool to send an email. It’s just like a lot of, like you call it, sort of daisy chaining and additional complexity. I think you can kind of forget sometimes like, oh, hey, we also do have our whole transactional system for doing transactions. And once you remove all those use cases, then yes, by definition, you’re only looking at stuff that, usually, you can’t do in a transaction, or not as easily, where it’s like, well, users who did this, and then this, and were in that state in the last five days, and then did this other thing. Yes. And in those cases, they tend not to be so urgent. And we’ve seen that especially when we’ve had that messaging tool, where from a lot of customers we’d get those kinds of, hey, could this happen at lower and lower and lower latencies? And then when you would dig into the problem they were trying to solve, it was stuff like you just mentioned, like we want to send an email when someone signs up, and then it’s just easier to just put that in your code in the application.

Eric Dodds 53:16

Yeah, it’s interesting, I think, once you get outside of the sort of mission critical, sub second latency, whatever you want to call it, actual real time, real time use cases. When you start to get outside of that, in the customer journey, even companies that really large scale, you’re in the realm of testing, right, and, and so like building cohorts and sort of conceiving of like, different ways that the customer journey can happen, that they’re dependent on different user behaviors. And often it gets a lot more complex, because you’re looking at combinations of user behaviors that don’t happen back to back, like in a linear fashion, always. And in that context, you’re really looking at, okay, how do I build like a … the hard part is actually building the list of users that you want to test something against, right? And it’s more about the cohort, and combining those behaviors or traits that sort of build that cohort, and then saying, okay, how do I operationalize that and sort of enter these people into an alternate customer journey that I can test?

Neil Rahilly 54:22

Yeah, this conversation is making me think of two … I think there are two competing conceptions of the data stack. I’d say one was kind of like, call it sort of the engagement stack or the growth stack. That was sort of like a CDP plus a product analytics tool plus a bunch of engagement type tools, and that was this sort of all real time event collection, and sort of engagement tool type stuff with the feedback loop between you send this message and then you see, like, what happened as a consequence. And then there’s actually kind of a competing stack, the modern data stack built around the cloud data warehouse. And you see that within a lot of companies, that they literally have both stacks, and they’re kind of siloed from one another, for the most part. And I think for a long time at Mixpanel, our conception of how things would go was, like, great product analytics is going to be at the center of a company’s data stack and be the brain for every user interaction that happens in a company. And I think that that’s just proving not to be the right way to do things, that product analytics tools are really, really awesome at answering questions for the product team, about their users, and their experience in the product. But they’re not nearly as capable as a cloud data warehouse and the tools that are being built around them for being how you centralize and govern and route data at your company. And so I think that that’s where, like, even as we’re talking about these use cases, they’re more marketing, right, like they’re for, how do we build cohorts to send campaigns to? Yeah, and actually, once you kind of revert back to, hey, the cloud data warehouse is going to kind of be the center in the modern data stack. And if you go into a lot of these marketing and automation tools, they have incredible cohort builders, and they can take event stream data and do targeted notifications based on that. 
And you don’t really need to use your product analytics tool to do that. And there’s actually some simplicity in that: you can use your marketing tools to do marketing, use product tools to do the product work. And this conversation is actually very representative of how, when I was earlier talking about how we got spread out into a lot of different use cases, we were spending a lot of time trying to figure out how to move cohorts here and there so that people can run a campaign on them.

Neil Rahilly 57:13

And now it’s like, it’s been this very liberating thing to be like, that’s not really what we do. We answer product questions for the product team. Yeah. And there are awesome marketing automation tools out there that we partner with. And I think one of the cool things is that the modern data stack makes this kind of potential for this standardized way for all these tools to communicate, which is like ETL into the warehouse and reverse ETL back out. And as long as you support that, then you can kind of be integrated into anything in that clearing house.

Eric Dodds 57:52

Yeah, I think I have two comments on that. One, I totally agree. I think the sort of liberating individual tools to be the best at what they do, because you have the cloud data warehouse at the center, makes total sense. The follow on question, which is definitely for another episode, because we’re coming up on time, is that then you get to the question of data governance, right? Because you’re sort of like building cohorts in separate tools. And there’s crossover, like, is it the same data or whatever. That’s a whole ‘nother discussion, super interesting, where the space is going on that front.

Eric Dodds 58:28

But the second comment is, the idea that you can have sort of known models around data from different tools in your cloud data warehouse is really interesting, right? Where you say, okay, we have the, you know, product analytics from Mixpanel. We have marketing, website analytics, from whatever it is Google Analytics, we have engagement analytics from the marketing automation tool, and maybe lead to a sales outreach tool. And as sort of data models, if data models find some level of conformity, or people just crank out a bunch of DBT models to sort of figure all that out. You can almost conceive this world where like, Okay, I’m starting a company. And I just kind of already know what things need to look like in Snowflake or BigQuery when I spin it up, to sort of serve all of these different use cases, which is really interesting, I think, would just be just thinking about the experiences that I’ve had in the past like, man, that would be so nice. It just saves so much work in so much time across teams to have that approach from the start.

Neil Rahilly 59:44

Yeah, yeah. Yeah, it would be. I think then all the different tools could make more assumptions about the data and do more for the user automatically. Which would be great. This is like a perennial conversation for us. It is sort of like, can we kind of standardize the sort of tracking plans? And sometimes it’s sort of like, can we do that by vertical, like, if you’re an e-commerce company, you just need to track these things. And then we can build all your reporting out of the box.

Neil Rahilly 1:00:19

One thing you do come up against, I’ve found, it’s just like, it’s amazing how much, like, companies are kind of snowflakes. Every company is different in important ways, even within the same industry. And so that standardization has been hard to define, and in some ways, like having to think through what data is important to your business, and what events are important, and what properties are important, and what are the right KPIs, a side effect of doing that planning is really more deeply understanding your business and how to measure it and what the goals are and aligning around that. And so it’s also sort of like a useful process to go through anyway. But I genuinely don’t know which way it will go, whether there’ll just be kind of like a standard data operating system for running companies, or it’ll continue to be just kind of freeform and each company defines its own.

Eric Dodds 1:01:27

Well, it’ll be fun to have a front row seat. As all this unfolds, we’re right at time, Neil. But just one more question. Thinking about our audience, you have played so many different roles at Mixpanel. And I’m wondering if you have any career advice, a lot of our listeners out there across the spectrum, some people starting out in their careers sort of working with data or, or engineering roles that are close to data, and some people who have been doing this for a really long time, like you, but any career advice that you could give our audience before we, before we hop off?

Neil Rahilly 1:02:02

That’s a good question. I think that specifically for data work, I think, you know, you can only really … We see this with our customers as well, that there’s a sort of instinct to sort of begin with the data collection and the sort of technology side of things. But I think that, like, I think this was an Apple thing, that you have to start with the user problem and then work your way to a product that would solve that problem. And then the technology that makes that product possible. You can’t just go build a technology and then search for what product you might make with it, and then hope that there’s someone who needs that product. And I think it’s the same with data. You need to really understand the business that you’re in and the product or service that you’re providing and the user experience, and understand what are the most meaningful questions, and what are the most important things to have visibility on, and then work your way backwards from there to the systems to provide that. And in that, I’d say there’s, I think it’s Gall’s law. It’s like, all complex systems that work came from simple systems that work. Keep it simple. And just get something working end to end for the most important metrics and to answer the most important question. And I think that that’s going to be what in the end is going to allow you to have the most positive effect on the company that you’re working at. And that’s what really drives careers forward in the end, in the long term, I think. It is knowing what the most important things are to be done and actually getting them done.

Eric Dodds 1:04:05

And that is really good advice for everyone, no matter where you are. Well said, Neil, well said, and appreciate that insight. Well, we are at time. And this has been an incredible conversation, Neil. Loved learning all about Mixpanel from a technical side and a product side and really appreciate you taking the time to join us on the show.

Neil Rahilly 1:04:30

Yeah, this is great. Thank you so much for having me. Really appreciate it.

Eric Dodds 1:04:35

My big takeaway is kind of thinking through a summary of all the things that Neil talked about, and I think he just has such a mature perspective across a number of subjects. So number one, he reiterated a lot of things we’ve heard from other really good experienced engineers around keeping it simple. How do you make the decision to build something yourself or buy a service? Just a very mature perspective on that, a very mature perspective on focusing the product that you build for a specific purpose for a specific set of users. And also a mature perspective on adapting to what are sort of clear sea changes in the way that people are using your tool relative to other tools. And so I just appreciated it. I think all of his experience and the fact that he’s just a really smart guy, it was just really fun to hear him talk with a lot of authority across a variety of disciplines.

Kostas Pardalis 1:05:39

I think the most interesting part of the conversation, and there are two things that I’m going to keep from this conversation about product. One is the focus. You also mentioned this, Eric, focus is super, super important. And I mean, I think Neil is one of the best authorities to talk about this. I think the examples from Mixpanel were amazing about what focus can do or what lack of focus can do to a company.

Eric Dodds 1:06:04

Yeah like the NPS score and retention numbers were crazy. And just as a consequence, I mean, not simply to change your product focus, but those are numbers that tons of companies are striving for.

Kostas Pardalis 1:06:20

Yeah, yeah. And the other thing is that product as a function is not there just to build, it’s also there to destroy in a way, right? Like, everything that we build at the end is an experiment, and we have to treat it like this. And when something in the product does not work, we have to get rid of it. It’s not easy, because the mentality of people who work in product is all about building and executing and introducing new features. But sometimes it’s even more important to decommission a feature than to build a new one on top of that. And that’s my other hard learned lesson, that I think everyone who has tried to build, especially people who have been around a product from the beginning until it matures, I think they will say that this is super important. The other thing that I found super fascinating was all the discussion around building a custom processing and storage solution for their specific use case, which I think is super interesting. And it’s something that, if not now, in a couple of years, we will be hearing more and more about, especially as all these platforms like Snowflake and BigQuery are trying to move away from being a data warehouse and actually become a platform where data applications can be built. And this is going to have some very interesting challenges on the technical level. And I think what Neil was describing is like a small glimpse of the future that we’re going to see. Now it remains to be seen how it’s going to materialize, but we’ll see.

Eric Dodds 1:07:56

Yeah, it will be really fun. Well, that was a great conversation. Many more great episodes lined up for you this end of the summer and going into the fall. So thank you Brooks for keeping an amazing list of guests on the roster. And we will catch you on the next show.

Eric Dodds 1:08:15

We hope you enjoyed this episode of The Data Stack Show. Be sure to subscribe on your favorite podcast app to get notified about new episodes every week. We’d also love your feedback. You can email me, Eric Dodds, at Eric@datastackshow.com. The show is brought to you by RudderStack, the CDP for developers. Learn how to build a CDP on your data warehouse at rudderstack.com.
