This week on The Data Stack Show, Eric and Kostas chat with Vijay Ganesan, Co-Founder and CEO at NetSpring.io. During the episode, Vijay discusses product analytics, the maturation of BI, data warehousing instead of time-series databases, event-based data, and more.
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Eric Dodds 00:05
Welcome to The Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You’ll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, the CDP for developers. You can learn more at RudderStack.com. Welcome to The Data Stack Show. Today we are going to talk with Vijay, who was one of the founders of ThoughtSpot, a hugely influential company in the world of BI that came of age, in many ways, along with Looker. It served a somewhat different audience, Kostas, but obviously he knows what he’s talking about when it comes to analytics. And amazingly, he started another analytics company, which is fascinating. We’ve talked to a couple of people now who came from the world of decade-defining analytics and have gone on to start subsequent analytics companies. So part of what I want to ask is why. What’s the motivation behind that? Obviously, there are still major challenges to be solved and major opportunities to take advantage of. That’s just fascinating, so that’s what I’m gonna ask about.
Kostas Pardalis 01:24
I want to talk with him about product analytics specifically, because the new company is about product analytics, right? It’s an interesting breed of analytics. There are already products out there, so I’m 150% sure he has some very good reasons to start a product analytics company today. Something has probably changed that makes today a good time to do that. It’s super interesting to hear from him what those reasons are, what the differences are from the previous wave of product analytics tools, and what the opportunity is, because the opportunity also corresponds to a need. So let’s see what made him start a product analytics company.
Eric Dodds 02:19
Yeah, I agree. And we need to figure out how NetSpring works under the hood, of course, because it sits on top of the warehouse. So let’s dig in and find out. Vijay, welcome to the show. We are so excited to chat with you.
Vijay Ganesan
Thank you for having me. I’m excited, too.
Eric Dodds
All right. Well, give us your background.
Vijay Ganesan 02:37
Yeah. I’m the co-founder and CEO of NetSpring. We are an early-stage startup in the product analytics space. Before this, my co-founders and I started a company called ThoughtSpot, which is now a leader in the business intelligence space. Prior to that, I spent some years at Oracle and PeopleSoft working on business intelligence and analytics systems. So my DNA is building world-class data analytics products.
Eric Dodds 03:09
There are so many things to dive into. ThoughtSpot has been such an influential company in the world of BI. Can you give us a brief story of the founding? Were you at Oracle when the idea to found it came about?
Vijay Ganesan 03:28
Yeah, that was at Oracle. Oracle was part of the first generation of what people called big BI, right? These were very large, very complex, centralized BI environments where you bring all the data into a central repository, and then you have these armies of people, a very large centralized team, building very complex analytics for the business. So that was the first wave of BI. Then there was the second generation of systems, what was called departmental BI, where people said, you know, these centralized, complex, large systems are too painful. I’m gonna buy a desktop license of Tableau, somebody writes some SQL and pulls some data out of Teradata, I get it on my desktop, I get reports going, and I don’t care about the central team, right? And that was a lot of value. It’s something we used to play down, but it was actually of huge value, what these companies, the Tableaus and the Qliks, brought with their departmental solutions: simple, easy, and very business-user friendly.
Eric Dodds 04:35
That’s almost like Salesforce, right? Sort of the “no software” idea, where the user can access it directly.
Vijay Ganesan 04:42
Well, when they started, it was all Windows desktop installs and so on, so it wasn’t SaaS yet. But the thing was, it was easy. A departmental person didn’t have to depend on ETL teams and BI teams for any of that, right? All they needed to say was, just dump the data I need out of your central system, and I can then do my own thing. So that was a huge value, and the next generation of BI was bought by these folks. That was the second wave. So you saw the problem: on one side, you had these very complex, centralized systems that were highly scalable, very performant, and highly sophisticated; you could do some incredibly complex analytics, but it took months. On the other side, you had these very easy-to-use, quick-to-get-started, visually appealing tools. That was the second generation, right? So when we started ThoughtSpot, we said, look, why can’t you have the best of both worlds? Why can’t you have enterprise-class systems that are centrally managed, but also make it very easy for business folks to get to the data and build analytics with it? Couple that with the idea that, hey, we use search for everything in life, so why not for data? That’s when we hit upon the idea that every business user simply has a search bar, like Google. They ask a question of centralized, global data, they get a report, and they’re done. They don’t have to install anything on their desktop or build a departmental Tableau setup. Now you’ve got the best of both worlds: enterprise-class scale and performance, plus self-service for business folks. And so that’s really the third generation of BI that we were ushering in.
Eric Dodds 06:25
Yeah, fascinating. I spoke too early: that’s really Salesforce once it became SaaS and the end user could truly access it. Now, when we were talking before the show, you made a statement that I thought was so interesting. Back in, I guess it was 2012 or around that time, when you founded ThoughtSpot, which was a really interesting time, by the way, because you had data warehouses emerging and a lot of things happening that were nascent, you described BI as mature: BI was mature when you founded ThoughtSpot. The more I thought about that, the more I thought some people might not disagree with it, but would be surprised by it, probably people who weren’t doing analytics at a large scale back then. Can you describe that a little more? What does mature BI look like?
Vijay Ganesan 07:24
You know, when I say BI was mature when we started, I mean it was mature in terms of the kinds of analytics you could do in these systems. In other words, any kind of analytics anybody wanted, you could do in these traditional systems, right? But it was very painful; it took weeks and months. So what you can do with a ThoughtSpot or a Looker, these next-generation tools, is not something you couldn’t do analytically before, right? You could go and use BusinessObjects, and if you had five expert people and gave them two months, they would build it for you. That’s what I mean: analytically, it was mature. But the delivery mechanisms were primitive. It was too cumbersome, and it was not effective, in the sense that by the time you got the report done and in front of your business folks, it was already too late, because the business moves too fast. That’s what I meant: there was maturity on the analytical front, but not on usability, democratization, and effectiveness for the business. And I think it went down market too, right, with the ThoughtSpots and the Lookers of the world. All of a sudden, you didn’t have to be an Oracle customer in order to deliver insights.
Eric Dodds
Exactly. Absolutely. Yep, super interesting. Okay, well, let’s talk about product analytics, because NetSpring is a product analytics company. Can you just give us the one-minute explanation of what NetSpring is?
Vijay Ganesan
So, NetSpring is next-generation product analytics, right? We are warehouse-native product analytics; we’re the first warehouse-native product analytics company, bringing the analytical power of business intelligence to the world of product analytics.
So in a nutshell, for data folks, the way we describe it that really hits home is: think of us as Amplitude plus Looker in one package, working directly off Snowflake. That’s the imagery that best describes us.
Eric Dodds 09:32
That’s great. One of the reasons I love that is because in my past, trying to build these sorts of stacks, I’ve taken Looker and tried to turn it into Amplitude, and there are so many challenges, even though it’s fully capable of doing that as a tool. It’s actually interesting: you mentioned that you can sort of build whatever you want; it’s not like anything’s off limits. But it takes weeks to build a cohort report in Looker that is basically out of the box in Amplitude. At the same time, I’ve tried to take Amplitude and do some crazy stuff with it, and I would say the SaaS systems tend to be much more inflexible when you’re trying to do more complex querying. So I’ve definitely felt that tension. And ultimately, I think everyone probably ends up in the data warehouse, because that’s where you can perform the types of queries you want with flexibility. Is that the dynamic NetSpring is responding to?
Vijay Ganesan 10:33
Yeah, I think this is the world people are in, right? Either you’re in the world of product analytics vendors, what we call first-generation product analytics: Amplitude, Mixpanel, great products, by the way. They’re purpose-built for this: if you want to do retention or cohort analysis, they’re built for that, with easy-to-use, very nice UIs, and you can quickly whip out a funnel or a cohort analysis, right? For that first level of analytics, they are actually great tools. But when you have the next question, that’s where the problem comes: you don’t have the power you have in a BI tool to write arbitrarily complex queries. So then you end up in this other world of BI, but those tools are not really built for time-series and event-oriented analysis. Kostas, I know you were asking earlier, what is the difference in the nature of this data? The way I describe it is this: in a business, there is reporting on an outcome. For example, I want to report on how many orders I took today on my website; that’s reporting on the outcome. But before an order gets committed, and you have a record in your data warehouse that says XYZ purchased this amount, users go through a whole bunch of interactions. You log in, you do a search, you add something to a shopping cart; there are lots of interactions and lots of events that get captured that lead to that final state, right?
BI reporting is about reporting on those final states; product analytics is about understanding patterns of behavior, the sets of events that lead to that final state, and analyzing those. One half of analytics studies patterns of behavior that lead to outcomes; the other half reports on something that happened. To use an analogy: how many widgets did I sell in North America last quarter? That’s a question for a BI reporting tool. But which cohorts of users are buying more from me, and why? That’s a product analytics question. It involves product instrumentation data and event streams. So that’s the first thing: the fundamental difference in the nature of the data. The second thing is that the representation of the data is very different. These first-generation product analytics tools are purpose-built to represent event data in a way that’s amenable to those specialized queries, like a cohort query or a funnel query. Those are very difficult to express in a relational model, with the star-schema type of model that is typical for BI reporting tools, and that’s where the tension comes from. What we have done in NetSpring is bring those two worlds together. We call it relational event streams technology: our model is fundamentally relational, but we’ve layered event-oriented concepts on top of it. So we can work natively off data warehouses and still get the specialized processing you have in these event-oriented systems. That’s really the key technology breakthrough that enables the best of both worlds. One other aspect to this is that, historically, that data never came to the data warehouse.
If you think about product instrumentation streams, IoT, data coming from your mobile phone, those types of event-oriented data historically never landed in a warehouse. The warehouse held a small subset of mission-critical business data. But that has shifted; there’s a fundamental shift in thinking with cloud data warehouses. People are putting pretty much anything and everything into the warehouse now, and today’s cloud data warehouses are amenable to that. That’s a huge shift that’s happening, and so the data is already in there. By the way, all these tools today require you to ship the data out of your systems to their service, into some black hole somewhere, and that’s becoming a big problem these days. With GDPR, privacy, and security, nobody wants copies of their data going off into some black hole. People want control over their data, and they want it in the warehouse.
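To make the reporting-versus-behavior distinction concrete, here is a minimal Python sketch over a hypothetical in-memory event list. The user IDs, event names, and the `(user_id, timestamp, event_type)` layout are assumptions for illustration, not NetSpring’s model. The BI-style question counts final outcomes; the product-analytics question walks each user’s time-ordered events through funnel stages, so each stage depends on the ones before it.

```python
from collections import defaultdict

# Hypothetical event stream: (user_id, timestamp, event_type)
EVENTS = [
    ("u1", 1, "login"), ("u1", 2, "search"), ("u1", 3, "add_to_cart"), ("u1", 4, "purchase"),
    ("u2", 1, "login"), ("u2", 5, "add_to_cart"),
    ("u3", 2, "search"), ("u3", 3, "login"), ("u3", 6, "add_to_cart"), ("u3", 7, "purchase"),
    ("u4", 1, "login"), ("u4", 2, "search"),
]


def count_outcomes(events, outcome="purchase"):
    """BI-style reporting: how many final outcomes happened?"""
    return sum(1 for _, _, event in events if event == outcome)


def funnel(events, stages):
    """Product-analytics style: how many users reach each stage, in order?

    A user counts for stage k only after completing stages 0..k-1 earlier,
    so the result of every stage depends on all previous stages.
    """
    by_user = defaultdict(list)
    for user, ts, event in events:
        by_user[user].append((ts, event))

    counts = [0] * len(stages)
    for history in by_user.values():
        history.sort()  # time is the ordering key
        next_stage = 0
        for _, event in history:
            if next_stage < len(stages) and event == stages[next_stage]:
                counts[next_stage] += 1
                next_stage += 1
    return counts


print(count_outcomes(EVENTS))                                  # 2
print(funnel(EVENTS, ["login", "add_to_cart", "purchase"]))   # [4, 3, 2]
```

Both functions read the same rows; only the funnel needs per-user ordering, which is the event-oriented processing a star-schema reporting model does not provide directly.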
Kostas Pardalis 15:05
Yeah, 100%. I have a question, because you mentioned something very interesting here: the way data has traditionally been structured in the data warehouse, with the OLAP movement in general and Snowflake, the schemas and models used to drive BI, is not a good fit for product analytics and event data. Can you tell us a little more about the technology you built to bring these two things together? Going from the very tailor-made representation of data that something like Mixpanel has to the tabular representation that Snowflake has, how does the part in between work? Because that sounds super interesting.
Vijay Ganesan 16:00
Yeah, that’s the crux of the underlying technology differentiation. The existing products typically use single-table models: you bring in one event table where essentially everything is stored. They have very fixed data models: there is a notion of a user, a notion of a session, a notion of an event, and that’s it. Those are pretty much the concepts you have in these data models. This is very good for traditional shopping-cart-type applications, which is where these products originated. In our world, we said we’re not going to go with that single-table model. We’re going to go with a generic model where any business entity is represented as a table in your system. You could have a user table, documents, tickets; you can study the journeys of anything, not just users. Then, if you imagine tables in Snowflake, some of these tables, through some annotations, can become event streams. So you could take a table in Snowflake and annotate it in NetSpring to say, this represents an event stream. If you think about it, an event stream really has a timestamp column, an actor that is performing the event, and some kind of event type, like a click event or an add-to-cart event, right? Those are really the decorations you need. So we started with a generic relational data model and layered these annotations onto certain tables that can behave like event streams in the system. The second thing is joinability: how do you join an event stream with a traditional static table?
So we’ve got the ability to model those relationships as well. The next layer above that is time. If you think about the fundamental difference, at the crux of it, all these event-oriented systems treat time as a first-class entity. In Snowflake, time is just another dimension, like an account. Time being a first-class entity is core to these types of systems, and that’s one of the differences. So in some ways, you’re bringing some of the specialized concepts of time-series databases into the world of relational, data-warehouse-type systems. That’s one. The third layer is an innate understanding of concepts like a flow, a funnel, and a cohort: first-class understanding of these entities, which you don’t have in traditional analytical tools that run on data warehouses. Those are the three layers that enable us to do this. The other secret sauce is really around the abstractions. At the end of the day, if you want to describe a cohort or a funnel, SQL isn’t expressive enough; it’s not suited for it. So we have a language in NetSpring that lends itself to very succinct and elegant expression of these types of queries, but under the covers, it compiles down to SQL. That’s how we get the best of both worlds. You describe your typical product analytics analysis in a language that is very natural: you define stages, drop-offs, churn, and things like that, very succinctly. Then it compiles down to SQL that is optimized for different data warehouses, and that optimization also takes advantage of an understanding of the nature of these queries: time as a first-class entity, the way the data is partitioned, and so on.
A lot of this data has a sequence you can take advantage of: a user logs in, then they do this. That sequence is not going to change; it’s not going to get updated.
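The conversation doesn’t show NetSpring’s language, so what follows is not its syntax or its compiler. It is a minimal Python sketch, under stated assumptions, of the two ideas described above: an annotation that turns an ordinary table into an event stream (a hypothetical `EventStream` class naming the timestamp, actor, and event-type columns), and a declarative funnel (an ordered list of stages) compiled down to SQL. Each stage’s CTE joins against the previous stage and requires a later timestamp, which is exactly the chaining that makes hand-written funnel SQL grow. SQLite’s in-memory database stands in for a cloud warehouse; the table and column names are made up.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class EventStream:
    """Hypothetical annotation marking a relational table as an event stream."""
    table: str
    ts_col: str
    actor_col: str
    type_col: str


def compile_funnel(stream, stages):
    """Compile an ordered list of event types into chained-CTE SQL.

    Stage k keeps each actor's earliest matching event that occurs after
    their stage k-1 timestamp, so every CTE is built on the one before it.
    Identifiers come from the trusted annotation (string-formatted); stage
    names are bound as parameters. Returns (sql, params).
    """
    s = stream
    ctes = [
        f"s0 AS (SELECT {s.actor_col} AS actor, MIN({s.ts_col}) AS ts "
        f"FROM {s.table} WHERE {s.type_col} = ? GROUP BY {s.actor_col})"
    ]
    for k in range(1, len(stages)):
        ctes.append(
            f"s{k} AS (SELECT e.{s.actor_col} AS actor, MIN(e.{s.ts_col}) AS ts "
            f"FROM {s.table} AS e JOIN s{k - 1} AS p "
            f"ON e.{s.actor_col} = p.actor AND e.{s.ts_col} > p.ts "
            f"WHERE e.{s.type_col} = ? GROUP BY e.{s.actor_col})"
        )
    counts = " UNION ALL ".join(
        f"SELECT {k} AS stage, COUNT(*) AS actors FROM s{k}"
        for k in range(len(stages))
    )
    sql = "WITH " + ",\n".join(ctes) + "\n" + counts + " ORDER BY stage"
    return sql, tuple(stages)


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE app_events (user_id TEXT, event_time INTEGER, action TEXT)")
    conn.executemany(
        "INSERT INTO app_events VALUES (?, ?, ?)",
        [
            ("u1", 1, "login"), ("u1", 3, "add_to_cart"), ("u1", 4, "purchase"),
            ("u2", 1, "login"), ("u2", 5, "add_to_cart"),
            ("u3", 2, "add_to_cart"), ("u3", 3, "login"),
            ("u3", 6, "add_to_cart"), ("u3", 7, "purchase"),
            ("u4", 2, "purchase"),
            ("u5", 1, "login"), ("u5", 2, "search"),
        ],
    )
    # The annotation tells the compiler which columns carry time, actor, and type.
    stream = EventStream("app_events", "event_time", "user_id", "action")
    sql, params = compile_funnel(stream, ["login", "add_to_cart", "purchase"])
    return conn.execute(sql, params).fetchall()


print(demo())  # [(0, 4), (1, 3), (2, 2)]
```

Taking each user’s earliest qualifying event per stage is a greedy choice, and it is enough to decide whether an ordered path through the stages exists. A production compiler would also emit warehouse-specific SQL and exploit partitioning and the append-only ordering of events; this sketch does neither.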
Kostas Pardalis 20:07
So taking advantage of that in the code generation is also part of the idea. You mentioned time-series databases at some point, and that’s something I wanted to ask you from the beginning. Going from completely tailor-made solutions for representing the data to a more generic database system, why a data warehouse and not a time-series database? At the end of the day, events are a time series, right? The main difference is that you have more dimensions than just time.
Vijay Ganesan 20:37
Great question. When we started, this is exactly the question we asked ourselves: what underlying system would we need? Time-series databases are good for operational monitoring, right? If you look at systems like Datadog or SignalFx, APM-type systems, they’re great for that. Essentially, they’re good for visualizing and rendering fast-changing time-series data. A lot of what these monitoring tools do is simply: I want to see a temporal view of some metric, I want to compute it very fast and incrementally, and I want to ingest at extremely high rates. They’re really purpose-built for those kinds of visualizations of time-series data. Product analytics, yes, is event-oriented, time-oriented data, but the kind of analytics you do is very sophisticated. You’re not simply looking at what the temperature is at this point in time and how it’s trending. You’re studying sequences and very complex, sophisticated behavioral patterns, which require a lot of massaging of the time-series data, in a fashion very similar to what you do in a BI-style system on a data warehouse. The compute patterns, the analytical patterns, were closer to what you do in data warehousing and BI tools than to what you do in a time-series database in a monitoring-type system. And that’s why we chose that.
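A rough Python sketch of the contrast drawn above, with made-up names and data. A monitoring-style metric can be maintained incrementally per time bucket, so each new point touches one counter; that is the kind of workload time-series databases are built for. A behavioral question such as “which users added to cart and then did not purchase within a window?” instead needs each user’s full, time-ordered history, which is closer to the grouping, sorting, and joining a warehouse handles.

```python
from collections import Counter, defaultdict


class BucketCounter:
    """Monitoring-style metric: number of events per fixed-size time bucket."""

    def __init__(self, bucket_size):
        self.bucket_size = bucket_size
        self.counts = Counter()

    def ingest(self, ts):
        # Constant work per point: only the bucket containing ts changes,
        # and no earlier history has to be revisited.
        self.counts[ts // self.bucket_size] += 1


def abandoned_carts(events, window):
    """Behavioral question: which users added to cart and did not purchase
    within `window` time units afterwards? This needs every user's full,
    time-ordered history rather than a running aggregate."""
    by_user = defaultdict(list)
    for user, ts, event in events:
        by_user[user].append((ts, event))

    abandoned = set()
    for user, history in by_user.items():
        history.sort()
        for i, (ts, event) in enumerate(history):
            if event != "add_to_cart":
                continue
            purchased = any(
                later == "purchase" and later_ts - ts <= window
                for later_ts, later in history[i + 1:]
            )
            if not purchased:
                abandoned.add(user)
                break
    return sorted(abandoned)


# Hypothetical events: (user_id, timestamp, event_type)
EVENTS = [
    ("u1", 1, "add_to_cart"), ("u1", 3, "purchase"),
    ("u2", 2, "add_to_cart"), ("u2", 20, "purchase"),
    ("u3", 5, "view"),
]

metric = BucketCounter(bucket_size=10)
for _, ts, _ in EVENTS:
    metric.ingest(ts)

print(dict(metric.counts))          # {0: 4, 2: 1}
print(abandoned_carts(EVENTS, 10))  # ['u2']
```

Changing the window changes the answer for users whose purchases arrived late, which is why such questions are recomputed from history rather than maintained as a single running number.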
Eric Dodds 22:15
Can I jump in and ask a question? Sorry, this is so interesting. If we take a typical SaaS product analytics tool, it may use a time-series database, and there are a number of advantages to that. But controlling the underlying data model also allows them to create safeguards for their users, so that you can reliably produce a funnel report or a cohort report. How do you manage those safeguards when you’re warehouse-native? The data can change underneath the tool, right? It can change underneath NetSpring; for example, a table that feeds some important metric in a funnel report. So there are certainly advantages to modeling from a time-series standpoint, but having a single model also creates safeguards for the user. How do you manage that, and how do you think about it from a data modeling perspective, since you don’t necessarily have as much control?
Vijay Ganesan 23:28
If you have a purpose-built system with a purpose-built data model that isn’t even exposed to the user, you don’t even see the underlying data model. There’s just this out-of-the-box concept of a user and so on, and everything has to fit into that data model. Clearly those are easier from a management point of view: there’s nothing you can screw up; you can’t join a table incorrectly to some other table, or anything like that. But they’ve got a lot of deficiencies. There’s the data shipping off to these other systems, and they are very constrained. We were talking to a customer who was studying document journeys, and they were trying to squish a document entity into a user entity. Those are artificial things. Hierarchies are difficult too: I want to study behavioral patterns by account, or by product hierarchy or category, and all of that is very difficult to do with these rigid models. Yes, they give you some ease, in that with purpose-built models you can’t really mess up the data model, but there are such deficiencies in those models. In some ways, you’re describing a classic problem: I’ve got a very general-purpose, very sophisticated tool that can model anything, which comes with advantages, but you can potentially shoot yourself in the foot because you pulled the wrong table out of the warehouse or went about it the wrong way. Whereas with a purpose-built system you have no choice: they give you some canned stuff out of the box, and that’s all you can do. So there are definitely trade-offs.
But what we’ve said is, look, the value of going against the warehouse and being able to address anything that’s available in it is huge. The kinds of analytics you can do are phenomenally more sophisticated and business-impacting; it’s not just siloed product metrics, it’s business metrics. So the advantages far outweigh challenges like, hey, what if somebody pulls the wrong thing or does the wrong thing? And the way we tackle that problem is twofold. One, the view into the warehouse that you get through NetSpring can be tightly controlled by the data engineering team. There is this notion of datasets, logical entities that you create, and those are the only ones you expose to your business folks. You can also control that this group gets access to this, but not to this other thing. So you’re not exposing the entire world to folks; you’re exposing what they need. There is also a notion of an application, where you can say: for this application, for this group, for this set of users, this is what is relevant, and expose that. That has huge advantages, because the central team that is responsible for the warehouse and the data model has control, and at the end of the day, from a governance and security point of view, they’re accountable. So it gives them that control, while still giving self-service to the business folks. Two, on the business side, we have templates, the same kinds of templates you have in these product analytics tools. You launch a template to create a cohort; it’s a wizard, a point-and-click interface where you’re not writing SQL or doing joins. You’re just filling in the blanks, and boom, you get a report.
So we have these guardrails for at least the less sophisticated users, the basic users, so they don’t get tripped up by having to work with a data warehouse. I think it’s possible to be warehouse-native and get the power the warehouse offers, but with these controls in place, to provide the best of both worlds.
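The two guardrails described above, curated datasets exposed per group and fill-in-the-blanks templates, can be sketched roughly as follows. This is not NetSpring’s API: `DatasetRegistry`, `cohort_template`, the group names, and the table names are hypothetical, and the template assumes the published event table has `user_id` and `event_type` columns. The data team publishes datasets and grants groups access; a business user only picks a granted dataset and an event, and the template produces parameterized SQL without the user writing joins.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRegistry:
    """Hypothetical governance layer: the data team publishes curated datasets
    and decides which user groups may see each one."""
    datasets: dict = field(default_factory=dict)  # dataset name -> warehouse table
    grants: dict = field(default_factory=dict)    # group -> set of dataset names

    def publish(self, name, table):
        self.datasets[name] = table

    def grant(self, group, name):
        if name not in self.datasets:
            raise KeyError(f"unknown dataset: {name}")
        self.grants.setdefault(group, set()).add(name)

    def visible_to(self, group):
        return sorted(self.grants.get(group, set()))


def cohort_template(registry, group, dataset, event_type):
    """Fill-in-the-blanks cohort template: the user picks a granted dataset and
    an event type; the template writes the SQL. Returns (sql, params)."""
    if dataset not in registry.grants.get(group, set()):
        raise PermissionError(f"{group!r} cannot access {dataset!r}")
    # Table names come from the data team's registry, never from user input;
    # the event type is bound as a parameter.
    table = registry.datasets[dataset]
    return f"SELECT DISTINCT user_id FROM {table} WHERE event_type = ?", (event_type,)


# Example setup by a (hypothetical) data engineering team
registry = DatasetRegistry()
registry.publish("web_events", "analytics.web_events")
registry.publish("invoices", "finance.invoices")
registry.grant("growth_team", "web_events")

sql, params = cohort_template(registry, "growth_team", "web_events", "signup")
print(registry.visible_to("growth_team"))  # ['web_events']
print(sql, params)
```

Asking for the `invoices` dataset as `growth_team` raises `PermissionError`, which is the “this group gets access to this, but not this other thing” control in miniature.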
Eric Dodds 27:08
Yeah, that’s kind of the dream, right? You wouldn’t export SaaS product analytics data into your warehouse if there weren’t an issue with trying to query the data. So, yeah, super interesting. Sorry, Kostas. I had to ask, because I’ve tried to build product analytics on the warehouse, and that’s, you know...
Kostas Pardalis 27:28
No worries. Yeah, absolutely.
Vijay Ganesan 27:33
One thing to add to what Eric was saying: you described the difficulty of doing product analytics in BI tools, and you’re absolutely right. What people end up doing is exporting the data out into a warehouse and writing Looker reports, basically writing SQL, and it’s extremely painful. That’s one problem we solve. But there’s another problem we solve, which is interoperability, the seamlessness of the analytics. You want to be able to jump back and forth between these two worlds. You start off studying a cohort of users who exhibited a certain behavior, say a drop-off. You want to take that drop-off and fork off into more BI-style analytics that brings in account information and support information. But when you’re done with that, you want to bring it back into the funnel analysis and continue the analysis. It’s that seamlessness of analytics between these two worlds, and that’s always been a problem: you exported the data out of Amplitude, you ran your Looker report, and two weeks later the business guy got a report.
Okay, what do I do with it? How do I upload this back into my product analytics tool and continue my analysis? That adds to the problem we have today. Instead, if you have one tool against the warehouse, you’ve got everything in one place, and you can go back and forth between these two flavors of analytics, all in context.
Kostas Pardalis 28:45
I want to ask you something about SQL. You mentioned that SQL is not exactly the best syntax for asking these questions of the data warehouse. That doesn’t mean it can’t be done, but it’s hard for a user to work with this syntax. You also mentioned that you have introduced a new language called NetScript. Can you elaborate a little more, and also give us an example or two of what makes it so hard to use a language like SQL for product analytics?
Vijay Ganesan 29:39
SQL obviously is a great language, the lingua franca of data. Like you said, it all comes down to SQL at the end of the day. What we’re bringing to the table is expressibility one layer above SQL. The crux of the difficulty of expressing product analytics queries in SQL is the nature of this type of analysis. If you have an event table and you’re studying patterns, that requires a lot of, and this is a bit simplistic perhaps, self-referential work. You first have to get all users who did a particular event; that’s another table, and on that table you then want to do the next level of things. Product analytics queries in the SQL world look like this: you take your table and write a snippet of SQL to get a subset of the data, then you take that and write another piece of SQL that takes another subset, layer upon layer, chaining all these things together. That makes it very difficult. If you look at the SQL you generate for funnels, paths, and so on, you will see layers and layers of SQL, because the result of a particular stage of your analysis is a function of all the previous stages. That is not the case in BI-type queries, where you’re just reporting on the final step. So that’s the thing: there are intermediate computations that depend on previous computations, which depend on still earlier computations, and that chaining is very difficult. Before you know it, you have 10 pages of SQL. That is the expressibility aspect of it.
The second thing, and this is true in general, not just for product analytics queries, is the composability and reusability of SQL. Say I write a big chunk of SQL and give it to you. I’m filtering for the West region, and you want to filter for the East region; I’m looking at it by product, and you want to break it down by sales rep, or whatever. Composing and reusing SQL is very difficult, because you have to go and do surgery within that SQL: there’s some WHERE clause, and you have to insert your condition, and so on. What if there was a higher-level way of doing it? I give you a chunk of logic, and you take it and say, hey, I want to extend this and break it down by this other dimension. That is what this new language brings: you can extend it, it’s composable. It’s sort of like Lego blocks that you can build on top of each other, and the system knows how to do the surgery on the underlying SQL to produce the final SQL. So: expressibility, composability, and reusability. Those are the things where SQL falls short in this world of product analytics.
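A rough illustration of the composability point, assuming a hypothetical Python query builder: the `Query` class below is invented for this example and is not NetScript or any real library. One analyst’s query is extended with a different filter or an extra dimension without anyone editing SQL text by hand.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Query:
    """Hypothetical immutable query spec that renders to SQL on demand."""
    table: str
    measures: tuple = ("SUM(amount) AS revenue",)
    filters: tuple = ()
    dims: tuple = ()

    def where(self, condition: str) -> "Query":
        # Returns a new Query; the original stays reusable.
        # Illustration only: real code should use bound parameters, not raw strings.
        return replace(self, filters=self.filters + (condition,))

    def by(self, *dims: str) -> "Query":
        return replace(self, dims=self.dims + dims)

    def to_sql(self) -> str:
        # The builder, not the analyst, decides where each piece goes.
        cols = ", ".join(self.dims + self.measures)
        sql = f"SELECT {cols} FROM {self.table}"
        if self.filters:
            sql += " WHERE " + " AND ".join(self.filters)
        if self.dims:
            sql += " GROUP BY " + ", ".join(self.dims)
        return sql


base = Query("sales").by("product")
west = base.where("region = 'West'")                   # my analysis
east_by_rep = base.where("region = 'East'").by("sales_rep")  # yours, built on mine
print(west.to_sql())
print(east_by_rep.to_sql())
```

Because every method returns a new object, `base` is never mutated, so both analysts can build on it independently.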
Kostas Pardalis 32:44
Yeah, it makes a lot of sense. Going back to something else you mentioned at the beginning of our conversation: people started using these product analytics tools, and it was great, as you said. It’s so easy to go through a visual interface and create cohorts and all that stuff. But then there is a point where they want to do something more, and the interface isn’t expressive enough. The reason I’m bringing this back is that I want to ask you why the user interface, this graphical language, is not enough. Do we need SQL, do we need NetScript, or what else is there to complement the user interface that someone comes to for product analytics?
Vijay Ganesan 33:31
Yeah, so the way we like to describe it is that what you can do today in traditional tools is answer the first question. If you think about it, the primary value a lot of these tools brought to the table is really: what are people doing with my product? That’s the first question every product manager wants answered, right? I released a nice new feature; how many people are using it right now? How is it being used? For answering that first level of questions, these tools are actually quite good, and we’ve replicated the same kind of easy-to-use templates for first-level questions. Where they fall short is the follow-up question. You know, okay, you told me that this is my conversion rate, right.
Vijay Ganesan 34:11
Why is it that this conversion rate dropped between 9am and 4pm yesterday? What happened? What are the patterns? Are certain types of customers converting? So it’s the next level of question, and for the next level of question you need a free-form, ad hoc interface to express it. You can build templates for the first level of questions, but you can’t build templates for every possible next question. The next level of questions is very ad hoc: people think, oh, maybe this has something to do with the campaign we ran last week, so I want to bring in data from the campaign. Answering the next level of question is where a lot of these tools fall short, and that’s for two reasons. One is, you don’t have an interface where you can do ad hoc exploratory analysis, forking off from your templated analysis; that kind of exploration is nonexistent in these tools. The second is, oftentimes the next question involves context that is not in the product instrumentation stream: data from Salesforce, from support systems, from other systems that have nothing to do with these product analytics data sources. So you need richer context, which is nonexistent in these tools; it lives in the warehouse. That’s the second part of the answering-the-next-question problem. To incorporate that business context, you need a modeling capability that can reflect your Salesforce schema, your Zendesk schema, and other schemas, and you need to be able to mix them with this product stream.
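A minimal sketch of why the follow-up question needs business context: product conversion events are joined with a hypothetical CRM accounts table (standing in for Salesforce data) to break conversion down by industry. sqlite3 stands in for the warehouse, and all table and column names are assumptions.

```python
import sqlite3


def conversion_by_industry(users, conversions, accounts):
    """users: (user_id, account_id); conversions: user_ids; accounts: (account_id, industry)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (user_id TEXT, account_id TEXT)")
    conn.execute("CREATE TABLE conversions (user_id TEXT)")
    conn.execute("CREATE TABLE crm_accounts (account_id TEXT, industry TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", users)
    conn.executemany("INSERT INTO conversions VALUES (?)", [(u,) for u in conversions])
    conn.executemany("INSERT INTO crm_accounts VALUES (?, ?)", accounts)
    # Product stream (users, conversions) meets business context (crm_accounts).
    rows = conn.execute(
        """
        SELECT a.industry,
               COUNT(DISTINCT u.user_id) AS users,
               COUNT(DISTINCT c.user_id) AS converted  -- NULLs from the LEFT JOIN are ignored
        FROM users u
        JOIN crm_accounts a ON a.account_id = u.account_id
        LEFT JOIN conversions c ON c.user_id = u.user_id
        GROUP BY a.industry
        ORDER BY a.industry
        """
    ).fetchall()
    conn.close()
    return {industry: converted / n for industry, n, converted in rows}


print(conversion_by_industry(
    users=[("u1", "a1"), ("u2", "a1"), ("u3", "a2"), ("u4", "a2")],
    conversions=["u1", "u2", "u3"],
    accounts=[("a1", "education"), ("a2", "retail")],
))
```

The `industry` column never appears in the product event stream; without a modeled join to the CRM data, this follow-up question cannot be asked at all.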
Kostas Pardalis 36:02
Yeah, 100%. No, it makes total sense. Okay, one last question from me, and then I’ll give the microphone back to Eric. One of the things there’s a lot of conversation about is the pricing of data infrastructure. There have been a lot of conversations about consumption-based pricing, the innovation of companies like Snowflake. But there was some kind of convenience with the previous generation of tools. If I went and used something like Mixpanel, regardless of how complicated their pricing model might be, at the end, when I use the product, I know exactly, or almost exactly, what I’m going to be charged. When we started putting layers on top of other infrastructure, like having Snowflake and then putting NetSpring on top of it, we started getting priced and charged for different things, and communicating this pricing to the customer, the user of NetSpring in this case, is probably not the easiest thing to do. So I’m asking you as a founder, as someone who’s building a business and not just the product itself: how do you deal with that?
Vijay Ganesan 37:29
Yes, great question. The pricing model does get a little more involved in this composable CDP world we’re talking about. Previously, I could go to a product analytics vendor and get instrumentation, product analytics, a compute engine, storage, everything, all in one package, and I had to deal with one vendor. Now, with NetSpring and this new world we’re living in, I have to deal with Snowflake, with RudderStack, and with NetSpring: three vendors. At the end of the day, I get best of breed in all of these: the best instrumentation, a flexible data model, all the business benefits, next-generation product analytics. But I have three contracts, and I have to work with three vendors to put together a solution. So in some ways it’s the classic question: do you go with best of breed, or with a single vendor that can give you everything? There are pros and cons to that. But there is another dimension to this around pricing, which is actually one of the big reasons why people are attracted to NetSpring. If you look at the way pricing is done for tools today that are very event-oriented, they price based on events, so many events or clicks per month, and these things become prohibitively expensive at scale. Take Zoom, for example; we’re on Zoom right now. Think of the number of events Zoom generates in a single day: it’s hundreds of billions of events a day. Now, this is, of course, extreme scale, but at large scale people cannot afford to pay per thousands or millions of events; it just gets prohibitively expensive.
But the reason this is an even bigger problem is that, with a lot of these tools, you will never do any analysis on most of the data. I’m paying by event, but nobody is using 60 to 70% of the data, and I’m still paying for it. Whereas in this new world, you can put a lot of data into Snowflake, and storage cost is relatively cheap. You can dump petabytes of data, and you only pay for compute on the data you query, and only when you query it. So if nobody is touching 70% of your data, you’re only paying S3 storage costs for it, which is much, much smaller than what you’re paying these other vendors today, where you pay a lot of money for every event whether you use it or not. So there are a lot of pricing advantages in this new world. Yes, it’s complicated in terms of having to deal with multiple vendors, but at the end of the day, our belief is that you could pay an order of magnitude less than with one of these prepackaged vendors.
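The pricing argument is arithmetic, so here is a back-of-the-envelope model under stated assumptions. Every rate is a placeholder to be replaced with real quotes; no actual vendor prices are implied. The compute term assumes billing by data scanned (as with BigQuery on-demand queries), which simplifies warehouses such as Snowflake that bill for compute time, and it considers only one month of stored data.

```python
def per_event_cost(events_per_month: float, price_per_million_events: float) -> float:
    """Monthly bill when every ingested event is billed, queried or not."""
    return events_per_month / 1_000_000 * price_per_million_events


def warehouse_cost(
    events_per_month: float,
    bytes_per_event: float,
    storage_price_per_tb: float,
    queried_fraction: float,
    scans_per_month: int,
    compute_price_per_tb_scanned: float,
) -> float:
    """Monthly bill when storage covers all data but compute covers only queried data.

    Simplifying assumptions: one month of data, compute priced per TB scanned.
    """
    tb_stored = events_per_month * bytes_per_event / 1e12
    storage = tb_stored * storage_price_per_tb
    # Untouched data (the unqueried 60-70% in the conversation) adds nothing here.
    compute = tb_stored * queried_fraction * scans_per_month * compute_price_per_tb_scanned
    return storage + compute
```

In the second model, data nobody queries costs only storage, which is the crux of the point above; whether the total actually comes out lower depends entirely on the real rates plugged in.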
Kostas Pardalis 40:19
That’s great, and I’m happy that you shared this, because I think people out there are really confused. It’s very easy to end up in tricky situations, and the complexity is much higher when you have to deal with many vendors. But it is important to hear that if you’re a small company just getting started, with very low volume, no data team, no IT team, and no warehouse, you should go with a packaged solution.
Vijay Ganesan 40:43
That’s the right thing for you. Although these days, spinning up a warehouse is very simple. Getting RudderStack working is really simple; we got it working in about a day. So these things are not that difficult, even for small startups. Warehouses used to appear in companies much later down the line; now we’re seeing them appear early, because it’s just so easy to spin one up. Yeah.
Kostas Pardalis 41:19
100%. My opinion is that the technology has progressed so fast that it’s literally easy to go and spin up all these tools. What is missing is probably the maturity in the industry to use them effectively, and a lot of education needs to happen for that. In many cases people end up getting burned because, okay, sure, let’s get Snowflake, it’s very easy to set up, but they have pretty much no idea what they’re going to do with it. Going through fast iterations and making mistakes costs money, and when you don’t get value at the end, it leaves a bitter taste. But I definitely think a lot of it comes down to education: people knowing what to do, and what questions they want answered. So, Eric, all yours. I really monopolized this conversation.
Eric Dodds 42:27
It’s great. I’ve learned so much. I’m so interested in the term product analytics after this conversation. On one hand, as you described it, product analytics is really about event-based data: understanding interactions with a customer that lead to certain outcomes over time. But then, to your point, you get into the world of combining other data sets: you can bring in Salesforce, you can bring in ad platform performance data. As a marketer, one thing I think about, and that I’ve heard lots of data teams discuss as particularly challenging, is attribution. It’s really difficult to build a good attribution model that reflects what’s actually happening in your business. And there are two extremes, which we’ve talked about before: either you use Google Analytics or Amplitude, with a default model or set of options, or you work with your analysts to build it. Anyone who’s tried the latter knows, and if you haven’t, fair warning, it’s pretty brutal to build a multi-touch attribution model in-house by brute force. What’s interesting is this in-between, where a lot of that is handled for you by NetSpring and you still have access to the data. That becomes pretty interesting. But then, what does that mean for the term product analytics? Now you’re getting into a world where you can do a lot of interesting things that people maybe wouldn’t classify as product analytics, and you’re talking to a pretty wide variety of users at that point.
Vijay Ganesan 44:17
Yeah, that’s a great point. We use the term product analytics because it’s a familiar term that everybody understands; it at least zeroes in on a category. But yeah, it’s much broader. Take the marketing use case you described: that’s basically top of the funnel. It’s even before people get to using your product, whereas product analytics is typically after people have started using your product. Top of the funnel is: which campaigns are working, what’s driving acquisition, conversion, activation, and things like that, even before people start engaging with the product. So in some sense, conceptually, it’s the same type of analysis; it just happens to be at the top of the funnel rather than the bottom. So if you think of it as a category, product analytics does not do it justice; it’s a bit narrow. People have toyed with different terms. I think some vendors call it experience platforms, which is actually quite a confusing term. There are terms like digital experience, customer experience, product analytics, behavioral analytics; there are lots of terms, and this industry still hasn’t converged on one that truly reflects the things we’re talking about. Product analytics is the most widely understood, and we call ourselves product and behavioral analytics. It’s not just about product usage; it could be marketing that drives product adoption, and so on.
But the distinction we make is that product analytics is really about measuring outcomes, and behavioral analytics is about understanding the patterns of behavior that lead to those outcomes. That’s how we distinguish them. But I think there is a category that needs to be invented here that can really reflect the end-to-end journey, everything from acquisition of the customer all the way to engagement, upsell, and retention.
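To illustrate the outcome-versus-behavior distinction, here is a small sketch with hypothetical journeys and event names: one function measures an outcome rate (the product analytics side), the other counts the event sequences that immediately precede the outcome (the behavioral side). Real path analysis would handle repeated outcomes, time windows, and much more.

```python
from collections import Counter

# Hypothetical ordered event lists per user.
JOURNEYS = {
    "u1": ["signup", "create_board", "invite", "upgrade"],
    "u2": ["signup", "invite", "upgrade"],
    "u3": ["signup", "create_board", "invite", "upgrade"],
    "u4": ["signup", "create_board"],
}


def outcome_rate(journeys: dict[str, list[str]], outcome: str) -> float:
    """Measuring the outcome: what share of users reached it?"""
    return sum(outcome in events for events in journeys.values()) / len(journeys)


def preceding_patterns(journeys: dict[str, list[str]], outcome: str, k: int = 2) -> Counter:
    """Understanding behavior: which k-event sequences come right before the outcome?"""
    patterns = Counter()
    for events in journeys.values():
        if outcome in events:
            i = events.index(outcome)  # first time this user reached the outcome
            patterns[tuple(events[max(0, i - k):i])] += 1
    return patterns


print(outcome_rate(JOURNEYS, "upgrade"))
print(preceding_patterns(JOURNEYS, "upgrade").most_common(1))
```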
Eric Dodds 46:24
Yeah, yeah, it’s tough. The way practitioners describe it is: I like a funnel report, but then I want to slice it, or pivot on a piece of data that I get from a completely different system. The practical reality is, I have a valuable report, but if I can pivot or slice on a piece of data from a level above, something that expresses the part of the customer journey I’m interested in or the subset of customers I care about, that really is the blend of BI and product analytics, in my opinion. Because now you’re looking at what it costs you to acquire a customer in the context of the customer journey, and that’s those two things fully coming together.
Vijay Ganesan 47:12
Exactly. And really, if you think about it, the crux of the problem is that the data lives in different systems. Marketing systems are completely different from product analytics systems, which are different from your BI systems. This is where I think everything coming into the warehouse is really the core of it. At the end of the day, you have to bring the data into one place. You cannot have it living in 50 different SaaS services and expect to do analytics across it. You have to bring it to the data warehouse, you have to curate the data, it has to be modeled correctly, it has to be clean. That’s where warehouse centricity is an enabler for this. Yeah. Yeah.
Eric Dodds 47:57
Love it. Well, it’s super exciting. Okay, time for one more question. We talked about the waves of BI, and we talked about the first wave of product analytics, which, again, are great tools. I think a lot of companies, ThoughtSpot included, used Mixpanel, even as an analytics company. So I think many companies will end up adopting a multi-pronged analytics approach, even at the team level. NetSpring is the second wave, where we’re building on top of the central data store that has access to all this data, and so we’re going to close the gap between product analytics and BI. But at least from our conversation before the show, you think about analytics in waves over decades. So what’s the third wave in product analytics? I’m so interested; I’ve actually been waiting to ask this question.
Vijay Ganesan 48:52
So just to be clear on the second wave as you described it: it’s much richer product analytics. It goes beyond traditional, siloed product analytics to something more enterprise-wide. It’s not just for your product managers; it’s for your marketers and your customer success teams. It’s much richer analytics, cutting across any team that has anything to do with product and customer. So it’s a much broader thing, and warehouse centricity is more the mechanics: it’s an enabler, aligned with the modern data stack, and so on.
Vijay Ganesan 49:31
So the third wave is really around AI and system-generated insights. Today, even with a product like NetSpring, you can build very fancy queries and do some pretty sophisticated analysis, slicing and dicing, and so on.
Vijay Ganesan 49:48
So if I have a hypothesis, I can test it out really well. Say I have a hypothesis that people in the education sector tend to use whiteboards more in Zoom meetings. Is there a way I can go test out that hypothesis? All the data is available, the modeling is done, everything is easy-to-use point and click, and boom, in five minutes I’ve got the answer. But what if I didn’t know about that hypothesis? What if the system could tell me: hey, you’re the whiteboard PM for Zoom; you should be looking at folks in the education sector, that’s really your profitable customer base, or this particular aspect of your product. So system-generated, machine-learning-driven insights, which have been talked about for a long time but have never really worked well: I think that’s the next wave, the third generation. It’s a pretty tough problem. We tried it at ThoughtSpot and had some success, but many people have tried, and it’s an extremely difficult problem to crack. But I think there are two things happening that make it possible. First, look at the data warehouse ecosystem: BigQuery ML is pretty sophisticated, and it’s integrated with the warehouse, so you can use it natively there. So you have fairly sophisticated algorithms available that you can take advantage of. The second thing is what you see with things like GPT-3; the kinds of advancements that have happened in that world are phenomenal.
And so the third wave is really those things getting to a level of sophistication and maturity where they can actually deliver system-generated insights. That’s the third wave.
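As a very crude stand-in for "the system tells me which hypothesis to look at," the sketch below scans every attribute value and ranks segments by lift: how much more often a behavior occurs in a segment than in the whole population. This is an assumed, deliberately simplistic technique (no significance testing, no machine learning) and not what ThoughtSpot, NetSpring, or BigQuery ML actually do. The user records are hypothetical.

```python
from collections import defaultdict

# Hypothetical users: attributes plus a boolean behavior flag.
USERS = [
    {"sector": "education", "plan": "free", "uses_whiteboard": True},
    {"sector": "education", "plan": "pro", "uses_whiteboard": True},
    {"sector": "finance", "plan": "pro", "uses_whiteboard": False},
    {"sector": "finance", "plan": "free", "uses_whiteboard": False},
]


def top_segments(users, behavior, min_size=2):
    """Rank (attribute, value) segments by lift of `behavior` over the baseline rate."""
    baseline = sum(u[behavior] for u in users) / len(users)
    groups = defaultdict(list)
    for u in users:
        for attr, value in u.items():
            if attr != behavior:
                groups[(attr, value)].append(u[behavior])
    results = []
    for segment, flags in groups.items():
        if len(flags) >= min_size:  # skip tiny segments, which produce noisy lifts
            rate = sum(flags) / len(flags)
            results.append((segment, rate / baseline if baseline else 0.0))
    return sorted(results, key=lambda r: r[1], reverse=True)


print(top_segments(USERS, "uses_whiteboard")[0])
```

Nobody had to state the education hypothesis in advance: the scan surfaces it, and an analyst can then test it properly.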
Eric Dodds 51:39
I agree. Man, I’ve used so many of those AI-type analytics insights features over the years, and you just end up going back to SQL, let’s be honest. But that’s actually interesting. You talked about separating instrumentation and ingestion from the actual analytics layer, decoupling these things. It’s fascinating to think about AI-generated insights as infrastructure on top of analytics. If you break the ML infrastructure out from both the data and the analytics, it starts to get interesting from an infrastructure perspective, because then BigQuery ML does all the heavy lifting, you need to feed it data and context, and then you have the visualization.
Vijay Ganesan 52:38
Absolutely, yeah, absolutely right. We’re very big believers in it; we think that’s where enterprises should be moving. I use RudderStack, or Snowplow, or Segment; these are best of breed, and they’re purpose-built for instrumentation. That’s what they do, and they do it really well, with schema management and so on. And it’s amazing when you can put any tool on top of that, so you’ve got best-of-breed product analytics. One of the things we’re toying with is writing the output of the analysis back into the data warehouse. Let’s say you’ve tested your hypothesis and constructed a sophisticated cohort of users that you want to do something about. You can write it back into the data warehouse simply as a logical database view; it doesn’t even have to be a physical table. It can be a view into data that’s already in the warehouse. Then some other tool that is doing machine learning or data activation can simply address that view and do more sophisticated things. So with your warehouse, you can plug in all these very specialized, best-of-breed systems on top of it, and you can get some phenomenal value.
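The write-back idea can be sketched with an ordinary database view: the analysis tool defines a cohort as a view over events already in the warehouse, and a downstream tool only needs the view’s name. Here sqlite3 stands in for the warehouse, and the view name, event names, and threshold are hypothetical.

```python
import sqlite3


def make_warehouse() -> sqlite3.Connection:
    """Hypothetical warehouse with a small events table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_id TEXT, event TEXT)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [("u1", "whiteboard_open")] * 3
        + [("u2", "whiteboard_open")]
        + [("u3", "whiteboard_open")] * 2
        + [("u4", "chat_sent")],
    )
    return conn


def write_back_cohort(conn: sqlite3.Connection) -> None:
    # Analysis tool: publish the cohort as a logical view; no rows are copied.
    conn.execute(
        """
        CREATE VIEW IF NOT EXISTS cohort_frequent_whiteboard AS
        SELECT user_id FROM events
        WHERE event = 'whiteboard_open'
        GROUP BY user_id
        HAVING COUNT(*) >= 2
        """
    )


def activation_targets(conn: sqlite3.Connection) -> list[str]:
    # Downstream tool: needs only the view name, not the cohort logic.
    rows = conn.execute("SELECT user_id FROM cohort_frequent_whiteboard ORDER BY user_id")
    return [user_id for (user_id,) in rows]


conn = make_warehouse()
write_back_cohort(conn)
print(activation_targets(conn))
```

Because it is a view rather than a copied table, the cohort stays current: as new events land in the warehouse, the downstream tool sees the updated membership without any re-export.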
Eric Dodds 53:45
Wonderful. Well, we are at the buzzer, as I like to say. Maybe we went a little long, but Brooks isn’t here, so we’ll have to ask for forgiveness. Vijay, this has been absolutely wonderful. I have learned so much, and NetSpring seems like a super exciting company, so best of luck as you continue to build it. Thank you, guys. Thanks for having me. Appreciate it, enjoyed speaking.
Eric Dodds 54:11
Kostas, I think the most helpful thing for me was thinking about these tool sets in waves, which was really helpful. I said toward the beginning that it was surprising that he described the world of BI, business intelligence, as mature when they started ThoughtSpot in 2012, because he referenced things like BusinessObjects. If you’ve ever worked at or with a company that uses BusinessObjects, it just feels so antiquated compared to a modern product analytics tool. But I love the perspective of history, and what he said was true: once that level of BI came to fruition, there was no analysis that was impossible. With Oracle’s BI solution, you could do whatever you wanted; it was just a question of time, cost, and difficulty. So thinking through those phases was helpful, and the same goes for product analytics. We’ll see how much AI takes over the world of, you know, whatever features are built into these data lakes in the future.
Kostas Pardalis 55:29
I think we are going to see at least one more wave of BI and analytics tools that aren’t primarily driven by AI before that happens. We’ve recorded episodes with companies that are still providing new ways to visualize and interact with the data. I think it’s time to see more innovation in the space, and things have started changing; look at what is happening with Tableau, for example. So I’m very, very curious to see what companies will appear in the next couple of months on the more traditional side of data analytics, around BI. I think we will probably start seeing more specialized analytical tools, for product analytics for example, but a new iteration of them that leverages the new data infrastructure out there: the cloud data warehouse, the data lakes, the lakehouses, blah, blah, blah, and all that stuff, right?
Kostas Pardalis 56:42
So we’ll see. I think we’re going to see more tools like this. That’s why it was very interesting to chat today; I think it’s a glimpse of what we’ll see in the future.
Eric Dodds 56:58
All right. Well, thank you for joining us. Subscribe if you haven’t, tell a friend, and we will catch you on the next one. We hope you enjoyed this episode of The Data Stack Show. Be sure to subscribe on your favorite podcast app to get notified about new episodes every week. We’d also love your feedback. You can email me, Eric Dodds, at eric@datastackshow.com. That’s E-R-I-C at datastackshow.com. The show is brought to you by RudderStack, the CDP for developers. Learn how to build a CDP on your data warehouse at RudderStack.com.