Episode 65:

Operationalizing Data from the Warehouse With Aayush Jain of Cliff.ai

December 8, 2021

This week on The Data Stack Show, Eric and Kostas talked with Aayush Jain, CEO and CTO (Chief Twitter Officer) at Cliff.ai. Cliff.ai is the latest of Aayush’s startup ventures and is focused on “business observability.” During the episode, Aayush discusses what they’re building at Cliff, how community building has impacted the company, and how his academic background formed his current work.

Notes:

Highlights from this week’s conversation include:

 

  • Aayush’s career background (4:13)
  • How his biological sciences academic training impacts his work (8:04)
  • How do we allow dashboards to get messy? (9:35)
  • Building cultural or technical solutions to effective dashboards (15:19)
  • Using data dashboards to make material business improvements (23:19)
  • What is business observability? (32:23)
  • Building a platform for operations teams (43:15)
  • How important community is to the Cliff.ai business proposition (41:03)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Transcription:

Automated Transcription – May contain errors

Eric Dodds 00:06
Welcome to The Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You’ll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, the CDP for developers. You can learn more at rudderstack.com. We have a really exciting episode coming up, and what’s most exciting is we’re going to livestream it. The topic is the modern data stack, and we’re going to talk about what that means. It’s December 15, and you’ll want to register for the livestream now. Kostas, it’s really exciting because we have some amazing leaders from some amazing companies. So tell us who’s going to be there.

Kostas Pardalis 00:44
Yeah, amazing leaders, and also an amazing topic. I think we have mentioned the modern data stack so many times on the show that it’s time to get together the different vendors who have contributed to creating this new category of products and who define the modern data stack, and discuss what makes it so special. So we are going to have people from Databricks, dbt, and Fivetran, and companies that are implementing state-of-the-art technologies around their data stack, like Hinge. And we’re also going to have VCs and see what their own opinion is about the modern data stack. So well, since VC is going also to be there. And yeah, it’s going to be super exciting and super interesting. So we invite everyone to our first livestream.

Eric Dodds 01:32
Yeah, we’re super excited. The date is December 15. It’s going to be at 4 p.m. Eastern Time, and you can register at rudderstack.com/live. So that’s just rudderstack.com/live, and we’ll send you a link to watch the livestream. We can’t wait to see you there. Welcome back to The Data Stack Show. Today, we’re going to talk with Aayush Jain, and he is one of the founders of a company called Cliff.ai. He’s actually sort of a serial entrepreneur who has done a couple of startups, and this one’s focused on what he calls a business observability platform. So I think it’ll be a really interesting conversation. I don’t know if we’ve come across that term yet on the show, have we, Kostas?

Kostas Pardalis 02:18
No, I don’t think so. I think it’s going to be quite an interesting conversation, especially talking about how to operationalize data, and how the data warehouse, again, is a very important component in how we extract value from the data. So yeah, and he also leads community. So I think those are going to be interesting to hear. Yeah. What’s the role of the community? Like how to triple A stuff?

Eric Dodds 02:47
Absolutely. And I think my question, and I’m going to go way back here. So Aayush, you know, in looking at our guests’ backgrounds, I’m always interested when someone’s building something in the data space but they don’t come from a technical background. And Aayush actually comes from a scientific background. I always find that connection really interesting, and I haven’t asked that question in a long time, so I’m way overdue. So my question is going to be how his academic background has influenced his work in data, which I know you’re going to enjoy, because you haven’t gotten that.

Kostas Pardalis 03:22
You’re going to force me to be philosophical again, and claim science is data, but

Eric Dodds 03:31
Well said, we can just wrap the show up. Perfect. Yeah, let’s

Kostas Pardalis 03:35
go and chat with Aayush.

Eric Dodds 03:36
Let’s do it. Aayush, welcome to The Data Stack Show. It’s great to have you here.

Aayush Jain 03:42
Thank you very much for having me. All right.

Eric Dodds 03:44
Well, tell us about yourself. What’s your background, and what are you working on in your day job?

Aayush Jain 03:50
Great. So just to give you a little bit of background about myself, I’m one of the co-founders of Cliff.ai. What Cliff.ai does is, basically, it’s a business observability platform. We help companies track and monitor their metrics without requiring a dashboard. In terms of my background, I come from a very different background: I have my master’s in biological sciences. As for where we started, we are a team of three co-founders, and we started our first business together as soon as we got out of college. We used to live next to each other in dorm rooms for a good five years of college, and just after that we thought, guys, we’ve got to do something together. Doing a job was kind of uncool for us back then, so we took the plunge into starting our own business without even knowing how to run a business. So we started an online pharmacy in India. This was way back in 2016, when the internet, and especially online commerce, was booming in India. The question we asked ourselves was this: in India at that point, you could get literally anything delivered to your doorstep by just ordering it on an app, and the only thing you would still have to go to a physical store to buy was medicine. We thought, why don’t we build an application that would allow people to just put in their prescription and get their medicines delivered to their doorstep? So that’s the first business that we started together. We ran it for around eight months, and then a big company came in and said, we want to buy the whole business. And we said, you know, fine, let’s do that. So we sold that business off.
After that, we took that cash and asked ourselves what to do next, and for a short period of time we explored a few ideas that we could work on. Eventually, we started our B2B SaaS business, which was called Greendeck. Greendeck was basically a price optimization engine for online retailers: we would help online fashion companies optimize their pricing using AI. One of the core components of the Greendeck platform was dealing with a huge amount of data. The way we would help our customers make their pricing better was by providing them a huge amount of data on how their products were priced with respect to their competitors. We would crawl hundreds and thousands of websites on a daily basis, collect the product and pricing information, and help our customers make decisions. That’s where we had our first encounter with huge amounts of data. Then, sometime around March last year, we pivoted from Greendeck to Cliff.ai. The thing that made us excited about what we are doing here at Cliff.ai is that we were dealing with a huge number of dashboards to monitor and track the various processes happening at Greendeck. It all started with one simple anomaly detection script. We wrote a script internally, saying, hey, we have tons of metrics that we want to monitor as a part of the Greendeck platform. The core problem was that there were so many dashboards, and we had such a small team, that we couldn’t even look at all of those dashboards.
So we created an anomaly detection script that would just monitor those metrics and send a Slack notification whenever there was an anomaly in a metric. And that worked like magic for us. That’s when we got excited about this whole problem of dashboards and how various businesses deal with them. And eventually, we pivoted into Cliff.ai.
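
Editor's note: the script Aayush describes is not shown in the episode. As a rough illustration only, a minimal sketch of that kind of script might look like the following. It assumes a trailing-window z-score as the anomaly test and a Slack incoming webhook for the notification; the function names, the window and threshold values, and the sample numbers are all hypothetical and are not Cliff.ai's or Greendeck's actual code.

```python
import json
import statistics
import urllib.request


def detect_anomalies(values, window=7, threshold=3.0):
    """Return indices whose value deviates from the trailing-window mean
    by more than `threshold` standard deviations (an assumed, simple test)."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mean = statistics.fmean(history)
        stdev = statistics.pstdev(history)
        if stdev == 0:
            # Flat history: treat any change at all as a deviation.
            if values[i] != mean:
                anomalies.append(i)
            continue
        if abs(values[i] - mean) / stdev > threshold:
            anomalies.append(i)
    return anomalies


def build_slack_message(metric_name, index, value):
    """Build the JSON payload for a Slack incoming webhook."""
    return {"text": f"Anomaly in {metric_name}: value {value} at position {index}"}


def send_to_slack(webhook_url, message):
    """POST the message to a Slack incoming webhook (URL supplied by the caller)."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(request)


# Hypothetical daily visitor counts with one obvious spike.
daily_visitors = [100, 102, 98, 101, 99, 103, 100, 250, 101]
for i in detect_anomalies(daily_visitors):
    # In a real script this payload would be passed to send_to_slack().
    print(build_slack_message("daily_visitors", i, daily_visitors[i]))
```

Run on a schedule against each metric, a script like this replaces "someone remembers to open the dashboard" with a message that arrives only when something moves.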

Eric Dodds 07:46
Very cool. Well, I want to talk about dashboards. But first, we love talking with guests who have diverse backgrounds. So, biological sciences: I’d love to know what lessons you have taken from that academic background into being an entrepreneur and working with data.

Aayush Jain 08:13
I think one of the core things that I learned coming from a science background is asking questions, fundamentally. A lot of times, when we look at things from a business perspective, we often forget, or often stop, asking the fundamental questions, and science works on fundamentals. So I think that was the biggest thing I learned from the science background: asking fundamental questions and starting to think from first principles. That is the biggest takeaway for me from the science background.

Eric Dodds 08:59
Yeah, I love it. I mean, in science, you sort of start out by trying to remove influence from whatever you’re trying to study, and it’s hard to do that in business. So that’s a really good lesson. And actually, for anyone working with data, that’s a great reminder in general. So I love hearing that. Well, so, dashboards: you had all these dashboards and a small team, and you were trying to understand them. You’re building a company that helps people avoid or solve the problem of messy dashboards, but I’m interested in your perspective on how dashboards even get messy. And Kostas, I would love to know from your perspective as well, because you’ve built companies and dealt with dashboards too. But you would think, I mean, there’s a lot of commonality across businesses in basic business models and metrics, right? You have website traffic, you have some level of conversion, some of it’s qualified, and then someone pays you some money. But everyone listening, I think, has experienced messy dashboards. So how do they get messy?

Aayush Jain 10:11
So before I answer that question of why and how dashboards get messy, let me take a step back and paint a picture of how we got here and why. If you look at what’s been happening in the data space in the past few years, there is an undeniable rise of the modern data stack. People have different definitions of what constitutes a modern data stack, and some people would argue that it’s more marketing jargon than something actually substantial. But nonetheless, the undeniable trend is that the data warehouse has become the heart of the modern data stack. There has been an incredible amount of progress in allowing businesses to bring their mission-critical data into a data warehouse: you have a rise of ETL tools, and you have a rise of data quality monitoring tools. What that means is that businesses now have far more data than they would have had a few years back, because the barrier to putting data into a data warehouse has been significantly reduced. So tremendous progress has been made in bringing data into the warehouse. But up until a couple of years back, the only mode of consumption of this data was the dashboard, and even today, in 90% of cases, that data is consumed in a dashboard. Now, the fundamental limitation of a dashboard is that a human consumes it. The underlying assumption of a dashboard is that there will be a human being visually looking at it and making decisions.
Now, this has changed fundamentally in the past few years. Since dashboards fundamentally rely on a human being to consume them, as the data has grown, the share of that data that people can actually consume has drastically reduced. What that means is, as a business, as you mentioned, you have tons of KPIs: visitors, followed by users, and then customers, and so on and so forth. Right now, most of the time, people put their metrics into a dashboard expecting a human being to monitor them on a continuous basis. And since the mode of consumption of data for people is dashboards, whenever people have questions, whenever someone wants to know why something broke or why something changed in the business, the inherent bias is to go to the data team and say, hey, I want a dashboard. So answering these data questions leads to the creation of a dashboard. In a lot of cases, these dashboards are used a couple of times to get a couple of answers, and then no one even looks at them. What’s happening because of that is this whole dashboard rot, where every organization has hundreds or thousands of dashboards, and probably only a fraction of them are actually used on a day-to-day basis. In my experience, and in my learnings from working with various data teams, the data and engineering team ends up owning the maintenance of many dashboards that are rarely used.
So that is one of the reasons, I feel, that we got into this messy situation where we have hundreds of dashboards which are rarely used, and even the ones that are used still rely on human beings to make those decisions. That is my understanding of how we got into this messiness of dashboards. Yeah.

Kostas Pardalis 14:38
So Aayush, yeah, I wanted to ask you. I mean, you described the problem pretty well, and I think that anyone who has had to work with data, and not even with data, I mean people working in any organization inside a company, has had to face these issues at some point. But do you think that this is an organizational problem, or a technical problem, or an organizational problem that has a technical solution? What’s your perspective on that?

Aayush Jain 15:08
I think, yeah, so I don’t think it’s a technical problem. I think it’s more of an organizational, behavioral problem, where the way we deal with data has been through dashboards and reports. So the natural bias, whenever anyone has any question, is to ask for a dashboard. If I have to cover any technical aspects of it, and I will get to that point later down the line, I think the only way for people to get answers right now is dashboards and reports. Until we have technology that solves that part, where people can ask questions without requiring dashboards, I think this is something that’s going to stay.

Kostas Pardalis 16:01
So how can we do that? How can we ask questions without the dashboards?

Aayush Jain 16:06
So this is a very interesting trend that we have been noticing in the past few months, and it has been growing very rapidly: this whole idea of operationalizing the data in the data warehouse. It’s a very interesting concept that broadly covers a few categories of the modern data stack. One of the most obvious, which has very recently seen good adoption from both the tech community and from a business perspective as well, is the rise of reverse ETL. So far, what we have seen in the industry is people bringing data into a data warehouse and then writing ad hoc scripts and reports. Some people would do Airflow DAGs and everything to put this data back into the source systems. Now, with the recent rise of reverse ETL, there are a few companies doing an amazing job, companies like Hightouch and Census. What’s now happening is that the required data is no longer just sitting in a data warehouse. The insights that need to be generated from the data in the data warehouse are now being fed back into the source systems. So imagine a salesperson using HubSpot or Salesforce as a CRM. Instead of going to a data team and saying, hey, I want this list of users who have done these things so that I can target them, and so on and so forth, they now have this information packed into their operational system, which is Salesforce or HubSpot.
Now, this is a fundamental shift in how you think about data. The data warehouse is no longer just a place where data goes so that people can query it for reporting later on. You are actually activating the data that has been put into the data warehouse and making business decisions on top of it. That’s the reverse ETL space. Another interesting trend is people using the data in the data warehouse to build downstream applications. One of the companies that I really admire in this space is a company called Continual. They have been helping people build machine learning models directly on top of the data warehouse. This solves a very important problem in the end-to-end delivery of machine learning models. Previously, people would have batch jobs where they would train a model and then use it in production, and then they would have to build pipelines that continuously fetch new data from source systems, retrain the machine learning models, and keep that end-to-end pipeline automated. Continual solves this problem by placing itself directly on top of the data warehouse and allowing people to build their machine learning models there. That’s a very smart thing that I’ve seen recently in terms of operationalizing the data warehouse. And one of the other things, which is what we are trying to tackle with Cliff.ai, is: how do we help companies actually generate insights from the data that has been captured in the data warehouse and make it actionable?
And that is what we are doing here at Cliff.ai. One of the things that we have heard from our customers is that, most of the time, they have so many KPIs and metrics that they want to keep track of, and they use dashboards to do that. One of the fundamental problems that I just talked about, when it comes to dashboards, is that dashboards are now being abused as a monitoring tool, while they were not meant for that purpose. This is something that we want to solve with Cliff.ai. We position Cliff as a business observability platform: it’s a platform that sits on top of the data warehouse, monitors every single metric in that data warehouse, and lets the relevant person in the team know whenever anything changes in those KPIs and metrics. This, in a way, is activating the data warehouse and allowing people to actually make business decisions and take action on top of their data. This is a rapidly growing field in terms of how various businesses activate the data in the data warehouse, and I think we are just touching the tip of the iceberg as of now. There are a lot of innovative solutions emerging in the modern data stack very recently that tackle this part, operationalizing the data in the data warehouse. One of the categories that I forgot to mention is this whole category of product-led, you know, PLG CRMs, where it’s a CRM built for product-led growth companies that is tightly coupled with the data warehouse. So there is so much interesting activity happening around what happens to the data once it has reached the data warehouse.

Kostas Pardalis 21:47
That’s very interesting. I think you’ve touched on many important topics. Quick question: you mentioned a very interesting term, which is the business observability platform that you’re trying to build. And you’re also trying to move away from the traditional methodology of consuming dashboards for that. So how does it work? I mean, you say there are KPIs, these KPIs are getting tracked, and the right teams are getting notified when something changes. Describe to us in a little more detail the journey that your customer has in the product: how these things are defined, how they are consumed, how people are notified, and how they can react. Because reaction is also important. That’s why we need data, so we can act.

Aayush Jain 22:35
Okay, so one of the key things that I am personally very fascinated by is the whole concept of SRE in the engineering domain. If you look at what happened in the engineering domain in the past decade, you have tons of applications and services, and at one point it got really difficult for any company to stay on top of its infrastructure without having an observability platform. Companies like Datadog and New Relic did an incredible job of providing those SRE teams with the right tools and the right platform so that they can stay on top of their systems. Now, if you look at the business side of things, before the advent of cloud data warehouses, a lot of business processes were not that data-driven. But now, with the whole evolution of the modern data stack, each business process is also data-driven. And it has become really difficult for business teams to keep track of their metrics without a system that can help them the way Datadog has helped SRE teams. So that is the whole concept that we have when it comes to business observability: can Cliff become the system that allows business teams to stay on top of their business processes? As for the way the platform works, Cliff.ai is basically a SaaS offering where we plug directly into a data warehouse, and we integrate with various data warehouses like Snowflake, Redshift, BigQuery, and so on and so forth.
Let me walk you through the entire process. You plug in Cliff.ai on top of your data warehouse, and we monitor every single metric in the data warehouse in a completely automated manner. There are no rules that need to be defined, no thresholds, no settings of that kind. Every piece of data gets monitored automatically. We typically deal with metrics, so we don’t deal with the raw data; we deal with the KPIs and metrics that are in the data warehouse. Then, whenever any significant change happens in those metrics, for example, if you see a sudden spike in your visitors or a sudden dip in your conversion rates, we send a notification via email, Slack, or a Teams integration to the relevant team member within the organization. One of the biggest challenges that we have faced is: how do we identify the right person within an organization who needs to be notified about an important KPI change? That’s where we have drawn our inspiration from the engineering domain, which has this whole concept of incident response systems. You have tools like PagerDuty where, if anything goes wrong, a PagerDuty incident is created, and you have an escalation process for who gets notified about what. That’s something that we try to bring into the business domain.
So for example, imagine you have a certain drop in your conversion rate. You define that the first-level escalation about that drop in conversion goes to XYZ person in the team. If they don’t respond within a particular interval of time, the escalation goes to, you know, L2 managers, and so on and so forth. Defining that whole escalation process, in terms of who gets notified about what and at what time, is also a very important part of the observability platform that we’re building here at Cliff.ai. One of the most important members of this whole journey that we try to bring in with business observability is actually the data engineering team itself. A data warehouse is a huge space, where a lot of data might be directly usable and a lot might not be, and the core team responsible for ensuring that the right data and the right insights get delivered to the business is the data engineering team. So as a part of our onboarding process for Cliff, we also integrate directly with dbt. Imagine a business that has already defined the key metrics and KPIs that they want to track as a part of their dbt models. We directly pull in all of those definitions from the dbt projects and have them monitored in an automated manner, so that they don’t have to redefine in Cliff.ai the KPIs that they want to track. That is one thing that our customers find incredibly useful when it comes to having this end-to-end process and a system that fits in really well with the ecosystem of products that they’re already using.
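
Editor's note: the escalation idea described above (first-level owner first, then L2 managers if nobody responds in time) can be sketched in a few lines. This is only an illustration under stated assumptions: the `EscalationLevel` type, the `recipients_to_notify` function, the recipient names, and the wait times are hypothetical and do not describe how Cliff.ai or PagerDuty implement escalation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationLevel:
    name: str          # e.g. "L1"
    recipient: str     # who gets notified at this level
    wait_minutes: int  # how long to wait for an acknowledgement before escalating


def recipients_to_notify(levels, minutes_since_alert, acknowledged):
    """Return everyone who should have been notified by now.

    The first level is notified immediately; each later level is notified
    once the cumulative wait of the levels before it has elapsed without
    an acknowledgement.
    """
    if acknowledged:
        return []
    notified = []
    deadline = 0
    for level in levels:
        if minutes_since_alert >= deadline:
            notified.append(level.recipient)
        deadline += level.wait_minutes
    return notified


# Hypothetical policy for a drop in conversion rate.
policy = [
    EscalationLevel("L1", "growth-analyst", 30),
    EscalationLevel("L2", "growth-manager", 60),
    EscalationLevel("L3", "head-of-growth", 0),
]

print(recipients_to_notify(policy, minutes_since_alert=45, acknowledged=False))
```

Keeping the policy as plain data, separate from the detection logic, is one way to let each team own "who hears about what, and when" without touching how anomalies are found.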

Kostas Pardalis 27:57
And you have said that the new user doesn’t have to mess with definitions and rules. So let’s say I’m a company that doesn’t have my KPIs defined as dbt models. What do I do then? How are these KPIs defined and, consequently, of course, tracked?

Aayush Jain 28:18
Yes, so let’s assume a scenario where a company does not already have its core KPIs defined. In that case, whenever we connect to a data warehouse, we also have a built-in SQL editor as a part of the platform, where people can define their queries right within the Cliff.ai platform. Those queries get executed in the data warehouse, and those metrics are collected and monitored in the Cliff platform. What we have also seen, and this is a new evolution happening in the whole modern data stack, is the rise of an intermediate metrics layer. Looker has done this really well: Looker has this functionality of models, which is what they call LookML, where they have made it really easy for businesses to define and manage their KPIs in a very declarative manner. And what we are seeing in the industry right now is the rise of an independent metrics layer. Defining and maintaining metrics is a function that used to live in the BI tools, right? You would typically define your metrics and dimensions in a BI tool. Now, with this rise of the metrics layer, there is a new paradigm: you have a dedicated platform to define and govern all the KPIs and metrics in one single place. We have a very limited, basic version of that metrics layer within the Cliff platform itself, where people can define and manage their metrics in one single place.

Kostas Pardalis 30:12
Oh, wow. And do you have some kind of declarative language that is used for this definition? Or?

Aayush Jain 30:20
No, as of now, it’s just SQL. What we believe is that introducing a new language wouldn’t serve any purpose for us. And I think SQL is the most commonly used language for data engineers. Our goal is to fit into the existing workflow as smoothly as we can. So that’s why it’s a plain, simple SQL editor that can be used to define queries.
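
Editor's note: to make "metrics defined as plain SQL" concrete, here is a small sketch. It uses Python's built-in `sqlite3` module as a stand-in for a real warehouse, and the table names, metric names, and queries are invented for illustration; none of this reflects Cliff.ai's actual editor or schema.

```python
import sqlite3

# Each metric is just a name plus a SQL query returning (day, value) rows.
METRICS = {
    "daily_signups": "SELECT day, COUNT(*) FROM signups GROUP BY day ORDER BY day",
    "daily_revenue": "SELECT day, SUM(amount) FROM orders GROUP BY day ORDER BY day",
}


def collect_metrics(connection, metrics):
    """Run every metric query and return {metric_name: [(day, value), ...]}."""
    return {name: connection.execute(sql).fetchall() for name, sql in metrics.items()}


# In-memory database standing in for the warehouse, with a few sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE signups (day TEXT, user_id INTEGER);
    CREATE TABLE orders (day TEXT, amount REAL);
    INSERT INTO signups VALUES ('2021-12-01', 1), ('2021-12-01', 2), ('2021-12-02', 3);
    INSERT INTO orders VALUES ('2021-12-01', 20.0), ('2021-12-02', 5.0), ('2021-12-02', 7.5);
    """
)

series = collect_metrics(conn, METRICS)
for name, rows in series.items():
    print(name, rows)
```

The resulting time series per metric is the kind of input a monitoring step would then watch for significant changes.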

Kostas Pardalis 30:47
Yeah. Makes sense. So are the dashboards dead?

Aayush Jain 30:53
No. The non-marketing answer to that is no, dashboards are not dead. Dashboards serve their own purpose; dashboards are still a good tool for reporting. But what dashboards are dead for is the use cases where you need continuous monitoring of something. So dashboards are dead for all those use cases where you need a human being constantly monitoring them for any kind of insights. I think that is something they are definitely dead for.

Kostas Pardalis 31:34
Yeah, I don’t know, Eric, I have a feeling that the highest level of escalation for a tool like this is probably going to be the board of directors. And I don’t know if this is a very good idea. What do you think?

Eric Dodds 31:48
It’s a good question. You know, one question I have, Aayush, and maybe this is sort of jumping back to specifics: when you think about continuous monitoring and basically being served a notification, that’s really talking about anomalies, right? Things that are important enough to need someone to look at them, because there’s been some sort of change. And I’m going to oversimplify this, but let’s say you have two classes of anomaly. One is where a metric is changing significantly because of some sort of business activity. So marketing runs a campaign, you get on the first page of Hacker News, that sort of positive stuff, or negative stuff, right? Like AWS servers go down, so user activity falls off, or whatever. So the first class is something happening with the business that’s a fundamental lift or decline. The other one is changes in data, right? A definition changes, the name of an event changes, those sorts of things. How do you think about classifying those? You mentioned this a little bit before, in saying, how do you get the notification to the right person in the organization? But it’s interesting, because numbers can go up or down either because the business itself is changing, or because there isn’t an anomaly in the business at all, and there’s actually just a change in the data that creates an anomaly in the metric.

Aayush Jain 33:38
That’s a great question, Eric. The way you have classified them into two kinds of problems, actual business anomalies versus data anomalies, both of those pieces are critical components to build into observability. We have recently seen a rise of a lot of data quality monitoring tools that are tackling this whole problem of data quality, and they’re doing an amazing job of ensuring that whatever data gets put into a data warehouse is quality data. We have very consciously kept our focus on monitoring the business anomalies rather than the data quality issues per se. When we talk about a business observability platform, monitoring metrics for anomalies is just one part of it, right? Because if it is all about anomaly detection, it’s better to call it a monitoring platform rather than an observability platform. What makes something an observability tool rather than a monitoring tool is not only telling you that something went wrong, but also assisting in identifying why it went wrong. So the second part, why something went wrong, is equally important. If you look at it from a business context, if you see that there is a sudden spike in, let’s say, the number of visitors, the very first question that any business team would ask is why. Why is this happening? Is it because of some campaigns that the marketing team is running? Or is it because of certain other parameters?
So that’s where we come to the second part of the observability platform, which is answering these “why” questions. Within the Cliff platform, what we have also built is a very smart root cause analysis tool. Let’s imagine you have a metric, and you have a certain set of dimensions associated with that metric. What the Cliff platform does is an automated root cause analysis to identify the key segments that contributed to that spike, that is, the statistically significant factors that contributed to it. So what we are trying to do is complete an end-to-end loop around observability, where you not only know what went wrong, but you also get an idea of why it went wrong. And there is an underlying hypothesis: does the business have the right dimensions associated with that particular metric, as part of the data warehouse, to assist that root cause analysis? So that’s a key assumption that we have.
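
As a rough illustration of segment-level root cause analysis, the sketch below ranks the values of one dimension by how much of a metric’s total change each one accounts for. This is a deliberately simple contribution ranking, not a statistical significance test, and not a description of how the Cliff platform works. The function name `segment_contributions` and the sample numbers are assumptions made for the example.

```python
def segment_contributions(baseline, current):
    """Rank segments by their share of the total change in a metric.

    baseline, current: dicts mapping a dimension value (segment) to the
    metric value in the reference period and in the anomalous period.
    Returns a list of (segment, delta, share_of_total_change) tuples.
    """
    segments = set(baseline) | set(current)
    deltas = {s: current.get(s, 0) - baseline.get(s, 0) for s in segments}
    total_delta = sum(deltas.values())
    if total_delta == 0:
        return []  # no net change, so shares are undefined
    # Largest absolute change first; ties broken by name for a stable order.
    ranked = sorted(deltas.items(), key=lambda kv: (-abs(kv[1]), kv[0]))
    return [(seg, delta, delta / total_delta) for seg, delta in ranked]


# Hypothetical visitors per traffic source, before and during a spike.
baseline = {"google": 100, "facebook": 80, "direct": 50}
current = {"google": 105, "facebook": 280, "direct": 45}

result = segment_contributions(baseline, current)
for segment, delta, share in result:
    print(f"{segment}: change {delta:+d}, share of total change {share:.1%}")
```

With these sample numbers, the `facebook` segment accounts for the whole net increase, which is the kind of pointer a team would want before asking whether a campaign caused it. As the speaker notes, this only works when the warehouse already carries the dimension that explains the change.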

Eric Dodds 36:41
Yeah, super interesting. And this may be a funny question, but I’m interested in just thinking about all the listeners in our audience who have worked on the underlying data layer that drives metrics. I mean, we’re all familiar with that. And let’s just define a metric as a single number that represents some part of the business. How many metrics are your customers tracking? I mean, is it 10? Is it 100,000? Because I think when you work inside of a business, sometimes it’s like, wow, are we tracking a lot of stuff? Or are we not tracking a lot of stuff? So can you provide some perspective on that, since you see it every day?

Aayush Jain 37:22
Yeah. So that’s actually a great question. I would just draw a clear demarcation here between what a KPI and a metric would mean. A business might have a limited set of KPIs that it wants to monitor, but the number of metrics that can arise from the combination of dimensions can grow exponentially. Let me give you an example: the number of visitors coming to the website. That’s just one KPI. But this particular KPI can have hundreds or thousands of metrics: number of visitors coming from Google, number of visitors coming from Facebook, and each of those metrics can have a significant impact on the business. Businesses would not want to monitor just the top-level KPI, like how many visitors are coming to the website, but also how many visitors are coming from, let’s say, social media, or from organic search, and so on and so forth. So the number of metrics can grow exponentially, depending on the size of the business. One of our biggest customers is a telco company, and at this point, a broad number would be that they are monitoring roughly 200,000, not KPIs, but metrics, in a near-real-time manner. So it can grow, yes.
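
The growth from one KPI to many metrics can be sketched with a quick count. The snippet below assumes one monitored time series for the top-level KPI plus one for every combination of values across every subset of dimensions. The dimension names and cardinalities are hypothetical, chosen only to show how quickly the count climbs.

```python
from itertools import combinations
from math import prod

# Hypothetical dimensions for a "website visitors" KPI and their value counts.
dimensions = {"source": 20, "country": 50, "device": 3}


def metric_count(dims):
    """Count the time series for one KPI: the top-level series plus one per
    value combination for every non-empty subset of dimensions."""
    names = list(dims)
    total = 1  # the top-level KPI itself
    for r in range(1, len(names) + 1):
        for subset in combinations(names, r):
            total += prod(dims[d] for d in subset)
    return total


count = metric_count(dimensions)
print(count)
```

Under this counting, the total equals the product of (cardinality + 1) over all dimensions, so every added dimension multiplies the number of series. Three modest dimensions already turn a single KPI into a few thousand metrics.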

Eric Dodds 38:51
Wow, that’s incredible. And as kind of a follow-on to that: one of our recent guests used the term data value chain, right? So you have collection of the data, you are sending it to places, whether that’s different tools in your stack, you unify it in the warehouse, and ideally there’s a metrics layer. I agree with you, I think the metrics layer is the way that things are moving in the future. I mean, it’s super cool what you can do with tools like dbt and all of that. Then you are building dashboards for certain things, then you’re sending and operationalizing that data and, ideally, doing business observability as those things change. Where does the data engineer fit into the data value chain in the context of having data quality automation tools, say like Bigeye or Monte Carlo, along with business observability tools like yours? What are you seeing? Are these tools, in the changing data value chain around this modern stack, repositioning teams and data engineers in terms of where they fit?

Aayush Jain 40:11
That’s a very interesting question. I was recently reading an article where the author gave a very apt summary of what’s happening in the data space. Previously, there used to be separate roles within the organization called data platform teams, where you would have a combination of engineers and data people owning and building the data platforms within the company. With this whole rise of the modern data stack, and with the rise of all the tools that are emerging in various aspects of the modern data stack, what’s happening is that the role of data engineering becomes more prominent, in the sense that with all of these tools, Monte Carlo, Cliff.ai, Bigeye, the data platform aspect has been taken over by third-party vendors, whereas delivering the data value chain, or tying this data value chain together, still remains with the data engineers. Initially, the data engineers would be the ones writing the ETL pipelines to pull data from one place and put it into a data warehouse, but now their role has become far more significant across the entire value chain, right from the generation of the data to the consumption of the data. And with the advent of dbt, roles like analytics engineer, which were not even heard of a couple of years back, have become mainstream titles. So data engineering, analytics engineering, is something that is just growing day by day.

Kostas Pardalis 41:57
Aayush, I have a question. You use the term business observability platform, right? Observability started from the need to observe our infrastructure. Then recently, we have also started talking about data observability, where we have other tools that are trying to mimic what an observability platform does, but for data infrastructure specifically. And now you are taking this one level higher, which is the business observability platform. Now, in each one of these cases, we usually have a very specific role that is related to operations, and that role is the user of these tools, right? Like we have the SREs that are using Datadog, for example. Who is using business observability? Who is the equivalent of an SRE, but for a business, let’s say?

Aayush Jain 42:50
Yes, that’s a good question. For us, the end consumers of the business observability platform are typically the operations teams. Now, operations is also a very broad term. It can span revenue ops, marketing ops, sales ops, or even, in a lot of cases, the actual physical operations of a company. In one case, we have a revenue operations team that is monitoring various KPIs related to finance using Cliff.ai. So typically, for us, the audience is the operations teams, the teams that are most impacted whenever any number changes in their metrics. But the enablers of the platform are the engineering teams. We don’t expect finance operations teams to connect to their data warehouse, write those queries, and get the numbers that they want. So typically, for us, the enablers are the analytics and engineering teams, and the consumers are the operations teams.

Kostas Pardalis 44:08
Great. One last question from me, then I’ll give the stage back to Eric. I know that you are also actively creating a community. So I’d like to ask you about that, and this is something that we have seen happening a lot lately, especially with products that have to do with data. I think everyone saw the success of dbt and the dbt community. Of course, communities are not something new; open source communities have been around forever. Based on your experience, how important is a community around data products, and how do the two relate to each other? Or is it just a marketing tool in the end? How do you see the community? What’s the position of the community as part of the business value that your company delivers?

Aayush Jain 45:00
Yeah, so this is something on which I have a slightly different opinion, one that would probably contradict the opinion other people have. I think if we start a community with the intention of getting business value in return, we defeat the whole purpose of the community itself. We started this moderndatastack.xyz community with the whole purpose of creating a place where people interested in the modern data stack can come together and create a resource that can help anyone learn about the modern data stack. In terms of the value that is there in the community, for us, the modern data stack community that we have built is more about just interacting with like-minded individuals about what’s happening in the modern data stack. Because the modern data stack is changing every single day. There is something new coming up; entire categories are getting created in the modern data stack very quickly. And that’s the only reason we wanted to create this community: to bring like-minded people together. Speaking about the value of the community, I think the biggest value of a community is having an audience, a set of people connected with you, with whom you can share these ideas. Because one of the key things happening in the whole modern data stack is the emergence of new ideas. People hadn’t heard of reverse ETL up until a few years back, people hadn’t heard of PLD, CR, and people hadn’t heard of business observability.
And the biggest value of having a community is having a connection with like-minded individuals with whom you can share an idea. No matter how crazy it sounds, you share that idea with that set of people, and you get feedback on it. From another value perspective, we found our first set of customers from this community itself. We would share the idea that the thing we’re working on is Cliff.ai, and we would get feedback from that community. So I think that is the value that you get from the community. And we have been very conscious that we always wanted to create an open community; we don’t want this to be a community by Cliff.ai. Even if you look at the moderndatastack.xyz website, we have a very, very small footer at the bottom that says “run by the team at Cliff.ai.” One of my close friends has been a pioneer in building an A/B testing platform, and this is one thing that I learned from them: when they were building a community, they would rarely talk about the product, their offerings. They would just talk about A/B testing, and they became a go-to place for anyone to learn about A/B testing, and eventually that helped the business itself. So from my perspective, the community is a long-term game. And the goal here is just to bring like-minded individuals together, not to serve any specific business goal.

Eric Dodds 48:32
Really great perspective there. Yeah, I agree. I think building a community that provides true value around subject matter without commercializing it really takes commitment over a long period of time. It helps create context for the business problem and the company that you’re building, but I really appreciate that perspective. We’re close to time here, but I have one more question for you. You get to see anomalies in data all the time from customers who are using Cliff.ai. What are some of the most interesting anomalies that you’ve seen, ones that surprised both your customers and you?

Aayush Jain 49:12
Okay, so I remember this case, and it’s actually not very surprising, but it was very impactful. There was a customer who was monitoring their marketing ad spend with Cliff.ai. One day, they saw a sudden spike in their ad spend. In Google Ads, they had set up daily limits, but the limit was set decently high. What happened was, there was a footballer who mentioned a term that is one of their target keywords. They saw a sudden spike in their ad spend, and it was coming from a completely irrelevant audience. Once they saw the spike, they turned off their bidding on that particular keyword for a specific duration of time. So that was very interesting.

Eric Dodds 50:12
Yeah. But I mean, it’s really useful to be able to catch that pretty quickly. Wow, that is so funny. It’s amazing how someone with a huge following just mentioning something can impact a company’s budget.

Aayush Jain 50:26
And they were a B2B SaaS company that had nothing to do with what that footballer mentioned.

Eric Dodds 50:34
Yeah, that’s hilarious. Do you remember what team that footballer played on?

Aayush Jain 50:38
You know, I can only very vaguely remember who the footballer was, but I think it was someone from Chelsea.

Eric Dodds 50:49
Ah, okay, Chelsea. Hilarious. Well, Aayush, this has been a really fun show. Thanks for joining us. And best of luck with Cliff.ai.

Aayush Jain 50:59
Thank you, thank you very much. Thank you for having me.

Eric Dodds 51:04
Really interesting conversation. One thing that I really appreciated, and it’s one of those things where you kind of know it in the back of your head, but since you’re experiencing it every day, you don’t really think about it: it is pretty wild, the number of ad hoc analyses that are created and require a lot of work, and then are basically thrown away. I mean, Google Sheets for every company must be a massive graveyard of ad hoc analyses, and in some ways you have saved queries on the warehouse that are similar. And thinking about the paradigm shift to monitoring or observability in the context of your stable metrics and KPIs is really interesting. So yeah, it made me think about how many Google Sheets I have in my Drive that are old, stale, ad hoc analyses that were really useful for 15 minutes, and then I never looked at them again.

Kostas Pardalis 52:07
Yeah, I think everyone can relate to that. And I think anyone that comes from a BI background can definitely feel the pain of what it means to maintain all the different dashboards that a company has. I think it’s much more evident early on in the life of a company, because there’s no clear ownership of the BI process; everyone is a BI analyst in a way. And I think it’s a problem that BI tools have struggled to figure out a solution for, like, forever. And I don’t think that they have managed to do it efficiently, to be honest, partly because of technological problems, and also because it’s an organizational problem. But ad hoc analysis is very important, that’s for sure. We always need to ask questions, and of course we have to do that with our data. Now, how we manage all the garbage that’s created from that is something that we discussed today with Aayush. I think there are two things that I found really interesting in our conversation. One is the fact that this space of operationalizing data becomes richer and richer. Our audience is probably already familiar with reverse ETL, but that’s only one manifestation of how to operationalize your data. We had another guest, the CEO of Airbyte, mentioning that machine learning models are also an operationalization of data, which is a very valid point. And Aayush today mentioned Continual.ai, for example, which is exactly that. And of course Tecton and all the different feature stores, in the end, that’s exactly what they are doing. And now we have business observability, which is another way to operationalize the data. That’s very interesting, and I’m really looking forward to seeing what else will come up.
And the other thing is how big of an impact the SRE discipline, the engineering operations discipline, has outside of just managing the IT infrastructure of a company. We see that getting repeated in data. And now we see it also in business, where we have these teams of revenue ops, marketing ops, sales ops, all these different roles that arise. And of course all of that is built on the availability and accessibility of data today. So yeah, this was a very, very interesting conversation. And I’m very curious to see how this business observability category is going to evolve.

Eric Dodds 54:49
I agree. Well, thanks again for joining us on The Data Stack Show, and we will catch you on the next episode. We hope you enjoyed this episode of The Data Stack Show. Be sure to subscribe on your favorite podcast app to get notified about new episodes every week. We’d also love your feedback. You can email me, Eric Dodds, at eric@datastackshow.com. That’s E-R-I-C at datastackshow.com. The show is brought to you by RudderStack, the CDP for developers. Learn how to build a CDP on your data warehouse at RudderStack.com.