On this week’s episode of The Data Stack Show, Kostas Pardalis and Eric Dodds are joined by CTO and Co-Founder of Holistics, Huy Nguyen. Holistics takes an approach to business intelligence and data analytics that they call DataOps. They focus on data team productivity and company-wide access to insights.
Important points in the conversation included:
- Introduction to Huy and Holistics (3:12)
- Approaching BI with more than just visualization (8:59)
- How friction between different roles within an organization is addressed by Holistics (15:20)
- Holistics as a complementary tool (23:25)
- Describing their own data stack (34:47)
- History of BI and trends for the future (39:33)
Eric Dodds: Welcome back to The Data Stack Show. Today, we are going to talk with Holistics, which is a self-service BI platform. They do some really interesting things relative to a lot of other options in the market. And I actually had a chance to meet the Holistics team in person over a year ago in San Francisco at a conference I attended and their team is incredibly sharp, really enjoyed meeting them.
But from a technical standpoint, I’m interested Kostas, what do you want to ask them based on what differentiates them? Because there are so many options in the BI space.
Kostas Pardalis: Yeah, actually, it’s very interesting that we are talking with someone from the BI space, because as a market it has gone through a lot of changes lately. I mean, it was not long ago that we had the acquisition of Looker by Google, for example, the acquisition of Tableau by Salesforce, which took it off the public market, and the merger of Sisense with Periscope Data. So, it’s a very interesting market. There are many things happening, and products are really competing with each other and trying to differentiate. And we also have to consider that we have a company that is actually based in Asia. They’re based in Singapore and Vietnam. And they managed to do an amazing job in expanding both in the United States and in Europe. So it’s very interesting, first of all, to see how they manage to differentiate as a product and how they perceive the BI problem and the visualization problem differently and how they succeed in that, especially with the constraints that they have, right. I mean, it’s really hard to compete in this space even if you’re in an environment like Silicon Valley, and it’s even harder when you have to do that from a completely different culture and time zone, like being in Asia, for example.
And I think the team there has managed to do an amazing job in building a technology that really stands out compared to the rest of the products. I’m really interested to chat, because they’re approaching the problem from two sides. One is from the side of the data analyst, of course, who is the main consumer of a product like BI, but at the same time, they are trying to approach and solve the problems of the data engineer.
And we have seen on this show that the data engineer, I personally would say, is becoming more and more important in the organization, and it would be great to see how these problems are addressed by a purely BI tool which stems from the visualization space. So, yeah, I’m very interested to see what Huy has to say, so let’s move forward and talk with him.
Hi Huy, it’s very nice to have you tonight on The Data Stack Show. Thank you so much for your time. I’m very excited to hear about yourself and Holistics and your products and the stories that you have to share around BI and the data industry in general. So welcome. Can you please introduce quickly yourself and also say a few things about the company?
Huy Nguyen: Okay. Hi, Kostas, thanks for having me. Very excited to be here. So my name is Huy. I, you know, I run this company called Holistics. So we are sort of like a data platform that helps data teams build and maintain a central source of truth for analytics logic, or business logic, and then expose that to the business users, to the non-technical users, via a very simple interface for them to kind of ask data questions and get answers themselves without bothering the technical people, the data teams. Essentially that’s what we do.
Kostas Pardalis: That’s great. So can you give us a little bit more color on the origin, the background behind Holistics, like who came up with the idea and the evolution of the idea, because as with everything else in this world, probably it was a bit different at the beginning when you started compared to how it looks today. So it would be great to see the evolution of Holistics.
Huy Nguyen: Yeah. So my background is as a software engineer, and, you know, I studied computer science, you know, I have been doing programming all my life. And then when I studied in Singapore and then when I, you know, finish university, I kind of joined this company called VIKI. It’s actually a US startup, very popular back then and still now, and they do Korean drama, Chinese drama, Taiwanese drama, Japanese drama. So they, they basically tech…Think of it as a Hulu for international audience, international movies. So then I joined that company as a data engineer, my first job out of college back in 2012 or 13. So then we have a very small data team back then, there were only two or three people, and then I was kind of the first hire that they hire there. I worked with my boss, the director of analytics, and, you know, in a short span of, you know, one or two years, we started to build our internal analytics infrastructure that serves the company, internal users and even external users.
Right. So along the way, you know, we can get more into what kind of data stack we were using back then in a bit, but along the way, you know, we built a bunch of internal tools inside the company. And then, you know, one of the tools turned out to be a simple dashboarding tool, right. You know, you write a SQL query, you slap a chart visualization on top, and then you write some sort of boilerplate around it to share it with other people. So I started building that thing, and then after a while I realized that, hey, this shit can be abstracted. It can be externalized to other people. So I went to my boss to talk to him. I went to the management team and asked, hey, can I spin this off into a separate startup? And they all said yes.
So the idea kind of started from there. And then I talked to my friends, recruited my, you know, current co-founders. And then we kind of worked on it part time, a bit in the evenings. And then once we landed our first customer, we went full time from there. So that was how it started.
Kostas Pardalis: That’s great. Where are you based right now Huy?
Huy Nguyen: So the team, the company started in Singapore, but then when it started, I moved back to Vietnam. And then started the product and engineering team here. And so, so right now the company is split between Singapore, Vietnam, and Indonesia.
Kostas Pardalis: Oh, that’s great. That’s very interesting.
And in terms of your customers, I mean, are you mainly active in the Asian markets? You have global customers? Can you share a little bit more about that?
Huy Nguyen: No, so actually we, we started with the Southeast Asian customers, but as of now a majority of our customers are US and European based. That’s a common question that people ask me, right. You know, you guys are based in Asia. Do you guys only work with Asian customers? More than 50% of our customers are US and European.
Kostas Pardalis: And I have to say it’s amazing what you have managed to do. I mean, I tried to do something similar from Greece, from Europe. And I can understand the difficulties of trying to build something that you are trying to sell to the American market, for example, and doing it from outside the American market.
Huy Nguyen: Thank you. Thank you. I mean, if anything we learn, we, we should have come to the US sooner. That’s one of our lessons.
Kostas Pardalis: Yeah, that’s probably also my lesson to be honest. Well, anyway, that’s a, that’s a, that’s a topic for a business podcast, now we want to discuss a little bit more about technology.
So getting back to the product, because I would like to spend a little bit more time on the product. If I understand correctly, the perception that most people have when we are talking about BI is that it’s all about the visualization. And we have somehow strongly associated the term BI tool with visualization, but from what I understand Holistics is not just a visualization platform. Can you tell me a bit more about what’s different about Holistics, how you compare to other tools out there, like Looker, Chartio, Sisense, and the rest of the tools? At the end, what’s special? What’s the secret sauce?
Huy Nguyen: Thank you. I think you brought up a very good point about visualization. I think for the longest time, when people talk about BI, they think about visualization. It’s a very understandable line of thinking. And I think, you know, the leader in that space, Tableau, has done a tremendous job educating people. And the thing about it is, if you are a business user and people talk about data, you know, the first thing that comes to mind is the visualizations. Right? And then the next thing that comes to mind is, you know, what is the software that does a good job at visualizing the data, which is, you know, Tableau, right, the leader in that space. Right, but that’s because, you know, the business user, the non-technical user, is not the person who went through the entire process of collecting, you know, cleaning, validating, transforming, and preparing the data for the final visualization, the data preparation period.
That’s the data team’s job. So, so essentially, if you look at the statistics, a lot of people will say that they spend more than 80 or 90% of their time, just trying to get the data, preparing it, getting the data to the right format before visualizing the data. So from a data team perspective, a majority of the time it’s not spent on visualizing data, but spent on preparing it.
So that’s kind of how, you know, Holistics comes in, right, as a first cut. Contrary to, say, Tableau, where in order to visualize the data effectively you need to get the data into the right format first, that preparation is where Holistics comes in. So that’s the first thing. The second thing is, if you look at the BI space in general, I mean, I’m going to give you a more long-winded answer because, you know, we’re basically talking to a data analyst audience here, right?
So they want the nuances. If you look into the BI space in general, you see that generally there are two groups of BI tools. On the left is what I call the pre-cloud tools. These are the BI tools that were built pre-cloud, built for the desktop era, the server era, where the assumption is that the data warehouse is expensive.
So then what they do is they build a data store for themselves, and then they load all your data into their proprietary data store. And then they expose kind of a very simple drag-and-drop interface for the users to build a report.
And then on the other hand, you have the group of tools that are built around SQL. Right. You know, when more and more people realized that, hey, you know, I should let the data warehouse do the storing and the processing instead of letting the BI tool do it, a lot of people started building tools that rely on SQL. So you write a SQL query. You send that query to the database, it executes the query and returns the result, and you visualize it. Clear. So the difference is that, on one hand, the first group of BI tools is very easy for the business user to use, because they can use the drag-and-drop interface to build a report, but it’s not actually very friendly for data people. The data people prefer SQL, right. And then with the second group of BI tools, because they are SQL native, data analysts and data teams really like them because, you know, they prefer working with SQL. Right, but on the flip side, they are not friendly for non-technical users, because now the non-technical user has to learn SQL.
So Holistics kind of sits in the middle right there. Right? So on the one hand, we are based on SQL. So there is a very familiar experience for the data team. But on the other hand, we don’t require the non-technical users to learn SQL in order to build their own reports. So that’s kind of how we are differentiated.
Right. Does that kind of answer your question?
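As an illustration of the middle ground Huy describes, here is a minimal Python sketch in which a point-and-click selection (a table, a dimension, and a measure) is compiled into SQL and sent to the database for execution. This is not how Holistics is implemented; the function `build_report_sql`, the `orders` table, and the sample rows are hypothetical, and an in-memory SQLite database stands in for a data warehouse.

```python
import sqlite3

ALLOWED_AGGREGATES = {"SUM", "COUNT", "AVG", "MIN", "MAX"}


def build_report_sql(table, dimension, measure, aggregate="SUM"):
    """Turn a drag-and-drop style selection into a SQL string.

    Hypothetical helper: identifiers are trusted here for brevity; a real
    tool would validate them against the known schema.
    """
    aggregate = aggregate.upper()
    if aggregate not in ALLOWED_AGGREGATES:
        raise ValueError(f"unsupported aggregate: {aggregate}")
    return (
        f"SELECT {dimension}, {aggregate}({measure}) AS value "
        f"FROM {table} GROUP BY {dimension} ORDER BY {dimension}"
    )


# In-memory SQLite stands in for the warehouse (an assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("EU", 10.0), ("US", 25.0), ("EU", 5.0)],
)

# The business user picks "region" and "amount"; the database does the work.
sql = build_report_sql("orders", "region", "amount")
rows = conn.execute(sql).fetchall()
print(sql)
print(rows)  # [('EU', 15.0), ('US', 25.0)]
```

The data team can still read and review the generated SQL, while the business user never has to write it.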
Kostas Pardalis: Yeah, absolutely. So it’s great. I mean, I can understand that, especially compared to products like Tableau or Power BI, for example, or even, if people remember, Periscope Data before it became part of Sisense, compared with Sisense itself. Because it was exactly as you are saying, right: on one hand, you had something like Sisense that was very strong on the visualization side.
But it didn’t have the kind of SQL experience that data teams needed. And that’s why the two products merged together, trying to deliver, at the end, this kind of experience. Something similar also happens in a way with Looker, I would say. The difference there is that Looker implemented their own language at the end.
So you have LookML, which then becomes something similar to what dbt is. So yeah, I totally get the vision and I think it makes total sense.
Huy Nguyen: Yes. Yeah.
Kostas Pardalis: As always, the devil is in the details. It’s all about how you implement the product. So, this is what I understand, based on what you’ve shared with us so far about the product.
Holistics is not just a product for the consumer of the data, the person who’s going to view the reports, right? It also speaks to the people who care about bringing the data together, saving the data, transforming the data, et cetera. So outside of the analyst or the business analyst, it’s also about the data engineer. From what I understand, and if not, please correct me, it’s also a tool that has a special role for the data engineer. If this is true, of course, can you share with us what problems these people have and how Holistics addresses and solves them?
Huy Nguyen: So, you talk about the data engineer, right?
So essentially we don’t really focus on the data engineer that much. The kind of person, the role we want to focus on is a data analyst. If you really, so basically we wanted to make the data analyst’s life better. Right. And the data engineers soon, you’ll see that the data engineer is part of the picture.
So if you think about the data analyst role, you know, he, or she will be the perfect person with the right incentive and with the right kind of skill set to be the main person driving the data team, right. Because you know, a data analyst, a person, she’s technical enough to understand how the data structure looks like, understand the nuances in the technicality of the data.
She also has enough of a business mind to be able to work with the business users, to kind of understand their perspective, how they think about the business model, how they have the business running, and to offer the data perspective to help them make better decisions. Right. But if you kind of really look at the pain points, the problems that the data analyst is usually facing in data organizations, you see that she has a lot of bottlenecks, a lot of things she needs to overcome. For example, right, a very common example is that if your organization has nothing installed, has no data setup, the data consumer, the non-technical user, will keep coming to your data analysts for ad hoc reports, right?
I want to check, Hey, how is my sales in this quarter compared to last quarter? I want to check if there’s any customer abnormality happening in my department, you know, stuff like that. And then the analysts have to always spend time manually compiling those reports to prepare for the business user.
So that is a bottleneck right? Because the analysts waste time doing that manual reporting and then the consumer actually wastes time waiting for it to happen. Right. The second bottleneck I see is between the data analysts and the data engineer, right? Because the data analysts usually come from a not very engineering background, you know, maybe from an economics background, a finance background, she knows a little bit of SQL, a little bit of a business, you know, knowledge.
Right. But usually she doesn’t know how to write code like Python or Ruby, programming code. So then whenever she needs some sort of data, she actually has to go to the data engineer to ask, hey, can you pull in this data for me? Can you prepare this data for me so that I can build this report or run the analysis for the CEO?
Right? So that is the second bottleneck. The analyst actually has to wait for the engineer to prep the data for her. Right. And then the third one you will see is that as the organization grows, you know, you have more data analysts and then you start to get into analytics chaos.
Right. You know, the data analysts have no mechanism to collaborate with each other. Right. You know, some analysis you’ve done or some kind of aggregation work you do is not being shared or communicated to other data analysts. So then different analysts kind of use different formulas to run the report, to get the numbers, so it starts getting into analytics chaos.
Right. So, so that is another friction, right? So it, it, yeah, so essentially there are three bottlenecks that happen with analytics between the data analysts and the business user, the analysts and the engineer and the data analyst with another data analyst. Right? So you know, our hope our vision is to build a platform that can empower analysts to resolve, to remove these bottlenecks altogether.
Does that make sense?
Kostas Pardalis: Yeah, absolutely. So can you share a little bit more information on how this can be done today with Holistics? I mean, for these three types of friction that exist inside the organization, right? Do you solve all of them, or, first of all, do you focus on one of them right now? And yeah, tell us a little bit more about the current state of the product in solving these problems and your future plans about this.
Huy Nguyen: Yeah, so we kind of solve all three problems in kind of one shot. So the three values of Holistics: first of all, it’s allowing the non-technical users to self-serve and get the data directly without going through the data analyst.
Right. So that’s the first bottleneck, between the data analyst and the business user. Second of all, a lot of the work that data analysts require the data engineer to do is pipelining work. So we actually give data analysts kind of data engineering powers to do that. Right. They can, you know, they can load data from different sources into the platform.
They can do simple transformations. They can build reports to expose to the business users. So those are what we call data engineering powers, which previously data analysts didn’t have. Right. And then the third thing, which is the friction between data analysts and other data analysts: as I mentioned earlier in our kind of intro pitch, we actually help data analysts build a central source of truth for the analytics logic, for the business logic, so that whatever work you do is, you know, checked in, is in version control in a way. Right. And is communicated to other people so that the team doesn’t repeat themselves. Does that make sense? So the way we do this is to build a very logical semantic layer that sits between the business logic and the data logic, the underlying data warehouse logic, so that the data team defines the business logic, or the transformations, or the pipelining in that semantic layer. And then that semantic layer becomes the source of truth for the organization to come to get the answers. Does that make sense to you?
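To make the semantic layer idea concrete, here is a minimal Python sketch, under the assumption that a metric is defined once, in code that can be version controlled, and that every report compiles its SQL from that single definition. This is not Holistics’ modeling syntax; the `Metric` class, the `METRICS` registry, and the table and column names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """One shared business definition (hypothetical structure)."""
    name: str
    table: str
    expression: str  # SQL aggregate expression
    where: str = ""  # optional filter that is part of the definition


# Hypothetical registry; in practice this file would live in version control
# so every analyst uses the same formula.
METRICS = {
    "paid_revenue": Metric("paid_revenue", "orders", "SUM(amount)", "status = 'paid'"),
    "active_users": Metric("active_users", "events", "COUNT(DISTINCT user_id)"),
}


def compile_metric(metric_name, group_by=None):
    """Compile a metric request into SQL for the underlying warehouse."""
    m = METRICS[metric_name]
    select = f"{m.expression} AS {m.name}"
    if group_by:
        select = f"{group_by}, {select}"
    sql = f"SELECT {select} FROM {m.table}"
    if m.where:
        sql += f" WHERE {m.where}"
    if group_by:
        sql += f" GROUP BY {group_by}"
    return sql


# Two different reports ask for the same metric and get the same logic.
print(compile_metric("paid_revenue", group_by="region"))
print(compile_metric("active_users"))
```

Because the filter `status = 'paid'` lives in the definition rather than in each report, two analysts cannot silently compute revenue with different formulas, which is the "analytics chaos" described earlier.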
Kostas Pardalis: Yeah, absolutely. That’s very, very interesting.
So I guess through all the interactions that you have had so far with your customers, you have also been exposed to the data stacks that your customers have. Right. Because I assume that Holistics is not the only product that they are using for their data needs. So can you share a little bit more about what you have seen out there? What you have seen in terms of technologies that companies are using, how they try to architect their data stacks, and of course, at the end, how the Holistics story fits into these architectures that we see out there?
Huy Nguyen: Yeah. I think over the years we have started seeing people shifting. So, so your question is specifically around what data stacks that we see our customers using, right?
What are they thinking about data? The data stack.
Kostas Pardalis: And what technologies do you usually see working together with Holistics?
Huy Nguyen: Okay, let me think. So one of the ways that we wanted to position Holistics is that we are not a replacement. We are a complementary tool, an augmentative tool to the existing data stacks.
And so when we come in, if they have something that’s working, they don’t need to replace it completely. But over the years, I’ve seen that, you know, people have become familiar with the data warehouse. Right. But what I see is that when companies are starting out, they actually don’t need a data warehouse.
Right? The first thing they do is they just take some sort of BI tool, a SQL BI tool like Holistics, and plug it directly into the production database. And off they go and they can start building reports. You know, these are not frequently accessed reports, so all that best practice advice that we give people about, hey, it’s going to increase load on your database, it’s going to affect your production applications,
it doesn’t apply. Right. You know, they just need something that works, that fits their needs. Their analytics needs are very simple, right. Except for when they have something like a MongoDB database, as an example, where it’s very difficult to do analytics on top of MongoDB. And that’s where we kind of recommend to them, hey, you know, you should spin up a data warehouse instance, like BigQuery, Redshift, Snowflake, or even a simple Postgres database.
And then pull the data over from MongoDB over to the data warehouse and slap a BI tool on top. Right. So that’s one thing. The other thing that I see is that I see increasing usage of, I see kind of a shift between things like, and so recently we, we see a lot more customers using Snowflake and, you know, the hot new data warehouse.
Yeah. Now, as for the older customers, they are still on Redshift or BigQuery, but I do see, you know, some trends where people are kind of moving away from things like Redshift when they run into some performance problem, over to, say, BigQuery or Snowflake.
You know, unless the infrastructure requires them to stay on AWS. A third thing, you know, another thing I started to see is that they are moving away from tools like Google Analytics over to tools like Snowplow, and of course Rudderstack, you know, because Google Analytics doesn’t give you the kind of granularity in the events data that they need, as compared to collecting the event data themselves.
Right. And then, you know, we see a lot of companies moving from Mixpanel to a custom-built in-house solution, usually an open source solution. And of course that’s where Snowplow and Rudderstack come to mind. Yeah. So that’s the third thing we see. I mean, that’s what I can remember right now.
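One reason analytics directly on a document store is awkward is that documents are nested, while a SQL warehouse expects flat columns. The following is a minimal Python sketch of the "pull the data over to the data warehouse" step mentioned above, assuming the documents have already been read into plain dictionaries. The `flatten` helper and the sample document are hypothetical, and no MongoDB driver is used here.

```python
def flatten(doc, parent_key="", sep="_"):
    """Flatten a nested document into a single-level row.

    Nested keys are joined with `sep`, so {"profile": {"country": "VN"}}
    becomes {"profile_country": "VN"}. Lists are left untouched in this
    sketch; a real pipeline would need a policy for arrays.
    """
    row = {}
    for key, value in doc.items():
        column = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            row.update(flatten(value, column, sep))
        else:
            row[column] = value
    return row


# Hypothetical document as it might come out of a document database.
doc = {
    "_id": "u1",
    "profile": {"country": "VN", "plan": {"tier": "pro"}},
    "signup_year": 2020,
}

row = flatten(doc)
print(row)
# {'_id': 'u1', 'profile_country': 'VN', 'profile_plan_tier': 'pro', 'signup_year': 2020}
```

Each flattened row can then be loaded into a warehouse table, where a SQL-based BI tool can query it like any other relational data.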
Kostas Pardalis: Yeah. Yeah, it makes sense. I think there is, let’s say, a democratization of data warehouses, because actually, in the past decade or so, data warehouses have become much more accessible to almost everyone. Actually, I think today, even for very small companies, accessing something like BigQuery or even Snowflake is cheap.
Right. I mean, yeah. And all these systems they charge based on your volume you have there, the processing that you are doing. So if your data set is pretty small you’re not going to be charged a lot of money anyway. So that’s becoming a bit of a no brainer for companies to, even at an early stage, as you said, to use some of these technologies.
Redshift is a bit of a different story, mainly because it requires a lot more management, although they’re working to change that. And I think that’s one of the reasons that we see that fully managed services like Snowflake and BigQuery are winning big.
Huy Nguyen: So to comment more on that, right. I think interestingly, if you go back to say, I remember in 2012 and 13, that was when Redshift first came out.
Right. I remember because I was the data engineer back then, working for VIKI, and at this company our data warehouse back then was Postgres. And then when Redshift came out in beta, we kind of immediately jumped into that to try it out. And it was wonderful, right. It worked great. Basically, because it’s compatible with Postgres, there was very little that we needed to do to migrate the data or to migrate our reporting system over. Right. Because it’s a compatible syntax and all that. Alright, so Redshift was the first cloud data warehouse that got popularized and, you know, basically dropped the price of data warehousing dramatically. Right. But then I think the downside of that is, and I’m sure you know about the history of Redshift, that they were based on ParAccel, which is itself based on Postgres. And then Amazon kind of struck a deal with them to kind of bring the ParAccel version onto their cloud. I mean, they did an amazing job of kind of making it more of a cloud SaaS and making it more accessible to people.
Right. But essentially all this infrastructure, I don’t think they are built natively for the, for the cloud era. If you look at BigQuery and Snowflake, one of the main advantage, is the splitting of compute and storage out. Right? So then the compute and storage don’t sit in the same kind of physical servers, so to speak.
And that’s why, you know, even though Redshift was the first to come out on the market, it became an educating factor. It educated people to start using data warehouses. And then they are faced with problems, usually with performance and with costs, because they have to constrain themselves to physical server units where storage and compute are tied together.
And then that’s where kind of BigQuery and Snowflake kind of comes in and takes off from there. So I think that’s very interesting, kind of a thing to observe over there.
Kostas Pardalis: Yeah, and I found very interesting what you were saying about applications like Mixpanel and these very specialized, let’s say, web applications around analytics.
Because my feeling is that as we started having very powerful and pretty cheap to use data infrastructure, like the modern data warehouse on the cloud, and very sophisticated BI and visualization tools like Holistics, having your own infrastructure to actually do the product analytics that you could do with these kinds of products becomes much, much easier.
So instead of using another data silo and other products inside your company, you can reuse your data warehouse and actually build at least some of these functionalities that you find on these products, on your data warehouse using something like Holistics and Snowflake
Huy Nguyen: Exactly, exactly. So, coming back to my last job, back then what we were doing is that for our events data, we were actually storing them in Hadoop. So we had this setup where we built this collector, right, you know, a custom-built collector on top of a tool called Fluentd. And then, you know, we set up a web endpoint, we pushed event data there, and then we used Fluentd to push it to our S3. And then we slapped a Hadoop cluster on top. And then we ran some sort of aggregations, and then the aggregated results were pushed back to our Redshift data warehouse. Right. Alright. Does that make sense to you?
Kostas Pardalis: Yeah. Yeah.
Huy Nguyen: Yeah. And the reason why we did that was, I think, the cost of the data warehouse. You know, it’s not like BigQuery or Snowflake, right, where they separate compute and storage. You know, in Redshift, the more raw events data you push to it, the more storage is consumed, and then when you want to upgrade, you have to actually upgrade the entire cluster. Essentially, we maintained a dual system, a dual data warehouse: one runs on top of the Hadoop ecosystem and the other one runs on top of, you know, traditional MPP databases.
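The aggregation step in that pipeline can be sketched in a few lines of Python: raw events stay in cheap storage, and only a small summary table is loaded into the warehouse. This is an illustration only, assuming events carry a Unix timestamp and an event name; the `daily_event_counts` function and the sample events are hypothetical, and a plain in-process loop stands in for the Hadoop job.

```python
from collections import Counter
from datetime import datetime, timezone


def daily_event_counts(events):
    """Reduce raw events to (day, event_name, count) rows.

    Stands in for the batch aggregation that ran over raw event files;
    only these summary rows would be loaded into the warehouse.
    """
    counts = Counter()
    for event in events:
        day = datetime.fromtimestamp(event["ts"], tz=timezone.utc).date().isoformat()
        counts[(day, event["event"])] += 1
    return sorted((day, name, n) for (day, name), n in counts.items())


# Hypothetical raw events as a collector might have written them.
raw_events = [
    {"ts": 0, "event": "play"},
    {"ts": 60, "event": "play"},
    {"ts": 86400, "event": "pause"},
]

summary = daily_event_counts(raw_events)
print(summary)
# [('1970-01-01', 'play', 2), ('1970-01-02', 'pause', 1)]
```

Three raw events become two summary rows here; at real volumes that reduction is what keeps warehouse storage, and therefore cluster size, under control when storage and compute scale together.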
Kostas Pardalis: Yeah, this is, this is a topic that even now you can see, in some companies, any company that has to operate on AWS, and they have like huge amounts of data to work with, especially if they are event related data, you can see that they are probably going to implement something like a data lake where the data will be stored on S3 then just the subset of this data is going to be loaded in Redshift or something like Spectrum and Athena to prepare and load the data or even query the data directly. And yeah, I think that this is also a big byproduct of the architecture and the amount of data that some companies out there have to deal with.
So yeah. What you did with Hadoop, I mean, I think it’s still happening. It’s just that the technologies have matured and things are a little bit easier than just bringing up…
Huy Nguyen: It’s converging, basically. I mean, if you think about it, there are two tracks, right? On the one track there are things like the MPP databases that spun out of C-Store, the columnar storage mechanism. And then on the other track you have Hadoop, right? The idea of separating compute and storage, and MapReduce. And then what you are saying, and what I’m seeing, is that these two tracks somehow converge to the same idea. A lot of the concepts from the MPP columnar storage databases have already been applied over to the Hadoop ecosystem and vice versa.
Kostas Pardalis: Yeah. Yeah, I totally agree. That’s also what I see. And it’s very interesting to see how this market is going to develop in the future, because I think we are still at the beginning with what is happening with the technology around data. So I’m very excited to see what the next couple of years will, will bring us.
So, okay. We talked about the data stacks that you have seen out there in the wild. Quickly, because at the end you are also a company, right? You also have to work with data internally and do reporting and other stuff. So, yeah, very quickly, can you share with us a little bit of your infrastructure?
What kind of tools do you use? I assume you’re using Holistics in-house, but if you don’t, you can tell that. So yeah, share with us, what are you doing? What kind of best practices you are also following and what kind of stack you have?
Huy Nguyen: Okay. Okay. I mean our data, I mean, we are a B2B company, right? Our data stack is pretty boring, so to say. You know, we don’t have a huge volume of data to process. And then we’re just kind of standard, right? Our production database is a Postgres database. And then for our data warehouse, we are using BigQuery right now. We use Holistics, for sure: we load our data from Postgres over to BigQuery using Holistics, you know, and then we use Holistics to do the modeling, you know, to do all the business logic to data logic mapping, to do all the transformations within the data warehouse, the queries. And then we also use Holistics to expose kind of a self-service interface for the business users, or predefined dashboards.
We use Holistics too to set up these data pushes. So, you know, we don’t log into Holistics every day, right. We push data into our Slack channel. So we set up this report and we push the data over to our Slack. So then every morning we log in to Slack, we open it up, and then we can see the very nice visualization that sits there to say how many, say, users we got in the last day or the last week, stuff like that. Right? So that’s on the transactional side. On the event side, the analytics side, we set up Snowplow, you know, that was a year back, and then similar things: Snowplow pushes data to BigQuery. And then in BigQuery, you know, we also use Holistics to model a lot of the events data, page views data, and then push to the visualization.
So, so that’s pretty standard. That’s pretty boring. I mean, essentially you can reduce the three things, right? Snowplow, BigQuery and Holistics.
Kostas Pardalis: Well, to be honest, I think that’s something we have encountered a lot on this podcast. The most successful data stacks that we have seen so far usually employ some kind of boring technology and boring design principles, but in the end it makes sense. I mean, you can’t just use every state-of-the-art thing out there, because you will pretty much end up duct-taping your infrastructure. So it’s quite important to also use the proven technology that’s out there.
And it’s really interesting. We’ve had some discussions about this, and it is something that we also do internally at RudderStack. For example, for our product, we created a kind of queuing mechanism on top of Postgres. Right. And in an upcoming episode I was talking with the guys at Slapdash; for their own product they also needed to build a kind of graph database, and they decided to do that on top of Postgres, again, instead of using one of the state-of-the-art graph database products that you can find out there.
Huy Nguyen: Yeah. Yeah. That’s really cool.
Kostas Pardalis: Yeah. I mean, if you think about it, Postgres is a piece of software that has been developed for almost 20, 30 years now. So much human energy has gone into it. I mean, it’s so mature. And when you’re building a product, in the end you need to make sure that you deliver the best possible experience to your customer. Right? The customer doesn’t care what you’re using in the background.
Sometimes it’s good.
Huy Nguyen: Yep. Yep. I mean, I have so many good things to say about Postgres. It’s a very good general-purpose database that you can use for a lot of use cases, especially analytics, right? I mean, the SQL syntax and the functionality around SQL is insane, way better than MySQL. And I actually wrote a blog post about this, like six years ago, on why you should use Postgres over MySQL when it comes to analytics.
Kostas Pardalis: Yeah, that makes a lot of sense. I can understand that. So, all right. Moving a little bit forward, or actually going a little bit back, let’s expand a little bit more on the BI market. I mean, you’re an expert in the BI market. Many things have happened in the past two years: many acquisitions, products have merged, Looker was acquired by Google for a huge amount of money. So what do you see happening right now in the BI space, and more specifically in the visualization space? What are the trends that you see there? And what do you think is the next big thing when it comes to BI?
Huy Nguyen: So I mean, if we step back a little bit and look at it, BI has been around for 60 years, right? I mean, it’s been around for a very long time, and if we really look into the history of how the BI market evolved, it’s very interesting. We wrote this, you know, guidebook on the Holistics website, if you want to look at it. We call it the three stages, or three waves, of BI.
So at the beginning, this is maybe 40, 50 years ago, BI is a very centralized system. Right. You know, you have things like IBM Cognos. Only the big corporations can afford BI. I mean, it’s not for a small company; they can’t afford BI. You invest millions of dollars to build this BI system. It’s centralized. It’s managed by IT, basically. And because computing resources were so expensive, they were only able to serve the business from the top-level management down, in that order. Right.
Basically there is some sort of huge system. The data gets loaded into that system, it runs overnight, and then in the morning it churns out very standardized reports for the business users to look at. If you have random questions, like ad hoc questions, you can’t ask them directly. Basically your request goes into a queue at the IT desk and then, you know, the IT person will prioritize a C-level executive’s request over your request, right? So usually you wait for maybe one or two months to get the data, to get the report you need. And every report has to go through IT. So, in a way, I call that the centralized era. And the centralized era has all these problems, in that basically only the top executives have access to reports. You know, the mid-level, the low-level operational people won’t have access to them.
So then there comes the second era, what I call the decentralized era. The decentralized era happens when tools like Tableau, or even Excel, come about. Basically, instead of submitting the request over to the IT team, you log into some sort of system, the CRM system, the production system, and you export a CSV file or an Excel file. Then you load that CSV file into a desktop program installed on your computer. Tableau Desktop, for example, was awesome. You know, you dump the CSV into Tableau, and then you start to really explore the data. It’s completely drag and drop.
It required no SQL knowledge whatsoever. A non-technical user could learn to use Tableau and, you know, assuming they got the right data extract, they could come up with a fancy graph for the rest of the company to consume. So that was the second era, the decentralized era, in which tools like Tableau are a solution to the problems faced in the first era.
Right. Are you following, are you with me so far?
Kostas Pardalis: Yeah, of course.
Huy Nguyen: Yeah. So then there comes the problem with the second era, right? What we call the metric knife fight. The problem with the second era is that because it’s so decentralized, and people start using this data through workaround routes without going through central IT, it is fairly easy for the data to come out wrong. Now the non-technical user is the one who does the exploring and the building of the reports. And basically, a scenario would happen where you have someone from the sales department say that the revenue is X, and then someone from the marketing department says the revenue is Y.
And what happens is that, you know, each of them may extract the same CSV but use a different formula to calculate revenue. Or each of them may extract a different CSV, because one CSV is stale data and one is not. Right. And then this becomes a disaster, because imagine that you use the wrong data to report to your director. Things like that are going to happen. Right. And then it becomes a total mess. Does that make sense so far for you?
Kostas Pardalis: Yeah, absolutely.
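The "metric knife fight" Huy describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the CSV export, its column names, and the two revenue formulas are invented here and do not come from the conversation.

```python
import csv
import io

# Hypothetical CRM export; the columns and values are invented for illustration.
EXPORT = """order_id,amount,refunded
1,100.0,no
2,250.0,yes
3,50.0,no
"""


def load_orders(text):
    """Parse the CSV export into a list of dicts with typed fields."""
    rows = csv.DictReader(io.StringIO(text))
    return [
        {
            "order_id": int(r["order_id"]),
            "amount": float(r["amount"]),
            "refunded": r["refunded"] == "yes",
        }
        for r in rows
    ]


def sales_revenue(orders):
    """Hypothetical sales formula: every order counts, refunded or not."""
    return sum(o["amount"] for o in orders)


def marketing_revenue(orders):
    """Hypothetical marketing formula: refunded orders are excluded."""
    return sum(o["amount"] for o in orders if not o["refunded"])


orders = load_orders(EXPORT)
print("Sales says revenue is", sales_revenue(orders))
print("Marketing says revenue is", marketing_revenue(orders))
```

Both teams start from the same extract, yet they report different numbers because nobody owns a single definition of "revenue".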
Huy Nguyen: Yeah. So, in the first era, people have little access to data, but at least because it’s going through central IT, they are experienced people. They, you know, double-check the data, and the data is not all over the place. So you get accuracy of the data, but in exchange you give up accessibility of the data. In the second era, where the data has been decentralized, anyone can extract the data from a system for reporting.
Basically you get an abundance of access to the data, but in exchange you don’t have the accuracy of the data, which is very important, because if you don’t trust the data, you will stop using it altogether to make decisions. Right. Then there comes the third era. And, you know, basically we say, hey, okay, there’s this friction between the business user and the IT team, the data team, right?
The business user wants access to the data. But at the same time, the data slash IT team wants control over the accuracy, the consistency of the data. Alright. So that’s where tools like Holistics or Looker come about. Right. You know, instead of letting the business user download the data and build a report directly in tools like Tableau, or locking them out altogether and asking them to log into a central system to view a predefined report, where there’s no way for them to ask ad hoc questions, Looker and Holistics expose a semantic layer of data, a modeling layer. Right. And then instead of building a report for every single request from the business user, the data team, the IT team, only needs to work on maintaining the data modeling layer, to make sure that all the business logic is properly recorded and the metrics are defined clearly in the modeling layer.
So then this will be exposed, as I mentioned earlier, as a BI interface for the business user. So they can still get the decentralized experience of the second era, but this time they don’t have to rebuild every report from scratch again, maybe using the wrong formula. And they don’t have to download a CSV from somewhere in another system to load into the BI tool anymore. Right. They use the data that the data team, the IT team, provided to them. They use the definitions of the metrics that the data team prepared for them. All they need to do is explore on that, you know, flexible but restricted interface to get that data.
Yeah. So that’s kind of the third era, right? That’s already happening now. It’s not clearly obvious yet, but I think it’s going to happen sooner or later. Does that make sense?
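To make the modeling-layer idea more concrete, here is a minimal Python sketch. It is not how Holistics or Looker implement their semantic layers; the `Metric` class, the `METRICS` and `ALLOWED_DIMENSIONS` tables, and the `build_query` helper are all hypothetical names invented for this illustration. The point it tries to show is the split of responsibilities: the data team owns the metric definitions, and business users only choose a metric and the dimensions to slice it by.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """A metric definition owned by the data team (hypothetical schema)."""
    name: str
    sql_expression: str  # the single agreed-upon formula
    table: str


# Hypothetical central definitions maintained by the data team.
METRICS = {
    "revenue": Metric(
        name="revenue",
        sql_expression="SUM(CASE WHEN refunded THEN 0 ELSE amount END)",
        table="orders",
    ),
}

# Dimensions that business users are allowed to slice each table by.
ALLOWED_DIMENSIONS = {"orders": {"country", "order_month"}}


def build_query(metric_name, dimensions=()):
    """Turn a self-service request into SQL using the central definitions."""
    metric = METRICS[metric_name]
    allowed = ALLOWED_DIMENSIONS[metric.table]
    unknown = [d for d in dimensions if d not in allowed]
    if unknown:
        raise ValueError(f"dimensions not exposed by the model: {unknown}")
    select = list(dimensions) + [f"{metric.sql_expression} AS {metric.name}"]
    sql = f"SELECT {', '.join(select)} FROM {metric.table}"
    if dimensions:
        sql += f" GROUP BY {', '.join(dimensions)}"
    return sql


print(build_query("revenue", ["country"]))
```

In this sketch a user can ask for revenue by country or by month without writing SQL, but cannot redefine what revenue means, which is the "flexible but restricted" interface described above.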
Kostas Pardalis: Yeah, absolutely. I think you gave an amazing description of what has happened in the BI market from its creation up to today. And I think we still have very exciting things to see in the future, and yeah, I’m looking forward to it. I’m pretty sure that Holistics is going to be a company that will make some of these new things happen. So, having said that, and moving to the end of the show for today, one last question.
Would you like to share something about Holistics that is coming in the future, something that is really exciting for you and that you would like to share with our audience?
Huy Nguyen: Oh, thanks. Maybe not so much on Holistics itself, but let me comment a little bit more on another trend that we are seeing in the data analytics space, which, you know, at Holistics we are also trying to figure out how to tap into. Would that be okay?
Kostas Pardalis: Yeah, of course, of course.
Huy Nguyen: Yeah. So I think the other trend that we see happening, which, you know, you can call the fourth wave or the fourth stage of BI, or of analytics in general, and I think this is already happening, right, is that analytics people are taking the learnings and best practices from the software engineering space or the DevOps space and applying them to the data and analytics space. You know, basically whatever principles they are applying in the DevOps world, like CI/CD, you know, continuous delivery, agile development, are being applied over to the data space. And people call it DataOps. Or, as I think Tristan Handy from dbt, Fishtown, coined the term, analytics engineer. Analytics engineering, which also fits nicely with that kind of trend.
So among those trends of basically applying software engineering principles to data, one of the key elements that I see happening is the use of code, or text, to represent the logic in the data. If you look at the infrastructure space, there are obviously tools like Terraform.
Are you familiar with Terraform, Ansible?
Kostas Pardalis: Yeah, of course.
Huy Nguyen: Yeah. So Terraform and Ansible allow you to write code, or text, to represent your entire infrastructure, and then you just run a command to kind of recreate the infrastructure on the cloud, right in your production. So basically there’s no more logging into the system UI, drag and drop, click here, click there.
Everything is code, you know, is coded as text. And that has amazing benefits, right? It enables automation. It enables maintainability. It enables reusability. It enables clarity of logic. A simple practice, a simple mechanism of using code to represent infrastructure, has multiple benefits to the company.
Right. So what I’m seeing happening slowly is that this is being applied to analytics, right? What we call analytics as code. I mean, Looker with LookML, you know, and tools like dbt are among the first tools to adopt these practices, either consciously or subconsciously.
And I’m sure more and more tools will basically catch on and adopt these practices. Does that make sense?
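As a rough illustration of "analytics as code", the Python sketch below treats metric definitions as plain text that could live in version control and be checked automatically, for example in a CI job. It is not LookML or dbt syntax; the JSON schema, the `REQUIRED_KEYS` set, and the `validate` function are assumptions made up for this example.

```python
import json

# Stand-in for a text file that would live in version control.
# The schema (name, table, expression, owner) is hypothetical.
METRICS_FILE = """
[
  {"name": "revenue", "table": "orders",
   "expression": "SUM(amount)", "owner": "data-team"},
  {"name": "active_users", "table": "events",
   "expression": "COUNT(DISTINCT user_id)", "owner": "data-team"}
]
"""

REQUIRED_KEYS = {"name", "table", "expression", "owner"}


def validate(definitions):
    """Return a list of problems; an empty list means the file passes."""
    problems = []
    seen = set()
    for i, d in enumerate(definitions):
        missing = REQUIRED_KEYS - d.keys()
        if missing:
            problems.append(f"entry {i}: missing keys {sorted(missing)}")
            continue
        if d["name"] in seen:
            problems.append(f"entry {i}: duplicate metric name {d['name']!r}")
        seen.add(d["name"])
    return problems


definitions = json.loads(METRICS_FILE)
problems = validate(definitions)
print("OK" if not problems else "\n".join(problems))
```

Because the definitions are text, a change to a metric can be reviewed as a diff and rejected automatically when it breaks a rule, which is the kind of automation and maintainability benefit mentioned above.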
Kostas Pardalis: Yeah, absolutely. I totally agree with you that this is a huge trend that is currently forming, and we see more and more best practices coming from both development and engineering, but also from infrastructure management, as you very well said when you mentioned Ansible and Terraform.
We see these practices also coming to the management of data: how to use these kinds of products to accelerate productivity, increase quality, and solve many of the problems around working with data from the past. So, yeah, that’s really exciting, and I’m really interested to see what happens there. And I hope we will see things happening in this space from Holistics as well. So Huy, thank you so much. It was great chatting with you today. I’m pretty sure we will have the opportunity to chat again in the future. I think we need a couple of episodes at least to cover all the different things we can discuss together.
I would encourage everyone to check your website. I know that you have an amazing wealth of content there. So I’m pretty sure people can find some very opinionated and interesting stuff around data, BI, all the stuff that we discussed together. And of course, give Holistics a try.
Huy Nguyen: We also wrote a free book. For those of you who want to get a better understanding of the data and BI space, you know, we wrote a free guidebook that explains the BI space, or the analytics space, in very layman’s terms. You can check it out on our website.
Kostas Pardalis: That’s great. I would encourage everyone to go and download it. And yeah, thank you. I’m looking forward to chatting with you again in the future.
Huy Nguyen: Thank you, Kostas. Thanks for having me. I appreciate it.
Eric Dodds: That was a really interesting conversation. I think their approach to separating various components within the BI ecosystem is fascinating, but Kostas what piqued your interest and what did you like most about that conversation?
Kostas Pardalis: Yeah. First of all, it was a pure delight to chat with Huy. I mean, it’s been more than 50 minutes, probably our longest episode, and I feel like we still have a lot of things to discuss with him. Huy is amazingly aware of what is going on in the BI space and in anything that has to do with data in general.
I think the whole conversation that we had around the evolution of the BI market and the products out there was great, with the three different phases: how things started, what the second wave of BI tools was, where we stand right now, and what the future is. I think the team there has a crystal clear vision of what’s going to happen in the BI space, and they are executing pretty well on that.
It was a great mix of both business and technology related insights. I think a very interesting part was when we were discussing analytics as code. We see that happening a lot lately, with companies and products like dbt and LookML from Looker. LookML was a big part of the success of Looker.
And we see how dbt is becoming one of the favorite tools of data engineers, and how the same approach can also be used in the BI space in general. We even had the opportunity to chat about Snowflake and the different data warehouse solutions. We literally went through the whole data stack, and Huy shared with us his experience, from the BI point of view, of every single part of the data stack. And that was extremely interesting. Unfortunately, we didn’t have enough time to go through everything. I’m pretty sure that we will have at least another call with him in the future to revisit some of these topics, and also to see what Holistics is going to come up with next in their product.
They are really building an amazing product. And it’s very interesting to see how they are going to progress.
Eric Dodds: I agree. Well, we will definitely schedule another call with their team and we’ll catch you next time on The Data Stack Show.