Episode 109:

How Does Headless Business Intelligence Work? Featuring Artyom Keydunov and Pavel Tiunov of Cube Dev

October 19, 2022

This week on The Data Stack Show, Eric and Kostas chat with Artyom Keydunov and Pavel Tiunov, the CEO and CTO of Cube Dev. Together they discuss the ins and outs of Headless BI, from data artifacts to visualization and more.



Highlights from this week’s conversation include:

  • The context of Headless BI (3:31)
  • What Cube Dev does (9:24)
  • How Headless BI works with other tools (13:03)
  • An analysis of LookML (18:04)
  • User interaction with Cube Dev (23:40)
  • Who manages data artifacts (25:22)
  • Taking care of the developer experience (30:37)
  • Levels of performance (30:37)
  • Artyom and Pavel’s background and career journey (35:47)
  • Why you should use Cube Dev (43:38)
  • Roles within a data organization (48:55)
  • How Cube Dev impacts visualization (53:35) 

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.


Eric Dodds 0:05
Welcome to The Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You’ll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, the CDP for developers. You can learn more at RudderStack.com.

Welcome back to The Data Stack Show. Today we are going to talk with the people who started Cube, which is a super interesting tool in the analytics space, Kostas. They claim that they're headless BI, which is super interesting. We may have had the concept of headless come up, but I don't know if we've had a company come on who just explicitly calls themselves headless, which is really interesting. And that being in the BI space is really interesting. So my burning question is actually just around the topic of headless itself. Like a lot of things we discuss on the show, say CDC or other topics, the technology or the concept itself isn't brand new, but the way that it's being implemented is actually pretty new. And I think Cube is doing that in the BI space. So I just want to ask them about headless as sort of a concept and how that plays itself out in various forms of technology. And, of course, specifically what that means for BI. But they talk about data modeling and all that sort of stuff, so I know you have some burning questions, too.

Kostas Pardalis 1:46
Yeah, well, first of all, whenever I hear the term headless (and it has nothing to do with Cube), for some reason I always expect something like watching a Tim Burton movie or something, with a head going by.

Eric Dodds 2:01
Of course. Tim Burton movies— We’ll have Tim Burton on the show and we can talk about this.

Kostas Pardalis 2:13
We should do that. Absolutely. Yeah, I'm pretty excited. I think what we are seeing here is some very interesting software engineering patterns getting deployed at more of an infrastructure level. And I think that's also what's happening here, where the idea behind headless is to decouple things so you can have more control. So yeah, I'd love to hear how they do it, what the ergonomics are, how it is used, and most importantly, what are the challenges.

Eric Dodds 2:54
Great, well, let’s dive in and talk.

Artyom and Pavel, welcome to The Data Stack Show. We are so excited to talk with you about both analytics, and headless BI in particular, but also your story. So thanks for giving us some time on The Data Stack Show.

Artyom Keydunov 3:14
Thank you. Yeah, really excited to be here today to chat about data and headless BI.

Pavel Tiunov 3:19
Thank you. Yeah. Hi, everyone.

Eric Dodds 3:21
Okay, I'd actually like to start by sort of setting the table around the context of headless. So headless, as sort of a conceptual data flow, isn't actually new, right? It's one of those things that became popular in various forms and gained the term headless in contrast to existing technology. I'm not an expert, this is just my perception, but I think one of the first big public-facing marketing pushes that we saw for headless was the CMS, right, the content management system. People were used to dealing with brutal enterprise content management systems, like Adobe Experience Manager, or, at scale, really gnarly WordPress stuff. And that was tough not only from a data and user standpoint, but from a performance standpoint it was really bad, right? So headless comes out and solves a bunch of these problems. Can you talk to us a bit, and especially speak to the listeners, a lot of whom I'm guessing know what headless is, but when you say BI, you don't think about headless as a first-order concept. So could you go back and explain what the concept of headless is and maybe even break down the technology flow? I think that will help us understand why it's important for BI.

Artyom Keydunov 5:02
Yeah. Yeah, definitely. I think the headless CMS actually is a great example. Overall, I think the headless idea lies in a decoupling of the visualization plane and the data plane, or sort of a control plane. The concept itself is very old, and you can find roots of it in software engineering patterns. We have models, we have views, we have controllers, right? We try to decouple logic, decouple the responsibilities of different pieces of code and how they operate in a piece of software. And when we roll that idea up to a higher level, like the application level, we start talking about: can we apply the headless idea, like a decoupling idea, to the application layer? One example is the headless CMS, where we decouple the visualization layer from the data layer. With a headless CMS, you can have, like, the database of your content, and when you create this database, you don't think about how it's going to be presented, because the presentation layer belongs to a different front-end development team. They can use Gatsby, they can use Next.js, all of these front-end JavaScript technologies, to present data however they want. So you, as the content manager, control the data; you do not control the presentation layer, and the two communicate with each other over the API. I think with headless BI, the same idea applies: is it possible to decouple the data model layer from the visualization layer? The good example that many people talk about is Looker, because they have such good data modeling with LookML. Many times when we talk about the headless BI concept, people build an example around Looker: what if I can have Looker with LookML, but without the visualization coming from Looker, right?
Like, what if I can still use LookML, but then use any visualization, any other BI tool, on top of that LookML model? So the idea here is: what if we take the BI and unbundle it into the data model layer, that's one piece, with caching, access control, and all the other layers that naturally belong with the data model, the data plane, and then have a separate visualization layer, decoupled from the data model. And Pavel, you talk and think a lot about that headless concept as well, maybe you have some thoughts?

Pavel Tiunov 7:48
Yeah, I guess that's right. And I guess the whole idea around this headless part is basically: whenever you're building something within a given BI tool, you're actually locked in with this BI tool, because you're building logic around your data. When you're building security controls, when you're building row-level security, when you design calculations, you're actually building logic inside the visualization layer, which is basically a lock-in for many users. And that's why we actually started being asked: can we just query Cube with SQL? At first it sounded like a crazy idea, but over time we realized it actually made sense to build this feature.

Eric Dodds 8:55
Could you dig into that a little more? Actually, maybe we should step back a little bit. Can you describe what Cube does? We sort of started off with the baseline concept of headless, but can you describe what Cube does and why users started asking you to query it with SQL?

Artyom Keydunov 9:19
Yeah, I guess I can start there. Pavel, feel free to follow up. We usually talk about Cube, and headless BI in general, as four layers of features that Cube has. So imagine LookML: it's a data modeling layer that's done in code, right? That's the foundation, so every other feature is pretty much downstream of the data model. The first thing that Cube has is a data model. You put Cube on top of your data warehouse and you start building the data model: what are your data sets, what are the measures, what are the dimensions, how do data sets relate to each other? So you're building sort of a data model similar to a LookML data model. Then, inside the data model, you can define how you want to cache the data and when you want to refresh the cache, and what the access control rules or security controls are, right, who can access the data and how the data can be accessed. And then finally, Cube provides a set of APIs: we provide a REST API, a GraphQL API, and a SQL API. This way the data, and the data model, can be accessed from all these downstream tools. The very first sweet spot for Cube was what's sometimes called data apps, sometimes called embedded analytics, right? You're building a software product, you have dashboards, you have some reporting features built into your product, and you want to expose those features to your customers. Most likely you're going to build that with some charting library, maybe D3.js, because you want to build a really custom, native experience, and for that you need a data API. Cube can solve that: in that case, your users will query Cube through the REST API or through the GraphQL API to get data and then display it to the customers, and Cube will provide the data model, caching, and security for all of it.
And then, what Pavel just mentioned about the SQL API: that's where many of our community members started wanting to use different BI visualization or dashboarding tools, like Metabase or Superset. Some were actually building hacks just to connect from all of these tools to Cube, to be able to consume data not directly from the warehouse, but to consume the data models built in Cube.
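As a concrete illustration of the embedded-analytics flow Artyom describes, a front end might send Cube's REST API a query body like the following. The cube and member names here are invented for the example:

```json
{
  "measures": ["Orders.totalAmount"],
  "dimensions": ["Users.country"],
  "timeDimensions": [
    {
      "dimension": "Orders.createdAt",
      "granularity": "month",
      "dateRange": "last 6 months"
    }
  ],
  "limit": 100
}
```

Cube resolves this against the data model, applies the caching and access rules defined there, generates the warehouse SQL, and returns rows the front end can hand straight to a charting library.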

Eric Dodds 11:50
Fascinating. Okay, I have a ton more questions, but Kostas, there’s so much in there that I know is going to be interesting to you. And I’m interested to know what questions you’re gonna ask, so I’m gonna pass the mic over.

Kostas Pardalis 12:03
Sure. Let me start with a question on what you were just talking about, Artyom. Can you explain in a little more detail how headless BI works together with the rest of, let's say, the common tools that we have in the data stack out there, right? Like, we still need some kind of visualization, and we need tools to do that. And then we also have a query engine somewhere, where the queries get executed, and all these things. So where does Cube exactly fit in there? How do we orchestrate all this together? How do we work with all this? And where should we start when we build a new stack?

Artyom Keydunov 12:54
Right. Yeah. I mean, that's a great question. I think what we believe in is, like, a data warehouse or data lake-centric architecture. So what we usually see on the data storage and compute side, right, is a data warehouse, or something like Databricks or Trino, you know, which is more like a query engine, like a data lake architecture. It's usually one of those two. And obviously there are a lot of tools, like RudderStack, getting events into this, right, like ETL things, loading data from different places. But in terms of compute and storage, that's what we usually see in a stack, and then it depends on the use case. If we're talking about consuming data internally, such as with dashboarding or reporting tools, the use case usually would be that different teams in an organization want to consume the data differently. We have Tableau for one team, maybe Jupyter notebooks for the second team, and Metabase for the third. In that case, the stack would look like this: they would use Cube on top of Databricks or something like Snowflake. They would build the data model there. They also can transform data with tools like dbt in Snowflake; we see that very, very commonly. So a data engineering team can transform data with dbt upstream, and then they put Cube on top of this. Cube will create this metrics layer or semantic layer, right, sometimes it's called just a data model; we usually use the term data model at Cube. So data engineers will build the data model in Cube, and then they will expose that to Tableau, they will expose that to Apache Superset or a Jupyter notebook. And then all the teams will just consume data through Cube; they would not consume data directly from Snowflake, right, they would consume it through Cube.
And all the metrics, all the data definitions, everything would be defined in Cube. Security controls would be defined in Cube as well: who can access what kind of data, right? If you want to mask some of the fields or provide row-level security, that would be done in Cube as well, and caching too, just to make sure that the data is cached with the same rules for every downstream consumer. So that would be a usual architecture for the internal use case. When we're talking about exposing data to the customers, right, in many cases it's actually the same data set and the same data model definition. What would happen is the developer team would build, on top of the API from Cube, some sort of, you know, React application with charting libraries, and they would use this as part of the front-end application they serve to customers. In that case, that's going to be, like, embedded analytics, right? The data path goes through Cube: from React, it will be a REST API or GraphQL API call to Cube, Cube will apply the data model, process queries in Snowflake, get data back, and then send it to the front end. But the idea is, regardless of the downstream data application, they all go through Cube, so Cube sort of centralizes the data model.
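The row-level security Artyom mentions is typically expressed once in Cube's configuration rather than per BI tool. Here is a rough sketch using Cube's `queryRewrite` hook; the tenant field and the `Orders.tenantId` dimension are hypothetical names for the example, and this is not a complete config file:

```javascript
// cube.js — Cube configuration file (sketch, not a complete config).
// queryRewrite runs on every incoming query, whether it arrived via the
// REST, GraphQL, or SQL API, so the rule holds for all downstream tools.
const config = {
  queryRewrite: (query, { securityContext }) => {
    // Assume the auth token carries a tenantId claim (hypothetical field).
    if (!securityContext || !securityContext.tenantId) {
      throw new Error('No tenant found in security context');
    }
    query.filters = query.filters || [];
    query.filters.push({
      member: 'Orders.tenantId', // illustrative dimension name
      operator: 'equals',
      values: [securityContext.tenantId],
    });
    return query;
  },
};

module.exports = config;
```

Because the filter is injected centrally, a Tableau dashboard and an embedded React app querying the same model both see only their tenant's rows.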

Kostas Pardalis 16:24
And is it fair to say that Cube is, let's say, only for BI, or does the rest of the interaction with the data warehouse also go through it? Let's say my primary job is as a data engineer and I'm mainly building ETL pipelines. Is this something that I would still do through Cube? Or is it completely separate, something that I'm not even aware of?

Artyom Keydunov 16:57
Yeah, I think in that case it happens more upstream from Cube. I would think about Cube as, like, downstream. So ETL, pipeline data collection, transformation itself, that happens upstream from Cube, and then Cube usually works with either raw data or data already transformed by tools like dbt.

Kostas Pardalis 17:19
Mm-hmm. And you mentioned Looker, and LookML. So based on your experience, let's talk a little bit about Looker, first of all, and LookML. I want to ask because I think that the introduction of LookML was a very interesting thing that happened in the market, and it's been around for a while. So I'd like to hear from you, in your experience, what do you think about LookML? What are the things that they did right? And a couple of things that you think they did wrong?

Pavel Tiunov 18:02
Yeah, I can take this one. I think Looker and LookML is definitely a unique piece of technology. However, if you try to remember what was before Looker, you can find different BI tools which basically had the same approach as LookML: the introduction of a data model in a declarative way, where you define your measures and dimensions, and then basically an OLAP model, an OLAP cube. What Looker introduced well, first, was this so-called ROLAP-based model, really relational OLAP, sitting right on the warehouse. Looker started when data warehouses started to be really responsive, so you can do live querying. If you remember, before that, most BI tools were working on a downloaded copy of the data. Looker was one of the first tools working on live queries, and right now that is the standard for all the tools, so this thing was done really right. Another thing: they introduced this data modeling language, which allows you to define the relations between basically all of your tables, or in fact more like cubes, or in Looker's terms so-called views, and also relations between them, called explores, where you can define these, like, joins, right? This was also done really well, and it filled a real demand at that time. As for what we already discussed and what they could have done better: it's basically the visualization part, which is in fact not really so great, especially if you compare it with, say, Tableau. But in fact the modeling layer itself is actually very powerful, and if you separate it as a, like, separate product, it makes sense in general.

Kostas Pardalis 20:52
So how does modeling happen in Cube? Do you define your models similarly to LookML, or do you do it in a different way?

Pavel Tiunov 21:10
Yeah, I mean, we have a concept called a cube; that's where we get the name. So a cube is similar to a view in Looker. It's basically a reference to a table in your data warehouse. In the most simple way it's just like a select star from a specific table, just a reference to some physical table in your database. It can be a little bit more complicated: you can write a select statement, right, and it produces a table at the end of the day, so it's just a more dynamic table that we define and apply. But every cube is backed by a table. Once you have a cube and a table, you define measures, which are basically references to the columns, but you apply aggregations to these references, right? The classic example would be: you have, like, an amount column, and then the total amount of the products, right? It's an aggregation; that's a measure, and measures always have an aggregation. And then you have dimensions. Dimensions usually just map to the columns, right, they're properties of a single row, very similar to what LookML does. And then in Cube you also can define relationships between cubes. So if you have a cube orders, you may have a cube users, and then you can define that users can have many orders, right, and orders always belong to a user. You can define this relationship, and that's useful when you query, when you want to get, let's say, an amount of orders by users, shown by users' country, right? Cube already knows about this relationship, Cube knows the data graph, and then it can construct the correct SQL. We didn't have a concept of explores, the same as Looker; we just define the relationships and the shape of the cubes on a different level. There is a new interesting project from Looker called Malloy, and I think we are looking more in the direction of Malloy; we are closer to Malloy, from the joins
perspective, than to LookML itself. But it's all, you know, details. Overall, it's the same concept of doing the data modeling: cubes, measures, and dimensions.
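The orders/users example above can be sketched as a Cube data model. This is an illustrative schema only, assuming Cube's JavaScript-based model files, with made-up table and column names; the `cube()` function and `${CUBE}` references are provided by Cube's schema compiler:

```javascript
// schema/Users.js and schema/Orders.js — illustrative Cube data model.
cube('Users', {
  sql: 'select * from public.users',
  dimensions: {
    id: { sql: 'id', type: 'number', primaryKey: true },
    country: { sql: 'country', type: 'string' },
  },
});

cube('Orders', {
  sql: 'select * from public.orders',
  joins: {
    // Each order belongs to one user; Cube uses this edge of the data
    // graph to generate the join SQL when a query spans both cubes.
    Users: {
      relationship: 'belongsTo',
      sql: `${CUBE}.user_id = ${Users}.id`,
    },
  },
  measures: {
    count: { type: 'count' },
    totalAmount: { sql: 'amount', type: 'sum' },
  },
  dimensions: {
    id: { sql: 'id', type: 'number', primaryKey: true },
    status: { sql: 'status', type: 'string' },
  },
});
```

With this in place, asking for `Orders.count` by `Users.country` lets Cube walk the declared relationship and emit the corresponding join and group-by SQL.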

Kostas Pardalis 23:35
Mm-hmm. Makes sense. And how is the user interacting with all this? Is there, like, a markup language that is used, or is there, like, a user interface? What does the experience of the user of the product look like?

Pavel Tiunov 23:55
Yeah, we are indeed mostly engineers, so we did it in code. And I also believe that was one of the greatest innovations of Looker, as we mentioned: they put data modeling in code. Data models have been in BI for a while, but it's usually been done through a user interface, you know, like drag and drop. And I believe it should be done in code. I believe we should apply the best software engineering practices to data management, and it all starts with putting that in code and putting that code under version control. So we're doing that in code too. In our case, right now we're doing a JavaScript/JSON-based data model. However, we got a lot of requests from the community to support YAML as well, you know, which is what Looker is doing, a YAML-based model. So I think we'll support YAML soon too, but right now it's JavaScript/JSON-based.

Kostas Pardalis 24:52
Okay, so based on what you're saying, we are talking about some pretty technical artifacts that have to be managed. Who is the owner of these artifacts? Is it an analyst? Is it a data engineer? Who is, let's say, the person inside the company that has ownership and is responsible for managing these artifacts that come with Cube?

Artyom Keydunov 25:18
Yeah, I can take this one. This is a great question, and basically data ownership is, I guess, one of the emerging areas; it's still evolving. In fact, as you may imagine, the Cube data model is just a Git repo, and you have different data modeling files where different parts of your data model are defined. What we usually see is that teams can apply the very same practices they apply to code. For example, you can have multiple directories, and multiple data engineering departments can have different rules on merging PRs to those directories and different review policies. You can set up those policies in GitHub using CODEOWNERS files, for example, or you can use more advanced workflows if you have, like, different enterprise tools for your source code management. But the idea is quite simple: this data model layer will, in fact, have governance from different data engineering groups. What we are also working on is, like, a data catalog. Basically, the catalog feature will allow you to see who is the owner of different pieces of your data model. But it's still in progress.
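The per-directory review policies Artyom describes can be set up with a CODEOWNERS file in the repo that holds the data model. A hypothetical layout, with team names invented for the example:

```
# .github/CODEOWNERS — GitHub auto-requests review from the owning team
# whenever a PR touches files under their part of the data model.
schema/finance/    @acme/finance-data-eng
schema/marketing/  @acme/marketing-data-eng
schema/core/       @acme/data-platform
```

Combined with branch protection requiring code-owner approval, this gives each data engineering group governance over its slice of the model.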

Kostas Pardalis 27:09
All right. So we have a model that is represented in JSON constructs, I would assume, and what happens is that this piece of JSON gets translated into SQL and executed on the data warehouse, and the results are returned back to the user, right? What has your experience been like so far, because there's some code generation involved there? And the reason I ask is because of my experience with Looker: things kind of degrade at some point, and it's really hard to debug what's executing on the data warehouse, right, when you have so much autogenerated SQL. So how do you take care of the developer experience around that, when it comes to debugging and being able to understand, at the end, what my tool is doing?

Artyom Keydunov 28:14
You're actually spot on. That's a big problem. And it's interesting, we realized that too, because Cube started as an open-source project. We had a community, we have a community now, and we started to get a lot of questions about how to debug, how to understand what's going on. And we actually realized that was an opportunity for us to build a commercial product. Back then we didn't have a commercial product, and we were thinking about what our commercial product would look like, because, as you know, it's always a question for open-source companies, right? How do you go from an open-source product to a commercial product, what features should it have? When we were raising a seed round, it was like, you know, what features are we going to build in the commercial product? And we didn't have a full answer; we were like, we'll figure it out. And I think debugging, tracing, and developer experience turned out to be the biggest one. So we actually built a good set of tools to help navigate queries: how they've been executed, cross-referencing them back to the data model to look at what happened, what hit the cache versus the source database, what the queue status was, all of those things. Frankly, there's no easy answer, no single specific tool. You usually build a bunch of tools that developers and data engineers can use in order to debug these issues, and we're still building them. But luckily, that's been a really good, you know, sort of differentiator for the commercial offering.
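The code-generation step Kostas asked about, a JSON query compiled into warehouse SQL, can be made concrete with a deliberately tiny toy generator. This is nothing like Cube's real compiler; the schema shape and names are invented purely to illustrate the kind of translation a semantic layer performs:

```javascript
// Toy sketch: turn a measures/dimensions query plus a schema definition
// into a SQL string. Real semantic layers also handle joins, time
// dimensions, filters, caching, and security; this handles none of that.
const schema = {
  Orders: {
    sql: 'select * from orders',
    measures: { totalAmount: { sql: 'amount', type: 'sum' } },
    dimensions: { status: { sql: 'status', type: 'string' } },
  },
};

function generateSql(query) {
  const [cubeName] = query.measures[0].split('.');
  const cube = schema[cubeName];
  // Dimensions become plain select expressions that we group by.
  const dims = (query.dimensions || []).map((d) => {
    const [, name] = d.split('.');
    return `${cube.dimensions[name].sql} as "${d}"`;
  });
  // Measures become aggregate expressions (sum, count, ...).
  const measures = query.measures.map((m) => {
    const [, name] = m.split('.');
    const def = cube.measures[name];
    return `${def.type}(${def.sql}) as "${m}"`;
  });
  const groupBy = dims.length
    ? ` group by ${dims.map((_, i) => i + 1).join(', ')}`
    : '';
  return (
    `select ${[...dims, ...measures].join(', ')} ` +
    `from (${cube.sql}) as ${cubeName.toLowerCase()}${groupBy}`
  );
}

console.log(
  generateSql({ measures: ['Orders.totalAmount'], dimensions: ['Orders.status'] })
);
// -> select status as "Orders.status", sum(amount) as "Orders.totalAmount"
//    from (select * from orders) as orders group by 1
```

Even at this toy scale you can see why debuggability matters: the user never writes the SQL, so the tooling has to let them trace a generated statement back to the query and model that produced it.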

Kostas Pardalis 29:56
That's awesome. And one more question: what's your, let's say, interaction with the data warehouse itself, and some of, let's say, the ergonomics that a data warehouse typically has, like views, materialized views, which sounds important when you care about performance. So what are you doing there, and how does Cube use that stuff from the data warehouse?

Pavel Tiunov 30:33
Yeah, those are all great questions. So, in terms of performance, we have really multiple layers to achieve results. The very first layer is basically the transformations we just discussed in the beginning, so you can transform data before Cube. People usually use dbt or other ELT post-transformation tools just to get the data in shape before it gets to Cube, and at this point there may be materialized views created, and in a cube you just write a select from these views. But this is just the first step, because with many-to-many joins and many relationships, that's where it can get really complicated. At this point, we provide a two-layer caching system. The first one is in-memory and is enabled by default: basically, every query result that you query will be cached, and there are rules based on a so-called refresh key, which can be time-based or SQL-based, that tell Cube for how long it can keep those results. The second layer is basically aggregate awareness; in Cube it's called pre-aggregations. You can pre-aggregate the data that is in the warehouse. For example, there is a complex join: you have materialized views in your Snowflake, but you're joining them, and there are various compositions of the joins. So you can join this in the data warehouse and then store the resulting rollups in the Cube cache. The caching engine is called Cube Store, and it's basically a really highly scalable cache designed to store, like, billions of rows of rolled-up data. It's not designed to store raw data; there is a single purpose for Cube Store, which is to download pre-aggregated rollups and serve them effectively. So it's designed to store really sizable rollups; in size it can be, like, hundreds of gigabytes. And it's basically great at scale, so the response time is low: we are aiming at sub-second responses.
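The refresh keys and pre-aggregations Pavel describes are declared in the data model itself. An illustrative fragment, assuming Cube's JavaScript schema, with invented table and column names:

```javascript
// Sketch of a single cube definition showing both caching layers.
cube('Orders', {
  sql: 'select * from public.orders',

  // In-memory cache rule: re-check the source at most every 10 minutes
  // before reusing cached query results.
  refreshKey: { every: '10 minute' },

  preAggregations: {
    // A rollup of order counts by status and month, built ahead of time
    // and stored in Cube Store; queries that fit this shape are served
    // from the rollup instead of hitting the warehouse.
    ordersByStatus: {
      measures: [CUBE.count],
      dimensions: [CUBE.status],
      timeDimension: CUBE.createdAt,
      granularity: 'month',
    },
  },

  measures: { count: { type: 'count' } },
  dimensions: {
    status: { sql: 'status', type: 'string' },
    createdAt: { sql: 'created_at', type: 'time' },
  },
});
```

The aggregate awareness Pavel mentions is the matching step: an incoming query for counts by status per month is routed to the rollup automatically, which is how sub-second responses stay possible over billions of source rows.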

Kostas Pardalis 33:27
Okay, that's super interesting. And this Cube Store is hosted by Cube? How does the user interact with the cache? Is it something that lives on your systems? If someone wants to have something on-premise, let's say, how does this work?

Artyom Keydunov 33:48
Yeah, it's a great question. So the cache itself is open source; all those parts are open source. What we offer in Cube Cloud is basically an enterprise runtime for this technology. Cube Store, for example, is available on demand in Cube Cloud, the very same way as you access BigQuery: you don't host the query engine yourself, right, you're using it as a service. And it's the same for Cube Cloud: essentially the technology and the implementation are the same, but the runtime is different in the cloud.

Kostas Pardalis 34:34
Okay, that's super cool. All right, guys, I've asked quite a lot of technical questions, and I want to ask something before I yield the microphone back to Eric that has to do with your own journey. All this time I've been hearing from two deeply knowledgeable people about what the market needs and what the technology needs to have, right? And that takes time. I'm not going to argue about how smart anyone is; it doesn't matter, experience always takes time. So tell us a little bit about your experience: how you ended up starting Cube, two years ago if I remember correctly, based on LinkedIn, what you've done before, and how these things are linked together.

Artyom Keydunov 35:40
Right, yeah, we’d love to share our journey. So Pavel and I actually met when we started to work on a company called Statsbot. That was the company before Cube, and the idea for that company was to build a Slack application. It all started as my hobby project. It was 2015 or 2016, so Slack was growing really fast, more and more companies started to use it, and we were using Slack at my old company, an ed-tech company building software for kids in schools, mostly on the East Coast. We were using Slack a lot, and what I wanted to build was a Slack application that could get data from different places, from different systems. I think we used New Relic, maybe Mixpanel and Google Analytics, and we had a bunch of Postgres databases. So what I wanted to do was get data from all of these places and just put it in Slack, as a sort of knowledge control plane and monitoring tool. So I built some sort of Slack integration, and after I realized it was very helpful for my team, I put it out on Reddit and a few other places so people could start using it. And people started to use it — it was viral, it was fun — and then Slack reached out. They said they wanted to launch a Slack App Directory, and they knew that Statsbot was one of the bots already being used by different teams, so they wanted to put it in their application directory. That was great, and I was like, yes, let’s do that. But that brought a lot of traffic — a lot of people started to use it, and Pavel was one of them. He started to use it, he texted me, we chatted, and then he joined to help me, because I had a Ruby on Rails application that was not scaling well.
And I really needed more hands — I couldn’t scale it by myself. So Pavel joined, did a lot of magic, and it actually started to be stable. And then VCs started to reach out, asking whether we’d like to raise money for the project. We hadn’t considered that before, because we thought it would stay a hobby project, a side project, but then we felt, let’s give it a shot. I was in LA back then, moved up to San Francisco, and we went through the 500 Startups accelerator. That was a lot of fun — my first real exposure to the startup ecosystem. We continued to work on Statsbot after the accelerator, but over time we were building more and more of the technology that eventually became Cube, because we needed a technology that would help us get data from different places and build some sort of data model layer. Then we would apply semantics with natural language, so people would be able to query that data from Slack. We added support for Microsoft Teams, and we even started to build support for voice assistants, because we thought that maybe voice could be a big thing. So the idea for us was: how can we build an engine that can potentially support many interfaces — exactly the headless BI use case. But over time we realized the engine was more valuable than the product we were building around it. Some of the companies we worked with started digging deeper to understand how they could use Cube as an engine to power some of their internal projects, or some of their external customer-facing applications.
So we started to dig deeper too, and eventually that led us to the idea: let’s give it a shot, let’s open source the engine, and let’s see if maybe people find it useful on its own. So we open sourced it, people started to use it, and we decided to go all in on Cube, because it seemed like a bigger theme and, frankly, it was just more fun to work on. I mean, we’re all engineers, right? Building more engineering-heavy things just sounds more interesting. The Slack application was fine, but you can’t compare it to data infrastructure. So yeah, that’s the story. We’ve been working on the open source for a couple of years now, and only recently we released Cube Cloud, our commercial product, which is still super, super early.

Kostas Pardalis 40:52
That’s awesome, guys. I hope people get inspired by the journey. I think it’s really important to share how you build stuff — there’s always a journey and an evolution. There’s no mythical story of waking up one morning with an idea and the next day it’s deployed worldwide; it doesn’t work that way. So I think it’s important for people to hear this kind of experience — not just the technology, but how the idea and the product evolved over time. That’s extremely valuable for people to hear. So thanks for sharing. Eric, it’s all yours.

Eric Dodds 41:42
Ooh, how exciting. Congratulations — that is a super exciting journey. One thing I’d like to do — you did a great job talking about Looker, some of the things they did right and some of the things they did wrong — is actually zoom out a little on the BI space and almost play devil’s advocate to Cube, if you’ll entertain me. Let’s say I’m building a modern BI function. I have a data engineer doing all the stuff that’s upstream from Cube, and my warehouse is in a pretty good place. At this point I have a huge number of options to choose from in the marketplace, right? People who have been around for a long time may choose to run their Tableau playbook. There’s Sigma; there are open-source visualization pieces like Metabase; there are metrics-layer tools that feed directly into it; you could use tools like Hex. What is the motivation for me to use Cube — specifically, let’s say, in a self-serve BI context? I know you also do a lot of embedded analytics, and there are tools that drive that piece as well. But what is the motivation for someone out there browsing the analytics space to adopt the Cube methodology?

Pavel Tiunov 43:32
This is a great question, and we were asking ourselves at the beginning: why will people really use it? So we just started to build it, give it to people, and see what they answered us. And the answers are pretty interesting — there are multiple use cases here. First of all, with a lot of these tools there’s real vendor lock-in: you need to choose the vendor and commit to it before you’ve really tried it, and if you haven’t worked with it much, you’re making a big investment up front. People want to avoid that lock-in. And the use case here, as I already mentioned, is that they want to avoid defining security at the BI level, which is very costly to redefine in any other BI tool — say they’re not sure whether they’re going with Tableau or QuickSight, or they want Metabase, or they have multiple BI tools. It’s close to impossible to have all the security controls set up consistently in every tool; there’s no way to do it in an organization that uses multiple data tools. So data security controls are one of the most significant pain points for internal data consumers. The other part is the embedded analytics you already mentioned. As we started with these use cases, a lot of people first deployed Cube for customer-facing analytics, and then they started to ask: can we use the same thing for our internal apps, just so we don’t have to redefine the data model? That’s when the SQL API came into the picture in the first place. But what we’re seeing now, where the embedded use case evolves even further, is people asking: can we provide this SQL API to our customers? Their customers want to use their own data tools, but they can’t connect to the data warehouse directly —
— for security reasons: the data in the warehouse isn’t only theirs, it belongs to other customers too, and there are no good security controls in place to limit each tenant’s access to the data. Cube solves this problem as well. So that’s why.
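The pattern Pavel describes — defining security once in the semantic layer instead of in every BI tool — amounts to rewriting every incoming query to append a row-level filter from the caller’s security context before it reaches the warehouse. A minimal sketch of that idea (the function and field names here are hypothetical illustrations, not Cube’s actual API):

```javascript
// Append a row-level security filter to every query based on the
// caller's security context, so no downstream BI tool can bypass it.
function applyRowLevelSecurity(query, securityContext) {
  // Copy existing filters so the incoming query object is not mutated.
  const filters = [...(query.filters || [])];
  filters.push({
    member: 'orders.tenant_id',
    operator: 'equals',
    values: [securityContext.tenantId],
  });
  return { ...query, filters };
}

// Any tool (Tableau, Metabase, an embedded chart) sends the same query shape...
const incoming = {
  measures: ['orders.revenue'],
  dimensions: ['orders.status'],
  filters: [{ member: 'orders.status', operator: 'equals', values: ['paid'] }],
};

// ...and the semantic layer enforces the tenant restriction centrally.
const secured = applyRowLevelSecurity(incoming, { tenantId: 'acme' });
console.log(secured.filters.length); // 2
```

Because the rewrite happens in one place, switching BI tools (or adding a second one) never requires re-implementing the security rules.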

Eric Dodds 46:42
Super interesting. And I’d love for you to dig in a little more on the methodologies of people who migrate to Cube for embedded analytics — serving analytics in their application. What are the methodologies or technologies you see companies using before they migrate to Cube for embedded analytics?

Artyom Keydunov 47:08
I think we see a lot of what we call integrated BI — Sisense, GoodData, Power BI — they all have embedded offerings, right? The problem is that because they’re coupled with the data modeling and the rest of the BI stack, they’re very, very inflexible. It’s really hard to customize them, really hard to make them look and behave the way you want, so you have a lot of restrictions on what you can and cannot do. And they were never very fast, for exactly the same reason: because of the coupling, the processing happens together with the visualization, and once you have twenty charts on a page it starts to get extremely slow. So: a clunky, inflexible, non-customizable solution. That’s a very common sentiment we hear from companies that want to migrate to something more headless. We started this whole conversation with headless CMS as an example, right? Imagine moving from Adobe’s CMS or a customized WordPress to something more like the modern stack, where you have Contentful and then you build a Gatsby application on top. It’s pretty much the same story here.

Eric Dodds 48:37
Got it, makes total sense. Yeah, that’s super interesting. One thing I’d like to do in our last little bit of time here is talk through your views on roles within a data organization or an analytics organization. In the last couple of years, this concept of an analytics engineer has been popularized, I think, by dbt and other vendors in the space — sort of a hybrid role between an analyst and a data engineer. You have pure analysts, and you have data engineers who do some analysis or feed some analysis. Within the context of Cube, how do you see teams using Cube? And what do you view as the ideal role of analysts within the data organization?

Pavel Tiunov 49:38
Yeah, I think that’s a really good question that, unfortunately, we don’t have an answer to yet. It’s never a single point in time, right — it’s always an evolution along some set of vectors. As technology changes, behavior changes, and the roles change. I think analytics engineering is, frankly, a great role to see emerging, and one of the main drivers here is adopting the best practices of software engineering into data — that’s the moment that really created this role. And I think we’re still trying to understand how it should look across the industry. In today’s landscape, I think data engineers, analytics engineers, or application developers are the owners of the project — the ones building the data models. Then, on the consumption layer, we internally split our users into two groups: builders/developers and consumers. Builders and developers are data engineers, analytics engineers, or application developers, depending on the use case and the type of company. Consumers could be front-end developers — if you build embedded analytics, you’re going to consume Cube’s API as a front-end developer — or, if you use the BI dashboarding piece of it, analysts. So for us, analysts are more consumers than owners of Cube. But it also depends on the structure and culture of the organization.
Because in many cases, organizations build a model where analytics engineers and data engineers build all the models — and sometimes the dashboards too — and then hand those dashboards to the business units, with the ideal being that the business units can iterate on the dashboards on their own. So sometimes there’s really no analyst role at all in an organization. And if there is an analyst role, it’s more of a consumer of Cube, I would say, than someone who builds the data model. But again, every organization is different, and sometimes we see data analysts building models as well.

Eric Dodds 52:15
Got it. Super interesting. The space is changing in so many ways, and with all the new technologies you see different roles and different tools and all that. I’d love to ask a question about visualization. One of the core value propositions of Cube is that you’re decoupled — the data layer from the visualization layer. Have you found that that actually drives better visualization? Or how has it impacted visualization? Visualization is one of those interesting components of analytics where you can produce a visualization with good data or bad data, right? Good data makes visualization easier, but you can still spit something out — good data is not a guarantee of good visualization. But it seems like the approach you’re taking at Cube, where you’re modeling the data and making it available via API, can have an interesting influence on the way people do visualization, no matter what tool they’re in. Have you seen that to be true? How are you seeing Cube impact the actual creation of visualizations and dashboards?

Pavel Tiunov 53:31
Yeah, this one is a really great question. In terms of visualization, there are actually multiple layers to it. The first one is embedded analytics, which was the very first value proposition of Cube: you can build very custom-made, custom-tailored visualizations. Basically, you can build a really product-native experience for your users, embedded in any application, and those users may not even realize there’s something like Cube behind the scenes providing the data — there’s no way to distinguish it from your product. That’s as opposed to most embedded BI solutions, where you just embed an iframe, and in your product you’re looking at that iframe dashboard, which looks quite different and doesn’t feel native. That was the very first value proposition of Cube, and why it was successful in the very beginning. But then we started to realize that as good an idea as these custom visualizations are, they’re also time-consuming to build, because they require front-end engineers, which is not cheap. In fact, people wanted data engineers and data analysts to build the first reports. That’s also why the SQL API started to pick up in the embedded space: people want to start with a simple visualization — an embedded iframe based on Superset or Metabase — as the first step, just to validate the product without spending front-end team resources. The data model itself is not so hard to build; they’re building it anyway. Then, at the point where they realize they want custom charts for some part of the product, they replace them one by one. So if they feel they need a different level of quality for part of their product, that’s the place they develop.
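The product-native path Pavel describes comes down to consuming the semantic layer’s API from your own front end instead of embedding an iframe: fetch rows, pivot them into whatever your chart component expects. A tiny illustration of that consuming side (the row shape and names are assumptions for the sketch, not a specific API’s response format):

```javascript
// Pivot an array of row objects (as a headless BI API might return them)
// into labels + numeric values for a product-native chart component.
function toChartSeries(rows, dimension, measure) {
  return {
    labels: rows.map((row) => row[dimension]),
    values: rows.map((row) => Number(row[measure])),
  };
}

// Example payload, shaped like a typical query result:
const rows = [
  { 'orders.status': 'paid', 'orders.revenue': '1200' },
  { 'orders.status': 'refunded', 'orders.revenue': '80' },
];

const series = toChartSeries(rows, 'orders.status', 'orders.revenue');
console.log(series.labels); // ['paid', 'refunded']
console.log(series.values); // [1200, 80]
```

Because the data arrives as plain rows rather than a rendered widget, the chart inherits the host product’s styling and components — which is exactly what the iframe approach can’t do.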

Eric Dodds 56:14
Pavel, thank you so much for that answer. It’s funny that you mention iframes — when I think about some of the worst web experiences I’ve had, iframes are a serious offender. Although they do make a lot of sense as an MVP for embedded analytics. It’s super interesting to hear that that’s the flow for actually implementing embedded analytics — starting really simple, even with the SQL API, so that a data engineer can produce something really quickly.

Well, I think we’re up against the buzzer here. But guys, this has been a really incredible conversation. I’ve learned a ton, and it’s amazing to see what’s happening in the analytics space and the stuff that companies like yours are building. So thanks for telling us about Cube today.

Pavel Tiunov 57:13
Definitely, definitely. Yeah, thank you for having us. It was a really great conversation — thank you for all the questions. They were great.

Eric Dodds 57:23
Kostas, as always, that was a fascinating conversation, mainly because of me having to move to another room, and you hearing the cathedral-esque echo in my microphone. That was no extra charge.

Kostas Pardalis 57:39
Yeah, it felt really good, to be honest. It’s like the way things sound when you’re inside a church.

Eric Dodds 57:48
Yeah, it’s natural reverb, so you’re welcome. No — I think my biggest takeaway, actually, one thing that was really interesting, was that the core value prop of Cube is embedded analytics, which makes total sense given the discussion we had around headless. But it’s fascinating to me that they’re seeing this current adoption around people using it for their own BI. Which is just super interesting. In many ways, I think that’s reflective of a product that’s really hit a nerve with a workflow or a paradigm for how to manage certain things — where users pull additional use cases out of the product without the company even really planning for it. So that was really cool. That was my big takeaway.

Kostas Pardalis 58:49
Yeah. Beyond the technical details — how headless BI works and what the challenges are — what I took from the conversation is how products change because of, let’s say, engineering culture and how we do things. There are a couple of things happening here. One is that there are more engineering roles out there — more companies have engineers working on data than you would probably have expected a couple of years ago. The other is that as software engineering matures, the patterns and lessons learned from building products and technologies turn out to be more universal, and we can use them to do other things, even if the person applying them is not necessarily an engineer. That’s what we see here with this decoupling of the visualization from the model. So yeah, it’s really fascinating — it’s a sign of progress in the product space, super interesting. And I’m looking forward to chatting with them again in the future; there are many more things to discuss.

Eric Dodds 1:00:32
Absolutely. All right, well, thanks again for joining us on The Data Stack Show. Subscribe if you haven’t, tell a friend about the show if you liked this episode, and we’ll catch you on the next one.

We hope you enjoyed this episode of The Data Stack Show. Be sure to subscribe on your favorite podcast app to get notified about new episodes every week. We’d also love your feedback. You can email me, Eric Dodds, at eric@datastackshow.com. That’s E-R-I-C at datastackshow.com. The show is brought to you by RudderStack, the CDP for developers. Learn how to build a CDP on your data warehouse at RudderStack.com.