Episode 165:

SQL Queries, Data Modeling, and Data Visualization with Colin Zima of Omni

November 22, 2023

This week on The Data Stack Show, Eric and Kostas chat with Colin Zima, the Co-Founder and CEO of Omni, a company combining the consistency of a shared data model with the freedom of SQL. During the episode, Colin, a longtime Looker veteran, shares insights into Looker’s impact on the data industry, its unique architecture, and the role it played in the evolution of data analytics. He also discusses the concept of the analytics engineer, the balance between governance and flexibility in data modeling, the future of business intelligence, the merging of SQL and Python, the vision for Omni, and more.

Notes:

Highlights from this week’s conversation include:

  • Colin’s Background and Starting Omni (1:48)
  • Defining “good” at Google search early in his career (4:42)
  • Looker’s Unique Approach to Analytics (9:48)
  • The paradigm shift in analytics (10:52)
  • The architecture of Looker and its influence (12:04)
  • Combatting the challenge of unbundling in the data stack (14:26)
  • The evolution of analytics engineering (21:50)
  • Enhancing user flexibility in Omni (23:44)
  • The evolution of BI tools (32:53)
  • What does the future look like for BI tools? (35:14)
  • The role of Python and notebooks in BI (39:48)
  • The product experience of Omni and its vision (45:27)
  • Expectations for the future of Omni (47:52)
  • The relationship between algorithms and business logic (50:51)

 

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Transcription:

Eric Dodds 00:05
Welcome to The Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You’ll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, the CDP for developers. You can learn more at RudderStack.com. Welcome back to The Data Stack Show. Kostas, I’m so excited because we got to talk with Colin. He’s at Omni, but he, I mean, literally helped build Looker over almost a decade, seven or eight years, and Looker, I mean, has had a huge impact in multiple ways on the data industry, from, you know, analytics to architecture to modeling. It has spawned entirely new categories of companies. And I am so interested to hear about what Colin learned at Looker that he is building his new company on, right? Because, I mean, Looker is still a great tool, right? So the people who built Looker, like, what are they trying to build? I think that’s what I want to figure out. How about you?

Kostas Pardalis 01:21
Okay, I’m very interested to see how someone who was involved in such a successful product ends up starting again and building a new product in the same industry. So I want to learn about that — like, the why and the how. So I think it’s going to be super, super interesting to have this conversation with him today.

Eric Dodds 01:42
Yeah, I agree. Well, let’s dig in. Let’s do it. Colin, welcome to The Data Stack Show.

Colin Zima 01:50
Thanks for having me.

Eric Dodds 01:52
Okay, you have a fascinating background. So give us the story. And you know, especially how you ended up starting Omni?

Colin Zima 02:00
Yeah, sure. So right out of school, I traded synthetic CDOs. So think of them as credit instruments that no one needs anymore. As that was blowing up, I ended up deciding to give tech a try and got a job as a statistician at Google in search. So a small team of people that works alongside all the software engineers that are actually doing ranking for Google, and we helped evaluate search ranking results. So we were sort of like the judge team for how search was working. I started a company actually with one of my co-founders at Omni and ended up selling that to a company called HotelTonight. HotelTonight was actually Looker’s fourth customer. So I led the data team at HotelTonight. Following that acquisition, I got very close with the Looker team and eventually said, hey, I love the product, I want to come work on it. And I joined as around the 40th employee, originally leading Customer Success and Support alongside analytics, eventually took over the product team, kind of moved in and out of those roles, and was there for eight years through the Google acquisition. Frankly, I got a little bit tired, just as we scaled up and sort of the culture was changing, and wanted to fire it up again. So that’s how we started Omni.

Eric Dodds 03:21
Very cool. So many questions about the background. But really quickly, can you just give us a quick explanation of what Omni is?

Colin Zima 03:29
Yeah, so it’s gonna be very familiar to people that are familiar with Looker. But the core of Omni is that we balance the analytical process. So we give you all the power of a data model to write queries and self-serve for end users. And then we also give you all of the freedom and openness of something like writing SQL or extract-based analytics. And the idea is that users can actually mix and match between those two versions of the world: they can move very quickly in sort of SQL and free-form land, and over time, we help them build a data model so that other users can self-serve.

Eric Dodds 04:06
Very cool. Okay, well, I have to ask, of course, as a marketer, I have to ask a little bit about being a statistician at Google. Because I think we were chatting before the show — you know, sort of what time period was that when you were…

Colin Zima 04:23
Doing that? 2007 to 2011. Okay, wow.

Eric Dodds 04:25
Yeah. So like, man, that’s like when search advertising was going through a crazy hockey stick. What types of projects did you work on? Like, what types of things were the engineers building into the algorithm? Because it sounds like you supported the engineers building the algorithm. Like, what types of problems were you trying to solve, or what types of things were you trying to understand?

Colin Zima 04:46
I mean, it’s gonna sound kind of funny, but we were just trying to define mostly what good is for search. And I know that sounds sort of obvious, but it’s kind of similar to a lot of analytics, actually, in sort of businesses. Google is using a mix between live ranking signals — so think about things that people click on — and then they’re also using objective, or subjective, I guess, evaluation of ranking results. So it’s not just a black box that looks at clicks and promotes things up to the top of the result set. And similarly, it’s not just a survey that says, kind of, here’s one algorithm, here’s another algorithm, which one’s better — it’s a mix of those things. And the whole job was sort of creating the process and the framework for doing that sort of evaluation. So an example that I love to give, because there’s almost a mix of philosophy and statistics here, is one of these queries that would always come up: Harry Potter. So Harry, and then P, O, R, T, E, R. And it actually gets into the philosophy of what people are searching for when they look at ranking results, because Google evolved this idea of sort of correcting aggressively over time. And you think through: how frequently does the user need to be looking for Harry Potter to intersperse Harry Potter results, versus exclusively providing Harry Potter results? Yeah. And how do you create frameworks for doing those sorts of things across sort of the whole surface area of search? It’s this process of trying to use statistics to create frameworks, but also sort of tying that to the logic of what people are trying to do when they’re searching.

Eric Dodds 06:27
Yeah, super interesting.

Eric Dodds 06:30
And what was the output of your work, you know, sort of like the specific work product? Was that an input to the algorithm? Like, what did that exchange look like when you shipped a piece of work…

Colin Zima 06:41
Or a product? Yeah. So I mean, the simplest way to explain it is that engineers are constantly coming up with refinements to the algorithm. So they’re saying: I have a new sort of layer in the way that we return search results, and it changes search results in a certain way. So it might affect 1% of all queries, sampled on some sort of weighted basis or something like that. And these are the results that my algorithm would give, and these are the current results. Like, how do we create a framework for deciding whether that change is actually positive for users? That was our team’s job: to create that framework, try to explain it to leadership, and then leadership essentially made decisions. And sometimes it’s cut and dried — it’s just, like, we’re finding the exact thing that person is searching for 10 times more often. Very frequently, it’s a lot more subtle than that: certain results get better, certain ones get worse, they change in ways that are not obvious. So we were also creating a framework for how to evaluate those things. That was the whole process for what we did.

Eric Dodds 07:35
Yeah, super interesting. Okay, one more question. How did you integrate newer, sort of, like, peak trending search topics, and then optimize around those? And like, how did you weight those — this is very timely and very important, that kind of thing?

Colin Zima 07:53
Yeah, yeah. So I mean, the simplest way to explain it is there are lots of different modules in the search algorithm, and freshness is sort of a whole area of search, and one of those modules. We had to create guidelines for sort of how timely results get expired over time, and how much boost they get when they are occurring. So it’s sort of all part of the framework. What is so subjective about a good search result? Like, if you search for a politician today, how different should the results be from yesterday, based on news? And like, how newsworthy is an event? It is very subjective. But it was sort of vaguely trying to come up with how you describe these things so they could be evaluated. And then obviously using a mixture of click signals — so sometimes clicks can give you answers to these sorts of things in terms of what people are searching for. There are also problems with clicks — like, clickbait is a thing. So you have to sort of adjust for things like that as well, in terms of how ranking works. So that’s why it couldn’t just be click signals. Yeah.

Eric Dodds 09:01
Well, as a marketer, I can say that you did quite a good job over time of really limiting the ability to game the SEO side of things.

Colin Zima 09:11
I mean, it was an impossible, ongoing battle. And yeah, I was a micro piece of it. Yeah.

Eric Dodds 09:18
Okay, well, let’s jump — so many questions about Omni, but I want to jump from there to Looker. It seems like there’s a really clear connection between the two: you know, freshness was a word you mentioned, trying to define what good is, you mentioned analytics. And I mean, to me, it’s like, well, a lot of those things got baked into Looker, you know? Because, I mean, as a Looker user, I’ve experienced a lot of those things. Is that true?

Colin Zima 09:49
I think certainly Looker had a unique take on the analytics world. I thought it was really interesting when I started using Looker and eventually joined. I remember the founding team, so Lloyd and Ben, took it as a point of pride to not look at the analytics landscape very much. That was good in some ways and bad in others. Yeah. But it meant that Looker did have a fairly unique perspective on how to approach BI, in a way that was honestly scary for people. Like, we got a lot of criticism from folks like Gartner for the whole life of the company that operating exclusively in the database was crazy. I remember we would get back a Gartner survey and there’d be 100 questions on your database engine, and we’d just have to write N/A across the survey.

Eric Dodds 10:32
Do those analysts still work at Gartner?

Colin Zima 10:34
They do, yeah. I mean, they’re still slowly coming around to the concept of in-database analytics as an exclusive way of doing things. And it’s kind of funny, because now everyone is building these in-memory layers above the database. So it’s like, what’s old is new again, constantly. Love that. But I mean, we were trying to do things in a very different way, and it was this really strict compilation of SQL down into the database, heavily governed. And it was a backlash to things like Tableau, where extracts were the focus. So, I mean, in many ways we had to sort of teach people the way that it’s normal to think about analytics today, which is: centralize data in a data warehouse, put something that looks like a data model on top of it, and let people query freely in the database. Those were scary concepts at the time. Like, one of the biggest reasons that we lost deals early was because people couldn’t get their data in a database. And I think now the idea of buying Fivetran or Stitch and getting that done might even happen before you buy a BI tool in many contexts. Yeah. So there was this sort of paradigm shift that was happening. And it was really Redshift, and then later Snowflake and BigQuery, that actually opened that up. But the idea of an analytical database that you could just really exercise heavily from the BI layer is sort of what unlocked the whole world.

Eric Dodds 12:04
Yeah, super interesting. And in terms of the architecture, like, where did that come from? Right? Because, I mean, Looker in many ways sort of introduced this entire new architecture. I mean, most companies today, I would say, that are setting up a new data stack or trying to modernize their data stack are, in a way, influenced by the architecture that Looker championed. Where did that come from at Looker? Right? Because, like you said, it was a scary concept. But it really changed, I think, a lot of the ways that people think, and, I mean, literally launched a lot of new companies.

Colin Zima 12:42
Yeah, the core pieces of the concepts came from software engineering, which was just this idea of layers that sit on top of each other, where an API from below is sort of what the layer above can work with. So like this whole microservices approach to doing analytics: connecting to Git, building a code-based model. So a lot of people don’t even know that the first version of the LookML model could only be interfaced with through the command line. So it was truly a SQL compiler to start — it was not a BI tool, or, I mean, it was a BI tool, but it was a SQL compiler first, and then sort of the BI layers flowed out from there. But the real core was just this modeling layer that could describe essentially queries in a more abstracted way. So rather than writing SQL with characters, now we can write it with fields, filters, pivots — sort of the structural concepts that users think about.
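
To make “writing with fields, not characters” concrete, here’s a minimal sketch — table, column, and alias names are invented — of the kind of SQL a LookML-style modeling layer might compile when a user picks two dimensions and a measure in the UI:

```sql
-- The user selects "order date," "user region," and a count measure.
-- The model owns the join and the measure definitions, so the compiler
-- can emit SQL like this without the user writing any of it:
SELECT
  DATE(orders.created_at) AS orders_order_date,   -- dimension from the model
  users.region            AS users_region,        -- second dimension (or pivot)
  COUNT(*)                AS orders_count         -- measure defined once in the model
FROM orders
LEFT JOIN users
  ON orders.user_id = users.id                    -- join declared in the model, not per query
GROUP BY 1, 2
ORDER BY 1
```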

Eric Dodds 13:37
Yep. Yeah, that makes total sense. Okay. So Looker sort of reinvented the way that a lot of people approach architectures. What I see now — and feel free to disagree — is that, you know, Looker introduced a couple of layers, but they were integrated, right? And so, like, the wonderful thing about Looker is you sort of put it on top of raw data, and then you can visualize it or whatever. And so now, that’s been unbundled on a number of levels, right? And so companies took Looker — I mean, dbt is obviously the elephant in the room for Looker as well — and said, let’s abstract that out. Yep. Which is interesting. Thoughts on, like, is it good that the unbundling is happening?

Colin Zima 14:26
Yeah, I mean, I think in some ways it’s natural. But I think that there’s also an equilibrium that you have to deal with, in terms of, like, how many of these tools you want to manage over time. So the challenge with Looker always — or with sort of any BI tool that’s bundled with a modeling layer, so Omni included — is that you want to create a data model to build the analytics, and then inevitably you want to use those things in other places, so you push things further and further down the stack. The challenge is that the further down the stack you push things — so into dbt, or into an ETL process, or something like that — the more rigid that transformation becomes, so the more difficult it is to adjust and adapt to what users are doing. And I think in many ways, LookML’s superpower was this concept of development mode, where I can go into a branch, I can edit a thing, I can immediately see which reports are impacted, and I can push that thing out into production. But moving it further down into the stack — into the ETL pipeline, or into dbt, or something like that — creates this discontinuity, where now the API layer below is producing a dataset, and the thing above that needs to consume that data can’t really as easily interact with it. Yep. And so the trend that I’ve seen is more and more people doing things like producing reporting tables, almost cubing, à la sort of 2005. Sure. And the advantages of things like that are you do get standardization. So you get tables that your BI stack can consume, your data science stack can consume, your reverse ETL stack can consume. And so you do get that standardization. The challenge is that rigidity is a business problem also — the number of people that can touch that layer naturally drops over time. And I think very early, that was an advantage for dbt: almost, you now have a modeling layer that fewer people can touch, so we can do screening, we can maintain it more tightly, and its inaccessibility is an advantage. But if you play that forward to a whole organization dependent on materialized modeling, the challenge then becomes very few people can touch it. So now we’re waiting on the data team for the next column or something like that. And I think these concepts play really heavily into the way that we’re thinking about building a product, which is: obviously, we do have a modeling layer, kind of like LookML, that is doing just-in-time transformation and pushing the SQL down. I think the mistake that we made with Looker was hoping that our modeling layer could be everything for everyone, so that we could do all the transformation for the company. And I do think the thing that dbt has shown people, and just sort of the evolution of the data stack has shown people, is that there are concepts that may start in your BI layer that need to get pushed down and standardized for everyone. And it’s even obvious to say, but, like, I remember we made a customer health score — our business depended on it — and we actually picked it up out of LookML and we put it into an ETL process in Airflow. And the reason is because we didn’t want people to touch it. So there are good reasons for those things. I think the challenge is that you need these divergent layers to be able to speak with each other. So I need to be able to start and produce a report, and I don’t want to start by doing ETL to do that.
I want to be able to iterate on it, quickly publish it, validate it, and then decide whether things need to get standardized. Yeah. So our point of view is that you do need those modeling layer pieces, but we need to be more pragmatic about the things that we truly need to own and orchestrate, and the things that we should push out. So, like, the example in Omni is: I want you to start by writing a piece of SQL, and then I want to take the pieces of that SQL — so maybe the joins or fields — and model them for you. And then if that sort of virtualized view becomes important, I actually want you to pull it out of Omni, and I want to publish it into dbt, and I want all of our reporting in Omni to continue to function silently. So I don’t want to care where that lives, but I want a user to be able to make that quickly and then harden it as needed. And some things should go down into those lower layers, and some things actually should not. And I think that is sort of what the ecosystem is missing here. There’s almost this view that everything should be in the centralized metrics layer, when the reality is, like, that requires time and investment. And some things should have that level of care from the data team — with SLAs on datasets and sort of alerting and things like that — and some things should not; it’s not worth that effort. And so we’re trying to sort of inject some pragmatism into the modeling experience for the user.
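
As an illustration of the promote-when-it-matters flow described here — start with one-off SQL, then harden the reusable piece — here is a hypothetical sketch (all table and view names invented):

```sql
-- Step 1: a one-off workbook query an analyst writes to answer a question.
SELECT o.id, o.amount, u.region
FROM orders o
JOIN users u ON o.user_id = u.id
WHERE o.created_at >= DATE '2023-01-01';

-- Step 2: the reusable piece (the join) gets pulled out into a shared,
-- virtualized view, so the next question starts from modeled tables.
-- If it proves central, the same definition could later be materialized
-- in dbt without breaking reports that reference the view.
CREATE VIEW orders_enriched AS
SELECT o.id, o.amount, o.created_at, u.region
FROM orders o
JOIN users u ON o.user_id = u.id
```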

Eric Dodds 19:18
Yeah. Can you talk about that in terms of learnings that you had at Looker? You know, a KPI like you said — a customer health score — is core to the business, and it shouldn’t be touched. But so much of good analytics is actually exploratory, right? And so, when you were talking about going deeper and deeper into the stack, one way to look at that is that you decrease people’s ability to explore, because you’re really constricting parameters. What did you see at Looker, and even with Omni? Like, exploration actually is the way to figure out maybe what you need to harden. But if you just start with the modeling layer, you make so many assumptions that you end up having to go back and change it, and it’s very slow.

Colin Zima 20:07
Yeah, I think that’s exactly right. I mean, the way I would sort of summarize it is, like, I think there are a lot of things that are bottoms-up, and a lot of things that are top-down. And what that means is that there are certain datasets that you can publish out where making them easy to work with is the superpower of them. So maybe it’s like a revenue time series or something like that, where you’re not going deep, and accessibility is the most important thing. And then there are other datasets where there’s no amount of pre-emptive manicuring that you can do to make it effective for people. And I think event analytics is a great example of this. Like, we’re building new features, maybe we thought a little bit about tracking, maybe we haven’t, we’ve got nested JSON blobs all over the place. Yeah, I’m not going to be able, as the data analyst, to predict what my product manager needs to do. They’re going to have a question, and I need to give them as much as I possibly can to go answer that question, and then we can think about reshaping that dataset. So, like, again, I think it’s about data teams focusing on where the cleanup that they do has the most leverage. So maybe Salesforce does need a lot of manicuring, and they do need to build published datasets there. But these long-tail sets just require getting data into people’s hands and enabling them, and then reacting to what they’re doing with it. It’s actually very similar to sort of the MVP process of building a company, which is, like, we can overbuild our product before people are using it, and we’re not going to learn from it. And if we put out younger things that are less complete, we can see how they’re getting consumed and then react to it. But if you put up walls in front of people, you’re going to disengage them. So I think trying to sort of dumb it down for the user can be advantageous, but not universally.

Eric Dodds 21:50
Yep, super interesting. How do you think about your user at Omni, based on what you learned at Looker? Because, you know, one term that’s cropped up in the last several years is sort of analytics engineer. And in many ways, Looker basically created that, because it’s like, you know, you sort of have someone who does a bunch of data modeling, but they’re not an actual analyst — and then they actually become an analyst, right? And so Looker enables this really interesting environment where it sort of gives superpowers on both ends. So how do you think about that at Omni?

Colin Zima 22:23
No, it’s true — it’s like giving SQL people superpowers. I think we talked about this a little bit in the pre-show, but I think in a lot of ways at Looker, we needed to really simplify our message, because we were teaching people a new way of doing things. So this idea of a centralized modeling layer that’s highly governed and highly controlled was very appealing, and the templated SQL was a piece of that. But the core message of Looker was governing data. And I think the flip side of that was that it’s very hard to compromise your most core message. And the core message was, like, everything is governed and it’s really tight. And what that meant was that when people needed to do pragmatic things — so they needed to transpose before we had transpose, or they needed to write a piece of SQL that they didn’t want to model — they were picking it up and injecting it into the data model. And it wasn’t governed; it was just a raw piece of SQL that was getting dumped in, in a practical way. And so I think one of my big takeaways was, we in some ways weren’t allowed to be pragmatic about our user. We weren’t allowed to give them more SQL things, because we had to simplify. Yeah, and that’s a lot of sort of the opportunity of doing this again: now, rather than teaching people the Looker way, we can build on people that understand that, and dbt, and Fivetran. And now we can say, like, great, you want modeled things? I can give you modeled things. But if you want to poke through the model and write some SQL, I want to let you do that too, and then you can decide to model it later. And so it’s sort of like we can give people nicer things, because we don’t need to protect them from themselves. That was a lot of the balance: at Looker, we felt like we had to be very opinionated about the product, and, like, if the developer of a model was not good, that was not our responsibility. And I think now we are taking a more opinionated point of view and being a little bit more aggressive. So a simple example is we don’t operate exclusively in the database. If you write a query to a table and you do select star, we will actually pull the whole thing back, put it in the browser, and let you re-query that dataset, because it’s sort of more pragmatic and faster and better. And so, like, we sort of get to take some of these foundational concepts and go two steps further in terms of what we’re allowed to do with them. That’s what’s been so fun about doing this again: we know how to build the core foundational pieces, and now we get to build the things just outside of those that are sort of so exciting. I always used to joke that I wrote more raw SQL than any other Looker user ever. And now I just get to write raw SQL alongside a data model. And it’s what I’ve always wanted. So I get to build for me a little bit.

Eric Dodds 25:16
Very cool. Okay, one more question for me before I hand it over to Kostas. Can we talk about — so, you know, we talked a little bit about unbundling — can we talk about the relationship between the visualization layer and the modeling layer? As a Looker user, I think one thing that was really nice was that, you know, you sort of had the ability to, like, drill, and then it’s like, okay, well, if you want to look under the hood, you can look under the hood, which is really nice. And so, you know, with the unbundled model, you don’t really get to do that, right? Like, yep, that becomes a ticket for someone who’s, you know, doing the dbt model or whatever. And as a user, like, I loved that about Looker. Is that something you’re trying to retain in Omni, or, like, how do you…

Colin Zima 26:07
Yeah, and actually, we’re trying to even sort of push it a step further. So, like, we sort of talk about this in terms of: there’s a level of prep on a dataset that you can make for a user, and the more prepared it is, the less flexibility the user gets. And so, like, the simplest version is you make a reporting table, and that’s all the user can touch. And the Looker version was, we give you sort of this model schema, and you can touch anything inside the model. And we’re almost going a step further, which is you can touch the model and do anything, and if you want to even poke through the model and write SQL, we’ll let you go that step further. But the key here is always sort of this interplay of trying to structure things more over time. So if you do let someone write SQL, what we’re trying to do is sort of pull out those granular concepts that can make the next question simpler. So a really obvious example is that if you join two tables together, we know that we can make a virtualized view over that table, and so I don’t need to write that next time. And kind of the more of those pieces that we can help you build fluidly — so I don’t need to drop into a model, edit it, and publish the model out; right now it’s just, I can write a join, and now it looks like it’s modeled — then we can kind of structure that model more and more over time. That fluidity, I think, is really the superpower. What sort of the modeling layer helps with is the compilation, but it’s not just having a model there that does it. It’s the model and the ability to adjust the model based on what the user needs. So, like, a great example that sort of comes up constantly is that you have some sort of internal product, and you want to filter out your internal users. The version of it where you’re doing this in the ETL cycle is, like: go back in the ETL cycle, rewrite the queries, filter the rows. What Looker’s true innovation was: on a query result, you can drop into the model, put a where clause on everything that’s hitting the table, and then boom, it’s gone. And it’s that coupling that is the real power of the model — it takes that events view and it really makes it the events view where the user is not internal. Yeah. And it makes it super accessible. And I think then what we’re trying to build on is: how do we refine that work the user did and make it as fast as possible? And then if that where clause needed to push all the way down into dbt, can we make that really simple as well? Yeah. So it makes it really fast for the user to answer the question, and then makes it really robust — or as robust as the company wants — for controlling the logic later.
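
A rough sketch of that model-level filter (names invented): rather than editing every downstream query, the model injects a single where clause on anything that touches the table, so every compiled query inherits it:

```sql
-- Every query the model compiles against "events" now goes through this
-- filtered view, so internal traffic is excluded once, everywhere.
CREATE VIEW events AS
SELECT *
FROM raw_events
WHERE user_email NOT LIKE '%@ourcompany.com'   -- internal users filtered out
```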

Eric Dodds 28:42
Yeah, that’s super interesting. It sounds as if, like, you know, in many ways — and I know this may be an oversimplification — it’s almost like reclaiming the value of the thousands of ad hoc, you know, activities that people are performing on a, you know, weekly or daily basis, because there’s so much value in what they’re trying to do, but largely it goes wasted, you know?

Colin Zima 29:10
That’s exactly it — the coupling. And I’d say, like, it’s even trying to also knock down that sort of decision node of, like: do I make this scalable now, or do I answer the question? Like, I want you to answer the question, and I want you to make it scalable later. And sort of, like, the original version of the world is: do I go pull up Mode and write it in SQL, or do I go take the time to, like, think about a data model and model it? And I want you to just answer the question, and then I want to pull out, like, hey, we found three things that are modelable — do you want these things? Like, boom, let’s model them. It still can stay orphaned. Like, it’s okay to have one-off SQL. And it’s pragmatic, and users are doing it, and we can’t argue with them. Yeah. So, like, that is what the subtlety here is: let’s let users get themselves in trouble a little bit, but let’s try to help them make it scalable and make it better.

Eric Dodds 30:00
Yeah, love it. So fascinating. Kostas, please.

Kostas Pardalis 30:04
Thank you, Eric. So, Colin, okay, let’s start with a question about the past and how it relates to today, right? Like, Looker — I think the company was founded in 2012 or something like that? Sounds about right? Yeah. And then, ten years after that, we have Omni. So what has remained the same and what has changed between 2012 and 2022, when it comes to the problems that a tool like Omni solves today?

Colin Zima 30:39
I mean, I think a lot of it. So first, I think there’s a really big difference between 2012 and 2015. So I think in some ways Looker got a little lucky. A great example was our first HotelTonight instance, which was actually on top of our production MySQL. I took down the app a couple of times querying in Looker. Not recommended — you set up your production replicas. But after Redshift and Snowflake, columnar databases on the web, and essentially this sort of mixing between the data lake and the data warehouse — just the ability to query lots of data has become obvious and normal to people. You don’t need to walk in and say, do you know what Snowflake is? Are you ready to start doing that? That thing has just become completely normal. I think the idea of compiling SQL, and sort of SQL familiarity being a core component of your data stack, has just become normal. And similarly, this idea of just all of your data from 10 different sources showing up in your data warehouse, or your data lake, or whatever it is, has become normal. So I’d say early in Looker’s life, we were teaching people these concepts and sort of learning about them. Early in Omni’s life, we can assume that you have Fivetran set up, you’ve stood up your Snowflake, you’ve got dbt implemented, and you have two people that know how to write SQL — great, and you’re ready to start going. And so, like, we get to start with a lot of those concepts existing in the user base. I think the demands of end users have not changed at all. Like, the dashboard is not dead. Most people are looking for dashboards, they’re looking for interactive analytics, they’re looking for some version of self-service, so that a marketer can go look at their channels over the last three weeks and see the evolution of sort of lead generation across them. I think all those things have become just normal and standard for people. I think one sort of big obvious thing that’s sitting out there is there just hasn’t been a generational BI company since Looker. So, like, Tableau and Qlik, and to some extent Power BI, were the wave before Looker. And Looker — I mean, I obviously have a very Looker-centric point of view, but I think Looker grew up isolated; it was the isolated winner of its generation. And there’s a couple of other tools out there that are of sort of similar generations. I think the current generation has not quite been figured out yet, and so there’s some whitespace. But at the same time, Tableau and Looker and Power BI have become extremely commonplace in people’s data stacks. So I do think people are somewhat comfortable with the stack that they have now, which is a little bit different. Because we got to be very different when we were Looker — we were pitching something very new. Yeah.

Kostas Pardalis 33:29
Yeah, 100%. Okay, you mentioned something very interesting. I would like to spend a few minutes on it and hear your thoughts. So going back to that cohort of BI tools, right? There were a couple of them. Like, we had Chartio, we had Periscope Data, Mode, which is still around. What’s the other one that merged with Periscope Data? Sisense? And yeah, so we had all these companies growing. And at some point we get the acquisition of Looker by Google, and it almost felt like a cycle was closed, let’s say, in the market, right? Like, companies got merged, IPOs got canceled — like Sisense was talking about an IPO for a while, and it hasn’t happened yet. And then we also have events like Tableau, for example, going from being public to acquired, right? Yep. And we still have, by the way, as you said, Microsoft with Power BI, which we don’t talk that much about, but it’s huge, right?

Colin Zima 34:58
Biggest by far, yeah. Yeah, exactly.

Kostas Pardalis 35:01
So give us a little bit of, like, what happened with these cohorts? And after you do that, also, how do you see the future — like, the next iteration? Yeah.

Colin Zima 35:16
I mean, I think the tools sort of divide into a few different buckets. So I think the thing that Looker did really well was it was very opinionated about what it did in terms of this modeling layer. But I think we also understood that to be a big, successful company, we needed to serve enterprises effectively. And so while we started really focusing on the HotelTonights of the world — so, venture-backed tech companies that were young and sort of first, like, early adopters of product — we built a lot of the things that gigantic companies needed to be successful with enterprise analytics for 100,000 people. And I think that was one of the reasons that we were able to be so successful as a company: we thought a lot about the business as we built out the product. That wasn’t always best for the product, to be clear, and there’s always some tension between the business and the product that you’ve got to deal with. But I think that a lot of the reason that we were allowed to be successful was because we thought about the trajectory of what could support the business. Like, I remember having conversations about how to get to a billion dollars in revenue. And when you’re having those types of conversations, it makes it much easier to think about sort of what the business looks like five, ten years from now, and what it needs to be successful. And I think some of those companies that you’re listing were focused a little bit more down-market, and maybe a little less focused on sort of the sustainable economics of the business — though, again, some are surviving and sort of continue to grow, and the SaaS model is great for things like that. It’s really hard for me to figure out what the next generation is, actually — and obviously, I’m trying to build one of them. But I think one of the things is that when we were starting Looker, I looked back for a lot of inspiration at MicroStrategy, Cognos, Business Objects — like, the first generation of BI. And in some ways, I was literally looking at MicroStrategy docs, and they’ve got, like, the little folder menus, and it looks sort of like Windows 2001 as you’re using the product. And I think some of these tools — like, even Tableau, to some extent — feel a little bit dated in terms of sort of the web interactions and the user interaction models. And I think a lot of the opportunity is just sort of updating, really at the margin, some of these concepts in terms of how we interact with the database. So a great example here is, as Looker was growing up, we had the columnar database growing with us — so Snowflake, BigQuery, Redshift — and just the idea of using them was sort of the new concept: we’re not going to extract everything, we’re going to operate in the database, and not only is it going to be okay, it’s going to be faster than if you were working on an extract basis, and it’s real time. I think now we’ve reached a sort of pain point in that trajectory, which is, like: my Snowflake bill is a million dollars — is my BI tool too recklessly consuming that layer? And I think dbt is probably a contributor to this as well. And so this is where now we can take some of those core concepts and say, okay, what is the borrowed concept from historical BI that we can actually layer in here? So the example here is that we’re silently putting in-memory layers into our product. And again, no new concepts here — like, BI tools have done this for 30 years.
But I think the concept is operating entirely in the database when you need to — so that you’re real time and you’re working with Fivetran well — but also being able to build a dashboard where I can download the whole dataset into memory and cross-filter it instantaneously. I think that’s actually what users want. So it’s sort of, like: how do we take the things that are great about a columnar database and build on them, or Fivetran and having all your data there, or dbt and familiarity with SQL — those sorts of pieces?
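
For illustration, here’s a hedged sketch of that in-memory pattern (table and column names invented): run the expensive query against the warehouse once, then serve dashboard interactions from a small local copy:

```sql
-- One warehouse round trip: aggregate the data a dashboard needs.
CREATE TEMP TABLE dashboard_cache AS
SELECT order_date, region, channel, SUM(amount) AS revenue
FROM orders
GROUP BY 1, 2, 3;

-- Cross-filters and drill-downs then re-query the local copy, which is
-- effectively instantaneous and adds nothing to the warehouse bill.
SELECT order_date, SUM(revenue) AS revenue
FROM dashboard_cache
WHERE region = 'EMEA' AND channel = 'paid-search'
GROUP BY 1
ORDER BY 1
```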

Kostas Pardalis 38:53
Yeah, that makes a lot of sense. And we’ve been talking a lot about SQL, but there’s also a lot of discussion about Python, right? Yes. And the reason I’m asking is because, okay, I get it for dbt, for example, having a lot of requests around Python — they’re working a lot with ETL, and the traditional glue there was always, like, Python, yeah. But there are also a couple of, let’s say, BI products out there that try to merge the two paradigms together, and, yep, it’s usually through, like, the notebook. Yep. Yeah. So what’s your take on that? Do you think that this can be, like, the new iteration of BI, or in the end, are notebooks something different, and not BI?

Colin Zima 39:48
I think it certainly can be BI. Like, I think all data consumption is sort of overlapping Venn diagram circles, where, like, similar users are doing similar things. I have found in the past that it’s a very different type of user, and that data science activities tend to be done less scalably, and less to be shared — sort of more force-directed-diagram-style analytics — than creating a self-service environment for end users, and you’re making different trade-offs. So, like, we are very focused on the SQL side of things for now, and sort of the query consumption side. I would say that, like, as databases make querying in different ways more available to end users, I think people will want to use them. It’s just that the frequency with which I need to use looping, or sort of, like, higher-order constructs, I think tends to be more important in data-engineering-type activities and data science, versus consumption and reporting and things like that. So for me, I would say we’re actually going in the other direction, which is, like: I want to build a functional library that looks a little bit more like Excel on top of our data — so a Google Sheets-style interface on top of queries — rather than thinking about Python, because I think that’s what makes it more accessible to more users. But I sort of understand your point; I think you need all of these things. And that’s why I don’t want to lock the business logic into our layer. If we help you build business logic, I want you to push it into the right place, so that a Python user can go pick it up. And maybe we’ll do that in sort of the infinity of time — like, I would like to do everything eventually. But our focus is very much on the SQL compilation that consumption and reporting need, the functional sort of consumption layer, more than sort of the deeper science pieces.

Kostas Pardalis 41:47
Yeah, that makes total sense. Okay, I want to share with you what I always considered to be the genius of Looker, and then, based on that, I want to ask you about Omni — I’d love to hear your thoughts. So what I found, like, amazing about Looker was how — I mean, you have a product that, in order to deliver value, had to engage two completely different personas. One was the business user, who likes to get the data and does, like, the reports. And then you have the analyst or the data engineer, who likes to prepare this data and likes all that stuff. And Looker, for me, created a very distinct experience between LookML, which was for, let’s say, the analysts, and then, for the business user, where you pretty much could only do what you know really well how to do, which is, like, the pivot table, right? Like, just switching from the developer mode to, let’s say, the business mode was amazing — it was amazingly smart. So I always considered it to be a very successful and unique example of a product that can serve two personas almost equally well, right? Which is hard — it’s super hard to do. And based on that, let’s talk about Omni. Like, yep, who is the user of Omni? Do you have this duality again?

Colin Zima 43:19
Absolutely. And, like, I think you said it perfectly, actually. It’s amazing, because we really tried to profess that point of view strongly. Internally, we actually had personas for each of those — it was called the Fox and the Hound. Like, they were top-level user types in the company, and we cared deeply about them. I actually think the funny thing is, I think we stopped just short of really diverging the product enough to serve both of those well. So, like, almost what we’re trying to do is take one more step in both directions. And that is, like: I want to give those technical users more superpowers in SQL, and sort of more fluidity to model and do things and sort of fork away. And I want to make the end-user experience more Excel-like and more sort of interactive — still sort of based on the pivot table. But, like, a simple example is using point-and-click interactions to make functions, instead of typing functions in a modal. Like, how can we elevate it so that any user can consume things? But I think, to your point, even going back to sort of the success of those previous businesses, I think the most important thing to Looker’s success was understanding that we’re selling to a data person, but we’re selling to a data person whose job, and what makes them successful, is making other people successful with data products. And that is exactly our focus: we’re not building a product for data people, we’re building a data product for data people to serve business users. And when you sort of shift the thinking a little bit, it still needs to be outstanding for the data person — like, that is what got Looker bought, and we got plenty of criticism from our non-technical users. But they’re doing that work not to do research on an island; they’re doing that work to build a self-service environment for people. And so that self-service environment needs to be truly great. Ours is still getting better. But that’s exactly what we are trying to do: build a great environment for end-user self-service that lets data people go a little bit further, too.

Kostas Pardalis 45:26
That’s great. All right, one last question from me, because we’re getting close to the end here, and I want to give some time to Eric to ask any follow-up questions that he has. Colin, give us a little bit about what Omni is today — like, what’s the product experience — and also share with us the vision that you have, what we should expect, like, a year from now. Because Omni is still a young company. So please do that.

Colin Zima 46:00
Yeah. So what we are doing is really balancing these two worlds, between the directness of writing SQL and the sort of governance and richness of accessing a data model. So what you see when you’re using Omni today is all the analytics is built through a centralized data model, but on top of that data model, users can essentially embellish in SQL and go beyond, and sort of ask open-ended questions and do open things. And then they can take the components of those analyses and push them down and centralize them. So the idea is that I can start in a workbook, I can do analysis with a mix of SQL and pivot table and front-end fields and UI, and I can do that in isolation. And I can pick that isolated thing up, push it down into a centralized data model, and share it with everyone — or I can leave it in isolation. And so the idea is that I can really straddle these worlds between doing things in a free and open way and sharing them directly with my neighbor, or building a data environment that everyone can self-serve from. And I can sort of evolve that over time. So I could start in a very open, sort of sloppier analytical pattern, and I can slowly have it look a little bit more like a mature Looker instance over time. And what’s happening behind the scenes that’s powering that is sort of our model management piece, which is picking out the components of your SQL and turning them into a data model. And then we’re able to pull them sort of out of SQL queries and push them into our data model, and push them out of our data model and into the database. So you can almost think of it as just Looker meets a SQL runner, and it lets you move back and forth between them completely fluidly.

Kostas Pardalis 47:45
And what should we expect? Like, give us something to look forward to, you know?

Colin Zima 47:52
Yeah, I mean, certainly more and more maturity around these sorts of experiences. Like, the magical sort of motion that I want to see in the future is: an analyst starts a one-off analysis, and they start writing SQL, and they share it with their team. And their team wants to do interactive analysis with that thing, so we’re able to hit a button and, quote-unquote, model that SQL down — put it in a centralized modeling layer and give people self-service with it. Now a data science team decides that they want to work with that same dataset; again, we can pick up that business logic and persist it in dbt, through some sort of cron schedule, and all of the metrics work silently through the Omni layer. So it’s a self-service environment for your end users, and it’s sort of a technical, iterative environment for your technical users, and we do all the orchestration of the business logic between those layers. So visualization and reporting obviously come along with those things. And then you’re going to see more and more in terms of end-user experience — so things like spreadsheets, drill analysis, CSV upload, acceleration, a lot around those pieces — so that you’re on a dashboard and you hit a button, and now your dashboard filtering is instantaneous, and it’s all just sort of happening magically behind the scenes.

Kostas Pardalis 49:10
Okay, that’s great. We’ll see a year from now.

Eric Dodds 49:16
All right. And so this is a question that kind of combines multiple parts of your past experience. So one really interesting thing when it comes to data — you know, today’s space in general, but in analytics as well — is that you have, you know, machine learning and AI. There are a lot of headlines out there about, you know, ML and AI and, you know, automated insights. And, you know, anyone who’s actually tried to use Google Analytics’ automated, AI-based insights knows what they’re really like — if you’re a real business trying to scale, that stuff is very difficult to use. But you also have a lot of experience, you know, from your Google days, like, feeding algorithms that are making decisions, you know, building products that create a lot of data that goes into algorithms. And in some ways, it sounds like Omni is trying to make intelligent decisions, or at least, like, make decisions around what options to give you based on what you’re trying to do. Just guessing. Yep. Not that that’s AI or ML necessarily, in a formal sense. But what’s the relationship, right? Because in some ways, like, it’s dangerous to introduce, like, an algorithm into business logic. Yeah.

Colin Zima 50:50
Like, I actually think you sort of nailed it, right? Yeah — which is, like, the options. I think the most underrated concept around all of these things is, like, a light human in the loop on these sorts of concepts. So, like, we’ve even actually noticed this in our product: there’s a really big difference between writing automated joins on your behalf, and telling you that we think we found very good joins. And the difference is being right 100% of the time. Like, I think these are the sorts of concepts that tie to self-driving and things like that, which is: the bar to do it for people, in a lot of contexts, honestly can be very close to 100%. Yeah. And so I think that you can use these tools in ways that are extremely powerful, but you just then want to present them to users in a way that’s more interactive. So, like, an example that we’re thinking of for the future is: I don’t really want to have people writing joins in our product. Like, you can, if you show up and you have a list of joins that you want to punch in. But I would much rather see two tables that you want to join together and say: it looks like these two keys join, and these are the three most likely other couplets of fields, and this is why we think that — like, hit yes or no. Yeah, to me, that is, like, super magical. And it’s taking, like, a half step back from just doing it for you. But I think those are the types of pieces that we’re trying to layer in. And it’s actually the same with our SQL parsing. If you write SQL, we parse it out and try to write fields — we don’t just immediately stick those fields in the model, because we’re wrong. We’re wrong, but we’re comfortable being wrong, and we can accelerate your ability to make those fields, and we can do that in a way that’s very expressive. So I think we’re trying to layer those pieces in; we’re just trying to do them in a way that puts the user in control as much as possible. So, like, very little black boxing — but black boxes that point you at decisions to go make.
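
To illustrate the kind of check that could sit behind an “it looks like these two keys join” suggestion, here’s a hypothetical sketch (tables and columns invented) that scores a candidate key pair before asking the user to confirm:

```sql
-- Score a candidate join: what fraction of orders.user_id values find a
-- match in users.id? A match rate near 1.0 suggests a real foreign key;
-- a low rate means the suggestion shouldn't be surfaced at all.
SELECT
  COUNT(*)                     AS order_rows,
  COUNT(u.id)                  AS matched_rows,
  COUNT(u.id) * 1.0 / COUNT(*) AS match_rate
FROM orders o
LEFT JOIN users u
  ON o.user_id = u.id
```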

Eric Dodds 52:49
Yeah, fascinating. I love it. Yeah, I mean, as those patterns become established, then those become even more powerful, right? It’s almost like linting for queries or something.

Colin Zima 53:01
That’s exactly it. And, like, they can become automated — but, like, let’s make the bar really high.

Eric Dodds 53:05
Yeah, yep. Love it. Okay. We’re at the buzzer here. But before we jump off, where can people learn about or try Omni?

Colin Zima 53:09
Yep — exploreomni.com. Fill out the form there. Like, we’re about to probably put something real public out there, but we’re still young, so we’d like to sort of hand-hold through the early process for now. Or just shoot me an email, colin@exploreomni.com, and I’ll take you on the tour myself.

Eric Dodds 53:35
Awesome. Sounds great. Well, Colin, this has been an amazing conversation. Thank you so much for giving us the time. Of course. Thanks. It’s been fun. We hope you enjoyed this episode of The Data Stack Show. Be sure to subscribe in your favorite podcast app to get notified about new episodes every week. We’d also love your feedback. You can email me, Eric Dodds, at eric@datastackshow.com. That’s E-R-I-C at datastackshow.com. The show is brought to you by RudderStack, the CDP for developers. Learn how to build a CDP on your data warehouse at RudderStack.com.