Episode 94:

Notebooks Aren’t Just for Data Scientists with Barry McCardel of Hex Technologies

July 6, 2022

This week on The Data Stack Show, Eric and Kostas chat with Barry McCardel, co-founder and CEO of Hex Technologies. During the episode, Barry discusses analytics and ML, “commitment engineering,” and how Hex solves issues in the data world today.


Notes:

Highlights from this week’s conversation include:

  • Barry’s background and Hex (3:05)
  • Reconciling two sides of data (9:16)
  • Collaboration at Hex (15:10)
  • What it takes to build something like Hex (20:02)
  • Defining “commitment engineering” (26:01)
  • How to begin working with Hex (30:56)
  • Hex customers and uniqueness (40:31)
  • The future in a world of data acquisition (45:30)
  • Crossover between analytics and ML (51:33)
  • Advice for data engineers (57:19)

 

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Transcription:

Eric Dodds 0:05
Welcome to The Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You’ll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, the CDP for developers. You can learn more at RudderStack.com.

Kostas, we are talking to Barry who co-founded a company called Hex. They’re in the analytics space. And this is my burning question. So you have Looker acquired by Google, you have Tableau acquired by Salesforce. Periscope— Someone bought Periscope, right?

Kostas Pardalis 0:45
No, they merged with Sisense. Yeah, acquired by Sisense, but yeah.

Eric Dodds 0:49
Sure. Amplitude went public. So in some ways, you sort of see industry movement like that, and it can feel like, wow, okay, the core analytics problems have been solved at a massive enterprise scale, great, check, let’s sort of move on. But my strong sense is that that is not how Barry feels. And I think that he is going to help us understand what innovation is happening now and will happen in the future in analytics. So I want to get his take on that: what parts have actually been solved? And then where are we in early innings? So that’s what I’m going to ask. What about you?

Kostas Pardalis 1:32
I don’t know. I think one of the things that we should learn when it comes to technology and this industry is that everything happens in cycles, and whatever has been invented gets reinvented, right? Probably that’s one of the mistakes that IBM made. They came up with all these huge servers, and they were like, okay, we solved the problem. And now today, you have AWS, Microsoft Azure, cloud computing, and suddenly even infrastructure was reinvented, right? And probably, even there, we are entering another cycle of integration. But anyway, I think it’s a very interesting time. Because yeah, as you said, this market, the BI market, the visualization market, hasn’t produced any innovation for a while now. And when we say a while, we mean like two or three years, right? But that’s a long time when it comes to technology. So I’m very excited that we have Barry today because I think we’re going to see what’s next in this space. And we will learn more about the product that they’re building, which is a very good example of what’s next in the space. So let’s go and talk with him.

Eric Dodds 2:49
Let’s do it.

Barry, welcome to The Data Stack Show. We are super excited to chat with you. Thanks for having me. Okay, give us your background and tell us a little bit about Hex.

Barry McCardel 3:03
Yeah, so my background: I’m Barry, the co-founder and CEO of Hex. I have been working in and around data basically my whole career. In undergrad I kind of stumbled into some really interesting research around social networks and was doing stuff in spreadsheets and then in R. This was sort of before data science was really a thing. I went into consulting and I was doing dark, unholy things in spreadsheets. I was building what I’d call data apps in Excel: tab-by-tab data transformations and drop-down interfaces. I was writing VBA to build UIs. It was deep and dark: illicit Access databases on PC towers I pilfered at clients to run a database. A major US airline had a lot of their pricing and Wi-Fi maintenance infrastructure running on a spreadsheet I built, and every time I fly that airline, I wonder if they’ve migrated off it yet. And then I went to Palantir, and I was there for about five years. That was really an opportunity for me to do that type of stuff I was really enjoying, data analysis and building different apps on data, sort of in the big leagues. And I was there through a really interesting time, 2013 through 2018, around the emergence of a bunch of things that I think we almost take for granted around really working with large-scale data. Big data was the buzzword, data science was emerging as a discipline, and a lot of technologies were emerging that are now quite widely adopted, like Spark, HDFS, even just AWS in general and the possibilities of cloud. So it was a great time to be there. And I got to build a lot of very interesting technologies. And I also met a bunch of folks that I’m working with now, including both my co-founders. After that I went to a healthcare startup in New York. That was really the acute moment for Hex, where we had this problem. We had a really kind of quote-unquote modern data stack: we had Redshift, I think we were early adopters of dbt.
We had Looker for BI. But we were still doing a lot of our exploratory analysis, modeling, and storytelling work in one-off Jupyter notebooks or SQL scratch pads. And we were sharing everything through spreadsheets and screenshots and Slack. And I actually started this journey with Hex as a buyer. I was looking for a piece of software that fit my picture of the type of platform that I wish we had. And it took a few months of looking and realizing that it wasn’t out there to kind of come around to that. So in a way, Hex is like a culmination of all these experiences I’ve had. It’s basically the tool that I wish I’d had at every phase in my career; I would have been a user of it at basically every job I’ve had. So that’s sort of the backstory. I started the company about two and a half years ago with Caitlin and Glenn, both of whom I met and worked with at Palantir. Glenn was actually my intern in 2014. We’ve been working together ever since. So I’m very, very fortunate to get to work with some great folks on this.

Eric Dodds 6:01
Yeah. So I’d love to hear about maybe the moment or the experience of realizing, “Am I going to start a company to solve this problem?” Because going from a vendor search to embarking on an entrepreneurial journey is a pretty big step. Or maybe it’s a small step. But tell us about that experience, and maybe some of the circumstances surrounding it.

Barry McCardel 6:29
Yeah, sure. I started this looking for software. I was googling a bunch of terms because I figured someone must have built this thing. And I couldn’t find it. And I wound up asking a lot of friends who were at other companies running data teams or whatever, “How did you guys solve this?” And everyone kind of had the same answer, which is, “We don’t. We’ve cobbled together a JupyterHub thing with a bunch of other open source stuff. It’s all really brittle, but it kind of does the thing.” And none of this felt really satisfactory. And I had just come off of five years at Palantir, where our whole shtick was building really great software. So I think I kind of felt this sense of dissonance, like, wait, someone should be building this, and I had just come off five years of enjoying building those types of things. And so I came to this pretty reluctantly. I’m not someone who set out like, “I want to be a founder, I just need an idea and a co-founder.” I almost had to get dragged into it. The stars really had to align: a problem that felt glaring, that I understood, that was a clear gap; two co-founders who, by circumstance, were also getting bored with their things and wanted to do a next thing. And so yeah, it’s funny, it was a pretty dramatic turn, but it kind of just dawned on me slowly. And when we decided to jump in and do it, it felt very natural. It felt quite organic.

Eric Dodds 7:55
Yeah. Okay. And then give us just the sort of one click deeper of detail into like what does Hex do?

Barry McCardel 8:04
Hex is a platform for collaborative data science and analytics. We kind of do three things really well. We have an online, collaboration-first notebook experience, where it’s very easy to come in and connect to data, ask and answer questions, and work together as a team. We make it very easy to share your work with anyone as an interactive data app. So it’s literally just one click, or I guess it’s two clicks, to go from an interesting analysis or a model you built to something that anyone can use as a published web app, whether that’s a simple report or something much more complex. And then we allow the outputs of the data work to contribute to an overall base of knowledge within an organization. We can get more into this, but we really think of the art and the science of analytics as contributing to knowledge. It sounds a little abstract, but at the end of the day, what you’re really trying to do is influence decisions and help an organization understand the world better. And we think there are some big gaps in how the great work that data people are doing every day translates to knowledge. And so we’ve built a series of features that we think contribute to that mission, and we’re really excited about that.

Eric Dodds 9:08
Awesome. Okay, I want to dig into that more. But before we get into some of those details, I want to zoom out a little bit. And I’m going to sort of paint a picture of, I would say, two different sides of the same coin when it comes to analytics. And these are two perspectives that we’ve heard on the show at some point, okay. And I’m going to intentionally sort of probably draw the like, make these a little bit overwrought, so forgive me. And our listeners, forgive me.

The first side of the coin is that the analytics game has kind of reached maturity, and most of the hard problems have been solved, right? And whatever the reasons are for that, it’s because of modern storage and separation of storage and compute and flexible visualization layers and all this stuff, where you can do advanced analytics way more easily than you could ever do them before, right? And so, in that regard, some people would say, okay, that wave is over, and the next big wave in the data stack is things related to ML and ML workflows and all that sort of stuff, right? That’s sort of the next phase of modern data, whatever you call it.

The other side of the coin is the opposite. It’s like, we’re actually in early innings. The advances that have been made are actually just the foundation on which the really cool stuff with analytics is now going to be possible. And so we’re pretty early, right? One good example of that is things like the metrics layer, where you’re now seeing this agnostic, stack-wide, accessible layer that solves a lot of different problems.

So anyways, I’m interested in your perspective because, as a longtime practitioner and now someone that is really thinking through like solving a problem with a product in the space, what do you think? Is there truth in both of those?

Barry McCardel 11:12
There’s some truth to both. I don’t think the narrative that the analytics world is solved is very true. I do think there are parts that have gotten way better. And for folks who have been in this world for a little while, which I’ve been very fortunate to be, there’s a very dramatic shift from 10 years ago to today in terms of organizations’ ability to bring data in, to even just have their data in their possession, especially from SaaS tools or other places, and have it stored at scale. Obviously, platforms like Snowflake and cloud data warehouses generally have unlocked a lot. They’re able to transform and model it, and the revolution that dbt has propagated over the last few years has been very meaningful there. And so I think we’re actually just at the beginning of the situation where a lot of organizations can claim to even have a corpus of clean and reliable data that people can actually go tap into at their fingertips. So from that perspective, I think we’re very clearly still in the early innings. There are some solved problems in there, and there are some problems that are still very far from being solved. We think a lot about the unsolved problems in the analytics world. When we see our customers and potential customers come to us, it is very clear that there are a lot of people who still have a lot of friction around being able to ask and answer questions of data and work with data at cloud scale. And then there are a lot of the downstream workflows of that: how do you collaborate on these workflows? How do you share this work with others in a way that’s actually useful and usable? So just having the data in the data warehouse is a big part, but it doesn’t mean that all these downstream workflows are solved. And there’s a lot of innovation happening here. I mean, you mentioned the metrics layer; I think that’s extremely interesting.
And that sort of gets back to: great, you have all the data in your warehouse, but is everyone looking at the same measures? Are we looking at these things the same way? Are we asking and answering questions in the same way? That’s a very unsolved problem. So I think it’s a little naive to say, well, check the box on analytics, and it’s all about ML. That said, there’s a lot of interesting stuff happening in ML too. I’ve been part of some really interesting projects that have leaned very heavily on ML. I’ve also been part of projects where we tried to use a bunch of ML and found out a simple scatterplot was actually all we needed. And so I think we’re in the very early innings of figuring out where ML can best be applied. I’m personally a big believer in the idea of human-computer symbiosis, this idea that a lot of ML techniques can best be deployed in helping people better ask and answer questions for themselves, helping them better understand. I have been around the block with the idea of, “Let’s just feed in all the data and the machine will tell us everything.” I don’t know if that’s ever gonna work, and I don’t think that’s particularly imminent. And so at Hex, I think the way we look at a lot of this is: there are a lot of unsolved problems in analytics that we’re focused on, and there are a lot of opportunities to make ML workflows easier and make it easier to bring ML into analytics workflows. And that’s what’s exciting to us.

Eric Dodds 14:20
Yeah. I have another question on that, but I will use self-control. I want to dig in on one more thing, and then hand the mic over to Kostas. Could you dig in a little bit on collaboration? Because it feels like a marketing term that’s been used with analytics since the beginning of analytics, where it’s like, “Finally, collaborate,” whatever. And in reality, I think for anyone who’s even using relatively modern BI tools, it really still is a data producer, and then there’s just someone downstream consuming it. It’s pretty hard to collaborate, practically, at least in my experience. So could you tell us, what does collaboration actually mean for you at Hex? Does that take on a different form?

Barry McCardel 15:10
Yeah, we have two big collaborative loops that we think about and focus on. So there’s collaboration between two creators, or two editors. And that might look like: I’m going to do an exploratory analysis, or I’m taking a first draft of a model, and I want to be able to work with you on that. Hex is fully collaborative and multiplayer, which means it works just like a Google Doc or Figma or Notion or other tools like that, where you and I can both be in there at the same time. But a little secret about multiplayer is that it’s very rare that you and I are both working on the same thing at the same time; that can actually be quite annoying. It’s nice to have when you need it. What we see on the editor side is that it’s really about enabling review workflows. What we’ll see a lot of is doing code reviews, like, hey, I’ll tag you in it, can you go review this? You can comment on something, give feedback on it, you can iterate on something together, maybe you and I are passing the baton back and forth on working on something. There are also a lot of things around versioning, which is a famously difficult problem in analytics. How are we managing version control on this? The joke is always that most people are still doing version control for their analytics by passing around a spreadsheet with “V5 Final” in the title.

Eric Dodds 16:26
It’s so true. I mean, literally, it’s so true.

Barry McCardel 16:30
Or if you’re very modern and you’re a data scientist, you’re passing around the Jupyter notebook with “v5 final” in the file name. So that was one of the first things we really wanted to tackle at Hex. We have a great built-in version control system that allows you to see versions, and you can see the full edit history. It also supports full sync to GitHub. So we have a cleanly diffable file format that can sync to GitHub, and you can manage the whole thing through pull requests. So again, on that creator-editor loop, we think we’re well on our way to nailing what that should really look like for individuals or teams to be able to work together on analytics and data science projects. The other loop, and this was actually really the initial focus of Hex, is the creator-consumer feedback loop. And this is not necessarily new to analytics. You mentioned BI; in the traditional BI world, you can create a dashboard and send it out. What’s really cool for us with Hex is we’re enabling that sort of easy sharing for a much broader set of things. Going back a little bit, the acute pain point that brought us to Hex was we were doing a lot of work in Jupyter notebooks and then we were trying to share it. And it was incredibly frustrating. We were screenshotting charts and putting them in Google Docs. Or we were rendering things as PDFs and sending them around via email. And it was like, what year is it? And so in Hex we make it very easy to go from the notebook-type work you’re doing to publishing it as an interactive data app. It could be something simple and static, like just charts and a narrative around it. And then your consumers, your stakeholders, can comment on it directly, and they can see live data, which is much, much better than throwing screenshots around. Or it can be something much more interactive.
With Hex, it’s very easy to go through and add parameters to your work, and it’s easy to have a lot of customization on how something’s going to be viewed. So we see people build dashboards. We also see people build very complex workflow apps in Hex, because you get the full power of SQL and Python under the hood. And so it’s very easy to take that and publish that. So what you wind up seeing, and what’s really exciting to us, is this change in how people communicate their work to their stakeholders and to the rest of the organization, and the impact that work has. And we fundamentally believe that by making it easier to share things, by making those things more useful and usable, by making them discoverable and easy to organize in the knowledge base, you can meaningfully increase the impact that a data team can have. And that’s kind of our maybe most fundamental mission.

Eric Dodds 19:04
Yeah. Love it. All right, Kostas. I’ve been holding the mic for too long, please.

Kostas Pardalis 19:10
Oh, that’s fine. Those were some amazing questions and answers, so no worries. So Barry, I have a couple of questions about the product itself, but before that, I want to ask you something else. You mentioned at the beginning that you’ve been working on Hex for like two and a half years now. Two and a half years in startup life is a long time. Can you take us through your journey, and also a little bit more about the people and the things that happened from day one, when you decided, “Okay, now we commit to that,” until today? Just to get an idea of what it takes to build something like what Hex is today.

Barry McCardel 20:00
Yeah, yeah. So it was about three years ago exactly that I was sort of in this, “Oh, gosh, why has no one solved this? And what should we do?” It took a few months to go from that to actually quitting our jobs and deciding we were all in on this. I mentioned Glenn; he and I were working together at the time. And for personal reasons, we were both actually going to have to move to California and find new jobs anyway. We were both following our amazing significant others who had dream jobs out here. So it was easier for us to jump off the ledge. Caitlin was gainfully employed, and it took a little more nudging out the door. But the three of us were full-time on it in December 2019. So the three of us were heads-down working for the first few months. And then in March 2020, which was a very eventful month for everyone, we both made our first hire, who was someone we had worked with at Palantir and was the first engineer, and raised the seed round. In those first few months, we had actually already built a functional prototype, and actually already had a few users poking around and using it, and we were well on our way to getting one of them to commit to paying. So that was enough evidence for the folks at Amplify to back us. And then the first year, I would say, really was a very small team. We were just iterating and experimenting and throwing some things away. As you mentioned, it feels like a long time. That first year was very heads-down, just building and trying to figure out exactly what Hex wanted to be. And it’s interesting when you’re building in a space that’s as crowded, and with as much going on, as the data space, because you’re constantly seeing other things pop up and other things happening. And we felt through that whole time that we had a really clear thesis, a set of things that we were excited about that we just didn’t see anyone else doing.
And so with a product this complex, you kind of need that, and you need to stay focused, and you need to have some beliefs: we’re going to go build this. And so that first year was really about getting there. And then 2021 is really when we started bringing it to market. So it was not much more than a year ago, early 2021, that we launched the company, announced our seed, announced the product, and started taking signups for it. And so all that to say that in that two-and-a-half-year span, it’s really only the last year that we’ve really had the product in market, and really only the last, I’d even say, six to nine months that it’s felt like a full, mature expression of what we had hoped it would be. So maybe the message I would say to anyone else who’s thinking of starting something is: in that beginning journey, you may feel like it’s going slow, like it’s taking a while. But if you’re making the right investments, when they start to pay off, they will really start to pay off. And that’s a really good feeling.

Kostas Pardalis 22:49
Absolutely. How many people are in Hex right now?

Barry McCardel 22:53
We’re about 40 people right now. And we’re distributed all over the US and Canada.

Kostas Pardalis 22:57
Oh, nice. So how does it feel from your position? From the three of you at the beginning to being with 40 folks today, how does it feel as a founder?

Barry McCardel 23:11
It feels great. It’s humbling. Daily I’m mindful of the trust that all the stakeholders, our employees, our investors, our customers, have put in our team and me. You try not to feel that too acutely day to day, but it’s a good reminder that the stakes are high. But I’m extremely proud of this team. I’m kind of in disbelief every day that I get to work with such a bizarrely talented group of humans. And the joy and pleasure as a founder is you get to spend your time hiring a bunch of people who are better at things than you. I can say objectively that for everything I do, there are people on the team that are better at those things than me. And now you get to have all these really smart, capable people working with you, and you get to watch them go and do their best work. And I think of my job as really, how do I set up and enable these fantastic people to do their best work? That’s what I try to spend my time doing.

Kostas Pardalis 24:04
Absolutely, I think you’re making a very important observation here. And it’s one of the things that we, the people that have been founders, don’t talk about much: this privilege that you have as a founder, if things go well, to work with all these amazing people. That’s very, very important.

Kostas Pardalis 25:15
Barry, how did you get from this initial experience and the idea that you had, what you described with the frustrations that you were going through as a buyer at that point, trying to find a solution that didn’t exist, to actually ending up with a product that you can sell? Because from the idea moment to building and selling, there’s a huge gap there, right? Many things need to happen. Can you take us a little bit through this journey, so we understand what it takes to do that?

Barry McCardel 25:56
Yeah, I think the core of it is a philosophy that I learned at a previous job that is called commitment engineering, which is the art and science of going from an idea to a product someone’s actually willing to pay for. And the first step is really finding a problem that’s acute and that resonates with people. For me, I just felt this problem myself, and started asking people, “Hey, do you also have this problem?” We started writing blog posts about this problem, specifically the problem of being able to share and communicate the work that we’re doing as data scientists and analysts. You’ll find people who are interested in that problem, and then the loop we went through, and the loop that I think a lot of successful companies will go through, is this commitment engineering loop where you start by talking to someone about their problem. And you basically offer: if I came back in a few weeks with the first version, like a first prototype of this, would you take 30 minutes to run through it with me? And if someone says yes, then you are now in a commitment engineering loop with them. You have asked for a commitment of their time, and you’re in exchange going to go do some engineering for them. And you can kind of ride this loop all the way up to getting them to pay. Like the next step might be, hey, great, thanks for the feedback on the prototype. If I came back in a few weeks with the next version and addressed all of that, would you take 45 minutes and click through it with me and actually use it for real? And you keep asking for commitments. The next one might be, would you invite your team to this thing? Would you show this to your boss? Would you demo it to your whole team? Or whatever it is. And then the last one is, would you pay for this? And so for us, going through that with our early users and customers was extremely effective.
It let us figure out where we were barking up the right tree, like where people were excited to spend more time with us and spend more time with the product or where we were not, which is where people were like, oh, I’m busy. Like, it’s kind of how you know that you’re not doing something that’s actually that exciting to them. And so, that first year especially was really about just trying to find users to be in that commitment engineering loop with and build for them instead of just building in a vacuum.

Kostas Pardalis 28:04
Okay, that’s, that’s, um—

Barry McCardel 28:05
I don’t know if that was that interesting.

Kostas Pardalis 28:07
No, no, no, to me, at least, it’s very interesting, because I have been in the position of starting something from scratch. And I know how hard it is, and how difficult it is to get advice on how to start doing these things. We have models and processes and playbooks for many different things, but this is not the kind of stuff that you can easily find a playbook for. So I think everybody will find this interesting.

Eric Dodds 28:40
Well, I think it also applies internally at an organization, right? Even if you’re not going to start a new company and try to raise money or whatever. If you think about an initiative that you want to take on internally, you have internal customers, right? And I’m just thinking about projects that I’ve thought about trying to start, even in my job now. And I love that mindset of thinking about asking for those commitments as a litmus test.

Barry McCardel 29:07
Yeah, you’re validating that you’re on the right track, that you’re thinking about the right problem, that your product is solving it, that they’re excited enough about it to pay for it. And this solves for the problem that you see a lot of people get into, which is, “I’m going to build this in a little vacuum, I’m going to build this thing for me.” Especially for a founder like me, who had a lot of personal experience and was the user, it’s very tempting to just build the thing you want. And of course, there’s a degree of judgment and intuition and taste that you need to bring to something. But I think a lot of people, whether they’re a founder or a product manager or an engineer, often can be too slow to get into that iteration loop. The other mistake you’ll see people make, especially very early, when you’re looking for early customers or design partners, is they’ll start a relationship saying, “Hey, would you pay for this?” Well, often if you ask that at first, they’re gonna say no, because the product sucks, because it’s early, and all early products suck. It’s not personal, it’s just that your product sucks because you’ve only been working on it for a couple of months. So that’s where I think getting into that iteration loop that’s based on you really understanding their problem, and you asking for those commitments of their time, is how you’re going to build up to actually having a customer in those early days. And I think your point, Eric, about that being applicable for internal stuff as well is really great. A lot of data people are effectively building products, and we see this with Hex: people build data apps they’re shipping, whether it’s for internal use or for clients or whatever. That’s an iterative process, and I think it can serve people really, really well.

Kostas Pardalis 30:50
Barry, let’s talk a little bit more about the product, how it is today. Let’s say I’m just landing on your landing page and I go through the signup process. What do I need to start working with Hex? Or what should I bring with me to do that?

Barry McCardel 31:08
You should bring data. Hex is really useful if you can connect to your underlying data sources. So whether you’re using a cloud data warehouse like Snowflake, or BigQuery, or Redshift, or Databricks, we have connectors for dozens of these different data sources now. And really, from a getting-started perspective, if you can connect to your data, it’s very quick to get going. We have people writing their first queries within a few minutes of first logging in. So yeah, really all you need to bring is your data and a great attitude.

Kostas Pardalis 31:45
Okay, that’s awesome. And you mentioned notebooks. Can you take us a little bit into this? First of all, tell us a few things about notebooks in general, because, I mean, everyone has probably heard of a Jupyter notebook, but it doesn’t mean that everyone has used one, right? So tell us a little bit about the history of notebooks, why they were created and what they are, and what Hex brings that is new, and maybe what it is fixing in what notebooks were doing so far?

Barry McCardel 32:21
Yeah, of course. So notebooks, for the unfamiliar, are effectively a format for coding where the code is broken up into cells. So it’s like chunks, and you can evaluate those cells of code independently of each other, and those cells will show you the output, the results of that. This is a form of what’s called literate programming, which is a term that basically refers to a programming style where the code and the outputs and the narrative, like the explanation of it, are all sort of tied together. Notebooks really excel for workflows that are exploratory or iterative, where being able to run just that one little chunk of code instead of the whole script is really useful: I’m gonna try a different technique for this, I’m gonna try a different binning, I’m gonna cut this a different way. And you can do that and immediately see that output, which is really great for iterating through things you’re working on. But notebooks also have a lot of problems. There’s this sort of famous set of critiques about notebooks: there was a talk that Joel Grus gave at JupyterCon 2018 called “I Don’t Like Notebooks,” which was a bold thing to go do at JupyterCon. Effectively, he was calling out, rightly, a lot of the issues notebooks have, specifically around state. Because the code is broken up into these different cells, you can actually run them out of order. And they’re running through an in-memory kernel, which stores state. So if I have a cell that says x equals one and a cell that says x equals two, if I run them in order, x in memory will be assigned to two. But if I run them out of order, then all of a sudden, maybe it’s one, maybe it’s two. Which cell did I run last? Without going too deep into that, you can wind up in these really weird spots with inconsistent state, and this causes three big problems.
There’s a problem around reproducibility, which is, if I want to go rerun this, am I getting the same results? Notebooks are sort of notoriously difficult to make reproducible. There are problems for interpretability: with notebooks it can often, I think, be very hard to know what’s going on. The code’s broken up into a bunch of different places and maybe it was run out of order, so how does this thing work? I’ve had this going back and looking at notebooks I’ve made myself, months later, like, what is this thing even doing? But it’s also true if you’re trying to collaborate on a notebook: being able to understand what’s going on is really important. And then the last thing is performance. The way that a lot of people wind up solving this is just constantly restarting and running all, like, just run all the code again, top to bottom, basically like a script, which is very high overhead. So those are all the problems around state with notebooks. There are also a couple of other big problems with the way notebooks traditionally work. One is scale: notebooks traditionally run in in-memory kernels, which is great if your data fits in memory, it’s very fast and snappy, but it’s awful if your data is bigger than that. And in this cloud data era, we are in a time when people are storing terabytes of data in cloud data warehouses, and that in-memory model does not really scale very elegantly. And then finally, I think there’s a problem around accessibility. Because as you mentioned, there are a lot of people who have heard of notebooks by now but might not use them. A lot of times, that’s because they’re very hard to use. You have to understand the state and scale issues. You also traditionally had to be able to install a Python environment locally, and then install Jupyter, and do your package management and environment management. And you were trying to roll your own data connections using SQLAlchemy.
And the whole thing was very messy and very difficult to use. You’ll often see a new data scientist start at a company and then have a lost first two weeks just trying to get all this working. So it’s no wonder that there are millions of people working in data every day who haven’t traditionally been able to access these workflows. So that’s sort of the background that we walked into this with. We’ve been longtime notebook users, I’ve used them for years and years, so we were already very familiar with all these problems. And I think at least part of what we came at Hex with was, what if we fix those things? I think a lot of people look at notebooks and they look at all these problems, and they’re like, well, we should get rid of notebooks, everyone should be writing things like scripts or whatever. And we kind of came at it from a different angle. We were like, well, this is solvable, you can build a really good version of this, because this format actually rocks. And so I don’t call Hex a notebook company, but I think at the core is that we built a really amazing experience around this core notebook concept. And there are a few parts of that. So one, I mentioned this earlier, but we made it fully collaborative, online, and hosted, so no more setup: one click to create a new notebook, we manage the environment for you, and it’s very, very easy to get started. It’s very easy to connect to data. We have built-in SQL cells, and we were really kind of the first to do this, where it’s very easy to set up a data connection, write SQL right in your notebook, and immediately visualize your outputs. In Hex, you can actually go back and forth between SQL and Python, or just work in one or the other. So we actually start to open it up to this universe of people who are SQL-first users, who are SQL literate and maybe haven’t necessarily learned Python.

And then around that state issue, we had a pretty big innovation around what we call a fully reactive execution model. That is where each cell in the notebook is treated effectively as a node in a DAG. Folks will be familiar with DAGs from a lot of tools at the ETL and orchestration layers, like Dagster, or Airflow, or dbt, which are all sort of built around the DAG concept. We bring that concept to notebooks, and we say each cell is really just a node, and variables that are referenced between cells are links. So with x equals one, if I’m referencing x in another cell, that’s a link, it’s an edge between those cells. By modeling a notebook this way, and by turning it into something reactive, which means if you modify one node, only the downstream nodes update, you actually get a lot of advantages, and it solves those three problems I mentioned. It is much, much easier to reason about the state, and your state is always in a consistent, clean place, so it’s reproducible. It solves that problem: if you run something, it’s always gonna run the same way. It’s more interpretable: we have a nice DAG UI in Hex where you can actually see the full flow of your logic in your project, which makes it very easy to literally visually see what’s going on. And it’s way more performant: you don’t have to restart the kernel and run all every time, you can just change one cell and only the things that need to be updated will update. So there was a lot of hard engineering work at the back end and the front end that went into this, but the net result is a product that really solves a lot of these problems around notebooks. It’s also just much more interpretable and accessible to a big population of people. It’s very, very interesting that at a lot of our customers, most of the users were not using Jupyter before. These are people who were using SQL scratch pads, or they were trying to do their work in a BI tool or a spreadsheet.
They don’t even have a baseline for understanding how Hex is better than Jupyter. They just know Hex is great and they like it. And so that was really always part of the mission for us, opening up access to these workflows to a bigger group of people. And it’s very cool to see how we’ve been able to bring the great parts about notebooks to like a 10x bigger audience than could have taken advantage of them before.
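
To make the state problem and the reactive model concrete, here is a minimal Python sketch. It is an illustration only and not Hex’s implementation: the `Cell` and `ReactiveNotebook` classes are hypothetical names, dependencies are inferred with the standard-library `ast` module, and the sketch assumes each cell only reads variables defined in earlier cells. The first part reproduces the “x equals one, x equals two” example with a shared in-memory kernel; the second part re-runs only the cells downstream of an edit.

```python
import ast

# Part 1: the hidden-state problem with a shared in-memory "kernel".
kernel = {}
cell_1, cell_2 = "x = 1", "x = 2"
exec(cell_1, kernel)
exec(cell_2, kernel)
in_order = kernel["x"]        # cells run top to bottom, so x is 2
exec(cell_1, kernel)          # re-running the first cell later...
out_of_order = kernel["x"]    # ...silently leaves x at 1


# Part 2: a toy reactive model (hypothetical, for illustration only).
class Cell:
    """One notebook cell: its source plus the names it assigns and reads."""

    def __init__(self, name, source):
        self.name = name
        self.source = source
        names = [n for n in ast.walk(ast.parse(source)) if isinstance(n, ast.Name)]
        self.defines = {n.id for n in names if isinstance(n.ctx, ast.Store)}
        self.reads = {n.id for n in names if isinstance(n.ctx, ast.Load)}


class ReactiveNotebook:
    """Treats cells as DAG nodes; a shared variable is an edge between cells."""

    def __init__(self, cells):
        self.cells = {c.name: c for c in cells}  # dict order = notebook order
        self.state = {}
        self.run_log = []

    def _run(self, name):
        exec(self.cells[name].source, self.state)
        self.run_log.append(name)

    def run_all(self):
        for name in self.cells:
            self._run(name)

    def downstream(self, name):
        """Later cells that directly or transitively read what `name` defines."""
        stale = set(self.cells[name].defines)
        order = list(self.cells)
        affected = []
        for later in order[order.index(name) + 1:]:
            cell = self.cells[later]
            if cell.reads & stale:
                affected.append(later)
                stale |= cell.defines
        return affected

    def edit(self, name, source):
        """Replace a cell, then re-run it and only its downstream cells."""
        self.cells[name] = Cell(name, source)
        self._run(name)
        for later in self.downstream(name):
            self._run(later)


nb = ReactiveNotebook([
    Cell("a", "x = 1"),
    Cell("b", "y = x + 1"),
    Cell("c", "z = 10"),         # independent of x
    Cell("d", "total = y + z"),
])
nb.run_all()
nb.run_log.clear()
nb.edit("a", "x = 5")            # cell "c" is not re-run
print(nb.run_log, nb.state["total"])
```

In this sketch, editing cell `a` re-runs `a`, `b`, and `d` but skips `c`, which is the “only the downstream nodes update” behavior described above. A real system would also need to handle things this toy ignores, such as augmented assignments, imports, and side effects.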

Kostas Pardalis 39:35
Yeah, absolutely. So if I understand correctly, and this was the perception that I had about notebooks, they used to be a tool mainly for data scientists and, let’s say, not a tool for data analysts. Okay, you had a data analyst who was mainly working in a BI tool, and then you had more niche use cases with data scientists and some other people who were using notebooks. And you mentioned that you see a change there: more and more people who didn’t have access to notebooks are using them now through Hex.

So who is like the typical user today? That’s one question that I have and the second question is, how does Hex as an organization compare to more traditional BI tools in general?

Barry McCardel 40:29
Yeah. So it’s really interesting when you mention that notebooks are traditionally just for data scientists. It’s kind of worth asking why. I think there’s this sort of baseline problem that we see in a lot of places, where there are all these tools, and depending on what you’re doing and what language you’re in, you’re jumping between them. You’ve got people who are just working in spreadsheets, you’ve got people who are in no-code BI tools, you have people who are in SQL scratch pads or SQL IDEs, and then you’ve got people over in notebooks. And these are kind of artificial barriers, just based on, do you know a certain language? Or can you get a Python environment working locally? I don’t think there’s anything about the notebook format that makes it uniquely useful only for people who are doing modeling. In fact, I think notebooks are probably even more useful for a lot of exploratory analytics workflows. They just weren’t available to people because they were super high overhead, hard to get started with, and really only worked if you were working primarily in Python, which is very widely known, but not nearly as widely known as SQL in the analytics world. So from the very first, we came at this and looked at it like, a notebook should just be able to be used by a lot more people. And so when you ask about the core users that we see in Hex at any given time: we don’t have job titles for everyone, but when we just empirically talk to users or customers, SQL-first data analysts are a huge, huge part of our user base, maybe most of it. They get a ton of utility out of Hex. And it’s awesome to be able to do SQL work in Hex, because of one amazing thing: you can do SQL on SQL. We call it chained SQL, or in Hex it’s called dataframe SQL, which really just means you can have one SQL query and then another SQL query that queries the results of it.
And for the SQL heads out there, this is effectively what you might do if you’re using CTEs, like the WITH ... AS statement. In Hex, instead of writing a three-page-long CTE, you can break this up into cells, you can actually see the results of each step, and you can have them chained together. So it’s much more elegant, much more powerful. And so there are all sorts of great things we’ve been able to bring to that sort of analytics workflow that have nothing to do with ML modeling, or what I think traditionally got labeled as, quote unquote, data science. And so that’s very, very cool to see. On the other hand, for people who are Jupyter users, people who are Pythonistas who spend their day building models in Jupyter, Hex is familiar and powerful, and has all the best things about notebooks, and we fixed a lot of the problems. So the way I really think of the product is as this low-floor, high-ceiling product that should be accessible to a much bigger population of people, but not artificially constrain you, which is what you’ve seen in the last generation of products, and why you have five different tools and you’re jumping between them: each is either low floor, low ceiling or high floor, high ceiling. We really think that people should be able to collaborate between these personas and these workflows in a much more seamless way. So to the second part of your question, about where it fits in: at most of our customers, we are deployed alongside, and are very complementary to, the traditional BI tool like Tableau or Looker. These are products that have, I think, a pretty specific and well-understood mission around, like, I want to build some point-and-click dashboards that people can go and look at once a week, or maybe there’s a bigger population of, quote unquote, non-technical people, non-data people, who want to be able to go point and click and look at some metrics. Those workflows are important. They have their place.
But if you go talk to most folks who are data analysts or data scientists, or even just the much broader population of people who are data literate, they’re not actually spending their days in those BI tools. They’re spending their days in notebooks or SQL IDEs or, in many cases, spreadsheets, which are not that good for this, but they’re dumping data out. Because BI tools are really not built for deep exploratory workflows. They’re not built for the type of flexible, off-roading analysis that a lot of people are looking to do. And they’re certainly not built for what you would traditionally think of as data science. And so Hex really fills a big gap alongside these. Now, what we see at most of our customers is that there are a lot of workflows that used to wind up in BI, kind of shoehorned into BI, that now can live much more natively in Hex. So I don’t think of us as competing with traditional BI, but we do wind up having a bunch of workflows move over, which in some ways takes some pressure off those tools to be everything to everybody. One thing you’ll see a lot of people do is use Tableau or whatever because it’s the only way they can build a chart that they can share and give to other people to visualize. So they’re really contorting their workflows, getting data into a chart in Tableau just so they can have a UI, when Hex is a much better, native way to do a lot of the things most people are trying to do. So at least today, they’re very complementary.
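
The “SQL on SQL” idea from this answer can be sketched in plain Python with the standard-library `sqlite3` module. This is an illustration under stated assumptions and not how Hex implements chained SQL: the `orders` table, its rows, and the `cell_1` temp table are made up, and materializing a step as a temp table merely stands in for a cell whose result a later cell can query. The same logic is written once as a single CTE and once as two chained steps with an inspectable intermediate result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ada", 40.0), ("ada", 70.0), ("bob", 15.0), ("cy", 120.0)],
)

# Style 1: one query, with the intermediate step hidden inside a CTE.
cte_rows = conn.execute(
    """
    WITH totals AS (
        SELECT customer, SUM(amount) AS total
        FROM orders
        GROUP BY customer
    )
    SELECT customer, total FROM totals WHERE total > 100 ORDER BY customer
    """
).fetchall()

# Style 2: the same logic as two chained "cells".
# Cell 1 materializes its result so it can be inspected on its own.
conn.execute(
    """
    CREATE TEMP TABLE cell_1 AS
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    """
)
cell_1_rows = conn.execute("SELECT * FROM cell_1 ORDER BY customer").fetchall()

# Cell 2 queries the result of cell 1 instead of repeating its SQL.
cell_2_rows = conn.execute(
    "SELECT customer, total FROM cell_1 WHERE total > 100 ORDER BY customer"
).fetchall()

print(cell_1_rows)
print(cell_2_rows)
```

Both styles return the same final rows; the chained version simply exposes the per-customer totals as a result you can look at before writing the next step.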

Kostas Pardalis 45:17
Okay, that’s super interesting. I’d love to hear from you also, what do you think about the future? Because in the BI market, after the acquisition of Looker, we haven’t seen that many things happening. There was consolidation happening. We had Sisense and Periscope Data merging, which was interesting, also because you had a tool, Periscope Data, primarily used by more, let’s say, data science people, merging into the world of BI.

Barry McCardel 45:52
We have a lot of ex-Periscope customers and team members now at Hex. So yeah, very familiar.

Kostas Pardalis 46:00
Yeah, yeah. So how do you see the future? And do you feel like Hex is part of a new wave of innovation?

Barry McCardel 46:07
Absolutely. We see ourselves as very, very much part of that. I wonder sometimes how useful the term BI even is. It’s either extremely broad and encompasses everything that’s a chart, or you can define it a little more narrowly, as referring to a class of reporting and dashboard tools like Tableau and Looker. Either way, I think we’re coming into a very dynamic new season with a lot of upheaval. And there are a couple of really interesting trends here. One, I think you’re just seeing a much, much broader set of people who are what I would call analytically technical or data literate, where they can think about data in more sophisticated ways. They can reason about tables and relationships between things. In most cases, they can actually write SQL. And we see this not just with, quote unquote, data people: we have a lot of Hex users who are PMs, or marketers, or salespeople. We see all sorts of different personas using Hex. So this population is really growing quite fast, and these are people for whom a lot of the work they want to do just does not fit well in that traditional BI paradigm. Second, there are these bigger secular trends we’ve seen, one being the advent of cloud data warehousing, allowing data to be available at these bigger scales. I think there’s a new set of assumptions around what analytics tools that fully embrace that look like. And then, depending on who you’re talking to, it’s called the metrics layer or the semantic layer or something, but there’s almost a disaggregation of BI, an unbundling of BI, that seems to be happening, where BI has traditionally included both the visualization layer and the metrics and modeling layer, and those are sort of coming apart.
And there are companies like Transform and dbt and others that are really looking to have that metrics layer either be standalone or integrated into the bigger data transformation pipeline. I think that’s gonna actually have a lot of big downstream effects. We partner very closely with the folks at dbt and the folks at Transform, and Hex has integrations with both of them. As you see this part of the stack come to fruition, I think we’re gonna see a lot of interesting things happen. And candidly, I think it has implications for our product, where you can imagine, in a Hex project, in a Hex notebook, effectively being able to have the first cell, instead of it needing to be a SQL query or pulling some data in via Python, be a metric cell, where you’re able to pull in data from one of these sources and start working with it downstream. At that point, what is the line between that and BI? What does this mean for BI, and for how these different layers of the stack are configured? I think there are a lot of interesting questions, and we have a lot of ideas on where this goes. And so we’re very excited about the next chapter of this, and we’re very focused on a bunch of the things we want to go build around that over the next few months.

Kostas Pardalis 49:02
Awesome. And I’m looking forward to seeing what you’re going to be releasing in the next couple of months.

All right, I have monopolized the conversation, so I think I need to give some time to Eric, because I see he has some questions he wants to ask. So Eric, all yours.

Eric Dodds 49:20
Well, I’m controlling the recording so we can go long, which is always super exciting when Brooks is away.

I’d like to cover one last subject. So Barry, we’ve been talking about analytics and ML as sort of two distinct, separate workflows, separate teams even. In many ways, you were talking about Hex users, right, and saying a lot of your users had never even used notebooks before, maybe most of them at this point, which is super interesting. And I agree: if you think about the discipline of analytics and building ML models, they are distinct, right? There’s overlap, but the workflows and a number of other things are different. But how much crossover are you seeing? If you think about building an ML model, a huge part of that workflow is getting the data right in order to be able to actually build a model. You sort of have to have that as a starting point. And if you think about traditional BI, you’re sort of prepping data to show up in a dashboard, and Looker and Tableau are great, right? People can click around and learn what they want to learn. But the way that you’re talking about people building analytics workflows, it’s bleeding over heavily into, okay, you’re basically halfway to doing the workflow that’s required for model building as well. So it was really interesting for me to hear that a lot of users hadn’t used notebooks. So two questions. One, do you think that the Venn diagram will have increasing overlap over time? And then two, do you think that’s a good thing? Is that something that you want, or are even building into Hex?

Barry McCardel 51:29
Yeah, so wow, that’s an awesome question. I’ve got a general philosophy here. I do think there are two distinct parts of what we talk about as the data world writ large: analytics, and then ML engineering. With analytics, your deliverable, your end result, your job to be done, is that you want to influence a decision. You’re going to try to answer some question, and you want the answer to that question to influence a decision. Or, in a bigger-picture way, we say you’re trying to contribute to knowledge: how does your organization know what’s true, and how can it make decisions based on that? That is kind of fundamentally the story of analytics. And it’s important, and it’s big, and there are a lot of people who do that every day. And I think that’s just going to become more part of the firmament of how everyone does their job. On the other side, you see ML engineering, and the deliverable there is a trained model. It’s a prediction; often it’s an endpoint. You see a lot of ML models being developed to run online, where it’s a real-time fraud prediction score or something like that. That is a very different world. It’s a very different toolchain. And there’s a lot of really interesting stuff over on that side of the camp around model training, hyperparameter tuning, MLOps and deployment, and monitoring and understanding drift and scaling models out. There’s all sorts of really interesting stuff over there. Now clearly, these two camps share some firmament. In both cases, you’re using data, and you’re probably sharing data infrastructure. Maybe the data in both cases is coming out of a warehouse, maybe you’re using something like dbt to do the data prep and transformation for both of them. But I really do feel like these are separate workflows.
And one interesting point here is that this title, data scientist, I think has often been conflated, or maybe it spans both of these. But if you talk to 10 data scientists, like seven of them are really doing analytics. They might be doing very statistically rigorous analytics, they might be bringing interesting predictive techniques into analytics, but fundamentally, they’re there to help influence a decision. And I think the people who are doing more of the ML engineering are actually just starting to call themselves ML engineers, and their workflows wind up looking a lot more like a software engineer’s workflows. So this is sort of a macro theme of where I see this going. Now, it is true that in analytics, I think there are really interesting opportunities to bring ML techniques into the analytics. But that doesn’t mean that it’s converging with ML engineering. And I think, as you were alluding to, in an ML engineering workflow the earliest parts are often doing some analytics to understand the data, whether it’s just understanding a problem that exists that you want to solve with your model, or doing data prep. So we do see a lot of ML engineers use Hex in that phase of their work. But it has not been a focus of ours at Hex to go build a full-stack ML platform. There are a bunch of really great tools, whether it’s MLflow or Weights & Biases or all sorts of others, that are really built around that. And so these worlds have some connectivity, but I think it is important to understand where you’re focused and, as a product, where you want to succeed. We are all in on analytics. We think that it’s a huge market. We think it has a lot of unsolved problems, and I’m very proud of having built a product that is really accelerating and improving the analytics workflows of thousands of people every day now.

Eric Dodds 55:06
Yeah, I think that’s a super helpful distinction between sort of statistically rigorous analytics. Yeah. And I would say maybe like, another way to rephrase what you were saying is like, actually delivering a model or like the results of the model as an experience, right? Because in order to actually take that and like, deliver it as part of whatever it’s, like, a recommendation on a website or whatever. Like, it is software engineering, right? Like you’re, you’re having to like deliver like a literally, there’s like a development lifecycle, and a lot of software infrastructure required to actually take that and—

Barry McCardel 55:48
Yeah, these people are typically using IDEs, they’re deploying their code through an SDLC, they’re running CI/CD. There’s a whole world in MLOps now around deployment and monitoring, and it’s got its own similar and parallel world to DevOps. Yep. Now, I do think there are some really interesting opportunities to bring software engineering best practices to analytics. And I think we’ve seen this as a sort of movement toward a lot of things being defined as code. I think one of the best parts of dbt, in my estimation, is that you manage everything through pull requests, and it’s all code, and there are a lot of great things there. But that’s separate, to me, from the story of how ML engineering is really becoming much more of a software engineering discipline.

Eric Dodds 56:31
Super helpful distinction. Okay, last question for you, and this will be a little bit unfair. I love doing this. Outside of signing up for Hex and making a lot of the analytics workflows easier, for our listeners who work on analytics or on data engineering workflows, or on data teams that are part of analytics workflows, if you could give them one piece of advice, maybe especially the ones who are earlier in their career? As a practitioner who’s now building a product and serving people, you have a unique perspective on that. And maybe you could give a couple of pieces of advice, because just one is kind of tough.

Barry McCardel 57:14
Yeah, well, I actually do have one big piece of advice that I keep coming back to. I actually think a lot about how data teams can be more impactful. Companies are investing a lot of money in their data teams, and they want to be able to get some impact out of that, they want to make sure it’s moving the needle. You’ll see this show up a lot, where people are asking, what’s the ROI of a data team? What’s the ROI? As if that’s something that can just be penciled out. And you’ll get into these weird exercises where you’re like, well, we built this model, and built these five dashboards, and we think they helped do this thing 10% better, and it’s like—

Eric Dodds 57:52
You need the data scientist to answer the question about ROI of the data scientist.

Barry McCardel 57:59
And really, I think this is kind of... I understand why it happens, but it’s kind of silly. And so my big piece of advice to people who are on data teams, or starting data teams, or running data teams, is that the way you’re actually going to feel that impact is if your data team is embedded and aligned with the actual functional people in the business. I think the last thing you want to do is set up an ivory tower, where the data people just sit together, they only talk to themselves, they’re off doing sort of R&D, and the results and the things they’re building aren’t necessarily influencing decisions. And so it’s, we built this model, and it has this prediction, and what was the impact of it? Well, it’ll be this. You get that a lot less with embedded folks. Personally, I think it is kind of an org chart thing: folks on data teams should be really closely aligned with the teams they support, they should be planning their work with them, and they should be embedded with those teams. This even gets down to how you’re setting things like OKRs and goals. I kind of don’t believe that data teams should have big sets of their own OKRs. I think that individuals on data teams should be accountable for OKRs, and sharing OKRs, with folks on marketing, or ops, or product, or sales. And I think if you do that, when you’re asked about the ROI of your data team, you as the head of data, or the data scientist or data practitioner, don’t have to go and try to pencil it out. You can redirect that person to those stakeholders. Because if you’re doing your job well, the VP of marketing or the product manager or the head of ops should be the one standing up and saying, oh, no, here’s why we couldn’t have done this without Amanda embedded with us. This was a huge part of our ability to go and solve this. In fact, you should have those stakeholders advocating for you to have more headcount. And so I think that’s really important.
And I think the last thing, when people are doing a lot of data work, is to ask: how is this actually going to get used? Is this actually going to move the needle? Is the work that I do closely aligned with the needs of the business? It’s such an important thing, and I would encourage people to really stay focused on that. Hex has some small part in that: I think we help make that data work more useful and usable and easier to share, and I think we help influence that. But I think it really needs to start with how you’re thinking about the role of your data team and how it’s organized within the company.

Eric Dodds 1:00:20
Love it. Amazing advice. All right. Well, we are over time. Sorry, Brooks. But Barry, this has been an incredible conversation.

Barry McCardel 1:00:30
It was my pleasure. Thanks for having me on. I really enjoy the show, and I hope folks enjoy the episode.

Eric Dodds 1:00:36
Kostas, my big takeaway— Well, there’s so many actually. I’m going to actually—because I’ve already broken so many rules with Brooks gone—I’m going to actually have some self-control here and only have one takeaway. So the commentary around the distinctions between work in analytics and work in ML was really helpful. Even though we didn’t talk about that for a super long time, I thought it was really helpful how he pointed out that in many ways, I think he said seven out of 10 data scientists, if you talk to them about what they do, you can really roll a lot of that into analytics work. It may even be predictive analytics, but it really sort of falls on the analytics side of the house. And that was just very helpful. As you look out at the landscape and job titles and all the gray areas and crossover, I just thought that was a really, really helpful perspective. How about you?

Kostas Pardalis 1:01:36
I think it’s pretty hard for me to come up with just one thing to keep from this conversation. Overall, it was a great conversation. We talked about, first of all, the advice around building a product at an early stage. That was great. We talked a lot about notebooks. Hopefully more and more people out there will hear about them and give them a try. It is a very interesting, let’s say, computation model, and companies like Hex really innovate on them and make them more accessible, so we should be able to consume data in a more exploratory and iterative way, right? And that’s great. That’s something that is missing from the BI tools out there. And then there was the conversation that we had around BI and the next wave of innovation there. So yeah, I’m really looking forward to having Barry again on another episode in a couple of months and getting even deeper into these questions, and more about visualization and data platforms, notebooks and beyond.

Eric Dodds 1:02:48
Absolutely. All right. Well, thanks for joining us again on The Data Stack Show and we will catch you on the next one.

We hope you enjoyed this episode of The Data Stack Show. Be sure to subscribe on your favorite podcast app to get notified about new episodes every week. We’d also love your feedback. You can email me, Eric Dodds, at eric@datastackshow.com. That’s E-R-I-C at datastackshow.com. The show is brought to you by RudderStack, the CDP for developers. Learn how to build a CDP on your data warehouse at RudderStack.com.