Episode 197:

Deep Dive: How to Build AI Features and Why It Is So Dang Hard with Barry McCardel of Hex

July 10, 2024

This week on The Data Stack Show, Eric and John chat with Barry McCardel from Hex. They delve into the technical, business, and human challenges of data work, emphasizing AI and data collaboration. The discussion also covers Hex’s product updates, the complexities of building AI features, and AI’s impact on data teams. The group explores the unpredictability of AI, the need for extensive evaluation, and the iterative process of refining AI models. The conversation wraps up by touching on industry consolidation, vendor competition, the dynamics of cloud platforms and open source technology, and more. 

Notes:

Highlights from this week’s conversation include:

  • Overview of Hex and its Purpose (0:51)
  • Discussion on AI and Data Collaboration (1:42)
  • Product Updates in Hex (2:14)
  • Challenges of Building AI Features (13:29)
  • Magic Features and AI Context (15:22)
  • Chatbots and UI (17:31)
  • Benchmarking AI Models (19:06)
  • AI as a Judge Pattern (23:32)
  • Challenges in AI Development (25:31)
  • AI in Production and Product Integration (28:43)
  • Difficulties in AI Feature Prediction (33:38)
  • Deterministic template selection and AI model uncertainty (36:21)
  • Infrastructure for AI experimentation and evaluation (40:11)
  • Consolidation and competition in the data stack industry (42:27)
  • Data gravity, integration, and market dynamics (47:12)
  • Enterprise adoption and the bundling and unbundling of platforms (51:03)
  • Open source databases and the middle ground (53:18)
  • Building successful open source businesses (57:00)
  • A fun approach to the product launch video (1:01:14)
  • Final thoughts and takeaways (01:03:15)

 

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Transcription:

Eric Dodds 00:05
Welcome to The Data Stack Show.

John Wessel 00:07
The Data Stack Show is a podcast where we talk about the technical, business, and human challenges involved in data work.

Eric Dodds 00:13
Join our casual conversations with innovators and data professionals to learn about new data technologies and how data teams are run at top companies. We are here with one of my favorite guests we've had on, Barry McCardel from Hex. Barry, it has been a while since you've been on the show. How long ago were you on?

Barry McCardel 00:38
A couple years ago, maybe? Yeah,

Eric Dodds 00:39
That's crazy. It's been way too long. Well, for those who didn't catch the first episode, give us a quick overview of who you are and what Hex is.

Barry McCardel 00:50
Well, thanks for having me back on. Hex is a collaborative platform for data science and analytics. We built it, honestly, as an incredibly selfish company; really, we just built the product we always wished we had. I've been a builder and user of data tools my whole career, and the longest stint was at Palantir, where I met my co-founders and a bunch of our team and got to dig into lots of different data solutions for a lot of different data problems. And, you know, Hex is built to be this product, we kind of call it this multimodal product, that is able to bring together teams and workflows and unite them in one place around being able to work with data in a much more flexible, fast, and integrated way than the tools we had all struggled with before. We'll go deeper into that, but that's the high level.

John Wessel 01:41
So one of the topics I'm excited about talking about is AI and BI together. So let's make sure we get to that topic. What other topics do you want to cover?

Barry McCardel 01:50
We could talk about a ton of things. I think this is a very interesting time in the data stack. As we're recording this, the Snowflake and Databricks conferences have just happened over the last few weeks, so it was very interesting being there, and a good chance to check in on where data is going. I think the AI topics are super interesting and rich. Tons we could cover.

Eric Dodds 02:10
yeah. Awesome.

John Wessel 02:12
Well, let’s do it. Yeah, sounds good.

Eric Dodds 02:13
Okay, Barry, there's, I mean, a million things I want to cover. But can you give me, let's just say, I didn't even look this up, shame on me, I didn't even look at when we recorded our last episode, but let's just say it was a year and a half ago, which is probably directionally accurate. What are the biggest product updates in Hex in that time period? Can you rapid fire us? Wow, you guys ship so often that that's a very unfair question.

John Wessel 02:47
So you at least get a chance to pull up the release notes.

Barry McCardel 02:51
No, it's good. I mean, you know, I've got it all in my head somewhere. Yeah, look, I think longitudinally, we started very deliberately wanting to build a fast, flexible product for data work that filled this gap we felt. You go into most companies, and most companies at a certain size will have a sort of BI tool. It's very dashboard centric. And that's all well and good. But on every data team I've been on or been exposed to, like 80 to 90% of the data work was happening outside of that. It's in some menagerie of random SQL snippets, Python notebooks, spreadsheets, screenshots of charts, and PDFs of decks sent in emails as the way to communicate. And I just felt that pain so viscerally. I started Hex to help solve that. The long term vision was to increasingly unify and integrate a lot of those workflows. The way we've talked about it internally, we want to build that front end of the data stack, where you don't have to jump around between tools. As an individual, you can bring workflows together, or your workflow can be more sensible because it's together; it makes it easier to collaborate as a team and then expose this to more people in the org. Since we talked a year and a half ago, I think we're pretty far down that path, at least in terms of the first era of it. I think people thought of Hex, and maybe still largely do think of Hex, as a notebook product, almost like a super flexible, fast product for that. But we've expanded, we've grown, and there's a bunch of things. About a year and a half ago, we introduced our AI assist features; I think we were pretty early to that. We'll dig in a ton on how we've done that and where we think that's going, but that's been awesome. Those are our Magic features, and that was a huge thing. We'll go deeper on that. We've built a whole suite of no code tools in Hex. I mentioned this word earlier, multimodal. It's kind of a buzzword in the AI space right now, but it really just means being able to mix these different modalities together. In Hex, for a long time you've been able to mix Python and SQL. That maybe sounds prosaic, but it's actually pretty profound for a lot of people's workflows, and we were among the first to bring that together really fluidly. Since then we've integrated more, so we have a whole suite of no code cells, like charts, pivots, filters, writeback cells. And a few weeks ago, we launched a bunch of stuff that effectively brings spreadsheet functions into our table cells, so you can actually do a full end to end workflow in Hex now, fully no code. We've built a bunch of tools for governance, whether it's data governance or reviews, like git-style reviews on projects, endorsements. I don't know, it's a long list. But effectively, the focus has been: how do we expand the number of people who can participate in these workflows while continuing to make the product more and more powerful for that core data team that's really at the center of the decisions companies are making?

Eric Dodds 05:57
Love it. Yeah, I had this thought when you were saying screenshots of dashboards. I was like, how often is someone literally Slacking a screenshot? Like, someone's main source of data is literally just...

Barry McCardel 06:15
It's one of those things where, actually, I wonder... I think the majority of data is probably consumed via static screenshots. If you just back up and think about charts in decks, charts in Slack, charts pasted in email, it's got to be. In some ways that's a very old problem, and in some ways it's eminently unsolved. Yeah, totally.

John Wessel 06:37
But I think there's actually a reason behind that. Part of the reason is, say it's for an executive, they want a signature behind it, right? They want it certified as correct, per X analyst, as of this date. They want some kind of, yeah, yeah, that's...

Barry McCardel 06:54
I think it's actually a really interesting segue into some of the AI stuff, because you just said they want a signature from an analyst on it, right? It's something I've thought a lot about, like how AI is gonna show up. I think there's a really reductive take of, oh my gosh, I just saw GPT for the first time, and I saw that it can write SQL, so it's gonna be an AI data analyst. There are a bunch of reasons that's not going to happen, or at least not going to happen overnight. But my favorite sort of weird reason is culpability. I live in San Francisco, and there are self driving cars, Waymos, all over the place. I take them, I love them. But it's very interesting to me that they still live in this regulatory purgatory. Every day, all across America, legions of eager, distractible 16 year olds get driver's licenses. Meanwhile, for self driving cars, it's a multi year slog to get them approved, even though any objective observer would look and say a Waymo is less likely to hit and kill someone than a 16 year old, right? I think it's actually like a societal justice thing. We know what to do with a 16 year old who hits and kills someone; some version of our justice system is set up around that. What do you do when a self driving car, inevitably, if you drive enough miles, gets into an accident? What do you do with a self driving car? Do you put it in, like, a robot prison? And I bring it back to data. I think about this with AI data analysts, AI data scientists; there are certainly a lot of companies that market themselves that way. If I'm like, hey, should we increase prices, and I go do an analysis on that, and I ask the AI to analyze a bunch of data and I get charts back, who's standing behind that analysis if the price increase doesn't work, or whatever the decision is? With a human analyst or a human data scientist, someone's credibility is on the line, and there's a reporting chain of people whose credibility has earned the right to do that. With AI, what do you do, fire the bot? I guess you turn off whatever AI software it was. It's just a funny thing. And I think our society is actually going to have to learn how to develop this tolerance, or a way to handle the defect rate, the inevitable stochasticity of these AI systems, in a way we're just not well equipped to do today.

Eric Dodds 09:24
So, AI capability in the context of data and analytics: if you have an end user who likes engaging with data, who is not an analyst, right? They don't know SQL, they don't know Python. The mythical business user. Exactly, yes. This mythical business user at this mythical company where everyone uses data to drive every decision they're making. But do you see that creating a lot of extra work for data teams at companies where you implement this? Because maybe these things aren't even fully there, as much as the marketing would lead you to believe.

Barry McCardel 10:04
The marketing is selling capabilities. I went on this website and it says there's an AI data scientist, and you're telling me there's not?

Eric Dodds 10:16
Like, is it just going to create more work for data teams, or is it...? I don't know, does that make sense as a question? Because for some stuff, if it's fuzzy, who cares, right? It's directionally accurate. But hallucination can be catastrophic for certain critical stuff.

Barry McCardel 10:33
Well, I think there's a few things. In some ways, this is a really new topic with AI. In some ways, it's a very old topic of self serve generally as a pattern in the analytics market, which is: if you set a bunch of people loose, there's going to be a defect rate. It's going to be, well, you didn't use that metric correctly. And that's where people talk about semantic layers and governance, and we've certainly done a bunch there, and there's more we're going to do. But it boils down to effectively the same problem with AI, which is that AI will also sometimes misuse the data, and it's a question of how much context you can give it. Whether it's semantic layers or data catalogs or whatever, the pitch of those products pre-AI was, this is a place where you can put a bunch of context and guardrails around how humans use this data. I think it's effectively the same pitch with AI: how do you do that? It's why we've built a bunch of those tools into our Magic AI features in Hex. You can automatically sync your dbt docs or your column descriptions from Snowflake or BigQuery, or you can enrich it within Hex itself, to give context to the model. But does that mean it's 100% accurate? No. We do our own evals, our own experimentation, our own internal stuff. It does dramatically increase the level of accuracy, but it isn't perfect. And so the question you're asking is, when those defects happen, does that create work for the data team? There's maybe some version of equilibrium, where you're saving them a bunch of time answering routine questions, but then they're having to deal with those defects. The way I see it, the best version of self serve of any type, whether it's AI augmented or not, and I really think this is in some ways very profound and in other ways pretty incremental, is that you're not replacing the data team, you're taking a bunch of the low level stuff out of their queue so they can focus on the really intensive, deep dive stuff. And in many ways, you know, that's self serving; of course I'm the CEO of a company that builds a product I'm going to sell to people. But there's work data teams really want to do, which is the deep dive work, and we want to be the best way for data teams to get to complex answers. And I think it's an interesting question: does self serve, AI augmented or not, give them more time for that? Or does it just create more work to govern all of it? I actually think there's no one right answer. It comes down to a lot of things: the way teams structure stuff, the style they want, what their level of comfort is with people coming up with their own answers. Different people have different opinions. I don't think there's one right or wrong.

Eric Dodds 13:07
Okay, John, I know you have a ton of questions. But can we get down and dirty and talk about how you built your Magic features, and maybe other stuff you've built? Because John and I have actually prototyped a bunch of different LLM applications, AI apps if you want to call them that, and it's just way more difficult than the startup marketing would lead you to believe.

John Wessel 13:36
We talked before the show about this typical workflow of, hey, I got this working in development, now let's productionize it. And with AI, I'd say it's orders of magnitude more between "this kind of works in a demo" and productionizing it.

Eric Dodds 13:51
And for more context: this morning, I knew we had a podcast recording, but I forgot that it was with you. Coincidentally, I literally pulled up Hex, and I was just doing some experimentation on our own RudderStack data, with the intention of using Magic and playing around with it. And it was great. Yesterday I had a target table, I had a set of questions, and I was like, okay, I'm gonna go fight with this AI thing, you know, man versus machine. And it was great. I mean, down to the fifth decimal point on one thing. I was like, this is awesome, it was really cool. But I also know that is non trivial from a product standpoint. We haven't talked too much about this, John, but when you use it, it's so simple that it's almost an unhealthy level of obfuscation of what's actually happening under the hood to make it work. Which is part of the magic, I guess, not to be too cheesy.

Barry McCardel 15:04
Yeah, that's right. Well, again, thanks for saying that; we've put a lot of work into it. So yeah, we should have a very practical conversation. I think the big thing over the last couple of years is that these models have been working really well on a bunch of things, and it's easy to look at them and set up a quick demo with an OpenAI API, but actually building a production AI system is a much different thing, and it's turning out that it's really well suited to certain tasks and not others. There's a bunch of stuff here. The pattern we're using under the hood is something I assume a lot of people are probably familiar with, which is retrieval augmented generation. As far as we can tell, 98%, you know, some really high number of AI apps today are using RAG, as opposed to going and spending a bunch of time on fine tuning. And under the hood, we're using base models from OpenAI and Anthropic, and we've built it to be pretty modular, model wise. The magic really is in the context you provide. So it's one of those right place, right time things; I didn't have gen AI on my bingo card when we started the product, but it turns out that the product format of Hex is actually pretty sweet for this, which is that notebook UI. That's the core UI we built Hex around; we were big believers in it. We don't think of the product just as a notebook, but the UI pattern works really well, and it works really well for AI, because you have a bunch of upstream context. Yes, you can just start de novo using AI to generate stuff, and that's great, and the context we use there is context we have about your schemas and column names. But we can also go look at things like all the column descriptions, and increasingly we're tapping into other queries you've written and trying to pull in more context based on your actual workspace. And as you're actually working through a project, it's a tremendous opportunity for us to pull in all of that upstream context. If you've been creating a certain table already, or you use certain columns, or you join things in a certain way, that's all context we can provide the model. And there are some really nice UI things we've done as well. And that is the other hard part about AI: getting the UI right. I think a lot of us are in this very chatbot centric phase of UI design right now, where the default way to build an AI product is basically copy paste ChatGPT. I am a believer, and pretty insistent, that all of SaaS is not just going to converge to chatbots in terms of the UI.
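
To ground the retrieval augmented generation pattern Barry describes, here is a minimal sketch in Python of assembling schema metadata, column descriptions, and upstream cells into a single prompt for a text-to-SQL call. The table metadata, prompt wording, and call_model stub are illustrative assumptions, not Hex's actual implementation.

    # Minimal sketch of RAG-style context assembly for text-to-SQL.
    # The metadata format and call_model() are illustrative, not Hex's code.

    def build_sql_prompt(question, tables, prior_cells):
        """Fold retrieved context (schemas, column descriptions, upstream
        cells) into one prompt for the model."""
        lines = ["You are a data analyst. Write a SQL query to answer the question.",
                 "", "Available tables:"]
        for table, columns in tables.items():
            lines.append(f"- {table}")
            for col, desc in columns.items():
                lines.append(f"    {col}: {desc}")
        if prior_cells:
            lines += ["", "Upstream cells already in this project:"]
            lines += [f"    {cell}" for cell in prior_cells]
        lines += ["", f"Question: {question}", "Respond with SQL only."]
        return "\n".join(lines)

    def call_model(prompt):
        # Placeholder for an OpenAI/Anthropic API call; swap in a real client here.
        raise NotImplementedError

    prompt = build_sql_prompt(
        question="Weekly active users for the last 8 weeks",
        tables={"events": {"user_id": "ID of the acting user",
                           "event_ts": "UTC timestamp of the event"}},
        prior_cells=["SELECT * FROM events WHERE event_ts >= '2024-01-01'"],
    )
    print(prompt)  # sql = call_model(prompt) once a real client is wired in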

Eric Dodds 17:42
They had it right with just their single field, right? Yeah.

John Wessel 17:46
It all comes back to that, right? Yeah.

Barry McCardel 17:48
I don't know if you guys caught the Apple Intelligence stuff, but I think it's actually really interesting: Apple has done a ton of gen AI work and none of it is chat. There are other, thoughtful ways to incorporate this into your product. Anyway, even things like being able to @ mention tables or data frames, there's a bunch of stuff we do because it helps give context and also helps give confidence to the user, like, I can instruct this thing. And one of the interesting things about designing the UX for AI is that it's not just about an intuitive UX. There's a really subtle thing of helping the user form a world model of what the AI knows. That sounds so anthropomorphized, but I think when you're using an AI product, a lot of times you are, in the back of your mind, thinking: what instruction can I give this? What does it know about? And I think being able to expose more of that, you mentioned obfuscating it, Eric, I actually think we want to expose more of that to users, to help fill out that world model of what this model knows. What can it do? What can't it? How might you want to modify your instruction to get the type of response you want? That's all really hard from a prompt perspective. It's also, I think, really tough to get right, a lot of hard work, from a UX perspective.

John Wessel 19:05
So I have a question. How do you benchmark? All these models are changing all the time. So say you're like, alright, we want to use the latest Claude model. How do you benchmark between models? I feel like that's a pretty difficult problem.

Barry McCardel 19:20
That's an incredibly difficult problem. And I'm glad you brought that exact example up, because we're literally doing that now, testing out the newest Claude model. Yeah, 3.5, 3.5 Sonnet. You read the announcements, the blog posts about these models, and they all sound like they're going to be God's gift to you, you know, the benchmarks and grades and all that stuff. You have to benchmark it yourself. And there's a term for this called evals, which is kind of just a very specialized form of tests. We've built a pretty extensive eval harness. There's open source stuff, like the Spider SQL benchmark, which is an open source set of benchmarks around text to SQL, and then there are a lot of our own evals we've written as well. And for us, our Magic tools don't just generate SQL; they generate Python code, they generate charts, they generate chains of cells, where you ask a certain question and you want to get a SQL query and then a chart after that. So we have evals built for a pretty broad set of this; there's been a lot of internal Mechanical Turk-ing to build those sets out. And what it lets us do is quickly run experiments in the experimentation framework we've built internally, called Spellgrounds, where we can very quickly say, okay, great, I want to test this new model out, point Spellgrounds at that model, have it run a sample or all of the evals, and then get back results. And we actually see, based on different types of generation tasks, whether it's SQL generation, Python generation, chart generation, chain generation, text generation, retrieval stuff, whatever the task at hand is, how good it is at each of them. And what's really interesting: even just upgrading from GPT-4 Turbo to GPT-4o, it was much faster, but you also saw tasks where it got better and tasks where it got worse. You start thinking, okay, do we have to do some prompt tuning for this? And you can get into these loops of, okay, wow, it got worse at this; is there something about the model doing that? And just taking a step back, as someone who's been building software for a while, this is so nuts. It is such an odd and primitive way to program. We're increasingly used to it, and here I am on a podcast talking with authority about how we're doing evals and prompting, but you actually look at what you're doing when you're working on this stuff, and it's like, okay, I'm yelling at the model this way, and now I need to add another line that says, "You will not respond with Markdown." You're talking at it like, "You will not include Python in your response when I ask for SQL."
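
Here is a minimal sketch of the kind of eval harness Barry is describing: run a set of labeled cases through a candidate model and score results per generation task. The cases, the generate() stub, and the contains-a-fragment scoring rule are placeholder assumptions, not Hex's Spellgrounds framework; real evals would execute the generated SQL or Python, or use a model as a judge.

    # Minimal sketch of an eval harness: run labeled cases through a candidate
    # model and score per task type. generate() and score() are placeholders.
    from collections import defaultdict

    eval_cases = [
        {"task": "sql", "prompt": "Count orders per day", "expected": "GROUP BY"},
        {"task": "python", "prompt": "Pivot df by region", "expected": "pivot_table"},
        {"task": "chart", "prompt": "Bar chart of revenue by month", "expected": "bar"},
    ]

    def generate(model, prompt):
        # Placeholder for a real model call (OpenAI, Anthropic, etc.).
        return ""

    def score(output, expected):
        # Toy check: does the output contain the expected fragment?
        # Real evals would execute the SQL/Python or use a model as judge.
        return 1.0 if expected.lower() in output.lower() else 0.0

    def run_evals(model):
        totals, hits = defaultdict(int), defaultdict(float)
        for case in eval_cases:
            hits[case["task"]] += score(generate(model, case["prompt"]), case["expected"])
            totals[case["task"]] += 1
        return {task: hits[task] / totals[task] for task in totals}

    # Compare candidate models per task type before switching the default.
    for model in ("gpt-4-turbo", "claude-3-5-sonnet"):
        print(model, run_evals(model))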

John Wessel 22:05
Yeah, it's very threatening. Yeah,

Barry McCardel 22:08
It's so constructed. It's like the scene from Zoolander where he's getting brainwashed to go kill the Prime Minister of Malaysia. You're brainwashing these models: yes, you are a world class analyst, you always write CTEs for your queries. It's weird, man. And, you know, here we are building a bunch of expertise and infrastructure and tradecraft on how to do this, but I can't avoid the feeling that we're gonna look back on this whole era of using AI and be like, well, that was weird. Yeah,

Eric Dodds 22:43
Right. The point where I had a very similar moment, where you step back and try to objectively look at what you're doing, was, do you remember when we were working on that content generation project? John was hammering on this prompt, hammering on this prompt, and I was like, okay, this is crazy. And then it became multiple steps of prompts to generate prompts. That was the moment for me. Like, you're using an LLM to generate a prompt. Or to create a persona, to create a persona that can then generate a better prompt.

Barry McCardel 23:28
We go even crazier. I'll take you another level of craziness on the meta front. We have adopted a strategy, and we'll publish a blog post about it soon, for one of our AI operations around editing, which is an AI as a judge pattern. Basically, you have the LLM generate something, and then you have another model call look at that response and judge whether it's acceptable.

John Wessel 23:53
Like a tree? What is that, tree of thought? Is that what that's called?

Barry McCardel 23:59
Which is also different. But it’s also a really interesting and weird thing, which is if you tell the model, like think step by step, you get better reasoning abilities. And it’s like,

John Wessel 24:10
yeah, it’s like,

Barry McCardel 24:12
It's very spooky. And there are technical explanations for it. I listened to a talk by one of the guys who co-authored the original paper about that, and there are people who have theories on why it works, which is that by telling it that, it will actually spend more tokens, which basically makes it think more, and you're kind of forcing it to break things down. It's not that it actually has reasoning per se, but by forcing it to spend more tokens on the task you're basically making it think more. And then that opens up this whole thing around different model architectures beyond transformers, which undergird what we think of as large language models today and which basically spend the same amount of thinking, reasoning, compute, however you want to frame it, across the whole generation, versus other model architectures that are still in the R&D phase that can apply it more selectively. And so then you can think about whether there are patterns down the road where you can, even as part of the prompt, tell it what parts to think carefully about. Are you able to steer its reasoning more? It kind of gets into this very weird hall of mirrors. But we use this AI as a judge thing, which is separate, which is calling the model again to look at the work of another model. And it gets really weird, because what if the judge doesn't like it? It sends it back for another generation, along with comments from the judge about what it didn't like. You're kind of facilitating a conversation between two robot calls. It's like, wait, what are we doing? You take a step back, and for the entire history of software engineering you were basically used to these very deterministic things, right? In fact, any non-determinism is a bug. It's like, okay, I'm gonna write this unit test: when this function is called with these arguments, it should respond with this response, every time. And if it doesn't do that, something's broken. And now it's like, oh, all of that is up in the air. Yeah.
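
A minimal sketch of the AI-as-a-judge loop described here, under the assumption of two hypothetical model wrappers: one call generates, a second call grades the output, and rejected attempts are retried with the judge's comments folded back into the prompt.

    # Minimal sketch of the AI-as-a-judge loop: one model call generates, a
    # second grades the output, and failures are retried with the judge's
    # comments. generate() and judge() are hypothetical wrappers.

    MAX_ATTEMPTS = 3

    def generate(prompt):
        # Placeholder for a generation call to the base model.
        return "SELECT count(*) FROM signups GROUP BY date_trunc('week', created_at)"

    def judge(task, output):
        # Placeholder: a second model call returning (acceptable, comments).
        return True, ""

    def generate_with_judge(task):
        prompt = task
        for _ in range(MAX_ATTEMPTS):
            output = generate(prompt)
            ok, comments = judge(task, output)
            if ok:
                return output
            # Fold the judge's critique back into the next attempt.
            prompt = f"{task}\n\nA reviewer rejected a previous answer:\n{comments}\nPlease fix it."
        return output  # fall back to the last attempt

    print(generate_with_judge("Write SQL counting signups by week"))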

Eric Dodds 26:18
I had a moment like that. This isn't related to AI, but it was a very similar feeling. I was taking my son somewhere, a doctor's appointment or something. We get in the car, my phone automatically connects to Bluetooth, and whatever the setting is, it just automatically starts playing whatever audio I had been playing. And of course, that morning at the gym, I had been listening to Gong sales calls at 1.75 speed. So it starts playing, and it's this rapid fire discussion by chipmunks. My son was like, are you on a call, Daddy? And I was like, oh no, and I paused it. And he's like, oh, then what was that? And I said, I was listening to a recording of other people that I work with on a call. He just paused for a second, and then he was like, do they talk that fast in real life? And I said, no, I speed it up so that I can listen to a lot of calls in a row. And so he was like, you are listening to other people that you work with, on a call that you weren't on, really fast? And I was just like, yeah. And he's like, that is so weird.

Barry McCardel 27:41
Yes, son, that's product marketing. That's how I know you're a product marketer, because you're listening to sped up Gong calls. A 10x product marketer, at 1.75.

Eric Dodds 27:53
1.75.

John Wessel 27:54
Make him into the next product marketer. That's the key right there.

Eric Dodds 27:56
I think it's the same thing, where you step back and look at what you're doing with an objective view and go, this is weird.

Barry McCardel 28:04
Yeah, when you think about it, it's weird. I mean, I agree. And going back to the topic of getting things into prod: I think we learned really fast after ChatGPT that it's not hard to get cool AI demos working. You had a couple of YC batches in a row that were basically wrappers around GPT. It's much harder to productionize. And what's interesting to me is to look at some of these apps as they're growing up, and I'm fortunate to know the founders of a bunch of them, both within data and outside as well. There's the AI first thing, where you say, great, we're gonna build this as just a wrapper around GPT and evolve it from there, versus not. You know, there's a bunch of these now that are AI support companies, right? Not within data, customer support. You get into it, and you're like, yep, there's clearly a huge opportunity to change the way customer support is done. But a lot of these companies may well have to rebuild a lot of what the last generation of support companies did. It's not clear to me that the startups won't have to effectively rebuild Intercom. And it's a question of whether the Intercoms and Zendesks can get good at this faster than that. We talk about that in the data space, too. But what I've really come to learn and appreciate is how hard it is to get this AI stuff working in prod, and how much it depends on having the rest of the product, because we could not get the quality of what we do with Magic right without a lot of the other infrastructure and product we've built, not to mention millions and millions of lines of code and hundreds of thousands of projects that people have built up over the years. We're not training on those, just to be very clear; there's no IP leakage issue. But even just having the product, and the context that organizations have built up over time, is really the thing that's worthwhile. So there's a bunch of AI specific tradecraft, but then it's, how are you incorporating the rest of the product? That is really important.

Eric Dodds 30:15
What have you tried that didn't work? Where it's just like, okay, we have to stop going down this road of productizing this?

Barry McCardel 30:24
Oh, a bunch of things, a bunch of things. And I think one of the interesting things in those moments is you step back and you say: does this not work because of the models, and maybe GPT-5 or 6 will be better at it? Does it not work because I don't have the prompt just right, and maybe we're one amazing prompt away from cracking this?

Eric Dodds 30:47
The judge?

Barry McCardel 30:49
Is there an AI jury too? Yeah, yeah, that's it. An AI executioner, sometimes.

John Wessel 30:58
I’ve even Auntie’s come about.

Barry McCardel 31:02
Justice System. Yes, bounty hunters.

Eric Dodds 31:06
Give it the context of literally the entire justice system. Yeah.

Barry McCardel 31:11
Because we'll get there. But sometimes we don't have the infrastructure ready. So there are things we've tried where it's like, okay, we actually need to go build up some expertise or infrastructure, or be able to do a different type of retrieval, to make this work. As an example, there's a feature we have in Magic that lets you generate chains of cells. The first version of it that we beta tested just let the model generate any number of cells it thought necessary to complete the task. And it would go and do that, but you would get this drift down the chain of cells. Conceptually, you can imagine asking a question and having it generate a SQL cell that retrieves data, then a Python cell that operates on that data, then a SQL cell that reshapes that data, then a chart cell that visualizes it, and it would make some really weird decisions. We were getting down this path of prompt engineering it, like, no, don't make a bad decision, think harder about it, and it would get better at certain things and worse at others. Eventually we fell back to: okay, actually, the predominant patterns are one of a few templates, and we're going to almost hard code those templates and have the AI select amongst them. Now, when I think about that, I go: I know the template thing feels brittle, it feels limited, and obviously, to me, the long term should be unconstrained chain generation. That is, I think, an almost obvious thing that should work. The question we grapple with is, maybe Claude 3.5 will be good at it, maybe the next model will be good at it, maybe we just weren't doing enough with the newer techniques. People talk about things like self reflection; we could pull in our AI as a judge strategy, and we think there are ways that could work better. So it's really hard to know. And again, my team and I have been building data software for a long time. We can stare at most features and estimate two sprints, a quarter, whatever, and get pretty close. I can read an RFC, and if an engineer says this is gonna take six months, I'm like, no, it's not. But you say AI feature, and there are things I've looked at and thought, that's going to be hard, and it basically works overnight. And there are things we've looked at and thought, that should totally work, and it's like, this might be theoretically impossible. It's difficult to know.

Another example is diff based editing. When you ask for an edit in Hex today, say you have a SQL query and you ask to edit it, we pass the current query, or the code block, or whatever, with your edit instruction and some other context to the model and say, hey, edit this code block to reflect this. Great, we get a new block back. But we stream back effectively the whole new generation and show it to you side by side. Now, that feels slow, because if you've got 100 lines of Python code or a query, we're streaming the whole thing back. So it's like, well, can we ask it just for the diffs?

That is something that I think anyone who spends any time with LLMs can very easily imagine: okay, yeah, we'd send just the lines you want edited. But actually getting it to respond with the right lines, and having them land in the right place, is really hard. We almost gave up on it, we actually did for a bit, and then we came back to it with a new technique, and it worked. So it's really hard to know what is actually going to crack these things open, and the space is moving so quickly. And it's not like the models just get monotonically better at things; as I mentioned with GPT-4o, it actually got worse at some things. So it's like, where are the lines where you're betting that you'll get this natural tailwind, a rising tide from the models getting better, where you can just kind of skate to where the puck is going and it'll catch up, versus the other hard work you have to do? That is, I think, what makes building products in this space really tough today. I don't think people are talking about that. Maybe they're not finding it hard. Sometimes I wonder, is everyone else having a much easier time with this?
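
To make the diff-based editing idea concrete, here is a minimal sketch where the model is asked to return only line-level replacements, which are then applied locally. The JSON response format and the request_edits() stub are hypothetical; getting the model to reliably target the right lines is exactly the hard part described above.

    # Minimal sketch of diff-based editing: ask the model for line-level
    # replacements only, then apply them locally. The JSON format and
    # request_edits() are hypothetical stand-ins for the real model call.
    import json

    def request_edits(code, instruction):
        # Placeholder model call. Imagine the prompt says: "Return JSON like
        # [{"line": 3, "new": "..."}], editing only the lines that must change."
        return json.dumps([{"line": 3, "new": "WHERE created_at >= '2024-01-01'"}])

    def apply_edits(code, edits_json):
        lines = code.splitlines()
        for edit in json.loads(edits_json):
            idx = edit["line"] - 1          # the model uses 1-based line numbers
            if 0 <= idx < len(lines):
                lines[idx] = edit["new"]    # misplaced edits are the hard part
        return "\n".join(lines)

    query = "SELECT *\nFROM orders\nWHERE created_at >= '2023-01-01'"
    print(apply_edits(query, request_edits(query, "only include 2024 orders")))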

Eric Dodds 35:17
John and I, yep.

John Wessel 35:18
Back to the template thing, I think there's an interesting concept here. You're mentioning that ideally it could just produce all the cells and it would just work. But I think there's an interesting interface question here, which is: is there something trivial I can ask of the human that drastically helps the AI? So maybe the human sets context for the cells, like SQL or Python; that's super trivial for them. But then with that context, the AI is like, oh, I'm doing Python. Are there other examples?

Barry McCardel 35:48
If you use Magic today and you say, generate me a Python cell to do something, versus, generate me a SQL cell to do something, it will follow your instruction, and it will dutifully do that, because that's passed in as part of the context. Yes, the prompt. And this is where it's not just a single shot GPT wrapper; we have several different agent responses happening as part of that. When you type that prompt, under the hood there's a step of selecting the template to use. And so that context you provide will help with that.

John Wessel 36:20
Is that deterministic, selecting the template?

Barry McCardel 36:24
It's non-deterministic in the sense that it's an AI response, but we have the temperature turned down. We pass it a list of templates, we pass it your prompt, and we effectively say, pick one. Our engineers would tell you it's a lot more complicated than that, Barry, but effectively that's what happens.

John Wessel 36:45
For the listeners: you adjust the temperature, the way I understand it, for more creative, more varied responses, or for more exact responses. So it's tuned toward more exact responses, because it's picking one of a few templates?

Barry McCardel 36:57
Right, and because we have that temperature turned down, it's one of the things that helps when we do evals. If you think about it from first principles, if you're going to do evals, you actually want to make sure you're getting the same response effectively, right? So I wouldn't call it deterministic, but it should be stable for a given model version; that's the way to look at it. But then it's like, okay, what if the model is uncertain? What if you haven't given it enough context? Can it respond asking for clarification? That's a very interesting question. Are models good at expressing uncertainty? Some aren't. There are other techniques you can use, you can have a judge layer in, there are different ways to do this, and these are really unsolved problems. So we think about this all the time. And the nice part for us, again, just kind of right place, right time, is that we're a product whose core focus will always be being incredible for data practitioners. We want to be this awesome tool for people who are asking and answering questions, trying to do deep dives, get to insights, things that aren't just churning out dashboards. In many ways we can incorporate AI into our product, and it's okay for us to be a little wrong, because we're targeting people who would otherwise be in there writing the SQL and Python themselves. It's the same as a copilot, right? It's not perfect. We use it, a lot of people use it, or people use Cursor, and there are a lot of other AI coding tools now. Those are not perfect, they can be a little wrong, but the assumption is the user knows enough to catch it. Or even writing tools, right? You use AI to help you write a blog post; it's not perfect, but it's good enough. And so this has been really good for us, because we can iterate on these things. We can be a little wrong and still be really useful for users, and get better and better at this stuff. And I think the question is: how tight are you making those feedback loops? How good are you at judging whether you're right or wrong? And the tradecraft we're building, everything we're building, we run our own vector database. We didn't write the vector database ourselves, but we self host one now, and setting up the pipelines for doing retrieval over it, and scaling it, and figuring out deployment for our single tenant customers, that's really hard, but it's something we're good at now. And we can just compound as the models get better, as the techniques get better, as new papers are published. It's been really fun to see that learning stack on itself.
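
A minimal sketch of the template-selection step as described: the model is asked only to pick among a few hard-coded cell-chain templates, with the temperature turned down so the same prompt maps to the same template in a stable way. The template names and the choose() stub are illustrative assumptions, not Hex's internal code.

    # Minimal sketch of the template-selection step: the model only picks among
    # a few hard-coded cell-chain templates, at low temperature, instead of
    # free-form generating any chain of cells. choose() is a stand-in.

    TEMPLATES = {
        "sql_only": ["sql"],
        "sql_then_chart": ["sql", "chart"],
        "sql_python_chart": ["sql", "python", "chart"],
    }

    def choose(prompt, options):
        # Placeholder for a model call with temperature near 0, so the same
        # prompt reliably maps to the same template (stable, if not strictly
        # deterministic).
        return "sql_then_chart"

    def plan_cells(user_prompt):
        name = choose(
            f"User request: {user_prompt}\nPick exactly one template: {list(TEMPLATES)}",
            list(TEMPLATES),
        )
        return TEMPLATES.get(name, TEMPLATES["sql_only"])  # fall back if the model strays

    print(plan_cells("Show me revenue by month as a bar chart"))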

Eric Dodds 39:32
Yeah. For listeners who may be just tuning in at the drive thru, a reminder: we are on with Barry McCardel from Hex, and we're talking all about how hard it is to build AI stuff. I want to change gears, because John had a question about AI and BI, and I think that's a great opportunity to zoom out and just look at the landscape. Barry, you just mentioned Snowflake and Databricks. But before we leave the AI topic, one very practical question. And this is, you know, mainly for me and John; you started a selfish company, so we're asking for a friend. But also for the listeners: are there any heuristics that you have developed through that process? I'll give you one example. Have you found that if you have to keep hammering on the prompt and getting really wild with it, that's a sign that, okay, when that happens, we tap the brakes, because we need to step back and evaluate? Or other things where you've learned how to tighten the cycle of experimentation by knowing when to stop or change your strategy?

Barry McCardel 40:46
I don't know if I have a simple answer to that. I think we've got a pretty amazing AI engineering team that has developed a lot of heuristics and tradecraft around it. I do think there's a point, and we've certainly seen it in real life, of diminishing returns on prompt hacking. But one thing we've done that I think is important is to really focus on setting up the infrastructure for experimentation and evaluation. You can run headlong into prompt engineering in a playground, but where the rubber meets the road is how that works over a larger set of data points. And I think that's the fundamental problem: getting something working for a demo data set on your machine is one thing; getting it working in the real world for real user problems, whether the space is data or customer support or whatever, is much different. And your ability to build quickly, to iterate quickly, to understand and shorten your cycle time of experimentation, is really predicated on that infrastructure buildup. So that's what I was talking about earlier: we've developed this thing called Spellgrounds, we've built our own evals. That helps us learn much faster and see whether we're actually descending a gradient, improving through incremental prompt changes. We can edit a prompt, run it over a subset of our evals, and get feedback: okay, is this actually working for real? There are times where it's improving things, and times where it's not. So I don't think there's a clean heuristic, other than to say the only way to really tell is to have that infrastructure for experimentation in good shape. Love that.
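
A minimal sketch of the iteration loop just described, assuming a hypothetical run_case() that calls the model and scores one eval case: edit the prompt, re-run it over the same sampled subset of evals, and compare scores to see whether the change actually helps.

    # Minimal sketch of prompt iteration over an eval subset: compare a baseline
    # prompt against a candidate edit on the same sampled cases. run_case() is
    # a placeholder for a model call plus scoring; the cases are illustrative.
    import random

    eval_cases = [{"question": f"question {i}", "expected": f"answer {i}"} for i in range(200)]

    def run_case(system_prompt, case):
        # Placeholder: call the model with system_prompt + case["question"],
        # then score the output against case["expected"] on a 0.0 to 1.0 scale.
        return 0.0

    def score_prompt(system_prompt, sample_size=50, seed=0):
        rng = random.Random(seed)                # same subset for every prompt variant
        sample = rng.sample(eval_cases, sample_size)
        return sum(run_case(system_prompt, c) for c in sample) / sample_size

    baseline = "You are a data analyst. Respond with SQL only."
    candidate = baseline + " Do not respond with Markdown."
    print("baseline:", score_prompt(baseline), "candidate:", score_prompt(candidate))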

Eric Dodds 42:23
All right. AI and BI, John? Yeah,

John Wessel 42:26
Yeah, yeah. So coming off of conference season here, we've got a lot of players that are consolidating. I think you mentioned before the show how we had these nice lanes, say two years ago: alright, we're in the visualization lane, we're in the storage lane, etc. So maybe talk a little bit about the consolidation you're seeing, and any interesting observations or predictions on whether that continues. What does that look like?

Barry McCardel 42:57
Yeah, totally. I mean, it's such an interesting time. We just spent a bunch of time talking about AI, but even if you set that aside, I do think we're coming through a cycle in the data stack. Here on The Data Stack Show, right? You guys have actually had a really interesting longitudinal look at the evolution of it, because you've had a lot of guests on over time and you're obviously paying attention to the space. You know what happens when a lot of capital flows into an ecosystem: you get a bunch of flowers that bloom, and I think that's a beautiful thing. People can look at it cynically, but I know a lot of really smart people who built some really cool stuff in the data space in the last few years. Some of those things will continue on, some won't. But it's been very interesting to see. I do think we're coming into an era of consolidation, and it's not just that interest rates went up or whatever. I think it's that we've learned a lot; people have tried different patterns, there are things that worked and things that didn't. And I think there are a couple of dimensions to this. There's a horizontal dimension, which is, at each layer of the stack, who's emerging as winners and losers, and what are the actual subcategories at that layer? We kind of sit at the front end, collaboration layer; what are the actual divisions there? You look at the metadata, governance, cataloging layer; what are the divisions there? Is there a standalone orchestration and transformation layer? Who wins there, and how does that look? And then at the bottom, you have the companies running these big conferences and running the infrastructure: the cloud hyperscalers, Microsoft, Amazon, Google, and then Snowflake and Databricks as the big players, and then a long tail of other independent data platforms, the Starbursts and Dremios of the world. And it's just this question of how this actually consolidates. Are some of these categories winner take all, or can you have multiple winners? It was very interesting being at the conferences, walking the floor and talking to people from companies that two years ago, let's say at Snowflake Summit 2022, were advertising partnerships, talking about how well they work together, co-hosting parties together, and are now like, oh, actually, we're gonna compete more than we thought, because you're kind of being pushed to do that. And this is happening at every layer of the stack. I think there's a really interesting question around data catalogs, governance, metadata: does that all become one thing? dbt has their Explorer; how far do they go with that? Where do the standalone data catalogs live? Where do the standalone data observability platforms live? I don't think anyone really knows the answer, other than that there will likely be, just by count, a lower count distinct, you know, fewer players. So perhaps the actual categories will be more of an amalgamation than these really fine subdivisions that we had in the Cambrian explosion of the modern data stack.

John Wessel 46:03
It's always interesting to me to think about Salesforce, for example: a really dominant CRM, and then, you know, HubSpot has got some of that market too. It's interesting for me to think, in this space, is there going to be somebody that's so dominant, like Salesforce, and then maybe a number two, and then a long tail? Or do you think it'll be a little more like databases, where historically Oracle was really strong, SQL Server was really strong, and on the open source side you had MySQL, Postgres? Do you think it's more like a Salesforce winner take all, or more spread out?

Barry McCardel 46:40
Well, let's ask the question: why is Salesforce so sticky and durable? I don't know if you use the Salesforce UI. I do sometimes; I try to avoid it. I don't mean to shade my friends at Salesforce, but I think they would probably also say that the UX design of the Salesforce CRM app is probably not the thing that everyone loves.

Eric Dodds 47:03
It's like, moving from Classic to Lightning was a giant mutiny. Their own users were like, we are not changing. It was like a decade long project, right? Yeah.

Barry McCardel 47:12
And so anyway, why is it so sticky? I had the chance to ask a very senior former Salesforce exec about this recently, because I was curious, and they told me it's the data gravity. It's the fact that the data lives there, and there are all these integrations, and the industry standardized around that thing, all the way from other tools to systems integrators and consultants. If you look at the data stack, maybe the closest thing to that in terms of singular dominance has been Snowflake in the warehousing world over the last few years. And it's a question of how durable and sticky that data gravity and governance is. This is why there's an interesting conversation around Iceberg and open formats: a lot of buyers have been on the sharp end of this with Salesforce, and they're like, hey, hang on, I want more modularity. And a pattern we're seeing a lot of customers adopt is actually standing up a data lake alongside the warehouse. I asked a customer at Snowflake Summit, oh, interesting, why is that the pattern? With Hex, they were excited that we can have one project that can query from Snowflake and from their data lake; I think they were using Trino for it. And they said, well, it helps us get some of these costly workloads out of Snowflake, but also, we tell Snowflake that we're doing it, and that helps our negotiating position. It was less about the actual, we moved this workload from here to here and it's this much cheaper, and more about, by proving that we can move a workload, we've established some leverage for negotiation. And so this is an interesting question, right? You can see that for the vendors at that layer, the last thing they want to be is a commodity query engine on open format data on commodity blob storage. That's a bad situation for them. So then you start building a lot of other stuff to try to lock people in. And lock in sounds like a dirty word; the more generous way to say it is that you're trying to provide more integrated value. But customers see that kind of thing a mile away. And so what the market wants, and how much that pull toward integration wins out, is a question for everyone in the data stack right now.

John Wessel 49:38
So my theory on it is that there are enough people, and it's not just Salesforce, you've got Salesforce, but then you also have SAP or Oracle on the ERP side of things, my theory is there are enough people out there in the enterprise who want to avoid the huge lock in and the implementation fees and the reimplementation fees and the upgrade fees, all of the money they have to spend on these systems, that I don't know if that model works again right now, as far as whether Snowflake or somebody could be as big as a Salesforce.

Barry McCardel 50:12
See, yeah. I don't know, I look at Microsoft, right? Microsoft famously was on the sharp end of the antitrust lawsuit, you know, over 20 years ago. But they're good at the same things. You can go as an enterprise buyer, a CIO, and sign one contract with Microsoft, one overall agreement, and have leverage in your pricing, because it's bundled across everything from Windows laptops, to your Word licenses, to getting Teams thrown in for free, to, increasingly, AI, where they're trying to extend that leverage. So you can have that same thing: not just your Azure compute, but, you know, a GPT-4 endpoint in your own VPC that you get nice economics on. And then, while we're at it, let's throw in Fabric. And Fabric, of course, is one thing; it's, you know, better than Snowflake at this, better than Tableau at that, whatever. You can bundle all of that together, and that is really attractive to an enterprise buyer. It's also really scary to enterprise buyers, because you're all in with one vendor, and I think it's a very real question how that tension plays out. And that's not new. I don't consider myself super young anymore, but it is funny: I was catching up with my uncle, who was a really successful software executive in the last generation, and we were talking about this, and he was explaining how Microsoft had snuffed out markets in the 90s that I'd never even heard of. These were companies I'd never heard of. To him, it would be the equivalent of them snuffing out, I don't know, Snowflake or something that's very familiar to you, but I'd never even heard of them; I wasn't aware of it. This pattern has been going on for decades, and it's going to continue. There's this bundling and unbundling, and it's a very interesting time.

Eric Dodds 52:03
Yeah. One thing that is interesting, though, and this is a question for both of you: in terms of the platforms, there's the sort of classic, you know, Snowflake versus Databricks Battle Royale. And that's played out in a number of ways; a lot of large enterprises run several of the big vendors,

John Wessel 52:27
right, like, per division or per business unit. Right,

Eric Dodds 52:31
exactly. But at the same time, I think back on that a couple of years ago, and it probably was more real then, where it's like, okay, we run these workloads on this vendor and those workloads on this other vendor. But now every vendor is building out the infrastructure to run all of those workloads. And so I think it's more possible now, from a baseline technological standpoint, for there to be a winner-take-all than it was previously. But that's more of a question: Do you think that's true? Or do you think the different flavors of the cloud platforms mean that large enterprises will probably always run, you know, multiple of the major vendors?

Barry McCardel 53:17
Well, if history is any guide, large enterprises will have a lot of things. Like, I asked a customer, what do you guys use for data warehousing? And they're like, everything, you know, literally everything. And not just one of everything, multiple. Yeah. And why is that? Well, maybe the company is the result of mergers, or maybe different divisions have chosen different things. Or it's very strategic: we want to run different vendors because we don't want to be locked in. And multi-cloud is a very real strategy; there are a lot of enterprises that are very bought into that path. I have not talked to many enterprise CIOs or CTOs who are like, yeah, our strategy is we're all in on one thing. So I don't know that that is how it all plays out. And you can look at current markets that are reasonably mature. I would argue data warehousing is a reasonably mature market, and it's very interesting to observe that the two players that wind up getting the most airtime, if you set BigQuery aside for a moment, are Snowflake and now Databricks, and neither of them run their own metal. Right, they're running on AWS or Azure or GCP. And just observe that for a moment: Snowflake and AWS have an incredible partnership, they go to market together. I was at the Summit at this exec track event with, you know, the senior AWS folks and the senior Snowflake folks, and we were all there. But meanwhile, there are teams in each company competing on deals; Redshift makes them frenemies. So this is not even a new pattern in the data stack. And when I talk to folks at Snowflake, as an example, they're aware that they are increasingly building things, because they're worried about being a commodity query engine, that will compete with partners. But what's interesting talking to them is, yeah, they're on both ends of this themselves. I mean, famously, Databricks and Microsoft had a really great partnership over the last few years; Databricks on Azure was a really big deal, and in many ways it made Databricks what it is. And now Fabric is, like, ripping off a lot of Databricks. Yeah, sure. So I don't know. I don't think these players see it as winner-take-all, and I don't think the data world has ever really been that way. But we'll see.

John Wessel 55:37
In my opinion, in the past, maybe 20 or 30 years ago compared to, let's say, 15 years ago, you did have a big market emerge in open source, like Postgres and MySQL, right? Because prior to that it was mainly closed source databases: Oracle, SQL Server, IBM DB2. Yeah, that was all closed source. And then you had a huge surge of open source, with, like, Facebook, for example, right? They went open source, and you had all these companies proving, oh, we're going to run on open source operating systems, open source databases. So that's a major change. Now it's hard to see people swing all the way back. It seems like there's got to be some kind of middle ground; people aren't going to go all the way back to, yeah, we'll just be closed source, we're just going to go all in on Snowflake and not think about Iceberg. And because they won't go all the way back, I think it's less likely you get a winner-take-all.

Barry McCardel 56:38
Well, I mean, open source as a whole is a really interesting topic, like how and where you can build successful open source businesses. You know, I've got a thesis personally, informed by a lot of conversations with smart people, so this isn't entirely my own, which is that an open source business can be successful when it's a pain in the ass to scale the technology yourself. Right? Like Spark: we used an enormous amount of Spark, and scaling it yourself is a nightmare. Yeah. And that's why Databricks exists. Scaling Kafka is really hard. Scaling any type of database typically is hard; it's why the database vendors, you know, Elastic and all, exist. But when you look at certain open source technology, there are some in the data space that are not hard to scale yourself. Well, okay, then how are we going to make money on this? It's hard to make money on the actual open source tech itself; you have to make money on adjacencies. And without naming names, I think you see some vendors in the data space doing this. And you can look at Iceberg as an example now, post acquisition. By the way, side note, the announcement about them being acquired by Databricks went out, you know, during the Snowflake Summit keynote. Sorry, it was Tabular that was acquired, not Iceberg itself. Yeah, right. But the vibe at the Tabular booth was just so funny to watch. People were like, what do we do? Are we supposed to, like, actually pack up? We're kind of in enemy territory now, like Databricks partisans. It was a very funny situation.

John Wessel 58:24
That is really hilarious. That's hilarious.

Eric Dodds 58:26
Was Ryan there?

Barry McCardel 58:28
I didn't see him back there, but it was funny to see: customers were going by their booth and congratulating them, and the team didn't quite know what to say. Yeah, that's classic. And, you know, I don't think it's a secret that they were not printing money as a standalone. The question is, is Iceberg by itself a technology you can make money on, or is the money made on the query engine around it? So yeah, I think you've got good points on the open source thing. I think it has to flow from where value actually accrues in open source, right? And what people's willingness is to self-host or self-run things, or to have that modularity. I think the story with Iceberg, and the reason Databricks bought them, is that you want control over that open format. I mean, they're telling you that they want control. Yeah, sure. Because they've done such a good job of this with Spark, right? Anyone can ostensibly host Spark, but the networking part of Spark, historically, was an absolute nightmare; that was the thing that made it really hard. And what Databricks did was very clever. They wrapped Spark up in an open source governance model where it was "open," I'm using air quotes people can't see, but they controlled it. Databricks is in full control of the Spark roadmap, and the networking part they made modular, and then Databricks hosted their version of Spark with, like, a superior networking module. Right. And so you basically have an open source thing that's way harder to use on your own, and they've made their version of it, you know, much more scalable. And I think you can see this coming from a mile away with the table format stuff: clearly, whoever feels like they're able to control that technology is going to do a bunch of stuff to make it no less open on paper, but less possible for people to run and scale and utilize themselves. And I think that's the thing to keep an eye on.
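As a concrete illustration of what an "open table format on blob storage" means in this discussion, below is a minimal, hypothetical sketch of pointing Spark at an Iceberg table in an object store. The catalog name, bucket, and table are made up, and the jar coordinates are an assumption you would match to your own Spark and Iceberg versions; the point is simply that once the data is written in the open format, any engine with an Iceberg connector (Spark, Trino, Snowflake, and so on) can read the same files.

```python
from pyspark.sql import SparkSession

# Hypothetical setup: an Iceberg "hadoop" catalog whose warehouse is an S3 bucket.
# The iceberg-spark-runtime coordinates are an assumption; match them to your
# Spark/Scala/Iceberg versions. For a quick local test, point the warehouse at a
# local path like /tmp/warehouse (S3 access additionally needs hadoop-aws and
# credentials, omitted here).
spark = (
    SparkSession.builder
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.2")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.lake", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.lake.type", "hadoop")
    .config("spark.sql.catalog.lake.warehouse", "s3a://my-bucket/warehouse")
    .getOrCreate()
)

# Create and query an Iceberg table; the data and metadata files live in the
# bucket, not inside any one vendor's system.
spark.sql("""
    CREATE TABLE IF NOT EXISTS lake.analytics.page_views (
        user_id BIGINT,
        event_ts TIMESTAMP
    ) USING iceberg
""")
spark.sql("SELECT count(*) FROM lake.analytics.page_views").show()
```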

Eric Dodds 1:00:23
Yep. Well, I literally can't believe we've hit the time limit, because we started early. This is what we get to do when Brooks is out of the house, I guess. But okay, Barry, I have a burning question that's unrelated to anything data but is related to Hex. You've put out some amazing marketing material, one launch video in particular. So how did you, like, who wrote the script for that? Was there a team? Were you involved? How did that

John Wessel 1:00:54
come about?

Eric Dodds 1:00:56
I just need a little bit of backstory on that, because we were talking before the show: we have, you know, one physical office at RudderStack, and we all gathered around the computer and watched it like three or four times. So can we get a little bit of backstory on it?

Barry McCardel 1:01:14
That's very, very flattering. Yeah, I think you're referring to our spring launch video we put out a few weeks ago; people can find it on our site. So we had a lot of fun with it. The backstory is, when we do these things, there's a very standard sort of Apple Keynote homage style of product launch video that's very easy to default to. And with everything we do, we just try to have some fun with it. People can't see it, but on the video here you have, like, a boxed version of RudderStack. Yes, the

John Wessel 1:01:48
1990s. That

Barry McCardel 1:01:51
was our coolest booth last year. So we just kind of approach these things like, can we have more fun with this and not take ourselves so seriously, while still being very serious about the software we're making and the value we want to provide? So yeah, the video was really fun. I was involved very closely; I really enjoy that stuff. I have a bit of a background in it; I had a brief dalliance with thinking I wanted to be in some version of film production earlier in my life. Oh, cool. But we've got a great team internally that we have a lot of fun with. There's kind of a core brain trust of a few of us who jam on these ideas all the time. We throw out pretty unhinged stuff in Slack, yes, every day, and it gets refined down. Actually, the video we did, this sort of Office-style skit, came out of us struggling with what to name the release. We literally had that internal debate, like, what do we call this release? So the launch video has this kind of dramatized version of us struggling to come up with a better name than "spring release," which is very boring. We're kind of leaning into that. You've got some more fun stuff coming.

Eric Dodds 1:02:56
So yeah, that’s great.

Barry McCardel 1:02:57
We'll have more screenings over the next few months.

Eric Dodds 1:03:01
Yes. I’m super excited. No, it was incredible work. Barry, thanks so much. We need to get you on the show more often. Let’s try not to wait a year and a half until the next time.

Barry McCardel 1:03:10
Whenever you want, let me know. I'll see you guys. Thanks for having me on.

Eric Dodds 1:03:15
The Data Stack Show is brought to you by RudderStack, the warehouse-native customer data platform. RudderStack is purpose-built to help data teams turn customer data into competitive advantage. Learn more at rudderstack.com.