Episode 120:

Materialize Origins: A Timely Dataflow Story with Arjun Narayan and Frank McSherry

January 4, 2023

This week on The Data Stack Show, Eric and Kostas chat with Arjun Narayan and Frank McSherry, Co-Founders of Materialize, for part one of this great conversation. During the episode, Arjun and Frank discuss their journey in founding Materialize, comparing working in research vs starting a company, key measurements of databases, different uses of Materialize, and more.

Notes:

Highlights from this week’s conversation include:

  • What is Materialize? (2:43)
  • Frank and Arjun’s journey in data and what led them to the idea of Materialize (6:22)
  • The good and the bad of research in academia vs starting a company (25:20)
  • The MVP for databases (33:49)
  • Materialize’s end-to-end benefit for the user experience (43:03)
  • Interchanging Materialize in warehouse and cloud data usage (48:25)
  • The trade-offs within Materialize (1:00:02) 
  • Final takeaways and previewing part two of the conversation (1:09:25)

 

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Transcription:

Eric Dodds 00:03
Welcome to The Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You’ll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, the CDP for developers. You can learn more at RudderStack.com. Welcome to The Data Stack Show. Today we are going to talk with Frank and Arjun, founders of Materialize, and we are so excited to chat with them. We had Arjun on the show over a year and a half ago, Kostas, and I can’t wait to catch up with Materialize, a product that has gone through a ton. And as a company, they’ve gone through a ton. But the most important question is, what do you think’s going to happen? Because Brooks is gone, and I’m in charge of recording, which means that when the cat’s away, the mice will play, and I have a feeling we’re going to go long.

Kostas Pardalis 00:59
Yeah, like, we are going to taste this sweet taste of freedom. Don’t panic, we’ll do whatever we want, so yeah, let’s do that.

Eric Dodds 01:15
I really do too. It’s gonna be a rager, the podcast version of a high school rager, where we’re discussing, you know, academic papers wildly.

Kostas Pardalis 01:29
Move. That’s, yeah. Okay. Well,

Eric Dodds 01:32
First of all, with this newfound freedom, what are you going to dig into? You have unlimited freedom. What are you going to ask Frank and Arjun about?

Kostas Pardalis 01:41
Oh, yeah, I think, first of all, we definitely need to ask them about their relationship, right? Like, how they ended up being founders, and how this relationship evolved as the company evolved. It’s going to be very interesting to see that and, to be honest, to hear their story from them. Yeah. And that’s one part. And the other is what happened in the past year, which, you know, in data land, a year, especially

Eric Dodds 02:23
18 months, actually. Yeah, yeah.

Kostas Pardalis 02:27
It’s an extremely long time frame. So let’s see where they stand today, and have them share their story and how they came together. All right.

Eric Dodds 02:39
We’ll wrap this up in about four hours. Oh, yeah.

Kostas Pardalis 02:42
Let’s go into the wild.

Eric Dodds 02:45
Let’s dig in.

Kostas Pardalis 02:47
Let’s do it!

Eric Dodds 02:49
Frank, Arjun, welcome to The Data Stack Show. We’re so excited to catch up with Materialize. I can’t believe it’s been over a year since we last had you on the show. So welcome to both of you.

Kostas Pardalis 03:02
Thank you. Thank you very much. Okay, we usually start

Eric Dodds 03:05
with introductions, but we’re actually going to switch it up a little bit today, which is very exciting. Brooks isn’t here driving, so I get to do whatever I want, which is my favorite thing. And because the backgrounds and the stories are so good, and both of you bring such a different perspective, we’re going to do that second. First, though, could one of you just give us the basics on what Materialize is, and why you would use it, just so our listeners understand out of the gate? Yeah, so Materialize is what we call a streaming database. So it looks and feels like a database: you interact with it using a SQL command line shell,

Arjun Narayan 03:42
and you write, you know, SELECT statements and CREATE TABLE statements and things like that. It should feel very similar to a database that you’ve already used. In fact, we like to say it’s a streaming database you already know how to use, because you’re probably already familiar with Postgres, and Materialize looks and feels like Postgres. What’s different about Materialize is you create materialized views, right, as opposed to running one-off SELECT queries, and you plug Materialize into your data sources. Those can be upstream OLTP databases, those could be event streams like Kafka, and Materialize incrementally maintains, very efficiently, the results of your materialized view statements. So when you ask for the latest result, it is already precomputed for you, and what would otherwise be a high-latency, complete refresh over potentially large amounts of input data is very quick. Now, materialized views are not a new concept, right? They’ve been around for over two decades. What is new about Materialize is the breadth and depth of materialized views that we can incrementally maintain very efficiently. So think an eight-way join with like four subqueries, where all of the underlying eight input sources are changing: low volume, high volume, slowly changing dimensions, it doesn’t matter; within milliseconds, the result of that query is kept up to date for you. And this is useful in a wide variety of cases where today you may be using some kind of batch pipeline that is sort of continuously running, potentially in a horrendously expensive way, and you’re probably still not getting the latency experience that you want.

Even if you’re horizontally scaled out and using, you know, an unlimited budget, there are various queries where simply the speed of light of the amount of data that has to be crunched means that your pipelines are going to be perpetually, like, an hour out of date. So things like a complex retargeting or persona segmentation input pipeline. And the fact that this can now be incrementally maintained with tens of milliseconds, or hundreds of milliseconds, or single-digit millisecond latency opens up the aperture on the kinds of experiences you can build on top of your data platform for your user. And Materialize is, we believe, the best way to power these real-time experiences. It’s available as a cloud-native, horizontally scalable database in the cloud, like many others that you may be familiar with, like Snowflake, or Redshift, or
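To make the incremental view maintenance idea concrete, here is a toy Python sketch, an editorial illustration only, not Materialize’s actual implementation (which is built in Rust on timely and differential dataflow). Updates arrive as row insertions and retractions, the view’s state is adjusted in place, and reads become cheap lookups instead of full recomputations:

```python
# Toy sketch of incremental view maintenance (NOT Materialize's engine).
# A "materialized view" of SELECT key, SUM(amount) ... GROUP BY key is kept
# up to date as inserts/deletes arrive, so reads are lookups, not rescans.

from collections import defaultdict

class IncrementalSumView:
    def __init__(self):
        self.totals = defaultdict(int)  # key -> running SUM(amount)

    def apply(self, key, amount, diff=+1):
        """Apply an insert (diff=+1) or a retraction (diff=-1) of one row."""
        self.totals[key] += diff * amount
        if self.totals[key] == 0:
            del self.totals[key]  # keep the view compact

    def read(self, key):
        """Reading the view is a precomputed lookup, not a recomputation."""
        return self.totals.get(key, 0)

view = IncrementalSumView()
view.apply("clicks", 3)
view.apply("clicks", 4)
view.apply("views", 10)
view.apply("clicks", 3, diff=-1)  # retract an earlier row
print(view.read("clicks"))  # 4
print(view.read("views"))   # 10
```

The point of the sketch is the asymmetry Arjun describes: the work happens when the data changes, so asking for the latest result returns something already computed.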

Kostas Pardalis 06:30
things like that. Very cool.

Eric Dodds 06:33
That overview is definitely going to be super helpful. Okay, so let’s rewind, and Frank, I would love to start with you, because you’ve done a lot of work in the academic space, and so I would love to hear about your journey to Materialize. And, you know, prepping before the show, I loved hearing how you both bring a very different perspective on how you came together and how Materialize came to life. So, Frank, give us your background, and then tell us how you got into it.

Frank McSherry 07:08
Yeah, sure, thank you. So, as you say, originally an academic background. I was mostly interested at the time in theoretical computer science: algorithms, designing more efficient algorithms, data structures, what have you. That transitioned eventually into some work with big data; essentially, big data is a great place to show off the difference between a pretty good algorithm and a really good algorithm, or a really bad algorithm. But I went from grad school off to Microsoft Research, where I did some work on data privacy, and eventually transitioned into working on big data problems at Microsoft Research. They were then working on systems called Dryad and DryadLINQ, which is, sort of, think about what Spark looked like when it came out: it was that, but a few years ahead of its time,

Kostas Pardalis 07:58
a savvier version of MapReduce,

Frank McSherry 08:02
and I was a big user of that, and then sort of picked up some of that DNA and started working on a system there called Naiad, which is a more powerful, more expressive big data processor that allows you to put in loops and does streaming-style work, instead of just large chunks of batch data.

08:27
The place I was working,

Frank McSherry 08:29
vanished; the research lab in Silicon Valley closed, and I went on vacation for a while, a big vacation, for about three or four years. And during that time, yeah, I mean, I did a whole bunch of things, a little bit of work. A bunch of the time, though, I was actually kicking around with Rust, which was just stabilizing at that point, and doing, in some sense, version two of Naiad, you know, changing the things that I wished I could have changed, and breaking it off every now and then. I was writing a whole bunch of blog posts about my take on the big data space, stuff like that. But periodically I was in touch with Arjun, who increasingly was expressing the idea that although it’s probably great fun to be writing, as an individual, these little blog posts that sort of jab a needle at various other people, if you wanted to actually see if the ideas had merit, if they would go anywhere, the logical thing to do, the smart thing to do, was to attempt to assemble a company. Not just because having a company is fun, so much as this is the right way to get together the set of people with the necessary skills to actually go and translate this from one person’s hobby project into an actual thing that people would want to use. And Arjun gets a substantial amount of credit for realizing that people wouldn’t actually want to use the engine part: like, no one actually wants to buy an engine and try to drive that around town. They want a car. And part of building that requires a whole bunch of other parts that other people are better positioned to put together. And then, you know, eventually, not immediately, I agreed that this was an interesting sort of next thing to do. Among the other things I was thinking I should be doing next, this was absolutely the most interesting thing, by far, to be doing.
So I showed up in New York, and we’ve been here now for like four years, incrementally building Materialize.

Eric Dodds 10:34
Very cool. All right. Arjun, your version? Yeah.

Arjun Narayan 10:39
So I was doing a PhD in computer science at Penn, initially focusing on data privacy. And Frank sort of massively understates his contributions to data privacy as one of the co-inventors of differential privacy, which I was working on. And as part of being a grad student, a good grad student is one that maintains various lists of people and lines of work, where you diligently follow up on every publication and new citation and things like that. And I noticed that Frank had drifted away from data privacy into distributed systems. I was working at the confluence of those: building large-scale distributed systems that maintain data privacy. So it still seemed entirely appropriate to be following that. In 2013, Frank published the Naiad paper, at sort of the perfect moment, because I had immersed myself in all of these, and you have to rewind the clock a little bit, ten years ago, where, you know, it was Hadoop this, Hadoop that, Apache Spark had just come out, plus all these various bespoke systems: here’s the large-scale distributed system for computing triangles in this graph, here’s this large-scale distributed system for computing something else. And one was very confused about why one needed a hundred bespoke distributed systems. And when Naiad came out, it was, at least from my vantage point, a moment that unified and subsumed entire classes of what at the time people were thinking of as separate streams of research. It was the first system, at least I think the first one, that unified batch and streaming capability. So it was able to do everything that the contemporary batch processors were able to do, and also it was streaming, and it was performance-competitive with the batch systems, in fact much better performance-wise.
And so I remember, you know, going up to my advisor and being like, why are we still reading these other papers? These people should stop what they’re doing and rebuild on top of this clearly superior, theoretically well-principled, and well-architected thing, and sort of getting the painful lecture of: that’s not how the world works at all. At the same time, I was also, you know, obsessed with distributed databases. At the time, eventual consistency was, you know, everywhere, and there were very few strongly consistent data systems, particularly distributed data systems. And I think the year before the Naiad paper, the same best paper award that it won was won by Google Spanner, which was the first sort of horizontally scalable, globally strongly serializable OLTP database at the time. And I found that very simplifying as well. I think the thing that attracted me to both systems was that both greatly reduced the level of complexity: one of the big advantages of strong consistency is, from the user’s perspective, there’s way less nonsense going around that you have to reckon with and reconcile in your brain to get your job done. And I remember pinging Frank and saying, hey, are you thinking of starting a company? Because my initial assumption was that he would, since it was so clear to me that it was a wonderful platform to build many additional layers on top of, approaching what a customer would actually use. And he had absolutely no interest. So I ended up stumbling upon a small, then I think Series A, startup, Cockroach Labs, which was building an open-source clone of Google Spanner. And I was very attracted to the open-source business model, the building out in the open, as well as all the rigor and the theoretical basis that they were building on top of, which had been published in the Google Spanner paper.

I joined when, I think, fewer than 20 people were on the engineering team, and discovered that I was a mediocre engineer, but I learned a surprising and delightful amount about how to build a database that builds customer trust. Because databases have this sort of “no one gets fired for buying IBM” property: there’s an inherent amount of conservatism, you’re making these choices on sort of a decade-long horizon as the purchaser of databases. And it takes an exceedingly long amount of time, and engineer-hours or engineer-years, to get to the level of polish and fidelity where customers would buy it. I think the first quarter after I joined Cockroach, the OKR was 24 hours of uptime without a crash on a single node, which did not inspire confidence when it comes to databases. And it was early, right, we weren’t sure when customers would come. But what I really witnessed over those initial years was how to communicate clearly how you are building things, and when you will be ready, eventually, without overstating it, without, you know, any of those little lies that spiral out of control. The thing that I really loved about Cockroach was the honest communication as to where the system was, and how we were going to make it more scalable, more robust, more stable over the years. And Cockroach is a very successful company now, rock solid and deployed at all these sort of staid, conservative institutions that pick trusted technology. And that journey was one that did a lot of counterintuitive things, which was: you actually show your warts, you overshare the things that are broken, because that actually builds trust from the buyer’s perspective, as opposed to pretending it’s more polished than it is. You know, that’s a mistake that other folks have made, which is to sort of cover things up and pretend that their system is as stable and rock solid as Oracle.
And you know, it just inherently can’t be, missing three decades or four decades of development. At the time, I would sort of periodically reconnect with Frank and try to check in and say, have you changed your mind? Like, this is really cool. I tried to deploy your Rust build, and it doesn’t build, the build is broken. There was a lot of that, and it’s true.
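The batch/streaming unification Arjun credits to Naiad can be illustrated with a deliberately tiny Python sketch. This is a hypothetical analogy only, not Naiad’s API (Naiad itself was a distributed .NET system, and Frank’s later Rust rewrite became timely dataflow): the same operator logic consumes data either as one large batch or as a stream of single-record chunks and produces identical results.

```python
# Toy analogy for unified batch and streaming processing (NOT Naiad's API).
# One piece of operator logic; only the chunking of the input differs.

def word_count(chunks):
    """Fold any iterable of record chunks into a dict of counts."""
    counts = {}
    for chunk in chunks:
        for word in chunk:
            counts[word] = counts.get(word, 0) + 1
    return counts

records = ["a", "b", "a", "c", "a", "b"]

batch_result = word_count([records])                # one big batch
stream_result = word_count([[r] for r in records])  # one record at a time

assert batch_result == stream_result == {"a": 3, "b": 2, "c": 1}
```

The real systems add the hard parts this sketch omits, distribution, coordination of progress across workers, and incremental retraction of results, but the core appeal Arjun describes is this one: you stop needing a separate bespoke system per workload shape.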


Arjun Narayan 17:32
and I kept hearing back: no, I’m very happy, I’m on a beach in Costa Rica, writing code a few hours a day, and I don’t really want things to change. I think I kind of grew increasingly frustrated that he wasn’t going to commercialize it, build a system on top. And as we had more conversations, and as I probed as to why, you know, he had a lot of sort of questions, like, why even start a company? Why even do this? And I would patiently explain: well, you know, you may be totally fine, as in totally capable, of building a robust distributed system as a guy on the beach who occasionally comes to the computer, but the customer may want somebody they can call up at any moment, and you do not fulfill that SLA. And they actually need some education, because right now the query language is this Rust library you wrote, and, you know, there are lots of people who would want to use your system who do not wish for the interface to be this bespoke Rust UX that you’ve built; there are standards, like SQL, and things like that. And eventually we came to this point where I convinced him that these things needed to exist. And what I was hearing from him was that he agreed that all these various functions needed to happen for the project to be successful and actually change, or have an impact on, the world, but he didn’t want any of these things to be things that he had to do personally himself. And the meeting of the minds was, like, well, that’s actually what a company is: it’s a way to align a group of specialists in a variety of different expertises, all sort of working together to make the project successful as a group, and

Kostas Pardalis 19:22
that’s how we started

Arjun Narayan 19:25
Materialize. I sort of want to put in a small plug for the wonderful founders at Cockroach, who were incredibly gracious and helpful. The first check in was from Spencer, the CEO. The second check in was from Peter, the CTO of Cockroach. They were very kind with their time, as well as their liberal checkbooks, in getting us off the ground with Materialize.

Eric Dodds 19:49
I love it. What a great story. I have one more question for each of you. And then I want to dig into the product stuff because we have so much to cover. Frank, I’ll start with you. Do you remember the moment where you sort of changed your mind about starting a company?

Frank McSherry 20:07
I remember vaguely; there are a few moments that I remember. So it was in Vermont, where my parents live. I did not, strictly speaking, live anywhere at the time, but I was there visiting,

Arjun Narayan 20:21
I would have to call his phone, his mom would pick up a landline,

Kostas Pardalis 20:25
landline, and it’s just

Frank McSherry 20:28
the best cell coverage. And yeah, discussing things there with Arjun. It’s, I think, largely as he told it: my mental model of what starting a company was like, if you looked at a whole bunch of academic analogues, was that various people essentially threw out some ideas, hoped a company formed around them, and had to run around and shill it for a while, or something like that. And that wasn’t really what I was about. But, you know, with a little bit more chewing and developing, it became clear there was something to do. Clearly, the plan was, obviously, to involve other competent people, rather than just try to cash in on what was a pretty good piece of work. But also, there was something actually interesting and valuable to build, right? It was not just a shined-up version of the codebase; it was SQL wrapped around a thing, which is many orders of magnitude more relevant than the shiny engine. And it started to click that it was actually worth producing, or, you know, at least seeing if it was worth producing. One of the struggles you have as an academic is figuring out how to take your ideas, which are very clever, I mean, everyone in their heart, you know, has a super clever idea, and get everyone to understand them and appreciate them. Yeah. And a lot of that’s sort of what the proposal was, in many ways. It was like: here’s a mechanism by which we can translate things that you thought were really important, sorry, this is the Frank-centric view of things, of course, things that you think are really important, into larger

Kostas Pardalis 22:04
benefit, basically, yeah. Yeah.

Eric Dodds 22:06
I love that. Because I think it’s so easy, especially in the startup world, when you have an idea, to just default to starting a company. But I love the first-principles thinking of first asking: do we have a strong conviction that this thing should exist, or that these things should exist? And if the answer to that is yes, what is the best way to do that? And, you know, for some things the answer is a company, and for some things it may not be. So that’s just really helpful first-principles thinking. Okay, Arjun, question for you. The Naiad paper is really, you know, a phenomenal piece of work, and Arjun, I want to know your experience. So maybe, I hope this is awkward for you, Frank. But I’m interested to know: it sounds like when you read that paper, you went to your supervisor, whoever you were working with, and you were like, we need to start working on all this stuff and start building on this. It sounds like you had been knocking on these doors, you know, questions looking for an answer, and this sort of said: yes, this is the way that we should do this. What was it like to read that? Was it sort of like a bunch of stuff congealing at one point, you know, almost like an epiphany? Can you just describe when everything clicked?

23:25
Yeah, yes.

Arjun Narayan 23:27
It was like that. It was, you know, there were many papers that I’d read in sort of increasing confusion. The default experience of reading an incremental piece of the sort of

23:41
new, you know,

Arjun Narayan 23:43
a new set of polished publications is: they come out and you sort of go through them one by one, you read the abstracts, and then you read the ones that seem exciting, and you go through, and each one of these sort of makes you more confused. Like, oh no, another thing I don’t understand; oh no, why does this thing have to exist? Oh, this cites a whole bunch of other stuff that I don’t know, and I’ve got to go read that too. You know, it’s sort of like the endlessly expanding tabs version of learning about something, and you’re just like, good grief, I started with 14 open tabs, and now I’m at, like, 37 open tabs, and it’s dark outside, and I feel more confused. The Naiad paper was the giant tab closer, right? You read that paper and you go: oh my goodness, I get it now. These people are wrong; these people are onto a partial solution; these people, you know, are completely subsumed; these people, with a slight twist, I have a suspicion I’d be able to reproduce that on top of Naiad, but very much simplified. And so that was the sort of rough experience that I had reading that research paper. I think what I will credit myself with is that a lot of academics are very wedded to their own ideas, and I wasn’t wedded to mine, right? A lot of folks really want to build their own cool things: I want to commercialize my cool thing because it’s mine.

Kostas Pardalis 25:04
And

Arjun Narayan 25:07
though I did not make that mistake. I was like, well, everything I have is a piece of junk, but that’s a relief, because I was worried about how to close that loop in my work, and the answer is right here. And that was why I was in the mindset of, you know, being very willing to throw things away. Yeah. Love that humbleness.

Eric Dodds 25:28
All right, Costas. I’ve been monopolizing, so please take the mic.

Kostas Pardalis 25:33
Thank you. Thank you. So, okay, I have a bit of a personal question for Frank first. You’ve been through academia, you’ve done research in big tech, and now you’ve also started a company. Very different experiences, I guess. I don’t know, I mean, I’ve been around academia a little bit, I’ve never been in big tech, and I’ve done startups. So how do you feel about it? Like, what’s the difference? And okay, I would like to ask what you prefer today, but I think you’ll probably say Materialize, of course. But it would be great to share the good and the bad of each one, because it’s very rare to find someone who has done all three, and so successfully.

Frank McSherry 26:31
They’re definitely different. I should be clear, the big tech thing that I did was at a research lab, an industrial research lab, which was not as different from academia as you might imagine. It’s a bit like getting your toes in the water with respect to industry, but it still has a lot of the safety nets that academia has. Yeah. I don’t know, my personal vibe at the moment is I’ve had a bit of a break with academia, and I’m happy to not be doing that anymore. And part of the reason, I think, from my point of view, and Arjun and I have different takes on this, is the motivations. Why do you do things? From outside, I think people think maybe that the game is about finding truth and finding meaning. It’s often about finding the right way to shine the thing that you’re currently holding on to. It’s about constraints, where, like, you haven’t had a good idea for a year, but you’ve got to write a paper. So what are you going to do? And in many cases, the correct answer would be that you keep your mouth shut,

Arjun Narayan 27:30
don’t go and confuse everyone by saying something weird and complicated. And on that, I like to joke that I believe very much that, you know, there’s academic work we are doing that will be worth writing up once we have done a sufficiently meaningful amount of work, that there’s a Materialize paper worth writing. You’d have a different perspective in academia, where the moment you have the absolute least amount of thing that is publishable, you push that out, as opposed to, you know, here, where we’re like: well, we could certainly publish a paper now, but would it be the best possible version, having explored all the paths? No, not yet. So let’s take more years.

Frank McSherry 28:11
And speaking to that, actually, one of the differences as you go from academia, as I went, at least, from academia to industry at Microsoft Research, was a bit of a longer timeline on the work that you’re going to do. You’re expected to do better, higher-quality work, but you’re given a bit more time to go and do it. The Naiad paper, for example, got rejected, I think, three times, something like that, along the way, and it got better each time. And at no point did we, in some panic mode, have to go and throw it out the window or something like that. It was totally fine; we took our time with it, you know, and had support from the organization, from management at the lab. So it was healthier in that sense, in terms of actually trying to find something of value to contribute back. And my experience has been, as you go now to a startup, there’s just that much more attention being paid to actually having impact and meaning. A lot of academic work, even in an industrial lab, can be a bit inward-facing: it’s like, wow, I’ve done a really impressive thing, I’m going to show my friends and see if they’re impressed also. And the thing that I really liked, certainly in the startup setting, but also as I was transitioning, I guess, from academia through industry, was appreciating more and more that computer science is sort of the art of abstraction. It’s about taking a really clever thing and not really having to show someone how clever it is for them to appreciate it and enjoy it. And there’s a little bit of letting go in that, because you have a lot of self tied up in the cleverness of the thing that you’ve made.

In the early days at Materialize, I didn’t really hide my thoughts on going back to SQL and how grim that was going to be, personal opinions, but I’ve really come around to appreciating that SQL, like it or not, is incredibly useful as a way to communicate with people who absolutely do not want to have to know how all of the complicated stuff inside your very advanced stream processor works. They already know how they want it to work. 100%. Yeah, that’s very interesting. And, like, one additional question, which has to do with your academic path. I’ll start with academia.

Kostas Pardalis 30:24
So you were doing research in privacy, very successfully. What made you move into data processing

Frank McSherry 30:35
instead? Yeah. I think this is a pretty easy, non-technical answer, actually, which is that doing a really good job at data privacy never resulted in anyone being happy with you. Like, you walk into the room, and you say, I have a very important data privacy announcement for everyone in the room, and they’re like, oh, not this guy again. Last time he was here, he said we have to stop doing everything. And you think you’ve actually done a net positive thing by, you know, pointing out, oh, bad things could happen if you know this and that, and yes, here’s how to get around it, but people were much happier five minutes before you showed up. You see that now: there’s a bunch of tension about privacy and the Census Bureau; former data consumers of the census are not super positive about the whole privacy thing. But if you then do something more like big data, everyone is delighted, possibly irrationally. I mean, there’s another big difference, actually, which is, as you go from academia to a startup, I don’t want to misstate this, but certainly the attitude goes from a combative one, like, I’m also a very smart person looking at your work, and you need to convince me that you’re also smart and your thing is good, to much more receptive and friendly. Like, wow, if you do something that makes my life better, that’s amazing, I will tell you how happy I am. I will not feel threatened by the fact that you might have just ruined my next few years of research; I will be delighted that I can throw away that horrible thing I’ve been working on and replace it with your thing. So the emotional feedback goes from, yeah, pretty grim, to much more positive. Yeah. Yeah.

Kostas Pardalis 32:17
Yeah, I think from my experience also, people in academia can be much grumpier than people in startups.

Arjun Narayan 32:26
I just want to put in a plug for the PhD: I had a delightful experience. I finished it, I graduated, and it was probably some of the happiest, most carefree times. I got to read whatever I wanted to read, and I had access to experts whom I could poke at and learn from. If I decided I wanted to read some large tome, I could go sit in a beautiful 18th-century library and spend morning to evening drinking five cups of coffee and reading three books. There's a lot there; my personal experience was wonderful.

Frank McSherry 33:00
To be totally clear, I very much enjoyed my academic time as well; it was great. I guess I would just say that as you go from a place where you're satisfied by certain things to trying to translate those same things into finding happiness by delivering real things, you might come to different conclusions when you show up and try to communicate with people who have real problems and are going to tell you what they think of your answers.

Kostas Pardalis 33:25
you’re going to tell the percentage and like, I don’t say, like, okay, it might sound a little bit like I’m fitting because that’s what you like walking into Starbucks was like, being human, la la land or whatever. And like academia is like, just like, you know, don’t be sad people. That thought is the case. It’s not like the incentives and like, the type of work is different, right? So you have like, when you see, like, there is a reason like people not continuous, like so thoroughly critical about things that’s like how progress is made. At the end, we’re looking like abstract things. So I’m not trying to say that they are the municipality but it can be a very happy place. Okay, enough with academia. Now I want to ask the unzoom like equation that exists, like more products and like startup oriented, you mentioned they are bases and how unique they are, in terms of productizing them and getting them to market. Right. And my question is, what is an MVP for a database? I think

Arjun Narayan 34:27
an MVP per database is like an MVP for many other products, something that’s, you know, you ship too early that you put out there that people see the potential but are unwilling to put into production, right. So the people may not be willing to put your database into production for an actual use case that has an SLA attached to it. But if they can see the potential that this actually significantly accelerates, the time to value for them to build whatever it is they’re building. Then then you will get some signals from the market that it is continue, it is worth continuing to put in effort, right. And what that concretely translates to means that you have to look for signals of success that are not revenue, because you can’t get money for that excitement, what you can measure that excitement, right, you can look at the number of folks who are downloading and using it, you can look at the number of folks who are paying attention. It’s just not going to be revenue. And you have to be very clear that the rough thing that I had in mind when I started was like it takes roughly speaking $100 million of capital raise, and not a lot of it a large fraction of that deployed before that monetizable moment in you. So going into it eyes wide open, I had the tremendous benefit of that advice from Spencer, right. So Spencer had sort of known this when he started cockroach, which was, you know, 100 million dollars is roughly and that doesn’t mean we raise 100 million dollars in one shot, right, like we did it over three subsequent rounds. It’s that each level of de-risking involves showing the world that MVP of a non production piece of software that was incrementally more filled out. And measuring that sort of positive reaction that this is a thing which, when completed, I would be delighted to use.

Kostas Pardalis 36:26
All right, and where is Materialize today as a product? How far past the MVP stage is it?

Arjun Narayan 36:36
That's a great question. Since we last spoke, I think 18 months ago, we have built a cloud-native, distributed version of Materialize, and maybe Frank can get into some of the details. It is currently in early access with a select group of users.

Arjun Narayan 36:59
If any of the listeners are interested in becoming one of those early-access customers, please go to our website and sign up; we're onboarding new folks to our cloud platform every week. But it is something exciting, so why not hand it over to Frank to give us the details.

Frank McSherry 37:17
It’s actually a good transition from like the question of MVP isn’t potentially like sequences of things that you reveal, to build more competence and also exercise. Sorry, assess more excitement about things. About a year or so ago, certainly, last year I talked with Arjun. The state of materialization at the time was a binary that you could deploy on a large computer. And sometimes it was like Postgres, like a type of thing, you know, you got a big machine, you run materialize on it, and it will keep your materials used up to date pretty fast. But had some similar limitations that people bumped into at Postgres, where they got some limited resources. And if you have one of these, and your friend shows up and says, That’s amazing, can I use it? You say, No, it’s mine. Stay away, you know, you’re gonna screw something up, like I have real production jobs burning on here, you can’t. And so going from a thing which had this individual binary to a thing that did a great job of assessing people’s appetite for this, they can see this and like, this is great, I really like to use this and start asking questions about like, you know, what happens when I have the second use case, or if I want to bring in more team members, these types of things, and books had realized, you know, credit, where credit’s due to the people out there like Snowflake and whatnot, that separation of storage. And compute was a great way to go and do this, that if you design a system where the data can live and scale independently of the compute assets that you want to attach to it, it’s super easy to go and turn on additional compute assets, and then bring more people onto your pile of data and give them the experience of unbounded ly scalable system, both in terms of how much data you could throw in there. 
But also, you know, the person who says I’d like to use the studio, I can just press this plus button right there, you’ll get your own computer, and you won’t screw up my production system either. Where materials is gone, essentially, in this direction of decoupling the architecture of the previous monolithic architecture into, in fact, three layers but like the storage layer, decoupled from a compute layer, decoupled from what’s essentially a serving layer, where you land data in the storage layer. And now it’s not just static blobs of data that you lend there. You’re lending or we’re lending insensitive to you, continually updating histories of data. So how your data have changed over time, with very clear indicators, as the lender what is the exact from our point of view, the exact time at which this change happened, right now, essentially enough information that any two people who were to look at this data would agree, how did the data change and exactly which moments allow us now to turn on these these compute nodes which was the same Compute Engine that was in the monolith? Get materialized, but now on as many computers as you want to big sizes, small sizes, whatever, you know, all of the above reading the same data coming to the same conclusions. So sorry, come to compatible conclusions. exactly consistent conclusions. You know, if one of you gets a count of this data and another, you get kind of that data, and I look at the joint that the numbers will add up exactly, at all times. And this is, yeah, and the same with Arjun mentioned earlier that consistency guarantees strong consistency is very sort of liberating for users. Same sort of thing, I think that a lot of them are looking for in these low latency data systems. Where else you know, you got to look late in today’s system, and you just have to be like, Well, wait, that’s not very useful. 
I mean, it does a thing, but, and Lily, these three things are sort of our three watchwords, skeleton consistency, low latency, pick three is the new cloud native version of materialized, so you can rock up and be confident that you get that same experience. But it can grow with you as you either get larger use cases or more use cases or more team members, that sort of thing.

Arjun Narayan 41:06
A lot of the things that folks had asked for when we had that monolithic Materialize experience also fall out neatly from a separated storage-compute architecture. For instance, the storage is effectively infinitely scalable, because it's backed by S3, so you can land extremely large histories and store them very cheaply. Another one is replication: you can have a highly available Materialize, because with this durably stored, exactly timestamped history, the computation is always replayable. So you can have two of these running at the same time, and they can even have different hardware footprints. You can have a horizontally scaled, multi-machine cluster running alongside a small test or development cluster, and they give you the exact same answers; one may just take longer because it has fewer compute resources attached. You get this mix-and-match experience, kind of like Snowflake, where you can mix an XL warehouse with an XS warehouse, all connected to the same sources of data.

Frank McSherry 42:16
I'm going to go a little deeper on one of Arjun's examples, because I think it's really cool, and it resonates with me at least ergonomically: replication. Because we're computing exactly consistent, exactly identical results, if you want to do rescaling, say you've got a large machine and you realize you need to go to two machines or four machines, you literally just spin up another copy of the same computation with more resources. Once it comes up to parity, you turn off the first one. There's no noticeable interruption to your use of the system; the cutover is consistent and instantaneous. You have just rescaled from one size to the next larger one, to accommodate whatever spike you're seeing or just general growth, and you didn't have to spend ten minutes while everything turned off and rehydrated itself.
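As a concrete sketch of the rescaling flow Frank describes: the cluster, replica, and size names below are illustrative, and the exact DDL syntax may differ between Materialize versions, so treat this as a paraphrase rather than copy-paste commands.

```sql
-- Start with a cluster backed by a single medium replica.
CREATE CLUSTER prod REPLICAS (original (SIZE 'medium'));

-- Load grows: attach a second, larger replica to the same cluster.
-- It replays the same exactly-timestamped history until it reaches parity.
CREATE CLUSTER REPLICA prod.bigger SIZE 'xlarge';

-- Both replicas compute identical results, so dropping the old one
-- is a seamless, effectively instantaneous cutover.
DROP CLUSTER REPLICA prod.original;
```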

Kostas Pardalis 43:13
Okay, that’s super interesting. I have a question. So when I played around with materialize a long time ago with the binary that you could download, the feeling that I was getting is that this is like a technology that I can use on top of another database, right? Like I can have, let’s say, my Postgres and I want really good and lol baked and see materializations happen there. So I can attach to their application log, and start doing very low latency like materialized views on top of that, or on a Kafka topic, right? But let’s say there was always something else involved there. Like there was some other like database that I needed to enable then tartans, let’s say, functionality of that database is materialized if I understand correctly, right now I’m talking about like a more end to end system where I can do everything with my data I sitemap realize is this, I get these rights. Do you

Frank McSherry 44:22
Actually, maybe I should have said something that was obvious to me but perhaps wasn't: one of the things that a storage and compute separation gives you is a storage layer, and we didn't have one of those before. That's maybe what you're pointing out. We always acted essentially as a cache of some upstream source, be that Kafka or Postgres. By pulling in your own copy of the data ("owning" is the wrong word), you gain a lot. It's healthy for providing consistency to people: otherwise we have to hope that Kafka doesn't throw away data, which isn't actually guaranteed, whereas this way we're able to provide some guarantees. But you're also able, if you're interested, to land data directly into Materialize: create some tables, insert data into those tables, and we'll happily keep those around for you as well. There are trade-offs in that: if you want the highest-throughput ingestion, something like Kafka is probably going to do better than one PostgreSQL connection where you copy-paste a bunch of stuff in. But okay, that's the trade.

Kostas Pardalis 45:36
Okay, that's super interesting. So again, I want to focus a little bit more on the experience, because I want people to understand not just the technology, but what they can do with Materialize, right? I think that's important. So let's say I create an account on Materialize and I would like to throw data into it. Can I create, say, a bucket on S3 and start pushing data?

Arjun Narayan 45:55
it’s even simpler than that, right? So you sign out, you go to cloud.materialized.com, you get a connection string, you type p SQL connection string you’re in. And then you say, Create table, then insert into table insert into foo values, blah, blah, just as if it was Postgres, right. And these things are inserted, and then materialized will save that in an S3 backed storage engine layer persistence layer, you can again, you know, connect to this using any application that writes data that has a Postgres driver, right. So with any programming language is a stock Postgres driver that you can use, and can start inserting data into materialized, you can then also create these materialized views, you say create materialized view as count star and select count star from the stable join, or join some other table, things like that. One interesting thing is that you can join data from heterogeneous systems, right? So most people’s architecture involves many things, right? So they already have an OLTP database, they already have some web events coming in through that maybe are loaded in a Kafka topic, maybe they have some other systems that are loading data into Kafka, some micro services, things like that. And one of the things that people often want to do is to enrich the streams of data by joining them against another stream of data, right? So you might want to take your web events and then join it against the customer record of truth, that is in your OLTP Postgres data, right, so you might want to take the stuff that you’re learning in these new tables, and joining against those materials is very powerful at doing these sort of join materializations primarily, due to the architecture of differential data flow and timely data flow. D. 
All of the students that have you interact with using the Postgres sort of client driver protocol, be it inserting data, be it for that sort of control plane of creating those materialized views and creating that stack of sort of data pipelines, be it for the like, create cluster, you know, Excel statements that requisition resources in the cloud, or for reading back from some application or some, you know,
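The flow Arjun walks through might look something like this in a plain psql session. All names here (the connection string, the Kafka connection, the tables, the JSON fields) are made up for illustration, and the source-creation syntax is paraphrased and should be checked against the current Materialize documentation.

```sql
-- Connect with psql or any stock Postgres driver:
--   psql "postgres://user@host:6875/materialize"

-- Land data directly in Materialize, just as in Postgres.
CREATE TABLE customers (id bigint, name text, segment text);
INSERT INTO customers VALUES (1, 'Ada', 'enterprise');

-- Also ingest a stream of web events from an existing Kafka topic
-- (connection setup elided; exact syntax varies by version).
CREATE SOURCE web_events
  FROM KAFKA CONNECTION kafka_conn (TOPIC 'web_events')
  FORMAT JSON;

-- Enrich the stream by joining it against the customer table, and
-- keep the result incrementally up to date as both sides change.
CREATE MATERIALIZED VIEW events_by_segment AS
  SELECT c.segment, count(*) AS events
  FROM web_events e
  JOIN customers c ON (e.data ->> 'customer_id')::bigint = c.id
  GROUP BY c.segment;
```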

Kostas Pardalis 48:23
Okay. So if I'm, let's say, a user of Snowflake, I have a very specific way of working: I'm collecting my data, ingesting my data, loading my data, transforming my data, consuming my data, right? Is this flow, this lifecycle of data in a traditional cloud warehouse, represented the same way in Materialize, or is it different? Can I reuse the tools I'm already using, and just take out Snowflake or BigQuery and put Materialize there?

Arjun Narayan 49:15
You could, with some caveats, which are that we don't yet support all the ways to ingest data. We don't support Fivetran yet, for example. But a lot of Snowflake users are putting data into Snowflake from Kafka, and that very same Kafka topic can also send the data to Materialize. Users may be using dbt models to do staged pipeline creation in Snowflake; those very same dbt models work in the Materialize mode. Many, maybe most, of our customers use dbt to orchestrate their pipelines and to manage that lifecycle: bringing views back up after some change, recreating the downstream ones, testing, and all that.

Frank McSherry 50:03
Sorry to interrupt, but dbt is a great story, because people have been trained to express their needs in dbt, and dbt, behind the scenes, does some head-scratching to figure out: ah, you really want me to redo all of that work? They have this incremental mode that will attempt to avoid it.

Arjun Narayan 50:22
I believe they require you to write the incremental version of your SQL query yourself, which is

Frank McSherry 50:27
fraught with error; it's going to be wrong if you write it by hand. But there's a Materialize dbt adapter that will just say: you know what, you don't actually need to do anything. We will keep your query running for you as your data change. You don't need to redeploy the model; the model stays the same.

Arjun Narayan 50:46
To put it another way, if you're familiar with dbt: a different way to frame Materialize is as an automatic dbt model incrementalizer. Take any dbt model and make it incremental, on a millisecond-level basis. If that sounds exciting, then we are what you're looking for.
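To make Arjun's framing concrete: with the dbt-materialize adapter, a model can be marked as a materialized view instead of using dbt's hand-written incremental mode. The model and column names below are invented, and the exact `materialized` value has varied across adapter releases (older ones used `materializedview`), so check the adapter's docs.

```sql
-- models/customer_order_totals.sql
{{ config(materialized='materialized_view') }}

-- Materialize keeps this result incrementally up to date on its own;
-- no user-written incremental merge logic is required.
select
    customer_id,
    count(*)    as order_count,
    sum(amount) as total_amount
from {{ ref('orders') }}
group by customer_id
```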

Frank McSherry 51:05
In many cases, I would say the goal with Materialize is to let people take their existing business logic, the SQL embodiment of their business practices, and transport it into Materialize without much friction. Some of my happiest moments (sorry, it's weird that I get happy about these things) have been when folks have shown up with the world's gnarliest SQL query, with zero attention paid to how to make it work fast in Materialize. They just plop it in, and it runs, and it gives the right answer, and keeps giving the right answer. And it's full of left joins and subqueries and all sorts of horrible things that you wouldn't have encouraged them to start with; they're not in any of our quickstart guides. But it's super pleasant that step one is not "refactor all of my business logic to look different." And, you know, there's some data inertia: if all of your data are in Snowflake, we don't magically have access to all that data. But one of the most challenging parts, figuring out how to change all of your queries and how you interact with things, is meant to be as low-friction as possible.

Kostas Pardalis 52:22
It’s very interesting, because, you know, like, for anyone who has worked in, like, developer tooling and sending it out, I think it’s even more profound when you’re talking about data infrastructure. Like, a big part of the developer experience is like, okay, yeah, like, your technology is great. But like, do you know, like, how much shit I have to migrate, like to move from one thing to the other? I think like, one of the best examples of that is like, Python, pi, Spark and spark, right? Like, there are literally 1000s and 1000s of lines of Python that is moving data around. And it’s just not that easy to, like, you know, like, ask like, engineers to go there and like, just write everything right, like to move to something else. So it is important, in my opinion, like to try and, you know, close this gap and live like, the developer have like an easier life at the end, like choosing the right tool and over like migrating when they have to write,

Frank McSherry 53:26
There’s a very uncanny valley here. And this is sort of its origins point about, like how much of a database you have to build before, you know, if you built 95% of sequel, great, it’s like 100% chance that person’s query isn’t going to work. So you might have just told them that upfront that only to more or less start over.

Arjun Narayan 53:44
There’s, there’s, you know, there’s been many attempts over the years to sort of build these sort of sequels, like, half of SQL that have I think hive is the one that sort of, is the safest one to talk about. Because no one is pretending any more than hive is a sufficient sequel. But there’s many of these analogues exist in the streaming world, which is like, Well, yeah, we have a sequel, as well as Neisser, look at SQL, and it’s like, only do the inner joins only, you know, don’t support this, don’t support that. But by the time you get down the list of caveats, you’re like, Well, what is it dude that you do support, and it’s like the four examples that are presented in the Examples page. And, you know, one of the decisions I feel very good about, in retrospect, was the sort of dogmatic insistence that we’re going to try and do pretty much all of Postgres, and our list of caveats really needed to be absolutely minimal. And we chose the Postgres plug for Postgres. Because the surface area of SQL is so large. If you are going to implement the standards compatible SQL, you have to make a variety of judgment calls On the way, so what do you do with null handling and like this weird edge case. And it’s remarkable to the extent to which Postgres has like very well reasoned and publicly available ly documented, like your reasons as to why they do various things. And so you can go and sort of re implement that to spec. Mice, MySQL doesn’t have as much sort of documentation and thorough sort of rigor in even though it has a larger deployment base, which is very useful when trying to re implement

Frank McSherry 55:32
to spec. Like, I mean, if you don’t mind, let me leap into I’m just gonna riff on a thing Arjun just said, because I thought he was gonna say something else. Which is that we’re Arjuna mentioned, there are a bunch of sort of, not 100%, not even 99% SQL implementations on top of big data systems or streaming systems. You have the same problem with databases and materialized view implementations. There are a lot of them, they’re like, oh, yeah, we support materialized views. Yeah, absolutely. And then there’s a long list of like, Oh, you haven’t, don’t use aggregates other than some uncounted. I

Arjun Narayan 56:02
mean, one is don’t use joins, a lot of them, I just think don’t join, whatever, don’t use

Frank McSherry 56:05
Right, don't use joins, a lot of them. If you're lucky (Oracle and SQL Server are pretty good there), you can do joins, but you've got to have primary keys, or you can't do certain kinds of self-joins for some reason I don't understand. And all of them have basically the same property: if you plugged your SQL in and said go, it would either say no, or it would say yes, and then at some moment the plan would change to just re-evaluating everything from scratch, or something horrible like that. Again, because there wasn't 100% coverage, the actual lived experience of trying to use materialized views in existing systems was not one of delight and pleasure.

Kostas Pardalis 56:49
Using materialized views is such a complicated story. I don't think people who haven't worked in a big enterprise can understand how big of a problem it is. It's very good that you mentioned Hive, because I'll give you an example: one of the biggest problems people have in migrating away from Hive is how to migrate, in an automated way, all the view definitions from Hive to whatever other system. Because obviously, yes, there is SQL out there, but SQL is not one language; there are so many different dialects. And if you also get into the UDFs that people build, the semantics of those UDFs might differ between systems. It's such a hard problem when you think of an enterprise with thousands of people running tens of thousands of queries every day on the system: migrating all these views and all these UDFs is just too much work, right? Which means there's a lot of opportunity for people to do business there, which is good, and probably to do interesting research and come up with interesting ways of solving this problem. But let's go back to Materialize. A question for you, Frank. The architecture as it is right now: if I understand correctly, I sign up, and then do I have to select clusters as a user? What do I choose there? Is it a completely serverless experience, or do I have to choose something similar to warehouses in Snowflake?

Frank McSherry 58:35
it’s more like warehouses and Snowflake, yeah. So you’ll log on and you’ll be plugged, you’ll get your own environment, and you’re plopped in this default cluster, which is a relatively small thing that you will quickly exhaust if you try to do anything tremendous. But you have the ability to start to create new clusters. Within the cluster as you create replicas, these are the executors, if you will, the cluster is where you aim the query or in the views that you’d like to maintain. And then you provision them with resources and stuff like that. And you can point your you can either build materialized user, you can build indexes on view, one of them gets sunk back out to S3, and one of them lives in memory in index form, you can build these on the clusters of your choice Kanem jet, you can change the materials use up together. And so you might create cluster, prod, create cluster, test, create cluster interns, you know, go and deploy the various things you’d like to under prod, don’t touch it again. Go around it over to test and maybe in test, maybe you’re doing some sort of LubeZone style deployment. So you test looks a lot like products up, you’re gonna do a few different things to it. You just want to make sure that everything still stays up. You need to learn how you need to size the underlying instance in test rather than in prod. And the entrance at the same time we’re doing like 10 way cross joins on data that they shouldn’t have. And the fact that those computers are going to melt and catch on fire is fine. That’s that they’ll feel bad. But everyone will not be disrupted. Yeah, yeah. Okay.

Kostas Pardalis 1:00:03
That’s interesting. So I think it’s like a riot, like, we’ll give this the opportunity to talk about trade offs. It’s because people like to choose the cluster sizes based on the trade-offs that they have to make considering the workload on one side, and the computing system on the other side, right? So help us understand that trade those there when we are working with materialize.

Frank McSherry 1:00:27
Just to double-check: are these trade-offs within Materialize, or between Materialize and other alternative solutions?

Kostas Pardalis 1:00:34
that didn’t materialize within materialized like I’m a user of materialized for the first time, right, she helped me like, make the right choice. He’s, like, have like, in my mind, when I say,

Frank McSherry 1:00:45
Here’s a thing that you could do, when you log into material, you get a bunch of money, you could rent the biggest machine that money will buy. And just do all of your work there. Like this is one way that you could make one cluster bunch of resources and just start building all your stuff in that one place. There’s some pros and cons to this. So the obvious con, of course, is that if you do this, and your friend is in the same place, and they write a crappy query, you never had a crappy word, of course, but like they write a crappy query using shared resources. And you’re going to interfere with each other potentially take down the instance if you go and run out of memory, or just generally degrade performance. So that’s a bummer. But there’s a cool thing that you can do. What I want to say with materialized and not a lot of other tools, which is to build indexes over these streams , continually changing bits of data, and reuse those within and across different data flows that you build queries and data flows. So in some sense, one of the ways I think about materialization is probably wrong. But if you imagine a nice data warehouse, a system like like Snowflake and said, What can I just bring my own compute and build my own indexes, and do my own work there with preformed indexes that I keep up to date, with all of the properties you might imagine millisecond response times, and the ability to run 10 queries over the same relational data at zero incremental memory cost. That’s the sort of benefit that you get reusing a cluster. So if you’ve got five tasks with the same relational data, they’re all going to be looking at joining various things together based on their primary and foreign keys. So you build the smart indexes that you want to build on primary keys, maybe some secondary indexes on foreign keys. 
Your ability then to deploy increasingly complicated queries that do these are predictable joins you might do between all of these keys is greatly improved, basically, your additional cost of your next query, you don’t have to reflow the entire data set, you don’t have to build a whole bunch of private state in each of your operators. Very handy. And like, if you want to sort of run the leanest thing of you know, something that actually is up and running and isn’t going to fall over putting all one machine that can share these indexes share the memory Central, that’s usually the scarce resource memory. Makes sense. But you might not want it, you might want to start to shave up all of the work that you need to do there into smaller bits and pieces, for reasons of Yeah, isolation, performance, isolation, or fault isolation. Like, here’s a classic example, which is, you run an org, there’s a bunch of people, a bunch of analysts on your team, someone’s in charge of data ingestion, you pull in fairly raw, gross data from somewhere. It’s just JSON, it’s not even parsed out into the appropriate types, yet, some of the data are bad. So you’ve got a big chunky view, materialization task, which is actually there’s a view that I defined on this, which cleans up the data for me, cleaning up the data, takes some resources, you write it back out as a materialized view, into materialism, it’s now available for all sorts of other people to show up and say, Oh, it’s amazing, I’d love to just pick that up, whatever you wrote out, I’m gonna pick that up and work with it. So five people now can essentially put themselves on the end of your pipeline in a way that, you know, if you had some other streaming system, you’d sort of have to copy paste the view ahead of time that those expensive you’d have five people doing exactly the same thing.
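A sketch of the ingest-and-clean pattern Frank describes; the table names and JSON fields are made up. One team pays the cost of cleaning once, and downstream users on the same cluster share both the view and its in-memory index.

```sql
-- Chunky cleanup task: parse raw JSON into typed columns, drop bad rows.
CREATE MATERIALIZED VIEW clean_orders AS
  SELECT (data ->> 'id')::bigint       AS id,
         (data ->> 'customer')::bigint AS customer_id,
         (data ->> 'amount')::numeric  AS amount
  FROM raw_orders
  WHERE data ->> 'id' IS NOT NULL;

-- Index the cleaned data on the key everyone joins on. Later queries
-- on this cluster reuse the index at little incremental memory cost.
CREATE INDEX clean_orders_by_customer ON clean_orders (customer_id);
```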

Arjun Narayan 1:04:15
And this is how a lot of organizations work, right? There's some set of experts who build the canonical personas. That pipeline may be so critical to the company that you run it on clusters with a high replication factor, so it's extremely highly available. Then that gets sunk into a materialized view, the canonical personas v14 or whatever it is, evolved over time as they've enriched it and added more columns and things like that. And then there are plenty of downstream consumers: the machine learning team, the fraud detection team, the upsell team, and all these other teams. They don't want to be rebuilding the canonical-personas pipeline; they want to simply consume an always up-to-date set of canonical personas.

Kostas Pardalis 1:05:08
Alright, one last question about Materialize, and then I won't delve deeper into Naiad and the core technology behind Materialize. So my question for both of you is: what are, let's say, not the most interesting use cases necessarily, but the ones that help people decide, based on what they have, whether now is a good time to go and give Materialize a try? What are the use cases we should have in mind as a great example for going to Materialize right now and figuring out the value of the product as soon as possible?

Frank McSherry 1:05:59
We can probably each give an answer. The ones that I see are people who have already realized, you know what I would do if I could get this data faster. Folks are already chafing against the fact that it takes 15 minutes to refresh their data: I just can't use this for interactive experiences, or something like that. And so this is maybe too easy an answer, but clearly, if you're already sitting on an example use case where, if the data were fresh to within the second, you could turn it around to substantially greater purpose than if you got the roll-up at the end of the day. The use cases for the roll-ups at the end of the day, for sure, they exist. But there are these new classes of applications for folks, data apps basically, where someone has just shown up and you want the ability to go and grab the current, up-to-date answer and show it to them. If you want to do that without 10,000 lines of microservice crud that you have to build and maintain, that's a good time, in my mind, to think about what that would look like in SQL. Could I just write that? And if you can, amazing, it just works straight up.

Arjun Narayan 1:07:12
Yeah, I was gonna say: if you are beginning this journey of building a microservice, I think it was Josh Wills who came up with the joke that your microservice should have been a SQL query. And yes, you should certainly start that way. Not to dig too much on microservices, but I do think 90% of microservices can be SQL queries. I don't think it's 100%, but it also frees up your team to focus on the 10% that truly is so differentiated in capability that it requires its own bespoke code.

Kostas Pardalis 1:07:47
Okay, okay. So anyone who's listening to the show, go and try Materialize in the cloud, or download it and play around with it if you haven't. For me, it was a refreshing and interesting experience, even if I didn't have a use case at that point. Seeing how you can interact with data with Materialize, I think it's very interesting, and it can help you identify the use cases that you have. So just go and try it.

Frank McSherry 1:08:31
users there’s some fun, fun load generator sources that come with that generate various auction data or dpch data. Even if like, initially, one of the big challenges actually was getting people on board with their streaming data. They maybe had an idea of what to do was like, operationally, how do I but there’s a bunch of realistic ish looking data. So you could ask, like, let’s see if I can put together a little mini website that I’m actually building over here. But getting to the point of can I prototype? Part of that in sequel is a lot easier now than it used to be just, you know, some free canned streaming data to to play with the

Kostas Pardalis 1:09:08
100%. I think we need at least one episode where we just talk about that stuff: how you get data to try something out. It's such a huge mess out there. But anyway, I can complain about it forever, so let's keep it offline.

Eric Dodds 1:09:26
That was awesome. Of course, as we said in the episode, we're going to break it into two, so let's close out the first half here. We can at least give that to Brooks for how much pain we've caused him. One of my big takeaways from Frank and Arjun's origin story is how fundamental the motivation was for each of them. They came at it from different directions. Frank had very strong convictions about what he had studied and the projects that he was working on; I mean, he was building it in Rust, and there were problems there, obviously, that Arjun had highlighted. But it was an area of passion for him. And the way that Arjun described it was: I believe these things can be truly helpful to the world if we build this technology in a way that makes it accessible to people. Hearing how that shared conviction at a really root level drew them together, and especially drew Frank to a place where he wanted to be involved in a business where before he didn't, was just such a compelling story to hear. I really appreciated the deep thought and the time it took for them to work through that. And of course, as a result, they're building something really amazing in Materialize.

Kostas Pardalis 1:10:52
Yeah, I'm very happy to say that it's very exciting to hear their story, and I also think it speaks to a very interesting truth. Okay, some people might even say it sounds a little bit romantic, let's say. But it's amazing to hear from a person who already has a huge impact in the scientific community. I mean, if someone goes and searches for Frank's name and sees just the sheer number of citations from his academic work, that would be enough for many people to say, okay, I am done with contributing to society. But hearing about the dialogue that he had with Arjun, that if you want what you're building to have the maximum possible impact out there, the best way to do that is by building a product and a company and getting it into the market, I think that was probably one of the best things that I would keep from this conversation. And it's also, I think, the foundation and the teaser for the next part of our conversation, where we will hear from Frank and Arjun again. I think it's even more important to hear Frank saying how much of a distance someone has to go to take the technology and turn it into a product that can be used by many different people.

Kostas Pardalis 1:12:43
So let's keep it here, because I don't want to give too many spoilers for the next part of our conversation. But, yeah, hopefully we leave people with the right kind of hunger, like, you know, the best shows out there.

Eric Dodds 1:12:58
Yes, man, you’re starting to sound like a marketer. I’m worried. Thanks for listening. Thanks for listening to The Data Stack Show. Definitely check out part two of this one, you don’t want to miss it. We dig into a ton of technical details, and learn all about timely data flow, SQL dialects, etc. and hear from some of the smartest brains in the industry solving these problems. Catch in the next one. We hope you enjoyed this episode of The Data Stack Show. Be sure to subscribe on your favorite podcast app to get notified about new episodes every week. We’d also love your feedback. You can email me, Eric Dodds, at eric@datastackshow.com. That’s E-R-I-C at datastackshow.com. The show is brought to you by RudderStack, the CDP for developers. Learn how to build a CDP on your data warehouse at RudderStack.com.