This week on The Data Stack Show, Eric and Kostas welcome back Arjun Narayan and Frank McSherry, Co-Founders of Materialize, for part two of this great conversation. During the episode, Arjun and Frank dig into the technical details of Materialize including data flow, what it takes to build a model for users to interact with, incremental computation, and more.
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Eric Dodds 00:03
Welcome to The Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You’ll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, the CDP for developers. You can learn more at RudderStack.com. Welcome back to The Data Stack Show. This is part two of our long conversation with Frank and Arjun from Materialize. Brooks was out when we recorded this one, so we went long, over 90 minutes, and Brooks made us split it into two episodes. In the first episode, which you absolutely need to go back and listen to if you haven’t, we heard about the backstory of Materialize, and actually the individual backstories of Frank, who has an incredible history building all sorts of interesting things and an academic paper with an unreal number of citations, and of Arjun, who was studying databases at the PhD level, and how they came together. It was an amazing conversation, so definitely check that one out. In this episode, we dig into the technical details. So Kostas, give us a little teaser of what we tackle in part two.
Kostas Pardalis 01:23
Oh, hold on. So we are going to get deeper into what timely dataflow is. By the way, we also have different flavors: we have to differentiate between differential and timely dataflow, and we’ll get into that too. We will understand and learn more about why Frank got into building this, what the relationship with MapReduce is, and also what it takes to go from a model that can do, theoretically at least, some amazing things, to the point where it can actually be used by users. So it’s going to be super, super interesting, much more technical than the previous part. I don’t want to say more; let’s just let the experts talk, right? Buckle up. Let’s dive in. Alright, let’s talk about Naiad, and let’s talk about the lower-level stuff. I heard Arjun mentioning two types of dataflow you’ve worked on: differential and timely.
Frank McSherry 02:46
Yes. Good point.
Kostas Pardalis 02:48
Why do we have two terms here? What’s the difference?
Frank McSherry 02:50
That’s a good question. So dataflow, first of all, just to get folks on the same page, is the idea that you might describe your computer program, or what you need to do, as passing data through various places where you’re going to do some work. This is sort of like an assembly line for building things, going back a hundred years, except now it’s the data that moves around. As data shows up at a particular place, you say: as data comes here, I need to go and canonicalize it in the following way, or I need to do a join with some other data on everything that I receive. It’s a way of describing your program, usually as a directed graph with little arrows and circles, so that you get the answers out that you want, but you’re not too prescriptive about exactly what the computer has to go and do at a particular moment, which lets the system spill that work across lots of different computers. There are two flavors here; lots of things get called dataflow, and that’s fine, but we had two of them: timely dataflow and differential dataflow. The right way to think about them, or a way to think about them, is that timely dataflow is sort of analogous to an operating system, and differential dataflow is more analogous to a database. It’s a terrible analogy, but I’m gonna finish it anyhow, real quick. Timely dataflow is the sort of layer that is, I would say, unopinionated about what you’re planning on doing with the data moving around. It just says: I will move data from here to there, amazing; you want to run this little bit of code over there? I will do that for you. Why are you doing this? I have no idea, but I will make sure that it happens. It’s similar to an operating system: you want to run a program? Great. What’s it going to do? We’ll find out. A database has a lot more opinions and says: before you get to run anything, you have to get past me first, I’ve got some opinions on what you’re about to run, and I also know what the correct answer is gonna be; I’m not just gonna let you go and make a mess of things. And this is where differential dataflow differs from timely dataflow. It says: I believe that you’re talking about collections of data, I believe that you’re gonna communicate how those collections of data change, and the only thing that I’m going to do is tell you how the answers to your operations change in response to the input data. You can do some crazier stuff in timely dataflow; differential dataflow is limiting, but liberating, by saying: I’m only gonna help you do this part, but we’re gonna do it really well.
Arjun Narayan 05:11
I have, I think, a restatement that is simpler. Timely dataflow is a generic dataflow system. I like the assembly line analogy: you create a directed graph of operators, but in timely dataflow you can have arbitrary operators that you write from scratch, the do-a-thing-on-the-doodad recombinator. What is that? I don’t know; it’s a black box, stuff goes in, doodads come out the other side. Great. You can write a whole variety of these, and some people do, and that’s great. Differential dataflow is simply a set of elegantly written, opinionated operators that we believe, or Frank believes, or the differential dataflow people believe, you might want. One of them, for instance, is called join; you might be interested in that one. One of them is called reduce, I think there’s a reduce. These are familiar operators. They are also opinionated about the shapes of their inputs and their outputs: they believe in timestamped diffs of data, so the inputs are very different. You could imagine MapReduce as a directed graph over batches of data, where you give it data and it gives you output data; differential dataflow deals in diffs of data. And of course, a view of the data with no diffs is just: start from zero, here’s all the data. So it’s sort of a generalization of batch compute, and a lot of care and thought has been put into very performant implementations of those operators. It’s a library that uses timely dataflow underneath; timely dataflow is the underlying execution engine, and differential dataflow is a bunch of opinionated implementations of dataflow operators that is still surprisingly general and useful. On top of that you could put another layer, which is SQL: I’m going to take this SQL statement and convert it into a differential dataflow program. That is in fact what Materialize does, except Materialize sits one layer even above that, which is: I am going to run many timely dataflow computers for you. Every time you type the words CREATE CLUSTER, I will create another timely-dataflow-shaped box. And then every time you say CREATE VIEW, or CREATE MATERIALIZED VIEW, or a SELECT statement that requires doing some computation, I am going to translate that into plans, optimize it, do a bunch of transformations, and then come up with a differential dataflow program, which then gets installed, run to completion and turned off, or run continuously and kept running, on that timely dataflow cluster.
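To make that layering concrete, here is a minimal sketch of what a differential dataflow program looks like, modeled loosely on the examples in the public differential-dataflow repository: timely dataflow executes it, while the opinionated operators (map, join, inspect) work over collections that change. The (manager, employee) data and the names used are purely illustrative, and exact API details may differ across library versions.

```rust
extern crate timely;
extern crate differential_dataflow;

use differential_dataflow::input::InputSession;
use differential_dataflow::operators::Join;

fn main() {
    // Timely dataflow hosts the computation; differential dataflow supplies
    // the opinionated operators over collections that change over time.
    timely::execute_from_args(std::env::args(), move |worker| {
        // A handle through which we feed timestamped diffs of data.
        let mut input = InputSession::new();

        worker.dataflow(|scope| {
            // An illustrative (manager, employee) collection.
            let manages = input.to_collection(scope);

            // Skip-level pairs: if m2 manages m1 and m1 manages p,
            // report (m1, (m2, p)) using the opinionated `join` operator.
            manages
                .map(|(m2, m1)| (m1, m2))
                .join(&manages)
                .inspect(|x| println!("observed change: {:?}", x));
        });

        // Load some data at time 0 ...
        input.advance_to(0u32);
        for person in 0 .. 10u32 {
            input.insert((person / 2, person));
        }

        // ... then change it; only the affected diffs flow through the graph.
        input.advance_to(1);
        input.remove((2, 5));
        input.insert((3, 5));
        input.flush();
    })
    .expect("computation terminated abnormally");
}
```

The output of `inspect` arrives as (record, time, diff) triples, with +1 for additions and -1 for retractions, which is the “timestamped diffs of data” shape Arjun describes; a SQL layer like the one he mentions would emit a program of this kind rather than ask a person to write it.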
Kostas Pardalis 08:00
Alright, so differential dataflow is built on top of timely dataflow, right, and timely dataflow is much more generic, like an operating system, as you said, Frank. So what can it express, exactly? I don’t know if that’s the right way to put it, but are there, like, limits to the things that I can do with timely dataflow, in terms of what I can compute?
Frank McSherry 08:28
Sure. Let me say there are two answers. One is: yes, there are some limits, absolutely. The other answer is: no, there are no limits. I’ll try to explain. Timely dataflow forces you to write your programs in a certain way, and those ways tie your hands a little bit, and sometimes that might be frustrating. It doesn’t compel you to write structured programs; you can make a dataflow graph that’s just a little self-loop and say, I’m gonna do whatever I want, because I just send data back to myself, screw you. It’s not very helpful when you do that. When you express a computation as a dataflow, you get some cool abilities from the system; the system is actually more helpful to you at that point, because we can start to distribute work once you’ve broken things apart into different little pieces. You could have always written whatever you wanted as one monolithic timely dataflow operator; it just doesn’t really benefit from expressing stuff as dataflow. But as soon as you break it apart and describe these interoperating pieces, you start to get some benefits: you get concurrency and parallelism, all sorts of stuff like that. Then there’s the “yes, you can do everything” answer, which isn’t flip either, which is that the thing Naiad and timely dataflow added on top of existing systems was loops. Loops were sort of the thing that was missing from big data systems to make them fully general. There are various models of computation, the CPU-and-RAM model, the parallel RAM model, where you need three fundamental things: you need to be able to read from memory, you need to be able to write back to memory, and you need to be able to go over and over again based on what you see. It turns out, when you scratch your head and turn it sideways enough, joins are reads: if you join two things together, you’re saying, hey, go find me the stuff that has this address, let’s call it the key, go look up some stuff. And reduce is the write: that’s the thing that says, we’ve got a bunch of folks who think they belong at a particular address, let’s go figure out what the right answer there is. Once you get loops put in there, you have the ability to write general programs: you can take an algorithm off the shelf and ask, how would I write this in timely dataflow, or in differential dataflow? And many of differential dataflow’s advantages, the reasons it goes fast and beats up on people, come from the fact that you can just use a smarter algorithm for the same problem. There are a bunch of dumb algorithms for problems; people know they’re dumb, but they fit in MapReduce, and you spend ten times more on computers than you really need to, but that’s fine, because you rented a hundred times as much. With Naiad, and in the interim, the cool thing we were able to do was use the smart algorithms and just be more performant, just do less work. Not because of our system building, but because you could transport intelligent ideas that other people came up with. We’re not inventing these algorithms; we’re just transporting existing, known algorithms into the big data space. I’m not aware of fundamental limitations; I’m sure they exist, and I’m sure once you put this online there will be a list of things people point out.
But it’s definitely a quantum step up over the MapReduce-style models, which did not have loops, which are just straight-line dataflow graphs.
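To make the point about loops concrete, here is a hedged sketch of graph reachability, a computation that needs iteration, written with differential dataflow’s `iterate` operator and modeled on the graph examples that ship with the library. The function and collection names are illustrative, it is meant to slot into a dataflow like the one sketched earlier rather than stand alone, and exact trait bounds may vary between versions.

```rust
use timely::dataflow::Scope;

use differential_dataflow::Collection;
use differential_dataflow::lattice::Lattice;
use differential_dataflow::operators::{Iterate, Join, Threshold};

/// Nodes reachable from `roots` by following `edges`, computed by iterating
/// to a fixed point. Because the loop lives inside the dataflow itself, the
/// result is maintained incrementally as `roots` or `edges` change.
fn reachable<G>(
    roots: &Collection<G, u32>,
    edges: &Collection<G, (u32, u32)>,
) -> Collection<G, u32>
where
    G: Scope,
    G::Timestamp: Lattice + Ord,
{
    roots.iterate(|reach| {
        // Bring the outer collections into the loop's nested scope.
        let edges = edges.enter(&reach.scope());
        let roots = roots.enter(&reach.scope());

        // One step: follow an edge out of every currently reachable node,
        // add the roots back in, and deduplicate before the next round.
        edges
            .semijoin(reach)
            .map(|(_src, dst)| dst)
            .concat(&roots)
            .distinct()
    })
}
```

Loosely speaking, this has the read-write-loop shape Frank describes: the semijoin plays the role of the read, the distinct consolidates what lives at each key, and `iterate` supplies the loop that straight-line MapReduce-style dataflow graphs lack.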
Kostas Pardalis 11:58
Yeah. That’s interesting. So question here.
Eric Dodds 12:04
So you said
Kostas Pardalis 12:05
Okay, so with differential dataflow I have some operators there that I can use, right? And I can, like, have a monolithic dataflow, which, okay, will execute fine. But the real value comes from, yeah, I mean, obviously you want to parallelize that so you can scale, right? And as a developer, you want to use the primitives and not have to worry about how this thing is going to be parallelized, or whether the parallelization is going to be consistent and sound. And then, are there limitations in terms of the operators? Like, is there an operator that I cannot build, or a dataflow that does something that cannot be parallelized?
Frank McSherry 12:51
So this is a great question for me. Let me actually back up just a moment, because you said you use this language so that you can parallelize, and it’s actually more complicated, or better, than that. Not only do you get to parallelize, which is why you would use MapReduce or Spark or so on; the reason differential dataflow wants you to write things this way is because these programs automatically incrementalize as well. So all of this parallelism that you got: let’s imagine that you spread the work out across ten computers, or even a million; you don’t have a million computers, but let’s pretend. If the input to only one of them changes, you only need to redo the work in that one location. The real advantage, actually, in my mind, of differential dataflow is that by writing your program this way, for the same reasons that it parallelizes, it happens to incrementalize as well. These operators that we force you to use, joins and reduces, maps, filters, stuff like that, trick you into writing your program in an automatically incrementalizable form. You can always write something cruddy: you can write a reduce function that says there’s only one key, “true” or something like that, please give me all of my gigabytes of data, I’ll run the function on it, and we’ll see what happens. You can write that in differential dataflow, and you will, unfortunately, be disappointed to find that if any of your input gigabytes change, we will show you the gigabytes again, slightly changed, and ask: what’s the answer now? Because we don’t know what you’re gonna do with it; you might be computing a hash of the whole thing, in which case the answer is totally different, and we really can’t help you out. If, on the other hand, you were to say, well, I’m computing a Merkle tree or something like that, what I really want to do is break my data apart into a bunch of different pieces, hash each of the pieces, put those hashes together, and then get an answer out at the end, then if any one bit of data changes, we only need to reflow the changes to the hashes through the tree, and you have an efficiently updatable thing. You can write the cruddy version as well; you just won’t be delighted either by its parallelization or by its incrementalization.
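Frank’s contrast between the cruddy reduce and the incrementalizable one can be sketched in code. The example below is illustrative rather than anything from the Materialize codebase: it assumes differential dataflow’s documented `reduce` closure shape (key, input values with multiplicities, output), and keying by a chunk id is just a stand-in for the first level of the Merkle-tree idea.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

use timely::dataflow::Scope;

use differential_dataflow::Collection;
use differential_dataflow::lattice::Lattice;
use differential_dataflow::operators::Reduce;

/// Hash an iterator of records into a single digest.
fn digest<T: Hash>(items: impl IntoIterator<Item = T>) -> u64 {
    let mut hasher = DefaultHasher::new();
    for item in items {
        item.hash(&mut hasher);
    }
    hasher.finish()
}

/// Cruddy version: every record lives under a single key, so a change to any
/// one record hands the closure the entire collection again.
fn hash_everything<G>(records: &Collection<G, String>) -> Collection<G, ((), u64)>
where
    G: Scope,
    G::Timestamp: Lattice + Ord,
{
    records
        .map(|rec| ((), rec))
        .reduce(|_single_key, input, output| {
            // `input` is every record with its multiplicity; re-hash all of it.
            output.push((digest(input.iter().map(|(rec, _count)| *rec)), 1));
        })
}

/// Friendlier version: records are keyed by a chunk id, so a change re-runs
/// the closure only for the chunk that changed. Combining the per-chunk
/// digests in a second reduce would complete the Merkle-tree picture.
fn hash_by_chunk<G>(records: &Collection<G, (u64, String)>) -> Collection<G, (u64, u64)>
where
    G: Scope,
    G::Timestamp: Lattice + Ord,
{
    records.reduce(|_chunk_id, input, output| {
        output.push((digest(input.iter().map(|(rec, _count)| *rec)), 1));
    })
}
```

Both produce digests, but the single-key version neither parallelizes nor incrementalizes well: any change hands the closure all the data again, which is the “we will show you the gigabytes again” penalty Frank warns about, while the chunked version only re-hashes the chunk that changed.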
Kostas Pardalis 14:47
Okay, well, that’s great. And my understanding, correct me if I’m wrong, is that these operators we’re talking about, the ones implemented as part of differential dataflow, feel a little bit more, let’s say, focused on processing data, right? Like with join, or MapReduce, we’re thinking about datasets and trying to run some aggregations on top of them, all that stuff. Are there other kinds of, let’s say, operators out there that have been incrementalized that don’t only have to do with aggregations and joins and the stuff that we usually do with data?
Frank McSherry 15:30
For sure. In the space of incremental computation, there are different approaches to how you might go and try to coax someone into an incremental program, how you might elicit one from them. Differential dataflow uses a technique called change propagation, which basically says: we see what the program is, you change your data, and we see what happens. It’s very data-centric; it’s about moving the data through the computation and seeing what happens differently. There are other approaches based, for example, on memoization. You have things like Matthew Hammer’s Adapton; he has a few different approaches, in different languages, that are based more on memoizing and incrementalizing control flow. So if you have a program that has a lot more ifs and elses and whiles and whatnot (WHEREs, I guess; I’ve been writing SQL too much), those are going to respond much better to that. One of the downsides of a dataflow program, generally, is that the dataflow graph is locked down: you write it, and that’s what happens to the data. You don’t get to decide halfway through the execution that it really should look different. If you have two things you want to do and you need to choose between them, you write both of them; you have a little switch node up front, but you have to write both. That’s super gross if there are 100,000 different ways that you could do the work, and really handy if there are five ways to do the work over 100,000 bits of data. So those other systems are going to be much more appropriate for control-flow-heavy stuff; it just turns out that data processing is pretty popular at the moment, so that’s the focus.
Kostas Pardalis 17:13
Yeah, makes sense. What other areas do you see this kind of incremental computation having an impact in today? Or where do you think we’re going to see more of it happening?
Frank McSherry 17:23
There’s a bunch. I mean, these are application areas, and Arjun, you could jump in or pile on here. SDN is one, sorry, software-defined networking, where you use logic to describe where in the world little bits of packets should go. I might have just stolen Arjun’s thunder, by the way, who knows this better than I do.
Arjun Narayan 17:45
So I was actually looking up publicly citable sources, so that I could check whether I was allowed to speak about it.
Frank McSherry 17:56
Yes, yes, sorry. So VMware is happily using differential dataflow as well, in prototype, for various software-defined networking work, where your goal is to describe the configuration state of networks: where packets need to go, from A to B or wherever. It’s not necessarily super data-intensive, it’s their control plane, but you really want the property that as soon as something changes, as fast as possible, no joke, you get to the right new answer. And no glitches either.
Arjun Narayan 18:37
A good way to think about it is: when a VM is moved, the host networking address has changed, and you want to precisely cut over all the streams of TCP packets that were going to the old hardware host so they go to the new hardware host. And you don’t want to drop or duplicate any packets. TCP would be fine, because the layer above you would fix the errors, but these may not be TCP packets, maybe they’re UDP packets, so you want the control plane to do that Indiana Jones swap perfectly.
Frank McSherry 19:12
Does that make sense? Yep. There are plenty of other places. There are lots of applications, especially now, in things adjacent to data, actually in the heart of data but maybe one level up, where you have all sorts of machine learning, various serving tasks, and stuff like that. Machine learning, I think, is actually another example of a different way to do incremental stuff: a lot of machine learning is based on stirring a pot for a while until you get the answers, and if the data changes, great, stir some more. Sorry, funny mental image, but the idea there is that your model is forgiving, in the sense that whatever data you put in, you’ll get to the right answer, so it’s totally fine to throw in a little bit more data and keep going. It’s a different flavor of incrementalization. There’s also a whole bunch of incremental work going on in things like parsing: if you have your 10,000-line source file open and you change a curly brace somewhere, you don’t want to rescan the entire file and rerun everything just to keep up. There are some bits of differential dataflow that were used, I think not anymore, but were used, in Rust’s type checking internally, for example, trying to determine whether someone wrote a valid program or not. Being incremental is kind of handy when you’re re-analyzing an entire code base, not just in the compiler but also in things like lint, just rechecking your code base. A lot of organizations are essentially CI-bound, right? You can’t land the next bit of code until 30 minutes have gone by while someone re-analyzes all of your stuff and reruns a bunch of random nonsense. If you can turn that into one minute instead of 30, that’s a great feeling.
Kostas Pardalis 21:01
I have a question about Rust, because I know that you have also contributed some stuff there, like for the compiler?
Frank McSherry 21:11
Kind of, but not as much as you might think.
Kostas Pardalis 21:15
But I think it’s very interesting, and it’s really interesting because I think it’s important for people to understand how general these architectures, these models of computation, are. We’re talking about data here, but bringing in an example from something that might feel alien enough to data, like a compiler, and using a similar technique to do something there, I think makes people understand the expressivity of these things we’re talking about.
Arjun Narayan 21:45
I think the counter-position, because one of the jokes I like to make here is that we will be successful when we have users who are delighted by Materialize but all they know about it is: me have SQL, my SQL slow, me use Materialize, my SQL go fast. And that’s important, because, again, back to academia: you have to earn the right to take up the user’s time, to get them to care to understand all of this stuff that’s below the waterline of the iceberg. This stuff is important, and we have to sweat the details, but by no means can your pitch to the user be that they’ll get all these wonderful, deep compiler techniques. It’s not that people are dumb, it’s that people are busy. They have business problems, they don’t have enough time. You have to approach them in the data stack that they have, with the queries that they already have, and say: hey, in five minutes you can auto-incrementalize this dbt model and have it be real-time. And then they’re paying attention: how did you do that? I might be interested in doing more things like this. And that’s a good time to start talking about some of the things that we’ve been talking about today.
Frank McSherry 22:58
Yeah, and at this point, it’s definitely great if you start out showing someone: I can keep your counts up to date really fast. That’s cool, and maybe eye-catching, but the next step is certainly: alright, I’m not just going to do counts anymore, I’m going to make it a little harder. Whatever it takes to build the confidence with people that, actually, the horizon for how much you could potentially do with this is quite large. One of the things that we’ve not yet put into Materialize, because we’re busy, is recursive computation. I think no one else out there is prepared to put recursive SQL in, especially maintained incrementally; I won’t say it’s easy, that’s wrong, but the compute plane is a hundred percent prepared for that. And it’s, in many ways, nice to know that Materialize isn’t going to be out of date in a year or two when people realize that they could benefit from some recursive rules, because, like, all the software-defined networking stuff uses Datalog and has recursion in it. Does that mean you won’t be able to use Materialize wherever your application takes you next? Fortunately not; it’s broadly expressive.
Kostas Pardalis 24:12
Yeah, so one question about that. Okay, I’ll ask the question like the Neanderthal that I am here, someone who just wants things to be easy. So you built a library, right, Frank? You built something there, and I’m saying that because part of the conversation in the beginning with Arjun was: yeah, it’s cool, you built this thing over there, and some people will probably use it, but from that, to making it accessible to everyone out there, there are things that need to happen, like SQL, for example, right? It’s something more people speak. So what’s your experience of that? You built the library with a very specific mindset, from wherever you were coming from, and then you started seeing the steps and the things that need to be built on top of it to make it even more accessible. How different is it, how much work is needed, and how many people does it take to find the right way to do that?
Frank McSherry 25:31
There’s a big difference, was my conclusion, between building a library and building a product. The library got built, certainly with the help of colleagues I’ve had throughout the years, but I would say timely dataflow and differential dataflow put together are about 15,000 lines of code or something like that; they’re not large. My experience has been that when you build libraries, one of the things that’s valuable is your opinion: you get to tell people what the rules are when they show up, you get to tell people, here’s how to correctly use the thing that I built, and if I think what you’re trying to do is dumb, I’ll find some way to rule it out, because I think it’s not going to work out well for you. When you’re building a product, you have to do quite the opposite: people are going to come to you and tell you, here’s what I’m planning on doing, and if you want to do business, you need to make sure that’s accommodated. I would love to delete various parts of the SQL spec, because I think they’re misfeatures. I’m not allowed to do that. I have been dragged to the position that I’m not allowed to do this, and that I need to instead figure out how to interpret the weirdest things that people write down in SQL and turn them into meaningful computation that behaves itself. That’s not easy; there are plenty of other people in the org who are better at that than I am, and it’s an interesting technical challenge to figure out that translation, cutting the ideas here into more pre-chewed and easy-to-use packets. But yeah, very different experiences. One of them, the library, is very academia, very inward-focused: I’m going to build a thing that I know how to use, and it works great for me. The other is a transition to a more outward focus: how do I make a thing that brings what we have that’s cool to as many people as possible?
Kostas Pardalis 27:24
All right, I have monopolized the conversation, and we are way over our time, but I think it would have been a shame to stop, because it was super, super interesting and I learned loads.
Eric Dodds 27:45
Well, I get to make the rules, and y’all are awesome, so I’m super excited about that, assuming Brooks doesn’t quit when we send him this file. One question has really intrigued me throughout this conversation, and, I mean, wow, I’ve learned so much, but one thing that both of you continually bring up is empathy. It’s very clear, in the way that both of you describe even very deeply technical concepts, that you have a very high level of empathy. Both of you use the word “delightful” a lot, and you’re very descriptive in describing experiences. I’m so interested in where that comes from, because you’re very aligned on it, and I think it’s actually very rare, especially when discussing deeply technical topics, to have delight as such a foundational value. I’ve heard it throughout the last 90 minutes repeatedly, so I’d just love to know where that comes from, and maybe, for our listeners, have you learned anything about how to develop or maintain that focus?
Frank McSherry 29:05
Oh, to be totally honest, I have, I think, an intellectual appreciation for empathy, and I try to practice it, but yeah, it’s not where things started for me. I think if you have a variety of experiences, like I did, going from being in academia, to being unemployed, to eventually doing this, one of the things that was sort of cool about that was getting to bump into a whole bunch of different people doing different things, with different levels of background. Going from talking with academics to talking with people who are smart but really quite busy, and who are being asked to do dumb things that you agree are dumb, you realize: wow, not everyone has had the same experiences that I had. And then you have some of those experiences yourself, with a bunch of PRs that people file against your library. Just having access to a broader and broader, if you can manage it, variety of experiences in life definitely hammers home how many different people are coming from different places, and what’s actually worth doing to make as many of those people happy as you can. That’s a large part of it, I think.
Arjun Narayan 30:19
I forget where I heard this framing; it’s not original to me, it comes from somewhere, I’m just forgetting where. But the thing I tell myself is: imagine you’re sitting down with some people, and they’re incredible. You have to not think about it as dumbing down your contributions because the audience isn’t smart enough. I think a lot of people make the mistake of trying to dumb things down, and it’s not about dumbing things down. Imagine you’re sitting down with a bunch of incredibly intelligent folks who have been so absolutely swamped that they have had no time to think about your problem; they’re fully capable of understanding it. Let’s say you’ve got three Nobel laureates, in biology, chemistry, and physics, in front of you. They are very busy people, because they are consumed with very hard problems, and that is what they think about every single day. Now you have a shared problem, and you have to explain it to them. Again, it’s not that they’re not smart enough, it’s that they have devoted zero minutes or seconds to it. How would you explain things? I think that goes a very long way toward setting a tone, which is: you never really talk down. You educate, because people are busy. And that’s exactly the case in the data ecosystem. Most people writing SQL queries have shit to do, which is why they’re writing the SQL queries. We can nerd out a lot about SQL and query languages and microservices, but you will lose your audience, not because they can’t handle it, but because you need to first give them an experience where they’re getting value, and ideally in a way where they don’t need to dig through all of the various ETL. They might have to get into one or two specifics if it pertains to the specific business problem in front of them, but if you start from the premise that they first have to wrap their heads around your entire field before they can make progress in their own field, then I think you’re pretty doomed.
Eric Dodds 32:28
Wonderful advice to end on. Thank you again for giving us so much of your time; this will be our first double episode, which I’m super excited about, and we’ll definitely have you back on again in the future to hear even more about what you’re building. So thank you again.
Arjun Narayan 32:47
Thank you very much.
Frank McSherry 32:48
It was really fun.
Eric Dodds 32:50
Kostas, that whole conversation, I know we released it in two parts, but it was over 90 minutes, and it really felt more like 20 minutes, I would say. It was just such an enjoyable episode; doing this for two years, it’s definitely going to be one of the ones, I think, that sticks out. My big takeaway from the conversation is actually something that we discussed right at the very end. It was remarkable to me how both Frank and Arjun independently, and I think authentically independently, because they were talking about very different things, used the word “delightful.” When you’re talking about heavy-duty technology, building on timely dataflow, streaming SQL, and all of the crazy stuff we talked about, “delightful” is not a word that you would expect. It gave me so much respect for the way that they think about the people using the technology they’re building, and how they keep that at the forefront, even in the face of some really heavy-duty technology that’s doing really cool stuff. And that, to me, was a personal lesson and a reminder that it’s such a key ingredient of building something truly great.
Kostas Pardalis 34:27
Yeah, and something that I noticed in both parts of the conversation, and I think Frank mentioned it numerous times, is how many different people, with different skills, are needed to turn, let’s say, a scientific paper into a product that everyone can go and use and get value out of. Many times we hear in the news about scientific breakthroughs, usually in other fields, we don’t hear it that much in computer science, and people think, oh, okay, this has been achieved, so all of a sudden we’re going to have infinite energy and we’ll be flying to other galaxies and stuff like that. But the truth is that from the point where something has been demonstrated for the first time, or described or proposed, to the point where it can be used by everyone out there, it takes a lot of human effort, a lot. And building a company is exactly that: bringing all these different people together to do it. Even marketing people.
Frank McSherry 36:06
They didn’t get it.
Eric Dodds 36:09
Yes, even marketing people. I couldn’t have said it better myself. No, I think you’re totally right. And I would say we got a full end-to-end picture of not only what it takes to get the technology itself to a place where end users can use it, but also a really good look at how you build a team that can actually do that work. So what a special conversation. We’ll take the wheel from Brooks more often. On behalf of our listeners, we will catch you on the next one. We hope you enjoyed this episode of The Data Stack Show. Be sure to subscribe to your favorite podcast app to get notified about new episodes every week. We’d also love your feedback. You can email me, Eric Dodds, at eric@datastackshow.com. That’s E-R-I-C at datastackshow.com. The show is brought to you by RudderStack, the CDP for developers. Learn how to build a CDP on your data warehouse at RudderStack.com.