Episode 156:

Simple, Performant, Cost-effective Data Streaming with Alex Gallego of Redpanda Data

September 20, 2023

This week on The Data Stack Show, Eric and Kostas chat with Alex Gallego, the Founder & CEO of Redpanda Data. During the episode, the group discusses the importance of Kafka in data infrastructure and the lack of competition in the market. Alex shares his background in streaming systems and the development of Redpanda. The conversation also includes the challenges of operationalizing streaming systems and the evolution of storage systems, using WebAssembly as the interface for writing functions, the key differences between Redpanda and other streaming platforms, where the name Redpanda came from, and more.


Highlights from this week’s conversation include:

  • Alex’s background in the data space and the creation of Redpanda (4:23)
  • The cost and complexity of streaming (11:07)
  • The evolution of storage with Kafka (12:04)
  • The distinction between streaming technologies (15:10)
  • Simplicity as a Core Design Principle (27:03)
  • Cost Efficiency in a Cloud Native Era (30:44)
  • Removing complexity with Redpanda (34:21)
  • Migrations and compatibility with Redpanda (40:35)
  • The Future of Redpanda (43:44)
  • The Story Behind Redpanda (46:45)
  • Final thoughts and takeaways (50:25)


The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.


Eric Dodds 00:05
Welcome to The Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You’ll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, the CDP for developers. You can learn more at RudderStack.com. Welcome back to The Data Stack Show. Kostas, we get to talk about a topic — no pun intended — that I don’t know if we’ve dug deeply into on the show, and that’s Kafka. And specifically, we’re going to talk with Alex from Redpanda. And I’ll tell you, what’s interesting to me about Redpanda is that, as widely used as Kafka is, Confluent is really the only major successful commercialization of Kafka. But Redpanda is doing some really cool stuff. So I think we should get a Kafka 101, because we haven’t covered it in a ton of depth on the show, and then hear about what makes Redpanda unique as a way to manage Kafka. So that’s what’s interesting to me.

Kostas Pardalis 01:23
Yeah, 100%. I mean, as you said, I don’t think we’ve had a specific episode in the past about Kafka. We had a few about stream processing, although that’s a little bit different, because Kafka is not necessarily that much about processing; it’s more about resilient transport of data at any scale, which makes it a very important component in many infrastructures out there, at many different companies. And you’re right, outside of Confluent — and, okay, the big cloud providers with managed services — we haven’t seen anyone actually build something to go after this particular market. So having Redpanda out there is extremely interesting, and there are a couple of things to talk about. Why do that, right? Why don’t other people do that? Why don’t we see more competition in the space? And what does it mean to build something like Kafka now — I think Kafka was built at the beginning of the 2010s or something like that — so what do ten years of innovation in technology give us as tools to go and build something similar? Let’s see how it is different, what kind of different tooling it gives us compared to Kafka, and what it means to go after this market and build a company there. I think this is going to be super, super interesting. It’s a very hard technology to get right, and a very important one to get right. So let’s go and see what Alex has to say.

Eric Dodds 03:15
Let’s do it. And unfortunately, I actually just had something come up. So I’ll let you kick the call off, and then I’ll join if I can. If not, I’ll try to come back.

Kostas Pardalis 03:25
You don’t have to. Let me enjoy the conversation.

Eric Dodds 03:30
All right, I’ll see if I can make it back by the end. Okay,

Kostas Pardalis 03:33
Let’s do it. Hello, everyone, and welcome to another episode of The Data Stack Show. I am Kostas, and as you’ve probably already learned by now, when I do the introduction, it means that I’m going to be alone on the show, unfortunately. But we have an amazing guest today, so hopefully we will compensate for missing Eric. We have Alex Gallego. He’s the founder and CEO of Redpanda, and we’re going to talk about extremely exciting things today that have to do with Kafka, Redpanda, and the technology around them. So welcome, Alex, nice to have you here.

Alex Gallego 04:21
Thanks, Kostas. Thanks for having me. Good to be here.

Kostas Pardalis 04:23
So let’s start with a quick personal introduction. Tell us a little bit about yourself, because you’re not just the CEO and co-founder — you also wrote a lot of the code behind Redpanda, and these systems tend to be quite complex. So it would be awesome to hear about your background and your journey building Redpanda.

Alex Gallego 04:46
Thanks for asking. So I guess, by way of introduction, I’ve been working in streaming for almost 14 years, which is mind-blowing — that you could work on a single problem for so long and still find so much richness in it. I mean, I could probably build two or three more systems following up on Redpanda. So, anyway, I went to school largely trying to focus on cryptography, I ended up dropping out of a bunch of graduate programs, and I went to start building distributed systems because I just found them a little bit more fun than breaking things. This was early in my career. I went to work for an ad tech company in New York, where I first started working with, you know, the first couple of versions of ZooKeeper — in 2010, 2011, somewhere around there — and Kestrel, and then Kafka, and so on. So my journey started really early on. I ended up testing Storm, and those ideas led to me writing the code for the first startup that I sold, to Akamai; it was called Concord, and I authored the original first part of that engine, and then we built a small company around it. Concord was cool. It was a compute platform that was different from, frankly, most streaming platforms today: a container-based, single-threaded C++ execution engine — really more like a quasi-Envoy, you know, the C++ proxy, with language runtimes on top. It was pretty cool. We sold that company to Akamai in 2016. And so Redpanda came out of this deeply technical background where, as an engineer, I just couldn’t understand where performance was going. If you were looking at a couple of computers, I didn’t understand where the latency was coming from. The first ideas for Redpanda came about in 2017, when I took two edge computers and connected them back to back with a single QSFP cable.
No routers, no switches, just two computers and a cable. I just wanted to measure: what’s the gap between hardware and state-of-the-art software? Then I wrote something in C++ — really, mostly the core ideas were already there — to see what is actually possible, both in throughput and latency. I gave a talk about it around 2017. In my mind, I was like, you know, a couple of companies are working on this, they’ll figure it out, they’re really bright. And they didn’t work on those ideas. So I spent the next few years just trying to understand why. Basically, as an engineer, there’s no magic, right? If the job is to save data on disk, then the job is to save data on disk; there’s no two ways about it. So there’s that essential complexity. But when you look deep down, you learn that if you take a new approach, designed for modern hardware, you can get this categorical performance improvement. And with that came this whole design space. I was like, hey, if I were to start from scratch, what would I do differently? How would I think about architecting the next generation of streaming, and what would it mean for the engineer? Eventually, that gave birth to Redpanda — the company and product — which is kind of fun, given how it first started. We’ll talk about the naming later in the show. That’s how it started, and those were the roots of the technology.

Kostas Pardalis 08:13
All right, so I have a couple of historical questions, to be honest, because I have someone here who has not only been through the evolution of streaming systems but has also been part of that evolution. I remember back then, let’s say at the beginning of the 2010s, maybe a little bit earlier, there was a kind of explosion of systems that came out of places like Twitter, with Storm. We had Samza, we had Kafka, obviously, which became dominant for a while. My question is: of all the systems that appeared back then, most of them disappeared, right? They didn’t make it in the end — outside of, okay, Confluent, which made it to the public markets. But we didn’t see more happening there. Even with Samza — no, sorry, Flink — we now see the market getting interested again in that. What is the reason, in your opinion, that this happened in the streaming space — that out of all these products, we didn’t end up with more successes?

Alex Gallego 09:28
The Achilles heel of streaming has always been cost and complexity. For those of us in the room that had to trace — gosh, in 2011 I was reading Scala and Java stack traces from code deployed down onto a Clojure runtime, which was Storm, and when the Nimbus worker decided to stop, you’d get a stack trace that was the equivalent of 0xdeadbeef, right? You’re just like, I have no idea what this means. You’d end up debugging the transitive closures, and if you ended up importing some of the more sophisticated JVM libraries back in the day, like Algebird — thanks to Twitter for publishing some cool stuff — it was just gnarly, really. And here’s the thing: I think we should separate compute and storage. When I think about compute, I think about my previous company, Concord; I think about Apache Storm; I think about Flink; I think about new approaches today, like Bytewax and Materialize, and so on — it arrives in waves; there’s a huge host, there’s Decodable as a hosted service. So compute is its own layer, and storage is its own layer. And I think what was meaningful back then is that most of us didn’t have the scale of Twitter, but we were growing super fast. So we took what they were doing, because it was in the open and you could just clone it. That was both a blessing and a curse — a curse later, when you have to debug it, but a blessing because you could get started quickly, right? There was a blueprint, and you could do large-scale things. What most people didn’t realize is that the cost of operationalizing was inordinately expensive. It’s kind of like the promise of the Hadoop world: it materialized for, like, four companies in the world.
It was really hard to actually extract value out of these things. It became computationally expensive and manpower expensive: three people in the company knew how it worked, and once they left, you were like, I have no idea how this part of the system works. So cost and complexity have always been the Achilles heel of streaming. And two things have happened. One, managed services — Redpanda Cloud, Confluent Cloud, MSK, etc. — have made onboarding some of these technologies easier, though not necessarily simpler; easy is different than simple, and complexity is a different metric from ease of use. And then, on the storage side, just to give a glimpse of that: when you look at the history and evolution of storage, with TIBCO and Solace it originated with you showing up to the data center — those of us that had to wire data centers in, like, Secaucus, New Jersey, or somewhere like that, where it was miserably freaking cold. You showed up to the data center and you had to wire these things. It was awful — and, you know, kind of fun, because you spent, like, six days in the freezing cold. But anyway, you would get these computers and you would rack them physically. And then vendors would charge you money for the number of TCP connections. And that just didn’t scale well with the way modern — I guess now web 2.0 — applications like Twitter worked. And so I think that’s the pain point that Kafka solved.
And the other evolution of the big ideas over the last decade is that they took off-the-shelf, cheap computers with spinning disks and made the software a little bit more intelligent, so that you could just scale by adding machines. That’s when most people started really adopting Amazon — a janky experience early on, for those of us that had to debug networking issues in Amazon back in the day; the clouds are so sophisticated now. You would just scale by adding cheap computers and get back to building and shipping products. Super easy. So to me, that was the key idea early on in streaming: you had a blueprint that you could copy, and potentially you could work yourself into a success if you managed to hire really talented engineers. And that was really promising. I mean, we were a young ad tech company in New York competing with Google — and for the record, we won; I think we had the New York Times, Forbes, Reuters, MSNBC, etc., for a while on mobile traffic. So we were like, hey, we’re winning. Sure, we onboarded this complexity, but we were making money. I think that was the keystone idea back then: hey, we can take these ideas and software components and basically be successful at larger scales. It was no longer just research papers; these were real systems. And for me, that was a huge source of inspiration for building two companies in the streaming space — and, you know, probably the next five or ten? I mean, who knows? But I still find it super exciting.

Kostas Pardalis 14:33
200%. Okay, I have one more question about streaming systems. We talk about streaming platforms as an umbrella term for all of them, but there are some differences between them, right? In my mind, I can’t really compare Kafka to Flink; there are some fundamental differences. I instinctively think of Flink whenever I want to do some heavy stream processing with complex state that I want guarantees around — pretty much what I would do with a SQL query on the data warehouse, but on a stream of data, right? Whereas when I think about Kafka, I think more about topics and data, and guarantees around that data: making sure the data is not going to get lost, and being able to accommodate throughput and latency requirements. But I never think, okay, out of the box I’ll take a Kafka and start doing some crazy stateful processing on top of it, right? Does this make sense as a distinction between the streaming technologies out there, or not?

Alex Gallego 15:55
Yeah, I agree. I was trying to allude to that in my previous answer when I talked about Flink as compute and Kafka as storage. The reason is that streaming, overall, is really the idea that you take a little bit of storage, a little bit of compute, and then you chain it together, and at the end you have something useful — like Uber, or DoorDash, or fraud detection for a bank, or oil and gas pipeline anomaly detection, or IoT — but it is the chaining of compute and storage. In most streaming systems you need both. Actually, let me put it this way: you need both even for the simplest things. The reason I think compute is a little bit more challenging for vendors is that with compute, you could do anything, right? You set up a cron’d Python script, and maybe the supervision is that you get paged when the Python script crashes — but whatever, maybe you accept that risk as an engineer, as a business, because you have two customers and they’re paying you $3 a month, and you’re like, well, I’m not going to pay for the additional complexity. Over time, people tend to graduate to more sophisticated compute platforms, like a Flink-based platform. Now, on the storage side, that’s kind of the core thing, and you can’t really trade it off: as an engineer, if I send data to a storage system, then I expect to retrieve the data back. At the highest level, this is really how engineers think about it: if you store my data, then I’m going to send you data, and then I’m going to query it back. And so on the storage platform side, which is where Redpanda sits today, we borrowed the modeling mechanics of the Kafka API. For those listening in, you can think of the Kafka API like this: a topic is an unordered collection of items.
A topic is broken down into totally ordered partitions — so it’s an unordered collection of totally ordered sub-collections. It’s like a map of lists, if you’re thinking in data structures and algorithms. You typically consume from the head or the tail, depending on your mental model, and then you can truncate it, right? So that’s generally the Kafka model, and it proves to be just enough to be really useful for data engineers or systems engineers trying to build higher-level things. But you need both at all times: you need some form of compute, even if it’s an in-house Python script supervised by cron — or, I guess now, the cool kids are doing Amazon Lambda or whatever — because you need to do something with the data. And you also need the storage to give you semantics around transactionality, or around safety guarantees — not losing your data — or around throughput or latency. And you can’t really just build that incrementally, right? Most people today don’t go and build the database and then build the business; you buy a Postgres, or you buy a Redpanda, or you buy a Snowflake — people just buy storage engines. And so that’s where Redpanda sits. Hopefully this makes sense for everyone listening in.
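[Editor’s note: the “map of lists” model Alex describes can be sketched in a few lines of Python. This is a purely illustrative toy — not a real Kafka or Redpanda client API — with hypothetical names; the hash-of-key partitioner mirrors how real clients keep per-key ordering.]

```python
# Toy model of the Kafka storage abstraction: a topic is an unordered
# collection of totally ordered partitions -- "a map of lists".

class Topic:
    def __init__(self, num_partitions=3):
        # partition id -> append-only list of records (total order)
        self.partitions = {p: [] for p in range(num_partitions)}

    def produce(self, key, value):
        # Records with the same key always land in the same partition,
        # so their relative order is preserved.
        p = hash(key) % len(self.partitions)
        self.partitions[p].append(value)
        return p, len(self.partitions[p]) - 1  # (partition, offset)

    def consume(self, partition, offset):
        # Consumers read one partition sequentially from a given offset.
        return self.partitions[partition][offset:]

topic = Topic()
p, _ = topic.produce("user-42", "login")
topic.produce("user-42", "click")
topic.produce("user-42", "logout")
assert topic.consume(p, 0) == ["login", "click", "logout"]
```

Note that ordering holds only within a partition; across partitions the topic is unordered, which is exactly the trade-off that lets it scale horizontally.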

Kostas Pardalis 19:21
Yeah, it makes a lot of sense. So, getting back to Redpanda: is Redpanda closer to the storage or the processing, or is it, let’s say, equally both?

Alex Gallego 19:31
Great question. If you had asked me that yesterday, or a couple of days ago, I would have answered differently. Let me tell you why. By the time people listen to this, we will have announced our Series C funding. Prior to this, we were strictly on the storage side, and largely I still think this is where the largest value we provide to customers is. You can think of Redpanda as a drop-in replacement for Kafka — a car is a car, but stepping into it is like stepping into an electric vehicle, right? With ludicrous-speed mode. Just to give an analogy for the people in the room. But we’re also announcing this idea of keeping the simple thing simple with WebAssembly. So largely we’re still a storage engine, but we’re starting to expose some compute things. The reason is, if you’re a data engineer, you know that the majority of your time is spent doing non-sexy things: you take a JSON object and you make it Avro, you take Avro and you make it protobuf, or you take an endpoint and enrich it with the IP address — is this fraudulent or not? That’s just where the bulk of the data pipelines are. And so with WebAssembly, you can now add this at the storage engine level. It’s not designed to compete with the Flinks, or the stateful, higher-level databases that are super sophisticated — multi-way merges, like you were mentioning. It’s really designed to be complementary, from the mental model of the engineer building a data pipeline. If they have a one-way function — convert JSON to protobuf, or enrich a JSON object with an IP address, or take an object and give it a ChatGPT score, it doesn’t matter — those kinds of simple, one-shot transforms, our WebAssembly engine is really good at.
And so we just announced it, and we’ve invested a ton in it, which is going to be fun to see mature. So, to answer your question specifically: yes, we’re mostly a storage engine, and we’re just starting to expose a little bit of the compute. And then we can talk about how I think Apache Iceberg, for the data engineers, is a way of trying to continue to simplify the architecture.
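[Editor’s note: the “one-shot transform” shape Alex describes is just a pure function applied to each record as it passes through the broker. The sketch below is illustrative only — the lookup table, field names, and `enrich` function are all hypothetical, not Redpanda’s transform API.]

```python
# A one-shot, per-record transform: enrich a JSON event with a country
# code derived from its IP address. Stateless in -> out, which is the
# kind of function that maps cleanly onto a WebAssembly transform.
import json

GEO = {"203.0.113.7": "DE", "198.51.100.9": "US"}  # hypothetical lookup table

def enrich(raw: bytes) -> bytes:
    record = json.loads(raw)
    record["country"] = GEO.get(record.get("ip"), "unknown")
    return json.dumps(record).encode()

out = enrich(b'{"user": "a", "ip": "203.0.113.7"}')
assert json.loads(out)["country"] == "DE"
```

Because the function is stateless and total (every input yields exactly one output), the engine can run it safely per record without any of the checkpointing machinery a stateful Flink job needs.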

Kostas Pardalis 21:47
Okay, that’s super cool. First of all, congrats on the round. I mean, it’s kind of amazing to be able to raise a growth round right now, when everyone says the checkbooks are closed for these rounds. I think that says a lot about the growth of the company, what you’re doing there, and the impact the company has. And also congrats on building these new capabilities into the storage engine. One quick question: you mentioned WebAssembly — why WebAssembly? What’s the reason for exposing WebAssembly as the interface for writing these functions?

Alex Gallego 22:26
Yeah. WebAssembly is awesome. I know, to some of the engineers listening to this, WebAssembly feels like self-driving cars: it’s always coming, and you’re like, well, when is it actually going to arrive? Is it a decade or two years? In part, I feel a little bit guilty of being one of the people pushing WebAssembly for a while now, since 2020 — we were one of the first storage engines, and then we inspired other companies to go build WebAssembly support, and so on. I know because engineers who worked on those features reach out: okay, how did you do it? This is cool, let’s chat. So why WebAssembly? First of all: multi-tenancy, isolation, etc. When you start to expose some of the internal mechanics to programmers, there is a person who will write an infinite for loop, and it’ll take down your cluster — it’s just a matter of time. That’s what engineers do; I think we’re much better at breaking systems than building them. The point is, if you expose an API to a programmer, they’ll find a way to break the system. It’s just what programmers do, for fun: oh, what happens if I do this? You poke at a product, you have no idea, you test it, and then you take down an entire system. How many of us locked the entire codebase when we were all using Perforce 15 years ago, then went away for the weekend, and there was a hot patch and you got called? Anyway, going back to it: WebAssembly allows people to write in their favorite language. We don’t have to say you have to write in Rust or Go or JavaScript or C++, which is what our storage engine is written in. You can write in your favorite programming language, and as long as it compiles to this intermediate representation, we can execute it safely. And so, as an engineer, you now get this exposed.
I guess Redpanda becomes more like a Transformer — you know, Optimus Prime in Combiner Wars, where you have a little robot, then you add different pieces, and now you have a bigger robot with new capabilities. That’s the idea: WebAssembly can teach this storage engine new, domain-specific capabilities. An example of a domain-specific capability is GDPR compliance: maybe strip the social security number right before it’s written to disk, or right after it’s read from disk. Or maybe teach Redpanda data placement guarantees: if you have a global cluster, the programmer — who has infinitely more business context than our team — can write a data placement rule so data doesn’t leave Germany, or data doesn’t leave Paris, or data doesn’t leave New York; it doesn’t matter. It’s about exposing the business constraints. So that’s why WebAssembly: one, it allows people to write in their favorite language, and two, it allows us to give developers this Transformers-like capability where you just add domain context onto the engine. In practice, we’re going to launch first with Go. We tried launching with JavaScript in the past, and that had some adoption, but not the kind of adoption I was hoping for. I think Go strikes a good balance between ease of use from a developer perspective and reasonably good performance when compiled to WebAssembly. Those are the practical constraints we’ve been working through — we’ve tested almost every available WebAssembly engine in the world by now. So that’s why WebAssembly. I think we have a lot of pent-up excitement.
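[Editor’s note: the GDPR example Alex gives — strip a social security number before the record hits disk — reduces to another pure per-record function. This sketch is illustrative: the regex, field name, and `redact` function are assumptions, and in a real deployment this logic would be compiled to WebAssembly and registered with the broker rather than run as plain Python.]

```python
# Redact SSN-shaped substrings from a JSON record's "note" field
# before the record is persisted.
import json
import re

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact(raw: bytes) -> bytes:
    record = json.loads(raw)
    record["note"] = SSN.sub("[REDACTED]", record.get("note", ""))
    return json.dumps(record).encode()

out = redact(b'{"note": "ssn 123-45-6789 on file"}')
assert b"123-45-6789" not in out
```

Running this inside the storage engine means the sensitive value never reaches disk at all, which is a stronger guarantee than redacting downstream in a consumer.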

Kostas Pardalis 26:08
That’s exciting. I’d love to play around with it, to be honest. All right, so we’ve already mentioned some of the capabilities Redpanda brings to the table compared to the older generation of streaming platforms like Kafka. But okay, WebAssembly is the new shiny toy; you started doing things differently from the inception of Redpanda, exactly because you saw the limitations that existed in these products. Tell us a little bit more about that. If you had to summarize it in a foundational context: what are the three or four very different things that Redpanda does — compared to the other systems that deliver the same service, the same value — but in a much better way?

Alex Gallego 27:03
Going back to the electric car: I think electric cars delivered their fair value; they basically made hypercars obsolete. Zero to 60 no longer makes any sense as a selling point, because electric cars are so fast. But I’ll tell you the ideas we focused on that are different from what the others focused on in their platforms. It’s not that they couldn’t technically do it; it’s just a different set of decisions early on that has had huge ramifications in terms of what the final product looks like. So there are three core pillars to the product. The overall umbrella — the way I think about building companies — is that if we make the engineer, hands on the keyboard behind a terminal, the hero of our story, we will be a massive financial success. And so my job has always been to obsess maniacally over whether the engineer is actually successful. I know that I can get a CIO to sign a large check if we make their product engineers actually super successful with the platform. That’s always been my obsession as a founder and as an engineer — and also because of how I grew up, technically speaking. So there are three core tenets. One was simplicity, and the analogy I like to use — I use it all the time — is the Apple experience: you sort of expect your AirPods to work with your iPhone, to connect with your tablet, and passwords to be shared across all of your devices. That’s just the natural expectation. To me, the last mile in systems — as a systems engineer, which is what I’ve been my whole life — was really the human experience. And so the first core design principle is: can we make the best developer experience?
Can we make this super easy? Compare that — or contrast it — with existing competitors, where just to print a hello world you need something like a data broker like Kafka, plus ZooKeeper, plus a schema registry, plus an HTTP proxy: four separate systems just to print hello world. I was like, that’s insane; I’ve worked on systems that are much easier to use and have the same capabilities. So for me it was: if we could deliver the user experience in a single binary, so that the mental model for the operator is to put it on one, two, three computers and you’re done — that’s the deployment model — that would be a huge win. And it’s probably the reason people have adopted us the most. That’s just one example, and we have a huge portfolio of that kind of example. If I don’t want to use it, then we simply don’t build it — I will block product releases unless I want to use it. In fact, I time our product releases: my time allowance for the console experience is 60 seconds, and for Kubernetes it’s 130 seconds. I could go through the entire portfolio; the job is to wow the engineer within seconds of them touching the product. So that’s the first one: simplicity. Two is performance, and the analogy to electric cars is the zero to 60. A working example: we just took a company from 400 physical computers to 40 of the same computers, just because we could do more with less. That was the only change — they turned off basically 10x or more of their computers. And that’s really what performance is. Performance is really the sum of all your bad decisions; think of latency as the sum of all your bad decisions. So there isn’t one trick; there’s a book of tricks.
One is pre-allocating memory, using a thread-per-core architecture, using different interfaces, thinking about memory allocation and pooling and ownership semantics — we could talk about that for a really long time. But the second pillar was performance, and the impact is about 10x less compute. And then the last one was cost. In the context of a cloud-native era, can we leverage S3, or Google Cloud Storage, or Azure Blob Storage, to get true disaggregation of compute and storage? And so if you can deliver something that’s easy to use, fast, and relatively economical, then why wouldn’t you build your application in the streaming world, right? I’ve just never come across anyone who says, oh, I want my reports at midnight, this is too fast — that never happens. It’s really mostly the historical context of this technology being difficult to use and expensive. Yeah, hopefully that gives you a sense. Oh, 100%.

Kostas Pardalis 31:39
Let's push on that — I have a feeling we could get a whole episode out of each one of the things you mentioned there, but I want to go back to simplicity and focus a little bit more on that. And the reason is because that's also something I have experienced with systems like this, right? Back in 2014, when we were building Blendo, we used Kafka. To be honest, for our use case, the performance and the cost at that point were not that important. But we cared deeply about some specific guarantees that were coming with a system like this, and some capabilities it would deliver to us. So we decided to go with it. And to be fair, anything that had to do with the guarantees themselves — they were delivered, right? It was great that we had that, especially for a small team like we had. But obviously the whole experience was far from simple, right? And when I say that — in my mind, when we're talking about developer experience and simplicity, it's a very multifaceted kind of definition, right? You have simplicity in terms of what it means for a new developer who is onboarding onto the team to go and build an environment where they can work, and replicate that work. Then there's the simplicity of operating this thing — you have your SREs there; just because it's fault tolerant doesn't mean it's on autopilot, right? Someone has to be there babysitting these things. And then — and this is the part I would like to talk more about with you — there's the architectural simplicity, right? All these different components that you need to have in place just to see in your logs that this thing is running.
Like you mentioned, you need the schema registry, you need the brokers, you need ZooKeeper. So let's focus a little bit on that. ZooKeeper especially — I'm sure many people have nightmares from operating ZooKeeper. Regardless of what it does, which is great, it's not an easy system to run in any case, right? And it's an important piece. But how do you remove all that complexity with Redpanda? Let's say I download the binary — how do I get each one of these components that are given to me when I use Kafka?

Alex Gallego 34:21
Because this is such a huge topic, I'm just summarizing my views here. Here's the thing about complexity: you can't eliminate complexity, you can only shift it around. And I can either make it your problem, or I can make it our problem. By and large, from a company philosophy standpoint, we are the storage team — I largely think of Redpanda as being a really sophisticated data storage engine. We are the experts in the trade-offs, and we understand a lot of the nuances around, you know, lock contention or CPU contention or memory contention — all of these details that manifest in different ways at a high level, which is why you always over-provision, right? So here's the thing about complexity: because you can't eliminate it, you have to make a choice — is it your problem or my problem? And by and large we've said it is Redpanda's problem, and it is our job to make it easy. A big part of why we adopted the Kafka protocol, for context, is we knew we could make a system fast — that was sort of the company's DNA. We'd been writing C++ for whatever, 15 years before we started the company, I guess now 20 or so. And so we could make it fast, we could make all of these things. But the API carries a huge ecosystem of existing applications. And if I showed up to one of our customers — let's pick Akamai, right? — and said, hey, we have this cool new technology, how about you throw away a $2 billion revenue product? They'd just walk me out the door. That doesn't make any sense. And so being compatible was part of that simplicity. But to answer your question directly: when I first started working on this, and authored a lot of the original code, I actually tried other approaches, right? First, I took the FlatBuffers compiler and extended it with a couple of types.
It was a very Apache Arrow-like format — our own thing, catered to super low latency: basically, you could assume a little-endian CPU and do a pointer cast and have a byte array layout. You could do microsecond-level latencies with a bunch of these things, and nobody wanted to use it. And it's like, okay, this is great and this is really fast, but people didn't understand what part of the latency spectrum they were in — it was sort of a much lower-level thing. It was the same thing with the replication protocol. I first started with chain replication, and then you have to figure out, okay, who watches the chain — the who-watches-the-watchmen kind of thing. And so you end up designing a system that looks a lot like a consensus protocol, aka Paxos. And so then we looked at Raft as the protocol, and with that implementation it was like, okay, we can reason about these things. And you sort of start to look at all of these ideas, but fundamentally we're sticking to protocol standards and saying: it is our problem, and it is not your problem — so that when you go install Redpanda, you don't have 1,000 steps. A big lesson I took from my time at Akamai was that they had very small teams running massively large deployments — over half a million computers around the world. How is that possible? With automation: no one reads 1,000 steps; you write code to run your code, to deploy your code. That's just how it works, and it's more mainstream now than it was, you know, seven years ago, or five years ago when I started the company. And so that was the core, and that's how we avoided that complexity. We onboarded a bunch of the things ourselves: we onboarded our own consensus protocol, based off of Raft, and we made our own decisions on things like leader election and the bootstrapping methodology in the cluster.
We onboarded our own Kubernetes operator. So we tend to onboard the complexity ourselves, so that we don't hand you the complexity as a 1,000-step runbook you have to follow — where if you miss one step, you just have data corruption. That idea doesn't make any sense to me. Yeah.
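The Raft choice Alex mentions rests on one simple primitive: a node becomes leader only with a strict majority of votes, which is what makes the outcome easy to reason about. Below is a toy sketch of just that majority-quorum rule — a teaching illustration with invented function names, not Redpanda's consensus code (real Raft also involves terms, log comparison, and randomized timeouts).

```python
# Toy illustration of the majority-quorum rule at the heart of Raft
# leader election. Invented names; not Redpanda's implementation.

def wins_election(votes_granted: int, cluster_size: int) -> bool:
    # A candidate becomes leader only with a strict majority of the cluster.
    return votes_granted > cluster_size // 2

def run_election(candidate: str, peer_votes: dict, cluster_size: int) -> bool:
    # The candidate votes for itself; peers grant or deny votes
    # (in real Raft, based on term numbers and log up-to-dateness).
    votes = 1 + sum(1 for granted in peer_votes.values() if granted)
    return wins_election(votes, cluster_size)

# 3-node cluster: one peer granting its vote is enough (2 of 3).
print(run_election("node-a", {"node-b": True, "node-c": False}, 3))   # True
# 5-node cluster with only one yes vote: 2 of 5 is not a majority.
print(run_election("node-a", {"node-b": True, "node-c": False,
                              "node-d": False, "node-e": False}, 5))  # False
```

Because any two majorities overlap in at least one node, two leaders can never be elected for the same term — which is why this shape is easier to audit than an ad hoc chain-replication watchdog.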

Kostas Pardalis 38:25
100%, 100%, that makes total sense. Alright, so we talked about the simplicity. I have a question about the technology and the experience around the technology. And the reason I want to ask is because one of the things I find fascinating with these kinds of systems is the diversity of people that have to interact with it, it being the middleware in the middle, right? You have your applications that write data to it, and then you have downstream applications that might be owned by completely different teams. You have data engineers that read the data just to store it again somewhere else, right? That creates a very interesting ecosystem inside the company that has to interact with this technology. And that complicates things a lot, because a systems engineer is a different kind of beast compared to an application engineer, or a data engineer, or an SRE — everyone speaks a slightly different language, right? And my question — and I'd like to ask it with a concrete example — is this: let's say I'm a company that has invested in having Kafka inside my system, and obviously all these different people are working with Kafka in one way or another. How is it for the organization to say, okay, now we're going to take Kafka out and put Redpanda in there? And I hear you about the API compatibility, and I get that 100% — that's the only way you can do it with such complicated, infrastructure-type products. But give us a little bit more color on how this translates for each one of these different personas we have inside engineering. Yeah,

Alex Gallego 40:35
So let me answer the last part first, in reverse, because it's easier. The migrations are relatively straightforward, and it really depends on how people use Kafka. Typically — let's take the example of a financial firm, and I say that because we have a ton of financial firms — the way they'll do it is they'll run the two systems side by side from 8am to 5pm, or 4pm, whenever the market closes. The next day, Redpanda has the last day's worth of data, and they just continue running on Redpanda, right? If you have a stateful migration, we support MirrorMaker 2 — all of these tools. Literally, Redpanda looks exactly like the Kafka protocol; no one can tell the difference. And to date, as a company, we haven't had a single customer that's had to touch their applications to interact with Redpanda. So that means years and years of code — they just pointed it at Redpanda and went on their way. I used to go on calls early on with the product and say, hey, you can change the container from whatever you're using, just plug in Redpanda and see if it works. In fact, with our Testcontainers module — for those of you that use Testcontainers — you just change from a Kafka test container to a Redpanda test container, and your entire JVM application just continues to work. It's just faster, right? And so that compatibility was super strong, and something I take really seriously. If we onboard a customer and they see an issue, it's an issue with the product — it's not an issue with you, it's an issue with us — and we'll work really hard to fix it right away. So that's the migration. Now, in terms of sharing, you touch on a really challenging thing, which is the governance of the streaming data: who has access? How do you interact with the 52 personas?
If you're a bank, you have ML and AI engineers, and you also have your production engineers dealing with compliance and regulators across 52 countries. And then there's GDPR and data-locality compliance. It's just such a gnarly and rich problem. So let me give you the highlights, so that you can build on primitives rather than specific answers, and where I can, I'll give you examples. By and large, we adopted the Kafka access control lists — ACLs — and you can sync an out-of-band policy mechanism, let's say Okta or Active Directory or whatever it is, and we also integrate with Kerberos. So you can have a centralized place of identity for both users and applications. It's the system-to-system communication that is really complicated. It's no longer, you know, Kostas is going to make a query on this and maybe tail the logs using Kafka to watch a price dropping — it's when you start to connect multiple systems, and each system has potentially a different security boundary, and so on. The way most people do it today is you have some sort of centralized system, and that'll sync. Ultimately, the lowest-level primitive is an ACL, and the ACL protects people from reading, writing, creating metadata, and so on. So your applications are there. Now, from an API perspective, if you use any of the Kafka APIs, that continues to work. And let me say one quick thing about the future, which is fundamentally different from every other streaming engine — it builds on the richness of the previous answer — which is: that's not enough. And the reason is it doesn't meet the developer where they are. And it is my job as a company builder — like, not everyone has gone through the pain points, or has the sophistication, of truly understanding how to get value out of streaming data.
So let me meet you where you are, which is: you're using Redpanda to take data from your API endpoints into some form of database. And I was like, let me do that really well. And the way we're going to do it really well is we're going to integrate with Apache Iceberg as the storage format on S3, so that today you can read it with Snowflake, tomorrow Databricks, and the next day whatever it is — whether it's ClickHouse or DuckDB or whatever — a really large set of choices the developer has for querying the data. And so the way we meet those developers where they are today is in the tiered storage. This is something we just announced, and I haven't said it on any other podcast — so if you're listening to this, you're the first to hear it: the future of our tiered storage format is going to be Apache Iceberg. So that you can go from a stream to SQL — but not our SQL, your favorite SQL. And your favorite SQL today could be Snowflake, tomorrow it could be Databricks, and the next day it could be whatever. And so hopefully that gives you an answer for when you're interacting in a rich ecosystem — you just have so many stakeholders. It could be ML engineers and AI engineers in the same department, or it could be your CIO looking at real-time dashboards, and so on. Does that make sense?
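The ACL primitive Alex keeps coming back to can be sketched in a few lines. This is a conceptual illustration only — names are invented, and real Kafka ACLs also cover hosts, prefixed resource patterns, and explicit deny rules — but it shows the default-deny shape he describes.

```python
# Conceptual sketch of a Kafka-style ACL check: a principal (user or
# service identity, e.g. synced from Okta / Active Directory) may perform
# an operation on a resource only if a matching rule exists.
# Illustration only; not Redpanda's code, and simplified vs. real Kafka ACLs.

from dataclasses import dataclass

@dataclass(frozen=True)
class Acl:
    principal: str   # e.g. "User:fraud-service"
    operation: str   # "read", "write", "create", ...
    resource: str    # e.g. "topic:payments"

def is_allowed(acls: set, principal: str, operation: str, resource: str) -> bool:
    # Default-deny: access requires an explicit matching ACL entry.
    return Acl(principal, operation, resource) in acls

acls = {
    Acl("User:fraud-service", "read", "topic:payments"),
    Acl("User:ingest-api", "write", "topic:payments"),
}
print(is_allowed(acls, "User:fraud-service", "read", "topic:payments"))   # True
print(is_allowed(acls, "User:fraud-service", "write", "topic:payments"))  # False
```

The practical point from the conversation is that the rules themselves are synced from a centralized identity system, so the 52 personas are managed in one place while the broker only evaluates this low-level check.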

Kostas Pardalis 45:39
Absolutely. And it's great to hear about the integration with Iceberg — that's a big pain point for anyone who has ever had to deal with an environment with streaming data at scale ending up in a Delta Lake; they know how hard this is. So being able to have good integrations with these table formats, and not having to worry every time you're on call that the pipeline will break and you'll have to go back and redo everything for the morning reports — I think that's going to be a huge value for the people who are maintaining the data infrastructure. Alright, we're getting closer to the end, and I have a couple more questions. First of all, I can't close these shows without asking about the name, right? You mentioned something at the beginning about Redpanda, but give us the story behind it — how did you end up with such an extremely cute animal?

Alex Gallego 46:45
It's kind of a funny story, okay. So when I started the project, I was living in Miami — I had moved from New York, where I'd lived for a very long time — and I just built it, right? I didn't envision it becoming what it is today, but I wanted the product to exist. And the beauty of being an engineer is that it's free — you open up your laptop and you just code it, right? You don't need to ask anyone for permission, you just write the code. So I did, and then I sent it to a bunch of friends. And this was at the time when the Uber-for-X, the app-for-X, was super popular, and all of your family members were emailing you like, hey, can you help me build the company? I have this idea. And you're like, yeah, you know, not really. And so I think we were all tired of getting emails from friends about names. So I sent a survey to a bunch of friends, and I embedded Redpanda as an Easter egg among a bunch of nerdy names — obviously Vectorized, which became the first name of the company, and a bunch I can't remember. I added Redpanda because I thought, no one's going to pick this thing — but I sent it anyway. Most people responded, and 80% of them chose Redpanda. So that became the project name. And of course, in my head, I didn't listen — I just liked a different name, so I named the company differently. We started the company as Vectorized. But, you know, Redpanda took on a life of its own. My partner at the time helped me chase a bunch of design firms around the world — from Europe and South America and the US — and we had four or five firms working on this really cute, 8-bit-inspired mascot that looks like something out of the Mario Bros. era; that's how I envisioned it. And that mascot took off — people loved it.
And at some point, we just had to rebrand the company as Redpanda. No one knew what Vectorized was, but everyone knew what Redpanda was, and the mascot was just so cute it was impossible not to like it. So we just had to rename it. And here we are — it just took over the company.

Kostas Pardalis 48:51
Alright, that's awesome. That's an amazing story about the power of symbols — and of language — in building a brand. Alright, so we're here at the buzzer, as Eric usually says. So one last thing: I know that you're making some very bold claims around performance, especially compared to Kafka, but you're also one of these people that doesn't just make claims — you're willing to be tested on them, to prove them. Can you tell us a little bit more? Because I've seen on LinkedIn some messages that have been circulating about how this can be done.

Alex Gallego 49:39
Yeah. So first of all, for those listening in: if you're using Confluent, email me and I'll cut your bill in half, or I'll give you money — that's the bottom line, up front; it's usually easier. The TL;DR is that our main competitor launched an attack in some personal blog posts, claiming something like it would cost you $90,000, that it's impossible for you to run this. And I was like, at least have the nerve to put it on your main website so that we can talk about it in public. Up until that point, we would never say that. And so I was like, okay, well, if you're going to spend $90,000, I want to tell all of your customers that if you come to me, I'll cut your bill in half, or I'll give you money. And I stand behind that claim for anyone that comes to me. We can post the link to the campaign at the bottom of the podcast notes if people want to check it out. But yeah, super excited to be compatible with all your Kafka workloads. And thanks for having me — it's been a fun show.

Kostas Pardalis 50:40
That’s awesome. Thank you so much, Alex, and we’re really looking forward to hosting you again on the show in the future.

Alex Gallego 50:45
All right, Kostas. Thanks for having me.

Eric Dodds 50:48
Okay, Kostas. I didn't make it back in time to hear the entire recording, but looking at the internal chat, it seems like, at a minimum, there was some huge news about a fundraise, which is super exciting. But tell me what you learned.

Kostas Pardalis 51:04
Yeah, first of all, Eric, we have to say that even the best couples need some time apart sometimes, you know? So I think, I mean,

Eric Dodds 51:17
distance makes the heart grow fonder. Do you miss me? Ah,

Kostas Pardalis 51:23
Well, I enjoyed talking with Alex without you — I didn't miss you. But obviously you always give a very unique dimension to the conversations we have; that's why there are two of us. It was fun to talk with Alex, for sure. He's a very deeply technical person, obsessed with performance, which I think is also a reason he seems like such a good fit to go after this problem. Because this is a very critical system — you need very strong guarantees when it comes to both performance and resilience, right? That's why Kafka is also such an important system. And it was fascinating to chat with him and experience his passion for what he's building. And it pays off, right? They announced their next round of funding — they raised like 100 million — and, okay, it's not the best market out there for fundraising at this stage, which means they are doing something right. And it also seems there were good reasons why we didn't have more competition in this space in the past 10 years, but now it seems the time has come. So I would recommend our audience tune in and listen to Alex talk about these technologies: how they were built, why they were built the way they were, why we need a new paradigm today, what Redpanda can offer compared to a system like Kafka, and some very cool new technologies, like WebAssembly, that they are using — how they are incorporating these new infrastructure paradigms to create a unique new experience and a unique new product that is, let's say, much better at addressing the needs of today compared to other systems. So I would suggest everyone tune in — he's a bright person, very smart, obviously with very deep knowledge that he's sharing in this episode. And yeah, it will be fun for everyone to listen to.

Eric Dodds 53:47
Awesome. Well, I am so glad you learned about Kafka and Redpanda. I am so disappointed I missed it, but I'll be back next week. Yeah, you will. Alright, well, subscribe if you haven't, tell a friend, and we have great episodes coming up, so stay tuned. We hope you enjoyed this episode of The Data Stack Show. Be sure to subscribe on your favorite podcast app to get notified about new episodes every week. We'd also love your feedback. You can email me, Eric Dodds, at eric@datastackshow.com. That's E-R-I-C at datastackshow.com. The show is brought to you by RudderStack, the CDP for developers. Learn how to build a CDP on your data warehouse at RudderStack.com.