The PRQL: The Shortcomings of Apache Kafka with David Yaffe and Johnny Graettinger of Estuary

November 6, 2023

In this bonus episode, Eric and Kostas preview their upcoming conversation with David Yaffe and Johnny Graettinger of Estuary.

Notes:

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack, visit rudderstack.com.

Transcription:

Eric Dodds 00:05
Welcome to The Data Stack Show PRQL. This is a short bonus episode where we preview the upcoming show. You’ll get to meet our guests and hear about the topics we’re going to cover. If they’re interesting to you, you can catch the full-length show when it drops on Wednesday. This week’s recording is with Johnny and Dave from Estuary, and I think this is going to be a really fun conversation. It’s a topic that we’ve actually covered quite a bit on the show, which is streaming, in particular real-time streaming. But this is really in the context of, I think, what you use streaming for, and we really dig into the Kafka side of the conversation, which we haven’t covered in much depth. Part of the Estuary story is really reacting to real-time streaming needs, evaluating Kafka, and seeing some pretty severe shortcomings, which is why they built Estuary. Now, what’s really interesting to me is that, in many ways, they don’t talk about Estuary as a streaming service. They kind of talk about it almost as real-time ETL, which is fascinating. There’s some open source technology under the hood. And this is really, I think, going to be an interesting conversation, because streaming is obviously a hot topic and there are multiple technologies, so I’m really interested to see what the Estuary team has built.

Kostas Pardalis 01:34
Yeah, 100%. It was a very fascinating conversation, actually, for many different reasons. First of all, it was pretty technical, and not only in terms of talking about Estuary itself. Actually, we had a very deep dive into Kafka: how Kafka is built, and some of the issues there that Estuary is addressing, seen from the perspective of the architecture of the system. For example, we were talking about how compute and storage in Kafka are very tied together, how this has been changed with something like Estuary, what this means in terms of managing the system, and what type of use cases it enables. So we had a very interesting architectural conversation around this type of system. Anyone who is interested in better understanding how Kafka and this type of streaming system work should definitely listen to that. And then we also talk a lot about some important concepts like CDC, right? Why CDC is important, how we use it, and how they implemented it, because there is a standard out there, but the folks at Estuary actually implemented everything from scratch. And they have some really good reasons why they did that, and they talk through these things. So, amazing people, both John and Dave, with very deep expertise in this type of technology. And we had an amazing conversation ranging from the technical side of things up to the business side of things. So I think everyone should listen to them, and hopefully we’re going to have them again in the future, because I don’t think one hour was enough to go through all the different topics when it comes to streaming.

Eric Dodds 03:36
All right, that’s a wrap for the PRQL. The full-length episode will drop Wednesday morning. Subscribe now so you don’t miss it.