The PRQL: Who Needs a Stream Processing Engine?

November 7, 2022

In this bonus episode, Eric and Kostas preview their upcoming conversation with Zander Matheson of Bytewax.

Notes:

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.co

Transcription:

Eric Dodds 00:00
Hey, welcome to The Data Stack Show prequel. We just recorded a show with Zander from Bytewax. Bytewax is a super interesting technology. It’s stream processing within the Python ecosystem. One question I have for you, Kostas, which we touched on a little bit in the show: there are a lot of tools cropping up around stream processing, which he talks a little bit about on the show. How many companies really need this, though? That’s something interesting to me. There are a lot of technologies popping up, but it seems like they’re primarily enterprise-level use cases. What do you think? Is it going to trickle down?

Kostas Pardalis 00:48
Yeah, I think it will. You have to keep in mind that many times we see technologies getting adopted by the enterprise first, primarily because the technology is not mature enough to be adopted by a broader audience, and the enterprises have the resources to maintain this technology and make it accessible to the whole organization, right? Getting something like this set up, sending the data there, running it, and doing that consistently, all that stuff, is not easy, right? I’m saying this also with Apache Spark, sorry, Apache Kafka. That’s why you have Confluent out there, right, and the hosted solutions around that. So when it comes to these things, in terms of data infrastructure, it’s very natural to see the enterprises being, let’s say, the pioneers, because they have the resources, and the needs, because of the volume or whatever, to go and do things first. And I think, as we see companies focusing more on the developer experience, we will see much broader adoption. Now, is every shop out there going to need a stream processing engine? Probably not. I don’t know. But we will see. I think there are use cases out there that are important. I think anything that has to do with ML use cases, I mean, when you want to actually use the model, create the features and serve recommendations, all that stuff, streaming is super important. So I think as ML and AI get more and more adopted, together with that we will see streaming becoming more and more important, together with the rest of the technologies that we have there for batch processing and more static data processing.

Eric Dodds 02:49
Yeah, I agree. I love it when we get into predictions, because it’s very dangerous territory. That’s good. But I think the other thing is, Zander gave a really interesting example of pulling in logs from a web server and, you know, processing them for some sort of use case. I can’t remember the exact use case, building specialization or something like that, which is interesting. I think that as those use cases and the related technology become packaged, adoption will go up, right? Because part of the challenge now is this: the individual components are accessible, right? So for example, there’s great CDC technology out there. Let’s get logs. Okay, great, you have the logs. Can you process those logs in a streaming format? Okay, you even have Bytewax in order to do that, and then there’s the stuff downstream of Bytewax. But even with modern tooling, it actually is still a lot of work. Even though individually each of those things has gotten easier, it’s still hard to consume an entire use case, right? But imagine if you could just literally hook your logs up to an end-to-end pipeline, and you get specialization at the end, right? So I think as the ecosystem evolves and more of those use cases are available out of the box, adoption will go up as well. Because you may not need stream processing for everything. For example, we don’t necessarily need real-time reporting on certain things at RudderStack, right? But it is really nice if you can do it. And if it’s easy, then why not? You don’t have to wait on batch jobs and all that sort of stuff. So anyway, it’ll be really interesting to see how the ecosystem evolves. Great show with Zander. Bytewax is super cool, so check out the repo. Subscribe if you haven’t, and we’ll catch you on the next one.