Shop Talk: Kostas Settles the Real-Time vs. Streaming Debate

November 18, 2022

In this bonus episode, Eric and Kostas talk shop around the topic of streaming.

Notes:

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.co

Transcription:

Eric Dodds 00:05
Welcome to The Data Stack Show Shop Talk. Kostas, we have talked with people who built amazing data technology at companies like Netflix, Uber, and LinkedIn. But you and I don’t actually record our own talks about data very much, even though we talk about data together a ton. And so Brooks had this amazing idea of just recording some of the conversations that you and I have before and after the show about data and our opinions on it. And really, this has been one of my favorite things that we do. So welcome to Shop Talk. It is where Kostas and I share opinions and thoughts on a personal level about what we’re seeing in the data space. And it really is simple: we ask one another a question, and the other one tries to answer it. So without further ado, here is Shop Talk. Welcome to The Data Stack Show Shop Talk, where Kostas and I talk shop about all things data and probably share too much of our personal opinions. Kostas, I believe that it’s my turn this time.

Kostas Pardalis 01:09
It is. You can talk to me as much as you want [inaudible]. I’m all yours. Okay,

Eric Dodds 01:18
I saw an interesting company launch on Hacker News recently. They call themselves a real-time API data connector. Basically, they say you never have more than sub-minute drift between data sources. This is just theory at a very high level; I haven’t actually read the docs or anything like that. It’s just based on the architectural diagram on their marketing site, which, as we all know, is the definitive source of truth for every product architecture. But it seems like these almost-real-time APIs are almost like a streaming ETL, if you will, right? What you would traditionally load in batch, you’re now loading essentially in real time, with sub-minute latency, into some sort of data store downstream. Anyway, this got me thinking, and I’m really interested to know your thoughts. There are multiple streaming technologies in this vein, right? You have Materialize, which is sort of streaming SQL, and you have a number of other technologies. One thing I’m interested to know is: do you think that these will get super wide market adoption? Or is, say, sub-minute latency really only a problem for a certain subset of companies? Or actually, I have a third flavor. Do you think that it will become so cost effective that it just doesn’t matter? Like, why not stream in real time if you can, because it’s cheap? Right now, part of the challenge is that the infrastructure to do that at scale tends to be pretty hefty.

Kostas Pardalis 03:21
I mean, I don’t know, I would disagree a little bit with that. I don’t think it’s that expensive to start working with these systems, right? They can scale up and down. Yeah, if you have a lot of data, obviously it’s going to be more expensive. But if there’s more data, it means that there is a reason you have more data. Hopefully it also represents what your business is generating, what it’s about, your growth, right? Like

Eric Dodds 03:52
Yeah, I guess maybe another way to say it would be that cost has multiple vectors, right? Even if you just set up a set-it-and-forget-it 24-hour job that you never look at unless it fails, versus managing Kinesis, those are very different.

Kostas Pardalis 04:09
Yeah, I mean, okay, I don’t know exactly how MovingLake works. From a quick look at the website, like

Eric Dodds 04:19
MovingLake is the tool. Yeah, I didn’t even mention that.

Kostas Pardalis 04:23
Yeah. Okay, first of all, we are talking about a very early product, which means that they’re probably still trying to figure out product-market fit, right? So we need to keep that in mind when we’re

Eric Dodds 04:41
They’re going to be batch ETL, traditional batch ETL, in like three weeks when they pivot.

Kostas Pardalis 04:45
I mean, yeah, I don’t know. For example, if you go to the connectors and select the Bank of America connector, you will see that there are three endpoints. But as they mention here, one of them is real time and the other two are not, right? And that’s because, obviously, Bank of America does not offer real-time data for, let’s say, accounts and sub-accounts, but it does offer it for transactions. So that’s what you’re going to get in real time, right? I mean, there are, let’s say, inherent limitations on what the source systems are able to expose in real time and what not, right? The other thing, from what I see here: yeah, obviously, you can pull data from your Bank of America API accounts into a database, right? But I don’t think it’s going to be generating that much data. I mean, how many transactions a day can you have in bank accounts, right? Usually thousands, maybe millions, let’s say. So what I’m trying to say is that when we approach these questions about real time, we always need to start from the use case: what are we trying to achieve with these systems, and how are we going to be using the data from them? Right? If I need the transactions to consolidate, let’s say, all the transactions for my P&L at the end of the month or whatever, do I need to do that in real time? Probably not. Do I need the transactions to create a notification so that, I don’t know, a salesperson can do something as soon as possible? Yeah. Is this “as soon as possible” sub-millisecond? No, we’re still working with humans. They’re not going to react that fast, right? Do we use these transactions to do HFT, like high-frequency trading? Oh, yeah, in that case, yes.
But then again, we’re looking to build a completely different type of system, right? So streaming is one thing; real time is another thing. Streaming, like queues, pub/sub, and all that stuff, provides different ergonomics around working with your data, right? Real time has to do with latency: how fast you have to react to any piece of information. Let’s say you have a system that scans the sky for inbound nuclear warheads from the enemy. You probably want to react pretty fast, right? And you want to guarantee that it’s always going to be fast. You don’t want it to be fast one time and a little slower another time. So it heavily depends on what you are trying to do with the data that you have, and what the notifications or the real-time dashboards are that you’re going to build on top of that. So my question to you, as a marketer: one of the very standard go-to-market pitches when it comes to data is, “Oh, marketing needs real-time data, it needs, I don’t know, sub-minute latencies,” and stuff like that. Is it true? What do you think marketers need when it comes to data?
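
A minimal sketch of the distinction Kostas draws here: a pipeline can be fast on average and still fail a real-time requirement, because real time is about a latency bound that always holds. The latency samples and the per-use-case budgets below are made-up numbers for illustration, not measurements of any product.

```python
from statistics import mean

# Hypothetical delivery latencies in seconds: mostly fast, one slow outlier.
observed = [0.2, 0.3, 0.25, 45.0]

# Hypothetical latency budgets (seconds) for the use cases from the conversation.
BUDGETS_S = {
    "month_end_pnl": 86_400.0,        # a daily batch is fine
    "sales_notification": 30.0,       # a human reacts, so seconds are enough
    "high_frequency_trading": 0.001,  # a different class of system
}


def fast_on_average(latencies_s, budget_s):
    """True if the mean latency is within the budget."""
    return mean(latencies_s) <= budget_s


def always_within_budget(latencies_s, budget_s):
    """True only if every observed latency is within the budget."""
    return max(latencies_s) <= budget_s


for use_case, budget in BUDGETS_S.items():
    print(
        use_case,
        "fast on average:", fast_on_average(observed, budget),
        "guaranteed:", always_within_budget(observed, budget),
    )
```

With these numbers, the sales notification case looks fine on average but misses its bound once, which is the "one time fast, another time a little slower" situation described above.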

Eric Dodds 08:34
Well, I would start out by saying I do think some of these technologies are really compelling. From a marketing perspective, or even a product perspective, you’ve been able to do the analytics thing, real-time analytics, say, for quite some time, right? Real-time web analytics or real-time product analytics: there are really great products out there that do that. But as it’s becoming easier to get more data together and compute interesting things across separate sets of data, there is infrastructure that actually allows you to compute some of this stuff in, say, near real time. So instead of just observing a user behavior and then seeing that in a dashboard, a sort of direct line like product analytics through whatever pipeline, you’re actually doing some sort of compute along the way that includes additional data, which is really compelling, right? Because then you get a lot more insight downstream, even if, let’s just say, it’s in the same dashboard that you’re looking at. If you have some sort of compute along the way, that data is very compelling, because the amount of context and fidelity that you can get is potentially way, way higher. It’s still pretty hard to do, technologically. I mean, it’s not like the patterns are a mystery, but there are a lot of pieces that you have to put together and run. So I would say I agree with you that it really depends on the use case, right? So let’s take an example of a situation in the real world where real time, or near real time, or whatever... actually, we should probably discuss the definition of real time, because it’s sort of at the root of the issue.
Well, let’s say you have some sort of app, like a ride-sharing app or some sort of transportation thing, where weather can be a really big influence, right? If you think about customer acquisition from a marketing standpoint, or app activation, where we want to increase usage or get people beyond their first ride or first interaction or whatever that is, weather can be a big driver for that, right? So: rain is coming, go ahead and book your ride or schedule it from the app, or whatever that is. From that standpoint, you actually need to pull a bunch of data, run a bunch of compute, and, in a pretty quick manner, send out a message to certain users in a certain location to try to get them to take some sort of action, right? So you certainly need a ton of infrastructure in order to do that level of, let’s call it, creating a personalized experience based on a high level of context about that particular user’s situation in a particular location, which also includes a lot of context around their individual usage of your service, or whatever those things are. I would argue, though, that the companies who will truly benefit from that level of detail and that level of infrastructure tend to be really large companies with really large user bases, right? That’s not very common.

Kostas Pardalis 12:27
Yeah. Yeah, I agree with you. Well, I would like to add something, especially because we started talking about this because of MovingLake. And please, I’m not trying to say anything bad about them. Let’s just make this clear. But,

Eric Dodds 12:50
Sure. The Hacker News comments already did that for you, so there’s probably nothing you can.

Kostas Pardalis 12:55
Yeah, to be honest, I have huge respect for someone who’s trying to build something like this today. It takes a lot [inaudible]; it’s not exactly an easy [inaudible]. There are many solutions out there, right? I have huge respect for people who are trying to do that. And usually what happens is that you start a company, right? Then you start with building a product. You have an idea, right? You know where you want to be, but at the same time you need to differentiate enough, and you need a starting point. Okay. So you throw something out there; you try to create a new way of, let’s say, solving a problem. And that’s, let’s say, the conversation starter with the market. That’s what this is: a conversation starter. Like, “Yes, we are solving this problem. Is it important for you? Come here, that’s how we solve it.” Maybe it’s not the right way to do it, maybe it is, we don’t know. But sometimes you need to start, and that’s what we see with companies like MovingLake. And again, huge respect for what they are doing, because this is the ugly part of building a company, where everyone can easily have an opinion, right? I can very easily say, “Well, this is going to fail.” It’s easy to say that at this point, right? But that’s not the point here. What you’re trying to do is start a dialogue with the market until you figure out what the real opportunity is in the market that you have chosen, right? So that’s how we should see these things.
And yeah, is it always going to be real time? Is it going to be batch? Is it going to be something else? Maybe. Who knows, right? I will say, I think it is going to be very, very interesting, now that we have kind of a first impression of what it is, to revisit that in six months in another discussion, see where the product is six months from now, and try to understand what happened in between, right? Like, what generated the changes that hopefully we’re going to observe.

Eric Dodds 15:26
We really should do that. And we can replay clips of this conversation. Yeah. You know, and then they’ve raised a huge amount of money and are super successful. And then

Kostas Pardalis 15:35
they got an old job. Guys have nothing to do that’s like little No, maybe who knows, Mike might find like in June.

Eric Dodds 15:43
Yeah. I’m not moving enough real-time transactions from my Bank of America account today to write big

Kostas Pardalis 15:53
checks. Doesn’t have to be a big check.

Eric Dodds 16:00
[inaudible] check, right? That is true. That is true. I will say, for one, I’m excited. I’m bullish on real-time stuff, I think, as the experience gets better and better and more accessible. A lot of times, even in my job, we don’t need to know stuff in real time, but it’s really nice; it’s really convenient. I think it can go wrong: looking at numbers too often can actually be unhealthy or a distraction. But when you think about things like campaigns that you’re running, or product launches, or other things like that, it is kind of cool to see: is there initial resonance? It’s just kind of neat. I don’t know, I’m excited. Yeah,

Kostas Pardalis 16:45
Well, I would add to some of these thoughts: moving data around in real time is not that hard. What is much more complicated, and where it really gets hard, is when you have to process the data in real time. If you want to execute very complex queries where you have, I don’t know, joins between tens of tables and, I don’t know how many aggregations, blah, blah, blah, all these things, this is hard, right? That’s where things start getting really hard. So yeah, moving the data around is one thing; processing the data and making it ready for someone to consume is a completely different problem.
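
A rough sketch of why moving and processing differ, under assumptions of my own: the event shape (`user_id`, `amount`) and the region lookup table are hypothetical. Forwarding an event needs no memory of earlier events, while even a single join plus one running aggregation needs state that has to be kept correct as events arrive.

```python
from collections import defaultdict


def forward(event, sink):
    """Moving data: stateless, one event in, one event out."""
    sink.append(event)


class RunningSpendByRegion:
    """Processing data: one join plus one aggregation, both of which need state."""

    def __init__(self, user_region):
        # Lookup table for the join (hypothetical user_id -> region mapping).
        self.user_region = user_region
        # Running totals per region: the aggregation state.
        self.totals = defaultdict(float)

    def process(self, event):
        # Join: enrich the event with the user's region.
        region = self.user_region.get(event["user_id"], "unknown")
        # Aggregate: update the running total for that region.
        self.totals[region] += event["amount"]
        return region, self.totals[region]
```

With joins across tens of tables and many aggregations, as described above, that state multiplies, which is where the difficulty comes from.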

Eric Dodds 17:41
And that’s really where all the value is created, though, actually, right? I mean, if you’re trying to get some sort of insight that requires compute, that’s actually where most of the value is created when the data lands downstream.

Kostas Pardalis 17:55
Yeah, I mean, if you’re trying to build a service that you model like, let’s say, Zapier, right, where you want to trigger something when something happens, that’s one thing, right? You don’t need to do any crazy kind of processing. It’s more about how much data per unit of time you can process and turn into requests. Now, if you want to get the data and also run very complicated algorithms on it, that’s a different thing. That’s why, in the standard Lambda architecture, you see that you have the batch part and the streaming, or real-time, part of the architecture, where most of the use cases on the real-time part have more to do with notifications. Because with notifications, you usually don’t have to go and process a lot of different data or do a lot of crunching on the data. It’s more about taking a look at the data and seeing, “Oh, is this something that I need to act upon?” Because, like, the temperature is higher than it should be, you know? I’m exaggerating a little bit and simplifying things too much. But for everyone who wants to better understand the distinction between the two parts, I think studying implementations of the Lambda architecture, how and why companies did that, and for what reason, is an excellent starting point.
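
As a toy illustration of the Lambda architecture mentioned here, and not a description of any particular company's implementation: a batch layer recomputes a view from the full event log, a speed layer keeps a cheap incremental view of events that arrived since the last batch run, and queries merge the two. The class and function names are hypothetical, and a real system would run the batch job asynchronously rather than in-process.

```python
class LambdaCounter:
    """Counts events per key using a batch view plus a speed view."""

    def __init__(self):
        self.master = []      # append-only log of every event key
        self.batch_view = {}  # complete, but only as fresh as the last batch run
        self.speed_view = {}  # incremental counts since the last batch run

    def ingest(self, key):
        # Every event lands in the log and updates the speed layer.
        self.master.append(key)
        self.speed_view[key] = self.speed_view.get(key, 0) + 1

    def run_batch(self):
        # Batch layer: recompute the whole view from scratch.
        view = {}
        for key in self.master:
            view[key] = view.get(key, 0) + 1
        self.batch_view = view
        # The batch view now covers everything, so the speed view resets.
        self.speed_view = {}

    def query(self, key):
        # Serving: merge the batch and speed views.
        return self.batch_view.get(key, 0) + self.speed_view.get(key, 0)


def should_notify(reading, threshold):
    """Speed-layer style check: look at one value and decide whether to act."""
    return reading > threshold
```

The `should_notify` check mirrors the temperature example: it inspects a single reading and needs no historical data, which is why notification use cases fit the real-time part so naturally.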

Eric Dodds 19:29
I totally agree. Totally agree. All right. Well, Brooks tells us we’re at the buzzer. I could talk about this for a long time, but we have so many more Shop Talks to dig into. And next time, it’s your turn, so I can’t wait to see what you ask. You know, Kostas, we learn so much from the data leaders that we talk to, but I learn so much from picking your brain, and your questions really make me think really hard. So I appreciate Shop Talk. I think it makes me a sharper thinker.

Kostas Pardalis 20:00
Well, it’s fun. I think it’s good to [inaudible] the stuff that we experience. And yeah, I feel like people enjoy it. That’s why I’ll keep asking people to reach out. Please do [inaudible]; you can send an email. Let us know how you feel and what your opinions are, like your experience with the show. So please do that, so Eric and I can keep [inaudible]. Of course.

Eric Dodds 20:38
And of course, we try to take the same types of questions to data leaders from all sorts of companies, large and small. So definitely subscribe to the main show if you haven’t yet. There are tons of really good episodes there, and tons of really good thoughts from data leaders, really, around the world. So definitely subscribe if you haven’t, and I will catch you on the next Shop Talk.