In this bonus episode, Eric and Kostas talk shop around the wide world of data.
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack, visit rudderstack.co
Eric Dodds 00:05
Welcome to The Data Stack Show Shop Talk. I believe this is our third Shop Talk, Kostas, if I'm not wrong, and I love this format, because so far we actually have not done any prep work. We each bring a question that the other person has had no time to think about, which makes for really good conversation.
Kostas Pardalis 00:28
Yeah, it’s fun. I really enjoy it, too. And, yeah, we’ll continue doing it.
Eric Dodds 00:35
Oh, yeah. Okay. It's my week. So here's my question for you. What do you think it would take, in terms of data tooling (tooling is probably not the best word), for there to be a world where Excel largely goes away? Excel and Google Sheets, I mean, because it's like the most widely used data application in the world.
Kostas Pardalis 01:05
I don't think anything will ever go away. Like, why should we do that? Why would we get rid of Excel? It's such a thing.
Eric Dodds 01:17
I'm not saying we should. I'm just asking what you think it would take. Like, what would that world look like?
Kostas Pardalis 01:26
I don't think it has to do with tooling, to be honest. I think it has more to do with, let's say, access to technology, and how easy it is to use technology. The way I think about it, you can think of it as a pyramid, right? Somewhere above Excel you probably have people debating: is Python what we need for data, or is it SQL? Why don't we do everything with SQL? Or do we also need Python? I think that's a false question. We shouldn't be asking it. What is actually happening there is something like Maslow's hierarchy of needs, or whatever they call that thing. Think of it in a similar way, okay? At the base, you have Excel, right? And that has to do with how accessible Excel is. Pretty much everyone out there can use Excel. You just need to know how to type on a computer to use it, right? On top of that, you have SQL, which a smaller group of people can use, but still a lot of people out there can use it. And then you have an even smaller group of people who can use Python, right? Now, these people also have different needs around data, and that's what is important. The person at the bottom doing Excel stuff will never need to go and do things with Python. What are they going to do, use Python as a calculator? What's the point?
Eric Dodds 03:19
People do more in Excel than just that!
Kostas Pardalis 03:22
Yeah, okay. It's an extreme example, I know. Okay. But let's say I'm using Excel, for example, to do my budget for the month, right? Am I going to do that with Python? I can, but am I going to? I know Python. Am I going to use Python for it? No. Why would I do that? It's not built for that, right? So what I'm trying to say is that there's a big diversity of needs around data. I don't think the whole population of planet Earth is going to turn into data engineers anytime soon, so I don't see why they would need something like Python to do that. Excel has been around for a couple of decades; we have, I don't know, two or three generations of people right now who have been trained on it. Using it is almost an intuition. So don't think about Excel as the product itself. Think of Excel as the spreadsheet model of interacting with data, which is part of the way we grow up now, the way we learn how to deal with numbers and how to do things with data. And I don't see any reason for this to go away. It's a great tool.
Eric Dodds 04:41
Like, totally. Kostas's hierarchy of data needs. Yes. Now...
Kostas Pardalis 04:46
I do like that.
Eric Dodds 04:51
Excel, SQL, Python: Kostas's hierarchy of data needs. That's right. Now, okay, so you bring up an interesting distinction. Yes, you're totally right. It is an unfair question, just like Python versus SQL is an unfair question, right? It unnecessarily oversimplifies an issue and creates a comparison that doesn't reflect the reality of what's happening out there on the ground. But I will say, the example you gave is actually interesting. You gave the example of building a basic budget spreadsheet, right? This is getting into semantics, but I do think there is a high possibility that the complex use cases that spreadsheets, or Excel specifically, are used for will be displaced. And I will caveat that by saying I don't think the interface for those more complex use cases will necessarily be replaced. But I do think the entire infrastructure under the hood will, in my opinion, likely be displaced.
So, like, I'll give you an example. Okay, a personal budget, totally. People even use spreadsheets for planning projects or whatever, right? If I think about marketing, like my budget for the marketing activities, right? I always start by modeling that out in a basic spreadsheet. It's really simple, right? You have 12 months, and you know, the line items and all that sort of stuff. But once you start to get into more complex equations, and you start to involve additional types of data, and you're referencing across multiple tabs, and then you get into, obviously, lookups and macros... I mean, people literally build software in Excel, which is totally wild. I think it's those more advanced use cases that will be displaced. And I can't remember the name of the company, but I think there are some companies that literally just provide a spreadsheet interface that sits on top of an actual database, right? Which is really interesting. So I do think about those use cases, because it's the power-user set, which sits in between; there's another layer. We'll call it Eric's layer in Kostas's hierarchy of data needs: Excel, Eric's layer, SQL, Python. Yeah. Because under the hood, modern databases and tooling, and even interfaces that can generate complex SQL, are becoming more and more common, right? And there are more and more patterns around that, which I think is super exciting.
Because you can take an Excel power user and essentially give them a familiar interface on top of wildly powerful, potentially infinitely scalable infrastructure that has all sorts of different types of data, right? And then you don't have to worry about file sizes. I mean, I think that's super interesting.
Kostas Pardalis 08:20
Oh, yeah. Let me clarify some things. When I'm talking about Excel, Python, and SQL, I just consider them APIs: the API through which a human interacts with data. What happens behind the scenes is a different story, right? In the same way, you have, let's say, Spark: you can use Spark SQL, but at the same time you can also use PySpark, and within PySpark, pandas. The processing engine behind them is the same, the data and the logical plans are the same, but the API you interact with is different. Exactly, because the people involved are different, and the interfaces they have learned, the ones that are more intuitive and better for their use cases, are different, right? So, yeah, you can have behind Excel, I don't know, a supercomputer running for whatever, right? But what is important is the interface and the mental model that people will use to conceptualize the data for each one of these three different interfaces. So that's just the interface. For the rest, yeah, I totally agree with you. We can see, I don't know, probably some type of Snowflake or something like that.
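Kostas's point that SQL and Python are different interfaces over the same logical computation can be illustrated with a minimal sketch. It does not use Spark: as a stand-in, it assumes Python's built-in `sqlite3` module for the SQL side and plain Python for the other. The budget rows and function names are hypothetical, made up for illustration.

```python
import sqlite3
from collections import defaultdict

# Hypothetical household budget rows: (month, category, amount). Illustrative only.
rows = [
    ("2023-01", "groceries", 420.0),
    ("2023-01", "utilities", 150.0),
    ("2023-02", "groceries", 390.0),
    ("2023-02", "utilities", 160.0),
]


def totals_via_sql(data):
    """Total spend per category, expressed as SQL and run by SQLite (the SQL 'interface')."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE budget (month TEXT, category TEXT, amount REAL)")
        conn.executemany("INSERT INTO budget VALUES (?, ?, ?)", data)
        cur = conn.execute(
            "SELECT category, SUM(amount) FROM budget GROUP BY category ORDER BY category"
        )
        # fetchall() yields (category, total) tuples, which dict() turns into a mapping.
        return dict(cur.fetchall())
    finally:
        conn.close()


def totals_via_python(data):
    """The same aggregation written as a plain Python loop (the Python 'interface')."""
    totals = defaultdict(float)
    for _month, category, amount in data:
        totals[category] += amount
    return dict(sorted(totals.items()))


if __name__ == "__main__":
    sql_result = totals_via_sql(rows)
    py_result = totals_via_python(rows)
    print(sql_result)
    print(py_result)
    # Different interfaces, same logical result.
    assert sql_result == py_result
```

Both functions return the same per-category totals; only the way a person expresses the aggregation differs, which is the distinction Kostas draws between the interface and the engine underneath.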
Eric Dodds 09:49
Yeah. Yeah, super interesting. That doesn't surprise me, but it's really helpful: the mental model of thinking about those as actually just APIs with a different interface on top. Okay, another one.
So I think Google Sheets is obviously a fairly pervasive spreadsheet interface, right? Tons and tons of people use it. I don't have the numbers, but this is Shop Talk, so we don't have to actually be accurate. But I'd be shocked if actual Microsoft Excel, as packaged software that runs on your hard drive, not in a browser, didn't outstrip Google Sheets usage by a massive margin.
Kostas Pardalis 10:43
That would be my assumption.
Eric Dodds 10:47
Do you think that... Well, actually, this is interesting to think about. I was thinking about your budget. So when you think about Google Sheets, and having cloud compute power behind the spreadsheet... God, that sounded so buzzworthy. "Power your spreadsheet with the power of cloud computing." Am I a product guy?
Kostas Pardalis 11:13
Yeah. Like, I'm waiting for the moment that you're going to use the term hyperscaler, and then, ah, man.
Eric Dodds 11:23
Multinode horizontal scaling. Imagine Google Sheets, but with multinode horizontal scaling.
Kostas Pardalis 11:28
Oh, that would be good. Okay, but so...
Eric Dodds 11:35
One interesting thing I observed with the budget example, right, is that if you basically adopt a paradigm like BigQuery ML, which runs on BigQuery and enables non-data scientists to do very data-scientist-type things using simple SQL or whatever, it's not a huge step to think about that same model being applied to a spreadsheet, right? If you have something standardized that you're trying to do in a spreadsheet, like a budget or something of that nature, you could conceivably think about a spreadsheet that can essentially use machine learning to help you do your tasks. Like, optimize your budget, right? You have a template in your spreadsheet, and machine learning can actually help you optimize your budget. That's kind of frightening to think about, Google having access to all that data. But do you think that something of that nature, machine-learning-type... I don't even know if assistance is the right word, but machine-learning-enabled spreadsheet usage, could drive a lot of people from offline, packaged software running on their hard drive to online tools in order to access that type of thing?
Kostas Pardalis 13:01
I mean, Wall Street runs on spreadsheets. So no, no, no, seriously, I think the amount of modeling and processing that you can do there is almost crazy. I mean, okay, when we say machine learning, we think of image recognition or something like that, but no, like 90% of machine learning use cases are statistical, yes. And those are models that the financial sector has been doing for, like, forever, right? And they are doing them in Excel. Excel is a very expressive system. There's no difference in what you can do in the end between SQL, Excel, and Python, okay? They are equivalent. You can't do something with one of them that you can't do with the other. The question is how easy it is to do it, or how well it works with the rest of the tools that you have.
Eric Dodds 14:05
Right, and whether it's capable of hyperscale?
Kostas Pardalis 14:08
Of course, of course. What I'm trying to say here is that, yeah, probably if you go to the app store for Google Sheets, there might be tools that optimize your budget. I don't know, maybe, right? I think what is important here is that we need to deeply understand why we end up with different interfaces, and what the needs of the people behind each one of these interfaces are. That's what will guide us in building, let's say, the right tooling, or in finding the right business opportunities and all that stuff. Because, yeah, if you asked me, "Do you think it's possible to use Google Sheets as an interface to go and do model training?" Maybe it is. But why? You would be crazy to try and build that, because no one who's actually building and training models will ever care about it, right? And the opposite: can I come up with a Python library that does budgeting for my household? Yeah. But, I don't know, do you want to go to your father and give him a Python library to install with pip so he can figure out what to buy from Costco next week? I don't think so.
Eric Dodds 15:48
Right. Just thinking about sitting down to work on the budget with my wife, and I'm like, [inaudible]. Right? And really, we just need to acknowledge together that we need more milk. No, I love it.
Kostas Pardalis 16:11
Yeah, like, why not, Eric? Seriously, I think these interfaces are a very interesting window into the needs of the people behind them. And humanity, let's say, has matured to the point of creating clear boundaries between different groups of people based on the needs that they have. And that's where the opportunities for productization are, right? If someone wants to build a business, figure out where the opportunities are, or figure out what is missing, and build it.
Eric Dodds 16:51
I agree. All right. If anyone listening to this has a great idea based on this, we want at least a sliver of the equity, since we helped encourage it.
Kostas Pardalis 17:03
Yeah. And please, if you mention the hierarchy of data needs: royalties, and a reference to The Data Stack Show. Okay?
Eric Dodds 17:12
Yes, royalties. Kostas needs to work that into his budget.
Kostas Pardalis 17:15
Yeah. Let's move on. [inaudible]
Eric Dodds 17:20
All right. Well, thank you for joining us on Shop Talk. We'll have more good banter for you in future episodes. Catch you on the next one.