In this bonus episode, Eric and Kostas talk shop around the wide world of databases.
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.co
Eric Dodds 00:05
Welcome to The Data Stack Show Shop Talk, where Kostas and I talk shop, which is one of our favorite things to do. And also, Kostas, we should tell everyone: if there’s a topic you want us to discuss, send us an email and we will discuss it on an upcoming Shop Talk episode. And we will send you a Data Stack Show coffee mug and T-shirt. So please email us: eric@datastackshow.com, kostas@datastackshow.com, or brooks@datastackshow.com. You’ll probably get a faster response if you email Brooks, but please send us topics you want us to cover, and we’ll tackle them on Shop Talk. Okay. Yeah. Or reach out to the show itself on Twitter. Oh, yes. True. Yep. For sure. Okay, Kostas, here’s my question for this week. And this is me, as the less technical co-host of the show, asking you, as the more technical co-host. One thing that is really interesting to me is that it seems like there are a lot of new databases being created, different types of databases, right? I mean, there are lots of database types out there, or maybe flavors is a better word to describe it. Fundamentally, databases have a lot of similarities. But one of my questions is: why is that? Building a database system that can be wildly successful seems like a ridiculously hard undertaking, especially with so many incumbents. It’s just interesting. If I were going to pick a problem to solve, I don’t know if building a new database system would be it, just because it seems like there are so many established, really good options out there.
Kostas Pardalis 02:09
Let me make sure I understand the question. Is the question about why we have so many different flavors, or why people want to build a database? Okay. Or both? I mean, I can answer, but I want to make sure I understand what you’re asking.
Eric Dodds 02:25
Part of my question is, why do people keep trying to invent new kinds of databases? That’s really more my question.
Kostas Pardalis 02:36
I’m not so sure that they’re trying to do that. Well, what’s the latest flavor of database that you’ve seen out there that you didn’t know about?
Eric Dodds 02:48
I was just thinking back on Quine, right? Like, the graph database?
Kostas Pardalis 02:54
Yeah, well, that was more of a processing system. It wasn’t exactly a database. I would say they were adding a graph layer on top of a key-value store, which already existed, the kind of database technology you use for storage, and they put a real-time graph processing system on top of that, right? So that’s a little bit different. But...
Eric Dodds 03:22
How about Firebolt? Well, I mean, it’s a hard fork of ClickHouse.
Kostas Pardalis 03:31
But that’s really its own thing, in the sense that they have added a lot of stuff on top to make it Firebolt. So Firebolt is not exactly ClickHouse. But let me, okay...
Eric Dodds 03:45
Does that make sense? I know my question probably reveals a lot of my technical ignorance, but...
Kostas Pardalis 03:54
Oh, no, no, no, no, no. I think it reveals the obscurity around database systems and why they are the way they are. There’s sort of a veil.
Eric Dodds 04:13
A veil of mystery. Yeah. Yeah.
Kostas Pardalis 04:17
Which I think also has to do with how hard it is to build one, right? But okay, let’s take this from the beginning. Database systems are primarily, let’s say, categorized based on the workloads that they serve best. There are many definitions of workloads, but a workload is mainly what kind of data we are working with and what kind of processing we want to do on that data. Serving dashboards, for example, is fundamentally different from running real-time queries on streaming data, right?
So, okay, fundamentally, all these systems are database systems in the sense that they operate over a set of data and expose, let’s say, an interface where the user can ask a question, process the data, and get an answer, right? Obviously, you can take, let’s say, Postgres. You can use Postgres as a transactional database, you can use it to run analytical queries, and you can use it for time series data. Maybe you can also use it with streaming data. Okay. But there are tons of trade-offs, mainly in how much you can scale and how well you can cover the use cases for each one of these workloads, right? So we’re at the point where we need to start specializing. And so suddenly we have time series databases. Now we have OLAP systems, which are data warehouses, and then we have data lakes. And then we have graph databases, [unclear] stores, and in-memory systems. So yes, we have different flavors because we need to specialize in order to maximize how well we can solve each one of these problems. And the more we need to do on each one of these workloads, the more innovation we will see there.
Having said that, databases are, maybe together with operating systems and compilers, among the three most complex systems to build. I mean, not as a toy, but as a product, right? To be honest, they are probably closest to an operating system. Compilers are very difficult to build because there’s a lot of, let’s say, boring stuff you have to do there, but in terms of architecture I think they’re a bit simpler compared to databases, which are more like an operating system. Databases deal with many of the same things operating systems do: how they handle memory, how they store data, and how many different subsystems need to coexist for them to operate.
So I think the fact that today we have so many different workloads and the need to specialize in them, together with the fact that it’s really hard to build a database, is what creates this difficulty for people in understanding why we need all these different databases and why we keep trying to build new ones.
Kostas Pardalis 08:08
Does that make sense?
Eric Dodds 08:12
Yeah, it makes total sense. Do you see, I mean, there also seems to be a lot of operational overhead, right? Which is probably why smaller companies just use a very simple, standard set of databases. It seems like a company would use a wider variety as it gains the scale and resources to manage that, because having a wide variety of databases would introduce a lot of operational overhead, right?
Kostas Pardalis 08:49
Yeah, absolutely. I think that applies to introducing new infrastructure in general, not just databases. People should only do it when they have scaled enough that they actually need to; otherwise, you’re just adding too much complexity, and you’re going to get hurt instead of solving a problem. You have to be careful with that. In my opinion, it’s always better to be lean at the beginning, even scrappy, instead of going and buying the latest, shiniest tool to solve a problem that you could probably solve with Excel.
Eric Dodds 09:43
If you were going to build a database, what problem area would you focus on?
Kostas Pardalis 09:48
Oh, there is one very interesting topic in database systems that we are starting to see more in transactional databases, and I think we will see more and more of it in analytical databases too, which is going completely serverless. This is something super interesting from my point of view, both in terms of architecture and in terms of the kind of experience that you can deliver with these systems. There are several serverless ones: I would mention CockroachDB, PlanetScale, and probably Neon, which is a new one, but also open source. There are some very interesting developments there. They’re more around transactional databases at this point, but they are very interesting, both as products and as companies. I’d love to work on something like that. It’s fascinating and very challenging from a technical perspective.
Eric Dodds 11:05
Yeah. Super interesting. Okay, last question. I actually have no idea how long we’ve been talking, but this is a super interesting topic. Okay, so databases are really hard to build. Let’s say you’re going to go build a serverless database. Really difficult, right? There are many difficult things that you mentioned. And this isn’t unique to databases: when you think about new technology, there’s risk in adopting it, because if it doesn’t actually play out, you basically have to redo a ton of work, right? And for databases in particular, let’s just take a standard ETL pipeline versus a database, two pieces of data infrastructure. Is it painful to replace an ETL pipeline? Sure, that can be painful, especially if you had to build it. But a database is a much bigger deal: there’s a ton of data in there, there are formatting implications, and generally critical business functions run on it, et cetera. So what are the signals to you that a new database technology is going to be around? When would you invest in it? What would make you comfortable investing? Does it need to be open source? Does it need a certain level of adoption?
Kostas Pardalis 12:54
I think that’s one of the reasons that pretty much every database system out there has, one way or another, an open source component to it. And I think we will keep seeing that with new databases. For example, I think some of them released the open source version before they started offering some kind of hosted version.
Eric Dodds 13:14
Yeah, it seems like a common pattern.
Kostas Pardalis 13:20
Yeah. And I think it will continue to be like that, exactly for the reasons that you’re talking about. This is such a big investment and such an important component of every technology stack out there. You cannot gamble on using something that might stop existing a week from now. So yeah, I think open source is something you can’t really do without. I mean, I think the community is an important thing too, and obviously the company that’s behind it, right?
Eric Dodds 13:56
Yeah. Yeah. Like anything.
Kostas Pardalis 14:00
It takes time. I don’t know, for example, how long CockroachDB has been around, but building a business around a database takes time. Yeah.
Eric Dodds 14:14
Yeah, I guess Snowflake is an outlier in that they’re not open source.
Kostas Pardalis 14:20
Yes, that’s true, which is pretty interesting. But yeah, they are an outlier. And also, it’s a little bit different when we are talking about a transactional database, which you use to build your product on top of, versus an analytical database. With analytics, okay, you can always move the data to another place, and you can survive without your dashboard for a day, right? Whereas when your transactional database goes down, you die.
Eric Dodds 14:56
CockroachDB dates back to early 2015.
Kostas Pardalis 15:05
So yes, it’s been a while. And they started with open source. So, yeah, it’s a common pattern.
Eric Dodds 15:16
Super interesting. All right. Well, I got such a good education on database fundamentals.
Kostas Pardalis 15:21
Oh, yeah. I’m happy to discuss more about that. It’s a very interesting topic.
Eric Dodds 15:28
It is super interesting. All right. Well, thank you for joining us for Shop Talk. I hope you learned as much as I did, even if we covered things that our listeners already know. And we will catch you on the next one.
To keep up to date with our future episodes, subscribe to our podcast on Apple, Spotify, Google, or the player of your choice.
Get a monthly newsletter from The Data Stack Show team with a TL;DR of the previous month’s shows, a sneak peek at upcoming episodes, and curated links from Eric, John, & show guests. Follow on our Substack below.