Episode 135:

Database Knob Tuning and AI with Andy Pavlo and Dana Van Aken of OtterTune

April 19, 2023

This week on The Data Stack Show, Eric and Kostas chat with Andy Pavlo and Dana Van Aken, the CEO and CTO of OtterTune. During the episode, the group discusses the challenges of tuning database systems, the role of machine learning in automating the process, the importance of workload capture, the benefits of automated recommendations for index optimization and overall system health, the challenges of managing configuration knobs in database systems, and more.

Notes:

Highlights from this week’s conversation include:

  • Origins of OtterTune (4:43)
  • The problem of knob tuning (6:25)
  • Roles of machine learning (9:32)
  • OtterTune’s development and industry recognition (12:03)
  • The challenges of database tuning and the role of human expertise (16:15)
  • Tuning in production (20:23)
  • Observability and Data Collection (23:37)
  • Data Security and Privacy (29:59)
  • Optimizing on-prem vs. cloud workloads (35:52)
  • Performance benchmarks (40:20)
  • Future opportunities OtterTune is focusing on (43:55)
  • Importance of automated tuning services (50:45)
  • Challenges in Benchmarking Real Workloads (58:43)
  • The Story Behind the Name OtterTune (1:08:58)
  • Balancing Technology and Human Factors (1:13:23)

 

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Transcription:

Eric Dodds 00:03
Welcome to The Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You’ll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, the CDP for developers. You can learn more at RudderStack.com. Welcome back to The Data Stack Show, we have a huge treat for you. Andy Pavlo and Dana Van Aken of OtterTune are going to be on the show, and we’re going to talk about database tuning and optimization. But what a privilege, Kostas, to have someone like Andy on the show.

Kostas Pardalis 00:47
Yeah, absolutely. I mean, first of all, it’s always interesting to have people from academia, right? Especially when we’re talking about data systems. And Andy has done amazing work and research with his team and his students over there. But it’s even more interesting when you see people from academia start companies, right? Taking the research and turning it into a product and a business is always super, super fascinating. So I think we are going to enjoy this episode a lot. We have many things to talk about, from what makes a database system and what knobs exist out there, to what it means to go from a classroom database system, to a PhD like the one Dana was doing, to starting a business that sells software to companies that have database issues.

Eric Dodds 01:53
I agree. And if we’re lucky and we have enough time, I want to hear where the name OtterTune came from. So I’m gonna try and sneak that in if we can. Yeah, let’s do it. Let’s do it. Dana, Andy, welcome to The Data Stack Show, such a privilege to have you on, and we cannot wait to talk about all things databases and OtterTune. Thanks for having us.

Dana Van Aken 02:15
Yeah, thank you.

Eric Dodds 02:17
All right. Well, let’s start where we always do. Could y’all give just a brief background? And then we’ll dig into a bunch of OtterTune stuff. Andy, do you want to start?

Andy Pavlo 02:31
Right, so my name is Andy Pavlo. I’m an associate professor with indefinite tenure in the Computer Science Department at Carnegie Mellon University, and I’ve been here since 2013. I did my PhD at Brown University with Stan Zdonik and Mike Stonebraker. And I am the CEO and co-founder of OtterTune, which started as a research project at CMU; it was Dana’s PhD dissertation, which we’ll talk about next.

Dana Van Aken 02:56
Yeah, I’m Dana Van Aken. I’m also a co-founder, and the CTO, of OtterTune. As Andy just mentioned, I started down the path of working on database tuning and machine learning methods in 2014, when I began my PhD at Carnegie Mellon University, advised by Andy. That’s where the academic version of the product originated, and now I’m working on it commercially.

Eric Dodds 03:23
Very cool. Well, Andy, I have to ask this: studying under the likes of Michael Stonebraker is really incredible. Are there any anecdotes or lessons you learned that you still think about, or that still come up pretty often, as someone who has had a hugely successful teaching career and is now doing a startup?

Andy Pavlo 03:53
So Mike’s philosophy is that you’ve got to build a real system and get people to use it. Even when you’re in academia, that’s the best way to start guiding your research. And so, not just with OtterTune but with other things we’ve done at CMU, that’s been my main guiding principle: try to build real software, because you just don’t know where it’s going to take you, and some interesting ideas come out of it. If you’re just writing stuff to run in a lab, you don’t see the full picture.

Eric Dodds 04:19
Yep. Yeah. Well, I can see that influence very clearly in your projects. Take us back to the beginning of OtterTune. It started as an academic project, but you were saying before the show that the idea was gestating even earlier, before it formally became an academic project.

Andy Pavlo 04:42
Yes. So my PhD dissertation at Brown was based on a system called H-Store, which was commercialized as VoltDB. In addition to helping build the system, my research area was automated methods for optimizing and tuning the database system. In particular, I was looking into things like how to automatically pick partitioning keys for the database. And the challenge I faced as a student was getting real workloads to use for our experiments and analysis. We ran the standard open-source benchmarks that everyone did, but we wanted real customer data, which is not easy to get when you are in academia. So when I started at Carnegie Mellon, I wanted to continue down this path of automated methods for optimizing database systems, right when machine learning was becoming a hot thing, and certainly CMU is a pioneer in this area. I was trying to find a problem in databases that I don’t think people had really covered in previous work, because database optimization is an old problem; it goes back to the 1970s. I was looking for something that wasn’t just index tuning (that’s important, but I wanted something different), something where I could get around the problem of not having access to the real database or the real workloads. And knob tuning seemed like the obvious thing, because it wasn’t until maybe the last 20 years that these systems got so complex, with all these different knobs that people have to tune, that automating it became necessary. And it seemed like machine learning could solve the problem without needing access to the workload or the database, because you could just look at the runtime metrics, the telemetry of the database system, and use that as a signal to figure out what was going on.

The initial version of OtterTune, before it was even called OtterTune, was just doing some basics, like a decision-tree method to try to pick out how to tune three or four knobs. And it was pretty similar to what had been done 10 years prior: Oracle and IBM had similar tools that were rule-based. I’m running this kind of workload with these kinds of resources, and it would spit out some basic recommendations. When Dana got started is when we really pushed hard on this machine learning problem and tried to understand how much we could actually tune and optimize. I should also back up here: the knobs are these parameters that control the behavior of the database system, things like memory buffer sizes, caching policies, log file sizes. You can think of the database system as a generic car frame, where you can put big tires on it and make a truck to haul things, or you put a racing engine on it to make a big, fast race car. The database system is general-purpose software, and it exposes these knobs for the user to tune based on what workload they think they’re going to run, because that changes the runtime behavior of the system. So obviously, if you have a write-heavy workload, you tune it one way, and for a read-heavy workload, you tune it another way. But the problem is that there are just so many of these knobs that it’s beyond what humans can reason about. That’s the high-level problem the original OtterTune project was trying to solve. And then Dana started, I think, in 2014, and really pushed the machine learning story.
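
To make the knob idea concrete, here is a tiny sketch of what a knob assignment and the SQL to apply it could look like in Postgres. The parameter names are real Postgres knobs, but the values are invented for illustration and are not tuning advice:

```python
# Hypothetical knob assignment for a Postgres instance. The names are
# real Postgres parameters; the values are illustrative only.
KNOBS = {
    "shared_buffers": "4GB",         # main buffer pool size
    "work_mem": "64MB",              # per-operation sort/hash memory
    "max_wal_size": "2GB",           # write-ahead log size before a checkpoint
    "effective_cache_size": "12GB",  # planner hint: OS page cache size
}

def render_alter_system(knobs):
    """Render the SQL a tuner would execute to apply a knob assignment."""
    return [f"ALTER SYSTEM SET {name} = '{value}';" for name, value in knobs.items()]

for stmt in render_alter_system(KNOBS):
    print(stmt)
```

Most of these settings take effect after a configuration reload or restart, which is part of why each tuning iteration is expensive.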

Eric Dodds 08:15
One quick question, and this is maybe a slightly philosophical question, on the number of knobs: as complexity has increased, do you think that complexity is healthy, or even necessary? Or are there more knobs than are actually helpful?

Andy Pavlo 08:36
So it’s a byproduct of how people build software. If you’re a database system developer adding a new feature, at some point you have to make a decision about how to do something, like how much memory to allocate for a hash table. And rather than putting a hard-coded value in the source code that tries to be good enough for everyone, they just punt and expose it as a configuration knob, because you assume someone else will come along who knows what the workload wants to do and will be in a better position to decide how to set it. But of course, that never happens, right? People have no idea. So to your question, is this the right thing to do, to expose everything as knobs? From an end-user perspective, no; you want the database to figure it out for you. But from a software engineering perspective, I think it’s a reasonable assumption for why someone would want to do that.

Eric Dodds 09:33
Yeah, absolutely. Dana, can you pick up the story and tell us how machine learning entered the picture, and how the research project became OtterTune?

Dana Van Aken 09:44
Yeah, absolutely. So I believe we really started doing research on machine learning-based tuning methods in 2015. And we were lucky enough to get one of the wonderful machine learning professors, Geoff Gordon, to help out and provide us with advice about the machine learning piece of it, since Andy and I started out as systems researchers; I have learned a lot along the way, but we were primarily systems people. After some discussions with Geoff, he recommended Bayesian optimization as a really good method for this problem, primarily because we wanted a generalizable method that we could apply to different databases. Bayesian optimization is a black-box approach, so it provided us with the means to do this. The other big consideration is that collecting training data for configuration tuning is super expensive, mostly in terms of the collection time spent trying different configurations, because it’s such a big search space. Bayesian optimization is known to be a fairly simple method that requires a bit less training data than some of the other methods out there. So that was the direction for OtterTune. There was previous work using Bayesian optimization for configuration tuning, a system called iTuned, from Shivnath Babu’s group, published back in 2009. We were really excited because we added some important elements to it, automating a few different pieces, and I can touch on those really quickly.

One thing that was different was that we applied methods to figure out which are the most important knobs to tune, and that’s really important for reducing the search space; we provided some state-of-the-art methods there. The other piece is reusing data collected from different database instances to help tune new databases, so we provided some of the generalizability methods for how you would do that transfer. Okay, so we published that paper in 2017. And then, actually, I’m going to let Andy explain this part, because he does such a good job explaining it.
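
The tuning loop Dana describes can be sketched at a very high level. This toy version replaces the Bayesian optimization surrogate with a naive explore/exploit rule, and `measure_throughput` is a made-up stand-in for actually deploying a configuration and reading telemetry; it shows only the propose, observe, update shape of the search, not OtterTune’s actual method:

```python
import random

def measure_throughput(work_mem_mb):
    # Stand-in for running the workload under a configuration and
    # observing the target metric. This toy surface peaks at 256 MB.
    return 1000.0 - (work_mem_mb - 256.0) ** 2 / 100.0

def tune(iterations=30, lo=16.0, hi=1024.0, seed=0):
    """Sequential tuner: propose a setting, observe the metric, and bias
    later proposals toward the best region seen so far. Real Bayesian
    optimization would fit a surrogate model instead of this heuristic."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(iterations):
        if best_cfg is None or rng.random() < 0.5:
            cand = rng.uniform(lo, hi)                        # explore
        else:
            cand = min(hi, max(lo, rng.gauss(best_cfg, 64)))  # exploit
        score = measure_throughput(cand)
        if score > best_score:
            best_cfg, best_score = cand, score
    return best_cfg, best_score

best_cfg, best_score = tune()
```

The expense Dana mentions is visible even here: every iteration of the loop corresponds to observing the live workload under a new configuration, which in practice can take hours.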

Andy Pavlo 12:54
We need a smoother transition than, “Oh, let Andy talk about it.”

Dana Van Aken 12:58
Sorry! I love it when you tell this part. I was getting excited.

Andy Pavlo 13:03
Okay, so, as Dana said, we published the first version of OtterTune, the original project, in 2017 at SIGMOD, ACM SIGMOD, which is the top research conference in the field of databases. And when it came out, the other academics were like, oh, this seems kind of cute, nice paper, but nobody in industry really paid attention to it. And then I met the guy who runs all of Amazon’s machine learning division (he now runs all of Amazon’s database and AI divisions), this guy Swami, through a former colleague of mine, Alex Smola, who was a professor here at CMU and then went to work at Amazon on AutoGluon and other things. Anyway, he introduced me to Swami, and I got five minutes of his time, just to say thank you for giving Dana a couple thousand dollars to run her experiments. And he was like, great, can you write a blog article for us? We just started the new Amazon AI blog; we need material. So we converted Dana’s paper into a blog article that they published on the Amazon website; it’s still there. And when it came out, that’s when everybody saw it. We started getting a ton of emails saying, we have this exact problem, we’ll give you money to fly students out and set it up for us. People were very appreciative. We also had a lot of crazy people, and Dana, being a student, didn’t realize you don’t respond to the crazy people that email you. She quickly learned. But anyway, 2017 is when we realized that there was something here. We then tried to make an improved version of OtterTune that encompassed more information about the hardware and so forth. And the challenge there was that, without real workloads, running with synthetic benchmarks, the basic version of OtterTune could do a really good job. But we wanted to run it on real workloads, really push it, and see what happened.

And so, right before the pandemic, we partnered up with the people who run the database division at a major bank in Paris, France, who were all keen on using OtterTune. We did a collaboration with them, actually running OtterTune on a real workload and real databases. This was Oracle at the time, and it was on-prem. We ended up publishing another paper that was a kind of final chapter in Dana’s thesis, and the main finding was that the machine learning algorithm doesn’t actually make that big of a difference. Dana talked about using Bayesian optimization; we tried deep neural networks and reinforcement learning, and in the end, they all do about the same job. The real challenge is not the ML side of things; it’s actually interfacing with the database itself: collecting the data, reasoning about the state of the system when you make changes that are invalid, trying to understand how it’s actually reacting and responding, and incorporating that information or feedback back into the algorithms and the models. Both in the academic version and since we’ve gone out into the real world, we’ve learned that that’s the hardest challenge here, not the ML stuff.

Eric Dodds 16:15
Fascinating. I want to dig into that. But first, Dana, could you give us a brief overview of tuning? I don’t know how many of our listeners have a lot of experience tuning a database. To me, using an algorithm to do this is intuitive. This is a horrible analogy, but it’s like a carburetor: there are people who have a skill set for tuning a carburetor, but if you hook a computer up to it, it takes care of most of the hard stuff. What’s the skill set around tuning? And where does the human skill stop and the algorithm begin?

Dana Van Aken 16:59
Absolutely. So I would say, in addition to automated tuning tools like OtterTune, tuning the database has traditionally been a problem solved by database administrators. Medium to large companies, and even some smaller companies, would bring a database administrator, one or more of them, onto the team, and they would do a very manual process to tune the database, depending on whatever they were trying to improve, performance or whatever the goal was. What this looks like for configuration tuning, to give you an example, is that you basically want to test a single configuration parameter: you change the setting, and then you observe the workload after you’ve made that change. It’s a very iterative process, because it’s often recommended that you only change one configuration at a time, and also because you then have to observe the workload again, which takes a long time. So the DBAs make these minor changes to the settings continuously until they’re happy with the performance. And for any companies that might not be able to afford a DBA, or don’t bring one on for other reasons, the method there, aside from automated tuning tools, is: you go read the docs, you go read blog articles, you find resources, and you do it yourself.

Andy Pavlo 18:50
To be very clear here about who’s doing this: when they don’t have a DBA, a lot of times with OtterTune it’s developers, it’s the DevOps people, people who aren’t database experts. It’s whoever set up MySQL, Postgres, or whatever DBMS they’re using; they drew the short straw, and now they have to maintain it. And as Dana said, they go read the docs, and

19:10
good luck.

Dana Van Aken 19:13
That’s right. And they’re not doing this proactively, right? Something goes wrong with the database system, and then they’re like, oh, crap. And that’s when they begin the research to figure out what to do. And then the process looks very much like what a DBA would do, these manual changes, but it might take even longer because you have to figure it out along the way.

Eric Dodds 19:35
And, I mean, forgive my ignorance, but it’s an interconnected system, right? So even though changing one knob at a time helps an individual isolate the impact, it stands to reason that understanding that multiple things need to change is where you can get significant gains, especially around the speed at which you can optimize performance. Is that accurate?

Dana Van Aken 20:03
That’s correct. The benefit of the machine learning models is that they can learn complex functions, right, and complex behavior, and understand it. So it definitely expedites the process of understanding which knobs are related to one another, and the interactions between them.

Andy Pavlo 20:24
Another aspect of this is how people do tuning, meaning what you’re supposed to do versus what people really do. And this is actually one of the things where we made an assumption in academia, and then we went into the real world and it turned out to be not correct. What you’re supposed to do is take a snapshot of the database, capture a workload trace, restore it on spare hardware, do all your tuning exercises on that spare hardware, and once you’re satisfied with the improvement, apply the changes to the production database, and obviously watch to see whether that was correct. Very few people can do that. The French bank we talked about before could, because they had a very massive infrastructure team, and they were using Oracle, which (people may not want to hear this) had really good tools, better than Postgres and MySQL, for this kind of workload testing. So they could do it. And we thought, okay, when we went out with the commercial version, people would be able to do this. That has not been the case. People need to run directly on production databases, because even if they have a staging or dev database, it’s not the same workload, and they can’t push it as hard as the production database. So any optimizations you make on the staging database may actually be insufficient on the production database. So the thing that Dana mentioned from when it was a research project, about reusing data across other databases, matters more now in the commercial world, because people aren’t going to have staging databases they can run a lot of experiments on. In some cases we can reuse that training data, and we just need to be more careful in the production environment about what we’re actually changing while we do the search.

Kostas Pardalis 22:11
That makes a lot of sense. I have a question, and actually I’m super happy to have two people here who come from academia, because I can satisfy my curiosity around definitions. We tend to use terminology loosely, and if you stop and dig deeper, everyone is using a slightly different meaning. I think it’s important to have a common understanding of what we mean when we use certain terms, and both of you have used one a lot: “workload,” right? And “real workload.” So what is a workload when we are talking about database systems? What defines the workload?

Andy Pavlo 22:56
So, yeah, our view of the workload would be: what are the SQL queries that the application executes on the database system to do something, right? But it’s more than just the SQL queries. If you’re looking at the full picture, it’s the transaction boundaries as well: the BEGINs, the COMMITs, the ABORTs. So you would want to use a tool that does workload capture, and when we say workload capture, that would literally be collecting a trace of all the SQL commands that the application sent at different times from its client threads, and so forth.

Dana Van Aken 23:29
It’s also important, if you’re using that method, to capture a period of high demand; that’s typically what you want to optimize for.
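
As an illustration of what a captured trace could contain, here is a minimal sketch: SQL statements interleaved with transaction boundaries, plus a helper that finds the busiest window, since the high-demand period is what you typically optimize for. The record layout and field names are invented for this example; real capture tools record far more detail:

```python
from dataclasses import dataclass

@dataclass
class TraceEvent:
    """One event in a captured workload trace: a SQL statement or a
    transaction boundary, with its client and timestamp."""
    timestamp_ms: int
    client_id: int
    kind: str        # "sql", "begin", "commit", or "abort"
    sql: str = ""

# A tiny captured trace for one client session.
trace = [
    TraceEvent(0, 1, "begin"),
    TraceEvent(2, 1, "sql", "UPDATE accounts SET balance = balance - 10 WHERE id = 7"),
    TraceEvent(5, 1, "sql", "UPDATE accounts SET balance = balance + 10 WHERE id = 9"),
    TraceEvent(6, 1, "commit"),
]

def peak_window(events, window_ms):
    """Return (start_ms, statement_count) of the busiest window."""
    times = [e.timestamp_ms for e in events if e.kind == "sql"]
    best = (0, 0)
    for t in times:
        count = sum(1 for u in times if t <= u < t + window_ms)
        if count > best[1]:
            best = (t, count)
    return best
```

Replaying such a trace, boundaries included, is what lets you exercise spare hardware the same way production was exercised.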

Kostas Pardalis 23:37
Yeah. Okay, that makes total sense. And then you mentioned at some point observing the system, right, the database system. What I’ve seen in my experience, when engineers try to figure out performance and collect data for optimizing the database system, is that they usually collect the output of the query optimizer, the plan that the optimizer creates, and correlate that with some measurements: latencies, how long things take, how much data is being processed, and some statistics around the tables, right? But observability, especially when we’re talking about systems in general, covers much more; there’s much more that someone can observe out there. So what information are you seeking to observe from the database system as it works, to feed into your algorithm?

Dana Van Aken 24:39
So, typically, when a user begins the tuning process, they tell us what they want to optimize for. At a high level, it’s maybe performance or cost. That could be latency, that could be CPU utilization, or the cost itself, which we can even collect from the cloud through APIs. That’s the primary metric we’re going to use to help guide the optimization, but there are a lot of other really important things you have to take into consideration, which is why we collect a lot of additional information, including all of the runtime metrics in the system, and the configuration options at each step. The performance schemas in both MySQL and Postgres expose just a ton of information, and we try to collect as much of it as we can, at different levels: statistics at the database level, index statistics, table statistics. What do we look at when guiding the tuning process? Well, a lot of these other metrics also provide a good signal for performance. And in addition to those metrics, we incorporate some domain knowledge to make decisions about the settings we recommend. What comes to mind here, for example, is that one important parameter you can tune is the log file size. If you increase the size, it typically improves performance up until a certain point. But as you increase the log size, you’re also increasing the time it takes to replay the redo log, to recover the database in the case that it goes down. So we also have to take these practical aspects of tuning into consideration.

Kostas Pardalis 26:47
And how do you perform this observability on the database system? How do you collect the data? Now I want to get a little bit more into the product conversation, right? I get what OtterTune does, but how does it do it? How do I go to my RDS database and tell OtterTune to start collecting all this data?

Dana Van Aken 27:12
Sure. So I’ll discuss the way OtterTune works in terms of the current product, because I think it’s a little more intuitive, given that a lot of people are on the cloud now. We support AWS RDS and Aurora, for MySQL and Postgres. In addition to collecting the internal metrics from the database system, we’re also going to collect CloudWatch metrics from AWS, so we’re getting multiple sources here. At the very beginning, like I mentioned, the user is going to go in and pick what they want to optimize for, and maybe a few other settings. The next thing they do is grant us permissions, both to access this data from the cloud APIs and from the database system. For the cloud APIs, that’s pretty straightforward. But for collecting internal metrics from the database system, nobody wants to make their database publicly accessible, right? That means OtterTune can’t directly connect to it and grab the information. So we provide an open-source agent for people to deploy in their environment. It’s a similar setup to Datadog, New Relic, and a number of other observability companies. They deploy it, and the agent directly connects to the database, collects all the information I mentioned previously, and then sends it back to OtterTune in a secure manner. So once we have the proper permissions to collect all this data, the way it works is that we observe the database for a period of time.

In real life, we like to observe the database, for a given configuration, for at least 24 hours, because we see a lot of diurnal workloads. E-commerce sites, for example, and there are a number of other industries like this, get busy starting at 9 a.m., hit their peak demand, and then drop off in the evening. Capturing 24 hours’ worth of data just makes sure we’re being consistent. So in the very first iteration, we collect the current configuration and then begin observing the database for 24 hours. We store all of that state in our database, and then, using all the data we’ve collected so far, as well as some other data we generate, we build machine learning models that then generate better configurations.
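
A bare-bones sketch of the kind of collection step an agent like this performs: gather internal database statistics and cloud metrics, and package them into one observation to ship to the service. The collectors here are stubbed with fixed numbers so the code runs standalone, and every function and field name is invented for illustration rather than taken from OtterTune’s agent:

```python
import json

def collect_db_metrics():
    # In a real agent: queries against Postgres pg_stat_* views or
    # MySQL's performance_schema. Stubbed with fixed values here.
    return {"xact_commit": 12345, "blks_hit": 99000, "blks_read": 1000}

def collect_cloud_metrics():
    # In a real agent: a CloudWatch query for the instance's metrics.
    return {"cpu_utilization": 41.5, "read_iops": 220.0}

def snapshot():
    """One observation the agent would ship to the tuning service.
    Note this is aggregate telemetry only: no user rows, no literals."""
    return {"db": collect_db_metrics(), "cloud": collect_cloud_metrics()}

payload = json.dumps(snapshot())  # what would go over the wire
```

Because only counters and gauges like these leave the customer’s environment, the agent model sidesteps the data-privacy concerns Andy raises next.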

Andy Pavlo 30:00
I’d just like to add to that: the data we’re collecting is runtime telemetry from the database system or from CloudWatch, things like CPU utilization, pages read, pages written. It’s all benign information. We’ve done deployments at the French bank and at other companies in Europe, and their InfoSec people looked at what we’re collecting, and there have not been any GDPR issues. It’s not something our team needs; we don’t care about your data, your user-to-user data. And anything we collect, like a query plan sent back for the query tuning feature, we strip out anything that’s identifiable, because again, we don’t want it, we don’t care.

Dana Van Aken 30:37
Right. And we also make it really easy for users to switch on and off what information we can collect, and we adjust our recommendations accordingly.

Kostas Pardalis 30:47
And is the product generating recommendations, or is it able to go back and automatically tune the database?

Andy Pavlo 30:55
There’s the current version, and there’s the new version. The current version can automatically configure knobs, and that’s it. Then we have additional health checks that provide high-level recommendations about other things: here are some unused indexes, you should drop them; here are autovacuum settings in Postgres that are messed up, you should go fix those.
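
The unused-index check can be approximated in stock Postgres: the `pg_stat_user_indexes` view keeps an `idx_scan` counter per index, and an index that has never been scanned since statistics were reset is a candidate to drop. A small sketch, with made-up sample rows standing in for query results:

```python
# SQL a health check could run against Postgres to find indexes that
# have never been used by any query since the stats were last reset.
UNUSED_INDEX_SQL = """
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0;
"""

def unused_indexes(rows):
    """Given stats rows as dicts, return names of never-scanned indexes."""
    return [r["indexrelname"] for r in rows if r["idx_scan"] == 0]

# Made-up sample rows, as a database driver might return them.
sample = [
    {"indexrelname": "orders_pkey", "idx_scan": 52310},
    {"indexrelname": "orders_legacy_idx", "idx_scan": 0},
]
```

In practice you would also exclude indexes that back primary key or unique constraints before recommending a drop.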

Kostas Pardalis 31:16
But they’re not as precise or specific

Andy Pavlo 31:20
as we want them to be. So the new version takes a broader view of the lifecycle of the database system and provides the automated recommendations, index recommendations and so forth, but also looks at the overall behavior of the database system over longer periods of time and tries to provide guidance, recommendations, so that people know they're running the best practices. And so with the new version of OtterTune, it's like, as Dana said, we still optimize performance, we still do that. But we also provide guidance about the overall health of the system, and whether you're running the best practices, which may not always lead to the best performance. For some of our recommendations, like, if backups are turned off, you should turn them on; that could potentially make you run slower, but it's the right thing to do. So the newer version of OtterTune is trying to look at the broader view of the database, not just, how can I make it run super, super fast? Coming from academia, all we cared about was, did we make the graph go up, make it go faster? And then in the real world, what we found is, yeah, that matters, but it's not the whole picture. People come to us and say they don't know what they don't know. They don't know what they should be doing. And we see enough databases that I think we're in a position to provide recommendations along those lines.

Kostas Pardalis 32:35
Yeah, yeah, that's actually a very interesting point. Has your perception of what performance is changed through your experience with OtterTune, going out into the real world, the market out there, compared to how both of you perceived performance as academia people?

Andy Pavlo 32:55
I mean, raw performance is still performance. At the end of the day, does your throughput go up, does your p99 go down. Utilization is probably the one thing that we didn't think about that matters a lot in some cases. One anecdote: we did a deployment at Booking.com, and they wanted to reduce utilization so that they could do consolidation. And so even after a human expert in house, their MySQL expert, optimized the database system, OtterTune was able to squeeze out another 20 to 25% reduction. I think it was a cluster of 20 machines, and if you shave off 25%, now you're turning off three or four machines. And they had a ton of these clusters, so for them that was a big deal. So in terms of raw performance, I still think the things that we focus on in academia make sense. But it's these other mushy, fuzzy-feeling things about databases that are hard to measure in academia and write a paper about. Like, oh yeah, someone feels better about their database, okay, how do we measure that? And the new version of OtterTune is basically doing that for them. We tell them up front, this is the healthier database, here are the things to take care of. We're not recommending them just because Dana has a PhD in databases and I've read the textbook, whatever; it's things that we see on a repeated basis that we know are the right things for you to be doing.

Dana Van Aken 34:22
I'll also add a quick note. What we've learned from our customers, which is much different from back in academia, is that they're really looking for peace of mind and also stability in the recommendations. So it's not just about optimizing for the absolute peak performance over a small period of time. Some configuration knob settings can provide peak performance, but then they're sort of unstable as different background processes kick in or something else happens in the system.

Kostas Pardalis 34:51
Yeah, 100%. There's this thing called on-call that nobody's happy to have to do, right? And the last thing that you want is getting a PagerDuty notification where, oh, now I have to go and figure out what's wrong with my database or my system or whatever. So peace of mind is super, super important, 100%. I totally understand that.

Kostas Pardalis 35:17
So one question that has to do with the systems that you're working with. I've heard you talking both about on-prem installations and about databases on the cloud, all the infrastructure there, and you've talked a lot about AWS and RDS. Is there a difference between the two? From deploying the product so far, is there a difference between trying to optimize workloads on-prem and trying to optimize workloads on the cloud, from the machine learning perspective?

Andy Pavlo 35:54
There is no difference, right? It's just numbers. The challenge, though, is what I was saying before: actually interacting with the database in its environment is the harder part. And so we only support RDS right now, and what that provides us is a standardized API, an environment that we can connect to, retrieve the data we need, and make changes accordingly. You can't change everything, but in terms of knobs, you can modify the parameter groups as needed through a standard API. People even ask us to support on-prem, and then we start talking to the customer about what they actually want, how we'd actually apply the changes we're recommending, and everything's always different. It's always like, oh, you have to write to this Terraform file on GitHub and that fires off an action, or you have to write to this other thing. It's all this one-off, bespoke, custom stuff that people implement. And it's not that we couldn't support it; we could expose an API, which we may eventually do. It's just that it'd be a bunch of engineering work that wouldn't be easily reusable across different customers. So for that reason, we only focus on AWS, because it's a standard environment.
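The "modify the parameter groups through a standard API" step can be sketched with boto3's RDS client. The payload shape below matches `modify_db_parameter_group`; the knob names and the static-versus-dynamic split are illustrative assumptions, not OtterTune's code:

```python
def to_rds_parameters(knobs, static_knobs=frozenset({"shared_buffers"})):
    """Build the Parameters payload for boto3's modify_db_parameter_group.
    Static knobs only take effect after a reboot, so they must use
    ApplyMethod='pending-reboot'; dynamic ones can apply immediately."""
    return [
        {
            "ParameterName": name,
            "ParameterValue": str(value),
            "ApplyMethod": "pending-reboot" if name in static_knobs else "immediate",
        }
        for name, value in sorted(knobs.items())
    ]

params = to_rds_parameters({"shared_buffers": 524288, "work_mem": 65536})
# One would then apply this with boto3, e.g.:
#   boto3.client("rds").modify_db_parameter_group(
#       DBParameterGroupName="my-group", Parameters=params)
```

The actual boto3 call is left as a comment so the sketch stays self-contained; which knobs count as static varies by engine and must come from the engine's documentation.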

Kostas Pardalis 37:02
Yeah, yeah, 100%. And when we're talking about services like RDS, okay, marketing can usually overpromise, but the idea was, you don't have to monitor your database, AWS is going to do that. But apparently that wasn't exactly the case, right? There are still a lot of things that need to happen.

Andy Pavlo 37:23
We had customers tell us they thought Amazon tuned the database for them already. And it's like, jeez, they're not doing that for you, trust me.

Kostas Pardalis 37:30
So why is this happening? Why is AWS not doing that? Like you mentioned at the beginning, for example, the access to the workloads they have is very high, right? They have access to all this information, so why are they not doing it?

Dana Van Aken 37:46
So AWS, and other cloud providers, have something that they call the shared responsibility model, which means that as far as managing the database, they'll handle some parts of it but not other parts. And more specifically, the parts that they won't handle are typically anything where they have to look at, read, or interact with customer data. I'm not saying that this couldn't change in the future, but I think that a lot of the recommendations that they do provide, configuration tuning or other types of tuning, they're able to do without really looking at customer data. So for example, they'll just improve the default setting of a Postgres or MySQL or other database knob based on the hardware. You can get a little bit of bang for your buck on that, because a lot of the default settings are meant for the minimum hardware requirements of the system. But they don't do tuning based on the workload, to the best of my knowledge.

Kostas Pardalis 38:53
And can you share with us a few things from your experience so far? Which knobs are, let's say, the most used for going in and increasing performance? I'm pretty sure you're running some statistics on the data, or metadata, that you collect about the workload optimizations that you're doing. So what happens for the systems that you're working with, MySQL, for example?

Dana Van Aken 39:27
So with Postgres and MySQL, for both of these sort of disk-based systems, the buffer pool is going to be very important, the log file size is going to be important, and certain parameters around checkpointing tend to be important. I can kind of name off the ones that are targets you'll very frequently see in blog articles, and also those that impact the quality of the optimizer. And then you have ones that are really important specific to a database system. So some knobs that are really important for Postgres are the autovacuum knobs, tuning those correctly to make sure your tables don't get bloated. Those are some examples.
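The knobs Dana names map to well-known community rules of thumb. Here's a sketch of typical starting points for Postgres; the fractions are the folk guidance you'll see in blog posts, derived from hardware alone, not workload-aware tuning and not OtterTune's learned values:

```python
def postgres_starting_points(ram_mb: int) -> dict:
    """Common rule-of-thumb starting values for Postgres memory/WAL knobs.
    Static heuristics based on hardware only; real tuning must also
    account for the workload."""
    return {
        "shared_buffers_mb": ram_mb // 4,            # ~25% of RAM is the usual advice
        "effective_cache_size_mb": ram_mb * 3 // 4,  # ~75%: what the OS page cache likely holds
        "max_wal_size_mb": max(1024, ram_mb // 16),  # a larger WAL spaces out checkpoints
    }

print(postgres_starting_points(16384))
```

As discussed earlier in the episode, heuristics like these are exactly the kind of hardware-only defaults a cloud provider can set without looking at your workload.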

Kostas Pardalis 40:18
That makes sense. And

Andy Pavlo 40:20
To add to that: we've done our own benchmarks, and when you compare the default that Amazon gives you with RDS for Postgres and MySQL against what OtterTune optimizes, we can still get 2x better performance. And again, going back to academia, one of the challenges that Dana and I were facing was that we didn't know what the starting point should be. For example, okay, how much better is OtterTune? And we would have this debate, like, okay, well, people would do some tuning and it wouldn't be this really bad configuration. We overestimated what people would actually have in the real world, so OtterTune does better in the real world than we thought it would in academia. The other interesting thing about Amazon RDS is that they obviously have Aurora for Postgres and MySQL, and in that case Amazon has cut off the bottom half of these database systems and replaced the storage layer with their own proprietary infrastructure. And so this removes a bunch of knobs where you oftentimes see major improvement in performance for vanilla, or stock, Postgres and MySQL running on RDS. So that's been one of the challenges there. Also, and I can't prove this, but the Aurora knobs actually look like they've done some tuning, much better than the defaults for RDS. And so we think that people go from RDS to Aurora, and they see this big bump in performance, and they go, oh my god, Aurora is amazing, when actually they just tuned the knobs better for you. And they charge 20% more?

Kostas Pardalis 41:50
Yeah, that's super interesting. That's what I wanted to ask next, because we see more systems that are, let's say, serverless, as they're called. Something like PlanetScale, right? Or, if we go to the OLAP systems, something like Snowflake. So in these environments, where even more abstraction is happening between the developer and the database system itself, what's the space there for autotuning? For what OtterTune is doing?

Andy Pavlo 42:24
So, I mean, all these systems have knobs. Snowflake doesn't expose the knobs, but they're there. I know because they told me; it's like 600 or whatever they have. And basically what happens if you're a Snowflake customer and you have problems is you call their support people, and then the support person talks to the engineer, and the engineer says, oh yeah, tune these knobs. So the knobs still matter; it's whether or not you have access to and exposure to them. And so I agree with you, if you abstract away enough of the system and the knobs aren't there, then there's nothing to tune. But oftentimes, again, there are other things you still want to tune in a database system, and this is what the commercial version of OtterTune does that the academic version didn't. We can tune indexes, and we're starting to look at tuning queries. And again, there's other cloud stuff that you should just be doing that I wouldn't call a knob in the database system itself, but it's the right thing to do. So the newer version of OtterTune that we're working on now starts to look at the lifecycle of the database beyond a single database by itself. And what I mean by that is, oftentimes you will have multiple database instances, and they may not be, say, physically connected. Amazon doesn't know that they're replicas of each other, or Amazon doesn't know that here's the US version of the database and here's the European one, and it's the same application, the same schema. They're just disconnected.
And so this is where we're going next: looking at the database, in addition to the schema and all the other metrics that we're collecting, and trying to understand what's above it in the application stack, and starting to make recommendations and advise users about how they should be using their database, what to expect coming up. So an example would be, we had a customer that had a database in Europe and a database in the US, and it was actually the same schema, because it was the same application, just running two different versions of it. And what happened was OtterTune identified that the US database was 10x faster than the European one, and the version of OtterTune at that time couldn't figure out why. The customer eventually figured out, oh, they forgot to build an index on the European database. So even though it was the same schema, someone had to do a migration and add the index. And so that's where we're going next with this: okay, now I understand that here are two databases, they have the same schema, roughly the same workload, but they're physically distinct instances. And so we can start making recommendations like, okay, these should be in sync; if you see this change over here, you should make the same change over there. You can also start to do the same thing for staging, testing, and production databases. So for example, people often do schema migrations on the staging database, and then a week later, two weeks later, at some later point, they apply the change to the production database. So again, Amazon's not going to know that the staging and the production database are logically linked together; the customer knows that. But you can start doing things like, okay, well, I see that you've done a schema migration on the staging database, and I know the things that you've done: you've added an index, you've created a table, and so forth.
And so our recommendations could be things like, okay, well, you're going to make this change on the production database, because you've already made it on the staging database. And for these changes, like renaming a column, you can do that right away, that's cheap to do. But for adding a column or dropping a column or something, those more expensive operations, you should be doing them during your maintenance window at this time. Start making those recommendations, because if someone can see the entire view of the fleet of databases that they have, and know how the customer is actually using them, you can start making recommendations that a human actually wouldn't even be able to do, because at that scale it's just not possible. That's the big new vision of what the new version of OtterTune is going to start doing this year.
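The "these should be in sync" idea reduces to a set difference over the physical design of two logically linked databases. A minimal sketch with hypothetical index names:

```python
def sync_recommendations(reference_indexes, sibling_indexes):
    """Given two databases known to share a schema and workload, flag
    indexes present on the reference instance but missing on its sibling,
    the situation behind the US-vs-Europe anecdote in the conversation."""
    missing = sorted(set(reference_indexes) - set(sibling_indexes))
    return [f"consider adding index {name}" for name in missing]

us = {"users_pkey", "orders_pkey", "orders_user_id_idx"}
eu = {"users_pkey", "orders_pkey"}
```

The hard part, as Andy notes, isn't the comparison; it's knowing that the two instances are siblings at all, which is why OtterTune plans to ask the user.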

Kostas Pardalis 46:22
That's very interesting. I'd like to ask, because you mentioned earlier that these cloud providers have this model of, okay, there's some stuff that I'm going to manage for you, and some that the customer themselves manages, right? And I'd like to get both of your opinions on these serverless models. Everyone agrees that having hundreds of configurations for a system out there is probably not the best way to expose this functionality to the humans who use it, right? But instead of going to the other extreme of having everything, let's say, abstracted away, is there some kind of balance that is better in the end? Expose some knobs to users through an API or something like that, something like PlanetScale, for example, so they can go and do the tuning if they want to, and leave that to the user; and some other knobs where it's better for the infrastructure provider to go and manage them? Because we tend in this industry to go to extremes, right? Like, let's build all UIs, or all CLIs, when the truth is somewhere in between. So I'd love to hear your opinion on that.

Dana Van Aken 47:50
Sure, yeah, I guess I can start. So you mentioned serverless; we actually support Amazon Aurora Serverless right now, and it further reduces the knobs, just to your point. But what you're really asking, I think, is what's the right balance between knobs that you don't expose to the user versus knobs that you do expose, and ultimately, how do you handle the configuration of those? It's a really hard question to answer, both in terms of methodology and in terms of practical reasons; I'll go over both really quickly. So at a high level, you can imagine, and I think this is the route that Snowflake takes, since as Andy mentioned they don't expose knobs to users, that the values they've chosen behind the scenes work well for most of their workloads. However, as Andy said, and I've spoken with him about this at some length in the past, that's not always the case; there are definitely customers where the configuration values are inappropriate. So what happens then? Well, they have a sort of administration team that goes in and will configure on a case-by-case basis if there's a big performance issue. So it's trying to find the right balance: if you're not going to expose knobs to users, the values need to be really generalizable. For some of those configuration knobs that rely only on hardware, I can imagine the system itself being able to tune them automatically. I think it's much more difficult for a lot of the knobs that should be tuned for the workload as well. And so it's kind of this balancing act.
As far as the practical implications, just managing a ton of knobs is really difficult from an engineering perspective. You have to deal with deprecating knobs in the system: as different components change, you end up adding new knobs, and some get deprecated, so those knobs no longer have any impact on the system. There's just a lot of management that goes along with it, which I think adds to the complication of trying to split up and expose some but not all knobs.

Kostas Pardalis 50:44
That makes sense.

Andy Pavlo 50:45
So, you know, Dana's focusing on knobs here, but it's the same thing: there's other stuff to tune. And the question is, what is exposed to you as a developer? One extreme would be, I have broad access to the box, I can SSH into it, do whatever I want. Nobody does that anymore, right? The other extreme would be, I've only exposed an API where I can do basic things, and therefore I can't even write a raw SQL query. The future is going to be somewhere in the middle, obviously. And even if it's serverless, you know, I'm fully in the relational model camp, so it's going to be a relational database, and most people are going to access it through SQL. And if you have SQL, then that's a declarative language that abstracts away the underlying physical plan, what the system is going to use to execute a query. And so that means someone's going to have to tune that accordingly, or decide what the physical design of the database is going to be. So I think there's always going to be a need for an automated tuning service, something like OtterTune, in the future. SQL was here before we were born, it's going to be here when we die, it's not going away. And because it's declarative, you need someone to actually pick the indexes. Yeah.

Kostas Pardalis 52:00
Yeah, I totally agree. And the reason I'm saying that is because, from my experience, I've seen a bit of both extremes. Something like Trino, for example, where there's a lot of configuration happening at many different levels, versus the cloud version, Starburst Galaxy, which is, I'd say, completely opaque to the user. And in the end I still get, as a product person, and I'm not talking as an engineering person here, that the user still needs to have some control. It doesn't mean that you have to oversaturate the user with too much control, as we usually do in the enterprise, right? But going to the other extreme is also bad in the end for the experience that the user has, and causes a lot of frustration there. And I think it's a matter of figuring out exactly what are the right knobs to put out there to the users. That's probably a broader conversation, but I think it's important when we're talking about developer experience. That's, I think, what differentiates it from user experience: you don't have to completely abstract everything away, right? The user still needs to have some kind of control when they're developing, when they're engineering solutions. Anyway, we can talk more about that, but I want to ask something else. You mentioned all this data, all these models that you're building. And these are systems that are very complicated; teams have been working on them for a very long time. Postgres has been there for decades. Do you see some kind of synergy with these teams? Do you work with them? Do you see the data that you collect, or the experience that you're building,

Kostas Pardalis 53:55
helping them to build even better systems at the end? Like, have you seen something? Something happening? Or an opportunity in the future?

Andy Pavlo 54:06
I mean, we have not interacted much with the Postgres and MySQL communities. We did a deployment once where we think there was a MySQL knob, the adaptive hash index, that's on by default and actually shouldn't be. We brought that up on Twitter, and I think some MySQL people looked at investigating whether they should make that be off by default. But we have not interacted a lot directly with the developers based on our findings in OtterTune. Where we do want to go, and we haven't gotten there yet, is actually to interact more with the developer communities for some of the major application frameworks that people are using, like Django, Ruby on Rails, Node.js stuff. Because, again, we see the schema and understand what the queries look like, and in some cases we know what application framework you're using. We haven't completed this analysis yet, but we want to identify what are some common mistakes we see in how these frameworks are using the database. And then some things we can fix, like, oh, you're missing an index; some things might be more fundamental in how the application is generating the SQL queries. So where we want to go next, later this year, is to reach out to these communities and say, hey, look, if you're running a Django application, you're going to hit these problems. We can solve some of them, but other things should be fixed in the library. And going back to the university, I did have a major research project where we were building a database system from scratch, trying to be completely automated, to remove all the things that OtterTune has to tune. Can we build all that stuff internally and have machine learning components on the inside?
We abandoned that project because of the pandemic; it was too hard to build a system with graduate students during lockdown. But a lot of the things we learned in OtterTune fed into the design decisions we were making on that system, and it's something we would revisit.

Dana Van Aken 56:15
I'll just quickly add, as far as collaborating, or just talking and working with Postgres or MySQL: one interesting story. We were talking with the former product manager of MySQL, who was basically in charge of working on configuration knobs, and MySQL added a configuration knob. I'm forgetting the name of it... it's dedicated server, thank you, Andy. And basically what happens is you enable this, and then it sets four knobs according to your hardware and some other metrics that they collect. So this is something they do, and potentially this is why they reached out: providing them advice, or insights from the data that we collect, could be helpful here. But they also mentioned that just this single change took hundreds of engineering hours to implement. I think it was hundreds; it was at least dozens. So these systems are so complex that it's really difficult to make changes internally, and with the open source communities, it's hard to know whether they'd want to prioritize something like that.
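The knob Dana is referring to is MySQL's `innodb_dedicated_server`, which auto-sizes the buffer pool and a few log and flush settings from detected RAM. Below is a sketch of the buffer-pool rule as I understand it from the MySQL 8.0 manual; check your server version's documentation before relying on the exact thresholds:

```python
def dedicated_server_buffer_pool_gb(ram_gb: float) -> float:
    """Approximate innodb_dedicated_server buffer pool sizing (MySQL 8.0):
    under 1 GB of RAM keep the 128 MB server default, up to 4 GB use half
    of RAM, and above that use three quarters."""
    if ram_gb < 1:
        return 0.128
    if ram_gb <= 4:
        return ram_gb * 0.5
    return ram_gb * 0.75
```

Even this single hardware-only heuristic took the MySQL team a large engineering effort to ship, which is Dana's point about how hard it is to change these systems internally.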

Kostas Pardalis 57:33
100%, 100%. All right. Well, one last question from me, and then I'll turn the microphone back to Eric. I'll go back to the beginning of the conversation that we had, and I'll ask Andy about that. Because you mentioned at the beginning that it was always really hard to go and find real workloads out there to use in research and drive the things that we were doing. And I'm the kind of person who, with the personality that I have, has pretty much memorized the TPC benchmark queries.

Andy Pavlo 58:06
I wrote TPC-C four times when I was in grad school. Oh, yeah.

Kostas Pardalis 58:11
Do you see an opportunity for the whole industry to move forward and have more tooling for everyone out there who's trying to build, to be able to benchmark or measure performance, or just have data to go and build their systems on top of? Do you see a way to escape from the standards, like the TPC benchmarks, and have something more meaningful out there?

Andy Pavlo 58:46
There isn't, no. You know, there isn't a consortium that people have put together, like, hey, here's this great treasure trove of data that everyone can use. There are pieces of it in different pockets. Again, here at CMU we have the BenchBase framework; it's a bunch of benchmarks, some synthetic, some based on real workloads, and it's a single platform that people can use to run experiments. The DuckDB guys at CWI have a public BI benchmark dataset that they collected with Tableau. So there are bits and pieces of it out there. The one thing that we'd like for transactional workloads, really understanding the amount of concurrency that you'd see in a real workload, that's really hard to get; you'd need a raw trace or something like that. And even at OtterTune we don't have that. Unless you sit in front of the database server and see all the queries coming in, you don't have that, and we don't even have that.

Kostas Pardalis 59:44
Makes total sense. All right, Eric, I've monopolized the conversation here.

Eric Dodds 59:53
One question that's so interesting is the datasets that you can use to speed up the cycle of tuning a database, right? So you have datasets from other processes where you've been tuning. I'm assuming, at least from the way that we've talked about it, that those are actual datasets from real-world optimizations that you've done previously, that you can apply moving forward. Is there a role for synthetic data? Can you use those datasets to actually generate additional synthetic datasets that take that even further? Is that part of your vision?

Andy Pavlo 1:00:38
What do you mean by synthetic? Like, take the real datasets we've gotten and reverse that back to SQL?

Eric Dodds 1:00:46
Sorry, I mean sort of creating training datasets that are manipulated through machine learning: creating a synthetic dataset that is based on real-world datasets, so that you have a larger repository of training data.

Andy Pavlo 1:01:08
That's essentially what we're doing now, right? We take TPC-C, run experiments, and then use that to figure out for real-world customers how to tune, and we look directly at the real-world database. The challenge, though, and again, as I mentioned, we look at 24-hour periods by default, though in some cases you can turn this down. The challenge is that OtterTune makes recommendations, and again, just focusing on knobs, we make recommendations on how to tune your knobs, and then we apply the change and measure performance. It's very hard, if you're looking at the production database, to determine whether a change in performance is something that OtterTune did or something that occurred upstream. So if we change a bunch of knobs and the next day the queries are 20% faster, is that because we did something? Or is it because they deleted a bunch of data and therefore the indexes are smaller, or they're running queries against an added index? This is why you have to have a holistic view of the database, in a way that we didn't appreciate in academia, so we know what changes they've made. Obviously, if they make certain changes in the application that cause queries we've never seen before, we at least can see that and identify, okay, something has changed, but we can't attribute that to the production performance. So this goes back to what I was saying at the beginning: in academia, we assumed that people could capture the workload and run it on spare hardware that was the same as the production database. That way you always have a baseline to compare against, and it's the same workload over and over again. In the real world, you don't have that. So you have to use additional statistical methods to be able to figure out, okay, things have changed, and I've seen enough data to attribute the benefit that they're getting to us. Yep.
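The attribution problem Andy describes, deciding whether a performance change belongs to the tuner or to something upstream, can be caricatured in a few lines. The fixed threshold and the single upstream flag are gross simplifications of the statistical methods he mentions:

```python
from statistics import mean

def attribute_improvement(latency_before, latency_after, upstream_changed,
                          min_gain=0.10):
    """Credit a latency drop to tuning only if the relative gain clears a
    noise threshold AND no known upstream change (schema migration, bulk
    delete, new query mix) happened in the observation window."""
    gain = (mean(latency_before) - mean(latency_after)) / mean(latency_before)
    if upstream_changed:
        return "inconclusive: workload changed upstream"
    return "credited to tuning" if gain >= min_gain else "within noise"
```

With a captured workload replayed on spare hardware, the academic setup, the `upstream_changed` branch never fires, which is exactly why attribution looked easier in the lab than in production.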

Eric Dodds 1:03:01
That makes total sense. Is the optimization of those, let’s call them contextual factors, the ones that aren’t directly the knobs themselves, so upstream changes, or like you said, the context of a dev versus a production database, etc., are those factors more fragmented? Is it a more difficult problem? If the knobs themselves are fairly well defined, is that context more fragmented? And what’s the approach to solve for that, if it is more fragmented?

Andy Pavlo 1:03:32
What do you mean by fragmented? Sorry?

Eric Dodds 1:03:33
Yeah, you know, for knobs there can be consistent definitions across a database, right? Cache size or something like that. But the difference between dev and prod maybe is more subjective, and not necessarily a setting that you can observe technically.

Andy Pavlo 1:03:58
Yeah. So what makes it a dev database, in some cases, is that there’s a tag in RDS; they can tell us the name is dev, the name is prod, and we see those. The current version doesn’t do any sort of deep inference based on those tags. Where we want to get to, which we don’t do yet, is that we come back and ask the user, hey, we think this is a test database, is this true? And they say yes or no, and then we know something about it, and we can make recommendations accordingly. So an example would be: in our current version, we identify unused indexes. If you’ve never run a query on an index, or at least never run a read query on an index, then you probably don’t need it. But if it’s a staging database, maybe they haven’t run any queries because it’s only used when a developer wants to test something, so a bunch of indexes look unused because nobody got around to using them. Yeah, right. So we need to be more mindful of those kinds of things. We don’t yet ask the user, okay, please let us know, what kind of database is this. Where we want to go next, in addition to asking whether it’s dev, testing, staging, whatever it is: when I see that there’s a logical link, like the schemas are the same and we’re seeing the same queries, are these two databases brother and sister? Are they related? Right? And again, that’s a prompt you’d have to give the user; it’d be very difficult to reason about this stuff automatically. And this gets into another big challenge in machine learning: external cost factors that you just can’t know automatically, you have to be told these things. It’s a limitation of machine learning. And, you know, Dana mentioned that there’s a bunch of guardrails put in the algorithms to make sure they don’t make certain decisions or optimizations that could have consequential kinds of problems that we don’t see when just measuring performance.
Like, if you turn off writing things to disk, you’re gonna go a lot faster. But now if you crash and lose data, people are gonna be upset. The algorithms can’t reason about that; that’s an external cost. So we have to put in our domain knowledge as database people to know that these are things we shouldn’t be doing. And the same idea applies to staging versus dev.
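The guardrail idea, never recommending a knob setting that trades durability for speed, can be sketched like this. The structure is illustrative; the two knob names happen to be real PostgreSQL settings whose "off" values risk data loss on crash, but the filter itself is a toy, not OtterTune’s implementation:

```python
# Toy guardrail filter for knob recommendations -- illustrative only.
# Some knob values are off-limits no matter how much they would improve
# measured performance, because they risk losing data on a crash.

UNSAFE_SETTINGS = {
    # knob name -> values a tuner must never recommend
    "fsync": {"off"},
    "synchronous_commit": {"off"},
}

def apply_guardrails(recommendations):
    """Drop any recommended knob=value pair that violates durability."""
    safe = {}
    for knob, value in recommendations.items():
        if value in UNSAFE_SETTINGS.get(knob, set()):
            continue  # blocked by domain-knowledge guardrail
        safe[knob] = value
    return safe

recs = {"shared_buffers": "8GB", "fsync": "off", "work_mem": "64MB"}
print(apply_guardrails(recs))
# {'shared_buffers': '8GB', 'work_mem': '64MB'}
```

This is exactly the kind of domain knowledge an optimizer cannot learn from performance metrics alone: the cost of the unsafe setting only shows up when something crashes.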
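The unused-index heuristic Andy describes earlier in this answer can be sketched the same way: flag indexes with zero read scans, but suppress the recommendation on databases the user has confirmed as dev or staging. The field names below are hypothetical, loosely modeled on per-index scan counters like PostgreSQL’s `pg_stat_user_indexes.idx_scan`:

```python
# Toy version of the "unused index" check -- illustrative only.
# idx_scan counts how many times an index served a read query; zero scans
# on a production database suggests the index can probably be dropped.

def unused_indexes(index_stats, db_role="prod"):
    """
    index_stats: list of {"name": ..., "idx_scan": ...} dicts.
    db_role: user-confirmed role of the database.
    Dev/staging databases see little traffic, so "unused" there
    means nothing; return no recommendations for them.
    """
    if db_role in ("dev", "staging", "test"):
        return []
    return [s["name"] for s in index_stats if s["idx_scan"] == 0]

stats = [
    {"name": "orders_pkey", "idx_scan": 12840},
    {"name": "idx_orders_legacy_status", "idx_scan": 0},
]
print(unused_indexes(stats))                     # ['idx_orders_legacy_status']
print(unused_indexes(stats, db_role="staging"))  # []
```

The `db_role` parameter is the piece the machine cannot infer on its own, which is why, as Andy says, the system has to ask the user.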

Eric Dodds 1:06:13
That makes total sense. Okay, two more questions. One of them is about the context in which OtterTune enters the picture. Right? And so, Dana, I think you mentioned previously, which makes total sense: okay, you’re tuning when there’s some sort of problem, right? Do you see OtterTune moving the conversation to a place where you’re implementing this ahead of time to avoid those problems in the first place, changing tuning from a conversation about "we have a problem with performance, cost, whatever it is" to "hey, we’re gonna use this proactively"? You mentioned this earlier, Andy: if you’re running a Django app, you should just do these things so that you get really good performance out of the box.

Dana Van Aken 1:07:02
Yes, definitely. I think that proactive tuning benefits essentially all of our customers. I would put our customers into kind of two buckets. One portion of our customers are, like Andy mentioned, developers. This is a lot of companies, maybe small and medium-sized companies, where nobody is directly managing the database. And then the other sort of group would be those with one or more dedicated database administrators, who are doing performance tuning sometimes, but like you mentioned, it’s not proactive tuning. So in both cases it’s beneficial. In the developer case, you want to write code, you want to do your engineering job; you don’t want to be pulled back into the database to keep solving one-off problems. So it’s super beneficial there. And just like Andy mentioned, we’re moving more towards a holistic view of the database, where it’s just peace of mind. In the other case, when you have a large fleet of databases, or even a medium fleet, the amount of time that you can spend tuning each of them is just not very significant. So this is also very beneficial to people who do have dedicated DBA groups.

Eric Dodds 1:08:30
Absolutely. Okay, last question, because I know we went a little long here, but this has been such a great conversation. Kostas, am I right? Yeah, exactly. Exactly. And our producers are gone, so we’re definitely gonna ask about the name OtterTune. So Andy, you mentioned that Dana came up with it. I love the theme, I love the brand, and I love the stories around naming things. So give us the backstory on the name.

Andy Pavlo 1:08:59
Dana did the name, and I’ll talk about the vision.

Dana Van Aken 1:09:02
Okay, okay, great. Yeah. So Andy and I were sitting in his office, in fact, we were sitting in the same place where he’s sitting right now, in person. This was before the pandemic, of course, and we were trying to come up with a name, which is just a huge deal in research, you know: what are you going to name your system? So we really wanted to come up with something that we both liked. My husband and I had recently visited San Francisco, and my favorite part of it was the otters, because they were doing really cute stuff. So he bought me a t-shirt, an otter t-shirt, and I was wearing it. And it just occurred to me: what about OtterTune? Because animal names are kind of fun; it’s always nice to have an animal mascot. So, I don’t know, it just kind of came together in a weird way. But of course, it’s also kind of a play on Auto-Tune.

Andy Pavlo 1:10:00
I love it. So yeah, so that’s the origin of the name. And then when we formed the company during the pandemic, I was sending emails to investors and so forth, and I was watching the Wu-Tang Clan documentary on Showtime. And the RZA is like, you know, first we’re gonna do the Wu-Tang album, then we’re gonna do the solo albums and the record label and a clothing line. And I was like, oh man, we should do the same thing with OtterTune, right? So there is an OtterTune record label; we haven’t put out the clothing line yet. That’s what we want to go for: this hip-hop theme for the branding of the company. Because also, when I was looking at VCs’ websites, you look at all the logos, and they all have these thin fonts and these pastel colors, and everything looks the same. And I was like, I want to do something where it was not clear whether we were a record label, an art studio, or a tech startup. That was the angle there.

Eric Dodds 1:10:55
I love it so much. That is marketing brilliance at its best. So this has been so wonderful. We have covered so much ground, but we’d love to have you back on to hear about the next version of OtterTune when it comes out and you get it into production. Yep, I’d love to do it. What a fascinating conversation with Andy Pavlo and Dana Van Aken of OtterTune. Kostas, where do I begin? I mean, of course, maybe the best part of the show was hearing about the name OtterTune, and the influence of the Wu-Tang Clan on their brand. So I think listeners are gonna love the show just for that, obviously. But it was also really interesting for me to learn about tuning: the complexity of tuning and the skill of tuning. I thought that was a really interesting conversation, and it really informed why something like OtterTune is so powerful, because it can take so many more things into consideration than a human changing one knob and then waiting to see the result on the entire system.

Kostas Pardalis 1:12:16
Yeah, as I said, I think we learned a ton through the conversation with Andy and Dana. A few things I want to keep from the conversation, hearing from them: the tuning problem is not just, let’s say, an algorithmic problem, the algorithms used, like machine learning or whatever. It’s equally an observability problem, and maybe it’s even harder to actually figure out how to obtain the data that you need, how to collect this data from database systems, making sure it’s the right data, and all these parts, which I think is super interesting. And also how having the technology is one thing, but building a product is another thing: figuring out the right balance between the technology itself, what the technology can do, and how to involve the human factor in it, right, by providing recommendations about best practices. That was super interesting to hear from both Dana and Andy: OtterTune doesn’t just optimize based on the metrics that they collect and the knobs that they have access to, they also combine domain knowledge and best practices from running database systems to inform the user on how to go and do the right thing at the end. Right. And I think the example they gave was: yeah, sure, if you go and turn off writing to disk, it will go faster. Yeah, but is this what you want to do? Yeah.

Eric Dodds 1:14:08
Right, you can take the airbags and seatbelts out of your car, and it will weigh less. Yep.

Kostas Pardalis 1:14:13
Yeah. Do you want to do that? Yeah. So, yeah, amazing conversation. I would encourage everyone to listen to it, and hopefully we’ll have them again in the future to talk more about database systems, what it means to start a company, and music labels too.

Eric Dodds 1:14:37
Because, yes, a company and a music label. So we’ll definitely have to have them back on. Thanks for joining us again on The Data Stack Show. Subscribe if you haven’t, tell a friend, and post on Hacker News so we can try to get on the first page, and we’ll catch you on the next one. We hope you enjoyed this episode of The Data Stack Show. Be sure to subscribe on your favorite podcast app to get notified about new episodes every week. We’d also love your feedback. You can email me, Eric Dodds, at eric@datastackshow.com. That’s E-R-I-C at datastackshow.com. The show is brought to you by RudderStack, the CDP for developers. Learn how to build a CDP on your data warehouse at RudderStack.com.