Episode 153:

The Future of Data Science Notebooks with Jakub Jurových of Deepnote

August 30, 2023

This week on The Data Stack Show, Eric and Kostas chat with Jakub Jurových, the Founder of Deepnote. During the conversation, the group discussed the role of notebooks in data science and engineering. They explored the history and purpose of notebooks, the challenges of collaboration in notebook workflows, the potential of notebooks in the AI revolution, the importance of interactive computational environments, and more.

Notes:

Highlights from this week’s conversation include:

  • Jakub’s journey into data and working with notebooks (2:43)
  • Overview of Deepnote and its features (7:22)
  • Notebook 1.0 and 2.0 (14:04)
  • Notebook 3.0 and its potential impact (15:46)
  • The need for collaboration across organizations (17:16)
  • Real-time, asynchronous, and organizational collaboration (28:02)
  • Challenges to collaboration (32:03)
  • Notebooks as a universal computational medium (36:14)
  • The rise of exploratory programming (41:40)
  • The power of natural language interface (43:04)
  • The evolving grammar of using notebooks (47:02)
  • Final thoughts and takeaways (55:50)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Transcription:

Eric Dodds 00:05
Welcome to The Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You’ll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, the CDP for developers. You can learn more at RudderStack.com. Welcome back to The Data Stack Show, where you get information on all things data from me and Kostas. We have an exciting guest today, Jakub from Deepnote. Kostas, I’m really interested in asking Jakub about notebooks in general. I mean, we’ve talked about the sort of ML ecosystem, and we’ve talked a lot about analytics on the show. But I don’t know if we’ve ever had sort of a basic, you know, 101 on notebooks, right, and talked about where they came from, what they’re used for, and why there are, you know, multiple options emerging, including Deepnote, in the notebook space. So that’s what I’m gonna ask about. I want the 101 and some history on notebooks, because I don’t think we’ve talked about that yet.

Kostas Pardalis 01:15
Yeah. I’m very interested in hearing what Jakub is going to answer to your questions, to be honest, because I’m having the same questions as you. But I’d also like to get a little bit more into, let’s say, the relationship of the notebook compared to other paradigms we have for writing code and engineering, right? Like we have IDEs, we have Excel, right, like the spreadsheet model. I’m very interested to hear what kind of gap notebooks fill, if there is a gap, or what they are going to substitute from these paradigms. That’s one thing. The other thing is, you know, notebooks have been, like, the most commonly used tool among data scientists and engineers, AI scientists, right? So it’ll be interesting to hear from him about all this AI craziness that is happening right now, like how it has been supported by notebooks, right. So that’s what I have in my mind. And, as always, I’m pretty sure that more questions will come up. So let’s talk with him.

Eric Dodds 02:44
Well, let’s do it. Jakub, welcome to The Data Stack Show. We are so excited to talk about all sorts of things, notebooks, AI, I mean, we’re going to cover a lot of stuff. So thanks for giving us some of your time.

Jakub Jurových 02:56
Eric, thanks for having me.

Eric Dodds 02:58
All right. Well, we always start with your background, and kind of what you’re doing today. So tell us how you got into data, and then how you started Deepnote.

Jakub Jurových 03:10
So my background is primarily in developer tooling. That is something that I have been doing as long as I can remember, and there was something very magical about building tools for other builders. And naturally, if you are a software engineer, you get very drawn to this concept of building tools for other engineers. And I’ve been doing this for quite some time, in many different areas, in many different setups. I’ve been doing a lot of research in this space. I spent a lot of time in human-computer interaction research. I studied the usability of programming languages, but also spent a lot of time in the area of machine learning: what are the things that we could do there, both on, let’s call it, the interface level, as a user of those models, but also as someone who’s training those models and the tooling that can help you out with that. So that’s my primary background. But the idea of building tools is very connected to the idea of building products in general, so my background also touches some of the UI and design areas, something that you pretty much have to do if you are at least somewhat serious about human-computer interaction. Yeah.

Eric Dodds 04:36
Very cool. And what led you to starting Deepnote? Can you maybe even just describe sort of the moment where you said, okay, I’m going to build this tool set?

Jakub Jurových 04:52
I think I actually can. I was thinking about this, and I realized that there was a moment when I saw Jupyter for the first time. And this was very much an academic setting; this must have been back in 2015, 2016, sometime around then. I used to be walking the floor of the Computer Laboratory at Cambridge, and I would be looking at what all the other people were doing. And there was this interface, this early version of Jupyter, that you would see more and more often on the screens. And it’s one of those things that at the beginning you don’t really understand. Like, hey, this doesn’t look very modern, it looks a bit clunky. You try to install it on your own computer, and you realize that that’s not easy at all; you actually have to go through quite a lot of steps before you even manage to run it. And even if you manage to run it, it kind of looks like software that was built a long time ago, like in the past. But despite all of these things, and, you know, all the things that are always connected with early versions of something, such as stability, you could see it was just growing. And there was a group of people who really loved to use Jupyter. The interesting part begins where there was also a group that absolutely hated Jupyter. And you would often have these two groups of people in very close proximity, especially if you had anything to do with machine learning. Because machine learning by itself combines two different ways of thinking about software: the idea of exploratory programming versus the idea of software engineering. And suddenly, if people are looking at the same problem through different lenses, through different methodologies, they might not really agree on what’s the best approach to how to build things. So this was my first introduction to Jupyter, and as a result, the first introduction to notebooks.

Eric Dodds 07:10
Yeah, absolutely. And just so we can level set, give us a quick overview. I want to go back and talk about notebooks in general, but just give us a quick overview of what Deepnote does.

Jakub Jurových 07:22
So we are all probably familiar with the traditional IDEs, traditional code editors, where you open up your favorite one, you know, it could be VS Code, it could be PyCharm, there can be plenty of others out there, and you usually spend time writing and reading code. It’s purely code; it’s an interface purely for code. But when it comes to notebooks, they introduce something new. They introduce this idea of mixing both text and code in the same document. This is pretty much the first version of Jupyter that said, hey, why don’t we also add Markdown to Python? It might be pretty useful for situations where I want to describe what is happening in that code. And it turns out, as you are either training models or running some analysis, it’s very helpful to add more context than what just the Python comments allow you to do. So that’s the idea behind notebooks as an interface: a place where you can combine both text and code. There are some philosophical backgrounds to this. There is this whole idea of literate programming, which existed quite a while ago, but not in the same context; literate programming used to talk about a description of what is happening. Notebooks actually take it to the next level, where the textual elements of the notebook will be at least as important, sometimes even more important, than the code itself. And this was the idea that we got really excited about, because it turns out this is the type of interface that allows you to bring a much wider audience to the tool itself. It’s no longer scary. If you’re sending someone a notebook, they can actually find some anchors, they can find a heading, they can find an explanation of what’s going on. If you’re trying to do the same thing with a plain old Python file, it’s very unlikely that a non-technical viewer will be able to make any sense of it. But it turns out we can do something like that in a notebook. So this is something we got very excited about when we were thinking about the future of notebooks. And, well, we were not quite happy with the current state of the art that we were seeing. We were also not happy that there were only two types of cells. Why couldn’t there be more? Why do we have to work just with Python? Why do we have to work just with Markdown, when there are so many other different building blocks that actually go into our day-to-day work? For example, there are actually more people writing SQL out there than Python. Why is there no native SQL block in notebooks? Every single notebook kind of starts with the idea of describing what’s happening, and having to learn Markdown just to give a title to your notebook also didn’t really seem like something that was very intuitive and very natural for users. So for us, Deepnote is this kind of natural evolution of the notebook; we think about it as notebook 3.0. And we can maybe talk later in the show about what notebook 1.0 and notebook 2.0 were. But the way we think about the product is as this next generation of notebook interface that’s very easy to get started with, something that’s naturally intuitive, something that should be as easy to understand as a spreadsheet, but at the same time something that’s really powerful, something where you can build pretty much anything, where sky’s the limit, something that could be compared all the way to these full-fledged, really powerful IDEs.
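For readers who have never opened a notebook, here is a minimal sketch of the “text plus code in one document” idea described above, using the cell structure of Jupyter’s .ipynb format (shown as a Python list for brevity; real .ipynb files are JSON and carry extra metadata such as outputs and execution counts). The analysis names are invented, and the final SQL entry is purely illustrative of the block idea, not an actual Jupyter or Deepnote cell type.

```python
# Minimal sketch of a notebook as a document that interleaves prose and code.
# Real .ipynb files are JSON with additional metadata; this list only
# illustrates the idea of mixing Markdown and Python in one document.
cells = [
    {
        "cell_type": "markdown",
        "source": "## Weekly churn analysis\nContext for a non-technical reader goes here.",
    },
    {
        "cell_type": "code",
        "source": "import pandas as pd\nchurn = pd.read_csv('churn.csv')\nchurn.describe()",
    },
    # Block-based notebooks extend this with other block types, e.g. a native
    # SQL block; this entry is illustrative only, not a real Jupyter cell type.
    {
        "cell_type": "sql",
        "source": "SELECT country, COUNT(*) AS churned FROM churn_events GROUP BY country;",
    },
]
```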

Eric Dodds 11:56
Super interesting. I actually think it’d be helpful if we did talk about notebook 1.0 and notebook 2.0. Because, you know, I know a lot of our listeners are probably familiar with doing some sort of work in a notebook environment, but there are also probably some who are less familiar. So I know, early on, the idea of a notebook, of course, included the ability to write Markdown and describe things as a combined sort of text and code. But obviously, there’s other functionality around, you know, cells and running different parts individually, as opposed to just executing a single, you know, a single chunk of Python code. So yes, give us a description of 1.0 and 2.0.

Jakub Jurových 12:39
Depending how far we want to go back. But let’s say that the first versions of a notebook started to appear sometime in the late 80s. You would have this thing called Mathematica, and not that many people used it, primarily because this was a very specific tool targeted at a very specific audience, primarily people who were doing math, statistics, kind of more of an academic type of work. It was actually the first generation of notebooks that catered to this very academic audience of mathematics, statistics, physics, and we’re talking about Mathematica, Mathcad, Maple, these types of tools. And this is how it pretty much stayed for the next 20 years, still a very niche tool that didn’t really make much noise in other areas, until, let’s say, the early 2010s, with the first release of the visual part of IPython. IPython as a tool had been around even earlier, but I think it was version 0.12 or something like that, not even a major version, that added the visual interface that you could connect to through your browser. Instead of typing all the Python commands into a terminal, you would actually go to localhost:8888 in your browser, and there would be this very basic interface where you could write Python in a nice text area. And this somehow changed the game. Suddenly, this really made the idea of a notebook go from a small niche to a much wider audience. At the beginning it stayed mostly in the academic setup; that’s where, at the end of the day, Jupyter is coming from. But over time, we started to see this appearing more and more often in industry, as people took what they’d learned during their studies and actually applied it in their jobs. And this is something I would describe as notebook 2.0, probably best represented by Jupyter as the most prevalent implementation, because it started to add some of these features. But it was still relatively limited. Some of those limitations, such as how difficult it is to actually install Jupyter in the first place, how do you deal with collaboration, what do you do about reproducibility, and also just the limitation to these two basic building blocks, Python and Markdown, kind of meant Jupyter never really escaped that, let’s call it, data scientist or data analyst type of crowd. This is something that we are only seeing with notebook 3.0, which really started to appear very lately. When Deepnote started back in 2019, we were thinking there is something really magical happening here. We have a completely new type of computational medium, which is not just this nice, cool thing that few people care about, but something that really appears to be this holy grail of human-computer interaction research. For the past 40 years, we’ve been trying to find a tool that will be really easy to get started with, but at the same time be very powerful, not running into scalability issues as you start to work on more and more complex problems, or as you start to involve more people in the process.

Eric Dodds 17:16
Makes total sense. Let’s dig in on collaboration a little bit. Because I know that when you think about, again, a lot of ML work, a lot of, you know, notebook workflows rely on some level of local development, right? And so you’re running a lot of things locally on your machine, which obviously makes collaboration difficult, or at a minimum requires you to implement, you know, different processes that you have to run every time you want to, like, push your work, test your work, or pull down, you know, other people’s work. Can you describe some of the specific pain points there? And then, you know, what does actually being able to collaborate in a notebook look like? And what does that unlock? And who does it unlock it for? Sorry, there are tons of questions. And yeah, let’s just start with: what are the collaboration limitations with, you know, say, notebook 2.0?

Jakub Jurových 18:15
This was very interesting for us to see, because we were looking at notebooks and thinking, wow, we finally have something that can be used by anyone in your organization. Not just the small number of data scientists sitting in a corner, but actually something that can be shared with anyone in your organization, whether these are product managers or VPs, finance or C-level executives. So we came to these hardcore Jupyter users and said, hey, we know the problem that you are feeling is collaboration, right? And they would look at us and just say, what are you talking about, what collaboration? I absolutely do not want to have anything to do with collaboration. That’s an antipattern, please stay away from me. And this didn’t really make sense, right? Because you’re working in a setup where you absolutely hate the fact that you open up a notebook, you query your warehouse, you run some models, but at the end of the day, it’s not like it’s sitting on your laptop forever. You need to share it with someone at some point. Someone asked you a question, you want to give them the answer. And having to suddenly open up a completely new tool, whether this would be PowerPoint, and taking some of your paragraphs or findings, or just sending this one-off result through email. We were very surprised that this is not something that already existed. And it kept nagging us, so we spent a lot of time thinking about this and trying to figure out why so many people are unhappy about collaboration. And just to be clear, again, this was the same thing that we were seeing earlier, where there was one group that was very loud about how amazing notebooks are, and there was another group very loud about how this should never have been invented, how they are counting the days until notebooks disappear. And within the group of notebook enthusiasts, you again would have these two very loud camps. One saying collaboration is a terrible idea, just give me a nicer output, don’t use JSON, use some kind of YAML format, so that I can put it into my GitHub and do the collaboration there. But there would also be this, again, pretty vocal group that would say, no, I don’t want to be using Git. I’m a data scientist, I’m a data analyst, I am running hundreds of experiments every single day. You can’t possibly ask me to write a commit message for every single one of them. I literally have no idea what I’m doing right now, I’m just exploring as much as possible. And if you want me to write Git commits, they’re just going to be named experiment one, experiment two, experiment three. So this is something that we spent a lot of time thinking about. And we realized that there’s already been research in this area that describes these two types of workflows. What most people are familiar with is this idea of traditional software engineering, this type of work where you know what needs to happen, you know what’s expected of you. Like, you have this very nice, almost waterfall-y way of working, where someone comes up with an idea, they have some kind of prototype, sketch, design, pretty much giving you a blueprint of what needs to be built. And then the software engineer comes in, they take the mockup, turn it into something that’s actually usable, something that actually works, something they can put into production.
And then we have this very mature software engineering system that knows what to do with this artifact. They know how to version it, they know how to deploy it, they know how to monitor it. There is a very nice ecosystem of tools around this. But it turns out there is a different way of working with data, something that we call exploratory programming. And under exploratory programming we can imagine multiple different things, but the overall idea is that at the beginning, you don’t really know what you’re going to find out. You don’t have any kind of blueprint, you don’t know whether you are going to be working on this problem for five minutes, five hours, or five years, because it’s very likely that no one has asked this question before, and you don’t really know what you’re going to find out. And in this world, you suddenly have very different goals, very different processes than you would have as a software engineer. And once we understood this, suddenly everything fell into place, everything clicked. And you understand that, okay, we have this really powerful suite of tools, all these compilers, all the IDEs that had been built specifically for software engineering. But it turns out what we do in data teams, as data scientists, data analysts, is much closer to exploratory programming. And this is where collaboration also plays a part. Because while in software engineering you actually want to be left alone for most of the time, like, you got your requirements, now you just want to close yourself in a dark room and spend a couple of hours writing code and then emerge victorious with the final product, the idea of exploration and data analysis is actually much more collaborative and much more iterative. That’s also the reason why, if you’re working in a spreadsheet, this model of Google Sheets, where you can have multiple people at the same time looking at the same spreadsheet and being able to quickly collaborate and iterate as questions come up, becomes very powerful.

Eric Dodds 24:59
Makes total sense. So I’m really interested to know how you approach collaboration in Deepnote from a user experience standpoint. Because on one hand, collaboration can be, you know, two people being in the same spreadsheet at the same time, right, to use your example of Google Sheets. And so you can almost think about that as enabling pairing or, you know, easier review or other things like that. But there are also instances where you might want to actually, like, communicate with that person, you know, which a lot of times will happen, you know, via Zoom call or whatever. How do you approach that? Is it mainly just for people being able to interact with the same notebook? Or are there other ways, from the user experience standpoint, that you’re enabling collaboration?

Jakub Jurových 26:04
What we found out is that everyone wants to collaborate, but everyone has a different idea of what collaboration means. And over time, we had to develop some kind of framework for how to think about this. And we realized that there are three levels of collaboration. Each of them can exist in the same team, for example, but they are different in terms of what the expected outcome is. So let’s have a look at what those levels would be. Level one is something that feels very natural, something that’s happening on a smaller scale, where you invite your colleague to pair program on something. We call this small-scale, real-time collaboration. Its main feature is that you have two people looking at the same thing, at the same time, fully synchronously. This is where a lot of research goes into collaboration capabilities and synchronization algorithms, everything that allows you to collaborate even on the line level, as two people try to type the same thing at the same time because both of you spotted the same typo in your query. It’s very helpful primarily in the educational context, where you have the concept of a teacher and a student, or maybe a junior data analyst who just got the result of their query and it’s full of nulls, or they’re running into some kind of syntax error, and they just want to tap someone on the shoulder and say, hey, can you help me out with this? Except that person might not be sitting next to them; they might be on the other side of the country, and they just want to be able to collaborate in real time over Zoom. But there is also a second level, something that is much more common in the software engineering world, and we call this the team-scale, or asynchronous, way of collaboration. What does this mean? This is the moment where you actually start to rely more on features such as commenting and versioning, being able to see what has happened in this document between the time I looked at it last and now. Git is really good at this, because you can manage collaboration at team scale; you don’t really need to have all 5, 10, 15 people in the same room at the same time to understand what’s happening. They can all be working on this asynchronously, just leaving comments, leaving feedback, and being able to version their code. And there is a third level of collaboration that we found out is very common primarily in data teams, and that’s the idea of organizational collaboration. This is the moment where you have larger teams, where you have a data team that’s sitting in New York, and then you have a data team that’s sitting in Singapore. And suddenly your primary concerns around collaboration are not about real-time synchronization; you don’t really care as much about comments. What you really care about is whether you can even find work that someone else has been doing. So the concept of putting notebooks into a catalog, into some kind of folders, having a very powerful search to even discover what has been happening, becomes the primary concern. And once you start thinking about collaboration at these three distinct levels, you can start to reason about this a bit better and understand what kind of user you’re targeting with a specific feature.

Kostas Pardalis 30:24
That’s very interesting. And it really made me think about collaboration inside the organization. And I have a question that might also relate a little bit to the different types of programming that you mentioned. How does collaboration work when we have teams that need to collaborate but are not using the same tools? Right, as you said, there is this exploratory programming concept, which is very natural when you’re working with data, and it’s almost, let’s say, the opposite of how a software engineer works, right, where you have an algorithm, you have something very deterministic, you have a sequence of steps. And of course, it might sound very simplistic how we describe it right now, but this simplicity gives rise to a lot of complexity in terms of the tooling, right? Like, we have the IDEs the developers are using, we have Git, we have all these things. And no matter what, at some point, let’s say the data scientist finishes her work in the notebook, and we want to productionize or operationalize part of this work, right? The engineers get into the equation, and they have their own tools. So how can we bridge these paradigms together, so we can also enable this type of collaboration?

Jakub Jurových 32:03
I may be biased, because I spent the last I don’t know how many years working on notebooks and studying notebooks. But one thing that we kept seeing over and over again was the curse of a data analyst working with the modern data stack: just the amount of tools that you have to go through, from the inception of the idea to the delivery of some kind of insight, is actually pretty wild. There are warehouses out there, there are ETL tools out there, there are exploratory environments, there are dashboards, BI tools. And there are also completely different communication mediums. And they sometimes work really nicely with each other, but it still means that whoever we are collaborating with needs to have the same set of tools on the other side of the wire. And it’s pretty interesting, because this wasn’t always the case. There used to be a time when every single person working with data would have a license for Excel, and you would be able to get a question in Excel, you would be able to do all your work in Excel, and you would be able to send back the same document to whoever was asking. And you would have data teams collaborating very easily with product managers, with business folks, with finance folks, because they would all be using this one beautiful, unified interface, unified tool. Turns out we can’t really go back to the world where a spreadsheet is used for everything, because it kind of hits the limits of what you can do in a spreadsheet. And there have definitely been many advances with spreadsheets; finally, as of a couple of years ago, spreadsheets are even Turing complete, and we can do amazing things with them. But at the same time, we have seen quite a big rise in the amount of data that we are working with, and trying to put more than a couple of megabytes of data into a spreadsheet results in, well, just the fact that you have to figure out how to share these, how to send these over, but also the computational limitations of your local machine. Famously, trying to put more than a million rows into Excel wasn’t the easiest task, and what we are working with right now is definitely much more than a million rows of data. So we had to start looking for different tools, and that gave rise to this big Cambrian explosion of different tooling. You would have a BI tool specialized in this particular field, or you would have an analytics tool that’s very good at measuring the impact of product changes, you have this whole suite of Amplitude, Mixpanel, and similar, to get a subset of your work done. But when notebooks came along, something interesting happened again. Internally, we talk about notebooks as this universal computational medium, because they really do give you the ability to build anything that you want in that one tool itself. And just to be clear, that might not always be the right call; sometimes those specialized tools are much better for the task that you have at hand. But it always comes at the cost of complexity, and sometimes I just want to keep things simple. So in the world of Deepnote, we already talked about this, first of all we don’t have the concept of cells, we have a different concept we call blocks, because we think of these as building blocks.
And you could have a block of Python code, and another block of Python code, but you can also have a visualization block that can be using one of those variables that you have defined earlier, and you can have an input block, which allows you to do some kind of interactivity, letting you fine-tune some parameters. So all of this combined creates the possibility for a new type of computational medium that has pretty much the same beautiful features that we were used to from the spreadsheet world, but without the limitations that spreadsheets run into. Actually, sometimes we go even as far as to say, hey, we are living in this very amazing time where notebooks are the spreadsheets of our era, and there’s just so much that’s going to be possible if we do the implementation right.
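To make the block idea a bit more concrete, here is a rough sketch of the workflow those blocks wrap up: an input parameter, a data step, and a visualization, written as plain Python with pandas and matplotlib. The data and column names are invented for illustration; in a block-based notebook the parameter would be an input block and the chart a visualization block, with no plotting code to write.

```python
# A rough sketch of the parameter -> data -> chart workflow that block-based
# notebooks package up. Data and column names are made up for illustration.
import pandas as pd
import matplotlib.pyplot as plt

min_signups = 50  # "input block": a parameter a non-technical user could tweak

# "code block" or "SQL block": here just an in-memory DataFrame instead of a
# warehouse query
signups = pd.DataFrame({
    "channel": ["organic", "paid_search", "referral", "events", "newsletter"],
    "count": [320, 180, 95, 40, 60],
})
filtered = signups[signups["count"] >= min_signups]

# "visualization block": a chart built from the variable defined above
filtered.plot.bar(x="channel", y="count", legend=False)
plt.title(f"Signups by channel (>= {min_signups})")
plt.tight_layout()
plt.show()
```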

Kostas Pardalis 37:39
Yeah. So if we have, let’s say, on one extreme an IDE like Visual Studio, right, something that someone is using to write code, and then on the other extreme we have Excel, right, the spreadsheet paradigm, both of them are ways to program the computer. I mean, right, that’s what you’re doing. In your opinion, which one is the notebook coming to substitute, the IDE or the Excel spreadsheet?

Jakub Jurových 38:17
Yeah, the way we think about it is that a notebook is the perfect medium for exploratory programming, whether that is exploratory data analysis, or actually writing some Python code to find out what is even possible: can I even train the model to hit high enough accuracy, is the syntax as I remember it from five years ago still valid, does this function that I got from my colleague work the same way I would expect it to work. This is why notebooks are absolutely amazing. We are not trying to build a tool that’s going to replace the traditional software engineering tool stack; we are not going to be building things like monitoring for your pipelines and artifacts. But we are going to give you an interface that lets you answer your questions very quickly and very efficiently.
Okay, got it. And okay, so one of the beautiful things about notebooks is this mix of different ways of representing information, right? Like, you don’t have just the code there. You have the comments, you have a very rich experience when it comes to working with the machine. And we just entered, almost, I don’t know, probably a new era when it comes to computing with AI, right? Like, we have a new way to interact with the machine with these large language models, systems like GPT and all these things. So two questions, actually. One is, how do you see the notebook being, let’s say, affected in a positive way by these new ways of interacting and working together with a machine? And the other question is, how does the notebook support this AI revolution? Because there’s a huge amount of people, like data scientists and ML engineers, all sorts of AI scientists, and I’m pretty sure that most of them are probably using some kind of notebook to work with that, right? So tell us a little bit about that, like how the notebook contributed to this revolution, and then how do you see the notebook changing because of these new ways that we have to work with data and the machine?
There are two things that are happening right now. If you go and look up a tutorial, a demo on how to work with some new, cool, hot model that just appeared on Hugging Face, well, it’s very likely that you’re going to be getting a link to a notebook. It turns out this is the tool of choice for training and building those models, primarily because of being able to iterate fast. And by the way, this is just something that has always been true. We started to see the rise of exploratory programming even during the first wave of AI hype; that was the first time people started to understand that batch processing might not really be enough, and that we want to have some kind of more interactive computational environment, something that allows us to iterate much more quickly. And this has been the case also in the second wave, and also in the third wave of AI that we are seeing right now. But right now, there is one more thing happening, and that’s not just the role of a notebook for building those large language models, and AI in general, but also the way users interact with AI. And when we say AI, we kind of mean the whole landscape of different tooling that’s available today. But if you were to think about what is really happening, we suddenly have in our hands a new type of computational paradigm.
No longer do you need to go into an extremely specific tool and press a certain set of buttons that someone else had to put on the screen for you in order to get your job done. You suddenly have this assistant that you can communicate with in natural language. And it turns out the ideal interface for communicating with such a model seems to be very chatty; it’s very iterative. ChatGPT made an amazing demonstration of this, when suddenly, like out of the blue, you would put a chat interface on top of an LLM, and everyone just went crazy about extracting the value of the LLM. But realistically, when we look at this a couple of years from now, it is very unlikely that we are still going to be interacting with our LLMs in this chat interface. The way we see this is that you still need something that is much more interactive, but it probably should be a more powerful medium itself. And that’s something notebooks turn out to be really good at: something that really allows this fast, iterative feedback loop, a place where we can quickly ask questions and get answers, and something that, by the way, also allows you to do actual execution of the code that you might receive as a result. And I’ll give you an example here. Sometimes you want to do data analysis and you have a question that you want to ask; you go to your data team and say, hey, can you please give me the top five customers in South America? There are plenty of tools out there, but being able to ask this in a natural way, with natural language, turns out to be extremely powerful. An LLM can give you that answer pretty reliably, as long as it has all the necessary context. What we don’t see right now is that it is able to do it autonomously from start to finish, but it can definitely act as your companion. It helps you navigate your data warehouse, your data catalog, and gives you suggestions to say, hey, maybe you want to go and query this Snowflake warehouse, maybe you want to use this specific table, because there have been other analysts of a similar kind who have been using it as well. By the way, there is also a knowledge base entry that talks about being careful, because back in February last year, we made some changes in how we define who our customer is and how we calculate revenue. So with all these things, and all this context, you can get to a pretty good place where the whole idea of self-serve just becomes 10x more achievable and more realistic than it is today.
All right, that’s very exciting. I can’t wait to see what’s next with these LLMs and how they are going to be integrated into these environments like Deepnote.
Yeah, 100%. I mean, we don’t totally know what’s going to happen, right? But it’s very unlikely that the current set of tools, whether it’s ChatGPT or Bard, are really representative of the user interfaces that we are going to be seeing in a couple of years. It’s kind of like this whole new medium or something like that happens; whenever we see a new kind of paradigm, there is a certain period of time where we have to go and develop the grammar of how to use that paradigm. And we have seen this many times before, right? I always like to compare this to the history of cinema, because there have been many situations in the past where you would suddenly receive some new capability.
And when movies came along, for example, you already had an existing entertainment business: you already had radio, you already had people writing stories and telling you those stories. So when movies suddenly appeared, it wasn’t immediate; the first couple of years, the first few decades, those movies looked very different from what they look like right now. When they first appeared, it wasn’t really obvious that you actually want to, for example, add audio to the movie. It actually took a couple of years to realize that this might be a good idea: maybe I want to add sound to the movie. And the first couple of movies were extremely static; they were just not as much fun to watch, because you would put your camera in one place and shoot the scene without moving whatsoever. We would be using the same grammar that we learned from radio, where the story, the narrative, would not actually be acted, would not be played; it would be more like three people in the same room, reading out loud from their scripts. And that’s literally what the movie would show. It was only later on that people realized that, wow, the camera can actually move around, maybe we can actually start panning, maybe we can start zooming, maybe we can start introducing some audio cues and sound effects that happen slightly earlier than you actually see the visual. All of this led to the development of a new grammar that allows us to shoot vastly different movies today than what we were able to do before, even though the technology is fundamentally still the same. And I think this is pretty much the same situation that we happen to be in right now, where there’s a really cool new toy, a very powerful paradigm. There’s so much we can do with those LLMs, but we are slowly discovering what the grammar is. And I think the first important piece of grammar was the chat interface, but I don’t think it was the last one. I think we’re going to see many more of these, and I’m hoping that the notebook is going to be one of those.
Yeah, it makes a lot of sense. And I think it’s an excellent metaphor that you are giving here,

Kostas Pardalis 49:52
with entertainment. So one last thing from me, and then I’ll give the microphone back to Eric. Is there anything exciting happening at Deepnote, like something new that is coming to the product, something that you’re really excited about?

Jakub Jurových 50:10
Well, it is June 2023. Everyone’s talking about one thing only, and that’s how you go about integrating AI into your product. And we talked about this, and there is a reason to be excited. For us, we see these two trends happening, where people like to build their models in a notebook interface, but we are also trying to see how far we can take this. The tool has always been about enabling the citizen data scientist, giving the power of analysis not just to a few people in your data team, but to the whole organization. It has been pretty interesting to watch how, with a simple addition of an LLM, and okay, maybe it wasn’t that simple, but the act of adding an LLM into your tool allows so many more people to complete their tasks autonomously. Like, we have a set of tasks that we give out to our audience in user testing, just making sure that the funnel works correctly. And the moment we started to add those AI features, the moment we started to add the autocomplete that’s currently live in Deepnote, or the moment we started to add suggestions of what your next block should be, suddenly it wasn’t just the technical audience that was able to complete these tasks. It was also all the non-technical folks, able to come in and get those questions answered. So this is the place where we are spending a lot of our time, trying to see how far we can push this.

Eric Dodds 52:15
All right, well, we’re close to time. So one last question for me, and it’s actually on the same topic. How do you think the LLM will change the level of technicality needed for analytics in general? I mean, you see that, of course, non-technical users can come in and ask questions and get answers. But with the ability to significantly augment on the code side as well, how technical are you going to need to be to do advanced analytics in the future?

Jakub Jurových 52:56
I think there’s evolution happening on two different fronts. Because on the one hand, you can have more mess in your tech stack, you can have more mess in your data catalog, and the LLM will actually do a fairly good job of understanding what’s in there. But there will always be limitations. And if you can harness the power of the LLM to actually curate this, and make sure that you always have your metrics up to date, you always have the definitions of your processes updated, then suddenly the innovation on the second front, of self-serve, of someone coming in, asking a question, and getting the correct answer, seems to be much more realistic. And we don’t really know how it’s going to play out, right? Because we are definitely suffering from the issue of hallucination, and if you are going to ask your LLM a question, how do you ensure that you’re actually getting the correct answer back? So, if anything, I see the role of data engineers, and people who are maintaining those pipelines and making sure that all the metadata and data catalogs are up to date, only becoming more important, primarily because of the amount of queries that we are going to start seeing from the folks in your organization. Because it’s no longer just a few people in your data team who will be asking those questions. It can be the entire organization asking those questions, without having to wait a week until the request gets assigned to a particular data analyst, but having those answers right there, in almost real time, when you need them.
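To ground the self-serve idea, here is a minimal, hypothetical sketch of the “top five customers in South America” interaction described earlier in the conversation. Nothing here calls a real model or any actual product assistant; the prompt assembly and the “generated” SQL are hard-coded stand-ins, and the table and column names are assumptions.

```python
# Hypothetical sketch of an LLM-assisted, self-serve query. No real model is
# called; this only shows the shape of the interaction: a question plus
# curated context in, a reviewable SQL query out.
question = "Give me the top five customers in South America"

context = """
Schema: customers(name TEXT, region TEXT, lifetime_value NUMERIC)
Knowledge base note: the revenue definition changed last February; use
lifetime_value for customer ranking.
"""

prompt = f"{context}\n\nUser question: {question}\nRespond with a single SQL query."

# In a real assistant, `prompt` would be sent to an LLM. A plausible response:
generated_sql = """
SELECT name, lifetime_value
FROM customers
WHERE region = 'South America'
ORDER BY lifetime_value DESC
LIMIT 5;
"""

# Because of the hallucination risk discussed above, the query is surfaced for
# a human to review and run, rather than executed blindly.
print(generated_sql)
```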

Eric Dodds 55:01
Love it. What an exciting future. Well, Jakub, thank you so much for joining us on the show. What a great conversation; we learned a ton. So thanks for joining us.

Jakub Jurových 55:12
Yeah. Thanks for inviting me. I really enjoyed this as well.

Eric Dodds 55:17
What a good conversation with Jakub from Deepnote. I have a couple of takeaways, and usually we try to do one takeaway. One was just the history of the notebook; I really enjoyed learning about that. I think there’s such value in going back and looking at where something came from, you know, and Jakub talked about sort of notebook 1.0, notebook 2.0, and, of course, they’re trying to build notebook 3.0. I thought that was really interesting. I thought the other big takeaway that was fascinating was, you know, when we talked about that traditional notebook workflow, it’s very individual, happening on your local machine, et cetera. And so we had a pretty long conversation about collaboration. And what is it? Okay, so you have a notebook, it’s a great environment, you know, for exploratory analytics and the other topics we covered. But he talked about these three levels of collaboration, which I thought was a really helpful way, even just from a product perspective, to think about how you consider what to build in terms of collaboration. And it was super interesting, you know, sort of the different users, the different use cases, synchronous, asynchronous, those sorts of things. So those are the two big things that I’m going to keep from the show. I thought they were great. Yeah.

Kostas Pardalis 56:43
Yeah, I got a couple of things that I found, like, extremely interesting. First of all, Jakub of Deepnote gave an amazing metaphor between the entertainment industry and AI and what is happening today, and how AI is kind of like a new medium, let’s say, and we need to figure out what are the new ways of interacting with it. And whatever we are doing today is probably not going to be what we’ll be using, like, a few years from now, which I find very fascinating. And I want to add on that that, at the end, the history of humans trying to interact with, build, and program these machines that we call computers, outside of what we are building and how we are building stuff that changes our future, this evolution happens in parallel with the evolution of trying to figure out what’s the best way of interacting with these machines. At the end, all these different systems, from writing low-level code, to using IDEs, to using notebooks, to using conversational ways to interact with the machine, are nothing more than trying to figure out more efficient ways of instructing the machine what to do for us, right? And I think our evolution in this industry goes hand in hand with the evolution in this human-computer interaction kind of space, which is really fascinating, and we don’t talk that much about it. I think, yeah, we’ll be talking more about it, and I think the conversation is happening right now, because we have AIs out there and we’re trying to figure out what to do with this thing, right? So anyway, these were some very interesting topics that we discussed, and they will definitely keep me thinking.

Eric Dodds 59:04
For sure. All right. Well, thanks for joining us on The Data Stack Show. Lots of good episodes are coming up, so subscribe if you haven’t, tell a friend, and we will catch you on the next one. We hope you enjoyed this episode of The Data Stack Show. Be sure to subscribe on your favorite podcast app to get notified about new episodes every week. We’d also love your feedback. You can email me, Eric Dodds, at eric@datastackshow.com. That’s E-R-I-C at datastackshow.com. The show is brought to you by RudderStack, the CDP for developers. Learn how to build a CDP on your data warehouse at RudderStack.com.