On this week’s episode of The Data Stack Show, Eric and Kostas talk with Chris Bergh, the CEO and head chef at DataKitchen. DataKitchen’s mission is to provide the software, service, and knowledge that make it possible for every data and analytics team to realize its full potential with DataOps.
The Data Stack Show is a weekly podcast powered by RudderStack. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Eric Dodds 00:06
The Data Stack Show is brought to you by RudderStack, the complete customer data pipeline solution. Thanks for joining the show today.
Eric Dodds 00:18
Welcome back to the show. We have a really interesting guest today, Chris Bergh from DataKitchen, a really interesting bootstrapped company in the DataOps space. That’s a category that I think Chris has been working hard to define, so one thing I’m really interested in is his perspective on what DataOps is. It makes intuitive sense, I think, for anyone who works in and around data, but you traditionally hear DevOps, Marketing Ops, Sales Ops, Biz Ops … and DataOps is kind of a new term. So I’m really interested to hear what he has to say about DataOps. Kostas–
Kostas Pardalis 00:59
I think we are pretty aligned, Eric. Actually, that’s one of the two main questions that I have: I’m also interested to see exactly how DataOps is defined. That’s one thing. The other thing is that Chris is … I mean, any person who’s trying to define a new category is by definition a visionary, right? He’s also been in the industry quite a while, and I’d be really glad to hear his predictions about where the market and the industry are going around data.
Eric Dodds 01:27
Great, well, let’s jump in and chat with Chris. Chris, so great to have you on the show. We can’t wait to hear more about DataKitchen and the success that you’ve had. Thank you for joining us.
Chris Bergh 01:39
Oh I’m really happy to be here. Thank you for the opportunity to talk about DataOps.
Eric Dodds 01:44
Yes, absolutely. Well, before we get into the data stuff, which we always do because we have to, you know, stay honest to the name of the show, I’d love to just hear a little bit about your background. How did you … you know, what was your pathway into building a company that works in the DataOps space?
Chris Bergh 02:03
Yeah. Well, I have a technical background, so a big part of my career was sort of building software systems at places like MIT and NASA and Microsoft and a bunch of startups. And I got the bright idea in 2005 that I should go do data analytics. Actually, my kids were small and I thought it would be easy, and I’m like, I’m a big software guy and maybe this data stuff is for lesser beings. And you know what, it was actually really hard. I managed teams that did data, what we call ETL. I had data scientists and people who did data viz all working for me, and as a leader it just kind of sucked; things were breaking left and right and I never could go fast enough for my customers. And there’s nothing quite as fun as explaining to a head of sales who has 5,000 pharma sales reps under them why the data is wrong and why you screwed up to really make you want to change your perspective. I’m an engineer by training. I like to innovate, and I hired a whole bunch of smart people, and you know, some like R and some like Python and some like Qlik and some like Tableau and some like writing SQL and some like doing visual tools, but they all wanted to innovate and try something new. And so my life for many years was: how do you go fast and innovate, and how do you not break things? That perspective, from a leader and a technologist, we think is generalizable, and so we formed a company around it about seven years ago. And in that time we’ve grown, and we’ve had to market and describe the concept as clearly as we can as engineers. So we wrote a manifesto. We hired a writer and actually wrote a book. And I go out and try to talk about these ideas because I think they’re important, and I actually think they solve a problem that I have and a problem that I see almost every analytic team have.
Eric Dodds 03:55
Sure. You know, Kostas and I talk about data all the time because it’s the work that we do day to day, but he has such a good perspective, and he always reminds me that it doesn’t really matter what you’re dealing with: if you’re dealing with data, it’s going to be messy. That really is true, and I think you spoke to that in terms of just saying, you know, trying to deal with all of the issues around pipelines and cleanliness and accuracy and all that is crazy. Before we get into DataKitchen, one thing I like to do, which has kind of become a pattern, is monopolize the beginning of the conversation, so I apologize again for the 30th time, Kostas. But one thing that we love asking our guests about is how their early career has influenced what they’re doing now in data. So I know that you worked as a teacher in the Peace Corps early on, and then you also did some work with NASA, which is really interesting, and that’s actually a common thread among our guests, you know, sort of doing work in a scientific context. But I’d love to know, are there any lessons from those early experiences that you’ve carried with you today as you work with data, even though the context may be pretty different?
Chris Bergh 05:16
Yeah, I think, well, number one, there’s this cliche, all who wander are not lost. And I think it’s okay to wander, and certainly when you’re young, I’m not sure getting all careerist right away is the right thing to do. And I learned a lot by, you know, spending two and a half years teaching math in Botswana. And I actually probably use those skills every day. It is actually kind of what I do now. I teach. I talk about these ideas and share them. And those communication skills, I think, are hugely important. If you want to advance yourself in a technical career, you quickly learn that your technical skills are great, and it’s certainly great to stay an individual contributor technically, but communication and emotional skills become more important. Being able to work across teams is fundamentally a communication challenge. And then second, working at NASA was just a lot of fun. I mean, I was really into AI in 1988 and 1989. I just loved it; I went to graduate school for it. At that time, it was like the winter of the winter, and no one was into it. I took a machine learning class, and it had six people in it. Right now, if you go to college campuses, there are hundreds of people trying to take an ML class. And so I was really into AI for like five years. We designed a system to automate air traffic control, to sequence and space aircraft so they arrive at an airport in an optimal way. And it was fun; we wrote papers and got systems installed in a bunch of airports. It was a totally cool project. And it taught me a lot about how systems work, and especially how intelligent systems work. And so both those experiences, I think, were just great.
Eric Dodds 07:05
So interesting. Okay, so I just have to ask you a question here, while I’m stealing the stage. AI and ML are very hot topics today, and you have a perspective that, you know, sort of predates it being really cool in the 2020s. Tell us, in that ML class, what are the things that are the same? And what are the things that are really different? Because I think a lot of our listeners, me included, understand the power of ML and AI, but it’s pretty new to me. And that’s, you know, in part because I’m young, but the concepts themselves are not brand new; they’re not only five years old.
Chris Bergh 07:49
You know, I don’t want to say I’m an expert on machine learning. But some of the things that have happened in the 20-odd years are, you know, people learned to train the middle layers of neural networks a lot better, and it’s still backpropagation. But the number of techniques that you have to train a neural network, and especially the amount of data that you have to train a neural network on, has gotten a lot bigger, and there are some other techniques that have come along, like some of the ensemble methods, that are good. But my perspective, and I don’t want to present myself as an expert here, is this: one of the reasons I left AI, and I left it almost like a jilted lover, to be honest, is that I loved it so much but I got so frustrated with it, because it got asymptotically hard to actually do anything. Think of it like self-driving cars, how everyone says they’re gonna happen. But if you watch the latest Tesla videos, the cars sort of swerve and almost hit pedestrians. It’s scary, right? And they’ve been at it for five years. And I felt the same about working at NASA. It gets harder and harder for systems to get more intelligent, because they lack the sort of semantic context that people have. And you know, we could push the accuracy of our sequencing and improve it by 1%. But that next 1% was so hard and actually caused so many understandable problems. And so, finding the right level of automation, we actually backed off the amount of control we gave, or the amount of instructions we gave the air traffic controllers, and just told them less, and we got better results. And so the synthesis of intelligence with people was really important in my mind. And so when AI got really hot again, I sort of felt like it’s that girlfriend I had in high school, and suddenly she was like a movie star and I’m like, what? What happened?
Eric Dodds 09:42
*Laughter* That’s great. Okay, well, I’m gonna ask one more question and then hand the microphone to Kostas, but thank you for entertaining me. I just always love to ask some sort of quasi-personal questions, but tell us about DataKitchen, your company. You’ve been around for a while. You’ve seen the data revolution come about, you know, just in terms of technology and process. Just tell us about DataKitchen and what you do. And then Kostas, I’m sure, is burning with questions.
Chris Bergh 10:12
Yeah. So you know, when I started kind of full time in data and analytics, I had to explain what it was to people. Like, oh, what do you do? We do charts and graphs. And they’re like, oh, that’s nice. You’d talk to people at a dinner party, and they’d immediately turn away as if you’d mentioned that you were, you know, a garbage man. They just had no idea what it was, and it wasn’t that interesting. And so what’s happened in the last 15 years are things like Moneyball, and the idea that data is not just some exhaust, it actually is a generator of value. And that’s a really important idea. In fact, just like people have come to realize that every company is a software company, people are starting to realize that every company has a data and analytics part to it. And so the succession of buzzwords of AI and ML and big data have all come out in the last 15 years. And what remains, I think, no different than 15 years ago, is the process of work to build these technically complicated systems. Whether the data is in batch or streaming or big or small, you put that data from a lot of sources in one place. And then you do something maybe predictive on it with some type of model, and maybe you visualize it, and maybe you govern it, and those aspects are better than they were 15 years ago, but they’re still the same. And what remains is something quite embarrassing, I think, to the industry: most projects that people do in data analytics fail. There’s an incredible 50-60%, one analyst at Gartner said 85%, rate of failure in analytic projects. And that’s just way too high. Sometimes I bring it up and I feel like I’m, you know, saying something embarrassing at a party, but it is really too high, and what business works with a high failure rate? And so what is the real cause of that?
And why do these projects fail? The technical superstructure that you worked on, whether it was a small database or a Teradata or a big database, and what kind of tools you had, has very little to do with it, in my mind. What became clear to me is that how you get a team of people, with the data they have and the tools they have, all to work together is the hardest thing. And it’s a people and process problem.
Chris Bergh 12:31
And the people and process that we should follow has already been discovered. People have actually kind of already figured out how to do it. If you look at the way people made factories 70 years ago, they started off with, you know, sort of piece production and mass production. And finally, people like Deming and the Toyota Production System figured out a set of principles where a team of people are working on a shared, technically complicated thing, an assembly line. And so they talked about things like safety culture, or Theory of Constraints, just in time, or total quality management. And those are actually accepted now, right, and you don’t want to run a factory without those ideas. You don’t want to run it in some Taylorist way, and you want to make Toyotas and not AMC Pacer cars, which are crappy cars from the 80s, made near where I grew up in Wisconsin. And those ideas sort of started to percolate in the software industry: the Agile Manifesto was written 20 years ago, and the first DevOps conference was in 2009. And really, honestly, those ideas are kind of the same. How do you get a bunch of people who happen to be software developers and people running software systems to work together on this shared, technically complicated thing, a big piece of software, an IT thing? And so what occurred to me, when I was sort of suffering in 2005 and 2008, was that I started to read about Deming in manufacturing. And I was like, wow, these ideas really apply. And then of course, I was steeped in software, and Agile and DevOps.
Chris Bergh 14:02
And so we started to apply those ideas in that realm, early on. We started to do a lot of test automation, we did groups around trying to look for errors in quality, we tried to apply some of these ideas, and we tried to change the culture so that, you know, people love their errors and don’t have shame. And so we built a version and then a second version of a system that did that. And then we sold that company. And because the system also had a BI system and other stuff encased in it. And when we started DataKitchen, we realized that we just could not work in any other way. We were sort of committed bootstrappers and we were doing some work for customers. And then we realized that this way we work is the way everyone should work. And we’re not just nerds to think that; we actually have a better idea than other people. And this is an idea that, like I said, people have had in other industries, and we just needed a way to talk about it. And so we spent literally years trying to figure out how to describe what was in our head. We called it agile analytic operations, we called it DevOps for data science. We called it analytic Ops for a while, but then if you shorten it down, it makes a really terrible shortened phrase. And so, about four or five years ago, we called it DataOps. And then we wrote a manifesto and the book. And we’ve been trying to talk about the concept since. And they’re all based on just our experience.
Kostas Pardalis 15:23
This is great, Chris. Actually, you said something a bit earlier that I really loved. You talked about education, and how being in education helped you, and that, actually, you’re still an educator. And I love that. I think it’s super important, especially with technology. Actually, many times when I’m talking about marketing in tech, what I’m saying is that’s the function marketing should have in tech: to educate people, right? And I think you were pretty early in many new technology trends in the market. And I’m pretty sure that you are experiencing this also with DataKitchen right now and DataOps. It’s almost mandatory to be able to, you know, educate the people out there about the ideas that you have, how the product can change their life, and how it can be used. And I’d love for our show here to be like a channel for this kind of education. So let’s focus a little bit more on DataOps. You mentioned a little bit about how you came up with the name, and how your experience shaped your decision to create something around DataOps. What’s, let’s say, a one- or two-sentence definition of DataOps? How would you describe it?
Chris Bergh 16:37
Well, I think it’s a set of technical practices and cultural norms for data and analytic teams to focus on really three main things. One is being able to iterate quickly from the ideas in their head and get them into production so they can get feedback from their customers and learn. The second is to run their factories with very, very low errors of any costs. And then third is to deal with the fact that your data and analytic teams are teams, and not just one team, but many teams. And so how does your data science team relate to your centralized data team, to your self-service teams, to your data governance teams? And so it’s about focusing on cycle time, error rates, and collaboration. If you get all those things right, you actually end up being able to produce a lot more insight for your customer. And you end up being able to have a lot more customer trust, and your team is actually happier and more productive.
Kostas Pardalis 17:42
Makes total sense. So I have some more questions around that. But before we go there, you mentioned the DataOps manifesto. And I’ve seen that before with the Agile Manifesto, for example. What made you go after something like this? Why was it important to come up with a manifesto for DataOps?
Chris Bergh 18:02
So we went to our first conference, and, you know, we wore chef’s hats and gave out wooden spoons. And people just thought we were freaking aliens. They had no idea what the term DataOps was, they had no idea what DevOps was, they just asked, are you an ETL tool? What are you guys? And we paid all this money and, you know, it was just embarrassing. And we sort of realized that, wow, we’ve got to go back to the beginning. And so we wrote the Wikipedia article after that. And we wrote the manifesto and got some feedback from other people. And then we realized that we had to write about it in a very clear way. We were always going to conferences and discussing, but we had to write it down. The expression of the ideas was actually really important, and from a business standpoint, we felt like we were creating a software category. And so we had to do the work. And I think, you know, thank goodness we never got any funding, right? Because it just took a while. It’s still taking a while. Because, you know, we’re sort of the anti-lean startup. We’re sort of stuck on this idea. We know it’s right. We’ve got to find the right business to make it happen. But the education part is really what surprised me. You know, we’ve had over 10,000 people sign, we’ve had 15,000 people read the book, and I’ve literally had dozens of people who’ve read our book and then have gone off to influence their organization to follow the DataOps principles. And I find that really interesting and really exciting, that ideas can change things. And what I really like about technical people is we just love learning, and we love ideas, and we want to try some stuff, and if it’s a good idea, you should try it.
Kostas Pardalis 19:39
Absolutely. So how do you write a manifesto? It sounds like something very revolutionary, let’s say, what’s the process? I mean, you mentioned that it takes time, iterations. You’re the first person that I have met who has been involved in a manifesto to be honest, so I’m really interested to learn more.
Chris Bergh 19:58
Well, we stole, literally. In one of our conference talks, we had taken the Agile Manifesto and removed the word software and put in data and analytics. And that actually kind of made sense, but it was sort of wrong. And then we took it and put it into a Word document, and my co-founder and I started mailing it back and forth. And a fella who worked for me also was involved. And then we just, you know, tried to make it happen. I looked at some things in DevOps, some things in lean, and we sort of … when you live the pain for seven or eight years, or you’re continually living the pain … because at that time we had built a sort of early version of our product. And I was actually also functioning as a data engineer day to day for a small company. So we were using our product, and I was doing the data work for a small pharma company. I was literally writing code and feeling the pain myself. And it’s been surprising. I just thought, you know, it would be kind of silly, and we tried to get other people to join in, and some people did, and gave some feedback. But it was mainly sort of us as a company putting it up. And, yeah, I mean, it’s marketing, right? So it could just be bullshit. But it also is an expression of really how we think. Those 18 points are really what we’ve learned, and so they’re true for us.
Kostas Pardalis 21:23
Yeah, I mean, it’s not wrong if something is marketing. As we said, marketing is also education and it’s also communication. So it seems like it’s a great tool to communicate ideas, especially at a very early stage, right? Because, okay, as you said, DataOps was a very new term; you need to communicate that, you need to create a consensus over what this term means. And you have to establish this communication. And I guess having a manifesto and going back and forth, talking with the community and agreeing on that, is a great way to do it and create a new category, which is great. Let’s get a little bit more technical around the concept of DataOps. All this time that we are discussing, I hear you mentioning Agile and DevOps. And my understanding is that there are techniques, best practices, methodologies, maybe also technologies that are related to these disciplines and that are borrowed for DataOps; correct me if I’m wrong. So can you share with us, from each one of the disciplines that affected or inspired DataOps, what is most important?
Chris Bergh 22:30
Yeah, and I think the first one is testing or monitoring, or what some companies that started in the last year are calling observability. And that goes to this: when you have data in production, you’re in the squeeze, right? The data is coming in, and you don’t know if it’s good or not. And therefore you don’t know if your result’s good or not. And so you want to make sure that the data is tested and monitored and correct before your customer sees it. You don’t want to get that call on Friday afternoon saying the data is wrong. And then you’re spending all night Friday trying to fix it, or you’re leaving the soccer game on Saturday, and your wife’s giving you dirty looks because you just got an email that something’s wrong. And by the way, these things have happened to me, and to people I work for. And it’s not fun. I think you should build a system where you know if it’s right, and it tells you if it’s right. To do that, you’ve got to go in and grab bits of data, look at them, and compare them to previous versions; you’ve got to test the size and shape; you’ve got to look at the artifacts, the models, the visualizations, to make sure that they’re all right. Because if you take the manufacturing analogy a little further, the workstations in the assembly line are the tools that we use to do the work. There’s a class of tools to do data work called ETL, or ELT, or data prep; there’s a class of tools to apply models; there’s a class of tools to visualize; there’s a class of tools to govern. All those are sort of workstations that you use, and data is passing along the assembly line on those workstations. And it doesn’t matter if it’s big or small, or streaming or batch, you’re still using a tool, and that tool is governed by code. And that code has complexity to it, just like software systems do. So the first thing is that you run a factory. And that’s similar, but not quite as similar, to software systems.
The second is more similar: you know, analytics is code. That’s one of the lines in the manifesto. Your ETL tool may produce an XML file, but that is code equivalent in my mind, because it runs in an engine. Your viz tool may produce a visualization that’s XML, but that runs in an engine. You may have SQL code or Python code; that’s literally code. And there are some tools that produce YAML files, which are very close to code, or JSON files. And so you have a code governance system, right? And code means complexity. So when you’re doing data analytics, you’re in the complexity business, and software actually has been in the complexity business for years. That’s what it is: how to deal with all this. And one way that software teams deal with complexity is to have a path to production that is automated.
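The checks Chris describes in this answer (test the size and shape of the data, and compare it to previous versions) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not DataKitchen’s product or API: the function name `check_batch`, the `CheckResult` type, the column names, and the 20% row-count threshold are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str


def check_batch(current, previous, required_columns, max_row_change=0.2):
    """Run simple size, shape, and history checks on a batch of records.

    `current` and `previous` are lists of dicts (one dict per row).
    All names and thresholds are hypothetical, chosen for illustration.
    """
    results = []

    # Size: an empty batch usually means a broken upstream feed.
    results.append(
        CheckResult("non_empty", len(current) > 0, f"{len(current)} rows")
    )

    # Shape: every row must carry every required column.
    missing = sorted(
        {col for row in current for col in required_columns if col not in row}
    )
    results.append(
        CheckResult("columns_present", not missing, f"missing: {missing}")
    )

    # History: compare the row count against the previous batch, if any.
    if previous:
        change = abs(len(current) - len(previous)) / len(previous)
        results.append(
            CheckResult(
                "row_count_stable",
                change <= max_row_change,
                f"row count changed by {change:.0%}",
            )
        )

    return results


# Example: today's feed delivered far fewer rows than yesterday's.
yesterday = [{"rep_id": i, "sales": 100.0} for i in range(1000)]
today = [{"rep_id": i, "sales": 100.0} for i in range(400)]

failures = [r for r in check_batch(today, yesterday, ["rep_id", "sales"]) if not r.passed]
for failure in failures:
    print(f"FAILED {failure.name}: {failure.detail}")
```

In a pipeline, a step like this would run before results are published, so that a failed check stops the batch or raises an alert before the customer sees the data.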
Chris Bergh 25:03
And so one aspect of that path to production is they have a development environment where you can test things and find out if you’ve broken anything. You can change something in the middle and see the effects downstream of it in development. And I think that’s an incredibly powerful concept. A lot of data and analytics teams, most of them, either aren’t testing in development or are doing it manually, and they don’t judge the sort of small effects. And so they end up building processes, like meetings and technology review boards. And so the other thing that software has done, in addition to testing, is automate the deployment of things from a development environment to production, making that smooth and fast and automated. So some of the same ideas from DevOps are there, but DataOps is different. Fundamentally, at a high level, if you boil it all the way down, Agile says: the thing that you’re building, get it in front of your customer quickly and change it, because you don’t really know what they want, right? Don’t spend six months building something; spend six days and then iterate and iterate. You thought you had to do 10 things, but you put it in front of your customer and you learned that they didn’t want five, but they wanted two more. So you’re gonna have a net gain of three things that you didn’t have to do, and the customer is going to be happy. That’s, to me, Agile in a nutshell. But the problem with data is you’ve got another cycle. In addition to the thing that you’re putting in front of your customer, you’ve got the data cycle, because the data may not support what your customer wants. You’ve got to learn, test, probe, model, and experiment on the data.
Chris Bergh 26:44
And so you’ve got these two cycles that are going on. There’s the application cycle: does it make sense? You can see that as charts and graphs or dashboards, or however you want to express that. But you’ve also got the data cycle, the learning from the data. And both of those things, I think, are better done in an iterative, experimental way. And they have to be coupled together. That makes DataOps more complicated. And then finally, you know, software teams have Dev and Ops; they’re two separate teams, and they’re usually under the same boss. In data and analytics, there are just multiple development teams and multiple operations teams: the whole idea of self-service data prep, self-service visualization, and being able to push into production, and data science teams are off in their own corner. We tend to work with big companies, and they tend to have hundreds of people doing data and analytics scattered around the organization. You could argue that 10% of a company is involved in some form of the process of dealing with data. And the best companies, in my mind, are companies like Netflix or Spotify, who are trying to have everyone in the company have some ability to access the pile of data and get good results on it. And I think that is where we need to go. But that means that everyone in the company, to a certain extent, is going to be a developer, is going to create code. And what do you do with that code? Well, it should be engaged, it should be versioned, it should be tested, it should be deployed. You run a factory in production, and all those things happen whether you happen to be a full-time, highly paid, $200,000-a-year professional, or you happen to be someone with, you know, a BA in business who’s helping the business by doing something.
Kostas Pardalis 28:26
This is great, great, super, super interesting. Actually, when you mentioned the manifesto at some point and said that data is code, I couldn’t help it; it reminded me of the exact opposite that the Lisp programmers say. I don’t know if you’re aware of it, but they actually say that code is data. Anyway, it’s something that comes more from computer science, because of how the language is. But what I’m trying to say is that this equivalence between data and code is actually super, super important. And we also see that a lot. So I have a question about data itself. We are talking a lot about data. You mentioned all these value creation chains, let’s say, where data is moving around and gets processed, but what data are we talking about? Data can be almost anything, right? What are the most common types of data that your customers are using, that you have worked with? And what do you usually have in mind when you are talking about DataOps?
Chris Bergh 29:29
Yeah, so, you know, we work with companies like, for instance, big pharma companies, and they have groups that do analytics for commercial, like marketing and sales. They have multiple groups that do data analytics for drug discovery, like genomic data or experimental data. They have groups that look at manufacturing data, which is like production and, you know, quality metrics of the drugs they create, and they have internal teams that look at sort of financial or HR metrics. Or you could look at financial service companies, right, and they have teams that are focused on compliance and risk in addition to marketing and sales and internal functions, and then they have different domains like banking or brokerage or insurance, all within the same company. Then there are companies who are sort of more B2C consumer, whose websites are throwing a lot of data off, and there are internal systems. And even charitable giving companies, right, who have to keep track of where their donors are, how much they donate, and the effect of their marketing campaigns. So it’s varied, and it is true we’re not particularly domain specific, because a data and analytic team will do a lot of the same things invariant of what type of data it is. But the people who are most interested in DataOps are in areas where the number of questions they have of the data outstrips the supply of the team able to answer them.
And that’s one issue, and then second is where the team wants to trust the data; a lot of times organizations don’t end up trusting the data, and there are a lot of reasons for that. Those are the two things that we look for in our prospects: the data team is not keeping up, and they’re having just a lot of problems in the assembly line of getting data out. And then the third is that they kind of realize that that’s a problem that they should fix, because that’s not always the case, right? Sometimes that is the sort of hair shirt, or they think that they have to live with that status quo, that their backpack is always going to be filled with requests from their customers, and they’ve got to wear it like St. John the Baptist wore a hair shirt and say, we’ve got to suffer, and our lot in life is to suffer. And I just don’t think that you have to live that way. I found that I was not happy personally living that way, suffering under late nights and deadlines and kind of not feeling great that I couldn’t satisfy my customer, because they always had 10 follow-up questions and we couldn’t answer them. And so the solution to that is not to look for the new magic tech widget, and I’m an engineer, right, I love it. The solution is to sort of rethink how you and your team work, and that fundamentally is a leadership question: how do you lead a team to do that?
Kostas Pardalis 32:31
Do you think that DataOps is different when we’re talking about data analytics or business intelligence and when we want to do some work with machine learning? Or do the same principles apply in both use cases?
Chris Bergh 32:45
Well certainly putting Ops on the end of a noun is fashionable now. So there is Model Ops and ML Ops and that’s to me the idea of DataOps as applied to machine learning. There’s Data Gov Ops which is a new one. We actually helped coin it, which is the application of data governance principles and Ops, sort of like governance as code. And you know, I think there are specifics in each domain that are unique to whether you’re talking about managing a data catalog and the deployment of changes to a data catalog from production, whether you’re actually doing data management or doing data science. There are techniques specific to monitoring compliance of a model and there are specific techniques to look at how to understand changes in data, and so I think all parts of the data analytic pipeline have an Ops thinking. I tend to bundle those all under the term DataOps, but the market sometimes refers to them differently, saying that DataOps refers just to the data warehouse or data portion, Model Ops refers to the model portion, and analysts haven’t really named the portion that has to do with self service analytics yet because self service Ops is too awkward.
Kostas Pardalis 34:03
Yeah makes sense. All right, last question for me and then I’ll hand the microphone to Eric. So we talked a lot about DataOps. How does DataKitchen actually help with that and how do you build a product or a service around DataOps? How did you do it?
Chris Bergh 34:20
So we’re a product company, so we have a software product that helps you solve those problems. Helps your team deliver more things to your customers so you’re not burdened, helps you use your current tools to deliver it with fewer errors, and helps you not sort of end up in the Hatfields and McCoys of, you know, your data science team and your BI team being at each other’s throats. And so we do that through software. We also have some services around it. What we found lately is that our thought leadership is valued and bigger companies are looking for us to help them with their transformation to do the DataOps. So big companies will set up an internal team under the CEO, saying the leadership believes that the core problem isn’t going to be solved by buying yet another tool, that they really have to rethink how they’re being agile. And maybe the CEO is talking to the CIO and the CIO says yeah, we’ve gone through the agile DevOps transformation. Or maybe, you know, a data engineer is sitting next to someone who works on the company’s website and at lunch that person hits a button and deploys new code to production, and the data engineer goes yeah, that takes me three months to do and we take, you know, 300 meetings. And so the idea is agility in our organization, and as a business concept it comes from the leadership down. I think it does, it shouldn’t, but at least right now that’s who we target. And, you know, as we grow we’re going to work more towards having the individual contributor who wants to help their organization move to DataOps just by themselves. But yeah, right now we got to sell, you know, people who work for us like to get paid and so we have to economically find ways to sell things to people where they can get paid so they can build software.
Eric Dodds 36:09
Chris, it’s so interesting. One thing that we’ve seen on the show over and over is that when it comes to running a data driven organization and you start to ask people, okay, what does it really take to do that, they never mention the tools, right. They say you really need, I mean you hit the nail on the head, you really need the initiative to come from leadership and then you need alignment across teams, you know, and these are things that I think many people know and to some extent are intuitive. But it’s fascinating that you are building software for a context that needs so much organizational alignment across teams and sort of across types of data. And I’d just love to hear, as a, you know, a nerd who enjoys understanding how products are built, how do you approach that problem? Because that’s a pretty interesting breadth of problems to solve across an organization, and part of the problem is organizational itself and software can’t solve that.
Chris Bergh 37:17
Yeah I hear you and maybe … I don’t know, I’ve had a career and this is a problem that I know exists and I know it should be solved, and my fellow nerds are suffering like I suffered and I want it to be solved for them, and maybe that sounds goofy but it is how I feel. It is a big problem, right, because we’re saying that you should rethink how all those people on your data and analytic teams work. And that’s fundamentally an upstream problem, and so let me give you a metaphor to explain that. So imagine that you’re sitting by a river on a nice summer day and you see some kids in that river sort of drowning and struggling, and you kind of swim in and grab them and pull them out and you’re like, what’s going on? And then you’re sitting on the bank again and some more kids come by and some more kids come by, and you’re suddenly always sort of pulling the kids out of the river and they’re always sort of like drowning. And you’re like, man, this is … and someone comes along and offers you a way to get faster from the shore to the kid, and you’re going to go, that’s the right thing to do, I gotta get the thing that moves me faster from shore so I can rescue these kids faster. And one of you gets up and starts to walk away, and the other one says, what are you doing, why are you walking away? And he says, you know what, I’m going to go upstream and tell the kids to stop getting in the river. And so that’s the kind of problem it is. A lot of solutions are about getting faster to the drowning kids and I’m saying no, the real problem is you got to walk upstream and stop the kids from getting in the river in the first place.
Eric Dodds 39:00
Sure absolutely. What a great analogy. What a great analogy. You started the company in 2013. Is it easier to have that conversation now because of the proliferation of data in every part of the organization?
Chris Bergh 39:16
Not really. I mean there’s always been organizations where data has proliferated, and I think what has happened is the amount of knowledge of the techniques that are applied to data, whether it’s NLP or AI or ML or big data or, you know, Spark or Hadoop, people have sort of digested those ideas, if you will. The market is … there are blogs, and if you want to find out there’s just a lot more information to learn, and it’s been formalized a bit, so for instance there’s a bunch of master’s degree programs in data science and analytics that didn’t exist. And so it’s much easier for instance to find people who have been academically trained in the field than it was 10 years ago, and there was like no one who was academically trained in the field 10 years ago. And so what that means is people are aware of all the things that you can do, right. And what that means now is that they have a lot more ability to do things. And they’re seeing the problem clearer. Because before, when you’re like, I’m only on second base, and you’re like, wow, third base is like machine learning and AI, I want to get there, man, that’s cool, the home run is AI, I want to go there. And people are running really fast. But then they run around the bases and they find out that they’re still losing the game. They’re like, you know, what is it? Is it AI? Is it ML? What’s the thing that’s going to make this work? And I think that those are all parts of helping you deliver insight. But really, you need to build a system that helps you deliver insight. It’s about sort of how you work and not what you do, and that could be AI or ML or visualization or data or whatever that you do. And so to me, I think what’s been helpful in the change is that more people are actually doing it and seeing the problem.
Eric Dodds 41:05
Absolutely. You’re a master of analogies, which is great.
Chris Bergh 41:09
I’m making some up. The baseball one was a little mixed metaphor there. So I’m not sure that that was perfect.
Eric Dodds 41:14
Hey, it worked for me. It was great. Well, let’s talk about the future. We’ve talked about the past and what’s led to today and the way that, you know, DataKitchen supports companies in solving problems around data. But let’s just continue with a baseball analogy. What inning are we in in terms of data, and specifically the software that supports data driven organizations?
Chris Bergh 41:43
Oh, so specifically like what inning are we in for data and the transformation of companies to be data driven? I think we’re probably like in the second inning, you know, maybe third, it’s still early and how companies are going to transform to run on data. And the idea of DataOps and the set of ideas behind it, we’re kind of like, you know, we’re in the first part of the first inning. In some ways, it’s still quite early. More people are interested in it, but it’s still quite early. And so I think the data and analytics industry is … and I’ve been, you know, fortunate that I’ve been able to watch the software industry grow. And I think we’re still early and it’s a cool industry, in a lot of ways, it’s a lot more diverse, the problems are a lot more interesting. And so I’m still bullish that there’s a lot of companies and a lot of good we can do by helping people to be data driven. And we can also deal with the negative effects of being data driven, that we’ve seen in lots of places from the biases in predictive models to, you know, to the sort of privacy problems that come up with data. And I think all those things are good and signs of a maturing industry.
Eric Dodds 42:49
Sure. You know, one thing that’s interesting and I would love your perspective on this. So, you know, I think that there, to some extent for people who work in the technology industry, specifically sort of, in and around Silicon Valley, geographic or not, but you know, sort of the ethos of, of high tech and software is that leading indicators of a decade long trend often show up in pretty big ways. So two things that come to mind when we think about the world of data. So one would be the acquisition of Looker. Right? I mean, that was a really big deal. And Tableau, right? So you sort of have this, like these significant acquisitions happening in the BI space. And you can kind of get the sense, especially if you’ve been in and around data and analytics forever, you’re like, Okay, this is mainstream, right. So like self service BI is mainstream. The other one would obviously be Snowflake, you know, which is sort of like, Okay, well warehouse, you know, data unified in a warehouse. This is mainstream, right, Snowflake went public, it was massive. And in reality, the long tail of the market is way bigger than the penetration that any of those companies have achieved. And there are so many companies that just simply aren’t operating on that paradigm. And so would just love your perspective. I mean, in many ways, in sort of this Silicon Valley ethos like that is the standard way of doing things. And a lot of companies are very forward thinking, but there’s, I mean, huge percentages of the market that, you know, just aren’t even there yet. And would love your perspective on that.
Chris Bergh 44:33
Oh, yeah. And I have this really wacked perspective on it. So hopefully, you’re not gonna laugh. So I actually think that we’ve reached “peak tool” in data and analytics, and Snowflake is the example of it. And I think peak tool for a person to use. And why do I say that? So, I look at the evolution of tools for software people, and at some point in the early sort of ’98, ’99, the pinnacle of cool tools was a thing called an app server and there were dozens of app server companies, there was one called BEA WebLogic worth billions of dollars. There were other companies and it was a tool that people used, and you know what, it turned out that those tools got commoditized and the things that actually have value now in software are the tools that make the group of people who do software better. And so you look at the acquisition of GitHub by Microsoft, you know, it was a significant number, and so the value has changed. And so for instance a great tool that a lot of software developers use is PyCharm, it’s like an IDE developed by a European company, and they’re like, you know, hundreds or thousands of people and they’re completely bootstrapped. And you know, I would argue that they have more people developing within, you know, it’s PyCharm and I forget the name of the parent company who does it, perhaps you know. But there are probably more people using that tool than Tableau, and yet Tableau probably sold for $15 billion. And so I think the market is going to change because people are going to realize that the value in the analytic team working is getting the team working, is getting in the Ops side of it as opposed to what you do, and so I think it’s just gonna happen like it happened in software. The individual contributor tools are going to get commoditized and going to be worth less, and the things that make the team work are where the value is going to be created, and so to me that’s a long game.
I’m one of the few people who have expressed that opinion that we’ve reached peak individual tool, and you know the fact that Snowflake is worth 200 times revenue. Probably the people at Snowflake are laughing all the way to the bank at me right now. You know I saw it happen in software and it’s gonna happen and not tomorrow but it’s gonna happen in data and analytics.
Eric Dodds 46:55
Alright if anyone from Snowflake is listening to the show, please email us and we’d love to have you on to respond to that.
Chris Bergh 47:03
And one advice, sell all your shares man now and pay your taxes. *Laughter*
Eric Dodds 47:09
Okay Chris, now we need a legal disclaimer because, well, I guess we didn’t give financial advice.
Chris Bergh 47:16
I disclaim that. It’s meant in humor.
Eric Dodds 47:20
No it’s great I think JetBrains is the company …
Chris Bergh 47:23
Thank you JetBrains yeah.
Eric Dodds 47:24
… yeah, that makes PyCharm. You know, it’s interesting, DBT is a really interesting example of that, you know, where there’s sort of a lot of usage and just sort of a groundswell of activity that resulted in like a pretty big valuation, they haven’t sold, and it makes the relationship between teams much easier.
Chris Bergh 47:45
Yeah and I think that’s a case in point, right, like who would have thunk that with Jinja-templated SQL you could build a good company out of it, right? And like I actually thought it was an anti-pattern five years ago, that you didn’t want to Jinja template all your SQL, and so the fact that it’s gotten so popular is fantastic. And it actually goes to show you that, because a lot of analysts are using SQL, the fact that it’s stored in Git, the fact that it actually has common components that you can share and reuse, actually is very helpful to people.
Chris Bergh 48:20
I think that that’s a case in point about why the system, getting a tool that helps the system, and of course it helps an individual be productive, but you know … I just think it’s really interesting because I was sort of writing the code of our product back then and I was Jinja templating SQL and I thought that was wrong. And I don’t know, I think I talked to their CEO once. I think it’s just really cool that that became … I find things that I thought were wrong that became right are actually a really good indicator that it’s a success.
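As a rough illustration of the templated-SQL idea discussed above (not something shown in the episode), the sketch below renders one SQL template with a shared, reusable filter. It uses Python’s standard-library string.Template as a stand-in for Jinja so that it runs without extra packages; the table name, column names, and the ACTIVE_FILTER fragment are all hypothetical.

```python
from string import Template

# Hypothetical shared fragment that several queries could reuse.
ACTIVE_FILTER = "status = 'active'"

# Stand-in for a Jinja-templated SQL file; string.Template uses $name placeholders.
QUERY = Template(
    "SELECT $columns\n"
    "FROM $table\n"
    "WHERE $condition"
)


def render_query(table: str, columns: list[str], condition: str = ACTIVE_FILTER) -> str:
    """Fill the template with a table name, a column list, and a filter."""
    return QUERY.substitute(
        table=table,
        columns=", ".join(columns),
        condition=condition,
    )


if __name__ == "__main__":
    print(render_query("customers", ["id", "region"]))
```

Because both the template and the shared fragment are plain text, they can live in Git and be reused across queries, which is the property Chris points to.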
Eric Dodds 48:51
Isn’t that funny how that works? I mean that is kind of an interesting thing in general, where I think about the conversations around the beginnings of Twitch and them pitching investors and people just saying this is like the dumbest idea I’ve ever heard of, and you realize like no, it actually wasn’t.
Chris Bergh 49:12
They did it right too. You know they bootstrapped. They self-funded and they got traction. They’re an open source tool and they got investment. And so I think finding a way to support yourself and your team and having you know I’m sort of an anti-blitzscaler. I believe time actually really helps you and so you know getting funding is not in my mind you know … unless you’re really sure that you need to build scale which is certainly an honest thing that you should do, getting funding isn’t the right thing and certainly you can get technical skills you can sell your technical skills and build your product at the same time which is kind of what we did and so it’s not actually that hard to financially make a company go.
Eric Dodds 49:53
Yeah it is interesting. I mean it is really neat to … and that’s probably a whole other episode just around the different ways that some of these tools that have become really big successes found their beginnings, because not all of them are sort of your traditional venture backed effort. One other note on DBT, just thinking about patterns that we’ve seen, and I’m thinking about a lot of customers that I’ve worked with in customer success at RudderStack and just thinking about the ways that they’re using different tooling, we have sort of the benefit of seeing all the infrastructure and tooling that surrounds their data pipelines. And one really interesting thing, thinking about DBT, that I haven’t thought a ton about until this conversation is that the companies who are really … the companies running Looker who are heavy DBT users seem to get a huge amount of value out of Looker because of the underlying work on DBT, which I think really reinforces your point. In and of itself, it certainly solves problems, but it actually is a big enabler of teams that are separate from the people actually using DBT, which has been a very interesting dynamic to see. Because some people have the, you know, mindset of like, well, I have Looker, I have LookML, I don’t necessarily need DBT. But DBT can be a huge enabler. And it’s just really interesting to see that dynamic.
Chris Bergh 51:14
Yeah, yeah, I think both of them are common in that they think of the tools to express code that is human readable, human understandable, editable, and diff and mergeable. Right, and so that you can put it in Git, and you can actually use it. Whereas another generation of ETL tools, they tend to have these XML blobs, or confusing JSON syntax, or they’re binary. And so, you know, if you really think analytics is code, like we wrote in the manifesto, well, it should be treated as code. It should be in Git, and you should be able to diff and merge it. And so I think that’s a great way for teams to, because there are people in analytic teams who are better at thinking in abstract terms and looking at SQL and looking at code or even templated SQL. And there are some people who just want to have a UI. And so to do it, and so there’s another company who’s another ELT company called Matillion, who has a very visual tool that compiles the SQL behind the scenes for you, and they’re just as successful as DBT. Because they make it work. And so I think it’s an interesting dynamic between the sort of tools that are closer to code and the tools that are different to code and where things are going to come out. And in some ways the market is almost splitting, there’s a lot of sort of low code, no code, you know, self-service tools out there that you can do data prep, data science. And then there’s tools that intentionally want you to code and do whether it’s a Jupyter Notebook, or, you know, whether you’re doing a DBT model or messing around with your LookML. And so, I think, you know, probably, I’m more on the things that produce code because I like I think code is a much better way. Even LookML files are a much more compact way to understand what’s happening in a system. However, the visual UIs are certainly possible, and are certainly popular. And so in some ways, the analytic industry is kinda like breaking into camps. 
Sure, at the end of the day, whether it’s the low code or codish tools, it’s still code, it’s still got to be versioned, and stored and tested and deployed, just because it just runs in a different engine.
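To make the “diff and mergeable” point concrete, here is a minimal sketch (an illustration, not anything from the episode) that compares two versions of a text-based SQL model using Python’s standard difflib module. The file name customers.sql and both query versions are invented for the example.

```python
import difflib

# Two hypothetical versions of the same SQL model, as they might appear in Git history.
OLD_SQL = "SELECT id, region\nFROM customers\nWHERE status = 'active'\n"
NEW_SQL = "SELECT id, region, signup_date\nFROM customers\nWHERE status = 'active'\n"


def sql_diff(old: str, new: str) -> list[str]:
    """Return a unified diff between two versions of a text-based analytics file."""
    return list(
        difflib.unified_diff(
            old.splitlines(),
            new.splitlines(),
            fromfile="customers.sql (old)",
            tofile="customers.sql (new)",
            lineterm="",
        )
    )


if __name__ == "__main__":
    print("\n".join(sql_diff(OLD_SQL, NEW_SQL)))
```

A reviewer can see exactly which line changed, which is much harder to do with a binary file or a large generated XML blob.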
Eric Dodds 53:18
Yeah, well, we’re getting close to time here. And I have so many more questions to ask, but I love that you are sort of the anti-pattern, the anti-pattern voice. What tools are really exciting to you, you know, that may not be huge successes yet, but that you think are sort of expressive of the future that you see happening?
Chris Bergh 53:39
Well, I think there’s, you know, one of the things that we try to do in creating a category is to find what the category is. And so I think there’s a bunch of companies who have started around automated testing in production or observability that are exciting. Of course, our product does that. There’s a bunch of companies over the past three or four years that do model deployments. And, you know, we have capabilities in our product there. And there’s other companies that do automated data governance or data governance as code. And we work with it but we don’t do that. And I think the idea of thinking of things, as you know, putting the “as code” on and then applying DevOps ideas to them. And there’s a whole sort of movement, I think, or set of ideas that come from software that are playing out into the data and analytics industry. And ML Ops, DataOps, Data Gov Ops, data observability are some of them. And the final one that I actually think is also interesting is the idea of a data mesh, which is really the application of domain driven design into data and analytics. And so I think there’s actually a right way to think about how software has dealt with complexity, and the ways and methods and how those play out into systems that are built with data, that I find just incredibly interesting. And of course, that’s what we did. That’s the purpose of our company. We just sort of stole the DevOps ideas and said, move them over here. They’re really good.
Eric Dodds 55:06
Well, Chris, I’m sad that we’re out of time, because I have a ton more questions. I’m sure Kostas has a ton more questions. But that just means that we need to have you on the show in the future, which we now have proven that we actually can do. So we usually say, hey, let’s catch up in six months and see how things are going. And we had our very first podcast guest on recently from six months ago, when we started the show. So I know that we’ll talk again, and I’ll be interested to see which companies sort of get acquired or IPO in that time that we can talk about and sort of validate our anti-pattern hypotheses. But thanks so much for joining us. And thanks so much for the insights. It’s been really wonderful.
Chris Bergh 55:46
Alright, thank you for the opportunity, and you guys have a good rest of your day.
Eric Dodds 55:49
As always a great conversation. This is so specific, but since we’ve tried to limit ourselves to one or two things, I wanted to spend so much more time hearing about doing machine learning, you know, 15 years ago at NASA, trying to do air traffic control support. I mean, that is just amazing. And it’s actually really, really interesting to me that they had to scale the recommendations from the model back, and they got better results giving a little bit more control to the humans. But that’s probably a whole other episode. So that was my big takeaway, and it’s what I’ll be thinking about, which I know is a very small part of the conversation. So Kostas, you hopefully have a takeaway that’s more relevant to the data conversation.
Kostas Pardalis 56:36
Yeah, although I think your takeaway is also, like, quite important, to be honest. And it’s not the first time that you hear something similar, right? I think it’s a common trend that is coming to the surface in all the conversations that we have, especially with people who are in ML, that the future is not, you know, black and white, like humans or AI, right. The future is going to be built by the synergies between machine learning, AI, and humans. And that’s something that, I mean, was clear also, like, 15 years ago. I think that’s the takeaway from Chris. There are a couple of other things that I really enjoyed in our conversation with him. First of all, okay, it was amazing to hear about DataOps, and make it clear what DataOps is. And I think our audience is going to find this, like, very interesting. And I really enjoyed the part of the conversation around marketing and education. That’s super interesting. I think we should discuss with more people from tech marketing too, and especially data related companies, how they market and how important education is. And the last part is how important collaboration is when we work with data. In the end, what Chris was saying is that yeah, I mean, technologies will get commoditized. The most important technology that we have to build is technology that will help all the people who need to work over the data to work better together. So yeah, those are my takeaways.
Eric Dodds 58:00
Well, as always a great conversation. Definitely subscribe on your favorite podcast network in order to get notified of new episodes weekly, and we will catch you next time on the show. The Data Stack Show is brought to you by RudderStack, the complete customer data pipeline solution. Learn more at RudderStack.com.