Episode 117:

DX for Data Tooling with Taylor Murphy of Meltano

December 14, 2022

This week on The Data Stack Show, Eric and Kostas chat with Taylor Murphy, Head of Product & Data at Meltano. During the episode, Taylor discusses command line interfaces, why the developer experience is unique from user experience, balancing quality and quantity in supporting connectors, data ops, and more.

Notes:

Highlights from this week’s conversation include:

  • Taylor’s journey into data (3:09)
  • What’s been going on at Meltano recently? (7:28)
  • Addressing basic problems in data even with advancements in technology (12:23)
  • What makes Meltano unique in the space (16:53)
  • Why the CLI experience is important (25:37)
  • Quality vs quantity in supporting connectors (35:51)
  • What does data ops look like for Meltano (46:44)
  • Takeaways and closing thoughts (52:56)

 

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Transcription:

Eric Dodds 00:03
Welcome to The Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You’ll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, the CDP for developers. You can learn more at RudderStack.com.

Welcome to The Data Stack Show, Kostas. We talk all the time about how we want to have guests back on the show to catch up with them. And we were able to do that when we tracked down Taylor from Meltano. We had Douwe on a while ago, I can’t remember how long ago, but it was a while ago. And it was a fascinating conversation. They’re building some super interesting things. And so we’re gonna catch up with Taylor, who leads product. And I think I just want to hear how things are going. I mean, they were building this almost command line interface, you know, sort of a configuration layer for the data stack in general, across pipelines, orchestration, et cetera, which is very compelling for a number of reasons. And so I want to hear how that’s going. And of course, they are big investors in the Singer ecosystem, and all those protocols and that entire community. So, yeah, I’m just excited to hear how things have gone. How about you?

Kostas Pardalis 01:30
Yeah, I’m also very curious to see where Meltano is today. Meltano is one of these products and companies where, when you see what they have done, how they started, how long they’ve been around, and how hard they’re trying to build a business around that, it really makes you appreciate how important perseverance is for building a business. That’s something I have to recognize in them, and something they should also be very proud of. These folks just don’t give up. So I wonder where they are today. And one of the things I definitely want to discuss with Taylor is developer experience, because that sounds like what differentiates the product from the competition out there. So yeah, I think we’re going to have a very interesting discussion about how they approach the problem of data pipelines in a different way.

Eric Dodds 02:43
Let’s dig in with Taylor and talk. Taylor, welcome to The Data Stack Show. We are so excited to talk about Meltano again. We had Douwe on the show before. And we always say that one of the best parts is recording a show and then checking back in later. So we’re super excited to hear about what’s been happening at Meltano.

Taylor Murphy 03:06
Yeah, thanks for having me. I’m really, really excited to have the conversation.

Eric Dodds 03:09
Okay, so how did you get into data? Give us the backstory.

Taylor Murphy 03:13
Yeah. So my background is in chemical engineering, and coming out of grad school, I decided I didn’t want anything to do with that and looked for a way to use the skills that I gained in grad school in an interesting way. And the data side really caught my attention. I joined a startup in Nashville that was focused on the genetic testing space. And really, that’s where I grew a lot of my data chops. Prior to that I used MATLAB and Excel and was doing some relatively simple things like data modeling, but it was there that we had real business needs. That’s where I fell in love with regular expressions and built my Python and SQL skills. I was there for four and a half years and then moved over to GitLab, where I started as a data engineer and was able to lead the team as the company grew from 200 people up to well over 1,000 people as it made its way to its IPO. And being there was huge for my career because we were able to be very open about everything we were doing. That’s also where we started the Meltano project, which is where I’m at now. I was able to join that team in 2021 as the Head of Product and Data and have been there for coming up on a year and a half now, as we’ve grown the community, grown the company, and are really trying to make a really fantastic ELT tool.

Eric Dodds 04:23
Awesome. Well, first thing I have to say is, I don’t actually believe you that you were doing simple things in Excel. With anyone I know who’s fallen in love with regular expressions and who started in Excel, my experience is that they were essentially building software in Microsoft Excel before actually discovering notebooks, and then that is, you know, great freedom.

Taylor Murphy 04:53
Yes. Basically, yeah. I was doing things you probably shouldn’t do with these tools because you’re unaware of software development and the way this other industry had evolved. We were using, I think, Subversion for some of our code practices. And we literally had, like, four computers that were running some of the models we were doing. It was this whole other world. When I actually started working with actual software engineers, I was like, oh, there’s a better way to do this.

Eric Dodds 05:14
Yeah, totally. No, I mean, I literally remember working with someone who would leave a computer on and just run stuff in Excel overnight. And it’s like, this is absolutely insane. I love it. Okay, Kostas is going to laugh because I love when I get to ask this question. So, chemical engineering background, and now you work in data. What lessons did you bring with you from chemical engineering? And do you still use any of those in your day-to-day work with data?

Taylor Murphy 05:45
I think so. I’ve talked to a lot of former chemical engineers, people who have gone from chemical engineering to other disciplines, a lot of them to programming; some went to law. The big things that I go back to from my engineering training are really about understanding systems and understanding how these pieces fit together and how things move. One of the biggest skills that I learned coming from grad school in particular was really how to troubleshoot problems: how to take, you know, I’m having a bad outcome, whatever it is, maybe this result doesn’t look good, or this equipment isn’t running, and to really have a disciplined approach to breaking problems down, subdividing them, and finding, okay, is the problem before or after this step? It seems kind of simple, but it is a practice, and until you see it work a few times in the real world, it can be kind of foreign to some folks. When they’re faced with a problem with a computer and they get a stack trace in their code, how do you then go and subdivide the problem? That, I think, is the biggest thing. But then also just thinking systematically, understanding mass balances: what are my inputs? What are my outputs? Where can I see things happening? And then how can I break the problem down even further? It’s engineering, it’s problem solving. It’s taking what you know, and maybe learning some new things, to solve interesting problems.
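
The "is the problem before or after this step?" approach Taylor describes is essentially a bisection. As a minimal, hypothetical sketch (the step names and the `is_good_after` check are illustrative assumptions, not anything from the episode), and assuming that once a pipeline's output goes bad it stays bad at every later step, it could look like this in Python:

```python
from typing import Callable, Sequence


def first_bad_step(steps: Sequence[str], is_good_after: Callable[[int], bool]) -> int:
    """Return the index of the first step whose output looks bad, or -1 if none.

    Assumption: once the output is bad after some step, it is also bad after
    every later step, so a binary search over the steps is valid.
    """
    if not steps or is_good_after(len(steps) - 1):
        return -1  # the final output is fine, nothing to track down
    lo, hi = 0, len(steps) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_good_after(mid):
            lo = mid + 1  # problem is after this step
        else:
            hi = mid  # problem is at or before this step
    return lo


# Hypothetical pipeline where the "join" step (index 2) corrupts the data.
steps = ["extract", "clean", "join", "aggregate", "load"]
culprit = first_bad_step(steps, lambda i: i < 2)
print(steps[culprit])
```

Each check halves the remaining search space, which is why the habit pays off most on long pipelines where inspecting every intermediate result would be expensive.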

Eric Dodds 07:02
Yeah, super interesting. I’m always fascinated by that, because you think about it, and I am way outside of my expertise, but free radicals, you know. When you think about chemical stuff, there’s behavior that’s extremely difficult to predict, even in controlled environments. And it’s like, oh, well, actually, a lot of those same attributes are true of all sorts of data as well. Super interesting. Okay. Well, tell us about Meltano. The Singer ecosystem, as you know, owes a huge amount to the work that you’ve invested in it. It’s growing, and that’s super exciting. When we talked to Douwe a while ago, that was a huge focus, and you were also looking at the ops layer as well. So tell us what’s been going on at Meltano over the last six months from a product perspective. And then why don’t you also tell our listeners the vision of the company, because it’s been a while?

Taylor Murphy 08:07
So Meltano really exists, I think, to bring a better way of working with data to the larger data ecosystem, and frankly, to make it more like software development. This product came out of GitLab, a lot of the founding team was from GitLab, and the DevOps principle is built into how we think about things. The idea behind Meltano really was that a data team should build a data platform and do their work in a way that’s modeled after software development. That meant, particularly in the GitLab framing, one tool that can kind of do it all. The big difference between GitLab and Meltano is that GitLab is all first-party stuff, and Meltano has a lot of third-party software that you can integrate with it. We’ve gone through a couple of refocusing moments in the company. When Douwe took over the project in 2020, he really focused on the open source ELT side and saw a lot of traction with that. As we spun it out, we wanted to focus on this larger vision of becoming the foundation for any team’s ideal data stack. And what that meant is, how do we work with the rest of the ecosystem? We’re doing a really good job of making the Singer ecosystem better, enabling you to run taps and targets smoothly and orchestrate them well, but there’s this whole other ecosystem of tooling that can be hard to fit into the different parts of your stack. And so when we spun out, we started moving towards this larger vision of, okay, Meltano can be the foundation. You can bring in Airflow, you can bring in different tools, Superset, Metabase, anything really that’s open source and either has a container or is Python and pip-installable, and we made specific product choices to make that happen. We introduced a new command to allow you to run composable pipelines; it’s meltano run. So you can chain together your tap, your target, dbt, Great Expectations, and trigger some further downstream jobs.
We’ve also enhanced things around the Singer ecosystem. So it’s not just a tap and a target. You can also intercept data in between, with what’s called a stream map, and filter data, anonymize it, drop data, do whatever you need to do, and that gives you that level of control. And so, we still very much believe in that larger vision. As we go to conferences and talk to people, they get really excited about this idea, like, oh, yeah, data ops, platform infrastructure. It’s exciting. They understand, eventually, why people need it. But we also recognized it wasn’t meeting people where they were today. We were maybe a little bit further along than where a lot of folks in the industry actually are. Most people are like, yeah, this is really cool, I would love to be able to do this, but I’m still struggling with my extract and load, just pure data movement problems. So what we’ve been doing in the past few months really is refocusing, doubling, tripling down on the ELT side of the story: beefing up the SDK for writing taps and targets, and enhancing functionality within Meltano specifically around ELT to be a fantastic solution for that. But all the pieces are there for this larger story, and I’m excited for us to get to the point where we can earn the right to continue investing in that, because I think we as a company still believe very much in that mission.
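
To make the tap, stream map, and target idea concrete, here is a minimal Python sketch. It is not Meltano's or the Singer SDK's actual code: the `tap`, `stream_map`, and `target` functions and the sample `users` data are hypothetical, and the only thing borrowed from Singer is the general convention of passing SCHEMA, RECORD, and STATE messages as one JSON object per line.

```python
import hashlib
import json
from typing import Dict, Iterable, Iterator, List


def tap() -> Iterator[str]:
    """Hypothetical tap: emits Singer-style messages, one JSON object per line."""
    yield json.dumps({
        "type": "SCHEMA",
        "stream": "users",
        "schema": {"properties": {"id": {"type": "integer"},
                                  "email": {"type": "string"},
                                  "is_test": {"type": "boolean"}}},
        "key_properties": ["id"],
    })
    rows = [
        {"id": 1, "email": "ada@example.com", "is_test": False},
        {"id": 2, "email": "qa-bot@example.com", "is_test": True},
        {"id": 3, "email": "grace@example.com", "is_test": False},
    ]
    for row in rows:
        yield json.dumps({"type": "RECORD", "stream": "users", "record": row})
    yield json.dumps({"type": "STATE", "value": {"users": {"last_id": 3}}})


def stream_map(lines: Iterable[str]) -> Iterator[str]:
    """Hypothetical in-between step: drops test rows and hashes the email field."""
    for line in lines:
        message = json.loads(line)
        if message["type"] == "RECORD":
            record = message["record"]
            if record.get("is_test"):
                continue  # filter: drop the whole record
            # anonymize: replace the raw email with a SHA-256 digest
            record["email"] = hashlib.sha256(record["email"].encode()).hexdigest()
        yield json.dumps(message)  # SCHEMA and STATE pass through unchanged


def target(lines: Iterable[str]) -> List[Dict]:
    """Hypothetical target: collects records in memory instead of loading a warehouse."""
    loaded = []
    for line in lines:
        message = json.loads(line)
        if message["type"] == "RECORD":
            loaded.append(message["record"])
    return loaded


loaded = target(stream_map(tap()))
print([row["id"] for row in loaded])
```

Because every stage only reads and writes lines of JSON, the middle step can be added or removed without touching the tap or the target, which is the kind of composability described above.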

Eric Dodds 11:20
Yeah, super interesting. Kostas and I were just talking about Coalesce; Kostas wasn’t able to join us there. But one of my big takeaways was that as advanced as all the technology is, and you walk around the vendor booths and there’s some amazing stuff out there, when you talk to the practitioners who are doing this work on the ground, a huge number are still trying to solve the fundamental challenges. And so that really resonates, because I think it’s easy, you know, when you work for a data vendor and you’re building out product and all that sort of stuff. It’s way easier for us to look into the future, because that’s part of our job, than for our customers, who, while they’re certainly doing that, actually have a lot of pain points that they need to solve as part of their job today. And a lot of those problems are basic. Okay, so I have a question for you on that. With all of this advanced technology, why do you think the problems are still basic for a huge proportion of the practitioners and companies out there?

Taylor Murphy 12:35
Yeah, I love this question, because I think it gets to an industry-wide challenge. And I think this will change over time, as more data practitioners come up through the ranks of different organizations. My hypothesis, and what I’ve seen in several places and with folks I’ve talked to, is that data isn’t a strong consideration from early in the company’s lifecycle or its overall genesis, or maybe it’s a really old company and they’ve gone through a lot of change. When data is an afterthought, or seen as something where it’s just, oh, we can pay for this, we can invest X amount of dollars and we’re gonna get some return with our data, I think it really does a disservice to the people on the teams that have to implement this kind of work. For me, data has to be foundational to how you think about running a modern business, particularly tech businesses. Anything you’re doing in a company is generating some form of data, and you need to have that data lens. Not to get too highfalutin here, but one of the reasons I really fell in love with data engineering and chose the infrastructure and the hardcore, low-level data pieces is that I felt it was so foundational to functioning and to a lot of these problems that we want to solve. One, it’s great career stability; people are always gonna have data problems. But two, I just saw that you can’t do all these fun data sciency things unless you have a solid foundation of good data engineering best practices and workflows. So part of it, I think, is that there are people who maybe don’t understand what the current state-of-the-art capabilities are with data and how to use it to better operationalize all parts of their business. But that’s changing as people come up through organizations and get a little bit of power. They’re a head of data at a new company, and they can effect that change.
But people are just at different stages of this journey of learning: hey, I enjoy building charts, but now I need to learn a bit more about software engineering and how some of this works. So it’s a maturing practice, with professionals that are getting more skills and more influence across different industries every day. That’s kind of my answer.

Eric Dodds 14:47
That’s super helpful, and you really got to the root of it. Whether it’s a newer company that is just trying to figure out, say, product-market fit or basic growth, where it’s really easy to de-emphasize data, or a legacy enterprise that has been around for a long time and is trying to become more data driven, those are different sides of the same coin. And that is so hard, right? You have to have the entire company committed to something that you work really hard at and that, early on, actually doesn’t bear a lot of day-to-day fruit. It just seems like extra work. But you’re investing for the future, and that takes a huge amount of commitment and foresight from a company.

Taylor Murphy 15:39
Yeah. And I think there’s a parallel in software engineering: are you investing in a really good engineering culture that works well with your product team and can deliver, bring back insights, and have a positive feedback loop? It’s not a one-time thing where you put in some resources and you get something out. It’s really cross-functional, both on engineering and on data, and there are just so many similarities, I think, between data teams and software engineering teams. It’s that investment, that positive flywheel across the entire organization. And I think in the early days, for a lot of companies it’s a bit of a leap of faith if they haven’t seen it in practice. I’m hopeful that we’ll have more people who are true believers, in a positive sense, who are informed by data and their experience, and who are able to articulate why it’s valuable to invest in data and in these processes.

Kostas Pardalis 16:28
And to build that flywheel.

Eric Dodds 16:32
Yep, I love it. All right, Kostas. I could keep going. But please, please jump in. I know you have so many questions.

Kostas Pardalis 16:41
So first of all, I’m super excited that I have someone from the product side, because I can ask, you know, some really hard questions. For example, why should someone choose Meltano today instead of something like Fivetran or Airbyte or Stitch Data? What’s so much better about Meltano compared to the other solutions out there?

Taylor Murphy 17:12
Yeah, our focus right now is on a very particular persona. So if you are a data engineer, or very data engineer adjacent, who is comfortable on the command line, isn’t afraid of a Python stack trace, and wants that control over your software, that’s when Meltano is gonna be a really good choice for you today. We’ve seen that gap in the market where there are good point-and-click solutions for day-one situations to move your data. When we’ve been talking to a lot of users, and hopefully potential customers as we build out our managed offering, the pain points that we’re hearing are: cost is rising, and I don’t have a good sense of why or how I could even improve it; and there are problems that crop up that I can’t fix, and I’m stuck in some sort of support hell, as it were. What we’re aiming to do is give users control back over their data platform, but in a way that we are still able to help them solve problems when something goes wrong. And something will go wrong. I think that’s something that other companies don’t necessarily like to admit. It’s like, oh, we solved this problem, data is moved, don’t worry about it, point and click and you’re good. Something’s gonna change. Something about the system outside of your control is going to change, and you have to be able to adapt to it and respond to it. So Meltano is going to be a good choice for you when you want to understand the code that’s running in your system, whether it’s the tap or the target, or even dbt, and have that transparency. We’ve also built software development best practices into the product. So there are YAML files that define your configuration and the state of your system, and if you’ve worked with software engineers, they’re going to be begging for tools like that, because they understand the value of version control. So that’s a long-winded answer.
The day-one experience of Meltano is continually improving, but Meltano is really going to excel today for the day-two problems that you’re going to encounter when something is changing, and you need to adjust your system and you want to test it and move forward with confidence.

Kostas Pardalis 19:10
Good. Yeah. Makes total sense. I’d love to discuss more later about the developer experience and why it’s so different. But why do you think that Fivetran or Airbyte or Stitch Data didn’t go after an experience that is, let’s say, more native to developers? Because in the end, it’s not like Fivetran or Airbyte is used by someone else inside the organization. The pipelines are the core of the work that a data engineer is doing, and they have to deal with these tools. Right? So why didn’t they do that?

Taylor Murphy 19:53
I’m curious about that as well. I have a couple of hypotheses around that. One is that we have the advantage of coming into the market a bit later, where these companies are a bit more established. And, you know, previously it had been data analysts that had been doing a lot of this work. I think data engineering is still a relatively new title. I don’t think data engineers are ever gonna be called the sexiest job of the 21st century. And as I do more product and have these pseudo sales conversations and talk to users, it’s very easy to get pulled into the idea of, oh, okay, you’re facing this problem, we’ll just build this UI for you, and you can point and click and the problems will kind of be solved. But you’re talking to the buyer, not necessarily the user, all the time. The advantage that Meltano has had in the market is that for three, almost four years now, it’s been completely open source and free to use, and it has been able to organically attract this audience of data engineers. And as we talk to them, you know, they’re the ones implementing these products. And yeah, they want the convenience of not having to worry about things, but when they do have to worry about it, they really need to solve some of these problems. And so when we talk to people who are paying customers of Fivetran or Stitch, they’re like, yeah, it works for some of these things, but I would really like X, Y, and Z. And I think there’s a place for Meltano to come in and give them a lot of that control back, and hopefully be a better experience on which they can build the foundation of their entire stack.

Kostas Pardalis 21:30
Yeah, it makes total sense. But, I mean, Meltano is still trying to build a SaaS, right? It’s a self-serve solution that you host for your customers. So you still have to take care of the infrastructure and the issues there; you need to run the operations around the technology itself. Of course, someone can do that on their own and use the open source version of it. But in the end, someone is going to pay for something that’s hosted by you. So there’s a similarity with something like Fivetran, or even Airbyte. I’m saying even Airbyte because they also have an open source version, but in the end that’s also how they make money: you go to their hosted version and you pay for it, right? So things will go wrong for you. Salesforce at some point will say, no, we’re not going to reply to your request, and suddenly the pipeline breaks. So what is different in the experience that needs to be embedded in a cloud-hosted product to make it, let’s say, much more convenient or native as an experience for a developer compared to, let’s say, a data analyst?

Taylor Murphy 23:00
So a couple of thoughts there. We are doubling down on the command line interface as the primary interface, at least initially, for a managed offering, which is what we’re talking about with our early alpha users. And full transparency, we’re in the process of building this. We’re pre-alpha, but we have some folks lined up that are excited to use it. They’re comfortable using the command line interface to interact with the product. There will be an API as well, if they need to orchestrate things themselves. And the UI will come eventually, at some point, because we’re just going to need some form of UI to check basic things, and not everybody always wants to go to the command line to check things. But in terms of getting your work done, it’s going to come from the command line interface, primarily. The other piece is transparency around what’s happening within the managed platform. Most likely, we will at least have a source-available version of what we’re actually running to manage it. The code itself will be proprietary, but you can actually see, here’s the code. A lot of this is informed, I think, by our GitLab history. GitLab has a free open source version, and then everything else is their Enterprise Edition, but you can see all the code and you can actually make contributions if you want. And I think that’s a really exciting model, because there are certain groups of people that will be able to say, hey, I want you to go ahead and manage it, but I’m also smart and I can figure these things out. If I can help you quickly figure out a bug, it’s gonna help me get my support ticket figured out faster. That’s the second aspect. And then the third aspect is, hey, here’s the actual code that’s running for your tap and your target.
If, for whatever reason, you need to fork, you know, tap-snowflake or target-postgres or whatever it happens to be, you can fork that and still run that fork on Meltano, and then we can work with you to merge it back into the main branch of whatever connector we at Meltano are managing ourselves. That allows users to quickly solve their own problems, because there are a lot of downstream components that rely on data engineering, instead of saying, hey, there’s a problem with Fivetran, and it’s out of my hands. Some folks may want that, because it does kind of shield them from whatever political pressure they may feel inside. But there are folks who are like, this is mission critical, and I don’t really care to worry about the deployment of the stuff, but I do like to know what code is actually running. And if it’s Python, and it’s built on our SDK, it would be pretty quick to change it. So those are the paths that we’re threading of what makes a better developer experience and invites people into how we’re building this product and business.

Kostas Pardalis 25:36
That’s super interesting. So let’s start with the CLI experience. Why do you think the CLI is so important for a developer, more important than, let’s say, a graphical user interface?

Taylor Murphy 25:53
It definitely speaks to a different audience, and definitely a different persona. When you’re on the command line, it is very utilitarian. I think there are fun things that you can do to make the user experience more enjoyable. But generally, if it’s a well-designed command line, there’s nothing getting in your way of getting the job done.

Kostas Pardalis 26:17
It speaks...

Taylor Murphy 26:18
I think it communicates, hopefully, to people that we’re here to get the job done and get out of your way. And that’s why I fell in love with dbt as a product. At GitLab, I never used dbt Cloud; I only ever used dbt Core, from the command line, and it was just a very comfortable interface. It also works with all of these other tools that you have on the command line, in Bash, built off of the Unix philosophy of piping things together. And so I think it just speaks to that audience. And also, as I’ve learned more and more over my career about software engineering, it’s like, oh, if you have a good API back end, you can build whatever UI you want, but you can also build this command line. It’s quicker, you can iterate faster, and it’s less work than building this whole UI. So it enables us to move and iterate faster, and it invites people in, again, to contribute if they have ideas. Some of our features and flags and default commands were contributed by the community, because someone said, hey, I need to be able to add this to my project, but I don’t want to install it. Cool. We took a PR for that, to have a no-install option. Now it’s available to everybody. That’s how I think about it.

Kostas Pardalis 27:28
Yeah, that’s super, super interesting. So how do you approach it from a product perspective? I mean, there has been so much work done, and research and processes, around user experience: how to run A/B tests to figure out what’s the right color there, you know, all the stuff that we know about building, let’s say, a graphical experience for the user. But what about the CLI? How do you figure out what’s a good experience? How do you design a CLI?

Taylor Murphy 28:04
I think we’re trying to figure that out. There is definitely prior art that we can lean upon. For me, personally, I was a data engineer prior to this, and this is my first true product role, so there’s a bit of learning on the job. But the benefit of the way I think we’re building Meltano is that it is open source, and that’s a great way to get that feedback. Talking to people is one of the best ways that I’ve found to just figure this stuff out. My takeaway from doing product and talking to other product managers is that the more you can talk to your users, the better off the product will probably be, because you’re integrating all of that information. We also invite people in. We’ll usually have specs around, hey, this is what we’re thinking for this specific functionality, whether it’s a new command, and what are the subcommands? What is the structure? We also have fantastic engineers who bring their software engineering skills and say, hey, this is what I would recommend, what do you think of this? And me going, okay, yeah, here’s the problem we’re trying to solve, here’s the overall ergonomics. So yeah, it’s small iterations, and then doing it in a way that’s not fully irreversible, in case we need to roll something back.

Kostas Pardalis 29:19
Yeah, I love that. I hope one day you write a blog post or something about the experience of building a CLI, because I truly believe there’s a need. I think there’s a lot of experience among people who have built stuff out there, but I don’t think that, from the perspective of the product discipline, we have codified this information in a way that people can go and find the best information out there. So if you ever do, please let me know. I’d love to read something like that. It’s super interesting, and it’s something I like to read a lot about: how we can define developer experience, and how we can build tools around it.

Taylor Murphy 30:06
Doing a relatively new job, I think you learn about all the things you don’t actually know. So I literally just started reading The Design of Everyday Things. I can’t remember the author’s name, but I’m excited to dive into design more broadly and bring everything to bear, because a lot of what I brought to the product job is that, at one point, I was in the target persona. And now I get to talk to a ton of people who are in our target persona and understand where my experience is different from theirs. That’s what has made this really enjoyable: I get to help build a product that is solving problems that I experienced personally in the past and that a lot of people are experiencing today.

Kostas Pardalis 30:47
Yeah, that’s the fun part. Being product is the ultimate comparison. And that’s fun, but we’ll discuss that another time. Today, let’s stay positive. All right, so, okay. I think we have a good idea of how the experience of working with Meltano is different. One of the very interesting problems when it comes to ETL and ELT solutions, one that has engineering, product, and business consequences depending on what kind of path you follow, is the connectors, right? In the end, without the connectors, there is no way to pull data from somewhere and load it into a data lake or somewhere else. And there’s a lot of discussion about the long tail of connectors out there, and there are some very important connectors out there. How do you deal with that? For example, I was browsing the website, and I saw the lengthy comparison with Fivetran and Airbyte. You claim that you support 300-plus connectors, compared to, I don’t know, 150 or 200-plus for the others. What does this mean? How do you adapt in a situation where you have 300 connectors to maintain? Well, I walked out and I was like, why do we need those who’re coming?

Taylor Murphy 32:17
Yeah, so that number comes from what we call Meltano Hub, where we’re listing all of these connectors. And to be super clear, this is our understanding of the larger Singer ecosystem. When Meltano was started, Singer was already a project, initially supported by Stitch, now Talend. And when we say there are 350-plus connectors for Meltano, there are at least, you know, 300 connectors that we’ve found in the wider community that other people have made that conform to the Singer specification. And that’s where the power comes in with these long-tail connectors: you can write a connector, and as long as it meets the Singer spec in terms of the data that’s being output from the tap, it can be accepted by any target. For the longest time, we took a somewhat hands-off approach to the maintenance of the connectors themselves and said, okay, we’re going to address some of these problems around transparency, around testing, around building new ones, but we’re not taking on the burden of maintaining these as first-party connectors. That has actually shifted. We’ve now taken some on, and we’re starting with a lot of the database taps and targets. But it really is a decentralized, open source community where people say, hey, I have this connector, I’m going to build this tap, and it solves my problems. Maybe it solves yours, or you might have to fork the code. In an effort to be more competitive with some of these other tools, we are, like I said, taking over the maintenance of the database taps and targets. And they are built on top of the Meltano Singer SDK, which is really a lot of people’s first introduction to Meltano. They’re like, oh, I need to build this custom connector for whatever reason, whether it’s some weird API or they just want to pull some data internally, and for whatever reason they couldn’t find one. People find us a lot through the SDK.
And so we are investing heavily in improving the SDK. We recently brought in a batch message type. One key part of the Singer spec is that every record is output on standard out in a newline-delimited JSON format that says, like, record, and here’s the data. That’s good, especially when you’re coming from an API, but for database sources in particular, that can obviously be very slow. So this batch message is basically a pointer to a file, where we’ll say, hey, we’re going to extract all the data and write it to a URL, the batch message gets sent to the target, and the target knows where to go pick up that file. And we’re seeing, you know, a 30 to 90x data flow improvement doing that. So it basically means there’s an active community. But I think that’s one of the differences, too. If you look at Fivetran, they maintain everything and you can’t see the code, so they’re going to be limited in the long tail they can support. Airbyte is in a better place than Fivetran because they are open source. They are currently in a monorepo, and so everything kind of has to be in their main repo. And I don’t want to completely misspeak, but I don’t know that you can run forks of connectors within the main Airbyte platform. Whereas we’re just saying, it’s good to have a decentralized system, and that’s where Meltano Hub comes in, to show just how active the community really is. It can be really hard to tell if something on the ground is sick or dead when you just go into a Slack channel, because a lot of what you don’t see is people just using it day to day, pushing gigabytes of data through these connectors. It’s not as transparent. And so that’s what we’ve really tried to address with some of the features that we’ve brought into the market.
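The difference between per-record messages and a batch message can be sketched in a few lines of Python. This is a minimal illustration of the idea described above, not the Meltano SDK itself: the function names `record_lines`, `batch_line`, and `load` are hypothetical, the `RECORD` fields follow the Singer message shape, and the `BATCH` field names (`encoding`, `manifest`) and the use of plain file paths in the manifest are assumptions made for the example.

```python
import gzip
import json
import tempfile
from pathlib import Path


def record_lines(stream, rows):
    """Yield one Singer-style RECORD message per row, each as a JSON line."""
    for row in rows:
        yield json.dumps({"type": "RECORD", "stream": stream, "record": row})


def batch_line(stream, rows, directory):
    """Write all rows to one gzipped JSONL file and return a single BATCH line.

    The message carries only a pointer to the file, not the rows themselves.
    Field names and the plain-path manifest are illustrative assumptions.
    """
    path = Path(directory) / f"{stream}.jsonl.gz"
    with gzip.open(path, "wt", encoding="utf-8") as handle:
        for row in rows:
            handle.write(json.dumps(row) + "\n")
    return json.dumps({
        "type": "BATCH",
        "stream": stream,
        "encoding": {"format": "jsonl", "compression": "gzip"},
        "manifest": [str(path)],
    })


def load(lines):
    """Toy target: collect rows from RECORD lines and from BATCH file pointers."""
    rows = []
    for line in lines:
        message = json.loads(line)
        if message["type"] == "RECORD":
            rows.append(message["record"])
        elif message["type"] == "BATCH":
            for file_path in message["manifest"]:
                with gzip.open(file_path, "rt", encoding="utf-8") as handle:
                    rows.extend(json.loads(raw) for raw in handle)
    return rows


if __name__ == "__main__":
    users = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]
    per_record = list(record_lines("users", users))
    with tempfile.TemporaryDirectory() as tmp:
        batched = [batch_line("users", users, tmp)]
        # Both paths deliver the same rows to the target.
        assert load(per_record) == load(batched) == users
    print(len(per_record), "RECORD lines vs", len(batched), "BATCH line")
```

The per-record path sends one line through the pipe for every row, while the batch path sends one line per file no matter how many rows the file holds, which is the reason given above for the speedup on database sources.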

Kostas Pardalis 35:50
Okay, that’s super interesting. So how do you balance quantity and quality of connectors? Because I’m pretty sure that if you talked to Fivetran, they would tell you, yeah, we have everything closed, but the quality of our connectors is super high. Whereas you allow everyone to go and contribute out there, which is the complete opposite: anyone can do whatever they want with the code that they contribute. So how do you balance that? How can the coordinator of this decentralized hub for creating connectors help ensure the quality of these connectors? Because in the end, it is important, right? If I’m a new user and I see five different implementations of a connector for Salesforce, which one do I choose, and why? And what if something goes wrong? Is it Meltano’s problem, or is it the contributor’s problem? And if the contributor does not reply, you know, you have all these standard open source issues that you have to deal with. So how do you do that at Meltano?

Taylor Murphy 37:07
I think that, frankly, we’re going to figure that out. It’s absolutely going to be based on the SDK. What we’re seeing with that is we’re getting a lot of good contributions. People will discover weird quirks about a particular API that they’re working with, they’ll implement the fix in their connector, and that improvement comes into the SDK. So likely Meltano is not going to offer support for connectors that weren’t built on the Meltano SDK. But as it makes sense, say a lot of our users are using Facebook or Google Ads and a lot of the marketing-ops-type data sources, if they’re built on the SDK, I think we will absolutely start to take on the maintenance of those. Because with that solid foundation, one improvement for a particular connector can spread out across all of them. I think the other balance is recognizing that people have different quality and stability needs. Some folks are fine with a community tap that maybe isn’t fully tested, but they can just try it out and see what happens. One of the things that I haven’t mentioned about Meltano is that it has this native understanding of, and built-in features around, environments. So if you have a staging table, or if you want to write locally to DuckDB, you can test out the quality and the capabilities of different tools, particularly taps and targets, in a safe manner. And then if you like what you see, you can just run that in production and override a certain configuration, and Meltano makes that easy. That’s kind of the software development principle of having testing and continuous integration and things defined in code: you can have a safe space to test things. So I think, for us, as we actually build that managed platform and start to onboard customers, we’ll have these conversations around, well, what are the data sources that you want? And we’ll kind of go from there.
But the thing that’s interesting is, a lot of these connectors actually work really well for the majority of people’s use cases. It’s only when you start to really push the boundaries hard on data volumes that it starts to maybe be challenging for some particular data teams. So I’m just excited to have those conversations and see what we need to do. But it’s absolutely going to be based on the SDK.
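The environments idea, where the same pipeline definition runs against a safe local destination first and production only overrides specific configuration, can be illustrated with a small Python sketch. This is not how Meltano implements environments; `BASE`, `ENVIRONMENTS`, `merge`, `resolve`, and every plugin name and setting below are hypothetical and chosen only to show layered overrides.

```python
from copy import deepcopy

# Shared pipeline definition; all names and settings are made up for illustration.
BASE = {
    "tap": {"name": "tap-example", "start_date": "2022-01-01"},
    "target": {"name": "target-local", "database": "local.duckdb"},
}

# Each environment lists only the settings that differ from BASE.
ENVIRONMENTS = {
    "dev": {},
    "prod": {"target": {"name": "target-warehouse", "database": "analytics"}},
}


def merge(base, override):
    """Return a copy of base with override applied recursively."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def resolve(environment):
    """Build the effective configuration for one named environment."""
    return merge(BASE, ENVIRONMENTS[environment])


if __name__ == "__main__":
    for name in ENVIRONMENTS:
        print(name, resolve(name)["target"])
```

Because the tap settings live only in the shared definition, a connector that behaved well against the local destination runs with the same extraction settings in production; only the destination changes.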

Eric Dodds 39:21
I have a question for both of you, because both of you have such deep experience in this world. If you need something, let’s say, modified or custom that isn’t offered out of the box by a black-box SaaS provider, à la Fivetran or whatever, one of the challenges I think a lot of companies run into is, okay, we’ll run these core pipelines in a Fivetran, in the user interface, and set it and forget it. But then you go beyond that, and you build something custom, or even use open source technology to manage something custom. So now you’re managing the same basic data flow across very different ecosystems. It’s basically the same process, but orchestration becomes hard. There are a number of challenges there. One thing that’s interesting to me, just hearing you talk, Taylor, is that, okay, you have, let’s say, supported connectors, or taps, that are core or whatever. But if I need to develop something custom, I’m not actually going to a completely different ecosystem. That’s fairly compelling. Is that part of the thesis? And Kostas, does that make sense to you, having built similar technology?

Taylor Murphy 40:48
Yeah, so I would say that’s absolutely part of the thesis. You’re quickly able to solve your problem, fork the code, and run it. As long as it conforms to the Singer spec, and I’m sure we’ll have some guardrails around validating that it outputs Singer data, you should be able to run that with the managed Meltano platform, because you could run it with self-hosted Meltano. That way, you aren’t forced to either go buy another SaaS tool that happens to randomly do this, or just stand up some random Python script. We can help you have those best practices while quickly solving your problems. And then once it’s up and running, you can, kind of behind the scenes, incrementally bring it back into the fold of the well-maintained, mature data process, and you don’t have to reach for these other tools.

Kostas Pardalis 41:36
Yeah. For me, what is very interesting here is what Taylor was saying about the developer experience. If you want to define developer experience, you have two very important interfaces, right? One is the CLI, and the other one is the SDK. There is a reason that developers need access to both of them. Okay, we can talk a little about that. But having access to an SDK that you can use to modify the behavior of the system in a predictable and safe way is super important when we are talking about something that is consumed and used as a system by a developer. Now, obviously, a developer would prefer to have the connector there, working, right? Nobody likes to write that themselves. But that’s why you’re an engineer: because there are edge cases and issues that you have to take care of. That’s why you’re in the company. And you want to be able to extend the behavior of the system that you’re working with. I think a very big difference between developer experience and user experience is that user experience comes with strong guardrails, right? What you can do in the user interface is defined by the visual components that are there, with predefined behavior. When you’re talking about developers, you also need to give them the tools to extend or change the behavior of the system, right? So it makes total sense when you’re working with this persona. Now, we can debate whether this persona is the best persona for this problem, which is moving data around. My opinion is that it is, but someone else, like Fivetran, might have a different opinion. And that’s fair, right? That’s where we’re competing out there. But yeah, for me, it’s a very interesting approach to solving the problem.
Because traditionally, a big problem with these platforms was that, okay, this is an open set of connectors, so how do you maintain that? That’s not scalable. You cannot have an organization with an army of developers maintaining every single connector for every API out there. And by the way, it’s super hard to find people who want to do that job. Anyone who has tried to hire developers to maintain connectors knows how hard it is to do that, right? So building this developer experience is, I think, the response to how you come to a scalable solution to the problem of moving data around. So

Taylor Murphy 44:29
Yeah, I think the point that really stuck out to me in what you were saying was being modular and being able to extend it. That’s definitely how we’ve built Meltano generally. Recently, and this is moving away from the Singer side a little bit, we’ve made a new effort. Out of the box with Meltano, you can run dbt and you can run Airflow, and that’s been pretty consistent for a while now. But now we’ve developed what we’re calling the EDK, an extension developer kit, which is basically solving the problem of: I want to change how Airflow or even dbt is integrated with Meltano. Previously, it took a lot of effort to understand both the code and the Meltano code base, and then whatever other weird repos we might have had for how dbt gets installed or Airflow gets installed, and also the Airflow DAG generator that we had. The EDK comes in to basically give you a single repo with a similar developer experience to the SDK, to make it easy to add new components that run well in Meltano. We’ve rebuilt these extensions, and they’re in kind of a preview mode and probably won’t be GA for a while, but they’re on the hub: dbt with all of the adapters that we have, Airflow, Superset. And we have community contributions around Dagster, Elementary, and a couple of other tools that are built with the EDK. It’s basically the wrapper around how the tool interfaces with Meltano. I’m really excited because it paves the way for the future, for this larger data ops platform that we’ve talked about and hinted at. And with our managed offering, you’ll be able to run dbt in the cloud as well. It’s not just for the EL side of things, even though that’s what we’re focused on. So that’s all in an effort to make your data stack more composable, with a really good developer experience.

Kostas Pardalis 46:20
That’s super, super exciting. Okay, I’m going to stop asking questions about developer experience and connectors, because we could continue doing that for days. I have one last question, and then I’ll give the mic back to Eric. You mentioned a number of additional tools out there beyond the ETL and ELT connectors. So there is this new concept of data ops, right? And I would assume that is the context of data ops, which also includes orchestration and quality and modeling and all that stuff. So I want to ask you: what does data ops look like for Meltano, and how does it relate to Meltano as a product?

Taylor Murphy 47:09
Yeah. So data ops. I really give a lot of credit to the folks from DataKitchen, because they have their DataOps Manifesto, which I’ve looked at a number of times across my career and which, frankly, I think does a fairly good job of describing the philosophy. Of the items that are listed, I think they have like 18 or something like that, a lot recognize that the data ops term is really about processes around people. A small part of data ops is a technological solution. But the problem that data ops as a term addresses is about recognizing that a lot of data problems are people problems, that there’s a technological component to it, and that there’s a way of working that enables you to achieve the outcomes you want faster, more stably, with a higher level of quality, and, frankly, in a way that’s maybe more enjoyable. I think the reductive way of talking about data ops is that, oh, it’s just DevOps for data. That doesn’t fully recognize that there are stark differences in working with data, particularly around orchestration and managing state, and that things like CI/CD are great but can be way more challenging when you’re working with a Snowflake database or with multiple terabytes of data. So for me, data ops is, simply, a bit of a marketing term for talking about a way to work better as data professionals, recognizing that building your data platform and building your data practice is a lot more akin to software engineering than it is to maybe another discipline. For Meltano specifically, I think we really lean into that software engineering side of things, of building your data platform like it was a software engineering product.
And I think that manifests in how the features of the product look and how people experience them through the YAML files and the command line interface.

Kostas Pardalis 49:13
But yeah, I think

Taylor Murphy 49:15
In a lot of conversations I’ve had with folks, they’ve heard about data ops and they get excited, but again, it comes back to: what problems are you experiencing? And for us, there are better ways of working. And we believe a lot of those are working more like software engineers than working like another type of tech worker.

Kostas Pardalis 49:29
Yeah. That’s great. I think we should try to do an episode about data ops and just get some awesome people. Yeah, and you should be part of it. We should do that. I think it’s very interesting when we have new terms entering the industry, and being able to clarify, to make it more clear what this thing is, right? Because that’s the problem you see, and that’s, by the way, a problem that is caused a lot by marketing. The terms themselves, okay, they have their own meaning. Whenever a new term arises, I think there is a reason for that. But marketing really aggressively capitalizes on that and uses it as a way to communicate something. And many times problems arise from that. I’ve seen it a lot with concepts like data mesh, for example, right? If you read, in the end, what data mesh is, what you’re reading makes sense, right? But you have, in some cases, bad marketing happening around it that really destroys the semantics behind it, the ones that get communicated to people and that help the industry in the end, right? So I feel like if we can have these conversations with people who are experienced and have a very well-grounded approach, and again, I’m not going against marketing here, right, but just trying to describe reality, I think it’s going to be very beneficial for the people who are listening to the show.

Taylor Murphy 51:19
Putting our product hat on, I think it’s about just focusing on the problems that people are having, and recognizing that data mesh, data ops, and data contracts are tools that are trying to solve problems. And I like just being honest that a tool is not going to magically solve your problem. There’s always going to be some sort of people aspect that you have to deal with. But I do believe that technology can enable better ways of working. So I don’t know that the conversation we would have would be the full definition of, this is what data ops is, forever and always. But inviting people in to understand, these are the problems we’re trying to solve, and this is how this came about, I think would be very beneficial.

Kostas Pardalis 51:56
Yeah, let’s do that.

Eric Dodds 51:57
I love it. Well, we’re at the buzzer. I have several more things to discuss, but we’re going to have to do another episode. I will say right here at the end, though, this episode has confirmed my theory, Kostas, which I opined to you about in a recent Shop Talk episode, about logic moving further and further down the stack. And I think the CLI is the best example of that, right? It’s going lower and lower. So it’s been very validating in terms of that theory about business logic being expressed as code. So thank you, Taylor, for validating one of my wild theories. And congrats on all the work you’ve done at Meltano. What an ecosystem, I mean, amazing contributions. And best of luck as you continue to build.

Taylor Murphy 52:49
Thank you so much for having me on, I really enjoyed the conversation. And glad I could confirm your hypothesis around the industry.

Eric Dodds 52:57
What a fascinating product. I’ll have to say my big takeaway is, and you don’t hear this very often, that Meltano, as a company, has a huge vision for being a data ops layer for the stack. But they really listened to their customers and went back to the main pain point that their customers had, which is actually on the pipeline side of things. I just think that takes a lot of courage as a company, to say, you know, we have this grand vision of what we set out to build, but we’re probably too early for that, and so we’re going to listen to our customers and go back to those components of the product and make them better, so that we can better serve those customers. I was just really impressed by that. I think that’s such a refreshing thing to hear. It doesn’t sound as cool as, you know, we’re breaking new ground with a data ops layer, which they actually are doing. But they’re also just making a lot of things way better about their core product and the core problem they solve, based on what they’re hearing from customers, and so I just really appreciated that.

Kostas Pardalis 54:18
Yeah, 100%. I think what you just described is, let’s say, proof of the quality of the people who run both the business and the product at the company. That’s not easy to achieve, and I think we should congratulate them on that, right? And I think you can also see how valuable it is to have someone leading your product function who comes with very deep knowledge and understanding of the problem space. It’s awesome that this is happening here, because Taylor was a practitioner; he was dealing with these things. So he can see with the eyes of a user, and he can build something that iterates and converges on the solution multiples faster compared to other products out there. So, yeah, that was super refreshing and super encouraging. It was lovely to chat with him and hear all the opinions and knowledge that he has on how to build a product that is going to be successful in the long term, instead of trying to capitalize on the hype of the day, which is great.

Eric Dodds 55:40
Yep, I love it. Well, if you enjoyed that, many more great episodes and guests are coming. Subscribe if you haven’t, and we’ll catch you on the next one.

We hope you enjoyed this episode of The Data Stack Show. Be sure to subscribe on your favorite podcast app to get notified about new episodes every week. We’d also love your feedback. You can email me, Eric Dodds, at eric@datastackshow.com. That’s E-R-I-C at datastackshow.com. The show is brought to you by RudderStack, the CDP for developers. Learn how to build a CDP on your data warehouse at RudderStack.com.