This week on The Data Stack Show, Eric and John welcome Taylor Murphy, Founder and CEO of Arch, a data analytics platform. Taylor discusses the evolution of Arch’s vision and delves into the challenges and innovations in the ETL (Extract, Transform, Load) space. The group also discusses the impact of AI on data analytics, the importance of modern data practices, and how AI can enhance data teams’ capabilities, emphasizing the necessity of human expertise alongside AI. The episode also explores the commoditization of data storage and compute costs, the bundling of data services, and so much more.
Highlights from this week’s conversation include:
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Eric Dodds 00:06
Welcome to The Data Stack Show.
John Wessel 00:07
The Data Stack Show is a podcast where we talk about the technical, business and human challenges involved in data
Eric Dodds 00:13
work. Join our casual conversations with innovators and data professionals to learn about new data technologies and how data teams are run at top companies. We’re back on The Data Stack Show with Taylor Murphy, the CEO of Arch. And Taylor, you have been on the show before, actually almost two years ago, as crazy as it sounds. So welcome back. Glad to have you back again.
Taylor Murphy 00:43
Yeah, thanks for having me.
Eric Dodds 00:45
All right. Well, give us almost two years, give us the quick flyby of what happened. So last time we talked, you were working on Meltano, and Arch is something new. So what happened?
Taylor Murphy 00:58
Since that time, I’ve actually taken over as CEO of the company and kind of expanded the vision of what Meltano, and now Arch Data, can do. We rebranded the company to Arch to signal, hey, we’re this larger platform and kind of the foundation of your business success. Arch today is an AI data analyst for business leaders. So we help bring the outcomes of good data teams, and what good data teams can provide, to more organizations. And we’re doing this with an AI-powered, all-in-one platform that comes with human experts in the loop.
John Wessel 01:31
So Taylor, before the show, we had a really fun time talking about the AI agent model and the different philosophies people have behind that. You guys have a really unique philosophy on that, so I’m excited to dig in on that. And then what else are you excited about digging in on?
Taylor Murphy 01:46
Yeah, I’m absolutely excited to talk about that, and I think more broadly, too, the reason AI is so compelling at this point in time is that we now have a good understanding of what excellent, modern data practices are, from the explosion of all these different tools. And we also understand that what businesses need, and what moves the needle in a business, is always going to be based on the metrics, how you measure your business, and what the processes are under that. And when you combine a good, modern data practice with a smart, intelligent AI, and a platform that can help combine those two, I think it does something really cool and magical.
Eric Dodds 02:22
Well, let’s dig in. Yeah,
John Wessel 02:24
let’s do it.
Eric Dodds 02:24
Taylor, so great to have you back after almost two years. Pretty crazy that we can say that now on the show, you know, to have a repeat guest. Man, there are so many things we can talk about, but out of the gate, just give us the overview of Arch. You rebranded Meltano, and we can talk about that journey a little bit, but I would just love to tantalize our listeners with a description of Arch. But then we’re going to talk about ETL first and make them wait a little bit to build anticipation.
John Wessel 02:55
There we go. I like that strategy.
Taylor Murphy 02:58
I love it. Yeah, thanks for having me on the show. Excited to be back here and talking with you guys. Yeah, so with Arch, we are really trying to push the boundaries of how data analysts, data professionals, and AI can collaborate to bring what I believe to be the outcomes of excellent data teams to more organizations. We’ve been on this journey with Meltano in finding product-market fit, and on this journey from just kind of pure EL to a larger end-to-end platform. And so where we are today with Arch is working with organizations to really up-level their entire data practice with our all-in-one platform, and then also scaling the efforts of existing analysts, or, if they don’t have anybody on staff, acting as the analytics team ourselves, with our AI capabilities, to, again, just bring these really good outcomes. Because I’m a data person at heart, and I know, when a well-run data team is functioning for an organization, what they can do for that business. And I just want to bring that to more and more organizations. So yeah, we’re focusing on being your AI data analyst with a full platform and team behind it.
Eric Dodds 04:01
Love it. So many questions there. But as I promised, we’re actually going to talk about ETL, or EL. And I’ve been dying to ask you this question since we got this recording on the books. You’ve been on a journey with Meltano: starting it within GitLab, spinning it out, and, you know, building products and selling them in the ETL and EL space. And so you probably have a really unique perspective on that space in general. And I think, you know, we were talking before this show a little bit, it’s easy to think of that as a solved problem, just because there are, you know, multiple big vendors and, you know, tons of people who write their own custom pipelines. It’s just so ubiquitous as a pipeline in any analytics workflow, you just go, yeah, of course it works. Is that true, based on your perspective?
Taylor Murphy 04:57
Yeah, it is and isn’t true. Like you mentioned, we’ve been on this long journey to figure out, you know, product-market fit for Arch and Meltano in the early days. And you know, when Meltano was spun out of GitLab, it was kind of towards the peak of the zero-interest-rate phenomenon. Lots of VC money, lots of funding, and we were certainly a beneficiary of that environment. What we’ve seen, in the organizations that we talk to: there are larger organizations where moving data is a core piece of what a lot of large data teams are doing. And for them, there are always going to be new challenges in terms of scale, data variety, complexity. And I’d say, for them, that is just the hard part of software and data, and they’re going to continue to evolve there. For a different segment of the market, and it’s one that we’ve been pulled to, generally, data movement is kind of solved for the most common applications, whether it’s Stripe or Salesforce or HubSpot. That, basically, is a solved problem. The more meta conversation around that, though, and what I’ve kind of come to believe and realize, is that for the vast majority of products, when you’re trying to sell to folks, they just don’t care. I’ll just be honest, they just don’t care about the underlying data movement. They just want it to work. They just want to know that you can get their data from point A to point B and give them the value on top of that. And I think back in the pure modern data stack era, as all these tools were growing, there was this renewed focus on genuinely better ways to move data. And we saw some great innovation there, including what we can do with Meltano.
And people still love Meltano and use it today and reach for it. But the reality, in terms of the business and what it cares about, what’s visible to executives, and what is driving the needle on business outcomes: data movement is an essential part of that, and you do need to get it right, but it is so below the line in terms of what’s visible to anybody that has control of the purse strings. And so we’ve kind of evolved our strategy as a business to move to something that is much more visible throughout the organization, while still caring very deeply about how to solve these problems. And that’s why we didn’t rebrand the Meltano project. It’s still an open source project that people know and love. It is now built by the team at Arch, and it currently powers the data movement within the Arch platform.
John Wessel 07:18
So I have a lot of painful memories from my early ETL days, 10-plus years ago, so I’ll start with that. This is gonna be a good, healthy chance to work through some of those things and put some of them behind me. This is probably 10 years ago, mainly Microsoft ETL tools specifically. And I just think it’s interesting, because the Microsoft stack from, say, 10 years ago was pretty ubiquitous. Almost every company had that. I mean, you know, IBM had some options, and there were a few other options, but that was one of the most popular tool sets. And there was never a data movement tool that you could buy separately, I don’t think. I think it was always packaged with a storage layer, a reporting slash analytics layer, and then the data movement. So they were packaged together. We unbundled them in the last five, seven years, however many years it’s been, and it’s interesting to watch a lot of them pretty much get re-bundled with two of those three, or maybe all three of them. So I just think it’s an interesting observation.
Taylor Murphy 08:25
Yeah, the bundling, unbundling journey is, I think, very common, and it’s a natural part of the cycle. And depending on where you sit, I’m sure if you go talk to George at Fivetran, he’ll have his own opinions on this as well. I think the reality is, when we talk to businesses, people are more cost conscious nowadays, and that pushes you towards all-in-one solutions. I think what’s different about this time, though, is the ubiquity of open source availability, and that kind of solves for that long tail. And that’s why we’re huge fans of open source. Interestingly, though, now there’s the new world of AI, and I don’t know if you’re ready to transition to that, but these AI models now can actually help you take advantage of all this unstructured data. And that’s kind of the burgeoning interest: you throw a PDF at it, or, you know, you can literally scan something, fax it somewhere, and then get that data structured. That is super interesting, and something we’re addressing as well with our platform.
Eric Dodds 09:17
Yeah, yeah. One more comment. I mean, so much to talk about on the AI side, but one comment, and then one more question on the ETL side. I’ll never forget, actually, very early when I joined RudderStack, we had a customer who was just running a bunch of really basic event stream pipelines, right? Just getting data from, you know, an SDK into a data store, right? And they just needed to move a bunch of data. They had given us a bunch of good feedback, hey, we love this. And so I’d gotten on a call with them, just to, you know, connect with the customer and learn more. And I asked them, well, what do you love about RudderStack? And for a second they’re like, I don’t ever think about it. Right? Which is a little bit jarring. But when you said people don’t care, like, they do care, but not in the way that, you know, you would care about a product whose interface you’re in every single day because it’s a huge part of your job. Which is interesting. So that really resonates with me. But one other thing you mentioned was around, you know, sort of EL becoming part of a platform. Do you think we’ll increasingly see that? Because Fivetran is sort of the big player in the room, right? Like, as far as just being their own thing. But in some ways, they’re kind of building out or purchasing additional functionality to actually be more platform-esque, as opposed to just a pipeline.
Taylor Murphy 10:45
I think everyone is going to be compressed in interesting ways. And I think this still ties to the AI story: if you believe what people are saying about the future of AI generally, it basically is going to compress the cost of intelligence, and then also, I think, the cost of computation generally, and the value of software. And we’re seeing it: you know, you can move petabytes of data using just open source tooling. You can write it in an open data format. You maybe pay for some, like, egress costs or storage costs on S3, but you can do things that you could never do before, so inexpensively. And these players that are charging rents on different things that are basically going to come down in cost overall, they’re going to be in a tough spot. And so you have to figure out how to compete and be more valuable to organizations. And that’s part of the same journey that we’ve been on. You know, when I look at an organization now: what is most valuable to people? It’s usually what is most visible and what is helping them achieve their outcomes. And, to use a rough analogy, it’s like you go to a car dealership and you don’t really care what fuel it uses or even how the fuel gets there. You just want to know, okay, do I have enough energy? Can I go buy fuel for this thing? You care about the car and the features and the value that it’s getting you. And so, you know, EL is some component of the car that people care about only in that they can kind of check a box there, but the people who really care about it are figuring out ways to make it more efficient and all that fun stuff. So, yeah, I think we’re entering kind of a whole new world, and where you sit in the market kind of depends on your maturity as an organization.
John Wessel 12:26
Yeah, yeah. I think another thing, speaking of the cost: I had this really interesting conversation with a non-technical CEO in the last week or two, and we were just talking about compute cost, and it’s budgeting season, right? So everybody’s trying to get their IT budget. And I think it’s interesting, where there are basically three major public cloud providers, right, that almost all of this tech runs on. So there’s this, like, leveling of cost. And I was explaining to him, like, there are a lot of reasons to spend more than this, but as a baseline, your storage is, like, 15 cents a gigabyte, whether that’s Azure or S3 or any of them, and there are different tiers, right? Sometimes it’s lower. And then your compute is around $2 an hour, right? Like, that’s kind of a rough starting point. And when you see that you’re paying multiples above that, you could be paying for redundancy. You could be paying for, you know, more advanced use cases, or maybe you have thousands of users. But just having a starting point, I think for him, was like, oh. So, you know, I’m looking at a million or several-million-dollar IT budget. Like, how do we start breaking this apart? Well, how much data are we storing? Well, it’s only a few terabytes of data. Okay, if that’s the case, then maybe you’re overpaying for storage, or maybe you’re overpaying for compute because you’re using a system that has you kind of locked in, or maybe this or that or the other. But that’s kind of that first-principles approach.
Taylor Murphy 13:58
I think that has helped people think through: there’s this commodity layer, so we can kind of reverse into how much this could cost. Yeah, and I love, you know, there have been arguments from some of the MotherDuck folks and the DuckDB folks around how powerful computers are nowadays. Like, I have an M1 Mac, and looking at some of the M4s that are going to come out, the processing power there is insane. And so, like, you know, if we have an organization of 1,000 people and we buy them all Macs, that’s a huge amount of compute that’s just sitting there 98% of the time, running Chrome. Exactly, yeah. And Electron apps, like Chrome, take every ounce of it. But it does just make you rethink, from that first-principles thinking, that this stuff doesn’t have to be expensive. The raw actions that we’re taking to move data, store it, transform it, and ask questions of it are incredibly cheap nowadays. And so when you’re paying for a vendor, you just have to consider: what are you actually paying for? And, you know, there are a lot of things there to pay for, certainly: collaboration aspects, security model, SLAs, things like that. Not discounting those; those are important. But you also have to look at the raw numbers of, does it make sense? Has it ever made sense to pay for, you know, monthly active rows or something? Not to bash on Fivetran too much, but it just puts downward pressure on pricing in a good way. And so I feel like we’re in a good position, for me and for Arch as a company, to rethink things from those first principles. Like, oh, if all of this is basically free, then what is the value of a data platform, of a data team, and how do we position ourselves and go to market in that way?
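The first-principles budgeting exercise John and Taylor describe can be sketched in a few lines. The rates below are placeholder assumptions for illustration only, not quotes from any provider's actual price list; check current pricing pages for real numbers.

```python
# Illustrative back-of-the-envelope cloud cost baseline.
# Rates are assumed placeholders, not real provider prices.
STORAGE_PER_GB_MONTH = 0.023   # assumed object-storage rate, USD per GB-month
COMPUTE_PER_HOUR = 2.00        # assumed on-demand VM rate, USD per hour

def baseline_monthly_cost(data_tb: float, compute_hours: float) -> float:
    """First-principles floor: raw storage plus raw compute, nothing else."""
    storage = data_tb * 1024 * STORAGE_PER_GB_MONTH
    compute = compute_hours * COMPUTE_PER_HOUR
    return storage + compute

def markup(vendor_bill: float, data_tb: float, compute_hours: float) -> float:
    """How many multiples above the commodity floor a vendor bill sits."""
    return vendor_bill / baseline_monthly_cost(data_tb, compute_hours)

# A few terabytes and a modest compute budget, as in the conversation:
floor = baseline_monthly_cost(data_tb=3, compute_hours=200)
print(f"commodity floor: ${floor:,.2f}/month")
print(f"markup on a $5k bill: {markup(5000, 3, 200):.1f}x")
```

The point of the sketch is the comparison, not the exact rates: once you know the commodity floor, any bill at a large multiple of it prompts the "what am I actually paying for?" conversation.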
Eric Dodds 15:38
Yeah, yeah, I love it. Okay, our listeners have been waiting, so we’re gonna talk about AI. Look how much time we left for AI! I know sometimes we’ll get, like, 50 minutes in and we won’t even have brought it up, so I’m really proud of myself, because we could keep talking about, you know, our philosophy on the data industry and ETL and EL for a long time. Okay, John, I’m gonna give you the first word here, because I usually jump right in, but you had some really great questions about this when we were chatting before the show. So, Arch and AI.
John Wessel 16:09
Let’s start with that worldview, because we talked about this, and you had such a good and interesting take on it. So I’ve seen two or three models in this AI agent space. A lot of people have seen what ChatGPT has done, what you can do with data, where you use it kind of as a data analyst. But there’s this worldview, one I’ll call, like, the zero-shot worldview, where people just want to do text-to-SQL. They just want to, like, throw something in the magic box and then get exactly what they want out. There’s that one. And there’s the other one starting from raw data: we’re just gonna throw raw data at it and get something magic out. That one, you know, I don’t think has gone well for very many people. The other one is, we want to very carefully craft and model the data, and put metrics layers in, and give tons of context so the AI can make better decisions. I’ve seen some success with that model. But it sounds like, not that you’re not pulling the best off of all the models, but it sounds like you guys have kind of a third view on this. So I’d love to hear more about that.
Taylor Murphy 17:12
Yeah, lots to dive into on this topic. So generally, I am bullish on AI and the opportunity it’s gonna bring. I think I tend to be more on the optimistic side with some of that. I’ll be honest, though, like, a lot of the newer models, each time they get better, it does generate that anxiety of, oh, are they coming for our jobs? But any automation probably will have a net benefit at some point in society. So how we’re thinking about AI is that, in the most optimistic sense of what it can do, it can bring the outcomes of good data teams to more organizations. I see it as this force multiplier, but not in a way where it’s just, like, ChatGPT on steroids, where you’re constantly having to shove things into the context window. If you want an AI analyst, and that’s how we’re pitching ourselves, for it to actually do well, it’s going to have to do a lot of the same things that a human analyst would do. It’s going to have to ask follow-up questions. It’s going to have to get context from the business. And it’s still going to have to do the fundamental aspects of what a data professional has to do. Somebody’s going to have to move the data to a central place or make it available for querying. There needs to be that cleaning step on top of it, and then there needs to be the structuring, the mapping of real business processes to what the data is saying. One of the phrases I like, that I’ve heard recently, is that data is the shadow of a business process. And so part of modeling the data is, you know, revealing the form of the thing that is casting that shadow. And AI still is going to have to do all those things; it can just start to do them much quicker.
And when you can do things quicker and more cheaply, it opens up a world of who has access to these things. And that’s been our strategy currently: by focusing on the AI analyst aspect, which has access to a full data platform, so it can build its own ETL, you know, write dbt code for transformations, and also has the assistance and collaboration of human experts in the loop, then that kind of democratizes, and I kind of hate to use that word, but it opens the door to more people benefiting from this great data intelligence.
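The metrics-anchored approach Taylor describes, as opposed to zero-shot text-to-SQL, can be sketched roughly like this. The metric names, synonyms, and SQL below are invented for illustration; a real metrics layer would be far richer, and the follow-up behavior would come from the model rather than a keyword match.

```python
# Toy sketch of the "metrics layer" worldview: questions resolve to
# predefined, vetted metric definitions instead of freely generated SQL.
# All metric names and SQL here are invented for illustration.
METRICS = {
    "monthly_recurring_revenue": {
        "sql": "SELECT date_trunc('month', d) AS month, SUM(mrr) FROM subs GROUP BY 1",
        "owner": "finance",
        "synonyms": {"mrr", "recurring revenue", "monthly recurring revenue"},
    },
    "qualified_leads": {
        "sql": "SELECT date_trunc('week', created) AS week, COUNT(*) FROM leads WHERE qualified GROUP BY 1",
        "owner": "marketing",
        "synonyms": {"leads", "mqls", "qualified leads"},
    },
}

def resolve(question: str):
    """Map a natural-language question to a vetted metric, or ask a follow-up."""
    q = question.lower()
    hits = [name for name, m in METRICS.items()
            if any(s in q for s in m["synonyms"])]
    if len(hits) == 1:
        return METRICS[hits[0]]["sql"]
    # No confident match: behave like an analyst and ask a follow-up question
    return f"Which metric do you mean? Options: {sorted(METRICS)}"

print(resolve("how did our MRR trend last quarter?"))
print(resolve("why are sales down?"))  # ambiguous, so it asks a follow-up
```

The design choice this illustrates is the one the conversation keeps returning to: ambiguous questions trigger a clarifying exchange rather than a confidently wrong answer.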
John Wessel 19:18
So you touched on this, and I think it’s really interesting, because I think it’s completely overlooked by a lot of people: a lot of the analyst role is doing, like, collaborative or even data collection stuff. So I’m even imagining, right now, with most of these AI analysts, you’re just talking with it about the data, right? But what if you had one of these agents that could go ask somebody a question from the business? Like, hey, does this look right? Hey, I need your sales target number for October. I think that’s a really interesting thing here. It’s easy to get so focused on, well, the analyst’s job is to analyze the data. Like, well, sure, but a lot of it is a data collection component, and there’s that human-in-the-loop component of asking the right people the right questions to either validate data or, a lot of times, collect data.
Taylor Murphy 20:10
I’ll jump in there and say, the pushing of data experts back out into the different aspects of the business, I think, is one trend that we’re going to see. Large organizations will still have, like, a centralized center of excellence for data and analytics, but what matters at the end of the day are the outcomes of the business. Are you growing revenue, increasing leads and prospects, decreasing costs? That’s where data teams need to be focused. And I think for a while we got so wrapped up in the tooling, and how we’re structuring data teams, and data as a product, and all this stuff, and it started to get a little disconnected from how businesses actually work. What are the levers to control your growth, as expressed by the different metrics? And so we fundamentally see our approach, and our strategy when we’re talking to prospects, is like, hey, if you don’t have a data professional, we’ll be that professional. You’ll have access to that AI, and then you have kind of a human in the loop there, even though all the conversations are still going to be kind of funneled through this AI chat interface. But if you already have someone on staff, we scale them. One of our customers that we talk about a lot on the website has an analyst on staff already, and we are 10x-ing his output. This is for a private equity firm. They have 20-plus portfolio companies, and he’s essentially the data team for 21 companies, the portcos plus the actual PE firm itself. And so something has to scale there. And he’s reached for Arch to scale the data backend (20 of the same pipelines, a bunch of dbt transformations), and now we’re working with them on the AI analyst interface. Like, what if you had an interface for the CEO, the VP of sales, to just ask questions? Hey, can you give me these latest numbers? I actually think, with AI, we’ll see an increase in the number of ad hoc analyst requests. And if you have good, modeled data, go ham, right?
Like, ask all the questions you want, and then the analysts behind the scenes can be kind of optimizing the system. So I get excited about that, because I feel like, for a while, as data professionals, we fought against the tide of what people really want. And what people want is to ask questions. They want to get data. Sometimes the data makes them feel good. And data teams are like, well, you’ve got to push back and say, well, what do you want to measure with this data? What’s your decision going to be? And sometimes people just want to look at a chart and see it go up, and that’s okay, and AI can make that much easier.
Eric Dodds 22:27
Yeah, yeah, yeah. So one question I’d love to dig into there. There are probably listeners, and I’m even thinking about an occasional guest we have on the show named the Cynical Data Guy, whose gut response, hearing you say, okay, the CEO can ask the AI a question and sort of get numbers or whatever, is probably, oh my gosh, that’s so dangerous, right? That’s really scary. And I think part of that is because, and John, you’ve been on both sides of this, and you have too, Taylor, so keep me honest, but part of that is there’s this contextual thing: you almost need to wrap the numbers in context, right? Because they can be misinterpreted. Or, you know, it’s like, okay, well, if you don’t have some level of healthy control over the narrative, it can actually create more work for you, right? Because it’s like, what does this mean, or whatever. What do you think about that challenge at Arch?
Taylor Murphy 23:33
I think where I’ve been coming down on this, and where we kind of structure our thesis and how we talk to prospects, is: all of the questions you’re going to want to ask an analyst, or an AI analyst, need to be anchored in the metrics of the business. And I keep coming back to that, because it is the forcing function to help solve some of these problems. When you have a CEO asking, like, our sales are down for last month, what happened? That is an opportunity to have a back-and-forth with the CEO, and an AI can do that just as well, I think. To say, great, are you asking about metrics X, Y, and Z? Are you looking at churn? Are you looking at leads dropping down the marketing funnel? You can have this kind of back-and-forth conversation, and then actually have the AI say, hey, we’re working on measuring this, or we don’t have this metric, and so the explainability of why this top-level number went down doesn’t exist yet. This is tied in with, and it’s a little too nerdy on the side, statistical process control. And to anchor it more concretely for folks, Amazon did this with, like, a weekly business review, where they have hundreds of different metrics and they’re just looking at variation across all of these things. And so that’s our thesis: we help you build the model of the business based on the metrics that are measured and expressed in these different processes, and then that’s what the AI can really do. I’m with the Cynical Data Guy, though. Like, I think there’s a lot that you can be cynical about, but AI does give a new optimistic lens on this stuff, in a way that we haven’t seen before, that scales non-linearly, beyond individual humans.
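The statistical process control idea Taylor references, in the spirit of Amazon's weekly business review, can be sketched as an XmR individuals chart: flag only the weeks that fall outside a metric's natural process limits, instead of reacting to every wiggle. The weekly figures below are invented for illustration.

```python
# Minimal XmR (individuals) control chart sketch, pure standard library.
from statistics import mean

def xmr_limits(values):
    """Center line and upper/lower natural process limits for an XmR chart."""
    center = mean(values)
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    avg_mr = mean(moving_ranges)
    # 2.66 is the standard XmR constant for individuals charts
    return center, center + 2.66 * avg_mr, center - 2.66 * avg_mr

def signals(values):
    """Indices of weeks outside the limits, i.e. worth a conversation."""
    center, upper, lower = xmr_limits(values)
    return [i for i, v in enumerate(values) if v > upper or v < lower]

weekly_leads = [102, 98, 105, 101, 99, 103, 100, 145]  # last week spikes
print(signals(weekly_leads))  # only the spike at index 7 is flagged
```

Routine variation stays quiet; only a genuine shift in the process prompts the "what happened?" back-and-forth described above.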
Eric Dodds 25:10
I am also really interested in what you can do with the questions themselves, right? I mean, there’s a lot of debate over self-serve analytics. Is that even a thing? Is it even possible, right? Democratizing data and all these things, right? But, you know, no matter your opinions on that, if a bunch of people in the organization started asking a bunch of questions, you could actually see the lineage of those conversations. And it’s an AI bot, right? So one of its core strengths would be actually summarizing all of the questions that it’s getting, right? I mean, that in itself, I think, as an analyst, would be hugely helpful, right? Because that’s actually pretty hard to scale. Even though you may be able to understand nuance that an AI bot might not, you know, because you’re human and you went to the CEO’s pool party, you know, at Christmas, and had conversations that the AI wasn’t privy to. Even still, trying to collect intelligence around the questions that are being asked is really difficult within an organization.
Taylor Murphy 26:26
It’s interesting, because that is the promise of some of these, like, enterprise knowledge search tools, where it is like, you know, connect your Google Docs, connect your Notion, and it wants to be this decentralized brain for task automation and asking questions. But I do think all of these tools are disconnected from the metrics, the fundamental measurement of these different processes. And to your point, when someone asks a question, that question exists within the context of the rest of the business. And so, you know, the AI has infinite patience with this stuff, and can search a lot of information and, you know, appropriately summarize. Like, hey, actually, your VP of sales asked this similar question; here’s the answer that we gave them; it’s been updated since then. And what I want to build in our product, in particular, is: how do you really trust what the AI is saying? For that to be true, it needs to cite its sources, show its work, and be connected to the underlying systems. Not just, like, ChatGPT, where it gives you an answer and that’s the end of it. You’re going to want these things to be auditable. And, you know, good AI systems that are plugged in can do this at a very large scale. And again, that infinite patience point, I think, is important. Because, as you know, having been the sole data professional at a large organization for a while, you’re inundated with requests, and if you can have support in that, in a way that doesn’t require you to answer people yourself every single time, I think that’s huge. It’s a huge unlock.
John Wessel 27:47
Yeah, yeah. One thing you mentioned a minute ago, with Amazon and the weekly metrics reviews: I think that makes a ton of sense in their context, but how would you approach that in more of a mid-market company? Because there are two things I see with mid-market companies. One, a lot of them don’t really have metrics, honestly, other than, like, financial metrics. They typically understand, at least roughly, how much money they’re making, and some high-level things, you know, maybe, like, customers. Some of the more sophisticated ones know customer lifetime value, but for some of them, the financials are their metrics. So there’s that component of it. And then two, they do not have the luxury of a lot of statistically significant data, because at the Amazon scale, you can quickly get enough data to really point you in directions, if you’re A/B testing or doing whatever. So, say somebody wants to implement a full data stack for one of these mid-market companies that maybe has just the financial metrics. What would you say to them, when they also have the sparse data problem?
Taylor Murphy 28:55
So that’s a fun challenge, I think, and I approach it two ways. One is, if I was literally stepping into this organization, what would I actually do? You’d have to make some assumptions about the size of the business. The particular weekly business review, statistical process control approach is definitely for companies that are post product-market fit and have some scale and set processes going. But the whole idea with this is trying to get an understanding of the causal model of your business. When I do X, I expect Y to happen; when I increase ad spend, I expect leads to come in and revenue to jump up appropriately. So what a good analyst would do, or if you got hired as a head of data or CDO or whatever, is have conversations with the executives and the business units, like sales: what do you think is happening in this business? Where are you confused? What tools are you using? What are you trying to measure? It’s part of a larger conversation to figure out and hypothesize what’s going on. Some of it just falls back to: what do you think is happening? Cool, is there a way we can measure that? Awesome, let’s go measure it, then track it for a few weeks and see if it has any correlation, and if we think something else is missing, it’s another conversation to figure out the metric. One thing that Cedric from Commoncog talks about, and that’s referenced elsewhere, is that the data problem is actually less challenging than the social problem within the organization. It’s getting business leaders and business unit leaders to think in this way, to think about what the causal levers of their particular unit are, whether it’s marketing or sales, and how you measure that.

You know, the actual fundamentals of measuring a metric can sometimes be quite easy, and then it’s: do they have low enough ego to look at these numbers and actually respond in a way that moves the business forward? So it comes back to the classic, this is a human problem. There’s that meme of, all my data problems are human problems underneath, and that’s kind of the reality. That’s why I think there’s a world where we can have 100% AI data analysts, but there’s still going to need to be some human in the loop, somebody you can talk to to help drive that narrative forward. So, hopefully that answers your question.
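The statistical process control approach behind those weekly business reviews can be sketched with an XmR ("process behavior") chart: plot each week's value against natural process limits derived from the week-to-week variation, and only investigate points outside the limits. This is a minimal illustrative sketch; the weekly revenue numbers are invented, and 2.66 is the standard XmR scaling constant.

```python
# Minimal XmR (individuals and moving range) chart computation,
# the statistical process control technique used in weekly metrics
# reviews. The weekly_revenue values are made up for illustration.

def xmr_limits(values):
    """Return the center line and natural process limits for an XmR chart."""
    mean = sum(values) / len(values)
    # Average absolute difference between consecutive weeks
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    avg_mr = sum(moving_ranges) / len(moving_ranges)
    # 2.66 is the standard XmR constant (3 / d2 for subgroups of size 2)
    return mean, mean - 2.66 * avg_mr, mean + 2.66 * avg_mr

weekly_revenue = [102, 98, 105, 99, 101, 97, 104, 100]
center, lo, hi = xmr_limits(weekly_revenue)

# A point outside [lo, hi] signals a real change worth investigating,
# rather than routine week-to-week noise.
signals = [v for v in weekly_revenue if v < lo or v > hi]
```

The point of the chart is exactly the low-ego discipline Taylor describes: leaders react only to signals, not to every wiggle in the number.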
John Wessel 31:06
Yeah, I think so. And this got me thinking: there’s this other component where there’s no reason the AI can’t help with the metric creation process. And at least from an ML standpoint, ML is pretty good at helping with sparse data problems. In my past, I had this problem around pricing, and that’s a really good application if you have sparse data: using machine learning to fill in gaps. You can get to better recommendations for pricing, or for a lot of things, really. So maybe it’s using the tools for those things, and that data gets fed back into the tool. So I think there can be a positive move there.
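The sparse-data pricing idea can be sketched with a small regularized regression: fit a model on the handful of prices you do have, then use it to estimate prices you never observed. The features (size, quality score) and prices here are invented for illustration; a real version would pull them from the warehouse.

```python
# Sketch of filling sparse pricing data with a simple ridge regression,
# assuming only a handful of observed prices. All numbers are invented.
import numpy as np

# Observed items: [size, quality_score] -> price
X_known = np.array([[1.0, 3.0], [2.0, 5.0], [3.0, 4.0], [4.0, 8.0]])
y_known = np.array([10.0, 20.0, 25.0, 40.0])

# Ridge regularization keeps the fit stable when data is this sparse.
lam = 0.1
A = np.column_stack([X_known, np.ones(len(X_known))])  # add intercept column
w = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ y_known)

# Estimate a price for an item we never observed directly.
x_new = np.array([2.5, 6.0, 1.0])  # size, quality, intercept term
estimated_price = float(x_new @ w)
```

With more data, the model's own predictions can be compared against realized prices and fed back in, the positive loop John describes.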
Eric Dodds 31:51
One question I have on the metrics: I’d love to know, from a product standpoint, what does that look like in terms of the product experience, and then what does that look like under the hood? Because it makes complete sense, right? If you anchor this in metrics, that anchors the conversation, which means you create boundaries within which the AI is going to have a conversation with the CEO, which sort of solves part of that problem. But how do you define that? If I’m an Arch customer, how do I define those metrics? Do those eventually materialize in, like, the dbt semantic layer? Start from me, as a user, defining the metric, and then walk us down into the stack and tell us what’s going on under the hood.
Taylor Murphy 32:37
Yeah, absolutely. So the early days of the conversation, during onboarding, are about figuring out: what is the current state of the business, what do you want to understand, and what do you want to measure? Within the platform, there are ways to document the metrics themselves, and that is literally just a list of: here’s the metric, here’s the owner, here’s what the definition means in plain text. Those are connected to the semantic model behind the scenes. We use dbt Core under the hood for the transformations, and we’re using Cube, the open source version, for the semantic layer itself. Then on the AI conversational front, that stuff is put into the context of the chat interface. We’re also working closely with our design partners on this, to help them understand where this stuff starts to fall over and where they’re not seeing it. When you ask a general AI, help me come up with metrics for this part of the business, it can come up with some pretty good things. But there are also unique nuances to every single business that aren’t going to be captured by these general foundation models, and that’s where you need that sociological component of having someone to talk to to figure this stuff out. I always like to point to real examples of how people can think about this, too. I go back to GitLab: you go to the GitLab handbook and they list all of their KPIs, or at least the top KPIs for each department. For me, that is still kind of the gold standard of: here’s what we’re measuring, here’s the definition, and here’s the link to the history of that data.
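The "plain-text metric list connected to the semantic model" pattern Taylor describes can be sketched like this. The metric names, owners, and measure identifiers are hypothetical, and the query payload follows the general shape of Cube's REST API load queries; this is not Arch's actual schema.

```python
# Sketch of a plain-text metric registry, each entry mapped to a measure
# in the semantic layer. All names and identifiers are hypothetical.
from dataclasses import dataclass

@dataclass
class Metric:
    name: str              # human-readable name shown in the platform
    owner: str             # who is accountable for the definition
    definition: str        # plain-text meaning, readable by anyone (or an AI)
    semantic_measure: str  # measure name in the semantic layer (e.g. Cube)

registry = [
    Metric("Weekly Active Users", "Head of Product",
           "Distinct users with at least one session in the week",
           "users.weekly_active"),
    Metric("Net Revenue", "CFO",
           "Recognized revenue minus refunds, by week",
           "finance.net_revenue"),
]

def build_semantic_query(metric: Metric, granularity: str = "week") -> dict:
    """Build a query payload shaped like a Cube-style semantic layer query."""
    return {
        "measures": [metric.semantic_measure],
        "timeDimensions": [{"dimension": "time", "granularity": granularity}],
    }

payload = build_semantic_query(registry[1])
```

The plain-text definition is what goes into the chat context; the `semantic_measure` is what keeps the AI's answers anchored to one governed calculation.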
John Wessel 34:07
So another aspect of this metrics layer discussion that I think is interesting: there are several methods, like we’ve already talked about, the agent method versus a human in the loop. Tell us more about humans in the loop. I’m specifically interested in long-running processes, something we talked about before. If you and I were interacting, where I’m the customer and you’re the analyst, and I’m asking about something, it would be very normal for you to say, hey, I’ll get back to you, and then a couple hours later you come back with some kind of information. But with technology, that would maybe feel weird, right? Like, shouldn’t you just be able to get that for me instantly? So walk me through that, just from a user experience standpoint.
Taylor Murphy 34:53
So, this is something we are actively working on with design partners, and I’m really excited to kind of dive in and solve it for them, and more broadly, because you’re right, it’s not something we’ve seen in the larger market. I think the state of the art, for the most part, is OpenAI’s o1, where it shows that full chain-of-thought process: here’s what I’m doing, here’s what I’m thinking about. That is kind of a longer-running process; you’ve seen examples of it taking several minutes to answer a question. But then you hear folks like Sam Altman saying, oh, you’re going to have agents running for a month at a time, and he doesn’t talk about what that experience is like for the end user. To your point, am I just staring at a spinning wheel for a month? So when we take a step back and think about this, there needs to be some way, and it comes back to the classic data challenge of tracking state: what is the state of this overall process? The way we’re thinking about solving it initially is using what has been learned over the past decades of task management and managing humans, literally a kanban-style flow of different tasks. So what we’re working on with our AI is, it’ll say, okay, great, here’s the process that needs to happen. We need to connect this data; we’re going to bring it in and transform it. Here are the metrics you’ve defined, and here’s the report you’ve asked for. I’ve created these sets of tasks that are going to happen within the system, and here’s an upfront estimate: we expect this will take several hours for each of these different things. Then whoever requested the information can go off and do something else, and we’ll notify them when the results are ready.
And that can be via text, a ping on Teams, email, whatever you want. Then it goes into a classic mode where an AI agent can be churning on something and building code, or it can be a human in the loop actually doing some of the work, or it could be a specific task for that person to do: if they need to click a button to authenticate Google or Salesforce to actually allow the data to move, that’s a task for them to go do. So I think we’re seeing this convergence between how humans interact, where it can be over email and can take time, and how we expect computers to act, which is usually instantaneous, and we’re trying to bridge that gap. There’s an interesting example with Replit; I don’t know if you’ve paid for their new agent thing, where you just ask it for a program, and then it has a pretty long-running process, but it is more chain-of-thought: here’s what we’re doing now, creating the project plan, getting your approval, and then it shows all the work that it’s actually doing. I think that’s interesting, but you’re expected to sit there for five minutes while it churns on this stuff. I think we’re going to have a world where you can fire up a bunch of different requests, go do something else, and just be notified when it’s done. And I don’t think it’s going to be anything fancier than what a normal human would do: hey, I’m tracking this, I’ve put it in motion, or I’ve put it in Trello or whatever, and I’ll keep the status updated. We would expect computers to do the same thing.
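The long-running flow described above, tasks with explicit states, some assigned to an AI agent and some to a human, with a notification when everything completes, can be sketched as a tiny state machine. The states, task names, and estimates are illustrative, not Arch's actual implementation.

```python
# Sketch of a long-running task plan: each task moves through explicit
# states, and the requester is notified once everything is done.
# All names, states, and estimates are illustrative.
from enum import Enum

class Status(Enum):
    PENDING = "pending"
    RUNNING = "running"
    NEEDS_HUMAN = "needs_human"   # e.g. click to authenticate Salesforce
    DONE = "done"

class Task:
    def __init__(self, name, assignee, estimate_hours):
        self.name = name
        self.assignee = assignee          # "ai" or "human"
        self.estimate_hours = estimate_hours
        self.status = Status.PENDING

def run_plan(tasks, notify):
    """Advance each task through its states, then notify the requester."""
    for task in tasks:
        # Human-assigned tasks wait on a person; AI tasks start running.
        task.status = Status.NEEDS_HUMAN if task.assignee == "human" else Status.RUNNING
        # ...agent churns on the work, or the human completes their step...
        task.status = Status.DONE
    notify(f"All {len(tasks)} tasks complete, your report is ready.")

messages = []
plan = [
    Task("Connect Salesforce (authenticate)", "human", 0.1),
    Task("Ingest and transform data", "ai", 2.0),
    Task("Build requested report", "ai", 1.0),
]
run_plan(plan, messages.append)
```

The `notify` callback is where the text, Teams ping, or email from the conversation would plug in; the requester never has to watch a spinner.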
John Wessel 37:56
I mean, that’s a fascinating workflow. You can imagine, like you said, a Trello board, or a Linear board, or Jira, or whatever it is. And two or three things here: one, could you assign tickets or tasks to an AI agent, and then they come back and update the ticket? Potentially, yeah. Or eventually they could interact with each other. You could actually imagine a board where, like, this is in progress, and then some AI agent comments on the ticket and says, this step has been done, and moves it to another status for review. The human reviews it, and then it maybe goes off into another flow of getting assigned to another AI agent or a human, or whatever. That would be fascinating. It’s taking that data team workflow from a project management tool and then integrating AI into it.
Taylor Murphy 38:50
That’s typically how I think of it. I’ve worked remotely since 2018, and in the most cynical worldview, for some people, you’re only interacting with them via Slack or via a task management system. So as long as the outputs are something I can consume and collaborate on, does it matter if it’s a human or if it’s a really good AI agent? For certain businesses, they’re just not going to care one way or the other. Our approach right now is to have these longer-running tasks that an AI can churn on, or a human can actually do, and then we have that human-in-the-loop check of, hey, here are the PRs I was going to make for the code changes, and then we can stamp it and say, yep, that’s good, or no, actually go back and fix that, so that we can earn the trust to automate more and more of this for these organizations.
John Wessel 39:39
And that’s interesting too, right? Because depending on how far that goes, it basically elevates all humans to being some sort of manager: a project manager, or a manager of AI agents and humans, some combined team.
Taylor Murphy 39:57
I mean, I see that already. Our Head of Growth and Marketing, in one frame, he’s managing 10 different AI agents with these different tools, and he’s way more productive than he would have been five years ago. So I think that’s happening, and it’s going to happen for data professionals; I just feel incredibly confident about that. But what is the UX? What are the affordances? That’s what we’re trying to figure out right now.
John Wessel 40:20
Well, one question
Eric Dodds 40:21
to bring the conversation full circle, close to the end here. So you’ve learned an extreme amount about loading data, because Meltano is running under the hood as part of the platform, and you’ve probably seen a lot of transformations, just from the Meltano experience, that happen downstream of Meltano loading structured data. So you’re pulling Salesforce or HubSpot or whatever, and you’ve developed some level of expertise around: okay, here’s how these jobs run, here’s how the data is structured, here’s how the transformations generally run. Not that each business doesn’t have their own specific adjustments to the transformations that they want to run. How critical is that for building the platform that you’re building? Because one of the big challenges is this really long tail of, everyone’s data is a little bit weird, and schema changes and all that sort of stuff. But you actually have a huge running start from Meltano. Do you view that as a critical advantage?
Taylor Murphy 41:30
I do, and where it manifests is in a lot of the sales conversations. One of the big parts of my journey has been moving into a much more forward sales role, having conversations and, you know, trying to ask people for money. That’s where the rubber meets the road. One of the objections is, oh, well, can you get my data? And with Meltano, we’re able to say, absolutely, and if we don’t have a connector for it, we can build one incredibly quickly. To the earlier conversation, it checks that box for them, but we do it in a way that is authentic and has deep credibility, because we’ve done this for years; we have the open source platform. And then the new frontier for folks is the AI angle: hey, can you get my data in all this unstructured mess that I have? We’re bringing those capabilities onto Arch, and as for how that relates to the Meltano space, yeah, we can absolutely do that. And then there’s the opportunity to bring even more with the state of the art in AI models today.
Eric Dodds 42:30
That makes total sense. In the conversations that you’ve had, and I know you’re still early on the unstructured data side, what are the main types of data that people want to include as part of Arch as a data platform?
Taylor Murphy 42:47
I think we’re still trying to figure out a lot of that, because the businesses we talk to get excited by the, oh, we can get your data from anywhere, but the real meat of the data is in the systems they already have, the systems of record, or their CRM if they’re using one. It captures people’s imagination: oh yeah, all this data that’s lying around. Whether it’s actually useful or not in this metrics view of the world is very dependent on the individual organization. We’re talking to some folks that have old paper processes they’re working to digitize, but for them, that’s also part of a larger conversation about moving to different tooling and things like that. We keep the conversation focused on: what is happening in your business, how can you measure that, and how can we help you find the levers to pull with this AI analyst? But it does open the door to getting them to think more deeply about this, because part of the journey in sales is how much you have to educate your customers, your prospects, on what’s possible and where they can move things. And some of the stuff is just super cool: you upload a PDF, you get structured data out of it, you can chat with that data, and then you can put it into the process of defining some of these metrics and the downstream analytics.
Eric Dodds 43:58
One example, just from what we’ve done at RudderStack, and John and I actually worked on this project together, which is kind of fun: we’ve done a lot of transcript analysis. You can do the mining individually, which is super helpful, right? Okay, as a product manager, I want to go into all these customer calls and ask a bunch of questions, which is great. But you can also, for example, prep for a QBR by standardizing the questions that you ask of transcripts from a bunch of calls. Whatever it is, sentiment or otherwise, there are all sorts of things that you can truly standardize. Ironically, of course, a lot of those materialize as a custom field in the CRM, but it is interesting to think about that as part of the data platform, because you can do a lot more than just materialize a custom field. Stuff like that, I’m excited to see what you guys build with it, because it’s something that would take weeks of just brutal work, right? And it’s like, wow, you literally can just do it, and it’s so good.
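The QBR-prep idea, running one fixed question set against every call transcript so the answers line up across calls, can be sketched like this. The questions are examples, and `ask_model` is a stand-in for whatever LLM call you actually use, not a real API.

```python
# Sketch of standardized transcript analysis for QBR prep: the same
# question set runs over every call so results are comparable.
# ask_model is a placeholder for a real LLM call.

STANDARD_QUESTIONS = [
    "What is the overall sentiment of the customer?",
    "What product gaps or feature requests were mentioned?",
    "Were there any renewal or churn risk signals?",
]

def ask_model(question: str, transcript: str) -> str:
    # Placeholder: in practice this would call an LLM with the
    # transcript provided as context.
    return f"(model answer to: {question})"

def analyze_transcripts(transcripts: dict) -> dict:
    """Run the standard questions over every transcript, keyed by call ID."""
    return {
        call_id: {q: ask_model(q, text) for q in STANDARD_QUESTIONS}
        for call_id, text in transcripts.items()
    }

results = analyze_transcripts({"call-001": "…transcript text…"})
```

Because every call gets identical questions, the answers can be aggregated, or written back to the CRM as the "custom field" Eric mentions.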
Taylor Murphy 45:04
My first job out of grad school, we were scraping a lot of websites, and I wrote a ton of regular expressions to parse data from websites and from PDFs, and that would just be automated today. You just dump it in there, you get it, it’s accurate, you check it, and you’re off to the races. It’s amazing what you can do today.
John Wessel 45:21
I’m also fascinated by paper. We had a guest on, it’s been several months now, talking about an internship where he spent months and months looking through old records prior to a certain year and creating a manual database of, I think it was, economic events. Because looking through paper, I mean, it is feasible now, right, to scan this in and actually get the information.
Taylor Murphy 45:47
I think, one, I would love to work with organizations that have a ton of paper, because I think it’d actually be pretty easy. You set them up with a fax machine: just fax us a ton of stuff, it’s then digitized, and you pull a ton of information from there. And they all still have fax machines for sure, especially in healthcare. Yeah, healthcare for sure.
Eric Dodds 46:05
That is like the absolute best story of, okay, what’s an amazing use for AI, right? Fax to AI. Yeah, fax to AI.
Taylor Murphy 46:13
yes, absolutely.
Eric Dodds 46:15
That’s great, awesome. Taylor, well, we’re at the buzzer here, as we like to say. This has been an incredible conversation. Super excited about what you’re building at Arch, and what a journey. So keep us posted, and we’ll have you back on.
Taylor Murphy 46:28
Absolutely. Looking forward to it.
Eric Dodds 46:31
The Data Stack Show is brought to you by RudderStack, the warehouse native customer data platform. RudderStack is purpose built to help data teams turn customer data into competitive advantage. Learn more at rudderstack.com
Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
To keep up to date with our future episodes, subscribe to our podcast on Apple, Spotify, Google, or the player of your choice.
Get a monthly newsletter from The Data Stack Show team with a TL;DR of the previous month’s shows, a sneak peek at upcoming episodes, and curated links from Eric, John, & show guests. Follow on our Substack below.