Episode 105:

The Modern Data Stack Is Just Getting Started with Astasia Myers of Quiet Capital

September 21, 2022

This week on The Data Stack Show, Eric and Kostas chat with Astasia Myers, founding partner at Quiet Capital. During the episode, Astasia discusses the limit of solution complexity, early-stage acquisition, and data-centric ML.

Notes:

Highlights from this week’s conversation include:

  • Astasia’s background and career journey (3:03)
  • How Astasia evaluates data companies (5:25)
  • Defining “modern data stack” (8:39)
  • The limit of the complexity of a solution (18:44)
  • How risky early-stage acquisition really is (26:15)
  • Flashing headlight advice for investing (30:17)
  • Signs you should do a product integration (33:38)
  • The next data infrastructure opportunities (36:19)
  • The likelihood of two data worlds merging (43:55)
  • How important open source is (49:14)
  • Data-centric ML (53:47) 

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Transcription:

Eric Dodds 0:05
Welcome to The Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You’ll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, the CDP for developers. You can learn more at RudderStack.com.

Alright, Kostas, after recording four episodes this week, guess what happened? I don’t know if you can tell.

Kostas Pardalis 0:32
I think I can. Yeah, I love your voice. I think you are ready to go into the after-midnight radio program.

Eric Dodds 0:42
Oh yes. Yeah, turn this into a late show.

Kostas Pardalis 0:47
Yes, yes. For lonely souls out there.

Eric Dodds 0:50
That’s right.

Kostas Pardalis 0:52
Yeah, let’s do it.

Eric Dodds 0:53
Actually, having a sultry voice is a great introduction to this episode because it’s our first investor interview on the show, which is super interesting. So we have Astasia, who is now with Quiet Capital (she was at Redpoint for a long time), and she invests in data tooling and has actually made investments in a lot of the companies founded by people who have been on our show, which is super interesting. There are lots of fun connections there. I think one of my biggest questions is going to be around the way that she thinks about evaluating data tooling, right? Because if you think about an investor who focuses on data tools, every single day they’re looking at new technology, trying to understand it. And they have a very wide and deep view on the different ways that companies are trying to solve particular data problems. And so I know there are investment criteria, you know, on the business side, but they also look at the technical side of the product. And so I think it’d be helpful for me, and hopefully our listeners, to understand the framework that they use to sort of evaluate that, because they just spend so much time doing it. How about you?

Kostas Pardalis
Yeah. Too many questions in my mind, to be honest, but I’d love to hear her opinion about the modern data stack. What is it, why does it exist, and how does she think it will evolve? And of course, we shouldn’t lose the opportunity of asking her about what’s next and what’s out there. I mean, that’s what investors are best at, right? So, absolutely. Let’s do it.

Eric Dodds
Let’s do it.

Eric Dodds
Astasia, welcome to the show. We are so excited to chat with you.

Astasia Myers 2:49
Thanks so much for having me, Eric. It’s a pleasure to be here.

Eric Dodds 2:52
Okay, do you want to give us your background and tell us how you got into investing in data products?

Astasia Myers 2:59
Sure thing. So I have been an investor for nearly a decade now. I really started my career at Cisco on the M&A and venture investing team, and did a whole bunch of really fun investing in storage businesses back in the day, like Cohesity. I then transitioned to Redpoint Ventures to be on the early-stage team and was there for about four and a half years. And then more recently I joined Quiet Capital to be an enterprise partner leading the practice over here. My background is a specialty in solutions that sell into technical audiences. So of course, data and machine learning, cybersecurity, dev tools, and infrastructure. I’ve been really humbled to partner with businesses like Dremio and LaunchDarkly, Solo.io, Preset, Hex, Supabase, and Airbyte. So amazing founders you’ve even had on the show yourself. So it’s been a fun ride. And, you know, we’re always evolving things in data, and it’s brighter than ever before.

Eric Dodds 4:02
Awesome. And you’re the first investor on the show, actually, as a guest, which is super exciting. I couldn’t think of anyone better. And I know that personally, I’ve read a lot of your work, you know, sort of outlining new data technologies and your thinking. So you’ve been very helpful to me personally. So thank you for that. I’d love to start out with this: as part of your job, you evaluate data tooling all the time, right? I mean, maybe every day. And you have such a wide perspective on the market because of that, because you just see so many new types of technologies and, I think specifically, the different approaches that companies are taking to solving similar problems with data or creating new opportunities with data. I’d love for you to share with our listeners sort of the evaluation framework you use, because a lot of our listeners have to look for tooling to solve their own data problems a lot, but they haven’t invested as much time as you in sort of surveying the market and trying to understand data tooling. And so how do you evaluate companies and the way that they’re approaching solving particular data problems?

Astasia Myers 5:20
That’s such a great question, Eric. And I really think it’s important to clarify for the listeners that the way we think about a data tool could actually be different than how we think about an investment in a data company. When we think about it in terms of a company, we’re thinking about the team, the technology, the market space. When we’re thinking about a tool, it could have a company behind it, but it doesn’t necessarily have to. It could be a great open source project that provides a lot of value. With the tool, you know, there may not be a monetization opportunity for it, even if it is fantastic. So I just want to make sure that we clarify the difference for the audience today. In terms of how we think about a useful data tool, we think first about whether there are any existing offerings in this space, an open source project or a commercial offering, that are failing at some aspect of their core functionality or capability, and the magnitude to which they’re not providing the value they should be to users. And so that is the first framing: what exists today? What are the issues? The second thing that we really like to dig into is the criticality of this pain. Is it, you know, something that could be easily solved with people, where you can throw bodies at it? Is it something that would be best served by a software offering? And is it mission critical to the business that it is solved? We also look at the core IP, whether there are any patents against the technology that suggest it has differentiation that is not easily replicated by alternatives or, you know, open source projects that can offer it for free and have wonderful adoption but not be monetized. And my favorite thing to dig into with the product itself is how easy it is to use and implement for teams. I often find that there are really sophisticated data solutions out there, but they’re too complex to get up and running to demonstrate value.
So when I do diligence calls on technology, I literally ask them: How long does it take to implement? How long does it take to show value? What is the magnitude of the value you’re experiencing? And do you think it will be enduring? It’s always best if the tool can actually fit into a macro trend. Later, we’ll be talking about the role of the modern data stack and the movement to cloud. But that’s not a “must have”; it’s a “nice to have.”

Eric Dodds 8:09
Super interesting. That’s super helpful. Let’s actually just jump straight into the modern data stack because this is something you’ve written about a ton. And I would love for you to just— Can you define the modern data stack? In your own words for our listeners, because there are lots of sort of definitions and like, a million architectural diagrams out there that have like different flavors of this, but you’ve done some really influential thinking on this. Can you define it for us in your own words?

Astasia Myers 8:38
Sure thing. Yeah, there are so many different definitions. I feel like there’s the agnostic research analyst’s definition, and there could be a vendor definition that likes to highlight certain components based on what they’re offering. From my perspective, the modern data stack is an analytics stack that has its foundation around a cloud data warehouse. And what really separates a modern data stack from a legacy data stack is that it is hosted in the cloud and requires less technical configuration by the user to demonstrate value. It also often promotes end-user accessibility, so data democratization, and can cut costs, shorten the amount of implementation time and downtime because it is hosted, and then actually scale out as the data volumes grow over time. And so we often think about four core components of the modern data stack. There is the ingestion layer, with, you know, offerings like Fivetran and Airbyte. There is the core of the cloud-native data warehouse, either Snowflake, BigQuery, or Redshift. There is a transformation layer, and dbt is often used. And then there is the BI service, either Preset or Looker. And it’s really this reimagining. The big trend that emerged was moving from ETL to ELT. And that was catalyzed by people wanting to ingest more data from sources like Salesforce, Zendesk, and Stripe into one central service and then build consistent transformation models on top that they could push out to other services. Something that has been really exciting to see emerge just outside of these four core components is adjacencies that continue to add value, like operational analytics with reverse ETL solutions like Census and Hightouch, where we take data from the cloud data warehouse and not just use it for an analytics use case, but also for operational processes, moving it into, you know, Salesforce or Zendesk or MailChimp.
And it’s also cool to see that now we are looping machine learning engineers and data scientists into the modern data stack by enabling businesses to leverage that data to build internal or external models. And when I say external, I mean like a production-grade model that services a customer, versus an internal model that can be something around forecasting. And so I think this is just the beginning of the modern data stack. But it’s really cool to see, over the past two and a half to three years, the foundation really be set in place.
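
The ELT and reverse ETL flow described above can be sketched roughly as follows. This is an illustrative sketch only, not something shown in the episode: an in-memory SQLite database stands in for the cloud data warehouse, a plain dictionary stands in for an operational tool such as a CRM, and every table, column, and function name is hypothetical.

```python
import sqlite3

# Hypothetical raw records, standing in for rows pulled from SaaS sources.
RAW_INVOICES = [
    ("acme", 1200.0),
    ("acme", 300.0),
    ("globex", 50.0),
]


def extract_and_load(conn, rows):
    """EL: land the source rows in the warehouse without reshaping them."""
    conn.execute("CREATE TABLE raw_invoices (account TEXT, amount REAL)")
    conn.executemany("INSERT INTO raw_invoices VALUES (?, ?)", rows)


def transform(conn):
    """T: build a modeled table with SQL that runs inside the warehouse."""
    conn.execute(
        """
        CREATE TABLE account_billing AS
        SELECT account, SUM(amount) AS total_billed
        FROM raw_invoices
        GROUP BY account
        """
    )


def reverse_etl(conn, crm):
    """Reverse ETL: copy modeled rows out to an operational tool (a dict here)."""
    query = "SELECT account, total_billed FROM account_billing ORDER BY account"
    for account, total in conn.execute(query):
        crm[account] = {"total_billed": total}
    return crm


def run_pipeline(rows):
    """Run load, transform, and reverse ETL against a throwaway database."""
    conn = sqlite3.connect(":memory:")
    try:
        extract_and_load(conn, rows)
        transform(conn)
        return reverse_etl(conn, {})
    finally:
        conn.close()


if __name__ == "__main__":
    print(run_pipeline(RAW_INVOICES))
```

The ordering is the point of the sketch: data is loaded first and transformed afterward inside the warehouse, and the modeled result is then pushed back out for operational use rather than only being read by a dashboard.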

Eric Dodds 11:43
Super interesting. Two quick follow-up questions, or I guess a follow-up question about two additional components. So do you see sort of data observability or orchestration as sort of key components of this stack? Or are those sort of augmenting that core unit, sort of those four key parts that you outlined?

Astasia Myers 12:12
Totally, yeah, they’re very useful components of the stack as well. I think in, like, the traditional core modern data stack, there are four components. The reason that data orchestration solutions like Astronomer, Prefect, Jeeva, and open source Airflow become more important is the coordination of actions on data transformation over time. Data observability has a few different components to me. It could be pre-production observability around pipelines and lineage, and on the data itself, to prevent schema changes that could have downstream negative impacts and create breakage, or it could actually be looking at data distributions on the data warehouse to see if there’s any data drift over time. I think one of the reasons that data observability has become such an important and growing segment is because dashboards have become widespread throughout an organization. We often find that there are multiple BI solutions within one business, and the ratio of dashboard creators to dashboard viewers is one to 100. And so when you have all this information, which is distributed to make smart decisions, you want to make sure that data is correct for the viewers, and that you’re making decisions that can drive the business forward productively. And so you don’t want to have a dashboard with out-of-date data. You don’t want to be in a team meeting and get called out with, oh, I think this number looks wrong, I don’t know if you can actually make a decision today. And so data observability companies really stepped in to help make sure that the data is clean and correct, and the business can be more productive.
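
The two kinds of checks mentioned above, catching schema changes and watching for data drift, can be sketched in a few lines. This is a minimal sketch under stated assumptions, not how any particular observability vendor works: the function names are hypothetical, schemas are modeled as plain column-to-type dictionaries, and drift is approximated by a simple shift in the mean measured in baseline standard deviations.

```python
from statistics import mean, stdev


def schema_changes(expected, actual):
    """Compare two {column: type} mappings and report what differs."""
    shared = set(expected) & set(actual)
    return {
        "missing": sorted(set(expected) - set(actual)),
        "unexpected": sorted(set(actual) - set(expected)),
        "retyped": sorted(col for col in shared if expected[col] != actual[col]),
    }


def mean_shift(baseline, current):
    """Distance between the two means, in baseline standard deviations.

    The baseline needs at least two values for a standard deviation.
    """
    spread = stdev(baseline)
    gap = abs(mean(current) - mean(baseline))
    if spread == 0:
        return 0.0 if gap == 0 else float("inf")
    return gap / spread


def has_drifted(baseline, current, threshold=3.0):
    """Flag drift when the mean moves more than `threshold` deviations."""
    return mean_shift(baseline, current) > threshold


if __name__ == "__main__":
    expected = {"id": "int", "amount": "float"}
    actual = {"id": "int", "amount": "text", "note": "text"}
    print(schema_changes(expected, actual))
    print(has_drifted([10, 11, 9, 10, 10], [20, 21, 19]))
```

A production system would likely compare full distributions rather than a single mean, but the sketch shows the shape of the idea: a pre-production check on structure, and an ongoing check on the values themselves.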

Kostas Pardalis 14:16
So I have, like, a follow-up question too about the modern data stack. So modern data stack as a term has been around for a while, right? And, okay, things change really, really fast. So based on what you have experienced so far, have you seen some kind of evolution in the modern data stack? Like something notable that has changed since, I don’t know, it was first introduced as a term until today, like something that was added, something that was removed, or our understanding has changed, or some tools have matured, right? So how have you seen the modern data stack evolve in all this time?

Astasia Myers 14:59
Yeah, I think there are some cool new segments that have emerged. As I mentioned before, one of the newer ones is operational analytics and reverse ETL, the ability to push data from the data warehouse into third-party SaaS products to make smarter decisions. So you could have data in the warehouse that talks about customer billing, and then you may want to push it into Salesforce to try to do better account qualification, or email marketing campaigns with MailChimp. So that’s been really cool to see emerge. The second category that’s been pretty neat is the movement from batch workloads to real-time and streaming. Instead of using an ingest layer with longer time horizons for collecting and, you know, ingesting data, which could be on the order of an hour or a day, you’re actually seeing teams use streaming systems like Kafka or Redpanda for that layer, so that the data can be fed faster to the dashboards and updated more quickly. What’s really cool about that is we’re seeing new warehouses come to market that are supposed to support real-time analytics more effectively than the known incumbents, like Snowflake and the cloud data warehouses from the large cloud service providers. And so those would be like Pinot, and Druid, and ClickHouse. And so I think there’s a push in the market to get data faster to the end users to make decisions. It’s particularly prominent in operational teams. We kind of saw lots of blog posts coming out of Uber and Lyft over the years about the criticality of the data, that it needs to be identified and visualized within like 10 to 30 minutes for teams to make decisions. And it seems to be more popular now than ever before to move from batch to streaming. So that’s been pretty cool to see, too.

Kostas Pardalis 17:31
That’s cool. But, like, my feeling is that the modern data stack, I mean, as time flows, it gets more and more, not complicated necessarily, but we see more categories added to it. And obviously more vendors for each category, right? So now you have data engineering teams, or IT teams, I don’t know, whoever is responsible in the company to go and figure these things out and buy all these things. And they have all these choices in front of them, right? And probably as an investor, you have a better intuition about that. What’s the limit that the market has in terms of the complexity of a solution, right? Are we going to reach a point where the market will be like, okay, guys, that’s just too much, and it’s not easy, from an organizational standpoint, to maintain all this infrastructure, or even navigate this infrastructure?

Astasia Myers 18:40
Totally. Yeah, it’s a great point. Talking about it historically, there were businesses like Informatica that offered a suite of solutions across the data landscape. And so a customer could go to them and buy multiple different products that were integrated, that theoretically should function very well together. We’ve since broken down different components of the data stack into everything from warehouse, transformation, integration, metrics stores, observability, reverse ETL, and data orchestration. And so yes, I totally hear you, there has been a fragmentation over time. I can imagine that as things evolve, there will be a real consolidation, because of this exact take on the procurement side of the house: why do we need so many vendors that we’re managing, and can’t we just have one throat to choke for a lot of what we’re doing? A good friend of mine just ran a survey with, I think, over 500 data buyers, asking about 15 different categories of data spend and how it was going to evolve over the next five years. And when I saw that huge list, which even includes synthetic data, which we can throw in there, right, I was just thinking, God, it would be really hard to be a data leader these days. There are so many options to choose from across every category. Do I really need all these tools? As I said, I set the foundation as, like, the four core components that we see. Do I have to get all the other eight that the survey had? We’re seeing early indications that teams want to buy products that are integrated and serve many different functions. We see that with RudderStack’s approach: ingest, streaming, and reverse ETL. You can see that with businesses in the data observability space that are not just doing data validation and lineage and looking at the data drift and quality inside a warehouse, but are going to move up the stack to go into data catalogs.
You can also see it even with dbt, which started as a transformation layer and now offers a metrics server. Think about a metrics server as democratized LookML. So you can define a metric one time and serve it up to many different SaaS products, so there’s consistency across them and you can make smart decisions. So I personally think that we’re going to start to see consolidation at the layers above the warehouse. Because picking it off, hey, I spend $10,000 a year on reverse ETL, less than $10,000 a year on synthetic data, it doesn’t make sense to be billed by a whole bunch of different vendors for that. You probably just want one contract to deal with. And then you can get a discount for having more products that you’re using from that one vendor. And so I do expect consolidation going forward. And even, like, we’re starting to see acquisitions now. Yeah, right. The people at Airbyte just did a really smart acquisition of an open source team. You know, especially with the macro environment, I would expect over the next year to 18 months to see a lot of acquisitions emerge from incumbents and from, you know, high-growth data companies to accelerate the roadmap and broaden the suite of offerings.

Kostas Pardalis 22:39
That’s awesome because you give me a reason to ask the next question that I had in my mind. So when we’re talking about consolidation, most people think of big companies like Google or Microsoft acquiring, and, you know, the dream of many founders out there coming true. But do you feel that this round of consolidation is going to be driven mainly by these big companies acquiring smaller companies, or are we going to see more mergers happening between smaller companies? Because, okay, you mentioned Airbyte. Airbyte is a startup. They’ve been around not for that long. You don’t usually see acquisitions happening that early in the stage of a company. So what’s your feeling there? What should we expect? One type or the other more?

Astasia Myers 23:41
Yeah, it’s a great question. I think it’s going to be a mix. I’d probably say that, you know, 75% will be tech and talent buys by large incumbents, either cloud service providers or publicly traded companies. This is a great opportunity for them, especially with the changes in the macro, to go hire great people at a discounted rate. Another 25% will probably be other later-stage startups. I mean, over the past few months, we’ve seen Airbyte purchase Grouparoo, Hightouch acquire Workbase, and Snyk acquire TopCoat Data. So I think that some of these later-stage startups are being thoughtful of, hey, these are great people, we’re aligned on vision, they could be more successful internally, let’s do this now. You have to remember, you know, we had this era over the past 18 months, which has just recently changed, of large capital raises, sometimes 50 to 100 million in capital for these data infrastructure businesses, because people are so excited about this massive wave of public adoption, the growth of data volumes, and the precedent set by Snowflake of being the largest enterprise IPO of all time. And so these startups raised a whole bunch of money. And they have the balance sheets now to go and make smart acquisition decisions.

Kostas Pardalis 25:14
Now it makes total sense. And, okay, acquisitions, taking two different companies and putting them together to work, and aligning visions and cultures and products and everything else, are super, super hard, right? We’ve seen many times, even in big corporations acquiring other companies, that the products just die at some point. And I would imagine that also, as a person who has gone through the process of building a company from an early stage. And I’d like to hear that from you, because you’re investing in early-stage companies: how easy and how risky is it for an early-stage company to acquire another company and try to align the products and the companies? What are your thoughts there?

Astasia Myers 26:13
Yeah, that’s a great question. You know, I used to do M&A for Cisco, one of the most acquisitive tech companies of all time. And so from that experience, I can kind of speak to what I can imagine would be the pros and cons of doing acquisitions at a growth-stage startup. I think it’s important to know there are different types of acquisitions, right? So there’s the traditional acqui-hire. That’s usually eight to 15 people, usually very early in revenue generation, with a limited number of customer contracts and maybe not even, like, patents that you need to work through in your diligence process. The next stage is businesses that are revenue generating with contracts with customers, and sometimes those could be multi-year contracts. These would probably be between, you know, one and $10 million of revenue. And then there’s the 10 to $30 million of revenue, which is really about, you know, having a second product line in the business. I can imagine that most of the near-term acquisitions are going to be tech and talent. There are a lot of challenges in how to manage customer relationships if you’re spinning down a business. It’s a huge headache for growth founders and executive teams to manage. And, you know, it could be six months to do an acquisition if it’s revenue-generating, another quarter to do the integration of the team and the tech, and then another year plus of managing the customer relationships. Something else that these teams will be considering is the tech stacks that they had built on. If they have customers, like, are they even using the same back-end systems? Is the software written in the same language? So it can be much harder for these early growth companies to acquire businesses that are revenue generating, just given the commitment to the customers and the, you know, consolidation of the tech stack. So I wouldn’t expect those larger acquisitions at this time. Tech and talent is easier.
You know, those deals are usually done through a direct relationship, with the acquirer’s founders going out and prospecting great technical teams that they hope to know or already have a relationship with, aligning on mission, and, like, selling the value prop of finding a happy home and being more productive inside a larger business to continue to drive their vision, as well as the more de-risked upside opportunity for them personally.

Kostas Pardalis 29:03
One more question and I know that Eric wants to ask something.

Eric Dodds 29:09
You know me so well.

Kostas Pardalis 29:10
So, okay, usually what we hear from investors as advice for, let’s say, founders that are at a growth stage, and the next stage, is that focus is the most important thing. You have to focus on your execution and be really focused on what you are doing. And I would assume that going through an acquisition or a merger, or whatever we want to call it, is necessarily going to divert the focus, right? So there is some associated risk around that. Because you are both an investor and you have a lot of experience in M&A, to these younger and not that experienced founders that might have the opportunity in the growth phase to go and acquire a company, what advice would you give? What would you tell them? And what would you tell them to be careful of?

Astasia Myers 30:11
Totally. Like, the flashing headlight advice is: take this very seriously, it’s very hard, right? Acquisitions are usually longer than initially expected, as I said, six-plus months. That doesn’t include post-acquisition integration, and if there is a product that is being sold, with a different tech stack, that can take a lot of time and make or break the ROI on the acquisition. I mean, one of the reasons Cisco is so famous for their acquisitions is because of the integration, to, like, make them productive, and they often allow these teams to run as a standalone business unit so that they don’t impede growth. For growth founders, there should be a framework to think about the acquisition. One is, is it truly an acqui-hire? How many months and how much money would it cost for you to go find people from the market? And can you get great talent very quickly to augment the team to drive towards that singular mission faster than before? You know, we used to joke that you can consider it advanced HR: people that didn’t fit in the structure of your current compensation packages, or that you couldn’t recruit before outside of the network. Another way of thinking about it is, is the product that they built going to very easily fit into our tech stack and accelerate our product development by X number of quarters, so that we can get more customers, raise ACVs, and drive to build a bigger business faster? And the third is the business financial use case: if there are customers on the platform, what is the size and magnitude of that, and how can it immediately affect top line, whether you integrate it or let it run as standalone? So I can imagine that for each founder, it’s going to be a different motivation of what they need in that moment. But each of the three different versions is going to be challenging for various reasons.

Eric Dodds 32:36
Astasia, one question on the flavor of acquisition that is technology focused. You talked about acqui-hires, and that makes sense, right? Like, maybe something that the team that’s being acquired has built is a contiguous problem, but you’re really sort of applying, like, a brain trust, you know, to sort of your own vision, right? But when there is a really strong technical, you know, sort of formal product reason for the acquisition, how do you think about whether the product is actually a good fit, right? Because you’re introducing sort of, you know, you’re heavily augmenting your existing product roadmap, you’re bringing in different functionality, user flows. I mean, the issues are multifaceted, right, if you’re actually going to try to do a product integration. And so, in your experience, what are the things that are signals that this is a good idea to do?

Astasia Myers 33:37
The first would be market validation with existing customers or prospective customers: does this naturally fit into your vision of what our product could become over time? What is the criticality of this technology? And how much would you be willing to pay if we did it? Just do what a product manager does day-to-day, validation of the tech and what they should be building for the future. The second aspect is really understanding the entire technical stack that the product is built on. If everything is built on BigQuery and you’re a Snowflake vendor, you’re built on Snowflake, that can be very hard. If it’s built in a completely different language, that can be very hard. So make sure that you have an understanding of the core tech stack and how easy it would be to integrate it, because once again, integration is what makes these successful. If it would be too challenging, then it’s probably not a good fit for the acquirer. And then the third thing that’s useful to validate is not as much on the tech side, but I would highly say, like, chemistry with the team, if you’re expecting it to be run as an individual product, is crucial. I’ve seen acquisitions fall apart because there wasn’t strategic alignment from both leaders about what the product should become as part of the acquiring company. So it’s great at this point in time, we can add value for the next year. Fantastic. But really, if I’m coming over as an executive, what should the product look like in the next three?

Kostas Pardalis 35:30
All right, those are some great points, and stuff that I wish I knew, like, a couple of years ago, to be honest, but it’s never too late to learn about that path. So thank you. And I think that this is information that’s going to be very valuable for the people who listen to the show. So, Astasia, we talked about the modern data stack as it is today, right, and how it evolved. As an investor, you’re always looking for the next thing, right? You need to be ahead of the curve, let’s say. It’s part of your job. Can you share with us what you see as the next opportunities when it comes to data infrastructure, like things that you are excited about?

Astasia Myers 36:17
Yeah. It’s always cool digging into new trends and what’s emerging. As I noted at the beginning, I’m a really early-stage investor focused on precede seed and series A so many of the great offerings and vendors that we talked about so far are way beyond me. And you know, off to the races. And so I’m always thinking about like, Okay, we have the pillars of the modern data stack. What’s next? As I mentioned earlier, I think one macro trend that’s exciting is the mullet for batch to streaming. While batch processing comes to as the majority of data workloads today, we’re seeing an increasing proportion of teams wanting real-time data, to support operational use cases. And so as I said, the ingest layer is changing from airbike and Fivetran, to thinking about Kafka, red panda, Roxa decodable, going to more real-time databases, like ClickHouse, Trino, Druid. That’s been really cool to see. And what’s super neat about real-time, it’s not just in the analytics data stat, we’re starting to see it emerge in machine learning as well, with continuous decorin and distributed serving of ml models. Think about you go on Netflix is you know, as you’re watching a TV show, it’s about murder mysteries, and then it very quickly learns your interest. And so when you’re complete that show, it automatically shows you more murder, mystery TV shows. And so that’s been really cool to see it move into ml. And the role of continuous learning. There’s some early-stage startups like clay pot that’s trying to facilitate those workloads. I would say that another trend that’s been pretty cool. And really interesting is there’s been a fragmentation we’re talking about, like portfolio and consolidated. You know, businesses offer many products. Something that we’re seeing on the flip side is the fragmentation around water table formats and query engines. So the water table format layer is like the patchy Iceberg and Apache Hudi. And Delta Lake. In this kind of reduces data gravity. 
It allows data to be moved across different environments. You also see new query engines emerge, which is pretty cool. So on the analytics side, we’ve had Spark, and even Trino, which we were talking about earlier, but now we’re starting to see the emergence of in-memory analytics with DuckDB. On the ML side, we’re starting to see Ray and Dask, which are trying to be Python-native. So that’s kind of cool to see, all these table formats and then also fragmentation at the query layer. The next layer that I’m trying to think about now is the ML semantic layer. People use pandas for data prep, but then often, when they’re building a real model and trying to push it to production, they use PySpark. There’s a really cool business out there called Ponder that’s behind Modin, which helps with distributed pandas to make it easier to do data prep at scale without having to graduate. I think it’d be really cool if we had a semantic layer for ML that could go all the way from data prep to production, so teams aren’t rewriting their code and they can actually push these models to production faster. I know I’m just going on here, but I feel like there’s some really cool stuff going on. I know that a lot of the data stack is very well defined, but I’m still excited.

Kostas Pardalis 40:30
Yeah, yeah, absolutely. And actually, one of the most interesting things with the modern data stack is that it’s very cool to see how it changes from year to year, because it does. You see all these new things, and, as we said, it seems like it becomes more and more complex. And as we said, at some point we’re going to have consolidation and all that stuff. But this complexity exists because it also translates into innovation: there are many great teams out there that are building some amazing technology. And I think the velocity at which these new ideas are delivered is incredible. It’s really, really fast-paced. It’s very interesting to see how things change from one year to the next in this space. And that’s amazing. And I think you listed some very, very interesting technologies out there, from Iceberg to DuckDB, and other people are doing some great stuff. All right. That’s great.

I have a question that has to do a little bit more with the market. When we are talking about data infrastructure in general, it’s not something that’s new. We’ve been building databases since we’ve had computers, right? But until quite recently, innovation was heavily driven by the large enterprise, right? There were the Bank of Americas of the world, the very big corporations. And then, of course, the high-tech giants like Netflix and Twitter that had to work with a lot of data. Many of these technologies were created for their scale, let’s say, and they were driving innovation there. Another thing that I find very interesting with the modern data stack is that if someone sees how it is defined from a market perspective, it seems like a solution that fits the large enterprise, but it also fits smaller companies very well, right? You don’t have to be at the scale of Bank of America to implement the modern data stack in your company and get value, right? And obviously, you don’t have to spend the same amount of money. So it still feels as though there is some kind of disconnect there. There is what is happening right now in the enterprise in terms of innovation and evolution of the platforms that started with Hadoop in the early 2000s and continues until today. And then you also have the modern data stack that pretty much developed in parallel. Do you see these two worlds, let’s say, in data infrastructure merging at some point, or do you feel that they are going to continue evolving in parallel? And if they are going to merge (by the way, just to give a hint here, I believe that they are going to merge), when do you think that is going to happen? And what is missing there?

Astasia Myers 43:55
Yeah, I love your reference about, you know, large enterprises having a lot of innovation. You know, the Yahoo Hadoop team, incredible. The LinkedIn data team, incredible. That was a particular era. Then we went to Google and Twitter and Facebook and what they were doing at scale. We had a whole bunch of great open source projects and founders come out of those communities. And then the next evolution was, gosh, Airbnb, Lyft, and Uber. They’ve built some incredible stuff to support the data volumes. I feel like a lot of the great founders that have emerged in data over the past two and a half years have come out of the recently IPO’d businesses that were data-centric. Something that is very interesting to me is that, as a founder, it’s usually more opportune to go to mid-market companies as early-stage design partners, so you can have in-depth conversations consistently and a faster time to product development and monetization. You’ve worked on selling into enterprises. Gosh, everyone wants JP Morgan and AmEx and even Cisco or Walmart as a customer. Those are going to be some bigger contracts, they’re going to be absolutely amazing, and they could transform the business. But gosh, that could be a very long process. It could be nine months to a year, and you could be interfacing with multiple different teams trying to get buy-in and feedback and alignment. The procurement and redlining process, if you ever get a contract, could even be, you know, three-plus months. So if you’re an early-stage founder, you want to get high-quality feedback as quickly as possible from people that are willing to pay you money. And that’s usually been mid-market customers. So we often guide our founders to focus on the mid-market, build a product that’s valuable, and then earn the right to go to enterprises. 
As they flesh out their security and their compliance and identify enterprise needs, because the sales cycles are going to be longer. You can kind of see that in data businesses like Fivetran and Segment, right? They went to the mid-market, got a lot of logos, and worked with them very deeply. But those were known brands and data-oriented businesses that people respected, which built momentum and brand very quickly. I agree with you that, over time, these mid-market offerings grow into the enterprise and start achieving larger ACVs, six figures, seven figures, but it’s a process, and that’s if they’re going to be successful, if it’s a win. And most of the data companies we’ve been speaking about today, you know, outside of Snowflake, which is now publicly traded, are earlier in their journey. And so if you’re five-plus years old, you’re probably starting to move into the enterprise and see some traction, but I wouldn’t recommend that a Series A stage data company be having those conversations, because it’s going to be a very long process for them. It’s better to demonstrate repeatable sales and get product feedback than to go try to strike out and, you know, get an IBM or equivalent business as a customer.

Kostas Pardalis 47:42
Yeah. 100%. Okay, that was some great, great feedback on that. I’m sure that many founders are thinking about that stuff. To be honest, there’s always this pressure of, “Okay, when are we going to get our first enterprise customer?” And my feeling is that you can go for a long time without having to worry about the enterprise. I think you can see that also with companies like Snowflake, right? Snowflake managed to get to the IPO. Yeah, they had enterprise customers, but they were not an enterprise company, right? They were driving a lot of their growth from the mid-market for a very, very long time. So you can get a lot of mileage, let’s say, by just focusing on that.

Anyway, I know that we are getting closer to time here. So just one last question from me, and then I’ll give it to Eric. What about open source? Open source has traditionally been, let’s say, a big component of the go-to-market motions around data products. How important do you think open source is? And is it going to remain important in the future for building a company that builds data infrastructure?

Astasia Myers 49:09
Kostas, thank you so much for the question. I love this question because I get it all the time. I work with really early-stage people who are ideating and thinking about seeding their business, and they’re like, “Gosh, Astasia, do I need to be open source if I’m going to go build that data and ML tool?” And the answer is no, you do not need to be open source. You need to look at your subcategory and what the precedent in the category has been. You know, I’d probably not recommend these days going and building a query engine that’s not open source. And, you know, there’s often a precedent in a lot of databases that you should be open source, but even there Snowflake is not, and you do not have to be. So I would look at your individual segment and see what the go-to-market motion has historically been, whether it’s open source or commercial. Another thing that’s really important to think about is, what is actually the open source value for your customer? Is it because they always want to see the codebase? Are they afraid that if you go under, it’s such a critical layer in the stack that they’ll have to do migrations, which would be very painful? Is it some other rationale? And then also, what is the value of open source to the business? If it is top-of-funnel pipeline generation, that’s one conversation, and that can be a free trial. You know, is it, “Hey, we need awareness, particularly with software engineers, and, you know, 70% of the software engineering stack is open source, so we have to go open source”? That’s different. The last thing I would have people think about is the buying and adoption pattern for their product. If you’re considering open source, ideally there’s a single-player mode for the open source: I, as a data professional or software engineer, take the open source, implement it, get it up and running, and add value. 
And I theoretically should have the credit card swiping capability to purchase as an individual unit. And then over time you build in team and corporate value to extend the contract size. If your adoption pattern is multipronged, you have a purchaser, you have a stakeholder, you have a user. Sometimes you don’t necessarily need open source there, because the user doesn’t have the ability to pay you. He or she is going to go to a boss, and they’re going to have to have a salesperson come in and pitch and demonstrate value with a demo or ROI calculator and move the contract forward. And in those examples, sometimes, even if you’re open source, you know, it does not generate pipeline for you. It creates brand awareness and helps you, but you’re still doing top-down sales. And so theoretically, you didn’t need to be open source to begin with. It’s a nice-to-have versus a must-have. So I don’t think you need to be open source. It’s always a great question. I’ve had a lot of back and forth on Twitter about this. But I really think you need to identify the segment, the purchasing pattern, and the intention for both the business and the user.

Kostas Pardalis 52:52
Thank you so much. That was great to hear from you about open source, that very controversial model. So Eric, your questions?

Eric Dodds 53:04
All right. Well, we’re close to time here. So, Astasia, one more question for you. And this may be unfair, because I know you love, you know, all of your investments equally. But if you were going to say, “Okay, I’m not going to be an investor anymore, and I’m going to go found or start a company in the data space,” which problem area would you focus on? On a personal level, which area interests you the most that you would want to sink your teeth into operationally, as part of whatever founding team, go-to-market team, however you want to look at it?

Astasia Myers 53:44
Yeah, that’s a great question. I’m pretty pumped about the movement in ML to data-centric ML from model-centric. Model-centric being, hey, we spent a lot of time building customized models and tried to move forward the architectures of them: graph-based, you know, large neural nets. Data-centric is this movement where, hey, you can find a lot of great models online, download them off of GitHub, and do the rebalancing in the training, and what’s really going to affect the outcome is the data collection, labeling, and quality. Having insights into that will be critical for the model’s value. And so, for me, I think that’s a really exciting space. It’s kind of the analog of the data observability space applied to machine learning. And so there are some really cool businesses operating there, like Unbox and Galileo. I think that’s a really cool movement because it’s a macro trend, once again, like the move to cloud analytics. That’s a macro trend. It’s in the data path. It could be a daily use tool. And it directly affects the end user’s performance, the model they’re building, and then the role itself, so yeah, we have a great.

Kostas Pardalis 55:15
Can I add something, Eric? If I judge by her reactions when we were going through the next stage of the modern data stack and what excites her, I think that she would start the company around this semantic layer for ML. She was the most excited when she was talking about that. So that’s my prediction.

Eric Dodds 55:38
Love it. Well, Astasia, this has been so wonderful. Thank you for giving us a little bit of your time. We learned so much. We’d love to have you back on the show sometime.

Astasia Myers 55:47
Awesome. You guys rock! Thank you so much for having me. I love all things data. I love talking about what’s happened, what’s coming, all the cool offerings and startups. So it was an absolute pleasure. Thanks so much, guys.

Eric Dodds 55:58
I might be stealing one of your takeaways, but I really liked Astasia’s perspective on open source. In some ways, some of the answers were really simple. Like, if you’re trying to give people visibility into how your product is built and how it works, there are a lot of ways to do that other than just literally having a repo that everyone can see on GitHub, right? Because there’s a lot that goes into having a successful open source project in terms of the community surrounding it, contributions, all that sort of stuff. And it was just refreshing to hear her say there are a lot of ways you can give the same or a similar experience to users without having to, you know, make part of your product strategy a true open source effort, which was fascinating and really, really interesting to hear.

Kostas Pardalis 56:58
Yeah, absolutely. I think she gave some amazing, actionable advice on many different topics. And obviously one of them was about open source and how important it is to invest in open source. But what I also keep from the conversation that we had is that we’re at a stage right now where you can start a data infrastructure company and go after the mid-market. Ten years ago, you pretty much had to go after the enterprise if you were a company around data. You don’t necessarily have to do that anymore. And I think that’s something that gives a lot of freedom to the entrepreneurs out there or people who are considering doing something like that. So I definitely would keep that from our conversation with her. That was amazing to hear from an investor.

Eric Dodds 57:50
I agree. All right. Well, thank you for joining the show. Thank you for dealing with my raspy sultry voice. I will make sure that I can talk normally again before the next show.

We hope you enjoyed this episode of The Data Stack Show. Be sure to subscribe on your favorite podcast app to get notified about new episodes every week. We’d also love your feedback. You can email me, Eric Dodds, at eric@datastackshow.com. That’s E-R-I-C at datastackshow.com. The show is brought to you by RudderStack, the CDP for developers. Learn how to build a CDP on your data warehouse at RudderStack.com.