Episode 178:

How to Build a Data Stack to Win PLG, Featuring Peter Chapman

February 21, 2024

This week on The Data Stack Show, Eric and Kostas chat with Peter Chapman. Peter is a consultant who specializes in helping PLG companies drive more revenue with data. With a background in data and revenue operations, Peter shares his experiences building data stacks at startups like Heroku, emphasizing early consideration of data architecture to avoid future issues. He highlights the significance of a cohesive data stack for product-led growth companies and the unique challenges open-source companies face in commercializing their projects. The conversation also explores the operationalization of data, the importance of aligning sales with a company’s technical ethos, the balance between inference and training costs, a strategic approach to margins that focuses on enterprise features over infrastructure reselling, and more. If you’d like to contact Peter about his advisory services, his email is peter@chapman-coaching.com.


Highlights from this week’s conversation include:

  • Peter’s background and journey in data (0:26)
  • Introduction to PLG (4:18)
  • Starting in data at Heroku (6:05)
  • Building the data stack at Heroku (8:13)
  • Data stack requirements for early-stage companies (12:00)
  • Differentiating PLG companies from open source companies (19:26)
  • Venture capital and open source as a lever for growth (22:56)
  • Initial data modeling and analysis (25:38)
  • Operationalizing data (29:16)
  • Sales and marketing operationalization (31:52)
  • Identifying signals (34:16)
  • Challenges in developing signals (37:07)
  • Account management for developer tools (42:30)
  • Challenges in achieving margins (45:02)
  • Leveraging infrastructure for margins (47:35)
  • Inference vs. training (54:55)
  • Final thoughts and takeaways (57:02)


The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack, visit rudderstack.com.


Eric Dodds 00:03
Welcome to The Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You’ll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, the CDP for developers. You can learn more at RudderStack.com. Peter, welcome to The Data Stack Show.

Peter Chapman 00:24
Great to be here.

Eric Dodds 00:26
All right, give us a quick background on your, I guess, data history, data journey. I can’t think of a catchy term.

Peter Chapman 00:35
Well, it started at Heroku. I joined when Heroku was just starting to build out its business functions, and I ended up building the data and revenue operations teams there. I spent some time there, had a two-year stint in venture, and since then I’ve been, I guess the term is, a fractional data and revenue leader for a bunch of developer tool startups.

Eric Dodds 01:06
Awesome. Very excited to chat with you today.

Kostas Pardalis 01:08
Likewise. Yeah, and now, based on the script, it’s my turn to talk. But you know, one of the things that I’ve learned from Peter is improv, actually. So I’m going to change the script a little bit, and I’d like to add something here. Peter is actually the reason that I first came to San Francisco. This was when he was at Heavybit. He was actually a customer of Blendo, although I think you were not paying, because, you know, we were doing the rev share reporting. And I remember talking with him, and he was, like, the person who got me to come to San Francisco and Silicon Valley. Until that time, I was like, okay, Blendo is just a tiny thing on the other side of the world that probably nobody cares about, so what am I going to do there? So that’s how I came the first time. And actually, I spent a lot of time at the Heavybit offices. I had a couch there. It was very cozy. The space there was also very interesting for me. That was on Ninth Street, right? In San Francisco.

Peter Chapman 02:32
Yes, yes, deep SoMa. Yeah.

Kostas Pardalis 02:36
Deep SoMa. Anyway, I’m really happy to have him here, partly because of that personal history, but also because he’s a person who has an extremely deep knowledge of how data is actually used to deliver value to companies. That starts from Heroku, which was a company pretty much ahead of its time, judging by how things are coming back today. Companies are like, oh, we’re going to rebuild Heroku in 2023, now that Salesforce decided they don’t need it. So he has a very deep knowledge of that. I’d love to talk more about the connection with data: how the need for data emerges in a company, and how it sparks, let’s say, different functions inside, like RevOps. I’d also love for him to share about PLG, what it is and why it’s important. And I’d like to hear a little bit about how things have changed from Heroku to the AI craziness that we have today, right? Because things change, but they also remain constant in a way when it comes to business. So it’s great to have someone like Peter, who has seen things in so many different contexts, from Heroku to today, and, like, learn from that. So that’s what I have in my mind. I’m sure we’re going to improv a little bit with the questions, hopefully. What about you? What would you like to talk about?

Peter Chapman 04:17
Well, I’m always excited to talk about PLG. Question for you: have we beaten this topic to death on this show, or do we think the audience is excited to learn what a good PLG data stack looks like?

Eric Dodds 04:34
We’ve talked about it. Yeah. Okay, great.

Kostas Pardalis 04:37
Yeah, we’ll see. I mean, what we always try to do is drive the conversation based on my curiosity and, like, Eric’s curiosity. So we’ll see. I’m pretty sure that at some point we will diverge. But I definitely want to learn more about building a PLG stack, at least, because it’s one of these things that remains kind of abstract but feels important to me. So yeah, I’d love to hear from you how you think about it and how you implement it, because you’ve done that.

Peter Chapman 05:16
I’m always excited to talk about PLG, developer tools, and how you actually make money off these things.

Kostas Pardalis 05:23
What do you think, Eric? Should we go on and, like, record?

Eric Dodds 05:27
I’m ready. Yeah, let’s do it. All right, we are here with Peter Chapman on The Data Stack Show. Peter, thanks for joining us.

Peter Chapman 05:38
Great to be here. Thanks for having me, Eric.

Eric Dodds 05:41
All right. Well, you gave us a brief background in the intro, but I’m interested to know: how did you get into data? You’ve had sort of an illustrious career across multiple companies, but is it something you were always interested in, or did you sort of happen into it?

Peter Chapman 06:04
Hmm. You know, I studied math in college, so I’ve always had sort of a quantitative slant, and I’ve enjoyed looking at the world through a quantitative lens. When I first joined Heroku, I was not a data guy. I joined to manage partnerships; I was hired to be a partner manager. And, you know, because I’m mathy, I guess my first step was like, all right, how do I measure the impact of this work? How do I figure out which partners are bringing us revenue, and which partners should I spend time on? Can I ask for more money to invest in partnerships? Is there ROI there? So I started trying to run some reports, and it was really hard. We just didn’t have good foundational reporting about revenue. So I ended up building a sort of holistic model of revenue at Heroku, just so I could understand how I fit into the picture, right? I said, all right, this is what I think revenue looks like: most of it comes from customers that look like this, customers grow like this, attrition looks like this, and oh, by the way, partners do this. And that was interesting enough to leadership at the time that they were like, oh, do that. We can get someone else to do the partnership stuff. But understanding our business end to end and seeing what the levers are feels really useful. Let’s get you a team. So that’s how I stumbled into it.

Eric Dodds 07:52
Wow, that’s super interesting. One thing I’m interested to know: you talked about the reporting being really difficult. What was the stack like? Was there a stack in place, or were you hitting the financial system and querying prod to figure stuff out?

Peter Chapman 08:13
Yeah, I mean, we were querying copies of prod. At the time, Heroku had hired mostly engineers, which was a very mixed blessing: it meant the product people were also incredibly technical, and all of them were fluent in SQL. So from time to time, a product person would be like, oh, I wonder how this is doing, and they’d just query the database and produce what we called a dataclip, which is like a saved SQL query. But there was no data warehouse, and there was no BI system. And as you can imagine, asking questions that spanned multiple sources of data was nigh impossible. So a lot of what I did over those early years was build a data warehouse, install Looker, and build the sort of fundamental infrastructure we needed to both run reports and operate the business.

Eric Dodds 09:08
When you look back on that, is there anything you would have done differently in building out that stack?

Peter Chapman 09:15
Well, the tooling was so different back then. You know, this was, like, pre-dbt. The ETL tools available were less good. We were doing a bunch of database copies, and I wouldn’t do that today, but that’s what was available to us. Yeah.

Eric Dodds 09:33
Yep, super interesting. So we were talking about this a little bit as we were chatting before the show: you’ve built stacks a number of different times at a number of different companies, both large organizations and startups. But let’s focus on a startup, or more of a blank slate. Maybe Heroku is a good example, right? But in the modern day, where there really is no stack in place, there is no data warehouse. Where do you want to start? What is your sort of minimum viable stack, if you were doing that again today, knowing what you know now, obviously?

Peter Chapman 10:11
Okay, so let’s start with the requirements. When I first start working with companies, the first conversation I have with them is: walk me through your funnel. Show me everything you understand about your business, from website visit to payment and upsell. And one of two things happens. Either you walk me through your funnel, or you as a founder look kind of embarrassed and you’re like, you know, here’s something I know a little bit from Stripe, and here’s what I think Google Analytics is telling me. I have kind of a fuzzy picture of what’s going on on the marketing side, but this end-to-end funnel you’re talking about, I don’t actually know, right? I can’t actually tell you the ROI of a sign-up. And if that’s where you are, and this is pretty common for the companies I work with, great: step one is going to be to help you build this. And to build this, we’re going to have to start dumping a bunch of disparate data into your data warehouse. So kind of step zero for me is something that consolidates data, right? So, I’m going to reference RudderStack: something that pushes data from SaaS services into your data warehouse, and something that pushes data from your production database into your data warehouse. You almost definitely need dbt from day one; it’s going to make your life easier. I’ll stop there. I think that’s step one, and we can talk about the later steps.
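To make the walk me through your funnel exercise concrete, here is a minimal sketch that computes stage-to-stage conversion rates. It assumes you already have a count for each funnel stage exported from the warehouse. The stage names and the numbers are hypothetical and are included for illustration only.

```python
# Hypothetical funnel stages, ordered from top to bottom.
# The counts are made-up numbers for illustration only.
FUNNEL = [
    ("website_visit", 50_000),
    ("sign_up", 2_500),
    ("first_payment", 250),
    ("upsell", 50),
]


def stage_conversion_rates(funnel):
    """Return (from_stage, to_stage, rate) for each adjacent pair of stages."""
    rates = []
    for (prev_name, prev_count), (name, count) in zip(funnel, funnel[1:]):
        # Guard against an empty upstream stage instead of dividing by zero.
        rate = count / prev_count if prev_count else 0.0
        rates.append((prev_name, name, rate))
    return rates


if __name__ == "__main__":
    for src, dst, rate in stage_conversion_rates(FUNNEL):
        print(f"{src} -> {dst}: {rate:.1%}")
```

With these made-up counts, the sketch reports 5.0%, 10.0%, and 20.0% for the three transitions. A founder who cannot fill in the counts is in the fuzzy-picture situation Peter describes.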

Eric Dodds 11:49
Of the companies that you work with, how many are, you know, just looking at Stripe and Google Analytics? I guess maybe a better way to ask that is: the tools have become very accessible today, so are more companies adopting that stack, or is it just that the vendors want to believe that?

Peter Chapman 12:15
No, it’s really rare for the companies I work with. If you’re a seed or early-series company and you don’t have a full-time marketing team, your Google Analytics is completely neglected, and your website conversion data is totally ignored. You might be pulling Stripe for financial information, because Stripe is a really easy way to get revenue data, and all companies care about revenue. But as I mentioned, Stripe gives you no ability to see conversion rates or understand attrition or growth within accounts.

Eric Dodds 12:54
Yep. How many people? Oh, yeah.

Kostas Pardalis 12:57
I’ve got one question here, because you’re both talking about the moment where, as you said, Peter, you ask, okay, take me through your funnel. And I think there’s also a question here about timing. When is the right time for a founder, or for a company, to actually engage in this conversation and try to formalize, let’s say, the funnel and the business itself? Because that’s what it is at the end: the funnel is a representation of how this thing we call a company actually generates value for both sides. So when is the right time to do that? Because, okay, I can also see people over-engineering that, in a way. So based on your experience, when should people do that?

Peter Chapman 13:55
Well, I advise companies to start thinking about their data stack from day zero. That doesn’t mean you have to buy dbt and a bunch of data connectors and set up a large data warehouse. But just having a rough road map can help avoid a lot of architectural pain later. That could be a very simple Google Doc that says, hey, here’s the reporting we want to get to, and here’s what we think we’ll need to get there. So much of this stuff is easier to build right the first time than to rewire. I’d say the timing of when you build your stack is a function of your go-to-market approach. So if you’re doing top-down enterprise sales... give me an example of your favorite enterprise product, Kostas.

Kostas Pardalis 14:56
My favorite enterprise product? Oh, there’s no such thing as a favorite enterprise product.

Eric Dodds 15:04
Don’t you think about this all the time?

Kostas Pardalis 15:08
Oh, let’s say, I don’t know, like, IBM, whatever that means.

Peter Chapman 15:17
Well, not IBM, but let’s say you’re selling a security product, and there’s no bottoms-up motion: you’re just selling directly to teams. You could probably get a lot of what you need out of Salesforce, or pick your favorite CRM, provided you have good discipline around Salesforce, right? Maybe you have a Contact Us button on your website and you want to see how that converts, and you’re doing a bunch of outbound SDR stuff and you need to track that conversion. But because your funnel is a sales funnel, as long as you’re instrumenting your CRM correctly, you don’t need a lot of sophisticated reporting beyond that. In this case, you may not even be using Stripe; you’re just invoicing your clients. So if that’s who you are, top-down enterprise sales, where I live and die by the success of my sales team, you don’t need BI for a while. In fact, Salesforce, like all these tools, comes with its own tool-specific analytics suite, so you can get away with just using the Salesforce reports or just using the HubSpot reports. So that’s enterprise top-down sales. Then there’s PLG, our favorite acronym; you might also hear me refer to it as bottoms-up. When I say PLG, I’m talking about companies that people can start using without talking to a sales rep, and that have a lot of organic usage. Say most of your users start using you without ever talking to you, and it’s your job to figure out which of those users to talk to. If that’s the world you’re in, building a cohesive data stack becomes a lot more important. Now your sales pipeline is just a sliver of the information you need. There’s some important activity happening in your CRM that you absolutely need to track. But a lot of really important activity that determines the overall success of your funnel is happening on your website, and it’s happening in your product.
So tying all that together becomes paramount and needs to happen a lot earlier in a company’s journey.

Kostas Pardalis 17:35
Okay, that makes sense. Sorry, one last question, Eric, and I’ll give the microphone back to you. You mentioned the bottoms-up motion there, where users interact with the company and the product and probably won’t even talk to anyone, right? And especially in developer tooling, a big part of that is also open source. So are open source and product-led growth compatible things or different things? How do you combine them? The reason I’m asking is that at Starburst I’ve seen and experienced the most traditional version of that. We have an open source project used by all the Fortune 500 companies out there, in production, on their own, with actual people getting paid to set it up and run it. Then there’s a company that’s trying to monetize that. But at the same time, the company itself is a very sales-driven company, right? So it’s kind of a weird mix. On one side you have the extreme version of bottoms-up, in a way. On the other, the company itself is doing the more traditional, enterprise, sales-driven motions. So how do these things work together, and how have they changed? What Trino is doing, or what Spark was doing in 2010, is not necessarily what companies in 2023 that want to incorporate open source into their strategies are doing, right? Things have probably changed. So what’s your take on that?

Peter Chapman 19:26
Okay, so I’m going to answer a slightly different question, which is: what is a PLG company? I would say there are three ingredients. One, you need a product that is immediately valuable to one person, or ideally a single team. Two, there needs to be an organic way for usage to grow from one person to many people, or from moderate usage to lots of usage, right? That could mean one user invites another user, who invites another user. Or it could mean I build a prototype app, then I move it into production, and then I build more apps. And the third thing you need for this motion to be successful is a way to talk to your users. Most open source companies that I’ve worked with get number one right. Sometimes they get number two right. But if you’re really open source, you don’t have a number three. You don’t have any way of contacting your users, and you’re not getting any information about them. And even if you do get information about them, it would actually feel really strange to reach out to them, because it breaks the expectation of open source. So unless you’ve built a product where customers are actually logging in and having some sort of in-product experience, your open source company is not a PLG company.

Kostas Pardalis 21:10
Okay. So you can combine open source with a PLG company if you build in the third part, right? Yeah. Okay, that’s interesting. We had an experience like that when I was at RudderStack, because we were trying to do that. It’s a huge conversation, and I think it’s really hard for people to do right. Open source is always tricky. But anyway, back to you. I don’t want to...

Eric Dodds 21:38
Yeah, it really is an interesting conversation. Peter, I’d be interested in your thoughts on this. There are obviously multiple examples we could all think of, of gigantic companies that grew out of open source projects, right? But when we think about technical tools, data tools, etc., it is really strange: you have this ethos of open source, but then you mix in venture capital.

Peter Chapman 23:13
A very classic play in the dev tool space is to take an existing open source project that you have built, go to venture capitalists, and say: look how many stars I have, look at my downloads, I am building a hosted version of this, can I have money? Investors love investing in tools like this, because there’s already a proven user base and a proven use case, right? But the mechanics of getting it right are tricky. I’ve seen a lot of companies build really successful open source projects and struggle to build successful commercial offerings on top of them. One of the things that makes it hard is you’re always competing against yourself. Yeah, exactly. So if you build a really good open source project that’s easy to install and use, it can be very difficult to build a hosted version of that that’s actually competitive.

Eric Dodds 24:26
Right? Yep. Yeah, that’s a super helpful perspective. It is interesting: the success of open source is actually a limiting factor in your commercial growth, but it’s also an ingredient in the success of the technology generally, which is quite a needle to thread. One thing we were talking about a little bit before the show was how you use some of the data that’s being combined in the warehouse. So let’s use the example you talked about. You have product usage data, you have financial data, you have multiple sources going into the warehouse. And someone’s going to use dbt, or write SQL, to model that data. Let’s say you get that stack set up. What are the first things you do with it? What’s your playbook, in sequence, of, okay, here’s where we start once we have all that data?

Peter Chapman 25:39
Yeah, so the first thing I build, almost all the time, is a table where the rows are accounts and the columns are timestamps, and the timestamps represent important lifecycle events. So I want to see the first website visit, the first sign-up, the first product usage. Depending on the product, I might look for things like the first invitation sent and the first invitation accepted. There’s also the first payment above $100 and the first payment above $1,000. The reason this is so important is that without this consolidated, account-based view, it’s really hard to know where to focus. In particular, if you’re not building a consolidated view, it’s easy to get confused, because the tools you’re working with will give you user-specific views. But if you’re building a team-based product, your user conversion rate is different from your account conversion rate, and that’s a super important distinction. Yep. Tell me if this is too wonky for you.
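Here is a small sketch of why user and account conversion rates diverge in a team-based product. It uses hypothetical data in which each user belongs to an account. A user counts as converted if that user paid, and an account counts as converted if any of its users paid. Those definitions are assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical users: (user_id, account_id, has_paid).
# In a team product, often only one person per team enters payment details.
USERS = [
    ("u1", "acme", True),
    ("u2", "acme", False),
    ("u3", "acme", False),
    ("u4", "globex", False),
    ("u5", "initech", True),
    ("u6", "initech", False),
]


def user_conversion_rate(users):
    """Share of individual users who paid."""
    return sum(1 for _, _, paid in users if paid) / len(users)


def account_conversion_rate(users):
    """Share of accounts where at least one user paid."""
    paid_by_account = defaultdict(bool)
    for _, account_id, paid in users:
        paid_by_account[account_id] |= paid
    return sum(paid_by_account.values()) / len(paid_by_account)


if __name__ == "__main__":
    print(f"user conversion:    {user_conversion_rate(USERS):.0%}")
    print(f"account conversion: {account_conversion_rate(USERS):.0%}")
```

The same six rows read as 33% user conversion but 67% account conversion. A user-level analytics tool would make this product look worse at converting teams than it is.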

Eric Dodds 27:01
Makes total sense. Yeah. Yeah, it makes total sense.

Peter Chapman 27:03
And so actually, if I were to get into the weeds of dbt, the way we do this is I first want to see it at a user level. So, by user: when did they first visit the website, sign up, use the product, start paying? Then I want to aggregate that at an account level, taking the minimum timestamp across the users in each account, so I get that consolidated view. Okay, so that’s one thing we’re doing: building this foundational model that has users with timestamps and accounts with timestamps. Oh, and part of this, which is always a fun part, is actually defining what an account is, because you’re going to have a bunch of competing definitions, right? In almost all the companies I’ve worked with, you’ve got an internal representation, called something like an organization or a team, that comes from your product. Then, hopefully, you’re using a CRM; let’s just say Salesforce. Salesforce has an idea of what an account is; it’s called an account in Salesforce. Those are probably different definitions, and it’s possible that a single company in Salesforce has multiple teams associated with it. Then you’ve got this third heuristic, which is that people who sign up from the same non-free email domain are probably at the same company. So part of building a foundational data model is just getting the entire company to agree on what an account is, and on the source of truth for it. That way, when my sales team talks about revenue by company, they’re using the exact same language as my marketing team, my product team, and my finance team. So that’s kind of track one: building this foundational user and account model that lets me see my funnel.
Track two, for most of the companies I work with, which are PLG companies, is starting to operationalize that data in order to proactively engage with your most important customers.
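Peter describes building this roll-up in dbt, which means SQL. A `GROUP BY account` with `MIN()` on each lifecycle column is the usual shape. The same logic is sketched here in Python for illustration. The user record fields, the event names, and the small free-email domain list are all hypothetical assumptions. The account key prefers the product team and falls back to the non-free email domain heuristic he mentions.

```python
from datetime import datetime

# Illustrative and deliberately incomplete list of free email providers (an assumption).
FREE_EMAIL_DOMAINS = {"gmail.com", "yahoo.com", "outlook.com", "hotmail.com"}

# Hypothetical lifecycle event columns; a real model would use your own events.
LIFECYCLE_EVENTS = ["first_visit", "first_sign_up", "first_usage", "first_payment"]


def resolve_account(user):
    """Pick an account key: the product team if known, else a non-free email domain."""
    if user.get("team_id"):
        return f"team:{user['team_id']}"
    domain = user["email"].rsplit("@", 1)[-1].lower()
    if domain not in FREE_EMAIL_DOMAINS:
        return f"domain:{domain}"
    # A free-email user with no team becomes a single-user account.
    return f"user:{user['email'].lower()}"


def account_lifecycle(users):
    """Roll user-level lifecycle timestamps up to accounts, keeping the earliest (MIN)."""
    accounts = {}
    for user in users:
        row = accounts.setdefault(resolve_account(user), {})
        for event in LIFECYCLE_EVENTS:
            ts = user.get(event)
            if ts is not None and (event not in row or ts < row[event]):
                row[event] = ts
    return accounts


# Hypothetical example users.
EXAMPLE_USERS = [
    {"email": "ana@acme.io", "team_id": "t1",
     "first_visit": datetime(2024, 1, 3), "first_sign_up": datetime(2024, 1, 5)},
    {"email": "bo@acme.io", "team_id": "t1",
     "first_visit": datetime(2024, 1, 1), "first_payment": datetime(2024, 2, 1)},
    {"email": "cy@globex.com", "first_visit": datetime(2024, 1, 10)},
    {"email": "dee@gmail.com", "first_visit": datetime(2024, 1, 12)},
]

if __name__ == "__main__":
    for key, row in account_lifecycle(EXAMPLE_USERS).items():
        print(key, row)
```

The sketch keys team-based and domain-based accounts separately. It does not reconcile them with each other or with the Salesforce account, and that reconciliation is the agree-on-what-an-account-is work Peter describes.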

Eric Dodds 29:25
Makes total sense. And are you doing that on the marketing side and the sales side? Like, when you say operationalize, what does that encompass?

Peter Chapman 29:37
Yeah, so I think about this as having two domains. One is what I call hand-to-hand combat. You want to make sure that the people who are talking to customers are talking to the right customers at the right time, and that they know what to talk about. That’s the most important thing, step one. Even before you’ve built a sophisticated account and user funnel, you need to make sure that the people who are paying you are talking to you.

Eric Dodds 30:07
Hmm. Yeah, very often that sounds simple. It’s like, yeah, that sounds obvious, but it’s actually quite hard.

Peter Chapman 30:17
Yeah, it’s funny. I remember in my early days at Heroku, we were faced with this problem, right? We had a ton of paying customers and a really small sales team: like three salespeople, and tens of thousands of people paying us. So the question of who we engage with was paramount. And people were developing incredibly sophisticated mechanisms to determine what a promising account looks like, right? Someone was like, oh, I think if their usage grows by more than $100 in a week, that’s a signal. Someone else was like, oh, I think if we see a production database, that’s a signal. And my signal was: if they’re in the top 20% of revenue, that’s the signal, right? Before you start looking at product usage or revenue growth, the first thing you do is set up a sort of good defense. That means the people who are already paying you the most need to have a relationship with you. Yeah. That was a total tangent. Take me back: what was the original question here?
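Peter’s top 20% of revenue rule is straightforward to compute. Here is a minimal sketch that assumes a hypothetical mapping from account to monthly revenue. It also reports what share of total revenue those top accounts represent, which is one quick way to see how skewed the distribution is.

```python
import math


def top_revenue_accounts(revenue_by_account, fraction=0.2):
    """Return the top `fraction` of accounts by revenue and their share of total revenue."""
    if not revenue_by_account:
        return [], 0.0
    ranked = sorted(revenue_by_account, key=revenue_by_account.get, reverse=True)
    # Always keep at least one account, even for very small customer lists.
    k = max(1, math.ceil(len(ranked) * fraction))
    top = ranked[:k]
    total = sum(revenue_by_account.values())
    share = sum(revenue_by_account[a] for a in top) / total if total else 0.0
    return top, share


if __name__ == "__main__":
    # Hypothetical monthly revenue per account, in dollars.
    revenue = {"a": 5000, "b": 3000, "c": 400, "d": 300, "e": 200,
               "f": 100, "g": 100, "h": 50, "i": 30, "j": 20}
    top, share = top_revenue_accounts(revenue)
    print(top, f"{share:.0%} of revenue")
```

With the made-up numbers above, two of ten accounts carry about 87% of revenue. Those two are the accounts that should have a named relationship before anyone tunes usage-based signals.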

Eric Dodds 31:26
Yeah, no, we were talking about the uses you have for the data, like operationalizing it. So I was asking: what does operationalizing encompass? Sales is obviously one example: which companies should our salespeople try to build relationships with? What else do you do, like on the marketing side? How are you operationalizing the data?

Peter Chapman 31:52
Great. So I almost always handle sales before marketing. At every single company I’ve worked with, revenue falls into a heavily skewed, Pareto-like distribution, right? A huge amount of the revenue comes from a relatively small number of customers. That means we have to invest in really good personal relationships with those companies before it makes sense to worry about the long tail. So sales comes first, and that means two things. One, it means telling your salespeople, your account managers, or your customer success people who they should be talking to. The second component is telling them when to initiate a conversation and what’s relevant. So I’ll give you an example: we’re just building alerts here. An alert could be: if a customer spends over $1,000 in a month, assign them to a sales rep and tell that sales rep. Another alert could be: if a customer’s usage drops by 20% week over week, send that sales rep a Slack message, or create a task for that sales rep in the CRM. It could also be: if a customer is using this feature of the product, tell the sales rep, because this is a good thing to initiate a conversation around. Yep. Yes, please, Kostas.
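The alerts Peter describes are threshold rules over an account snapshot. Here is a minimal sketch under some assumptions. The snapshot fields, the `production_database` feature name, and the action labels are hypothetical. Actually sending the Slack message or creating the CRM task depends on your tooling, so the sketch only returns what should happen.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class AccountSnapshot:
    # Hypothetical fields, e.g. produced by a daily warehouse model.
    account_id: str
    spend_this_month: float
    usage_this_week: float
    usage_last_week: float
    features_used: Set[str] = field(default_factory=set)
    owner: Optional[str] = None  # assigned sales rep, if any


def evaluate_alerts(acct: AccountSnapshot,
                    spend_threshold: float = 1_000.0,
                    drop_threshold: float = 0.20,
                    conversation_features: frozenset = frozenset({"production_database"}),
                    ) -> List[Tuple[str, str]]:
    """Return (action, message) pairs; delivering them (Slack, CRM task) is out of scope."""
    alerts = []
    # Rule 1: a high-spend account with no owner should be assigned to a rep.
    if acct.spend_this_month > spend_threshold and acct.owner is None:
        alerts.append(("assign_rep",
                       f"{acct.account_id} spent ${acct.spend_this_month:,.0f} this month"))
    # Rule 2: week-over-week usage drop of at least drop_threshold.
    if acct.usage_last_week > 0:
        change = (acct.usage_this_week - acct.usage_last_week) / acct.usage_last_week
        if change <= -drop_threshold:
            alerts.append(("notify_rep",
                           f"{acct.account_id} usage down {-change:.0%} week over week"))
    # Rule 3: usage of a feature that makes a good conversation starter.
    for feature in sorted(conversation_features & acct.features_used):
        alerts.append(("notify_rep", f"{acct.account_id} is using {feature}"))
    return alerts


if __name__ == "__main__":
    snapshot = AccountSnapshot("acme", 1500.0, 70.0, 100.0, {"production_database"})
    for action, message in evaluate_alerts(snapshot):
        print(action, "|", message)
```

Keeping each rule as a plain condition makes it easy for the account team to see why an alert fired, which matters more early on than precision.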

Kostas Pardalis 33:20
Yeah, sorry for interrupting. But one of the things I observed, hearing the conversation, is that we talk a lot about signals. And okay, you gave some heuristics: don’t start over-engineering, trying to figure out the best possible signal. Start with the basics, like the top 20%, and go chat with those people. But as you go through what you’re saying, you keep coming up with signals, and it feels, to me at least, like signals are very context-sensitive to the business or the product. Actually, it sounds like it has to do with the product a lot. So tell us a bit about that. How do you think about finding the right signals, and what do they look like?

Peter Chapman 34:14
Great, I use a four step process. Signal Number one, existing spent. Good news about this one, it’s both your most important signal and it’s pretty easy to capture. Yeah, Signal Number Two demographic data, by which I mean company’s customer size and customer revenue. You’ll use a data augmentation tool like Clearbit. To get this information HubSpot now has this built into the product. You want to make sure that if someone from Nike signs up for your product, they’re immediately getting excellent support and a really friendly competitor. Signal Number Three So you’re gonna get like 80% of the value you can get from these two alone. Okay? And very often, if you’re an early stage company, the my approach is like, Let’s build these and then pause, this, give your account team time to digest this, see how they do when they’re, like, get people used to this new signal based assignment world, make sure that we’re actually seeing an effect from this. Let’s let this marinate for at least three months before we touch it at all. Okay, so let’s fast forward in time, let’s say you’ve built your foundational signals. Your customer team is not overwhelmed, and you’re trying to eke some more percentage points of growth out. Now it’s time to get to signals three and four. Signal three is usage. And I’ll say it’s, it’s sort of choose your own adventure, right? Where you might develop your own intuition about what features signal, a propensity for growth, and start running experiments. It’s pretty easy to measure this stuff, right? So you could just say, let’s use the production database example. You might say, All right, I think that everyone who uses a production database deserves account management. So let’s start assigning these folks to account managers. Or, let’s start assigning most of these folks to account managers. And let’s measure what has. 
And the fourth approach, the most sophisticated and expensive one, is ML-derived signals: instead of you guessing what the signals are, you dump all your data into a vendor’s database and say, hey, you tell me what I should be looking for. And that vendor is going to spit back a scoring model that you can use to generate a signal.
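To make the first three signals concrete, here is a minimal sketch of a rule-based assignment in Python. The `Account` fields, the thresholds, and the 20% holdout are hypothetical illustrations, not Peter’s actual scoring or any vendor’s API. Signals one and two route an account straight to an account manager; the usage signal (the production database example) is randomized so that “assign most of these folks” leaves a control group to measure against.

```python
import random
from dataclasses import dataclass


@dataclass
class Account:
    """Hypothetical account record; field names are illustrative only."""
    name: str
    monthly_spend: float      # signal 1: existing spend
    employee_count: int       # signal 2: company size (e.g. from an enrichment tool)
    annual_revenue: float     # signal 2: company revenue
    uses_production_db: bool  # signal 3: usage, per the production database example


# Hypothetical thresholds; pick values that fit your own business.
SPEND_THRESHOLD = 1_000.0
EMPLOYEE_THRESHOLD = 1_000
REVENUE_THRESHOLD = 100_000_000.0
# "Assign most of these folks": hold out ~20% of usage-signal accounts as a control group.
HOLDOUT_FRACTION = 0.2


def foundational_signal(acct: Account) -> bool:
    """Signals 1 and 2: meaningful existing spend, or a large company."""
    return (
        acct.monthly_spend >= SPEND_THRESHOLD
        or acct.employee_count >= EMPLOYEE_THRESHOLD
        or acct.annual_revenue >= REVENUE_THRESHOLD
    )


def assign(acct: Account, rng: random.Random) -> str:
    """Route an account to 'account_manager', 'holdout', or 'self_serve'."""
    if foundational_signal(acct):
        return "account_manager"
    if acct.uses_production_db:
        # Signal 3 experiment: randomize so the effect of account management can be measured.
        return "holdout" if rng.random() < HOLDOUT_FRACTION else "account_manager"
    return "self_serve"


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the split is reproducible
    accounts = [
        Account("big-co", 0.0, 50_000, 1e10, False),
        Account("small-team", 50.0, 8, 2e6, True),
        Account("hobbyist", 0.0, 1, 0.0, False),
    ]
    for acct in accounts:
        print(acct.name, assign(acct, rng))
```

Signal four, a vendor-generated scoring model, would stand in for the hand-written `foundational_signal` rule; as the conversation goes on to note, that only pays off once there is enough data behind it.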

Kostas Pardalis 37:07
I would assume that this last one also requires a substantial amount of data to make sense, right? Because I can see, especially with first-time founders with engineering backgrounds, that they feel like, okay, technology can solve everything, so let’s throw a model on this data and see what the model is going to say. But usually the model is just random shit coming out, because you don’t have the data yet to support it. So in my experience, my advice is always: avoid the sophisticated stuff until you are really growing and have a lot of data points. And most importantly, in my opinion, until you have personally built intuition about your business, so that outside of what the math can tell you, you can also use your intuition to assess whether these things work or not. But anyway, that’s my experience, at least. Eric, back to you.

Eric Dodds 38:16
I agree with that. And the other thing, which I was actually going to ask you about, Peter, is that in an early-stage company a lot of things change, right? So say you tried to develop really sophisticated propensity scores, and then there’s a significant change in your product, or the way people use it, or those sorts of things. Things are pretty dynamic in early-stage companies, right? And so it makes a ton of sense to focus on the first two steps, existing revenue and then demographics, because those are going to remain stable, or somewhat stable, right? Even if there are significant changes in your product or model, those are going to be fairly durable.

Peter Chapman 39:12
Yeah. The other thing that makes this tricky is that the ideal signal is not propensity to spend; it’s a signal of how much a customer can be influenced by human interaction. Hmm. Right. So it’s easy to get false-positive signals that measure inevitable growth, right? If you dumped your data into a machine learning model, it’s very likely that the model would say, oh, people who put down their credit card are way more likely to spend money. It’s like, well, yeah, we know that. But I don’t know that everyone who puts in credit card information needs to talk to a sales rep.
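Peter’s distinction between propensity to spend and influenceability is essentially the difference between a predictive score and a measured treatment effect, often called uplift. A minimal sketch in Python, assuming accounts were randomly split into an account-manager group and a holdout group (as in the “assign most of these folks” experiment) and that spend growth was recorded for each; the function name and the data are hypothetical, and a real analysis would also need a significance test.

```python
from statistics import mean


def estimate_uplift(treated_growth: list[float], holdout_growth: list[float]) -> float:
    """Estimate how much account management itself changed spend growth.

    treated_growth: spend growth of accounts assigned an account manager.
    holdout_growth: spend growth of comparable accounts that were held out.
    Assumes accounts were randomly split between the groups; otherwise the
    difference mixes in whatever made the groups differ.
    """
    if not treated_growth or not holdout_growth:
        raise ValueError("need at least one account in each group")
    return mean(treated_growth) - mean(holdout_growth)


# Hypothetical data. Accounts that entered a credit card grow strongly either way:
# high propensity to spend, but little uplift from a sales rep.
card_uplift = estimate_uplift([0.50, 0.55, 0.45], [0.50, 0.52, 0.48])

# Hypothetical production-database accounts: growth differs between groups,
# so human interaction appears to be moving the needle.
db_uplift = estimate_uplift([0.30, 0.40], [0.10, 0.20])
```

With these made-up numbers, the credit-card cohort shows essentially zero uplift despite strong growth, while the production-database cohort shows about 20 points of extra growth attributable to the account manager.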

Eric Dodds 40:03
We got super nerdy at RudderStack. I wanted to look at a couple of different multi-touch attribution models for our funnel, just to see what we’d get. And it’s funny, the first one came back with, here are all the data points and whatever, right? And one of the findings was, oh man, people who respond to an SDR email are the most likely to go on to have a sales deal. Right? And it’s like, wow, that’s amazing. Yeah, that stuff is super interesting. I feel like I’ve been monopolizing the conversation here. You’ve jumped in, but what questions do you have for Peter?

Kostas Pardalis 40:53
Okay, we’ve talked about PLG, and we gave a definition. That’s one of the things I like and really wanted to hear about. But one thing I’d like to ask you, Peter: you’ve seen growth in tech from, let’s say, the early SaaS and cloud days up to today, with the craziness of people literally killing for GPUs to do something with AI, right? My feeling is that things change very rapidly, but in some ways they also remain constant. There are learnings you can take from the days of Heroku that still apply today, right? So I’d love to hear from you: what have you seen that remains constant when it comes to building growth functions? And what has changed, not necessarily because of the technology, but more because of the demands of the market? My feeling is that the market drives more change than the technology does. I’d love to hear from you, because you have, in my opinion, a very unique experience, having gone through all these different phases of the industry.

Peter Chapman 42:30
What’s the same and what’s changed? You know, one constant is that if you want to sell to developers, you need to talk to your customers. I know this maybe sounds really obvious, but I need to say it in public, because I’ve watched so many developer tools startups get built by engineers who would love to not do sales, right? This is maybe the trap of building a successful PLG company: you might delude yourself into thinking, well, I’ve got this great open source product, I’ve got this great ready-to-use product, I’ve got this great product that anyone can sign up for and use for free. So all I need to do is build a great product experience and really good documentation, maybe hire a support team, and then growth should just happen, right? But the lesson that’s been hammered into me time and time again is that if you want to see real, significant, venture-exciting revenue growth, you need to get serious about account management. And this is tricky, because when you think about sales, it might feel somewhat orthogonal to the company culture that you as a technical founder are trying to build. You probably don’t see yourself as a salesperson. And you, as an engineer, hate being sold to and don’t want to have to get on a call with someone to use their product. So figuring out how to integrate account management into a developer tool company, not just from a technical and operational perspective but also from an organizational and cultural perspective, is both critical to company success and difficult. You probably don’t want to hire your standard enterprise sales bro. You want to hire someone who speaks the language, who gets the culture, who knows when to engage, and who is also smart enough not to be the pushy salesperson when that’s not the right approach. So that’s the constant: you have to do sales, and you have to do sales. Right. And it’s hard. It’s part of this industry.
I’d say what’s different and unique about machine learning is: boy, is it hard to find margins. You can read about this; we’re seeing company after company come out with the fact that they have negative margins. I was just at Replicate, and Replicate had negative margins my entire time there. I could talk a lot about why margins are hard, but maybe the TL;DR is that, boy, is the machinery expensive. And not only are the machines really expensive, the market’s price expectations are really low, and everyone is racing to grab market share. So there’s a real market temptation to produce negative-margin products.

Kostas Pardalis 45:48
That’s very interesting. Do you think the answer will come from hardware, like more availability of hardware pushing prices down, or from a paradigm shift in how products are built on top of this hardware? Because in the end, I mean, okay, if we want to be very reductionist here, what a company like Replicate sells is an API on top of GPUs, right? That’s what it is: it gives GPUs to people, in the same way that a company like Databricks, which started from Spark, sells compute to a very specific group of people, right? And that’s where you start building your margins, through the abstractions that you add. And obviously, with multi-tenancy and all that stuff, we have figured things out a little bit better, I would say, for traditional compute than for GPUs. But where do you think the margins will come from in the end? Because they have to, right? It has to turn into a viable market in the end. We can’t just donate hardware out there using money from pension funds, right?

Peter Chapman 47:13
Right. Thank you to the schoolteachers of America for powering our APIs. Where do margins come from? There are a lot of different levers here; I’ll spit out some of the categories I think about when I think about margins. Okay, one is totally beyond our control: as hardware gets cheaper and better, it’s easier to find margins. So you could say, hey, we’ve got bad margins now. Maybe we can just wait, and margins will get better.
Two: your internal infrastructure engineering is paramount to finding good margins. Let me try to figure out how to approach this. There’s a trade-off between latency and cost efficiency in any infrastructure product, but it’s especially impactful for machine learning products. Let’s say you’re building an API on top of a machine learning model, or you have an API that just serves images. You care a ton about user experience, so you need a relatively fast response time. And you’re already burdened by the fact that machine learning models are inherently chunkier than, you know, asking your Rails app to tell you what’s in the cart. In order to get the lowest possible response time, you want to keep whatever it is you’re serving running all the time. But that’s really expensive. And so what I see companies doing is... let me back up. The faster you’re able to boot up an instance, the better your margins are. So the ability to quickly load whatever machine learning model you’re serving onto your instance and let it respond to calls has a huge impact on your margins. Consolidating hardware also has a big impact on margins, right? You get the best prices when you buy reservations. Do you know what a reservation is?

Kostas Pardalis 49:52
I think so. I mean, you go to someone like AWS and pay up front for a certain amount of compute, usually for a certain period of time, and you get a better price for that, right? It sucks.

Peter Chapman 50:08
Yeah, exactly. So part of the way that vendors like Heroku or Replicate find margins is that they buy a lot of compute and commit to using that compute over a long time period, and that allows them to get deep discounts. Then they charge a price that’s somewhere between, say, the reserved price and the on-demand price, the price a customer would pay if they were using it without a reservation. This is kind of where you want to be competitive, right? It’s nice to be able to offer prices that are similar to on-demand while you’re paying reserved prices. And it also means that if your customers compare you to running it themselves, you’re compelling, even without the product being really sticky. Does that make sense? It makes total sense. Yeah, getting this right is tricky, right? If your usage is super spiky, it’s hard to reserve the right number of instances. If your usage is distributed across a bunch of different hardware types, it’s hard to find savings on reserved instances. If you want to consistently be on the latest and greatest hardware type, which is coming out in six months, it’s hard to be on the best reserved instances. So, you know, back in the days of Heroku, we were just selling CPU compute. We didn’t care that much about being on the latest and greatest instances; it was nice, but a CPU instance is a CPU instance. And that meant that if we bought some three-year reservations and they eventually became outdated, it was kind of fine. If you are trying to be the hot new machine learning startup, you want to be on the latest and greatest NVIDIA stuff, which means that making a three-year reservation might be painful for you. But if you don’t make those reservations, your margins suffer. So that was a deep dive into why margins are hard in ML. Happy to go deeper or abandon it entirely.
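The reservation economics come down to simple arithmetic: the vendor pays for reserved capacity whether or not it is used, but earns revenue only on the hours customers actually consume. A minimal sketch in Python with made-up hourly prices (not real AWS, Heroku, or Replicate numbers) shows why spiky usage, which leaves reserved capacity idle, can push margins negative. The function names are hypothetical.

```python
def gross_margin(reserved_hourly_cost: float,
                 selling_hourly_price: float,
                 utilization: float) -> float:
    """Gross margin on reserved capacity that is resold by the hour.

    reserved_hourly_cost: what the vendor pays per reserved instance-hour,
        whether or not a customer uses it.
    selling_hourly_price: what the vendor charges per instance-hour actually used.
    utilization: fraction of reserved hours customers actually use, in (0, 1].
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    revenue = selling_hourly_price * utilization
    return (revenue - reserved_hourly_cost) / revenue


def breakeven_utilization(reserved_hourly_cost: float, selling_hourly_price: float) -> float:
    """Utilization at which revenue exactly covers the reservation."""
    return reserved_hourly_cost / selling_hourly_price


# Hypothetical prices: reserved capacity costs $2.00/hr, a customer would pay
# $4.00/hr on demand, and the vendor charges $3.60/hr, somewhere in between.
steady = gross_margin(2.00, 3.60, utilization=0.9)  # steady demand fills the reservation
spiky = gross_margin(2.00, 3.60, utilization=0.5)   # spiky demand leaves it half idle
```

With these illustrative numbers, the margin is roughly 38% at 90% utilization but about -11% at 50%, and break-even sits near 56% utilization. Splitting demand across several hardware types, or abandoning reservations early for newer GPUs, has the same effect: it lowers the utilization term.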

Kostas Pardalis 52:18
It’s very interesting, because, again, I think it’s one of the things that remains constant in a way: you need to figure out your margins in the end. And figuring out the margins is not straightforward, especially when you’re talking about infrastructure products, right? First of all, you can’t have a simple model for predicting demand. It’s not like a SaaS application, where it’s much easier to predict what the usage is going to be, and where you also have some very standard tools to improve your latency; for example, you can put caching there, the things we’ve been doing in application building for, I don’t know, 20 years. But when we’re building and selling GPUs, it’s quite different, right? And I would like to ask you: do you see a case where these companies, at the end, might have to break ties with the cloud providers and start getting their own hardware and building on top of that? Are we going to see a full cycle, going back to the data center again, for this to happen? Because that would be, I don’t know, very interesting.

Peter Chapman 53:40
Yeah, it’s really tempting. I’ll try and dig it up; there’s a great essay by Marty DiSalvo on why buying your own hardware makes sense at a certain scale, and the impact that can have on your margins. It is much cheaper. That said, boy, is it hard to run a data center. It’s a whole new organization that you need to run, and a whole new set of inventory and finance problems that you need to solve. What I’m seeing right now is a scramble for market share, not margins. And I think as long as we’re in that stage, where market share trumps profitability, no one’s going to be buying their own machines, right? The mandate is to move as quickly as possible, and the way to move quickly is to get on a large cloud provider.

Kostas Pardalis 54:44
Yeah, one last question from me, and then I’ll give it back to Eric. With ML and AI, whatever we like to call it, there are always two parts. There’s inference, which is what drives the user experience at the end, right? Low latencies and all that stuff. But there’s also training, and that is expensive, right? It’s a big investment for companies. From your experience so far, what is actually driving most of the growth out there for these new AI companies: is it more the inference part or the training part?

Peter Chapman 55:28
Inference, 100%. Training is expensive, and training can take

Kostas Pardalis 55:34
that time.

Peter Chapman 55:36
But most companies are not training continuously. That’s a discrete stage in their development. And then once they’ve trained the model, they let it run for a long time.

Kostas Pardalis 55:54
And we try not to cause the potential for better margins for the vendor.

Peter Chapman 56:00
So I want to zoom out a little bit here, because I actually think reselling infrastructure is a poor business decision. If you position yourself as an infrastructure reseller, your customers are always going to compare you to buying directly from a cloud provider or running it themselves, and it’s hard to find margins there. I think you want to find your margins on enterprise features and functionality. Right? Yes, you have to pay for infrastructure usage, but that’s not where you want to make money. That’s actually where you want to be competitive. And then you want to find the margins on feature-specific stuff.
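Peter’s suggestion to compete on infrastructure price and earn margin on enterprise features can be expressed as a blended gross margin across revenue lines. A minimal sketch in Python; the function name and all revenue and cost figures are hypothetical illustrations, not data from any company mentioned here.

```python
def blended_margin(revenue_lines: dict[str, tuple[float, float]]) -> float:
    """Overall gross margin across revenue lines.

    revenue_lines maps a line name to (revenue, cost_of_revenue).
    """
    total_revenue = sum(rev for rev, _ in revenue_lines.values())
    total_cost = sum(cost for _, cost in revenue_lines.values())
    if total_revenue <= 0:
        raise ValueError("total revenue must be positive")
    return (total_revenue - total_cost) / total_revenue


# Hypothetical figures. Infrastructure is priced close to cost to stay competitive
# with buying from a cloud provider; enterprise features carry most of the margin.
infra_only = blended_margin({"infrastructure": (100.0, 95.0)})
with_features = blended_margin({
    "infrastructure": (100.0, 95.0),
    "enterprise_features": (40.0, 4.0),
})
```

In this made-up example, reselling infrastructure alone yields a 5% margin, while adding a modest enterprise-feature line lifts the blended margin to roughly 29%, even though the infrastructure price never moves.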

Kostas Pardalis 56:55
That’s interesting. Okay, we need another episode to go through that; I have a lot of questions about it. Eric, back to you.

Eric Dodds 57:02
Yeah, well, we’re actually here at the buzzer. Brooks is telling us that we have used all of our allotted time, Kostas, which is par for the course. Peter, this has been such an awesome show. We’ve learned so much, and we’d love to have you back to cover several topics that we didn’t get to. But thanks for giving us some of your time today.

Peter Chapman 57:24
My pleasure. Thanks for having me.

Eric Dodds 57:26
We hope you enjoyed this episode of The Data Stack Show. Be sure to subscribe to your favorite podcast app to get notified about new episodes every week. We’d also love your feedback. You can email me, Eric Dodds, at eric@datastackshow.com. That’s E-R-I-C at datastackshow.com. The show is brought to you by RudderStack, the CDP for developers. Learn how to build a CDP on your data warehouse at RudderStack.com.