This week on The Data Stack Show, Eric and Kostas chat with Emilie Schario, a data strategist at Amplify Partners. During the episode, Emilie discusses what your first data team hire should be, why data analysts are needed, and helps define various roles in data.
Highlights from this week’s conversation include:
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Eric Dodds 0:05
Welcome to The Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You’ll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, the CDP for developers. You can learn more at RudderStack.com.
Brooks Patterson 0:24
Hey, The Data Stack Show listeners. Brooks here. Usually, I’m behind the scenes keeping things rolling for the show. But today, I’m coming out of hiding to share some exciting news. We have another live show coming up and we want you to join us for the recording. This time, we’re bringing back Tristan from Continual and Willem from Tecton to talk about the future of machine learning. We’ll record the show on August 10, at two o’clock Eastern, 11 o’clock Pacific. So mark your calendars and visit datastackshow.com/live to register today.
Eric Dodds 1:01
Welcome back to The Data Stack Show. Kostas, we had Paige from Netlify on the show a while back. And both you and I walked away from the show sort of enamored with the way that the data team seemed to operate: how efficient it was, their structure. There were just a number of things where we sort of said, man, that feels like best in class. And today, we get to talk with Emilie, who really helped architect that team. And my burning question is this: she was at GitLab before Netlify, during a hypergrowth phase. They added 1,000 employees in a little over two years. And she was on the data team, and then sort of went on to do a number of things beyond that. But my sense, my hypothesis, is that she actually took a lot of those lessons from the experience at GitLab, that sort of exponential growth and the exponential nature of data, and that really informed a lot of how she built what is now the preeminent Netlify data team. So I would love the backstory on that. That’s what I’m gonna ask. How about you?
Kostas Pardalis 2:09
Yeah, I’d love to chat with her about the role of the data organization inside the company, and try to understand a little bit more about the people who work under this organization, like data engineers and data analysts, and actually figure out who all these people are that should be members of this organization. And I think we have the right person to have this exact conversation today. So let’s go and do it.
Eric Dodds 2:37
Let’s do it. Emilie, welcome to The Data Stack Show. We have been so excited to have you on the show, and we have so many things to talk about. But let’s start where we always start: give us a history of your work in data, and more if you would like, and what you’re doing today.
Emilie Schario 2:56
Thanks so much for having me, gentlemen. My name is Emilie Schario. I’m currently a data strategist in residence at Amplify Partners, which is an early-stage VC fund focused on dev tools, data tools, and cloud infrastructure. I was previously director of data at Netlify. Before that, I was the first data analyst at GitLab. I joined the company at 280 people, watched the company grow to over 1,300 in less than two and a half years, and spent my last year there as interim Chief of Staff to the CEO while we hired our chief of staff. I have had a couple of other data jobs along the way, but have spent most of my career in the modern data stack. I also help admin a community of data practitioners called Locally Optimistic, which is a wonderful place for people to just talk about data problems and how they think about them and what tooling looks like and all of that goodness that goes on when you have a mindshare. So that’s me. I live in Columbus, Georgia, which is a couple hours outside of Atlanta.
Eric Dodds 4:07
Very cool. Okay. First of all, Locally Optimistic: amazing. I could not recommend it enough. I’m sure a lot of our listeners are already involved. But if you’re not, absolutely check it out. Amazing community, we learned so much from it.
Let’s talk about GitLab. So this has been so interesting to me. When you’re part of a hypergrowth phase in a company like the one you were a part of, where let’s say you add 1,000 people in maybe a little over two years, almost every part of the organization is having to reinvent itself. What’s interesting to me is that when we think about using data to grow, it really touches every part of the organization. And so you were involved in the data practice as every part of the organization was reinventing itself. And I’m sure that the impact of that on the data team was significant. Can you just tell us what that was like? Where did it start? In your first couple of months on the job, what did the data team look like? And then I’d love to know, what were sort of the big milestones along the way in that hypergrowth period?
Emilie Schario 5:23
So when I joined the GitLab data team, we were three. Taylor Murphy was our manager, I think, was his title, but he led the team. And he reported directly to the CFO. And then Thomas La Piana was our data engineer. And I was our data analyst. Today, we would call what I was doing analytics engineering, but 2018 was both not that long ago and also a very, very long time ago on the data timeline, right? And I mention that we reported into the CFO because that meant that our priorities were financial. When the asks came to the data team, we were focused on finances, on FP&A, and on serving that part of the organization.
Eric Dodds 6:10
I asked about that because that caught me off guard right out of the gate. You don’t typically see that.
Emilie Schario 6:20
I don’t know that I would say you don’t typically see it. I think we see a couple of places where data teams start. Excuse me. So we see data teams generally start under finance if they’re enterprise-sales-led businesses, under growth if they’re a product-led growth business, and under product if they’re still trying to figure that out, right? Those are kind of the three buckets I would put data orgs in, in terms of origin. And then the question becomes: all right, this is where the data team started, how do we serve the whole company? There are different pressures at play in each of those contexts. So take a data team reporting into the finance org. For those who don’t know a ton about how you manage your profit and loss statement, finance generally falls under G&A, right, general and administrative. And the two other categories you care about there are sales and marketing, and R&D, research and development. As a crude, crude rule of thumb, product and engineering is gonna go under R&D, sales and marketing go under sales and marketing, and things like HR and the other things fall under G&A. And there is a big pressure for G&A to always be a tiny portion of your expenses. So if your data team is under finance, your data team is probably going to find themselves underfunded. There are ways to mitigate this. For example, one model that I have found to be very successful is what I call centrally reporting but locally prioritized. So your data team should report into your data org. But if Sally is going to spend all her time working on marketing, then her headcount should be funded by the marketing department. And if Adam is going to spend all of his time working on product, or growth, or whatever the department is, his headcount should be funded by that respective department.
The team that the data org reports up into should not drive their cost model, because we don’t want to handicap our team before it’s started. We should really think about how we serve the business, and then make sure we have the people, and let the headcount budgeting and allocation for that support our business goals.
Eric Dodds 8:57
Yep. Super interesting. It’s the classic: if your team is a cost line item, watch out, because when things get tight, they’re going to be coming for you. Okay, that is so helpful. Let me ask you a question.
Zoom out a little bit. There’s an old adage, “follow the money.” If you’re close to the money, you’re close to the problems and sort of the most important things. Do you think that is true and beneficial regardless of the headcount and cost-center considerations? How did being really close to the money influence the way that you thought about data for functions outside of finance, sort of sales, marketing, engineering even? Did having a financial lens on the work that you did influence the way that you thought about that?
Emilie Schario 9:47
I think having the CFO’s priorities drive our work is what really stuck out to me. So finance was definitely a piece of it. And in fact, like I said, 2018 is both not that long ago and a very long time ago. But ARR and MRR reporting and things like revenue retention and customer retention were a huge part of our early projects back then. I remember racking my brain, like, what are the edge cases of retention? And then at GitLab, not only do you have to think about it on a customer level, but customers roll up, right? And so if you sell to a company, and then they have a parent company, how do you think about how many customers you have? So there was a lot of complexity there. And I think having those projects as the foundation from which everything else we worked on came really drove how I personally understood the company. More than that, though, I think the CFO’s priorities driving the data team’s work meant that other parts of the organization felt underserved by data. And the consequence of that was that we saw miscellaneous data hires pop up in other parts of the org. And so that’s where this emphasis on separating headcount from reporting structure comes from: if you’re hiring data people, they should get a manager who understands their job, can help them with their career development, and can help shape their work independent of where their priorities are coming from.
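Editor’s sketch: the parent-company rollup Emilie describes can be illustrated with a few lines of Python. This is a minimal example under stated assumptions, not GitLab’s actual model; the account names, the `parent_id` field, and the ARR figures are all hypothetical. It shows how the customer count changes depending on whether you count paying accounts or roll them up to their ultimate parent.

```python
# Hypothetical account records: (account_id, parent_id or None, ARR in dollars).
accounts = [
    ("acme-us", "acme-global", 50_000),
    ("acme-eu", "acme-global", 30_000),
    ("acme-global", None, 0),
    ("initech", None, 20_000),
]


def ultimate_parent(account_id, parent_of):
    """Follow parent links until reaching an account with no known parent."""
    seen = set()
    while parent_of.get(account_id) is not None:
        if account_id in seen:
            raise ValueError(f"cycle in parent links at {account_id}")
        seen.add(account_id)
        account_id = parent_of[account_id]
    return account_id


def arr_by_customer(accounts):
    """Sum ARR per ultimate parent, treating each parent as one customer."""
    parent_of = {acct: parent for acct, parent, _ in accounts}
    totals = {}
    for acct, _, arr in accounts:
        root = ultimate_parent(acct, parent_of)
        totals[root] = totals.get(root, 0) + arr
    return totals


rolled_up = arr_by_customer(accounts)
paying_accounts = sum(1 for _, _, arr in accounts if arr > 0)
paying_customers = sum(1 for arr in rolled_up.values() if arr > 0)
print(paying_accounts, paying_customers, rolled_up)
```

With this sample data, the two paying Acme subsidiaries collapse into a single customer once rolled up, which is the kind of definitional choice that retention metrics depend on.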
Eric Dodds 11:43
Yep. Yeah, that’s super interesting. One thing that you said that sticks out to me is just getting aligned on definitions, and having really sharp definitions is hard. But it sounds like that was a really big emphasis. Part of the hidden cost of that, in the way that things were structured, was that even though you had a really tight definition, the team wasn’t necessarily structured in a way that that tight definition actually served other teams and provided value to them by making things simpler and providing data. Fascinating. Okay, so because I know Kostas has a ton of questions, and I could keep going all day, we’d love to hear about maybe one or two more big things that you learned at GitLab, and then the main lessons that you took from GitLab to Netlify. We’ve had Netlify on the show, and I mean, amazing team, sort of preeminent in the data space for operating so well and being a model, and you are in many ways an architect behind that. So I’d love to know, what were the big things that you took from GitLab to Netlify?
Emilie Schario 12:53
Shout out to Paige, who spoke on the show. Paige is absolutely incredible. I think one thing that I realized was that I saw a lot of the consequences, pros and cons, of having data people spread out throughout the organization. So one thing that I really brought with me to Netlify when I joined was this centrally reporting, locally prioritized model. And so as the team grew, we were having other executives allocate their headcount to the data team, and we would have them fund them, but very much from a position of: okay, you’re going to establish a business partnership with that executive, and 80% of your time is allocated towards them, 10% to professional development, and 10% to technical debt and helping maintain our data infrastructure. And that was a rule of thumb that overall averaged out to be true. We found that to work really well.
Eric Dodds 13:58
Can I ask an org chart question here? I’m just thinking about our listeners who are managers and who are dealing with some of these things, because there are sort of two ways to do that. Your actual boss is someone in the data org, right, the leader of the data org, and then you have a dotted line to the functional area. Or the inverse, right, where you actually operate under the functional area. And that gets tricky with evaluations and a number of things. Paige helped us break some of that down. What do you think is the best way to do that?
Emilie Schario 14:34
I do think you should report into the data org. So that should be your direct line, with a dotted line to the rest of the business, or to whomever your business partner is. The thing that startup people hate is, like, matrix org might as well be a curse word, right? If you say that too loud, the startup people come after you with their pitchforks. And they’re like, no, we’re functional. And I’m like, data is cross-functional, try again. The only way a data org can truly be effective is if it’s a matrix org, because where data moves the needle for the company isn’t when you’ve just got product data talking to product data, or when you’ve just got sales data talking to sales data, right? It’s when these things come together, when you’re looking at data that originates from Salesforce and you’re enriching it with product analytics to understand what drives conversion, or what are predictors of upsell. Those are the things where the data team really moves the needle for the business. And so if we think we’re going to have functional data organizations, we’re going to have 12 data teams within the company. So I frame it in this centrally reporting, locally prioritized way, because startup people are allergic to matrix organizations. But in High Output Management, the good old great Andy Grove talks about how matrix orgs are the way to go. And so that is what it is.
Eric Dodds 16:18
Yeah, I love it. Okay, one more question, then Kostas, I’m handing the mic to you for a long time. And this is just sort of a selfish question. At GitLab, who was the first team to make a rogue data hire?
Emilie Schario 16:31
You’re asking me to throw someone under the bus.
Eric Dodds 16:34
Oh, I am. That’s so— Okay, sorry. You don’t have to answer that. You’re totally right. I just had a suspicion that it was marketing because I work in marketing.
Emilie Schario 16:42
Yeah, that seems like the kind of thing you would do. Is that what you’re saying?
Eric Dodds 16:46
Yes. I’m not throwing someone under the bus. I’m trying to self-incriminate.
Kostas Pardalis 16:52
She did that rather well.
Eric Dodds 16:56
I’ve done this before. You don’t have to answer the question.
Emilie Schario 16:59
Here is what I will say. Very rarely do rogue data people show up with data titles. So they might have, like, “operations analyst,” or some variation of buzzwords that don’t mean anything. A visualization engineer, really...
Eric Dodds 17:21
Yes. My question is answered, and my guilt is laid out for all to see. So thank you so much. Okay, Kostas.
Kostas Pardalis 17:31
Oh, thank you. Very good. Thank you. So, all right, the two of you have been talking for quite a while now about data orgs, right? And I think it’s a great opportunity to start providing some definitions. So let’s start with what the data organization is. What’s the mission of a data organization in the company?
Emilie Schario 17:51
Good question. I say it’s on every data leader who comes into an org to make sure your team has a mission. Our mission at Netlify was pretty straightforward. And I reference it a lot when I think about my work today because, well, one, I wrote it. But two, I think it’s a really good example of having this thing to look towards and drive to. It’s not a North Star metric, but it is like the light at the end of the tunnel to point people to. So our data team mission at Netlify was that the data team exists to empower the entire organization to make the best decisions possible by providing accurate, timely, and useful insights. So it’s really about making the best decisions possible. In terms of what is a data org, that’s one of those touchy-feely, I-know-what-it-is-when-I-see-it sort of things. But I tend to think of it as: everyone in your company should be a data person. I asked someone once, what is your first data hire when you’re starting a company? And the pushback I got was, when you’re starting a company, all of your first hires have to be data people, because it doesn’t matter if they’re in marketing or sales or product, they just have to have this sort of data-driven-ness about them, if that’s the way you’re building the company. And I think about that a lot, because all of your people have to be data people throughout the company. It doesn’t matter what their job title is; some fuzzy combination of operations analyst is fine. But beyond that, the data team are the people who are managing your data stack and infrastructure, and whose goals are to use those tools specifically to help drive the best decision-making in the company. And so when we think about data teams, some people like to think of them or frame them as supportive functions. The data team doesn’t always roll out the next marketing campaign, but they make sure marketing has the information they need to roll out the best campaign they can.
So it’s a bit of a squishy answer, I don’t know that there’s one that I could give you, that would be better other than I know it when I see it.
Kostas Pardalis 20:14
Oh, it makes total sense. Okay, and then what would be your definition of, let’s say, a minimum viable data organization?
Emilie Schario 20:28
I think the easiest way to get started is probably a data warehouse, some off-the-shelf ETL tools, some reverse ETL or data activation, and some easy way to access that data, whether it’s a notebook or a BI tool or whatever it might be, or just a way to download CSVs to put them in a spreadsheet. I think the minimum viable data org’s goal needs to be putting data in a place where other teams can access it, where the rest of the company can access it. And that’s why, when I ran through that list, I mentioned specifically data activation, or reverse ETL, or operational analytics, whatever people are calling it these days. The general idea is: let’s take data that only exists in some systems and put it in other systems. Let’s democratize this data and make it accessible to other folks in our company. I think the most low-hanging, high-ROI work that a data team can tackle early on is to really give people access to the data that they need to work well.
Kostas Pardalis 21:49
Yeah, it makes total sense. In terms of roles, let’s say a company is considering starting a data org, right? What should be the first roles hired to start this organization? Where should we start from?
Emilie Schario 22:06
In 2022, if you’re getting started with, like, a Snowflake data warehouse, then you can get started with an analytics engineer who’s going to manage your full infrastructure pretty easily. You don’t need a DBA. You don’t need a lot of custom data engineering. We’re in a world where hiring is such a precious commodity, right? You talk to any engineering leader, and they’re like, hiring right now is so hard. And it is, because there’s much more demand for engineering time than there is engineering time. And if you’re making that calculus, it almost always makes sense to buy versus build. And so when we look at one of the big advantages of the modern data stack, it’s that you can go buy so many of the pieces and have everything up and running in an afternoon.
Kostas Pardalis 23:01
Yep. And, okay, my next question, because you used the term analytics engineer, right? And again, we will stick with definitions, because I think it’s important, and it really helps people to understand. We could be using terms, but we don’t spend, I think, enough time making the semantics around these terms well communicated, right? That’s important, especially for people who are out there considering what the next step of their career should be. So, okay, analytics engineer. What does this mean? What is an analytics engineer?
Emilie Schario 23:42
Good question. So I think of data team roles as falling into four buckets. And I call these the core four roles, because if you name it, it makes it marketing. That’s something you could ask Eric later, right? So, the core four roles. A data engineer moves data from outside of your ecosystem in. An analytics engineer works with data inside of your ecosystem. A data analyst focuses on surfacing insights to the business. A machine learning engineer builds and productionizes machine learning models. There are some wishy-squishy, soft gray boundaries here. Everyone needs to be able to push insights. Yes, everyone needs to be willing to do whatever it takes to solve the problem that’s in front of them, and that’s okay. That’s called working. The general idea is, if the bulk of your time is being spent in one of these categories, your job title should be reflective of that. And you’ll notice not included is data scientist, right? Because if you ask 10 people what a data scientist is, even if they all have that title, they will give you 10 different answers. I know because I’ve tried. And so I think that when job titles don’t mean anything, we should get rid of them, right? The language we use is so important. And so data engineers move data from outside of your ecosystem in, analytics engineers work with data within your ecosystem, data analysts focus on surfacing insights to the business, and machine learning engineers focus on building and productionizing machine learning models.
Kostas Pardalis 25:42
I love that. That’s very, very clear and precise. Thank you so much for that. The core four roles: I think you should do something with that. A lot of marketing can happen with that. So, all right. That’s great. So we started with the analytics engineer, right? And actually, that’s interesting, because I would have expected to hear from you that you start with a data analyst, to be honest, because I think that’s also probably the most common thing that companies do, especially if they are not, let’s say, a very engineering-driven company, right? I mean, one of the mistakes that we make many times is that we consider every company out there to be a tech startup in Silicon Valley, right? Like, we have way too many engineers, and that’s how we do things and how we think, and that’s not actually the reality out there. So let’s say you have a typical company that at some point wants to start leveraging the data that they have. And I would think that they would start with data analysts, right? But you said no, you shouldn’t do that; it’s better to start with an analytics engineer. And my question is: is this because, when you start with an analytics engineer, you kind of have a little bit of data engineering together with some capacity to do the actual analytics, so you have one role that bridges all the different needs that you have at that point?
Emilie Schario 27:21
Yeah, I’m gonna answer your question with a sad story, right? So what happens when a company hires a data analyst first? There’s no tech stack, there’s no data infrastructure. They are just gonna pull some spreadsheets from places and combine them and do some Google Sheets or Excel work, right? The business loves it. People have numbers, how exciting. There’s no automation underlying it, though. So every Monday they have to rerun the executive report for the Tuesday meeting. And then there’s another report that they build. And it’s really exciting. And it’s a monster spreadsheet. They’ve got historical revenue information for all time that needs to go into the sales VP meeting. So now every Wednesday they spend the whole day rerunning the sales VP revenue meeting spreadsheet for the Thursday meeting. And then Friday comes around, and they realize they spent half of their week rerunning spreadsheets and didn’t get anything else done. And this becomes a world where you have to continue to throw data people at the problem, because there’s no automation, there are no systems, there’s nothing that lets that data person scale. And so over and over, what we actually see is that when companies do this, one of two things happens. One, they get very frustrated with data and go back to the beginning. Or two, those people develop the technical skills to bring more engineering practices into their organization. An example I’d point you to is Claire Carroll. Claire Carroll is a product manager at Hex. She was previously the dbt Community Manager. And she’d tell you her career story is that she was an Excel person who stepped into a data analyst role at a company where there wasn’t a ton of engineering support for her. And she learned things like Git and the command line and dbt and SQL and all of that over time as she grew her career.
And the end result is that her influence in the company grew. But it’s unfair for us to say that the only way for our data analysts to be successful is if we force them to acquire more engineering skills, right? There is a fundamentally different skill set in surfacing insights and in being an analytics engineer, which is what Claire would tell you her career journey was. And so I think we set our data orgs and our data people up for failure if we hire the wrong role. So to answer your question, I think that hiring an analytics engineer early on is, in a lot of ways, the best of both worlds. You get a little bit of that more complex engineering skill set when that’s the solution you need. But you also get someone who’s very comfortable working with your data and communicating with stakeholders, and who is expected to also be able to surface insights to the business.
Kostas Pardalis 30:27
Okay, and then why do we need data analysts, or when do we start needing data analysts? Let’s say we start with analytics engineers, and then we get data engineers at some point to make sure that we automate the whole in and out of data. So when does the data analyst become a need for the team, for the data org?
Emilie Schario 30:51
When you’re building your data team from scratch, there are two models that I’ve seen be particularly successful. One is, you want to take a divide-and-conquer approach early. So you want to service, let’s say, four or five different functions of the business. In that case, you take the approach of hiring an analytics engineer who’s going to be a business partner to each of those, and they’re going to build out the core modeling, and they’re going to be responsible for the insights, right? This is a little bit hard for people to listen to and visualize, but hopefully they’ll indulge me. Think of a zero-to-one line, and think of our data infrastructure as having three parts: zero to 0.33 is moving data in, 0.33 to 0.66 is working with data in our ecosystem, and 0.66 to one is business insights. An analytics engineer’s mandate is not just that middle section, it’s the right two-thirds. And so we can focus on hiring them to do kind of the full-stack-ness of it. Or you focus on a particular part of the business, you have an analytics engineer focus on just that core modeling, and you bring in a data analyst who’s now going to really focus on it. A lot of which approach is best is specific to your business. How well do people already understand the data? What already exists? What numbers are people used to looking to? Are you being driven by a particular change agent in your organization who’s going to drive your priorities? And so it’s a little hard to come in and say, this is the right way to do it, right? There is no one right way. It’s a lot about the context of your organization. But understanding the trade-offs of each, I think, is a great way to understand and make the decisions within your organization.
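Editor’s sketch: the zero-to-one line Emilie describes can be written down as a small Python model to check which stages a given set of hires covers. This is only an illustration. The three stage boundaries and the analytics engineer’s “right two-thirds” come from the conversation; the spans assigned to the data engineer and the data analyst, and all of the names in the code, are assumptions made for the example.

```python
# Stage boundaries on the zero-to-one line described above.
STAGES = {
    "move data in": (0.0, 1 / 3),
    "work with data in the ecosystem": (1 / 3, 2 / 3),
    "surface business insights": (2 / 3, 1.0),
}

# Hypothetical role mandates as (start, end) spans on the same line.
MANDATES = {
    "data engineer": (0.0, 1 / 3),
    "analytics engineer (full stack)": (1 / 3, 1.0),
    "analytics engineer (core modeling only)": (1 / 3, 2 / 3),
    "data analyst": (2 / 3, 1.0),
}


def stages_covered(span):
    """Return the stages that lie entirely inside a role's span."""
    start, end = span
    return [name for name, (lo, hi) in STAGES.items() if start <= lo and hi <= end]


def uncovered(roles):
    """Return the stages that no role in the given team covers."""
    covered = {stage for role in roles for stage in stages_covered(MANDATES[role])}
    return [stage for stage in STAGES if stage not in covered]


# First model: a single full-stack analytics engineer.
print(uncovered(["analytics engineer (full stack)"]))
# Second model: core-modeling analytics engineer plus a data analyst.
print(uncovered(["analytics engineer (core modeling only)", "data analyst"]))
```

In both hiring models the only uncovered stage is moving data in, which matches the earlier point that off-the-shelf ETL tools can fill that gap before a data engineer is hired.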
Kostas Pardalis 32:58
Okay. We’ll get back to that a lot to understand. And have you ever seen, or do you have experience of, what the result is of starting a data org with data engineers?
Emilie Schario 33:11
No, I don’t think I have. I’m thinking about this. I’d like to think I have swayed enough people to avoid that scenario. But I don’t know, though. I know companies that have some weird titling going on that makes it hard to really tell who they hired, right? If you hire a BI engineer, what does that mean? If you hired a LookML developer, what does that mean? That is part of why I think it’s so important that we centralize on these core four roles: people should be able to see “data engineer” and have a good understanding of the skill set being asked of them.
Kostas Pardalis 34:03
Okay, the reason I asked that is because actually, I have an experience of that. That’s actually RudderStack. Now we’re starting to share a little bit of embarrassing inside information, Eric, but I think that’s fine.
Eric Dodds 34:19
I love it. Spicy takes on The Data Stack Show.
Kostas Pardalis 34:22
Yeah, but okay, RudderStack, I mean, started as a company where, because RudderStack itself is, okay, it’s all about pipelines, right? So it’s mainly people who work there that are systems engineers and data engineers. So we had data engineers. And when we had to start creating some kind of infrastructure to collect some data, we started with the data engineers, and the result is...
Emilie Schario 34:52
Wait! Can I guess?
Eric Dodds 34:54
Emilie Schario 34:55
You had a lot of data, but not a lot of insights.
Kostas Pardalis 34:58
Oh, yes, but I’ll give you a little bit of even more embarrassing information. And I would say that you end up with a Snowflake instance that has a database that’s named Eric DB.
Eric Dodds 35:14
You just put it out publicly.
Emilie Schario 35:18
Now I know why Eric hired his rogue data person.
Eric Dodds 35:22
That’s exactly right.
Emilie Schario 35:24
But isn’t that exactly it? If you don’t empower people with the data they need, they’re going to do whatever it takes to get it, if they’re good at their job, right? And so that is exactly the problem. But I think what I’ve seen is that engineers love to engineer. That’s why they’re engineers, right? And so they nerd out about CI and linters and all that kind of stuff. And don’t get me wrong, I too was an engineer once upon a time, a mediocre one, but an engineer nonetheless. I think one of my jobs as a manager has always been to help coach my team members. I’m like, I do not care. I mean, I care. Don’t get me wrong. I’m not here for tech debt. But I don’t care that much about how cool your engineering infrastructure is. I care about the impact we’re driving for the organization. And you cannot lose sight of that, no matter what you’re doing.
Eric Dodds 36:30
Yep. And I would say it’s interesting, Kostas, reflecting on that good old Eric DB. Like, the funny thing is, when that happened, it wasn’t a huge deal, right? And I think this is kind of what you’re talking about, Emilie: how do you keep sight of the longer-term goal? When we were building those analytics use cases, there were just a couple of questions we needed to answer. Right? Like, you just need to answer a couple of questions. And that seems so innocuous. And then you don’t realize that on the back side of things, you’re creating a lot of future tech debt, which can be dealt with. But there’s always a cost to that. Right? Like, okay, well, now you trade between insights and tech debt, right? And the more you choose insights, the more the tech debt grows, and you eventually have to pay the piper. And so it’s a very slippery slope, right? And it’s very easy to do that early on in a company, especially if you take the approach of, well, there’s an engineering solution to every problem, and we just need to answer a couple of questions. Kostas, this has been cathartic. I’m admitting all sorts of data crimes publicly, which is freeing.
Kostas Pardalis 37:48
This is actually a therapy session for you. That’s why we are doing it.
Emilie Schario 37:53
The Data Therapy Show coming soon.
Eric Dodds 37:55
Yes, that’s gonna be called Eric DB.
Kostas Pardalis 38:00
Yeah, well, that probably says something about how territorial you are when it comes to monitoring data. So I know, this is a deeper conversation that needs to happen offline.
Eric Dodds 38:12
It’s getting deep.
Emilie Schario 38:14
In retrospect, Eric, it sounds like you just needed a better code name. Like, rather than naming it Eric, you should have named it some Disney movie you watched recently to throw people off.
Eric Dodds 38:26
I didn’t name it. An engineer named it.
Emilie Schario 38:28
Mm-hmm. I worked at a place where our replica was called Jakku (which is a Star Wars reference for anyone who didn’t understand). But at the time, I did not understand. I had never seen Star Wars yet. I have today; things have been redeemed. But I remember thinking, Jakku, this is such a weird name. Why would anyone come up with this? And they’re like, Wow, you have so much to learn about being an engineer.
Kostas Pardalis 38:59
That’s so true. And something that comes up in a lot of the conversations I’ve had is that one of the most common crimes engineers commit is over-engineering. And that becomes extremely obvious in an early-stage startup, or when you start something from scratch. And that’s exactly what happened at RudderStack, right? At the end, we had way too much data. It was extremely hard to separate noise from signal there, because we went and asked the engineers, Okay, guys, we need data. Oh, sure. Wait, and you’ll see, we’ll give you all the data that you will ever need. And I think that’s quite important. And the lesson that I’ve learned, kind of the hard way, by transitioning from being a software engineer myself to other roles, is that over-engineering can be a really hard, let’s say, thing to deal with. Because more engineering does not mean a better solution.
Eric Dodds 40:08
Yeah. But I think the other thing is, and Emilie, I would love your thoughts on this, because I think one of the things that makes that difficult, Kostas, is that when we were going through the whole cycle, we were a really early-stage company, so scrappy. And so many times in that phase, your engineers are sort of your de facto data engineers, analytics engineers, sort of everything, right? And so you get this dynamic where you ask for something simple, and the kitchen sink is sort of thrown at it. And that’s really challenging, right? Because you only have so many hours in the day, and you’re still trying to figure out product-market fit and all this sort of stuff. Right. So Emilie, I would love your thoughts on that. Like, how do you mitigate that? Because I’m sure we have listeners who are in that environment where— Look at RudderStack. Things are great now. We have such a great analytics setup, but we certainly over-engineered things early on. We’re actually now using some of that data. So in many ways, like, Oh, I’m actually glad we did that; we probably should have done it a little bit differently, with a little bit more calculation. But I would think that’s really common, right? Like, that happens all the time to an engineer who’s building a product: engineering is sort of the hammer, and everything’s a nail. And so you throw that at data, and you end up with sort of over-engineered things. But Emilie, I would love your thoughts on that.
Emilie Schario 41:32
Yeah, I have seen it all the time. And one other way this manifests sometimes is engineers love their tooling. And so suddenly, they’re using Prometheus for their business metrics, right? I mean, it happens. People know the tooling they know. And it’s unfair, as a data practitioner, to assume that everyone is going to know the best practices of the modern data stack. There’s this great blog post by Vicki Boykis called You Don’t Need Kafka that specifically picks on WeWork a little bit, which I am a big fan of doing, as a big fan of following the—
Eric Dodds 42:16
We did a little bit in the show prep.
Emilie Schario 42:19
That’s good, yeah. And so I think about it as, like, there’s going to always be this natural, cool engineering tendency to want to over-engineer, and the thing that is going to pull us back on that is just this: what is the thing that drives the biggest impact? What is the simplest solution that drives the biggest impact? And something I use as my own anchor often is something I learned from the GitLab CEO. I had some problem in front of me and I was talking to him about it. I was like, I’m not really sure what to do next. And he said to me, what can you do that moves the needle that you can ship in the next hour? Not in the next day, not in the next week. In the next hour. And so that forces you to really think, like, what is the smallest change I can make that makes a difference to this problem? And I come back to that as, like, what can I ship in the next hour? And the Netlify team will tell you, I would ask them that all the time. Yeah. What’s the one-hour version of this? They’re like, One hour? That’s never enough time. I’m like, Yeah, but what is the one-hour version? And that helps you scope to focus on impact. So that’s part of it. The other is, as you grow a company, part of what you’re doing as you’re hiring is just filling out gaps in your org’s skill set. And so it’s okay that your engineers got started with the engineering tools that they had, and they gave Eric his own database, like, please go run. And part of what you do when you’re ready to hire a data leader is say, data leader, you have to accept this technical debt that’s already in place. At the time, we made a business decision around a trade-off between getting the information that we needed with the tooling and people we had in front of us versus doing it the right way and hiring a data person. Right? And I think that that’s an okay trade-off for companies to make. We just need to every once in a while look up and acknowledge that that’s the trade-off we’re making.
Kostas Pardalis 44:47
Makes total sense, I would say. By the way, is Eric DB still around, or has it been renamed?
Eric Dodds 44:57
Let’s cover that in a future episode. That’s part two of true crimes in data.
Kostas Pardalis 45:10
Alright, so every week we talk about the different roles, and the core four roles, as you said. And I’d like to ask you, can you help us identify, let’s say, the important traits that each one of these roles has? Let’s say, if you could create an archetype for a data engineer, or for an analytics engineer, what do you look for when you’re hiring for these roles? You mentioned at some point, for example, communication skills when we were talking about the data analyst.
Emilie Schario 45:47
So I don’t know that I have an archetype for each of those, but I will tell you something that has been core to my own hiring philosophy. So I grew up inside of a Dunkin’ Donuts. My mom has been working at Dunkin’ Donuts since 1999, so 20+ years now. And when I say I grew up inside of a Dunkin’ Donuts, I mean the emergency contact at school was the Dunkin’ Donuts down the street, because if anything happened, someone from the store could come pick me up. And I have these memories of my mom walking across the dining room and seeing a straw wrapper on the floor, and picking it up and putting it in her pocket. And I think about that, and I’m like, here my mom was the manager of the store. Someone was going to clean the dining room at some point in the next hour. But she saw a problem and she just kind of fixed it. Right? And she didn’t solve it with the perfect solution. It’s not like she took the straw wrapper and put it in the trash; she just put it in her pocket for later. Yes. So the next time she was at a trash can, she dumped it. So I look for that quality when hiring. And another way to frame it is floor sweepers: people who are gonna see the mess in front of them and clean it up. Or if they see that the trash is full, they’re gonna take out the trash, right? When I’m hiring, I don’t want people who are like, this is my job, this is the boundary, this is what I do, and that’s that. I want people who are going to see a problem and fix a problem. They’re driven and they’re taking initiative; they don’t need the mandates issued to them. And I care much more about that than I do about specific technologies you’ve worked with, or companies you worked at, or your education background. If you are a floor sweeper, then I can teach you all the rest. But I can’t teach you to be the kind of person who walks by the straw wrapper and puts it in your pocket.
Kostas Pardalis 47:52
Okay, that’s some amazing advice for hiring in general, I would say, like for any hires. Okay, one last question from my side, and then I’ll give the microphone back to Eric. All these roles we are talking about, they’re pretty new, right? I would assume that most universities out there probably don’t even mention data engineering, or analytics engineering, or the rest of the roles that we talked about. What are the paths that people can take, especially younger people who are looking right now to figure out what to do with their careers, if they want to, let’s say, become a data engineer, or an analytics engineer, or a junior ML engineer? So what are the paths there? And do they ever cross?
Emilie Schario 48:46
This is a hard question, because I don’t know anyone who works in this space who would tell you their college education was particularly relevant. And some of the best folks in data that I have ever worked with never went to college at all. And so there’s a little bit of, like, we’re gonna see how things shake out and what the next decade looks like. But today, I don’t look at educational backgrounds when I’m hiring, and I make sure that education isn’t a prerequisite for any of the roles that I’m involved in hiring for. Like, it shouldn’t be; it doesn’t necessarily move the needle. If you’re doing advanced statistical research, then you probably need a Ph.D. But otherwise, if you’re trying to calculate ARR and MRR, your education background doesn’t really matter.
Kostas Pardalis 49:42
Makes sense. All right, Eric. All yours.
Eric Dodds 49:46
Okay. We are at the buzzer, unfortunately. So I have one last question, but this may be the most important question of the entire show. What’s your favorite kind of donut?
Emilie Schario 49:58
All of them.
Eric Dodds 49:59
I knew that was going to be difficult, because it’s a difficult question for almost everybody. Some people know, but, like, it’s hard to choose.
Emilie Schario 50:05
Yeah, it’s definitely, like, a mood-influenced thing. Last night, a commercial came on for an IT hat, but in the commercial was, like, a strawberry frosted with sprinkles. And I turned to my husband and I was like, strawberry frosted looks so good right now. And so sometimes it’s the strawberry frosted. Sometimes it’s a blueberry cake. Sometimes it’s a chocolate glazed. Sometimes it’s a Boston Cream. Sometimes it’s a fresh French cruller that’s still kind of warm, and it’s still, like, not set in. All of the above is the correct answer. Yeah.
Eric Dodds 50:38
Love it. Love it. Well, Emilie, this has been such a fun time in the show. We’ve learned so much. Thank you for sharing your time. And your insights. I know, it’s been helpful for us and our listeners, and we’d love to have you back sometime.
Emilie Schario 50:50
Thanks. I’m here and ready.
Eric Dodds 50:52
Kostas, first of all, Emilie just seems like such a fun person. I had a great time on that show laughing. And honestly, I feel way lighter for some reason, as a person, after getting it out there that Eric DB is something that exists. But in all seriousness, I think one of my biggest takeaways was her recommendation on the order of hiring and the impact that can have. And obviously, I mean, it’s funny to talk about Eric DB. But that’s sort of exhibit A of what can happen when you have an engineering-first approach to data without necessarily trying to establish the underlying questions around value in the organization, which sounds funny to say, almost, right? Like, when you say an engineering-first approach to data, I mean, that sounds very natural. And in some ways, it sounds like the correct thing. But when you’re thinking about how to build a team, it may not be. And so that’s going to stick with me. I’m going to think about that a lot this week.
Kostas Pardalis 52:00
Yeah, I totally agree with you. I think we had, again, a great opportunity to talk about, first of all, the role of the data team, the data organization, inside the company, but also the part where we discussed engineers and over-engineering. I think it was a great opportunity to see what happens many times to our customers when we, as engineers, actually over-engineer the solutions that we provide to them, right? And data is probably one of these cases where the customers are internal, right? And we can, let’s say, get exposed to what it means to over-engineer something without having very clear business objectives there. And so that’s what I’m keeping from this conversation, because it’s kind of a realization that I also made during this conversation. And it’s super, super valuable, on top of, obviously, all the rest of the conversation that we had with her about the roles, the organizations, and how to position the data organization inside the company and grow it. It was a very, very insightful conversation.
Eric Dodds 53:21
I agree. All right. Well, thanks for listening to The Data Stack Show. Tell a friend about it if you haven’t. We love new listeners, and we will catch you on the next one.
We hope you enjoyed this episode of The Data Stack Show. Be sure to subscribe on your favorite podcast app to get notified about new episodes every week. We’d also love your feedback. You can email me, Eric Dodds, at firstname.lastname@example.org. That’s E-R-I-C at datastackshow.com. The show is brought to you by RudderStack, the CDP for developers. Learn how to build a CDP on your data warehouse at RudderStack.com.