Episode 73:

What a High Performing Data Team (and Stack) Looks Like with Paige Berry of Netlify

February 2, 2022

This week on The Data Stack Show, Eric and Kostas chat with Paige Berry, a staff data analyst at Netlify. During the episode, Paige discusses sharing insights, vague terminology, data team management, and more.


Notes:


Highlights from this week’s conversation include:

  • Paige’s career path (2:44)
  • Paige’s role and responsibilities at Netlify (6:38)
  • Sharing data insights (8:55)
  • Scope in the context of delivering an insight (12:39)
  • Defining “insight” (15:10)
  • Where the client journey begins (16:43)
  • Miscommunication because of vague terminology (20:06)
  • Netlify’s internal knowledge repository (23:01)
  • Breaking down Netlify’s hub and spoke model (30:45)
  • What data tools to use and when (35:21)
  • The metric layer and BI (44:17)
  • Next steps in the data space (49:42)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Transcription:

Automated transcription – may contain errors

Eric Dodds 00:06
Welcome to The Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You'll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, the CDP for developers. You can learn more at RudderStack.com. Welcome back to The Data Stack Show. Today we are going to talk with Paige, who is a Staff Data Analyst at Netlify. Now, she's also had engineering jobs, and we're going to hear about that, maybe even throw in a little story about writing pipelines in Perl, which will be very exciting. But I'm interested to hear about the other side of the data stack as it exists in the people who are working with the data and delivering it. We talk all the time about the technology, the data itself, and the tooling around it; Paige bridges the gap between the technology and the infrastructure, and the processing and delivery of actual data products in the form of analytics for a team. So I'm just really excited to hear about that, because we don't get to talk about it very often. How about you, Kostas?

Kostas Pardalis 01:21
Yeah, first of all, I'm very, very interested to hear what it's like to work at Netlify. It's a product that I also use for my personal blog, and it has a very different way of approaching infrastructure, so I'm a bit biased. Usually, when you see companies that innovate a lot and bring something very new, that is also reflected in the practices they have, like how the organization is run and the freedom given to the people there. So yeah, I'm very happy that we have Paige with us today. I think we are going to have a very interesting conversation, both technical, about technology, but also about the organization, the teams, how we work together, how we communicate analytics and data and all that stuff. So I'm really looking forward to this conversation.

Eric Dodds 02:21
Well, let’s dive in and talk with Paige. Paige, welcome to the Data Stack Show. We’re so excited to chat with you.

Paige Berry 02:28
Yay. Thank you so much, Eric and Kostas. I’m really excited to be here.

Eric Dodds 02:32
We always start in the same place, which is where you tell us about your career path and what led you to what you’re doing at Netlify.

Paige Berry 02:42
Alright, cool. Well, my career path is not super straightforward, but it all has a theme of data, and it all has a theme of learning on the job. So my first job with computers was the IT help desk at the university where I worked. I got the job even though I didn't really know much about computers, because I answered every interview question with, "I don't know, but I can learn," and that's been kind of the theme of my career. After that, I worked in a role where I got to administer databases and write reports; I taught myself SQL for that job. It was really fun. Then I worked in college administration at Reed College, where I got to do some programming in C and Oracle PL/SQL. And I built an ETL pipeline in Perl, which I don't recommend, but it did work.

Kostas Pardalis 03:34
That’s impressive.

Paige Berry 03:36
It was a really fun challenge.

Kostas Pardalis 03:42
You should keep that.

Paige Berry 03:44
I probably do have it saved somewhere actually, on an external drive. I should frame it. Yes.

Kostas Pardalis 03:52
Or create an NFT out of it.

Paige Berry 03:54
Oh, there you go. Now that is a fun idea. So yeah, after that I worked at New Relic, a company here in Portland where I live, as a software engineer, where I got to learn Git and processes for collaborating on building a code base. But what I realized during that job is that I still loved playing with data most of all, and that's been the case at all of these positions. So I moved into a role as a data analyst, first for the support organization and then for product, where I got to work with things like Redshift and Snowflake and Airflow, write code in Python, and learn what dbt was. And then I also got to learn a lot about hiring practices, because both of our teams doubled in size after I joined. So I got to help with the hiring process and with building a team culture, and got some good practice in how to teach an organization to be more data informed, which was really interesting. All of that's been really helpful for me as I joined Netlify in March 2021 as a staff data analyst. So, yeah, that's the career journey.

Eric Dodds 05:03
Very cool. Okay, before we talk about some of the specifics of what you do at Netlify, I have to ask about the IT desk. Is there like a weird computer issue that a student brought in with their laptop that sticks out where you’re just like, “That was so bizarre. What happened here?”

Paige Berry 05:23
Boy, it has been a lot of years, so I'm trying to think. I don't think anything's really jumping out. What I remember most about that job, though, was that for the first few weeks, what I would have to do was go to the office of the professor who was having a problem with their computer, and I'd sit down at their computer and they'd say, "The computer is broken in this way. It's doing this," and I'd say, "Great," and then I'd pick up the phone and call my manager and say, "They said, blah, blah, blah. What do I do?" And he would walk me through how to troubleshoot and solve the problem. That was a lot of how I learned how to fix computers.

Eric Dodds 05:59
That actually is a good background for working with data and troubleshooting software engineering; that's a great way to learn to break down the process and all that. I'm sure there were just a lot of restarts to fix whatever.

Paige Berry 06:11
Yes. "Have you tried turning it off and on again?" And there were plenty of unplugged cables as well.

Eric Dodds 06:19
Oh, man.

Paige Berry 06:20
Plug it in. It might work. Plug it in.

Eric Dodds 06:23
If only data pipelines were like that. Okay, so what does your day-to-day job look like at Netlify? You probably do a lot of different things, but just help us understand what your role is and your responsibilities and team and all that.

Paige Berry 06:36
Sure. Yeah. At Netlify, we have a data team that follows a kind of hub and spoke model. We've got a few of us who do the hub work, and then we have other people on the team who work with spokes, so different business partners. We've got someone who works with finance, a couple folks who work with product, someone who works with growth, etc. I'm a member of the hub team. So that means I pick up tasks and projects that maybe don't fall under one of the spokes, or I'll pick up some spoke work if the folks on that spoke have more than they can get done. I also get to spend a good amount of time doing proactive insights work, so looking through the data that I've been working with. If I'm doing a task for a stakeholder who's requested data and I say, "Boy, there's something interesting here," I get to finish the task, then dive in and actually see what's going on with that data. I get to do a good amount of that work. And then there's also work around team building and culture building, things like that. If we're doing hiring, I'll help with that part of the process. So it's a really good mix of data analysis and exploration. I'll do analytics engineering if I've got to build some data models to get the analysis done, even some data engineering, bringing in data from new sources. I get to do a little of all of that, which is great.

Eric Dodds 08:13
So fun. That sounds like a really fun job. Okay, we definitely have some questions to ask about the team structure and hub and spoke. But first, you wrote a post a while back around some of the ways that you share insights that come from data. I love your description of the freedom to go in and explore, but I think it's really interesting how you've taken the results of those types of explorations and syndicated them to the team. So can you tell us how you do that? I think our audience would just love to hear it, because I certainly learned a lot reading the post.

Paige Berry 08:52
Oh, awesome. Thanks. Yes, one of the interesting things about when you're searching through data is that when you come up with an insight, it doesn't really do much unless other people hear about it. So figuring out the best way to get that information shared has been a fun thing to work on at Netlify. Our norm for communication is using Slack, so we have a Slack channel that is dedicated to sharing data insights. And when we've discovered something that's interesting in the data, it usually helps to figure out, okay, what are the things that are really important about this insight? If somebody's got five seconds or 10 seconds and all they can do is skim the next post in the channel, what do I want to make sure they're seeing? So that helps to really clarify, okay, this is what's interesting about this data, and to get it into a few bullet points, even bolding some of the numbers and words, so that if all they read are those bolded numbers and words, they understand the, like, the too...

Eric Dodds 10:04
too long can’t read like,

Paige Berry 10:06
The punch line, exactly. Exactly. And it always helps to add at least one chart, because this can be really visual stuff, which is super helpful. If all someone's going to do is look at a chart, they can see just from that, oh, there's something intriguing here in this picture. And maybe that's all they're going to look at, but they know there's something here. And then, in the back of their mind, they can remember it. If they're working on some project a couple of weeks later, it's like, oh, there was an insight about this data; I'm going to go back and read that, because it might inform the decision we're making right now.
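
Paige's formatting approach, with the key figures bolded so a skimmer still catches the punch line, is easy to sketch in code. This is a purely hypothetical illustration (the function, emoji, and example numbers are invented, not Netlify's actual tooling), using Slack's mrkdwn convention of `*bold*`:

```python
def format_insight(title, bullets, highlight):
    """Format an insight post in Slack mrkdwn, bolding the key phrases.

    `highlight` lists the phrases to bold so that someone who only skims
    the bolded words still gets the punch line.
    """
    lines = [f":bar_chart: *{title}*"]
    for bullet in bullets:
        for phrase in highlight:
            bullet = bullet.replace(phrase, f"*{phrase}*")
        lines.append(f"• {bullet}")
    return "\n".join(lines)

post = format_insight(
    "Weekly signups insight",
    ["Signups grew 12% week over week",
     "Most growth came from team accounts"],
    highlight=["12%", "team accounts"],
)
print(post)
```

A real post would also attach the chart Paige mentions; the text formatting is just the "five-second skim" layer.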

Eric Dodds 10:44
Yeah, I love that. I love thinking about insights as assets that you collect over time that can inform future work, right? Because so many times we produce data reports or whatever, and we just basically throw them away. My Google Sheets is a nightmare of ad hoc analyses.

Paige Berry 11:03
For sure. Yeah, exactly. We keep these also in an insights feed. We have an internal handbook at Netlify, and the insights feed is a part of that handbook. So it really is like an internal blog; people can go and read through it. If someone's been on vacation for a few weeks, they can come back and say, okay, what has the data team posted since I've been out? and catch up on all of the insights in the feed as well.

Eric Dodds 11:43
One quick question for you there. As we think about our audience, and all of us as people who are on some level trying to deliver data products, whether that's infrastructure or insights: one of the challenging things when you're delivering insights is the scope and the context, because if you're seeing a small slice of something, not having the larger context can be challenging. How do you think about scope in the context of delivering an insight, especially when you think about the too-long-can't-read side of it? And I'm asking that selfishly, because I'm trying to do a better job of this in my own role. If you just share a number without any context, which happens to me sometimes, it's like, that's interesting, but I don't quite get it, if that makes sense.

Paige Berry 12:36
Absolutely. That makes a lot of sense. Part of the process that we go through when we are doing a proactive insight is actually captured in a GitHub issue. As a data team, we capture all of our issues, all of the work we do, in GitHub; we operate kind of similar to a software engineering team in that way. And when one of us is going to start a proactive insight, we open an issue and begin with the question: what's prompted this exploration? What are we curious about? What has happened recently that made us think, oh, this would be really important to look at? And all that context is at the beginning of the issue. Then, as we do the exploration, we add comments to the issue for every step. That includes the SQL code we're running, some of the results from that, intermediate charts that show part of what we're trying to get at. So we're taking someone on the journey of this exploration with us through this issue. And certainly not everyone's going to have time to read that, but it helps a lot when we're then going to write the post about the insight, to go back to that beginning and include the "I did this insight to answer this question, and this was the reason why I was looking into this, and then this is what I found." That's often how we can incorporate some of that context to let people know why this is actually important to the business.
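
The workflow Paige describes, an issue that opens with the prompting question and accumulates one comment per exploration step, can be sketched abstractly. Everything here is hypothetical: the function names, the example question, and the SQL are invented for illustration only.

```python
def new_insight_issue(question, context):
    """Open the exploration with the prompting question and its context."""
    return {"title": f"Proactive insight: {question}",
            "body": f"**What prompted this?** {context}",
            "comments": []}

def log_step(issue, sql, finding):
    """Append one comment per step: the SQL that was run and what it showed."""
    issue["comments"].append({"sql": sql, "finding": finding})

issue = new_insight_issue(
    "Why did this product error drop off?",
    "An error that used to appear daily stopped showing up in the data.",
)
log_step(issue,
         "SELECT day, COUNT(*) FROM errors GROUP BY day ORDER BY day",
         "Error counts fall to zero partway through the period.")
print(issue["title"])
```

The payoff Paige notes is that writing the final post is mostly a matter of replaying this log: the opening question supplies the context and the step comments supply the evidence.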

Eric Dodds 14:16
That makes total sense. I think about the Mark Twain quote where he says, "I would have written you a shorter letter, but I didn't have the time," and hearing about that process makes total sense. If you've wrestled that question to the ground, you then have the ability to deliver it in a concise way that includes the context. Very cool. Okay, Kostas, I've been dominating the mic here. Please, I know you have questions brewing.

Kostas Pardalis 14:45
Thank you, Eric. My first question is pretty simple, but I want to ask: what is an insight for you and for your team? What constitutes an insight?

Paige Berry 14:57
Oh, that's a good one. I'm trying to think of a good example. I had one that I was working on because I noticed that there was this particular error happening in the product, and then it completely dropped off at one point. And that was my insight: something changed here, and I don't know what it is, but I'm really curious as a data person about why this would change. Getting to post that was really fun, because then people in our engineering org were able to come into the channel and say, "Oh, I know what it was. This software engineer applied this fix, and we thought it made a difference, but it's really cool to see in this chart the impact that that work had." That was a really fun one. And then there are definitely ones around connecting specific feature usage to specific outcomes. That's an area that I tend to have a lot of fun with, and so a lot of the insights I dig into tend to be really around that, like, when a team is using this, this is what tends to happen within the next three months. Those are really fun.

Kostas Pardalis 16:15
Well, that's very interesting. The way I hear it, as you described it, it feels like you're some kind of explorer. You're out there in the ocean, finding this island and that island, creating the map. But how do you start this exploration? What's the motivation, and what's the journey?

Paige Berry 16:38
Yeah, that's another really good question. A lot of the time it's initially prompted by a stakeholder request; we get requests from anyone across the company. And when we're looking at the data required to fulfill a request, we know it's about something that somebody is currently interested in working on, which is going to tie to our goals as a company. So that's a good initial pointer, I guess, to what could be important or interesting to know about. Usually, once the request is filled, you've already become acquainted with the data that matters for this particular thing people are curious about. And at that point, it's, okay, well, this is the data that people are asking about, let me dig further in here, or let me pull in this other data that can inform it in a way that is new. That's often how we get started on one of our insight explorations.

Kostas Pardalis 17:43
No, that's cool. One of the issues that I always have when it comes to data, and to being, let's say, a data-informed or data-driven organization, whatever you want to call it, is how people can agree upon the definition of the problem that we are trying to solve, or of the insight. That's why I also asked you about the definition of an insight. Because many times, even for things we take for granted, like what revenue is, or what cost is, stuff where we instinctively assume the definition is just there, it might actually be quite different. And usually you figure that out when you start working with data, because you have to be very precise with how you communicate. You're very experienced with that, because you have to work with many different people. So how big of a problem is it? Is it, first of all, actually a problem, or do I just think it's something important? And second, how do you deal with it?

Paige Berry 18:49
Just to make sure I'm understanding fully: the problem of making sure that the definitions of the metrics we're talking about are agreed upon? Is that where you're going?

Kostas Pardalis 19:00
You said that a stakeholder might come with a request, right? They express, I don't know, a problem that they have. And then you have to start working and digging into data to help them with that. But first, the two of you have to agree on what the problem is, right? So that's what I'm asking: how big of a problem is this? Maybe it's just my problem, to be honest.

Eric Dodds 19:28
I think of active users, like DAU. Would you say, Kostas, it's like, "Hey, could you tell me if our number of active users has gone up?" And it's like, okay, the number of questions that we have to ask to answer that... You could give almost any number back and it's probably not wrong; it just depends on the definition. Maybe I'm going down the wrong path, but that's what jumped to mind for me.

Kostas Pardalis 19:53
Yeah, no, that's exactly what I'm trying to say. So yeah, how do you go about it?

Paige Berry 20:00
Yeah, exactly. There's definitely a joke around that: somebody says, "Can you tell me how many active users we have?" and the answer is, "Well, that depends on what you mean by active users, or daily users." Every word needs to be unpacked to get the definition of what someone's looking for. That, I think, is a really big and interesting and gnarly and fun opportunity to figure out. And I feel like a lot of folks in the data profession are talking about different ways we can try to solve that, like defining metrics in a separate layer, and once we get a metric defined, that is the source of truth for what it means. Because that is definitely the part that I consider the most interesting, but it can also be the most time-consuming part of responding to a data request. I don't know that there are very many times in my career when somebody has asked me for data and I've been able to just go get it without coming back with at least one, if not several, rounds of questions to really understand: first of all, what's the problem you're trying to solve? What's the decision you're trying to make? Because there may actually be other things that I can give you that will help with that even more. But if it turns out that, yes, what you're asking for is exactly what you need, let's still figure out the definitions for the words you used, to make sure that what I'm telling the database is what you would tell it if you could.
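
The "what do you mean by active?" problem is concrete enough to demonstrate. In this toy sketch (the table, the events, and both definitions are invented for illustration), the same data yields different answers depending on which definition of "active user" you pick:

```python
import sqlite3

# A toy events table: the same rows produce different "active user" counts
# depending on the definition, which is why the definition must be agreed on.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, event TEXT, day TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [("a", "login", "2022-02-01"),
     ("b", "login", "2022-02-01"),
     ("b", "deploy", "2022-02-01"),
     ("c", "page_view", "2022-02-01")],
)

# Definition 1: anyone who triggered any event that day.
any_event = conn.execute(
    "SELECT COUNT(DISTINCT user_id) FROM events WHERE day = '2022-02-01'"
).fetchone()[0]

# Definition 2: only users who performed a 'core' action (here, a deploy).
core_action = conn.execute(
    "SELECT COUNT(DISTINCT user_id) FROM events "
    "WHERE day = '2022-02-01' AND event = 'deploy'"
).fetchone()[0]

print(any_event, core_action)  # 3 vs. 1: same data, different "active users"
```

Neither number is wrong; they answer different questions, which is exactly why the rounds of clarifying questions Paige describes are unavoidable.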

Kostas Pardalis 21:48
Yeah, that's interesting. I think we're going to see more products around that. Okay, we have the data catalogs, which come from the enterprise space, with products like Alation, but these are more on what I would call the syntactic level, not so much on the semantic level. Because, okay, we can agree on what each of the data fields that we have means, right? But going from that to the point where we build, let's say, institutional knowledge, like what a KPI is and how all these things connect together, I think there's a big gap between the two. And we need more of a knowledge catalog, probably, something like that. I wanted to ask you: do you feel that this internal wiki that you have acts as a knowledge repository for the organization, where the data team can communicate all this knowledge that is generated by doing all this analysis and creating all these insights?

Paige Berry 22:58
Yes. I really appreciate that you used that word. We kind of consider it a knowledge hub, and it isn't necessarily only the insights feed; there are a few things that fit into what we think of as that knowledge hub. So there's the insights feed. Then we use dbt, and we have our dbt docs, which are another aspect of knowledge, where our dbt models are all very well documented. And then we have another layer: we are using Transform, which is one of those metric repository kinds of products, where we are able to define a metric in Transform with an explanation around it. And then our stakeholders can run charts and even do some slicing by dimensions on those charts, knowing that the definition of the metric is something that the company has agreed on. So there doesn't have to be that question of, is this actually really daily active users? Because we know it is; we decided on that as an organization, and this is the definition in the SQL. You can even look at the SQL in Transform. So that helps with another piece of that knowledge. It is different places, but we kind of think of all of this together as different facets of knowledge about the data and the business.
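
The metrics-layer idea Paige describes can be reduced to a small sketch: the agreed-upon SQL for each metric lives in exactly one place, and everything else looks it up rather than rewriting it. This is an illustrative stand-in, not how Transform actually works, and the metric name, description, and SQL are invented.

```python
# One agreed-upon definition per metric; charts and ad hoc requests all
# resolve the metric name to the same SQL instead of re-deriving it.
METRICS = {
    "daily_active_users": {
        "description": "Distinct users with a deploy event on the given day",
        "sql": ("SELECT COUNT(DISTINCT user_id) FROM events "
                "WHERE day = :day AND event = 'deploy'"),
    },
}

def metric_sql(name):
    """Return the canonical SQL for a metric: the single source of truth."""
    return METRICS[name]["sql"]

print(metric_sql("daily_active_users"))
```

The point of the design is that when someone asks "is this really daily active users?", the answer is a lookup, not a debate.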

Kostas Pardalis 24:32
Is that repository something that's owned and managed by the analytics team, or by some other team in the organization?

Paige Berry 24:41
Yeah, it’s all owned and managed by us on the data team, so any of us are able to administer any of these systems, add to any of them, edit stuff in there. It’s all essentially a big part of our code base.

Kostas Pardalis 24:58
Oh, that’s super cool. Can you tell us a little bit more about like how you started doing that, like how you decided to structure this knowledge repository in this way? Whose idea was that to have like a wiki there, first of all?

Paige Berry 25:14
I'm trying to remember exactly who came up with that. I think it was a combination of our director, Emily Cerio, who was the director of data at Netlify when I started, and someone else on the team. I can't remember who; I'd have to look it up. But it was definitely others on the team who said, "Hey, we could do it this way," or "Let's put it here." Because we all know this is an issue, and we are all trying to think of ways to solve it as a team. So I'd have to look in the history of our Slack to see who exactly suggested which part.

Kostas Pardalis 25:52
That's nice. You also work a lot with building teams and building the right culture. How do you motivate a team, outside of running the analysis and working with the data, to also sit down and write the right content in order to communicate with the rest of the organization? How do you do that?

Paige Berry 26:18
Right. That's another interesting aspect of this. I think at first we weren't sure about it. Emily Cerio is the one who really encouraged us to start doing this as a data team, as a way to keep us from getting trapped into being a service organization and only ever being reactive. When she first brought it up, I think at first we were all like, oh, this is different, this is new, I haven't really done this before. How am I going to make time for it? Is this really going to be helpful? And with her encouragement, we went ahead and started trying it. And the response was so awesome. There were really great conversations in the Slack threads every time we would post an insight; there would be questions that came out of it, and other people in the company would have conversations among themselves in the threads, we could see. The benefit it was bringing, the way people were thinking about data in these new ways, was so clear that it made it easier and easier to make time to do these and get them posted. It's part of our weekly data team meeting, where we talk about the insights we did the week before and plan who's going to post their insight post when, to make sure that we don't have them all clumping up on a Monday or something. And it's become part of what we do as a team. There are several of us who really try to pay attention, like, hey, we're not doing as many this week, maybe people got busy, let's make sure that we pick that up next week. We have someone on the team who is really good at reminding us of that, Adam Stone; he's wonderful. We even have a Slack bot in our team channel that will ping on, like, Tuesdays and say, "Hey, how's everyone doing on their insights posts? Reply in the thread here if you need any help with yours." So yeah, it's really become something that we've tried to weave through the whole fabric of the data team and our processes.
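
A nudge bot like the one Paige mentions is simple at its core: a scheduled job checks the day of the week and, when it matches, posts a reminder. This sketch is hypothetical (the schedule, message text, and function names are invented; a real version would send the message via a Slack API call rather than printing it):

```python
import datetime

def should_remind(today):
    """Fire the weekly insights reminder on Tuesdays (Monday == 0)."""
    return today.weekday() == 1

def reminder_text():
    return ("Hey, how's everyone doing on your insights posts? "
            "Reply in the thread here if you need any help with yours.")

# A daily scheduled job (cron, Airflow, etc.) would run this check and,
# when it passes, send reminder_text() to the team channel.
if should_remind(datetime.date(2022, 2, 1)):  # 2022-02-01 fell on a Tuesday
    print(reminder_text())
```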

Kostas Pardalis 28:32
That’s amazing.

Eric Dodds 28:33
One question, and this is really practical, but I think about the service organization concept you mentioned. Having a background in marketing, I know the pain of inadvertently becoming a unit that just answers questions and tries to do it as quickly as you can, and you're not actually pushing value back, which can be a tough place to be. One thing that happens when you start to push value back into other parts of the organization, especially if you can build a repository, is that the number of questions you get starts to decrease, because you can refer people. It's like, "That's a great question. It was answered; go check this out," or whatever. You kind of train people to think, "Maybe I should go search that repository before I ask the question," or make a request or whatever. Is that dynamic happening inside of Netlify?

Paige Berry 29:34
Yes. And it's so exciting. It's another piece of this that is so valuable. We make sure that it's part of the company onboarding, so people learn about it from their very first week. And it's been so exciting to see questions come up in our public team channel and have other people in the organization come in with the answer, because of things that we've posted earlier. It's like, yes, we're all in this together. This is something we all get to do. It's so fun.

Eric Dodds 30:11
That's great. I want to talk a little bit more about the hub and spoke model and the different roles that are on the data team. I think this is a really interesting subject, and I think our listeners will really appreciate understanding some of the learnings that you have from the hub and spoke model. So can you just give us a quick breakdown of who's in the hub, who's in the spokes, and how those roles differ, especially on the spokes side of things?

Paige Berry 30:40
Sure, yes. Okay. In the hub, we have our data engineer, an analytics engineer, me, a data analyst, and then our data evangelist. I know, it's another one of those titles. That's Laurie Voss; he does a lot of speaking at conferences. He did the Jamstack survey and has been posting a lot about the results. So it's a fun role. So there are four of us in the hub. And then we've got two data analysts working on the product spoke, we've got an analytics engineer working on the growth spoke, and we've got another analytics engineer working on the finance spoke, and a data analyst working on finance. I think that's it. I need to sort of count on my fingers to see if I got it all. What was the other part of your question?

Eric Dodds 31:45
Well, you answered it, actually. It was the roles in the spokes. So one other question there, just because we’re getting back to definition of terms here, we’re coming full circle in the conversation. It’s interesting to me, and this may just be semantics and role names, you only mentioned one data engineer, which makes me interested in the scope of skills and responsibility of an analytics engineer working in a specific business function. Do they encompass some data engineering type tasks and some analytics tasks as well?

Paige Berry 32:24
That's correct. That's exactly right. All of us on the team are able to do work across the board. Those of us who are analysts would probably need some of our data engineer's assistance with some of the data engineering pieces, but the analytics engineers can absolutely do really any part of that stack of work. So it works fine to have just one person in that role supporting a spoke, because they really can do what is needed.

Eric Dodds 32:57
Very cool. As we’ve heard about different team structures and stuff, those people are extremely hard to find, right? Because of those two skill sets, you need someone who can really handle the technical side of it and the engineering side of it, and also the analytic side. But it makes total sense, like that’s such a high value role to be embedded in another team. I love that we’re getting to like data team management stuff here. How do the embedded analytics engineers… Who’s their boss? I guess. I couldn’t think of the right business buzzword.

Paige Berry 33:35
No worries. Yeah, exactly. Before Emily left, she managed the entire data team, so she was the manager for everyone, hub and spoke. And the person who is, say, the analytics engineer on a spoke would get their prioritization and their work tasks mainly from a specific stakeholder on their spoke team. At the moment, we’re actually hiring for a director. So while we’re in this position, we have folks on the spokes being managed, as well as given prioritization and tasks, by the person they work with on their spoke. And those of us in the hub have an interim manager. But the idea is that when we have a Director of Data again, they will become the manager for the whole team.

Kostas Pardalis 34:32
This is great. Let’s focus a little bit more on the technology side of things, because I think we’ve covered the organizational side. This was a super interesting conversation. We don’t have these kinds of conversations too often; we focus more on the stacks and the technologies people use, so that was amazing. You mentioned that you wrote your first ETL pipeline in Perl. Do you still do that?

Paige Berry 35:01
I do not. That was, goodness, that was a while ago, maybe 2017. It was a little while ago.

Kostas Pardalis 35:10
What kind of tools are you using? You mentioned some, for example, dbt, and Explorer, which is like a metric repository. What kind of tools are you using today?

Paige Berry 35:20
Yeah, we use dbt. Transform is our metric store. We use Snowflake as our data warehouse, and Airflow. I’m trying to think what else. And then we also use Fivetran. We use Census for reverse ETL, or operational analytics. I think that’s mainly it. In terms of how we do our work, we use GitHub for our code base and such. What else? Are there any areas of technology that you’re curious about that I haven’t mentioned?

Kostas Pardalis 36:00
I’d love to learn more about Explorer, because metric repositories are something that we haven’t touched on that much. Before we get there, I’m wondering: you mentioned Fivetran, for example, which is, of course, helping you with your pipelines. Do you also build your own pipelines? Or do you rely only on Fivetran to move the data around?

Paige Berry 36:21
Our data engineer also builds pipelines.

Kostas Pardalis 36:24
Okay. Why?

Paige Berry 36:27
Oh, that’s a good question. I would have to ask him. Yeah, I’m not sure. That was set up before I got here. I guess, why are you asking? Is that a surprise?

Kostas Pardalis 36:43
No, it’s actually something that is happening. Okay, first of all, I have a very long background in data pipelines. For the past seven years, together with Fivetran, the whole industry has been trying to, let’s say, abstract this ETL process as much as possible and remove the need for creating custom scripts, right? So you use the platform and have this SaaS cloud experience of connecting your data source and moving your data around. But still, it’s not enough, and I’m trying to understand why. That’s why I’m asking you. And I have asked other people too, to be honest, because it’s not uncommon, it’s actually pretty common, to see multiple different vendors in the same company used to sync data, together with Fivetran, for example. Or you can have one of these and at the same time have a number of Python scripts out there doing some of the pipelines, for different reasons, right? Some people do it for performance, maybe because the specific pipeline that Fivetran has is slow, or for cost reasons. But yeah, that’s why I’m asking, because maybe you could add a little bit more color from your own experience on why we still have to build our own pipelines.

Paige Berry 38:03
I see. Got it. Yeah, I’d have to think about that, probably from our data engineer’s perspective. That would actually be something I’d really be interested in asking him about, so I think I’ll do that later today. Thank you for that prompt. It’s very possible that it’s still a holdover from what we had to do before. When you’ve got that stuff figured out for your company, you’ve got all of whatever logic you might need to get the data into the right shape. It could be that it’s difficult to just move away from that, or maybe there just hasn’t been enough of a reason yet. But I would really need to ask him to be sure. So it’s a great question. Thank you.

Kostas Pardalis 38:53
Let me know after you learn about it, I’m curious. You didn’t mention something as part of your stack: you didn’t mention anything about the visualization layer.

Paige Berry 39:05
Oh! I knew I missed something.

Kostas Pardalis 39:07
What do you use there?

Paige Berry 39:08
Thank you. We use Mode for visualizations for our insights and exploratory work. And then we use the visualization capabilities in Transform, our metric repository, for stuff that is a little less exploratory, for stuff that’s really more for stakeholders to be able to quickly get a number or do some kind of higher-level dimension slicing. So the two kind of work together, but we’re also still always trying to communicate about when you would use Mode and when you would use Transform.

Kostas Pardalis 39:50
Can you help me understand when I should be using each? Let’s say I work for Netlify. When should I use one, and when should I use the other?

Paige Berry 40:00
So Transform is great for metrics that we’ve already figured out as a company, that already have definitions, that you’re interested in looking at or learning more about, and where we already have the dimensions you’d like to slice by available. Mode is going to be better if you’re starting from more raw data, where we don’t have the metric really defined yet and you’re trying to explore how different models might interact with each other. That’s kind of where we start when building the definition for a metric.

Kostas Pardalis 40:38
Yeah, that’s super interesting. I’m wondering, what does the process look like when you start from the exploration part, using Mode, until you reach the point where you have a well-defined metric that you can put into Explorer? What happens in between? How does this work? I’m very curious to hear about that.

Paige Berry 41:08
Yeah, I’m thinking about work that a couple of my teammates did to define something called activation. It was several people working on it; we had two of our analytics engineers working on it from the data team side. And they had a list of things to look at, like different metrics, different things teams might do, and different time sections, I guess, from after sign-up, when they might do them. And so they did a bunch of work in this GitHub issue and used Mode to do different visualizations and different cuts of the data, to start to really understand how we could actually define activation for a team. And they were able to narrow down the funnel of options until they came up with: activation is this, and when a team has done this, that’s considered activated. And now we have that. So that metric goes into Transform, where somebody can run a chart to say, what’s the number of activated teams weekly since the beginning of the year? And you can see where that number is going, or what’s the percentage of activated teams that have done X or Y, and you can look at that in Transform. But that’s because we have that metric, which we figured out by doing all this exploration in Mode with all this raw data.
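[Editor’s note: an exploratory query of the kind Paige describes might look like the sketch below. Netlify’s actual activation criteria, table names, and columns are not described in the episode, so everything here is invented for illustration.]

```sql
-- Hypothetical sketch only: the tables (teams, deploys, invites) and the
-- activation criteria (first deploy plus an invite within 14 days of
-- sign-up) are illustrative, not Netlify's real definition.
SELECT
    t.team_id,
    MIN(d.deployed_at) AS first_deploy_at
FROM teams t
JOIN deploys d ON d.team_id = t.team_id
JOIN invites i ON i.team_id = t.team_id
WHERE d.deployed_at <= t.signed_up_at + INTERVAL '14 days'
  AND i.sent_at     <= t.signed_up_at + INTERVAL '14 days'
GROUP BY t.team_id;
```

During exploration, an analyst would vary the events and the time window in queries like this and chart the cuts in Mode, until one candidate definition is agreed on and promoted to the metric store.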

Kostas Pardalis 42:40
Do you use Explorer together with a BI tool like Mode, or are they completely disconnected?

Paige Berry 42:47
When you say Explorer, do you mean Transform?

Kostas Pardalis 42:51
Sorry, sorry, sorry. That’s okay.

Paige Berry 42:54
Yes, so Transform did recently develop a Mode connector. We set that up a few weeks ago, I think, and have been doing visualizations in Mode with metrics that are defined in Transform, which is really handy, because we do have people doing work in different areas. So for instance, if I know I want to do an exploration that has to do with revenue, I don’t necessarily have to go talk to the analytics engineer working on finance and say, hey, what’s the newest definition of revenue, to make sure I’m getting it right? We have that definition in Transform. So I could either look at the SQL in Transform, or now I can actually just pull that metric into Mode, use it in my visualization, and I know that it is the correct definition of revenue and that I’m getting the right numbers.

Kostas Pardalis 43:55
Okay. If I understood this correctly, you define the metric in Transform with SQL.

Paige Berry 44:04
Yes, it’s a combination of SQL and YAML files.
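[Editor’s note: in a metrics-layer tool of this kind, the measure logic typically lives in SQL or a dbt model, with the metric declared in YAML on top of it. The snippet below is a generic, hypothetical sketch of that shape; the field names are illustrative and are not Transform’s actual configuration schema.]

```yaml
# Hypothetical metric declaration, loosely in the style of a
# metrics-layer config. Field names are illustrative, not
# Transform's real schema.
metric:
  name: weekly_activated_teams
  description: Count of teams that completed the activation criteria.
  type: count_distinct
  sql: team_id              # column from the underlying model
  model: activated_teams    # e.g. a dbt model materialized in Snowflake
  timestamp: activated_at
  time_grains: [day, week, month]
  dimensions: [plan_tier, signup_source]
```

Because the definition lives in one governed file, any downstream tool (a BI connector, a notebook, a dashboard) resolves the metric to the same SQL, which is what makes the “pull the revenue metric into Mode” workflow Paige describes possible.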

Kostas Pardalis 44:09
Oh, okay. I have to check out Transform. I saw that you had in your background the conference that happened last week, and I think there were a lot of discussions there about the metric layer and this new thing called “headless BI,” whatever that means. Is this a rebranding of BI, in your opinion? Or is it just an iteration on BI, because the needs today are different than they were 10 years ago when Looker started, for example? Because Looker, if you think about it, had these two layers: the visualization part and the LookML part. With LookML, you are pretty much also defining, let’s say, metrics, right? Okay, it wasn’t called that back then. But I’m just wondering about that. I’m really trying to position, let’s say, these product categories, headless BI and the metric layer: what the differentiation is, and why they had to be created, compared to BI tools. You have been working in this area for quite a while, so how do you feel about that?

Paige Berry 45:32
Do I feel like the talk of the metrics layer is a rebranding of BI, or just an extension of where it needs to go next? I will have to think about that, I really will. So much of what I feel like we’re all trying to figure out is: how do we make working with data easier for our stakeholders and users, and also easier for data analysts? Trying to find solutions that make life better for both groups is, I think, what we’re working on. Because if you go too far down one road, yeah, maybe it’s easier for analysts, but it isn’t great for stakeholders. If you go too far down the other road, yes, it’s much easier for people to do a little more self-serve stuff, but it can be really, really hard to get there in a sustained way. So I guess this is a new, maybe a different, way of trying to solve that problem: isolating a certain set of difficulties and saying, okay, we’re going to address fixing this set of difficulties with this new bunch of tools, or this area to create tooling in. Yeah, that helps. And really, I’m thinking about this as I talk, so I may have more to say later.

Kostas Pardalis 47:16
I don’t think it’s easy to answer anyway. The reason I asked was more to have this kind of discussion, like, “Well, maybe I can get an answer.” Because I couldn’t stop thinking, while you were talking, that 10 years ago, when we were talking about BI, it was all about visual interactions: no code, no SQL, everything should be self-serve without any technical knowledge. And when I asked you about Transform, you told me, oh, it’s SQL and a bunch of YAML files, right? Which is not exactly something that, let’s say, a CFO would ever use. Right? So I see this back and forth between being, let’s say, completely non-technical and getting into something very technical, and I think, in the end, the solution is probably somewhere in between. Probably we need both, right? And we need to find, let’s say, the right balance there. So there is a reason that all this stuff exists. But it’s not answered yet, that’s what I’m trying to say. I think it’s still in the process of being figured out, and all these tools are attempts to try and figure it out. So yeah, that’s great.

Paige Berry 48:25
I just wanted to say, I think part of it is that stakeholders and end users are not a monolith, and so there are different layers of ways that they can use data. So we’re still continuing to figure this out: there isn’t necessarily one solution that’s going to work for absolutely everyone, but maybe we can get areas or groups. Okay, this is a good solution to help this type of end user; this is a good solution to help this group of stakeholders.

Kostas Pardalis 48:57
Eric, you have the microphone.

Eric Dodds 49:00
I think we’re close to time here. I just need to say, though, I really appreciate you distilling all of the complexities of these challenges into saying we’re all trying to figure out how to make working with data easier, both for ourselves and for the stakeholders that we serve. When you distill down all the tooling, all the team structure, all that stuff, that really is the goal, so I appreciate that. I think that’s one of the best concise explanations of what we’re all trying to do. One last question for you, since we’re at time here. You have such a wide skill set: you can do data engineering, you work day to day as an analyst, and many other things. What advice do you have for our audience if they say, okay, I’m listening to Paige and I would love to work on a team like that, in a role like that, but I’m maybe early in my career, or I’m working at a company that doesn’t value data or the data team in the same way that it’s very clear Netlify does? What advice do you have for them to take some next steps?

Paige Berry 50:20
Oh, that is really cool. One of the things that I realized has helped me get to where I have, and get the experiences that I’ve had, is that I’ve managed to keep my excitement about working with data. Regardless of what else is going on, there’s this fundamental love of what working with data is and what it can do. No matter what else is happening, I get to get into a database and look at data and figure stuff out, and that is always so much fun. Just enjoying the process of discovery that data offers, and keeping that passion and love, is something that has really helped me no matter what situation I’ve been in, because it’s motivating. It has always motivated me to find that joy, even when the stuff around me is difficult. I think another aspect of that is always being curious, curious about what new stuff is going on in the industry. That curiosity and willingness to learn and be interested can open a lot of doors. It really can. There’s a pretty awesome data community, especially around the Coalesce data conference. There are people who love sharing; we love sharing our love of data and working with data, especially with people who are earlier in their career. And so getting connected to a community of people who have this passion, who are continuing to try to make the industry better for all of us, is really powerful, especially early in your career.

Eric Dodds 52:19
That is so helpful. Two things here: one, it’s very clear that you’ve maintained that sense of curiosity and love of exploration, which I think is great. And just from my own experience, I know that maintaining that is hard if you’ve gone through the process of building teams at a company tripling in size. The physical pressure of a system that’s expanding that quickly can kind of snuff out the flame of curiosity. So I think that is wonderful advice. Well, unfortunately, we’re at time, but this has been such a good conversation. I learned so much, and it’s just been really fun to hear about different sides of data that we don’t normally get to hear about on the show, so thank you.

Paige Berry 53:01
Yeah! It has been an absolute blast for me, too. I’ve loved every minute of it. Thank you both so much.

Eric Dodds 53:07
Okay, Kostas, my takeaway is, and I kept thinking this throughout the conversation, I’m just blown away by how many different things the Netlify data team is doing that you’d imagine as the best way to do something: actually having a team structure with embedded analytics engineers who have broad and deep skill sets, having a knowledge repository, each analyst on the data team delivering insights on a weekly basis to the team, managing that with version control and GitHub, the exploration process. There were just so many things where I was like, “Wow, they are super high functioning.” It was extremely impressive.

Kostas Pardalis 54:00
Yep, 100%. I think it was probably one of the most interesting conversations that we’ve had so far when it comes to the organizational side of data. And when you listen to bits of it, you almost want to be part of this team; something great is happening there. And at the same time, they’re also having fun, which is amazing. That’s the holy grail of work environments. And what I would like to add to what you said, which I 100% agree with, is that yes, it is a matter of the organization, but it also heavily starts with, and is related to, what the individual does. If you remember when we asked her, okay, how did you convince people there to start writing all this content as part of their everyday job? It was amazing to hear from her how determined the people were to do that, because it wasn’t part of the job description. Now it might have become part of it, because they showed the value of it. It’s amazing what can come out of it when you get people who love what they’re doing.

Eric Dodds 55:22
It wasn’t the result of reacting to some sort of pain, like, “so we’re going to deliver this right now.”

Kostas Pardalis 55:30
Or it wasn’t like it all came down from the Big Four or whatever, like, because you have to do that, you won’t like it.

Eric Dodds 55:42
Yeah, if that came out of the Big Four, I think we’d have to have that person on the show. Yeah.

Kostas Pardalis 55:49
Anyway, it was amazing. I really enjoyed the conversation with her. And, I don’t know, maybe we should have an episode with the whole team from Netlify.

Eric Dodds 56:01
That’d be awesome.

Kostas Pardalis 56:02
Let’s do that. Yeah, all of them together. I think it’s going to be fun and very interesting.

Eric Dodds 56:08
All right, well, we’re at time here. Thanks for joining The Data Stack Show and we’ll catch you on the next episode. We hope you enjoyed this episode of The Data Stack Show. Be sure to subscribe on your favorite podcast app to get notified about new episodes every week. We’d also love your feedback. You can email me, Eric Dodds, at eric@datastackshow.com. That’s E-R-I-C at datastackshow.com. The show is brought to you by RudderStack, the CDP for developers. Learn how to build a CDP on your data warehouse at RudderStack.com.