This week on The Data Stack Show, John and Matt welcome Pedram Navid, Chief Dashboard Officer at Dagster Labs. During the conversation, Pedram shares his career evolution from consulting to his current role, where he oversees data, developer relations (DevRel), and marketing. The discussion delves into the synergies between DevRel and marketing, emphasizing the importance of understanding developers’ learning preferences. Pedram explains data orchestration, highlighting its role in managing and automating data workflows. He also discusses Dagster’s unique asset-based approach, which enhances visibility and control over data processes, catering to users from novices to experts, and so much more.
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
John Wessel 00:28
All right, welcome back to The Data Stack Show. We’re here with Pedram Navid from Dagster, the Chief Dashboard Officer. Pedram, welcome to the show. Great that you’re here. So I think this is your second time on the show, and it’s been a little over a year. We’d love a quick update, and then tell us a little bit about your current role.
Pedram Navid 00:46
Yeah, I think last time I was here, I was enjoying consulting life, which meant lots of bird watching, lots of looking outside, being outside. Since then, I joined Dagster Labs about a year and a half ago, initially to run data and DevRel, and now also marketing, so there’s far less time for bird watching.
Matthew Kelliher-Gibson 01:06
It’s too bad. Back to the grind, as it were. Okay, so we’ve spent a few minutes chatting while preparing for the show. I’m excited to get into how you’ve gotten to this point, and orchestrators in general. What are you looking forward to chatting about?
Pedram Navid 01:22
Yeah, I mean, we can always talk about orchestration. We can talk about data platforms and how we got to where we are; that could be a fun story. We can always talk about AI. We can talk about data engineering and how you somehow accidentally end up running marketing. It could all be fun.
John Wessel 01:37
All right, I’m excited. Let’s do it. Pedram, excited to have you. Let’s talk a little bit about how you ended up at Dagster. You were doing consulting, had some time to work as you pleased, and now you’re back at a startup. So tell us about that process.
Pedram Navid 01:56
Yeah, I think what happened is, I was actually consulting for Dagster initially, and we had a great relationship. Pete and Nick, our CEO and our founder, asked me if I wanted to join. Initially, I said no, because I was enjoying my freedom too much. But one thing I found with consulting is that your scope of work is often limited, and you don’t get to see things fully, end to end. I also kind of missed the camaraderie of having people to work with, and teams. So after thinking about it, I reached back out to them and said, hey, if that offer is still on the table, I’d love to chat about joining. We talked about a role which was initially just DevRel, with, I believe, maybe data on the side as well: a small team of two or three people. That was around June of last year, and so it’s been a year and a half now since I’ve been here. A couple of months ago, we took on marketing as part of DevRel as well, which initially I wasn’t so sure about, but now that I’ve seen it operate, it makes a ton of sense for DevRel and marketing to be close together and working together.
John Wessel 02:56
Yeah, that’s really interesting. I’ve had a previous experience too, where I ended up having a data team and marketing as well. Tell me about some of the unexpected synergies there. If you’ve got DevRel, marketing, and data kind of on the side, what’s come about where you’re like, wow, this is cool, this is unified?
Pedram Navid 03:15
Yeah, if you had told me initially that I would be on a DevRel team reporting to marketing, I probably wouldn’t have taken the job, because I’ve always felt like marketing didn’t quite get DevRel. But this way it’s kind of flipped: it’s marketing and DevRel reporting to me, and I’m okay with that. What I found is that the DevRel side of the house is the content arm. We’re a technical product. We target technical people, and so we need technical people who have experience in the field to create the content. For me, content is a broad term. It’s not just blog posts. It’s tutorials, workshops, webinars, and how to do actual integrations. Our DevRel team has built integrations that have, like, won deals. So DevRel is like the producers for the marketing arm of Dagster, and the rest of the marketing org is really in support of the distribution of that, right? Where the DevRel team probably doesn’t have expertise is how to get their content out into the world, whether that’s through paid ads or events or campaigns, that type of thing. And so having the two teams together, there are actually a lot of synergies. I hate to use that word, but it is exactly that. They sit together. We’re in the same meetings every week, we talk about what we’re working on, and the events person picks up on something the DevRel team is working on, and so does the campaign manager. And then the three of them together are like, all right, let’s go build something more holistic around that, rather than just one-off content.
John Wessel 04:39
Yeah, that makes a lot of sense. So we’ve got Matt here co-hosting today in place of Eric. Matt, you’ve been that technical data audience before, looking to purchase products or things like that. I’m curious, and I’ll ask you the same thing, Pedram: if you think back on content, or even just interacting with people around these technical products, what mediums or what kinds of things really clicked with you in the past?
Matthew Kelliher-Gibson 05:08
Yeah, I think anything that lets you see how the product actually works in a real way, and not just the super trivial, one-plus-one-equals-two kind of example, right? I think that helps, especially because previously a lot of this was very marketing-driven, so everyone felt like they were trying to bait you into giving them your information or trying something. So things that give you the ability to see it, and that have the credibility of professionals who’ve used it and can show you, this is what it’s actually going to help you with.
John Wessel 05:47
So not, you know, 10 tips for personalization in your marketing using data. So, same question to you, Pedram. What have you found that works? Because, at least in my opinion, that technical data audience is a tricky one. It’s a tricky one to find and a tricky one to resonate with.
Pedram Navid 06:06
It is, and there’s almost this meme that developers hate being marketed to, and I don’t think that’s true. I think developers need a certain type of marketing that works for them along their journey, and their journey often looks different from that of someone in a leadership role, for example. A developer is going to sit down, and what they want to do, almost every single time, is try the product and figure out: is this thing what it says it is? Is it useful for me? Will it work in the way that I need it to? And so a lot of the DevRel focus, and the focus of Dagster’s marketing arm, is to enable developers to be successful across their entire journey, from becoming aware of the product to trying it out and learning about it. So things like docs matter a lot more to a developer than they might to a technical leader. Even as a director of data, for example, you probably aren’t going to sit down and try Dagster. You might care more about the future benefits, what problems it solves, the five things my CEO keeps telling me about. But your data engineer is going to want to actually try the product and make sure it hits the things they actually care about.
Matthew Kelliher-Gibson 07:11
Well, and that can also be a little tricky, because of the technical ability of data people; there’s a pretty wide spectrum you can fall on. There are some who came from software engineering, and there are others who are very self-trained and might be doing data engineering because they have to and no one else is there to do it, so they’re scraping together YouTube tutorials and stuff like that. So do you have a specific part of that spectrum you’re targeting, or do you try to have content for a wider swath of it?
Pedram Navid 07:47
We definitely try, as much as possible, to cover the whole range. You have to. What I’ve learned is that not everyone is me: I like a certain way of learning that other people don’t, and people like ways of learning that I refuse to use. A great example is Dagster University. It’s something we spun up last year. It’s an online course. It’s structured. You go through lessons, and that’s the last thing in the world I would ever want. When the team suggested it, I was like, I don’t know about this, guys. All right, we’ll try it. And people love it. They give it five out of five; if you look at our ratings, it’s like 4.8 out of five. And we get weekly emails about how much they enjoy it. It was completely foreign to me, because that’s not how I learn. What I’ve learned is that you have to provide for everyone. There are people who want structured training. There are people who want to just read the docs. Some people just want to install it and look at the source code. That’s the range you have to deal with, and all of it has to be good: your source code, your documentation, your app. Your code has to look good in a way that people can understand and interact with, all the way up to your tutorials and videos. Sometimes people want to sit down and watch a 30-minute training video on the product as well. So we do all of it, and we hire for that too, right? We have people on the DevRel team who are much more focused on the earlier-stage persona, people who are just getting started, and we have people focused on a much deeper understanding of the product as well.
Matthew Kelliher-Gibson 09:15
Yeah, I am definitely one of those who does not want to watch a video. For whatever reason, I cannot sit through a 30-minute technical video.
John Wessel 09:23
I’m one of those who likes to pull up the code. We’ll reference the docs when needed, we’ll struggle through it, and then we’ll think to ourselves, I should have just watched that video. But yeah, that’s a really wide range of personas, and Dagster is a super flexible tool. You can use it a lot of different ways, and that’s got to be a challenge as well: I come to it with a specific problem, and you’ve got a tool that says, well, we can solve lots of problems. How do you bridge that gap?
Pedram Navid 09:54
That is also a great question. We are looking to bridge it through product improvement. We have something coming up called Components. I don’t know if I’m allowed to leak it yet, but it is coming. It will be more focused on providing almost like building blocks for developing the data platform. It’ll be a command-line-based tool initially, but you’ll have your YAML schema, and you’ll have very easy ways to plug and play different integrations. That’s our approach to addressing that, while always being able to expose the underlying Dagster framework, which, as you said, is extremely flexible, and that has both its pros and cons. The pro is that you’ll never really be constrained: if you can do it in Python, you can do it in Dagster. There are essentially no limitations, right? The con is that for a very simple setup, it can often feel like a lot to go through, if you just want to orchestrate one simple task. Yeah, that makes sense. So let’s zoom out a little bit for people who have no idea what Dagster is, or have never even heard of orchestration, like that analyst persona. How would you describe the general field you all are in, the data orchestration field, to someone who has no idea what this is? Yeah, it’s a great question. Everyone orchestrates. They just might not do it intentionally, or they might not know that they’re doing it. Orchestration could be as simple as logging into your computer once a week, clicking a button, and kicking off a process. It’s very manual orchestration, but it’s totally fine, and often it’s the right decision for you. It becomes a little more complicated when you start to use something like a cron scheduler that runs every single day or every single week at a certain time, and that’s often enough for many tasks. 
Things really start to get complicated when you need to add dependencies or you need to be resilient to failures. Once those two things come into play, like you want to make sure that A runs before B every single time, you can’t rely on cron. Sometimes you can fudge it. You’ll say, this one starts at 12 and this one starts at three, and I hope the first one never takes more than three hours and always succeeds. And if that’s true, you probably don’t need an orchestrator. But often what happens is that people realize they need an orchestrator a little too late. What they thought was true no longer holds. You can’t really observe what’s going on. Your tasks take too long, or something fails. Or, even worse, your vendor says, oh, by the way, that thing we sent you two months ago was wrong, here’s an update. They go and fix it, and it’s like, well, I can’t rewind time, and my cron scheduler doesn’t know how to rewind. So once you start to get into these types of things, that’s where orchestrators come into play, and they start to manage some of this complexity for you.
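To make the cron-versus-orchestrator distinction concrete, here is a minimal sketch in Python using only the standard library. It illustrates the two things Pedram names: dependencies (A always runs before B) and resilience to failure (bounded retries, and no downstream work built on top of a failed upstream). It is not Dagster’s API; `run_pipeline` and all task names are hypothetical, made up for this example.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_pipeline(tasks, deps, max_retries=2):
    """Hypothetical helper: run tasks in dependency order, not at fixed clock times.

    tasks: dict mapping task name -> zero-argument callable
    deps:  dict mapping task name -> set of upstream task names
    Returns a dict mapping task name -> "success", "failed", or "skipped".
    """
    status = {}
    # static_order() yields each task only after all of its upstream tasks,
    # so "A runs before B" is guaranteed instead of hoped for via time gaps.
    for name in TopologicalSorter(deps).static_order():
        upstream = deps.get(name, set())
        # Don't build on top of a failed (or skipped) upstream task.
        if any(status[u] != "success" for u in upstream):
            status[name] = "skipped"
            continue
        # Retry transient failures a bounded number of times.
        for _ in range(1 + max_retries):
            try:
                tasks[name]()
                status[name] = "success"
                break
            except Exception:
                status[name] = "failed"
    return status


if __name__ == "__main__":
    attempts = {"extract": 0}

    def extract():
        # Hypothetical flaky task: fails on the first attempt, then succeeds.
        attempts["extract"] += 1
        if attempts["extract"] == 1:
            raise RuntimeError("transient network error")

    def vendor_feed():
        # Hypothetical task that always fails, e.g. a missing vendor file.
        raise RuntimeError("vendor file missing")

    def noop():
        pass

    tasks = {"extract": extract, "load": noop, "vendor_feed": vendor_feed, "report": noop}
    deps = {"load": {"extract"}, "report": {"load", "vendor_feed"}}
    print(run_pipeline(tasks, deps))
```

In this run, `extract` succeeds on its retry and `load` runs only after it, while `report` is skipped because `vendor_feed` failed. Two cron entries spaced three hours apart would have run `report` regardless, on partial data.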
Matthew Kelliher-Gibson 12:34
You said there that people bring it in probably later than they should. I feel like that’s a recurring theme for a lot of data things: if you had brought this in two months ago, it would have been a five-minute fix, and now we’re very limited in what we can do. But I also don’t know if there’s a way around that.
John Wessel 12:53
Yeah. I mean, you don’t know what you don’t know, right? Especially if you’re doing something for the first time, it’s like, oh, this works. And I think most people in a data role at least get to that time-gap thing, my favorite, where I’m going to have this run at midnight, this run at 2 a.m., this run at 5 a.m., and everything’s fine. And then usually, if you get into that world, you have some bad mornings where the first one failed, and then it’s kind of a house of cards. And because some of these take hours to run, you’re kind of stuck. You basically lose a full day without correct data.
Pedram Navid 13:33
I think that’s experience, right? You get burned, hopefully only once, and you learn your lesson, or you work with people who have been burned and learned their lessons and will impart that on you, or you listen to The Data Stack Show and learn things not to do. It’s also human nature. This is why ice cream tastes good: you don’t really think about the consequences, right? Running pipelines on cron feels good because you don’t have to think about the consequences until it’s too late. So we try to educate people that it’s probably easier than they think. It’s not that hard to set up a pipeline in Dagster with just cron; you don’t have to use any of our advanced features. We have a cron scheduler in Dagster, and you’ll get a pretty UI, which is more than you normally get out of cron. And that’s worth its weight in gold. From there, you can evolve as you need to. You don’t have to go and build these complex dependencies if you don’t want to, but get started with something while it’s simple, when it’s just a few tasks or a simple dbt pipeline, which is very easy to do in Dagster, since we’ve got a great integration. Or do it in a different tool. It doesn’t have to be Dagster; there are others out there. But get it into something you can observe, because I think every engineer knows observability and logging are critical to any system.
John Wessel 14:48
Yeah, that makes a lot of sense. I’ve used Dagster for a couple of projects, and I had something kind of interesting happen two weeks ago, around the Christmas and New Year’s holiday. I had set up an alert, and I got an error, which was handy. I thought, I’d better check on it. Sure enough, it was an API access-denied type error, because I was pulling data from an API. So what happened? Did the credentials expire? It was funny: I was pulling data from about 28 different locations on this project, and one of the locations had closed at the end of the year. But since I had everything separated out, it was like, okay, cool, I can just turn that location off, everything keeps going, and it’s not a big deal. Had it been the other way, where everything cascades through and you’re like, oh, I’m going to have to rewrite a bunch of stuff, those are the fun moments. So I’m curious, from your perspective: there are obviously lots of different orchestrators out there. What’s special about Dagster? And maybe even, what’s special about Dagster for analytics orchestration specifically?
Pedram Navid 16:07
Yeah, orchestration has been around for a long time. Cron is the classic, right? From there, Airflow is probably the next biggest orchestrator most people have heard of, and that’s a task-based orchestrator. You’ve got a thing you want to do, you tell it, and it runs, and it’s like a black box. You sort of hope every box continues the way you want it to, but you have no ability to peer into the box. What Dagster said is, what if we reverse that: instead of telling us about the task, tell us about the things you actually care about, or let us discover those for you. A great example is a dbt project; I think everyone kind of gets what that is. It’s a collection of tables that you want to materialize at some regular cadence. The traditional Airflow way would be to have a dbt task that just runs your dbt project, and then you sort of assume all the models in there are completed. In Dagster, we flip that around, and we actually expose every single model as an asset. So Dagster is what we call an asset-based orchestrator, because everything you care about is represented in this big graph of things that you can follow all the way through to their logical conclusion. You can see all your dbt models within the DAG view, and you can be kind of clever about it. You could run the whole thing at once every single day, if that’s what you want. Or you can say, you know what, my stakeholders care about these five models; run everything that depends on those on a five-minute schedule, because they really want those things to be updated. And these other models over here, put them in a group that runs once a day, whenever you feel like it; it doesn’t really matter to me, as long as they’re refreshed daily. 
That’s something you can start to do with Dagster. And because you have this asset view, you can start to connect things outside of dbt as well, in a really intuitive way. Maybe you have a BI dashboard in Sigma. Maybe you have some stuff happening in RudderStack that you want to connect it to, or some files dropping into S3 buckets or FTP. All these things start to connect, and you build lineage on them. So you can be really clever about the full end-to-end orchestration of the whole thing, rather than just focusing on a specific task. And so assets have really been, I think, the next level of where we are going with orchestration. In fact, Airflow is even starting to move in this direction, which I find really validating: this is really the future of where orchestration is going.
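Pedram’s cadence example (run the models stakeholders care about, plus everything that depends on them, every five minutes; refresh everything else daily) boils down to selecting part of an asset graph. Here is a minimal standard-library Python sketch of that selection, assuming each dbt model is a node with a set of upstream models. The model names and the `downstream_closure` and `split_by_cadence` helpers are hypothetical; in Dagster itself you would express this with its own asset selection and scheduling features rather than code like this.

```python
def downstream_closure(deps, roots):
    """Return `roots` plus every asset that transitively depends on them.

    deps: dict mapping asset name -> set of upstream asset names
    """
    # Invert the graph so we can walk from parents to children.
    children = {}
    for asset, parents in deps.items():
        for parent in parents:
            children.setdefault(parent, set()).add(asset)

    selected, stack = set(roots), list(roots)
    while stack:
        node = stack.pop()
        for child in children.get(node, ()):
            if child not in selected:
                selected.add(child)
                stack.append(child)
    return selected


def split_by_cadence(deps, hot_assets):
    """Partition every asset into a fast (every few minutes) group and a daily group."""
    all_assets = set(deps) | {p for parents in deps.values() for p in parents}
    fast = downstream_closure(deps, hot_assets)
    return fast, all_assets - fast


# Hypothetical dbt-style models: each model is its own node,
# rather than one opaque "run the whole dbt project" task.
deps = {
    "stg_orders": set(),
    "stg_customers": set(),
    "orders": {"stg_orders"},
    "customers": {"stg_customers"},
    "revenue_dashboard": {"orders"},
    "marketing_attribution": {"orders", "customers"},
}

fast, daily = split_by_cadence(deps, {"orders"})
print("every 5 minutes:", sorted(fast))
print("daily:", sorted(daily))
```

This follows the transcript literally and selects downstream of the chosen models. Keeping those models themselves fresh would also require their upstream inputs, which is the opposite walk through the same graph.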
John Wessel 18:22
Yeah, I think there are two benefits I’ve seen from this asset-style orchestration. One is essentially what you said: time compression. If I have separate extract jobs that load into a warehouse, and then I have to transform, and it’s all linear, there’s just a limit to how much I can compress the time to get that one report I need to be fast, as in very up to date. If I have to do all of this step, then all of the next, then all of the next, there’s a limit. But since everything is compute-based now, there’s also a cost implication, right? Because if I can compress the times for the ones I want to go really fast, I can also do the opposite for the things I only need once a day. Before, I was running this whole thing every five minutes. Now I can delay the 80% that I don’t care is a day old, and that’s compute savings in your warehouse, and potentially savings in your ETL tool.
Pedram Navid 19:25
Yeah, I think that’s a big deal. You could take it even further, because you’ve exposed this data lineage. You get all these side effects almost for free, and that’s something we’ve actually learned ourselves. Now you essentially have a data catalog, right? You understand all your data assets, and you have the source of truth for where your data is defined. Now you can search that, and you have a data catalog for free; you don’t have to go and maintain a separate one. Data quality becomes something you bolt on top of your actual execution. It’s not an afterthought: as part of your pipelines, you can start to emit what we call asset checks for data quality. And like you said, time compression becomes a much more interesting problem, because it can be very declarative in Dagster. Instead of saying we want to run these things every day at five o’clock, you can say this asset needs to be updated by this time; do whatever it takes to make sure that’s done, like running all its parents whenever you need to. Now you’re limited only by the chain of things that matter to that asset, not everything that comes before it. So we get a lot of really nice side benefits from this asset view that I don’t think we knew we were going to get when we first started going down this path, but it’s become really interesting.
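The declarative idea Pedram describes (this asset needs to be updated by this time, so run all its parents) means only the target’s ancestors have to execute, not the whole pipeline. A minimal standard-library Python sketch of computing that plan follows. The asset names and the `plan_refresh` helper are hypothetical, and this is not a description of how Dagster implements declarative scheduling internally; deadlines and asset checks are left out to keep it short.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def plan_refresh(deps, target):
    """Return the assets needed to refresh `target`, in a valid run order.

    deps: dict mapping asset name -> set of upstream asset names.
    Only `target` and its ancestors are included; unrelated branches are skipped.
    """
    needed, stack = {target}, [target]
    while stack:
        node = stack.pop()
        for parent in deps.get(node, ()):
            if parent not in needed:
                needed.add(parent)
                stack.append(parent)
    # Restrict the graph to the needed assets, then order parents before children.
    subgraph = {name: set(deps.get(name, ())) for name in needed}
    return list(TopologicalSorter(subgraph).static_order())


# Hypothetical asset graph with two independent branches.
deps = {
    "raw_events": set(),
    "raw_orders": set(),
    "sessions": {"raw_events"},
    "orders": {"raw_orders"},
    "exec_dashboard": {"orders"},
    "web_analytics": {"sessions"},
}

# Refreshing the executive dashboard touches 3 assets, not all 6.
print(plan_refresh(deps, "exec_dashboard"))
# -> ['raw_orders', 'orders', 'exec_dashboard']
```

The events branch never runs here, which is the sense in which you are limited only by the chain of things that matter to that asset.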
Matthew Kelliher-Gibson 20:30
Well, and that, I think, speaks to something you see a lot: teams find themselves drowning in whatever their process is, so they can’t really see what the next thing they could be doing is. It’s only once they free up that space, that mental bandwidth, that it’s like, okay, now I’ve got Dagster running this and I don’t have to think about it. Oh, look at these other three things that have popped up that we can do, that were never part of our initial plan, when we were just trying to not spend three or four hours every day troubleshooting or fixing or running whatever. Now that’s gone, and we can actually see opportunities we could never have thought of before.
Pedram Navid 21:13
100%. There’s that old cartoon of two cavemen: one has a square wheel and he’s trying to push it, and his friend with a round wheel is like, oh, you should try a round wheel. And he’s like, oh, I don’t have time for that, I’m spending all my time pushing the square wheel up the hill, right? I feel like it’s the same with orchestration. Often it feels like just an extra step you have to go through, but that extra step is going to compound your productivity down the line.
John Wessel 21:39
Yeah. So I’m curious about the software stack. We’re in 2025 now. I think the modern data stack was declared dead sometime in the last year or two, which I think practically means people are seeing consolidation. I’m curious about your thoughts on where that shakes out, because we’ve added so many different layers into a data stack: extraction, observability, orchestration, transformation, storage, and the list goes on. How do you see that playing out in the next few years?
Pedram Navid 22:20
Yeah, I’d say that you’re not enterprise ready until you’ve been declared dead. Like, that’s sort of the...
Matthew Kelliher-Gibson 22:26
Exactly. Love that.
Pedram Navid 22:29
So the modern data stack is, I think, now enterprise ready. It’s ready for the mass market to adopt, and what we might call dead is still being implemented. There are so many companies going through cloud modernization efforts, moving towards Snowflake, towards Databricks, towards dbt Cloud. That’s not dead. So if we define the modern data stack as cloud data warehouses and a few really good tools, that’s fine. If you want to talk about the 2020s version of it, where basically every function you had to do was its own company, yeah, that’s probably dead. I don’t think people want 27 vendors to do three things at the end of the day, right? So consolidation is going to happen. We’re seeing it at Dagster: our customers are asking us to combine catalog and quality into one thing. Our catalog will never be as good as a full-featured catalog that you go out and buy and pay, like, ... grand for; that’s not where we’re competing. But there are probably some elements of those things that you can combine within the products you’re already using, and that’s going to continue. I think Fivetran is doing this with their transformations. I know you guys at RudderStack are doing this as well. Thanks for doing it. I think it’s just natural. And what’s going to happen is what happens all the time: we see a bunch of consolidation, people get annoyed at the consolidators, some new tool comes out that’s really good at one particular thing, interest rates go down again, and we get 100 of those things. It’s going to be a cycle. Right now I think we’re in the plateau of productivity, where things are slowing down, and that’s actually been really good for data teams in general. You don’t have to pay attention to 500 different things. 
You can kind of just put your head down and get your job done. And the tools you’re using to do that just keep getting better on their own, which is a good feeling.
Matthew Kelliher-Gibson 24:18
Yeah. I think also, especially during that peak, the 2020-ish, 2021-ish time period, a lot of teams got very hooked on all the different tools, and I saw teams lose track of what all of this was ultimately supposed to be serving. Well, look, we’ve got all these different things, and we’ve got all this data in a warehouse. And it’s like, okay, but what’s happening to it? How is it actually turning into revenue, or savings, or profit, or whatever?
Pedram Navid 24:50
Yeah, I mean, and it wasn’t just data. I’m in marketing land a little bit now, and the exact same thing was happening there. What was going on in marketing is that everyone wanted a tool to solve their particular niche use case, and almost nobody wanted to do the work. They just wanted to buy tools to do the work for them. You ended up with these massive marketing stacks with, like, 40 or 50 different tools to do three things. So it wasn’t just us; it felt like it was everywhere at that time. But I think we’re now in a better place. Interest rates solve a lot of problems, to be honest.
John Wessel 25:24
Yeah, sure, yeah, money not being free, yeah.
Pedram Navid 25:28
It solves a lot of efficiency problems, anyway. I’ll put it that way, right? So we’re seeing that consolidation. It might not feel good to everybody, but at the end of the day, businesses are operating more leanly, and they probably aren’t losing a lot at that expense either.
John Wessel 25:43
Yeah, I think that’s right. So we talked a little bit about orchestration, what that is, and Dagster’s unique twist on it. I’m curious about your career trajectory. You mentioned when we were talking earlier data science, data engineering, and now you’re in DevRel, marketing, and data. Tell us about that journey. I think it’s a little bit of a unique one, and I’d be interested in how that all played out for you.
Pedram Navid 26:09
Yeah, in high school they asked you to fill out a survey that would tell you what kind of job you should have. I don’t even remember what it was, but it was a job I’d never heard of, and I never knew what I wanted to be when I grew up. I just sort of fell into different jobs based on what I was interested in at the time. Data science was on everyone’s mind back in, I think, 2018. I was listening to all the data science podcasts. Many of them are now defunct, RIP, but it was the next hot thing, right? So I was like, all right, I’m going to figure out how to become a data scientist. I did that for a few years, and what I realized was that the new batch of data scientists coming in weren’t as technical as I had been. I had spent more of my time programming than they had. They were great at building models, much better than I was, because they were trained in it, but they couldn’t deploy them at all. So I started building infrastructure just to make it easier for them to deploy, because their code was better than mine. I ended up becoming a data engineer by accident, and I found that really rewarding. It was great to build something and have the reward be someone using it, whereas as a data scientist, the reward is maybe finding out in a year whether your experiment was correct, right? So for me, that instant validation, knowing I built something that clearly works or doesn’t and that the person next to me is benefiting from, was super empowering. That’s how I started in data engineering. I did that for many years, eventually becoming head of data at a company called Hightouch, which back then was really focused on the data persona. As part of that, I was also doing what we’d now call DevRel, essentially talking about the product to data people. 
I ended up starting a team there, then moved on to consulting, where I initially thought I was going to do data consulting and help people with their data problems. But almost every company that talked to me wanted help with their marketing problems. Even though I didn’t think of myself as a marketer, I think they saw the DevRel activities I was doing and the success we were having at Hightouch, and they wanted me to replicate that for them. A lot of that work was just educating them that copying the thing I did won’t work for you. It’s not the blog post that’s successful. I think a lot of people looked at dbt, for example, saw their massive community, and thought, oh, I should open a Slack community. And it’s like, well, why? Where’s the value to the actual user? Do you think people want 25 different Slack communities, or do you think they want one or two places to hang out, which might already be covered for them? So it was more about talking through what were really marketing principles. To me, it was just common sense about how to reach data people in a way that made sense, and I guess that put the mark of a marketer on my head. Eventually I joined Dagster, initially in DevRel, and more recently DevRel, marketing, and also data.
John Wessel 29:09
That makes sense. The trajectory makes sense, and I would imagine the alternative here is, let’s just, you know, for Dagster, let’s hire a marketer, right? We’ve already talked some about the synergies there, but there’s also got to be this scratch-your-own-itch element. You kind of get to market to yourself, or to your previous self, which has to be an advantage.
Pedram Navid 29:33
I think for a company like Dagster, and for any technical company that markets to technical people, having a technical person who really gets the audience and the go-to-market motion is critical. We’ve even made mistakes with this in the past. We’re an open source core product; by our nature, we are, so we shouldn’t hide that fact. If you talk to a traditional marketer, they might be scared that people will use open source, because we’re not capturing an email, right? So, of course: direct them to the email form instead. Take all the open source things off our website and bury them deeply, right? Like it doesn’t exist anymore, kill it. That’s the mentality of someone who doesn’t understand how developers operate. A developer is not going to want to sign up for a course or fill out a form. They’re going to want to try the product, and they do that through open source. So open source, to me, is not a competitor to Dagster+ or our enterprise offerings; it’s a channel. It’s a channel where people get to try it. If people go out and are successful with open source and never want to talk to us, that’s totally fine by me. That’s another Dagster user out there in the wild talking about how great Dagster is. That’s free marketing. So for me, open source is part of it, and you really have to understand developers to be able to market to them. That’s really why this journey between DevRel and marketing made sense to me. At first I was suspicious. If you had asked me, as a DevRel person, to report into marketing, I probably would have said no. But having DevRel and marketing working together, all reporting to me, felt fine. And I’m seeing today that it actually works out really well.
Matthew Kelliher-Gibson 31:16
And I think that’s also where the open source stuff comes in, especially when you’re trying to do something at scale. Most open source projects are really hard to keep running at scale. So it gives people a way to like it and trust it, and then they can move to: okay, how do I make this easier for myself to use over time?
Pedram Navid 31:36
Yeah, we see that all the time. People generally don’t want to run and maintain infrastructure, right? But that can’t be the only thing, because often the companies that are good enough at using Dagster can figure out how to deploy Dagster themselves. Eventually, it’s not that hard, right? So you do need things in the enterprise offering that are value-driven, and hopefully that will drive people to it. But also, it’s easier to get open source into an enterprise than it is a vendor. So if I work at a big company and I really like Dagster, will I go and try the open source product and prove its value, or will I get into this long, lengthy, lawyer-driven vendor negotiation before I’ve even shown it to my peers and established that it’s a good idea? I’ll often start with open source. I’ll build some momentum, and then once we’ve proven out its value, and we’ve either hit scaling limits, or I just don’t want to maintain it, or we want additional features, I’ve proven it’s useful and I can go have that conversation. I’ll contact a sales team and have them start. Knowing that’s a journey people go through is, I think, critical in building out technical orgs that market to technical people.
John Wessel 32:40
Yeah, I couldn’t agree more with that. And there’s this other component too, where you’ve got a team that’s vetting a product and proving it works. Imagine going through a traditional enterprise sales process, and I’ve done multiple of these, where you don’t get to see, touch, or do anything with the product until, basically, the money has changed hands. It’s been a while since I’ve done one of those deals, but I’ve done them before, and they’re scary as a technical person. A lot of times they’re driven by marketing or sales, for example: they’ve got to have this product, and then you, as a technical person, are stuck having to integrate and implement it. So number one, people who have been around a little while have that in the back of their mind as the alternative, and they hate it. And number two, you have this other practical competitor, in a sense: the open source product keeps you honest as a company. If you were ever to 10x your prices overnight, people could switch over to open source, for example. But with a traditional enterprise product, once you’ve connected it and it’s hard to replace, people are stuck and have a lot of pain if they switch. So I think that’s another component I’ve always appreciated about open source.
Matthew Kelliher-Gibson 33:57
Well, and I think the other one with that is, when I first came on to RudderStack on the marketing side, I was talking to someone on the marketing team, and they were like, well, we really want RudderStack to be the reason you get your next promotion. My reply to that was: I don’t know anyone on the data side who buys software to get promoted. I know people who don’t buy it because they don’t want to get fired. Open source kind of helps you bridge that gap, where you’re not saying, hey, I need to make a really big commitment that’s going to take time to implement, and I really hope it goes well, or I’m not going to be here in a year.
Pedram Navid 34:39
Yeah. I mean, the other thing we’ve seen is, if you really want to get promoted, you build Dagster from first principles, it takes three years, and then you quit, right? You get that staff-level engineer title, and then you’re like, all right, I’m out of here, off to the next one. And what you built is an in-house, shitty version of a product you could have bought, right? So there are two sides to that. I think open source just makes it easier. One, there’s this idea that it helps you avoid vendor lock-in as well, which I think is really appealing to people. But there’s also great software that isn’t open source, and people buy it and love it. Still, I think we’ve all, as engineers, seen those monster implementations. Often the best ones are the ones that promise, during the sales process, that you’ll have no need to talk to your engineers at all when you implement it, right? Oh yeah, you just plug and play, click a few buttons, and you’re in. And then as soon as the deal is signed: oh, by the way, where are your engineers? We need them to come implement this thing they’ve never heard of before, right? That’s the thing I think everyone wants to avoid.
Matthew Kelliher-Gibson 35:41
Yeah, the other version of that is: oh, we’re going to handle everything for you. We’re going to help you along the way. And then you sign the deal, and you say, okay, how do we migrate this data? And they go, oh, well, it has to follow this standard. We didn’t do anything before that; that’s all on you. It’s like, well, that would have been nice to know a month ago.
John Wessel 36:03
Yeah, okay. So we play this game on the show where we see how far in we can get without mentioning AI. I think we did okay today, but I want to talk a little bit about AI. We’ve talked about orchestration, and I think Dagster is a tool you can also use to orchestrate when you’re pulling data together for AI or doing other things. I’m curious: what are people actually doing? Maybe the people using Dagster who are more on the cutting edge of using LLMs and maybe AI agents. What are people actually, practically doing with AI and orchestrators?
Pedram Navid 36:43
We see a lot of data prep for AI within Dagster itself. We even see some companies building foundational models and doing experimentation, but that is, I would say, cutting edge rather than the bread-and-butter use case. At the end of the day, I think AI engineering is data engineering, and we even believe data engineering is software engineering. So if you follow this to its logical conclusion, it’s all really the same thing, right? You’re moving data around, transforming it, storing it, converting it, embedding it, calling APIs. Is that data engineering, or is that working with OpenAI and LLMs? Often it’s one and the same. What we find is that AI engineering is actually a little bit easier than ML engineering, because you’re relying a lot on third-party providers, for example for embeddings. You’re not training models, right? It’s done for you. You’re really just experimenting and putting things out. And we’ve seen a lot of companies do things like, I mean, RAG is the big one, right? AI is great, but it needs context. Without context, it’s often garbage. If you go to OpenAI or Claude today and ask it to write a Dagster pipeline, it’s often going to write really terrible code, because it was trained on Dagster code from three years ago, which probably isn’t valid anymore. So what we’ve done is build, internally, a RAG model that uses our documentation, our GitHub issues, and our GitHub discussions to power what we call AI. It’s a Slack bot in our Slack community, and it does really well. Is it perfect? No, but it’s a lot better than nothing. I’ve used it, and it’s pretty good, right? Not bad for a POC, and we could always make it better. Sometimes it gets confused, but it’s better than not getting an answer, which is what I always tell people.
So context is everything in AI, I think. And what is the context? Context is data, right? So: ingesting our data, transforming it, picking the right pieces, adding metadata, and running experiments on those different context windows across different models. That’s really where Dagster shines. It’s just running these pipelines.
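As a rough illustration of the pipeline Pedram describes (ingest documents, keep metadata such as the source, embed, retrieve the most relevant pieces, and hand them to a model as context), here is a minimal, standard-library-only Python sketch. It is not Dagster’s internal bot: the corpus strings are made up, and the bag-of-words `embed` function is a hypothetical stand-in for a real embedding model or provider API.

```python
import math
import re
from collections import Counter

# Hypothetical corpus standing in for docs, GitHub issues, and GitHub discussions.
DOCUMENTS = [
    {"source": "docs", "text": "Define an asset with a decorator. Assets can depend on other assets."},
    {"source": "github_issue", "text": "Schedule fails when the cron string has six fields."},
    {"source": "github_discussion", "text": "Use a sensor to trigger a run when a new file lands in a bucket."},
]


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text: str) -> Counter:
    # Stand-in for a real embedding model: a bag-of-words term-count vector.
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(docs: list[dict]) -> list[dict]:
    # Keep the source metadata next to each vector so answers can cite where context came from.
    return [{"source": d["source"], "text": d["text"], "vector": embed(d["text"])} for d in docs]


def retrieve(index: list[dict], question: str, k: int = 2) -> list[dict]:
    q = embed(question)
    return sorted(index, key=lambda row: cosine(q, row["vector"]), reverse=True)[:k]


def build_prompt(index: list[dict], question: str, k: int = 2) -> str:
    # The retrieved rows become the "context" that gets sent to the model.
    context = "\n".join(f"[{row['source']}] {row['text']}" for row in retrieve(index, question, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


index = build_index(DOCUMENTS)
print(build_prompt(index, "How do I trigger a run when a file lands?", k=1))
```

In a real setup, each step (ingest, chunk, embed, index) would be a separate pipeline stage that can be rerun and compared across models, which is the experimentation loop Pedram describes.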
John Wessel 38:49
So help me out with this. There’s basically a clone of the data stack: for the modern data stack from 2021, there’s a clone of almost every single component that’s AI-focused, right? There’s an orchestration tool, an ETL tool, a specific database. I’m personally not super knowledgeable about each of those components when it comes to AI. Do you think those stay, or do you think it all gets consolidated back, because it’s not that different?
Pedram Navid 39:18
Yeah, it’s a good question. Maybe the vector databases stay, if they’re lucky. That’s my best guess.
John Wessel 39:27
Or do they? I don’t know technically how hard that would be for Snowflake and Databricks to implement.
Pedram Navid 39:34
They can implement some type of embedding already, right?
Matthew Kelliher-Gibson 39:39
Yeah. Snowflake already has a vector data type.
Pedram Navid 39:43
Postgres has vector embeddings now. I think even MotherDuck and DuckDB have it. Is it that hard to store a vector of numbers? Probably not. There might be added benefits to using a dedicated vector database for, I don’t know...
Matthew Kelliher-Gibson 39:57
Sure. I think applications are going to become specialized services that you run into.
Pedram Navid 40:01
Yeah, that’s my guess. Outside of that, the ETL stuff: I think we love reinventing things. Most people who are getting into AI today aren’t coming into it from a data engineering background. If you don’t know the tools, you think you have to invent things, right? Or maybe you just want to build new things because old things are boring. Some of those will probably stick around, because they’ll be good enough that everyone uses them and they evolve. But I think a lot of them will fall by the wayside when we realize AI problems are actually data problems, and we already have data tools to solve them, right?
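Pedram’s question of whether it is hard to store a vector of numbers can be illustrated with an ordinary relational database. The sketch below is a toy under stated assumptions, not how Postgres, DuckDB, or Snowflake implement vector support: it stores made-up three-dimensional embeddings as JSON text in an in-memory SQLite table and ranks them by cosine similarity with a brute-force scan.

```python
import json
import math
import sqlite3

# Hypothetical 3-dimensional "embeddings"; real ones have hundreds or thousands of dimensions.
ROWS = [
    ("refund policy", [0.9, 0.1, 0.0]),
    ("shipping times", [0.1, 0.9, 0.1]),
    ("return a damaged item", [0.8, 0.2, 0.1]),
]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, text TEXT, embedding TEXT)")
# Each vector is just a JSON array stored in an ordinary TEXT column.
conn.executemany(
    "INSERT INTO chunks (text, embedding) VALUES (?, ?)",
    [(text, json.dumps(vec)) for text, vec in ROWS],
)


def nearest(query: list[float], k: int = 2) -> list[str]:
    """Brute-force nearest neighbours: load every stored row and score it in Python."""
    rows = conn.execute("SELECT text, embedding FROM chunks").fetchall()
    scored = [(cosine(query, json.loads(emb)), text) for text, emb in rows]
    scored.sort(reverse=True)
    return [text for _, text in scored[:k]]


print(nearest([1.0, 0.0, 0.0]))
```

A full scan is fine for a handful of rows. The "added benefits" of dedicated vector stores mostly show up at scale, for example approximate nearest-neighbour indexes that avoid scoring every row.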
Matthew Kelliher-Gibson 40:36
I think there’s still this confusion I hear around, which is the idea that we should be replacing all of our deterministic processes with AI. It’s like, but I don’t need it to give me seven different answers. I just want the one answer that’s right every time.
Pedram Navid 40:57
Yeah. I mean, there are people using AI as a calculator, and it’s like, well, that’s a very expensive way to warm up the world. Maybe we don’t need to do that. Sometimes all you need are if statements and a regex. Maybe AI can replace that, but at the end of the day, whatever is faster is what’s going to work for people, right?
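A small example of the "if statements and a regex" approach: a hypothetical extractor for ISO-style dates (YYYY-MM-DD) that returns the same answer every time for the same input, which is the deterministic behavior Matt is asking for.

```python
import re

# Hypothetical rule: find YYYY-MM-DD dates in free text.
ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def extract_iso_dates(text: str) -> list[str]:
    """Return every YYYY-MM-DD substring whose month and day fall in plausible ranges."""
    dates = []
    for match in ISO_DATE.finditer(text):
        year, month, day = (int(g) for g in match.groups())
        # Plain if statements handle the checks the regex alone doesn't express.
        if 1 <= month <= 12 and 1 <= day <= 31:
            dates.append(match.group(0))
    return dates


print(extract_iso_dates("Shipped 2024-06-01, refunded 2024-13-02, closed 2024-06-15."))
```

The day check is deliberately simple; rejecting impossible dates such as February 30 would need a stricter check, for example constructing a `datetime.date`.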
Matthew Kelliher-Gibson 41:16
I think on that one, AI is just going to replace me having to look up how to write the regex.
John Wessel 41:22
Yeah, that is a decent application. So, along the lines of AI questions, you just kind of alluded to this: it’s still very expensive, and the billions of dollars being poured into these companies mask the expense for now. Just this week, it came out that the $200-a-month plan still loses money for OpenAI, and I think they weren’t necessarily expecting that. Of course, the thought here is: okay, we’re going to keep investing money in this, and then we’ll have better hardware that can drive the cost down, and better models that don’t have to be trained in the same way, to reduce cost. This is just speculation at this point, but it’ll be interesting. I’m curious about your take: what does that curve look like? Because eventually, I think, the money could run out before we get to that spot. I don’t know. What do you think? Just speculation on what might happen there.
Matthew Kelliher-Gibson 42:24
I mean, there’s already some evidence of plateauing.
Pedram Navid 42:28
So do you remember the great VC-funded days of Uber and DoorDash, when it didn’t cost anything to use these tools and, if you were smart, you would just abuse them as much as you could? You would get the referrals, $100 here, credits there, and it was like five cents to cross the city. You could get free food pretty much every single day, and that was wonderful. And then the company went public, and it would cost like $50 to go five miles, right?
John Wessel 42:55
I know, yeah, exactly. Anywhere near an airport, it’s at least $50 even if you’re just going across the street.
Pedram Navid 43:01
Yeah, it was supposed to be better, supposed to be this utopia, and it ended up just being a company that makes money off people, right? And they did so at the expense of killing their competitors. So will AI be the same way? I don’t know. Probably. People need to make margins at some point. Cash is not infinite. Right now, it’s really driven by massive amounts of funding, and at some point that’ll change. Costs will come down, for sure, but when the margins go down, the research also slows down. So yeah, they will probably plateau, and we’ll probably find them useful in some limited capacity. That’s probably not going to fundamentally solve AGI, for example.
Matthew Kelliher-Gibson 43:45
And I think we’re also seeing that having the best model is not really much of a moat at this point. So it’s not like you can say, well, we’re going to spend billions, but once we get there, we’re going to capture everything. It does sound a bit like that Uber time of: profits don’t matter, we just need to capture the market, and eventually, once we capture the whole market, we’ll make money off of it.
Pedram Navid 44:11
Yeah. It’s tough to capture the market when it’s really a commodity, too. So I think where AI differentiates is actually through product. Anyone can build a model these days, right? A lot of them are good. There are great open source models out there. Integrating that model into a workflow is where differentiation, I think, really happens, and great companies who really understand that can make it a lot better. I think Anthropic and Claude, for example, do a really good job with their projects and the way they’ve structured things to make it very useful in particular contexts for solving these problems and discussions. I use it all the time. OpenAI is maybe not as good product-wise as Anthropic these days, I would say. They have more features that I don’t end up using, but purely as a chat agent with a documentation store, I think Claude does a better job. I imagine in a few years we’re going to find companies that really get the product perspective right and build really cohesive products that are powered by AI, rather than just an AI chatbot that’s really good at generating responses, which I think we’ve sort of hit a peak on, regardless of how much better they get.
Matthew Kelliher-Gibson 45:26
Yeah, the other one it makes me think of a little bit is satellite telephone stuff, where it cost a whole lot of money to get the satellites up and the infrastructure in place, and once you had done all of that, it was really hard to make money off of it. But when the next people came around and just used the infrastructure that was already out there, they could build a profitable business model off of it, like with GPS, for example. Even satellite phones are still around, and the companies are more profitable with them, right? Because they didn’t have to pay to put all the satellites up there.
John Wessel 45:57
Yeah, that’s interesting. So we have a few minutes left here, and I’ll throw this to Matt. Matt, you’ve spent a little bit of time with Dagster recently, and you’ve got a data background; Matt worked in data at a publicly traded company. I’m curious: how have Dagster and the orchestration landscape struck you compared with what you used in some of your previous roles? How is it different? What’s the evolution like?
Matthew Kelliher-Gibson 46:26
So most of the places I worked, we didn’t really have an orchestrator. We had some more pipeline-related things, but we didn’t have a dedicated orchestrator. So it’s been an interesting little journey getting to know it a bit more and sometimes trying to wrap my brain around the concepts, because I think that’s usually the crux. A lot of it is familiar: okay, I’m planning things, I’m putting them in sequence or in parallel, and those types of ideas. Then a lot of it comes down to what framework they’re using to talk about these things. What language are they using? What do they label this stuff? Does it make sense? I also have the added twist of including RudderStack in this with some new stuff, so that has thrown some interesting frustrations at me, learning the two things at the exact same time, right? But overall, it’s one of those things I can look at and see: oh, here’s how I could have used it. When I had a team of 15, this is how we could have used this, right? The one thing I always had to think about back then, though, goes back to a point you made much earlier: there’s this newer generation of people who are data scientists or whatever, and they got taught a very applied way of doing things, which typically was very software-centric: how do I call the function to train a model, or whatever? So when you get into that broader world, closer to software engineering, they sometimes get a little scared, and you really had to pick stuff that you knew you could quickly get them into and learning with.
I remember we had a software engineer as a contractor once, and he was going to show us how to modernize our stuff. He did this whole thing of basically tearing things apart, building them from scratch, and trying to show us how great it was. And I was like, okay, that’s great, but no one but you can run this, right? I’ve got a team of people who need to run it when you’re not here. Whereas something like Dagster is definitely one where you can see: okay, I can get a team of people up and running with this.
John Wessel 48:49
I think that’s a really big deal. Two things come to mind from my previous experiences. It’s actually funny: I used a product called Rundeck. I don’t know if you’re familiar with that one; it’s a little bit more than Windows Task Scheduler. That was before we had that kind of DAG-type concept. But it’s interesting when you go through what you would do every day and now you have words and language for it. I think that’s the most interesting thing about finding a good framework like a DAG: oh, I didn’t know I was doing orchestration; I just scheduled this to run, then this, and then this. That’s one of the things. The second one, which Matt just touched on and which I’ve talked a lot about, and where I think orchestration is a big deal: when your data team moves from one, maybe two people to more of a team, three, four, five, however many people, that conversion from what I call single-player mode to multiplayer mode is a really big deal. The tooling becomes a bigger deal, and so does version control. I think dbt, for example, is a big deal if you’re moving your data team into multiplayer mode, because people in that transformation layer have a shared solution there. Orchestration is the same thing: you’re now using the same framework, and there’s less esotericness when the answer to "how do we schedule a job?" is defined: we use this, and it has specs and documentation.
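John’s point about realizing you were "doing orchestration" all along comes down to treating scheduled steps as a DAG. The sketch below uses Python’s standard-library `graphlib` to group a hypothetical pipeline’s steps into waves: steps within a wave have no unmet dependencies and could run in parallel, while the waves themselves run in sequence, the distinction Matt mentions earlier. The step names are invented for illustration, and nothing here is Dagster-specific.

```python
from graphlib import TopologicalSorter

# Hypothetical nightly job: each step maps to the set of steps it depends on.
pipeline = {
    "extract_orders": set(),
    "extract_customers": set(),
    "clean_orders": {"extract_orders"},
    "join_orders_customers": {"clean_orders", "extract_customers"},
    "build_dashboard": {"join_orders_customers"},
}


def plan_waves(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group steps into waves: steps in one wave could run in parallel; waves run in order."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies form a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # pretend every step in this wave succeeded
    return waves


for number, wave in enumerate(plan_waves(pipeline), start=1):
    print(f"wave {number}: {wave}")
```

A real orchestrator adds the parts a bare graph doesn’t have, such as retries, schedules, logs, and a shared place to see what ran, which is the "multiplayer mode" benefit John describes.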
Matthew Kelliher-Gibson 50:23
I think knowing that matters, because I’ve been part of at least one company where orchestration had a name: it was an employee named Gary. He ran everything, and when he left, nothing could run, and a whole group of us were scrambling to get things back together, right? But we also didn’t have the language, because this was almost 10 years ago now, to be able to say: okay, now what we need to do is get this into an orchestrator so that we’re not dealing with this anymore. Even just the language of how to talk about these things, like, okay, these are assets, and stuff like that. Giving language to that can be very helpful. I think a lot of times it helps people get out of the limited mindset they’re in, if that makes sense, especially when you’re talking about things like what data as a product means. To a lot of data scientists who are very new, it means the model I built, and you have to explain to them: well, no, it’s the end-to-end thing, from collection to delivery. That’s the product, not just the little part you built.
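To illustrate the asset vocabulary Matt mentions, and the idea that the product runs end to end from collection to delivery, here is a plain-Python sketch loosely modeled on an asset-based approach. The `asset` decorator and the `materialize` and `lineage` helpers are hypothetical and are not Dagster’s actual API: each function names its upstream assets through its parameter names, and materializing the final report pulls in everything it depends on.

```python
import inspect
from typing import Any, Callable, Optional

# Hypothetical registry: asset name -> function that computes it.
ASSETS: dict[str, Callable[..., Any]] = {}


def asset(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Register fn as an asset; its parameter names are the upstream assets it needs."""
    ASSETS[fn.__name__] = fn
    return fn


def materialize(name: str, cache: Optional[dict[str, Any]] = None) -> Any:
    """Compute an asset, computing each upstream asset once first (assumes no cycles)."""
    cache = {} if cache is None else cache
    if name not in cache:
        fn = ASSETS[name]
        inputs = {dep: materialize(dep, cache) for dep in inspect.signature(fn).parameters}
        cache[name] = fn(**inputs)
    return cache[name]


def lineage(name: str) -> list[str]:
    """List `name` and everything upstream of it, upstream assets first (assumes no cycles)."""
    order: list[str] = []

    def visit(node: str) -> None:
        if node in order:
            return
        for dep in inspect.signature(ASSETS[node]).parameters:
            visit(dep)
        order.append(node)

    visit(name)
    return order


@asset
def raw_events():
    # Stand-in for collection, e.g. events landed by an ingestion tool.
    return [{"user": "a", "amount": 10}, {"user": "b", "amount": 5}, {"user": "a", "amount": 7}]


@asset
def revenue_by_user(raw_events):
    totals: dict[str, int] = {}
    for event in raw_events:
        totals[event["user"]] = totals.get(event["user"], 0) + event["amount"]
    return totals


@asset
def top_customer_report(revenue_by_user):
    # Stand-in for delivery: the thing a stakeholder actually reads.
    user, total = max(revenue_by_user.items(), key=lambda item: item[1])
    return f"Top customer: {user} ({total})"


print(lineage("top_customer_report"))
print(materialize("top_customer_report"))
```

Naming the outputs rather than the tasks is what makes the lineage question ("what feeds this report?") answerable without asking Gary.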
John Wessel 51:35
One last take: maybe specifically for Dagster, or generally for orchestration, where do you think this goes in the next couple of years? What are the core problems in the space for orchestrators such as Dagster to solve?
Pedram Navid 51:55
It’s a good question. I think one part of it is something we just touched on: not everyone knows what an orchestrator is and when they need one. So at Dagster, we have two big priorities. One is just helping generate awareness of what orchestrators are, what a data platform is, the fact that you probably already have one, and how to think about observing it and having a single place to look at these things, right? You can’t just go to Gary every single time. So having one place where you can understand where everything is supposed to run is, I think, a big piece of it. The other is lowering that adoption curve for people: finding ways to make it easier and more plug-and-play to use Dagster with existing playbooks that you already have and that are pretty common across the industry. Building those out without losing sight of the power of Python and Dagster itself is where we’re focused.
John Wessel 52:48
Yeah, makes a ton of sense. Well, thanks for being on the show. It’s been really fun. Matt, thanks for being here, and we’ll catch everybody in the next episode. Thank you. All right, thank you.
Eric Dodds 53:00
The Data Stack Show is brought to you by RudderStack, the warehouse native customer data platform. RudderStack is purpose built to help data teams turn customer data into competitive advantage. Learn more at rudderstack.com.
To keep up to date with our future episodes, subscribe to our podcast on Apple, Spotify, Google, or the player of your choice.
Get a monthly newsletter from The Data Stack Show team with a TL;DR of the previous month’s shows, a sneak peek at upcoming episodes, and curated links from Eric, John, & show guests.