This week on The Data Stack Show, Eric and John welcome back Pete Soderling, Founder of Zero Prime Ventures and Founder of Data Council, to the show. During the conversation, the group discusses Pete’s career journey, from the first dot-com bubble in New York City to his transition to the Bay Area. The discussion highlights the significance of data culture, the rise of AI, and the current tech ecosystem, particularly in San Francisco. Pete emphasizes the evolving role of data engineering in the AI era and the importance of robust data management. They also preview the upcoming Data Council conference, focusing on data and AI integration and looking ahead at other notable speakers and topics for the conference. Don’t miss the special Data Stack Show coupon for money off your ticket to Data Council.
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
John Wessel 00:03
Welcome to The Data Stack Show. The Data Stack Show is a podcast where we talk about the technical, business and human challenges involved in data work.
Eric Dodds 00:13
Join our casual conversations with innovators and data professionals to learn about new data technologies and how data teams are run at top companies. Welcome back to The Data Stack Show. We have a very exciting multi-time guest, Pete Soderling. Pete, welcome back. We’re super excited to spend some more time talking with you about all things data. You’ve been on the show multiple times, so a lot of our listeners know you. A lot of our listeners have met you at the Data Council conference. But for those who haven’t met you, give us the brief flyover of your career and how you got into running the largest conference for data engineering and investing in data companies.
Pete Soderling 01:00
Sure. Thanks, Eric. I’m Pete Soderling. I’m the founder of Data Council, and I’m the founder and general partner at Zero Prime Ventures. I’m a software engineer from the first internet bubble, as I like to say, back in the 90s. I was a self-taught hacker programmer in high school. I sort of made my way to the East Coast, to New York City, and had my first jobs in tech in New York City in the first internet bubble, and so ended up sort of becoming a founder in New York City and started a couple of companies there. Then in 2010 I moved to the Bay Area and started a couple of companies there, and one of those companies was a data infrastructure company. That sort of got me into the early cloud data infra world and got me really excited about data. My geek mind just went long on data, and ultimately that culminated in me starting Data Council, the world’s first data engineering conference, in 2012, which was a long time ago now, it’s hard to believe. Over the years, I’ve been building out the data community across multiple dimensions, ultimately culminating in starting a venture fund to invest in day-zero engineer founders inside the data community and beyond.
John Wessel 02:04
That’s awesome. So we were talking before the show about AI and the impact on data engineering, roles, products. We were talking about people starting these AI companies who have no idea about data engineering. So it was a bunch of topics, but I’m excited about talking about all of that. What are some things you’re excited about?
Pete Soderling 02:21
Yeah, I think that’s really astute. I think there’s a whole new generation of hipster hackers, quote-unquote AI engineers, which is amazing to see. Not to be pejorative, because we need new, fresh blood in the community. But I think there’s sometimes a gap in what newer folks in the community might understand about older-school, old, stodgy data management techniques and architectures and data infra. And so this year’s Data Council is actually an effort to put those two pieces together and to explain why all the new, sexy AI stuff at scale ultimately becomes, and sort of demands, a data engineering solution, a data infrastructure solution, a data management solution. So we’re excited about pushing that vision forward and putting these two pieces of the community together and really explaining why data needs AI, or, sorry, why AI needs data management. I should invert it and put it that way around.
Eric Dodds 03:12
I love it. Well, tons to talk about. Let’s dig in. Yeah, let’s do it. Pete, the first time we had you on the show was actually just under three years ago. I think we can call it three years ago, towards the beginning of the show, and you have a really cool superlative for the show. This is your fourth time on as a guest, and so you have been on the show more than any other guest, which is pretty cool, and I think that is the longest time span as well. So it’s just really neat for us to look back at the history of the show and see that you’ve been a part of it in an important way every single year. So welcome back.
Pete Soderling 03:54
Well, thank you guys. I’m definitely out of good things to say by now then, so you’ll forgive me for being a little bit flat today. But it’s also been great to have you guys at Data Council, physically recording shows and talking to guests and talking to speakers. Your support and participation over the years, and being part of the Data Council community, both as The Data Stack Show and as RudderStack, has been really appreciated. So thank you for that, and the community appreciates that.
Eric Dodds 04:19
Absolutely. Well, Pete, we’re super excited about Data Council this year, and I want to talk about the themes, because we are pumped about, I mean, we’re watching the industry change before our eyes. But before we get there, you have this really unique perspective on being an engineer in the first dot-com bubble, and in New York City. That was an exciting time. There can be an electric feel when you see an ecosystem emerging and you realize, okay, this is potentially going to be something that people look back on. And I can’t help but feel the same way about where we are now with AI, and you are on the bleeding edge of that, both with Data Council and the companies that you’re investing in. So just give us a little bit of a sense of what it was like back then. What is it like now? How does it feel the same and what’s different?
Pete Soderling 05:11
Yeah, it’s a really interesting question, and I have an interesting perspective, because I was not in the Bay Area, which is typically seen as the pinnacle of engineering-dom, right? Like, that’s sort of the Bay Area. I mean, maybe you’re in Boston, you’re at MIT, but ultimately you migrate; in the same way YC migrated from Boston, you sort of end up in the Bay Area. So that’s where the best engineers in the country congregate. Well, that was not me. I was stuck in New York City. The interesting counterpoint, however, is that New York City had a pretty strong data culture. And the data culture came out of the quant teams at the banks. The banks had, like, really pretty hardcore quant data science teams, and they were also backing into early flavors of data engineering to, like, satisfy the quants. So there was an interesting subculture of data culture inside New York City engineering that was actually pretty meaningful. And then it’s no wonder that a lot of those engineers and scientists went to DoubleClick, and DoubleClick turned into this high-frequency, low-latency ad trading platform, because that was, like, the high-frequency, low-latency system. So the first tech companies in New York that hit scale were also sort of pulled out of the quant bank culture, and even some architectures and even some business models. And so there was this interesting bright spot of relatively hardcore data people, and some high-scale engineering as well, in New York City in the mid 2000s, and I think some of my interest in data probably sprung from contacts and touch points with that community. And when we started the data engineering meetup, originally in 2012, we started inside the Spotify office in New York City, and our first talk was from Erik Bernhardsson, who spoke on Luigi, which was this data orchestration tool. Yeah, Spotify’s open source, pre-Airflow, yeah.
So all this was happening in New York City. And you don’t think of New York City as being a hot spot for data engineering, but in my particular case, my educational experiences were mostly there before I moved to the Bay Area. But yeah, there was an interesting significance of data, not just quant stuff, but also some of the early engineering stuff, that was centered in New York City, believe it or not. That’s just an interesting point that a lot of people don’t understand, I think, from an engineering culture perspective.
Eric Dodds 07:34
I love it. There are interesting stories. We’ve had a couple of people on the show who were also immersed in that quant world, in advanced trading and everything. And one really fascinating thing is that back in those days, physical proximity to connections for sharing data, and the speed, was a huge deal, right? And so, like, office location and bandwidth and networking was a serious factor. Data was a factor of real estate to some extent, which is kind of wild.
Pete Soderling 08:04
Yeah, there were data centers popping up in Jersey City, across the Hudson from Manhattan, and there were big ISPs and data centers located over there. And the big banks were starting to build stuff there because it was a short hop from Manhattan. And yeah, those latency milliseconds, when you’re a quant, when you’re a trader in a high-frequency environment, especially when you’re trying to automate the trading and do more computer-based approaches, were critical.
Eric Dodds 08:28
Yeah, so I love that sort of inside story from being in New York and the data. What are you sensing now? What are you feeling now, from the founders and the engineers that you work with at the companies that you invest in? What’s the feeling like now?
Pete Soderling 08:45
Well, I mean, I guess if my career and life is any example, I mean, I migrated to San Francisco. I don’t want to be too pedantic about it and act like it’s a foregone conclusion, but I do think that a lot of the best engineering ultimately ends up finding its way to SF. And I do think that in this current AI world, there aren’t many places like San Francisco. I think there’s interesting research, obviously, in Paris, and London has a bunch of DeepMind people. So I’m not really here to pick a winner in terms of which cities are best, but for sure, there’s a lot of super intense, concentrated interest, experience, funding, and a lot of that has sort of shifted, maybe, back to San Francisco, maybe after COVID. If we’re thinking about this short time frame, were people really thinking that Austin was going to be a tech capital of the world, or Miami? I don’t know if people were seriously thinking that, and that’s not why we ran Data Council in Austin. We just wanted to be in a warm, cool, fun place where people could enjoy themselves. So people thought that I was maybe long on Austin, and again, not to say anything bad about Austin, because it’s an amazing city, and there are smart engineers there, but San Francisco is still, in my mind, kind of unrivaled at the top of the heap when it comes to this intersection of AI research, product hacking, funding and startups. These are all the things that I care about right now. It’s not just AI research. Maybe there are strong AI research locations around the world, but when you put the full stack together of what it takes to actually weaponize the startup product into a real growing company in the AI world, I think that there’s really no better place right now, in terms of community support and ecosystem, than the Bay Area. So that’s sort of my current take.
John Wessel 10:28
Yeah, as far as that stack, if you pulled one piece, if we think about it like you’ve got these pillars for it, what’s the one piece that you think really makes the biggest difference about the geography being in SF?
Pete Soderling 10:45
I mean, it’s hard to say, and I don’t want to, like, get too high on my own supply, but, I mean, of course, the funding matters. The concentration of funding is like the icing on the cake. I mean, all the other stuff comes before and is arguably more important: the engineering culture, the deep experience that folks have, the universities at Stanford and Berkeley, just the DNA of product building and engineering and going to market. This is the stuff that matters most. Like, I’m an engineer and a founder, and I did a lot of things with no money, because I figured out how to be scrappy. But then you layer over all the investor interest and the depth of the funds and the size of the funds and things, and I think you just get this really incredible juggernaut that is the Bay Area.
Eric Dodds 11:32
Yeah, we’ve talked a lot about that in the Southeast, right? Like, what are the ingredients that make this? And I think one of the realities is just almost the compounding economy of scale, where you have that mix of ingredients for decades on end that actually feeds the engine itself by having major, sort of record-breaking successes repeatedly, and then that money gets put back into the system across a number of different variables, right? Startup companies, universities, research, all that sort of stuff. If that machine, if that flywheel, turns for decades, I mean, juggernaut, I think, is a great word for it.
Pete Soderling 12:16
And I don’t want to say that everyone has to be, like, two feet in, geographically committed to living in San Francisco. I respect remote work and the equalization of talent across geographies and cheap living costs. I mean, God knows, when I was a founder, I was living in different places around the world, partly to be cost efficient, and sort of phoning home and talking to the team in other places. I don’t think that everyone has to physically be located in SF, but I think if you’re an engineer founder, you ignore SF at your own peril. And so that means that you need to somehow be connected there. You need to be spending time there regularly. You need to sort of honor what the ecosystem is, even if you choose not to live there. We invest in companies in Europe and in New York City and all over the US, and we don’t demand that every founder lives in SF, but I do think that you ignore it at your peril, and you have to come to terms with how you are going to embrace and leverage that ecosystem and be a part of it to the extent that you can, even if you don’t live there. And I think that’s what smart engineer founders find themselves doing in some way.
Eric Dodds 13:23
Such a fascinating topic. Okay, well, speaking of the Bay Area, whet our appetite for Data Council this year. I know there are a couple of specific subjects that we want to get your expertise on, especially around data and AI, and I think that’s going to be a big emphasis of the conference this year, but we’re super pumped. Tell us what we’re going to talk about at Data Council.
Pete Soderling 13:43
Well, we are covering lots of good, amazing stuff at Data Council, as we always do, across a range of different tracks. I think we have 10 tracks this year. We have a new foundation models track. We have an AI engineering track, which is going to be awesome. We have a generative AI apps track. That’s all on the AI side. And then on the classic data side, we have the tried-and-true data eng, analytics, data science and algos, and databases tracks. Andy Pavlo is coming from CMU to speak, which we’re really excited about. The author of MotherDuck will be there, or, I’m sorry, DuckDB. MotherDuck will also be there, which is, yes, the entity around DuckDB. So, yeah. Ryan from Tabular, of Iceberg fame, will be speaking. Lloyd from Looker, who’s now the author and creator of Malloy, which is this drop-in SQL replacement, which is quite cool. So we have lots of old stuff, lots of known names in the classic data infra world. But then we have this new edge of, oh my god, we’re living in this AI world, and what does this mean for all of us data people? I mean, I believe that the mother of AI is data, but explaining to the world exactly what that means, and why we believe that to be true, and how these two sides go together, is just part of the theme of Data Council this year, and we’re particularly excited about that.
Eric Dodds 15:03
I love that. I’m super excited personally, because it feels like the pace of what’s coming out, I mean, even down to the models themselves, right? Every week it seems like there’s major industry news, and you have all of the companies that are proliferating around this. I mean, it’s creating entirely new categories of problems to solve, especially around data. And so just to be in one place, to have that concentration of that caliber of people, I feel like it’s going to make it possible to drink from the fire hose a little bit more than just following Hacker News.
Pete Soderling 15:45
Our tagline is literally, like, come and meet your data and AI heroes IRL. Data Council is such a special event because we sort of insist on it being in person every year, and it is tough to get geeks to, like, come to the same spot, and sometimes I feel like we’re dragging them by their hair. I’ve probably said this before. We cajole and we plead and we tease them with amazing speakers, and then we put barriers in front of them, because the conference has to make money to survive. So we try and make it as open source friendly as we can, and then there are some commercial things that have to happen, and people have to book flights to come, and they have to take time off work and figure out their schedules. But then you get everyone in the same room, and it’s just magic. All of these genius people, tool builders, founders, engineers, long-term champions in the world of data, and it’s just really such a special time. It’s this IRL component that we think is really special, and we just look forward to it every year.
Eric Dodds 16:42
It is totally special. I can speak from firsthand experience as a multi-year attendee. Well, Pete, can we dig into a couple of these topics and just start breaking them down? I love how you sort of painted the picture of, like, the old school, which is kind of funny relative to the actual age, right? I mean, you know this better than anyone, but we sort of say, like, traditional data eng, and that’s actually, relative to the world of technology, still very young. The AI is happening so rapidly.
John Wessel 17:12
Rewind for, like, five minutes on the data eng thing, because I think we start there and you’re like, oh, data engineering is changing. I think it’d be fun to just spend, like, a couple of minutes on data eng, like, what that used to be and what that became. Because, as I was telling Pete in the intro, I’m coming from a DBA background years ago. Database administrator was a separate role, system administrator a separate role. And then data eng is like, okay, let’s pull in some of that old DBA stuff, let’s do some analytics stuff. So I think that’d be fun to start there, and then let’s move into what we think is next.
Pete Soderling 17:47
Yeah, I love it. I mean, we’ve seen the tool stack change over the last decade as the roles have blended into each other, and obviously the shift-left perspective, which means software engineers end up ruling the world, means that software engineers also end up ruling more job titles inside a team. And so probably most modern startups are hard pressed to identify who the DBA is. The DBA is definitely all of the engineering team, when people are responsible for managing the bits that they put in production, and that might include everything all the way down to the data storage layer. I mean, obviously there are still DevOps teams, but software engineering is eating into the system ops and DevOps world in the same way. So it’s been fascinating to watch that whole amalgamation and evolution through the lens of data and Data Council. And then it starts to be more specific, like this year, this whole collapsing of batch and streaming systems into each other, right? I think Estuary is speaking at Data Council this year. That’s an interesting thing to think about. And then you go down one more layer. Well, what supports that? Oh, it’s the lakehouse architecture and Iceberg tables and Hudi tables and these file formats that allow near-real-time, streaming-like use cases on top of them. So all of a sudden, that starts to throw into question the orchestration layer, because if you’re not orchestrating data into these different formats, through this long pipeline, and you can just query the data where it sits, get access to it where it lives, does that obviate some of the ETL-ing that we’ve been doing across these systems over the last 10 years? So there are all kinds of interesting implications, I think, that are buried in this. And obviously we see a lot of this evolution in the tooling, in the community.
Eric Dodds 19:38
That was actually something, I’m trying to remember how long ago this was, that Andrew Lamb from InfluxDB had talked about. I mean, Iceberg had certainly been around, but this was right before the big lakehouse wave, when you had Onehouse and a bunch of the companies that got developed around it. He was talking about time series data, which has a bunch of its own unique challenges, and it was interesting to see him kind of dream. I mean, this was a couple of years ago, and he was like, what would be amazing is if you literally fire hose everything into object storage, and you can just leave it there and query it, right? He was talking about it as this conceptual thing, and then rapidly the tooling was like, oh, wow, that’s actually how companies are starting today. Not like, we’ll get there at scale, but the tooling is now at a point where you can actually just start with the lakehouse architecture. You can do most things streaming now instead of batch, and you can control costs.
John Wessel 20:43
Yeah, we had a startup on the show where part of their core architecture is, like, well, it’s in S3. I think they segmented buckets in S3, and, like, in real time, read it in, processed the data with some customer data, and then it was ephemeral, and after that was done, they spun it down, right? Just some interesting things around that type of stuff.
Pete Soderling 21:04
Yeah, the S3-ification of everything is definitely a theme that we see in Data Council and in some of our investments at Zero Prime. It’s not just that companies are trying to go for cheap storage whenever possible and re-architect internal systems to do that. It’s also the database vendors. There’s a new class of databases coming up where everyone is trying to run on the cheapest storage possible. Because, hey, people are tired of their Snowflake bills, and they want more scale and better economics. And so we’re starting to see S3 become a credible bedrock for a lot of data storage and a lot of applications, and future applications that are starting to pop up. So that’s a common theme that we’re seeing across the industry, for sure.
John Wessel 21:48
Another thing kind of around that, like the S3 thing I’ve seen, what are your thoughts on this? I don’t remember the first time I thought of this, but it was like, wow, this makes a ton of sense: just better leveraging this really powerful local hardware that everybody has. I think that’s another interesting theme. Have you seen that play out the last couple of years?
Pete Soderling 22:07
I think there’s something there. We have these Mac processors now. I mean, most of the developers that I know are still Mac junkies. We have the M2 and M3 and M4 chips, I guess, now, so there’s a lot of pent-up power, and we’re seeing this, obviously, in some of the AI features in our local machines. I guess it’s an interesting counterpoint to moving everything to the cloud, because Modal wants you to stop using your desktop, period, and just run your Python scripts as if they were local, but they’re really on something remote. So we’re seeing things go both ways. I’m not sure if I can make a bet as an investor yet on which one’s going to win the day from a development environment standpoint. I do think that, for sure, small models running locally is going to become an increasingly powerful thing. You see this through Apple Intelligence, where they’re trying to push all this stuff down onto the actual client hardware. So I definitely think that’s a thing. How will this actually impact classic data engineering, or even engineering workflows and development environments? Is there going to be a battle between convenience and cloud-based sandbox scenarios and integrations and things, versus just the power and the cost-effectiveness of developing locally? I think there are a couple of interesting, credible factors pushing in both directions. So it’s hard for me to know exactly how that ends.
John Wessel 23:35
The security and compliance angle is fascinating for me too, because Apple, who’s obviously very invested in the hardware side of things, would argue, oh, local’s better, and give all the reasons why that’s better for privacy and stuff. And then cloud vendors would argue, oh, well, the local device could be compromised. You want it all controlled in the cloud, that’s better. So I think that tension is there, which also drives people in two different directions. Super interesting.
Pete Soderling 23:58
Maybe everything’s just going to run on the blockchain in the future. Your local machine is just going to be a node in an ephemeral blockchain, and all the crypto people will be right at the end of the day. That would be the ultimate irony. I don’t know if I could live with that.
Eric Dodds 24:13
Yeah, that would cause a pretty big, like, you know, internal crisis for a lot of people.
Pete Soderling 24:19
Now my screensaver’s turning on. I’m mining some ETH right now. Sorry about that.
Eric Dodds 24:25
It will be interesting. Actually, I’m going to make it a point to talk to the DuckDB and MotherDuck teams about the local UI that they rolled out. We talked about that on a recent show, right? Super interesting. But okay, we obviously have to talk about AI, because there are a series of tracks at Data Council that are going to be focused on that. So we talked about what’s traditional data engineering, how is that changing, what are the trends. Okay, tell us what you’re seeing in terms of AI and data engineering in terms of the tooling. We just talked about a number of tools that are, let’s say, the more advanced tools within a standard data engineering tool set. There’s orchestration, there are lakehouses, there are pipelines and jobs running. Now there are a series of tools that are essentially developed in the world of AI, but they’re data engineering tools. So speak to that a little bit. What are you seeing at Data Council, and especially with the companies that you talk to from an investment standpoint?
Pete Soderling 25:27
So I guess there are different ways to slice this, right, when you talk about AI engineering and how it’s colliding with the traditional data infra world. Obviously, there’s a whole new workflow of tool sets and development processes that any developer, any engineer anywhere, is getting dragged into through ChatGPT and Cursor and, oh my god, vibe coding and all these things. So, of course, at Zero Prime on the investment side, we’ve seen data engineering copilots pop up and things like this. So the data engineer’s workflow is changing in the same way that many other engineering workflows are changing, just commonly speaking. In addition to that, what I kind of want to bridge to and talk about is one of the things that we’re really passionate about with this year’s Data Council: really acknowledging the intersection between what AI engineers are likely to face as they have successful applications, and the tried-and-true work that the data infrastructure and data management world has done over the last decades. And what do I mean by that? Well, I think there’s this whole new class of AI engineers that maybe think they can just concatenate three strings and throw them against LLMs and have a successful AI app. Well, that might be true, but everyone knows that the success of your AI app is based on volume and scale, and so as you collect more data from your users, that becomes the actual differentiating piece around your AI wrapper, if you will. So I think there’s a whole class of engineers that might become that successful, and hopefully they are that successful, in their AI companies and applications, but then all of a sudden they have to manage all this data, and it becomes a classic data management problem.
And there’s a whole generation, I think, of engineers that might not know and understand what we’ve been mucking around in at Data Council for the last 10 years, which is tried-and-true best practices and architectures around data storage and data processing and cleaning and scale and governance and privacy and all these things. And so I think there’s a really interesting complement between these two worlds. If folks want to really understand what mature AI engineering looks like from a data management perspective, they need to put those two things together. And we think the data engineering community, the Data Council community, is uniquely positioned to really bridge that gap and help these new AI founders kind of get dragged into the world of proper data management, and we think it’ll be an incredibly useful and powerful tool for them. So that’s one thing that I think puts these two pieces together in a really interesting way, and that we’re quite passionate about this year at Data Council.
Eric Dodds 28:05
Can you speak to, and I really hope that there are multiple listeners out there who are starting their own AI startup, and if so, and if you’re listening, please reach out to us. We’d love to have you on the show. Pete would love to talk with you if you’re growing quickly. But can you speak to that person who is thinking about starting a company, or maybe has started an AI company, and they are realizing now, like, oh yeah, that’s actually going to be a major problem? What do they need to be thinking about? They may not need to take immediate action now, or maybe they do, but what do they need to be thinking about? If they 10x, they’re going to face problems that they probably don’t see right now.
Pete Soderling 28:49
Yeah. I mean, well, I think this is just very general advice, but I think it’s all about the quality of the people on your team. This is more startup founder advice than it is technical advice, but I think it’s about finding an advisor, or somebody who’s actually been through data management at scale, who’s worked at one of the larger internet companies, or at least a scaling startup, and has gone through a lot of the orchestration pieces and the data storage pieces, and has had to choose between different kinds of databases. Some of these are non-obvious things to someone who hasn’t gotten fully in the weeds on them, even questions like, oh, do I need a standalone vector DB right now? Or should I just be using vector storage that’s getting bolted on and integrated with all the other major data tool incumbents? Like, these are real things that I think modern engineers have to figure out, and there’s no better way to do that than to put someone on your cap table, either as an advisor or an angel investor, someone who’s actually gone through these challenges before. And they’re probably going to look a little older than you, because you’re the young whippersnapper, like, super smart AI hacker founder, and they’ve been around data management for a while, and it’s going to look like, feel like, kind of legacy skills and legacy insights. And I think that’s the point: we need to put these two worlds together. And there’s going to be a time gap and a skills gap that we need to bridge. Smart founders will want to complement their team and plug holes in their team and their cap table and get good advice from a technical standpoint.
So that’s just general advice that I would give any AI founder who’s starting a new AI company today: think about that and try to add that experience to your team, maybe before you need it, so that you’re not making bad architectural decisions all along the way until you realize they have to be undone. Yep.
John Wessel 30:34
So I’ve got a question, then. It just came to me. With Data Council, I think one of your goals is to have those two groups of people co-located, right? Where you have people that are experienced, that have done it the traditional way, or done it the way before AI was a thing, right? And the new people. How have you thought through that? Because if you’re doing a conference, I would perceive it’s easier to say, all right, this is all about AI, learn, go all in, just do that stuff. Or, this is all about the traditional side. Like, how do you commingle those to really grab both people?
Pete Soderling 31:06
I mean, partly it’s true. Like, people ask me this year, are you going to rename Data Council? Does this require a top-down rebuild or rebrand? Is this a completely new thing? I’m like, well, I don’t know. Maybe we should, and maybe that’ll be a different discussion for next year and beyond. But this year, we just segmented them into tracks. So about half of the tracks are AI-related tracks, and about half of the tracks are classic data tracks. But then the cool thing about Data Council is we have office hours after the end of every single talk. So if you’re an AI engineer and you’re listening to a database talk, and you want to dig in with that speaker afterward, you go to the office hours, and you can have a conversation with the speaker. And I think that’s where some of the interplay and the cross-functional skills transfer will come. So that’s a very exciting layer of Data Council that we’ve baked in over the years. We’re very committed to these office hours that happen at the end of every talk, and so every speaker is totally approachable, which is why we say, meet your data and AI heroes IRL, because you get to talk to them, and they’ll spend quality time with you, answering your questions. And that’s part of the magic of how we’ve meshed these communities together. Other than that, how could we structure it? So we just try and get the right people in the room and give them a basic opportunity to have time in the schedule to let the minds mingle, and then they do the rest of the magic. And the community has always been amazing at that. So we just try to facilitate.
Eric Dodds 32:30
Very cool. So great. Pete, can you talk to... I’m sure you’ve seen companies who are leveraging AI to deliver some sort of data product to a data consumer. And one of the really interesting trends there is that the line between data consumer and, let’s say, engineer acting on data is blurring. And even before the show, we were talking about the term data engineer. Or one that’s maybe a little bit easier is the term analyst, which is getting kind of interesting, right? Because, I mean, even on a personal level, I’ve never had a formal job as an analyst, but with a clean set of tables, I can use AI to write enough SQL to where I used to have to ask people to do that, and I don’t anymore, right? Not that I could do an analyst job as a professional. But at the same time, it’s changing, right? And I wouldn’t necessarily consider myself an analytics engineer, but the line is blurring just in terms of jurisdiction because of the tool set. Can you speak to that a little bit?
Pete Soderling 33:46
I mean, I think the power of the AI tools is that it’s so generalizable, because it’s such a great sidekick, and in any collaboration environment, you can find the AI just incredibly useful. Now, there are all kinds of workflow hang-ups and improvements, and what does the full value chain look like, and what are the technical aspects, the tactical aspects, of how we communicate with the AI? What shape does that take? But there’s no question that it’s changing every aspect of creativity, from image creation to writing and authoring content to music generation to engineering. Engineering is just another form of creativity. Engineers are creators, and there’s a very artistic thing about being creative. And it’s no wonder that gen AI, which originally caught fire with all the creative types, very quickly sucked the engineers up into that updraft, because engineers are creatorial. And the more we realize that and adapt and are willing to be flexible in using this as a tool to help us, the better. And the cool thing is... well, I don’t know. Maybe this is not right. Maybe there are, like, old-school engineers who are really hell-bent on keeping the AI away from them, just like there are some screenwriters’ unions that are really fucking scared about this whole thing. But so far, I think, to the engineering community’s credit, I haven’t seen a lot of manifestations of that. And so it seems like the engineering community overall is down with being very utilitarian and using the AI to help them create things better, faster, cheaper, and using these coding sidekicks as real collaboration partners. So that’s just a very generalized thought about it, but I do think it is really cool when you understand that engineers are creatorial, and that’s how we fit in this immediate value chain of gen AI, which has swept the rest of the creative world.
Eric Dodds 35:41
Yeah, yeah, I love it. I was reading an article about the first railroad that they built that was sort of the precursor to the Panama Canal. And I promise there’s a tie-in here. But it took them, like, eight years to build this thing, and it was less than 50 miles, right? Because of all the difficulties and all that sort of stuff, right? But when it was complete, the mind-boggling thing to everyone was that they had dramatically underestimated the power of the rapid exchange of goods from east to west and west to east, right? And it was just dramatically more economically productive than anyone projected. And they had, like, immense projections, right? And I really view it the same way. And I love your mindset, Pete, because it’s like, okay, imagine if that railroad got built in a week instead of eight years. What types of additional things could have happened within that eight-year period, with all the economic activity, all of the different entrepreneurial ideas? I really feel the same way about AI, where it’s like, okay, we’re just building the railroad way faster and removing a lot of the manual labor so that there will be a higher flourishing of human creativity. Absolutely.
Pete Soderling 36:56
Now, of course, we’re gonna suck some new creators, co-creators, into the bottom end of this vacuum that we might not have called engineers before. So what constitutes an engineer going forward might be an interesting discussion or debate, because the AI enables people who are, let’s just say, again, not to be mean, but otherwise unqualified to write code, to all of a sudden be more than dangerous at talking to a database or doing basic data analytics, as you mentioned, Eric. And I think that’s good. I think overall, we want to increase the surface area of the number of people that can use these tools, and there’s real power there. But of course, that could feel threatening to engineers who have spent entire careers and degrees and lots of time and effort and blood, sweat, and tears debugging code for a long time, and they have their identity locked up in it: I’m an engineer, and this other person is not. So there’s going to be a whole shift in how we... I mean, just think of junior developers leaving boot camps and how difficult it’s been for them to get hired in the last 10 years. We haven’t seen the tip of the iceberg once people start really coding with ChatGPT. And so how does that fit into the ecosystem going forward? It’s maybe a little unclear, but it’s fascinating to think that AI can touch and enable and empower so many people to do cool technical things that otherwise might have felt like they were beyond their reach.
Eric Dodds 38:23
Yeah, I love it. All right. Well, we are at the buzzer. But of course, Pete, tell our listeners where they can sign up to attend Data Council and get all the information they need. Because if you haven’t signed up yet, definitely check it out and look at getting a ticket early.
Pete Soderling 38:40
Yeah, come and see us in Oakland. Data Council is back in the Bay Area this year. It’s April 22 to 24. datacouncil.ai is the site. We’ll create a discount code for Data Stack Show listeners. So we’ll just call it data stack 20, and pop that in. You can get a nice discount on your tickets. Come and see us in person. Data Council is not online. It’s not live streamed. You have to commit to be there. We make it worth your while, we promise. But yeah, come and visit us in Oakland next month, and I look forward to seeing you guys there as well.
Eric Dodds 39:07
Awesome, awesome. Always a pleasure to have you on the show, Pete.
Pete Soderling 39:11
Thank you guys, it’s really fun. Appreciate the work that you’re doing.
Eric Dodds 39:15
The Data Stack Show is brought to you by RudderStack, the warehouse-native customer data platform. RudderStack is purpose-built to help data teams turn customer data into competitive advantage. Learn more at rudderstack.com.
To keep up to date with our future episodes, subscribe to our podcast on Apple, Spotify, Google, or the player of your choice.
Get a monthly newsletter from The Data Stack Show team with a TL;DR of the previous month’s shows, a sneak peek at upcoming episodes, and curated links from Eric, John, & show guests. Follow on our Substack below.