Welcome to a special series of The Data Stack Show from Data Council Austin. This episode, Eric chats with Pete Soderling, founder of Data Council. During the episode, Pete shares about how Data Council came to be, the investor side of the data space, and the place to work as a data engineer.
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Eric Dodds 0:05
Welcome to The Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You’ll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, the CDP for developers. You can learn more at RudderStack.com.
Welcome back to The Data Stack Show. I am on-site here at Data Council Austin recording some shows. You’ll notice that I said “I” in the singular. That’s because Kostas is out doing some really cool stuff with the Starburst team at the conference, so I am flying solo, which is maybe going to give Brooks some heartburn, but I have a great guest. I’m going to talk with Pete, who started Data Council. He was actually an engineer in a former life and has built this amazing conference. And so I’m just going to ask him about his background, and actually what led him to Data Council. If I’m feeling intrepid, I might ask him about his fund as well, because he’s an investor, which is uncharted territory. But since Kostas and Brooks are gone, I can do whatever I want, so let’s dive in and talk with Pete.
Pete, welcome to The Data Stack Show. It’s so great to have you here.
Pete Soderling 1:17
Thank you. It’s really exciting to be here.
Eric Dodds 1:18
And we are actually live at Data Council Austin, which is a conference you put on, and seeing the faces in the crowd at the opening was amazing, because I think, in many ways, people are just so excited to get together and talk about this stuff in person, even though we’ve been doing it for a couple years. So congratulations and thank you for putting on this amazing event.
Pete Soderling 1:43
Yeah, it’s my pleasure. I think everyone feels like they’ve been let out of prison or something.
Eric Dodds 1:47
Yeah. Okay, so give us the background. So you’ve been working in and around data for a long time. I want to hear about the founding of Data Council and you have a fund, which is super interesting, but how did you get your start in data?
Pete Soderling 2:03
Yeah, so it starts back a little bit earlier than Data Council. I was an engineer turned founder in 2003. I started two companies in New York before 2010, and I started two companies in SF after 2010, one of which became Data Council. But with one of my New York City companies, in 2008, I started an API-based cloud security platform. It was designed for businesses to sell streams of data through our proxy software to other companies. So it was a data-oriented play. And I ended up talking to lots of premium data providers (think of Bloomberg, or comScore, or Garmin), these kinds of companies that essentially sell high-value data. We had built this middleware, which was security, proxy, caching, metering, billing, sort of all this stuff. And they would plug their API, their data feed, into the back of our proxy, and we would advertise it out to the end user and help them turn their data stream into a business. Oh, interesting, right? That was literally the first time.
Eric Dodds 3:07
They’re producing data, and the data is valuable, but it’s hard to build the infrastructure to monetize that, because if you’re Garmin, you’re building maps.
Pete Soderling 3:18
Yeah, you build products, you’re responsible to sort of push your data and give it context in your own product. And these companies do that well with the Garmin nav device or the Bloomberg terminal. But our thinking at the time was, well, what if you unplugged the data stream out of your own product and offered it raw to providers or to other customers? What kind of magic could they work with that same data?
Eric Dodds 3:43
Super interesting. Okay, so that sort of brought you into the world of data, and then what took you from there to starting Data Council?
Pete Soderling 3:55
Partially, it was the unsuccessful launch of that company. I had to shut it down a couple of years after we started. But in the meantime, I had moved to the Bay Area and sort of gotten into the startup community there, which was sort of a next-level leveling up for me personally. So even though I had to shut that company down (it was called Strata Security), I ended up getting sort of keyed into the data world. And by the time 2013 came around, I had realized that there was this whole sort of stratum of data engineering that was being ignored, because everyone was talking about the sexy, quant-y, data science-y stuff, the kind of glittering, sexy analytics.
Eric Dodds 4:34
Well, back then, data engineering was still probably a fairly new term.
Pete Soderling 4:41
Yeah, it definitely was not a role at most, or hardly any, companies. Maybe Facebook had the notion of a data engineer somewhere bumping around, but most people in the community were not even really familiar with using that term.
Eric Dodds 4:54
Yeah, super interesting. Okay, so you noticed that there was this sort of theme emerging in the type of work that companies were doing in the data space, and so you decided to start Data Council.
Pete Soderling 5:07
Yeah, so it started off as a meetup inside Spotify’s office in New York City. They wanted to attract more machine learning engineers to their projects, and I was doing a consulting project with them. And so we ended up spinning up this meetup. Because we saw this market opportunity, we called it the data engineering meetup. It did really well in New York, then we launched one in SF, and it did equally well there. And by the time 2015 had rolled around, we basically had not just sort of helped the world define what a data engineer was, but we had seen the data scientists come to the group and the analysts come to the group and the AI researchers come to the group. And it was apparent that everyone wanted to learn how to work together better with their peers, the adjacent layers of whatever this emerging, nascent data stack was going to be. And so we found ourselves as a community kind of thrown right into that conversation. And because we had so much surface area with different kinds of professionals across the data field, Data Council was born out of that meetup, and we’ve been carrying the torch ever since.
Eric Dodds 6:05
Yeah, super interesting. Okay, one question. This is kind of a personal question, but you always hear the age-old wisdom that you learn more from failure than you learn from success. And so we’re at Data Council, there are 500 or 600 people here (which is huge, just coming out of COVID, so very successful), but also, you said, you had to shut your other company down prior to that. Do you think that’s true? Did you learn more from shutting down that data company than maybe from doing some successful things?
Pete Soderling 6:42
Yeah, there are definitely tangible and intangible things that you pick up along the way. And part of this is just called experience. There are a bunch of things that I’m tuned into now. Data Council is essentially my fourth company, and I’m tuned into them only because of the previous experiences launching other companies; whether they succeeded or failed, maybe I still would have gotten similar experiences. So I don’t know if it’s that the failure breeds the wisdom, or if it’s just the experience that breeds the wisdom, or if it’s the same thing. But one thing I’m really aware of that we brought into Data Council is this notion of founder-market fit, and also the fact that the founder has to articulate the earliest brand of the company, and I’ve been consciously infusing Data Council with that brand ever since we started it. I think it’s becoming sort of bigger than me now, because the team is growing and the community is growing, but really, it’s kind of like Data Council is Pete’s conference. It’s the conference that reflects my values as an engineer: I don’t want to over-sponsor the conference, I don’t want trashy talks, I don’t want white-paper-level content. I want to be surrounded by the best, smartest people, and those are software engineers. And so I built a conference that I wanted for myself, and sticking to those values, even through growth, is something that’s been a bit of a guiding principle for us.
Eric Dodds 8:06
Yeah, for sure. Okay, another personal question. Have you sat in on some of the sessions? I know, from being involved in conferences from a leadership standpoint a couple jobs ago, you’re running all over the place, but just knowing you and the conversations we’ve had, you love getting into the technical stuff. Have you sat in on some of the sessions?
Pete Soderling 8:25
A few. It’s a little difficult. We have 60 different speakers this week, and four sessions going at one time plus the office hours track. So there’s a lot going on. So unfortunately, not too much. But we produce all the videos and upload them to YouTube for free for the community. And so sometimes I consume them just like the rest of the folks that might not be able to be there.
Eric Dodds 8:45
Yeah, very cool. Okay, one thing I’d love for you to give our listeners some perspective on. Data Council has really helped shape some of the terminology or definitions around roles in data. If you go back to 2012 or 2013, data engineering is something that’s happening, but it hasn’t been codified as a role or a specific term, at least not as widely as it is now. What are the things that you have seen that have been really positive steps in those definitions across the industry (i.e. roles, terminology)? And then what are some of the things that you think the industry is still trying to figure out?
Pete Soderling 9:28
Well, I think the Data Council community, just through the sheer innovation and power of engineering, has really helped set forth what the main pieces of infrastructure in a full data stack or full data system can be. We have a few data quality companies that are in Data Council and bump around, we have a few metadata companies, data catalog companies, ETL companies, the metrics layers. I think you’ll see the emergence of all of these categories generally being defined by people in our community, or people with some familiarity or adjacency to our community. We do sort of help each other establish a common vernacular, and not just a vernacular, but a common understanding of what the building blocks are. What’s been interesting to me is that I think we have these parallel stacks: we have the data analytics ETL stack, then we have a machine learning stack that sort of runs in parallel to that, but they’re actually mostly different pieces. I’m starting to kind of wait, wondering when we’ll start to see some consolidation across those two areas. Like, a feature store is kind of like a metric store. And so I think we’re starting to see a couple of companies pop up that actually pitch those combined together. So I think we’ll start to see maybe some consolidation across these two layers of the stack at some point.
Eric Dodds 10:58
Yeah, I think it’s interesting. We were talking about this recently, in that, in some modern companies, you really see the analytics workflow almost becoming, in some ways, the front end of the ML workflow, right? Because, I mean, with some of the modern tooling, you actually can get a lot of that initial work done, which is super interesting. And that hasn’t, to your point, necessarily been fully productized. But it’s interesting to see that happen within companies, where it’s just kind of like, oh, wow, there’s actually less work to do than we thought on the ML side, because the analytics and data engineering front end of that, which really serves the BI use cases, is now happening in a way that sort of formats to an ML workflow, which is super interesting.
Okay, so you also raised a fund. There are so many podcasts about investing and I know very, very little about that so I don’t want to get into that because I don’t know what I would say, but I am interested in your really interesting perspective: practitioner as an engineer, founder in the data space, and then sort of builder of community that has driven a lot of definition around this. Okay, so that makes me so interested in what do you look for in data technology as an investor, right, like, your thesis or whatever you want to call it? You really have sort of a really interesting combination of assets there that give you a perspective that I would think is pretty unique as an investor.
Pete Soderling 12:43
For me, it’s pretty simple because I’m such an early-stage investor. And also, as I mentioned, I was a founder and sort of have the zero-to-one sense. It sort of dawned on me a few years ago, as I was thinking about all the things that I do during my day. At that time I was organizing Data Councils that were running around the world, and yet every once in a while I got a call with a founder from the community who had asked me for advice on their startup or fundraising or something. And those were the calls in my day that I looked back on, and they were definitely the high points of my day. And so when I realized that maybe Data Council was just becoming a vehicle or a platform for me to do more of that kind of work, that really inspired me to take this to the next level. So I raised the Data Community Fund, as you said, in 2020. We have some amazing investors, backers like Sequoia, Bain, Foundation, AngelList, and many other folks in the B2B data space. Oh, man, that’s amazing. So I’m very lucky to get that social proof from those kinds of folks. And in terms of what I look for, I really invest in team and TAM. I’m a pre-seed, seed-stage, very early-stage investor. We don’t necessarily have to be right in the same way that a Series A or B investor is right. We can look at the founder’s experience, see if they’re a great engineer, if they have some key insight that they’ve learned through their experience, preferably, usually at some previous company, that gives them some key angle and a reason that their startup or their software needs to exist. It’s usually pretty evident if a founder has stumbled on that kind of specific insight or not. If they’re just trying to build a me-too thing that overlaps with other stuff in the data market, with no significant go-to-market advantage, that’s probably a red flag for us. A key insight is one thing that we look for.
And then obviously, a really big TAM, a really big market for companies and for their solution to potentially win the day. We’re quite simple in the way we approach things, and we write checks for founders at the inception point: first checks for very, very early-stage ideas.
Eric Dodds 14:58
Very cool. What a fun space to be in, because you get to play in the technology division. And I’m not saying later-stage investors don’t have personal relationships, but the dynamic of that relationship with someone who has an idea, and who is passionate about solving a problem, I would think is pretty energizing.
Pete Soderling 15:17
It’s very exciting to be in a place where, for many of the companies that I’ve invested in now, the reason we even got access to those rounds is because the founder said, oh, yeah, I first spoke about Apache Hudi at Data Council in 2017. And that’s why I’ve had good vibes from Data Council, and you’ve helped me by promoting the open source. Our video from the conference has racked up thousands of views on YouTube. And you really helped expose our open-source project in the early days. And this is why we have such fondness for Data Council as a platform, and that sort of carries on into our investing relationship.
Eric Dodds 15:57
Yeah, for sure. Again, I don’t know a ton about investing. But I would think, if I was a VC and I looked at the platform or deal flow that you have from the community that you’ve built, I would probably be a little jealous, because you get to see these things as they’re happening, which is really great.
Okay, I’m gonna completely flip the question. And this may be a little bit unfair. And I temper this because I know you’re an investor. You have lots of companies here, but just in terms of your personal interest as an engineer, not where you would put your money as an investor. But if you were going to go work as an engineer at a data company, what part of the stack would you go work in? Just out of pure curiosity as an engineer. Like, I’m gonna write code to help solve this problem. Is it observability? Is it streaming?
Pete Soderling 16:50
You sort of take me back, because it’s been a long time since I’ve thought about doing any real engineering. I’m very much an ex-engineer now. The thing that really made my eyes light up as a young engineering student was when I learned how databases work, and SQL, and the optimizations across the data structures and the indexing and the query planning and all those things, so I’ve kind of always been a little bit of a database junkie. I’d probably go work with Kishore at StarTree or something like that, on some newfangled optimizers, or the guys at EraDB working on some newer version of some optimized data system. I think that’s probably where I would tend to migrate.
Eric Dodds 17:32
Yeah, for sure. It’s interesting to hear you say that because the database space is pretty tough, right? There’s so much interesting technology. But if you think about the time it takes to really build the technology itself, that scale is very difficult to achieve, and then there’s bringing it to market. But actually, just based on what you’ve done, it doesn’t necessarily surprise me that you would sort of go for the jugular on the difficulty.
Okay, last question. So we’re live here at Data Council. What’s an interesting new thing that you’ve learned, or a new person that you’ve met, that will stick with you from this amazing conference that you’ve put on?
Pete Soderling 18:14
Well, I guess I probably won’t name any one thing in particular, but it’s really cool to see lots of Python open-source stuff sort of popping up on the perimeter of Data Council. We’ve never been a big Python community like the full-on data science community is, right? Data engineers are not necessarily Python engineers. Yep. But we’re seeing lots of cool open-source stuff pop up. I think 30 or 40% of the startups that I announced were coming out of stealth on stage at Data Council were probably Python-related. So that’s just an interesting data point. I don’t know if it’s here nor there, but it’s something that I observed this week that’s been interesting to me.
Eric Dodds 19:01
Yeah, I agree. The converging of what has been disparate parts of maybe not even technology, but like workflows and sort of interactions is super interesting. Very cool. Well, I can say from experience being on-site here at Data Council has been amazing. So to all of our listeners, you definitely should register and come next year. I’ve learned a ton. I’ve met some unbelievable people who have built some unbelievable technology, tons of interesting startups. So Pete, thank you for putting this together. I’ve personally benefited, and best of luck with your fund and investing.
Pete Soderling 19:35
Yeah, thanks for being here and for supporting the conference. Really, really appreciate this opportunity and want to welcome everyone to join us in Austin next year.
Eric Dodds 19:43
What a fun conversation. I think one of the big takeaways that I had from this conversation with Pete was that he really has a lot of experience and background working as an engineer in the data space, and I think that gives him empathy for data professionals. You see that both in the conference that he’s running (if you were here, you definitely saw that) and in Data Council in general, in the types of content and things that they put out. And then also, I did actually get to talk a little bit about investment, which was uncharted territory, but super fun. And it was amazing just to hear about Pete’s empathy and sort of joy in working with the individuals themselves. And as we’ve said many times on the show, it’s really fun when people are doing exciting things in the data space, but with a focus on the people behind the technology. So also, we need to give a big thank you to Pete and the whole team who put the conference on, and for allowing us to record on-site here. So thank you. Several more good ones are coming up from Data Council, so stay tuned, and we’ll catch you on the next one.
We hope you enjoyed this episode of The Data Stack Show. Be sure to subscribe on your favorite podcast app to get notified about new episodes every week. We’d also love your feedback. You can email me, Eric Dodds, at eric@datastackshow.com. That’s E-R-I-C at datastackshow.com. The show is brought to you by RudderStack, the CDP for developers. Learn how to build a CDP on your data warehouse at RudderStack.com.