This week on The Data Stack Show, Eric and John welcome Michael Drogalis, Founder of shadowtraffic.io. Michael shares his journey in data, focusing on distributed systems and streaming data. He discusses his background as a software engineer, his work with open-source projects, and founding Distributed Masonry, which was acquired by Confluent. At Confluent, Michael contributed to stream processing technologies like Kafka Streams and KSQL. The conversation covers solopreneurship, synthetic data challenges, and the evolution of streaming technologies, offering valuable insights into innovation and entrepreneurship in the tech industry. Don’t miss it!
Highlights from this week’s conversation include:
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
John Wessel 00:03
Welcome to The Data Stack Show. The Data Stack Show is a podcast where we talk about the technical, business and human challenges involved in data work. Join our casual conversations with innovators and data professionals to learn about new data technologies and how data teams are run at top companies.
Eric Dodds 00:13
Welcome back to the show. We are here with Michael Drogalis of ShadowTraffic. Michael, welcome to The Data Stack Show. Hey, thanks for having me. All right. Well, we have a ton to get into. Of course, I’m passionate about streaming data, so we’re going to go deep on that, and we’re going to talk about solopreneurship and a number of other things. But first, just give us a brief background. How’d you get into data and end up at ShadowTraffic?
Michael Drogalis 00:56
Yeah, by trade, I’m a software engineer. I think the thing that really inspired me as I was coming out of college was distributed systems and streaming data. They were both really getting started around 2010 or 2011, and I went out and built an open source project and ended up building a company on top of it. I sold it to Confluent, and then recently I left to start ShadowTraffic, which we’ll talk about. It’s inspired by all the problems I’ve seen occurring in the last 10 years or so. And, yeah.
John Wessel 01:24
So Michael, we were talking before the show, doing a little bit of show prep. There are so many cool topics here. Eric already mentioned one: solopreneurship. I’ve read a lot about it, and people are all asking who’s going to be the first $100 million solopreneur. So that’s a fun topic. And then streaming is just a fun one. It’s been going on a long time, and I think a lot is happening there. What are some topics you’re interested in covering?
Michael Drogalis 01:48
Yeah, it’s always fun going into the details of the problems around synthetic data. I think people look at it and think, well, I can just use ChatGPT to create some data, or write a little script to do it, and in some simple cases, you can. But as you go down this path and need to build more and more cases that reflect production scenarios, it’s actually a lot harder than you think. So you reach for a tool that has defined a set of abstractions to help you. It’s fun to go into the motivation behind those things and the use cases and such.
Eric Dodds 02:15
Well, let’s dig in. All right, let’s do it. Michael, I’m so interested in your background, because you got interested in distributed systems and streaming really early. Of course, in many ways, those technologies have become ubiquitous for certain use cases in the data stack. But tell us what you were doing. You were a software engineer, and what piqued your interest? What actually got you into that? Was it a use case at work or just personal research?
Michael Drogalis 02:45
Yeah, it’s funny. I had a college professor, and there was a class on distributed systems. It was a pretty small class, so I got a lot of individual attention. He was just a really inspiring professor. He was telling me about Erlang and message passing and all these things, and there just weren’t a lot of people working on it. That’s actually great when you’re coming out of school: you want to find a small community you can participate in, where you can directly interact with the people who are working on these problems. That’s what led to me being a little bit involved with Kafka. That project is huge today, but at the time, about 15 years ago, it was just getting off the ground. Everyone was very easy to talk to, and it was very easy to follow all the trends that were going on. So it was a perfect thing to jump into coming out of school.
Eric Dodds 03:27
Very cool. So you came out of school. Did you get a job as a software engineer? I mean, you were obviously part of the open source community and interacting with that community.
Michael Drogalis 03:39
Yeah, I got to work as a back-end engineer on analytics systems, and I did some contracting. During that time, the other thing I fell in love with was functional programming. Clojure is my tool of choice, and that’s another community that was, and still is, rather small and niche, but I learned a lot there. In maybe the first three years out of school, I came to understand what it meant to be a professional software engineer, how to work with other people, and that kind of thing.
Eric Dodds 04:06
Yeah, yeah. It’s funny, you’re taking me back, because I founded a company with a technical co-founder who loved Erlang and Clojure so much, and he was very involved in those communities. That’s fun. Okay, so you’re working as a professional software engineer, you’re involved in the open source community, and then you start your own open source project.
Michael Drogalis 04:34
Yeah. So there were multiple pieces to this streaming problem. There’s the backbone: how do you actually move data from A to B? And then there’s the problem of what you do with it, which is the whole area of stream processing. Around 2011 or 2012, Apache Storm came out, which was basically the first mainstream attempt at processing data in real time. I felt like there were some problems in that project. It was a great first attempt, but I thought I could do a little bit better, specifically if I zeroed in on solving it the Clojure way, with functional programming. And because I was part of a niche community, I got to build a relatively niche solution that people really liked. I got started on it partly because I wanted to have something of my own. As I left school, I didn’t really have an identity for my work, and it seemed like everyone who was doing really well had started some kind of open source project. So that seemed like the fashionable thing to do. I pursued that for a couple of years, built a community around it, and ended up meeting the co-founder of my next company through it. It was a really great experience, again learning how to get along with people you don’t work with directly, like new community members.
Eric Dodds 05:41
Yeah, is this still around? I mean, that’s a long time ago.
Michael Drogalis 05:46
No longer. So when we eventually ended up selling the company that I co-founded, we kind of had to part ways with it. You can only juggle so many things at once. Yeah, totally. My heart goes out to open source maintainers. It’s a really hard thing to do.
Eric Dodds 05:57
Yeah. So you met your co-founder, and then what company did you found?
Michael Drogalis 06:03
Yeah, we called the company Distributed Masonry. It was a bit of a play on the project’s name, Onyx; it was sort of a stone-based theme. Again, our whole premise was that we could build something cool in distributed systems. So we tried, basically, to build a platform on top of Onyx. That didn’t really work, because there wasn’t a big enough user base to offer it as a service with a consumption-based model. We tried a function-as-a-service sort of thing around 2015, when Lambda was still getting started. We cycled through a whole bunch of product ideas. The thing that ended up working really well was going back to something from a little earlier in my career: seeing how we could support tiered storage for Kafka. The problem with Kafka, like 10 years ago, was that it was pretty limited in the amount of data it could hold. You basically had to size it per box, and it was very finicky about the way you resourced these things. We had this idea: what if you could hook up S3 to Kafka and have unlimited streaming data? It turned out that the technology we built in Onyx was a pretty good way to do an initial prototype of that, so we built a product on it, and that was what we had a bit of success with before we eventually sold the company to Confluent.
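Michael’s tiered-storage idea, keeping recent Kafka data on broker disks while older data lives in S3, can be illustrated with a toy in-memory log. This is a minimal Python sketch under simplifying assumptions (fixed-size segments, plain dicts standing in for local disk and object storage, no replication or durability); it is not how Distributed Masonry’s product or Kafka’s own tiered storage is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class TieredLog:
    """Toy append-only log with two storage tiers (illustration only).

    `local` stands in for broker disk and `remote` for an object store such as S3;
    both map a sealed segment's base offset to its records.
    """
    segment_size: int = 3          # records per segment (tiny, for readability)
    max_local_segments: int = 1    # sealed segments kept on "local disk"
    active: list = field(default_factory=list)
    local: dict = field(default_factory=dict)
    remote: dict = field(default_factory=dict)
    next_offset: int = 0

    def append(self, record) -> int:
        offset = self.next_offset
        self.active.append(record)
        self.next_offset += 1
        if len(self.active) == self.segment_size:
            # Seal the full segment under its base offset.
            self.local[offset - self.segment_size + 1] = self.active
            self.active = []
            # Offload the oldest sealed segments once the local budget is exceeded.
            while len(self.local) > self.max_local_segments:
                oldest = next(iter(self.local))
                self.remote[oldest] = self.local.pop(oldest)
        return offset

    def read(self, offset: int):
        if not 0 <= offset < self.next_offset:
            raise IndexError(offset)
        base = offset - offset % self.segment_size
        if base == self.next_offset - len(self.active):
            return self.active[offset - base]        # still in the open segment
        if base in self.local:
            return self.local[base][offset - base]   # recent data on local disk
        return self.remote[base][offset - base]      # older data: fetch from the object store


log = TieredLog()
for i in range(8):
    log.append(f"event-{i}")
print(sorted(log.remote), sorted(log.local), log.active)  # [0] [3] ['event-6', 'event-7']
print(log.read(1), log.read(7))
```

Readers never need to know which tier holds an offset; retention on the broker is bounded by `max_local_segments`, while the remote tier can grow without resizing the box.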
Eric Dodds 07:14
Yeah. Tell us a little bit about the journey of selling to Confluent. That’s pretty cool: you started a company, and it sounds like you went through rapid development to find something with early product-market fit, where it’s like, okay, this is a pain point big enough that there’s some traction. And then you sell to a company like Confluent. They’re obviously much larger now than they were back then, but that’s a pretty neat journey for a first-time entrepreneur.
Michael Drogalis 07:41
Yeah, yeah. It was a lot of fun, and there are a lot of good stories in there. The company was just four folks, and we had been working together for maybe two or two and a half years, and the very first time we all met in person at the same time was at the acquisition discussions. So we were actively figuring out the rapport and the timing of how we talk together in the room. But that was fun. People like to ask about it, and it was surprisingly straightforward. I asked our lawyer how to say it, and he said, you just say it. Like, do we want to be acquired? Just say it. It was a very, very direct discussion, and it made me appreciate that as you move up in the stakes in business, it’s worth being very direct. Nobody wants to waste time. Tell people what you want; that’s how you win. I learned a lot of things that really accelerated my career by maybe five or 10 or 15 years.
Eric Dodds 08:29
Wow. Yeah, it’s so funny, because that story, that the first time you met your co-founders in person was at the acquisition meeting, feels so much like a post-COVID story, or a post-COVID dynamic. It’s so much more common now to have these really intimate business relationships without actually having physically met the person.
Michael Drogalis 08:53
Yeah, we pioneered the fully remote model for better and worse. So it worked out for us. But yeah, it’s just sort of a funny anecdote.
Eric Dodds 08:59
Yeah, very cool. Okay, well, I know John is burning with a number of questions, but I do want to hear about Confluent. So you went to work at Confluent, you got into product, and you were there for a good while, probably at a pretty formative time for Confluent as a company, because the last five or six years have been pretty crazy, just in terms of Kafka generally and a lot of the stuff Confluent has shipped. On the product side, they’ve done so many really cool things. So tell us a little bit about being at Confluent, and maybe some of the big lessons you took away.
Michael Drogalis 09:34
Yeah, Confluent was an interesting experience. I was really lucky in that I got to work across a bunch of different things. Primarily, I headed up product for stream processing. The way you can think about Confluent at the time I joined, in 2018, was that Kafka was a relative success. They were trying to get Confluent Cloud off the ground and pivot from an on-prem company to a cloud company. And in addition to that, it was like, well, we’ve got to get more products off the ground besides Kafka. How do we get into the compute game? That’s where all this ties back to the beginning of my career: I led the early efforts for stream processing. Confluent has a number of offerings there. They have Kafka Streams, which is a Java library, and the thing I primarily worked on was KSQL, a streaming SQL variant. It was so interesting trying to do this at a time when the company was trying to solidify its core offering, move to the cloud, and continue through hypergrowth. Lots of lessons learned there.
Eric Dodds 10:27
Yeah, wow. Brooks may be able to look it up for us, but we had a guest on the show who was also very involved in KSQL. I think he worked at Meta for a while, then went to Confluent, and I think he worked on KSQL. I’m not sure, but we’ll try to remember his name, because I’m sure you interacted with him. Yeah, probably. Very cool. Okay, John, I’ve been monopolizing the mic, so I’m handing it over to you.
John Wessel 10:52
So there are a lot of different ways we can go with this. But let me share something I shared with you guys before we started. So you’ve left Confluent and started a new thing called ShadowTraffic. In the moment, I kind of understood what you were doing. It took me back almost 15 years, to working on a B2B SaaS app. We were doing a major version change for various growth reasons, and we had packed tons of features into this particular release. So we get all the code done. I’m mostly doing database and back-end stuff, we’ve got a sysadmin doing his thing, and then a bunch of developers. We finally get it into QA, and the developers celebrate and say it’s done. Right? It’s not done. And then we have this problem: how do we test all of these use cases? Should we grab a bunch of production data and run it through? At the time, we probably should have been more concerned about privacy and security, so there’s that part of it, but there are also all these other edge cases: we need to make sure we turn this feature off so we don’t randomly email thousands of customers, or turn that feature off so we don’t accidentally pump data into the accounting system and accounting thinks it’s real. So all these really interesting things came up, and we looked for a solution, but we didn’t find one. That’s my personal experience here. You’re obviously really deep on this problem. So at ShadowTraffic, walk us through how you came up with the idea and got into the space.
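John’s story describes a common QA hazard: realistic test traffic triggering real side effects, such as customer emails or accounting entries. One way to guard against that is to route every side effect through a single gate that only performs real calls in production. The following is a hypothetical Python sketch; the `APP_ENV` variable and the class and method names are assumptions for illustration, not part of any system mentioned in the conversation.

```python
import os
from dataclasses import dataclass, field


@dataclass
class SideEffectGate:
    """Hypothetical gate for outbound side effects.

    Anything other than env == "production" is treated as a sandbox: calls are
    recorded in memory instead of reaching real customers or systems.
    """
    env: str = field(default_factory=lambda: os.environ.get("APP_ENV", "dev"))
    recorded: list = field(default_factory=list)

    @property
    def sandboxed(self) -> bool:
        # Default-deny: only an explicit "production" value enables real calls.
        return self.env != "production"

    def send_email(self, to: str, subject: str) -> None:
        if self.sandboxed:
            self.recorded.append(("email", to, subject))
            return
        raise NotImplementedError("call the real email provider here")

    def post_ledger_entry(self, entry: dict) -> None:
        if self.sandboxed:
            self.recorded.append(("accounting", entry))
            return
        raise NotImplementedError("call the real accounting system here")


gate = SideEffectGate(env="qa")
gate.send_email("customer@example.com", "Your order shipped")
gate.post_ledger_entry({"account": "revenue", "amount": 42.0})
print(gate.recorded)
```

The default-deny choice means a missing or misspelled environment value fails safe: test traffic is recorded for assertions rather than sent.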
Michael Drogalis 12:33
Yeah, I basically observed this over the 15 years since I started my career. I was in the streaming space, and time and again, it was hard to test things. We would talk about these really cool features: oh, we could capture real-time data, we could do real-time joins, look at how quickly we could update these aggregates. But who’s showing it? Where do you see this stuff? You could see it in unit tests, but nobody was showing it for real. You have to go behind the scenes and look at people’s production metrics, or at their systems, which are basically hidden from view because they’re production systems. And I noticed a litany of use cases for this. Engineering teams need to do testing: stress testing, integration testing, edge-case testing. Sales engineers need test data to exercise their systems and prove they do what people are buying them for. Developer advocates need to be able to put on cool demos. There are a lot of places where it’s applicable. It’s this nice niche problem where I felt like, okay, one person could go and solve this really well. And there are all kinds of continuations: once you solve the initial problem, you can do more and more. But that’s really what started it all.
Eric Dodds 13:32
I’m just thinking about your experience at Confluent, especially with stream processing, because I think that’s an area in particular where you really do need to run a lot of data. I mean, the ideal testing is with your production data, right? You have all of these different messages coming through, there can be a very high amount of cardinality, and that can vary over time, even just through the cycle of a day. Say you have international traffic: throughout the day, you’ll have very different types of traffic come through. So if you’re making an update to a streaming transformation, it’s really hard to test. Can you give us an example from your time at Confluent, maybe from an actual customer or some situation where that was really problematic? How did you face it, and why was it painful?
Michael Drogalis 14:32
Yeah, I’ll give you two that are easy to understand. Imagine a retail example. You want to push retail data through an event-driven system, and you have an application that processes records as they come in. You may have two streams, like customers and orders. If you want to actually test this, you’ll have a bunch of customer IDs coming through: all right, customers John and Eric and Brooks come through. And then a bunch of orders come through. Orders almost certainly have an identifier that refers to customers. How do you do that? Do you just pick John, Eric, and Brooks and randomize those? What if the messages for those three people don’t show up before the orders? And how do you do that over a big enough key space? That’s a clear problem you immediately hit when you start using these systems. Another one: imagine a checkout process where you’re taking in web events. In my shopping cart, John views an item, puts it in his cart, takes it out, puts another item in, and then checks out. If you change the order of those events, so that the checkout comes before the item view, your application will break. Again, this is very basic stuff. You can write a unit test for these things, but if you want to test at production volume with all of your systems together, which you should, you immediately hit this problem, and it’s harder to solve than it looks.
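The customers/orders constraint Michael describes, that every order must reference a customer ID the downstream system has already seen, is the core of generating joinable streams. Here is a hand-rolled Python sketch of that constraint; it is not ShadowTraffic’s implementation or configuration format, and the field names and the 30% interleaving probability are illustrative assumptions.

```python
import random
import uuid
from typing import Iterator, Tuple


def retail_stream(n_customers: int, n_orders: int, seed: int = 0) -> Iterator[Tuple[str, dict]]:
    """Yield (topic, record) pairs in which every order references an
    already-emitted customer, so a downstream join never sees a dangling key."""
    if n_orders and not n_customers:
        raise ValueError("orders need at least one customer")
    rng = random.Random(seed)                 # seeded so test runs are reproducible
    names = ["John", "Eric", "Brooks"]        # illustrative values only
    known_ids = []                            # the customer key space emitted so far
    customers_left, orders_left = n_customers, n_orders
    while customers_left or orders_left:
        # Emit a customer when none exist yet, when orders are exhausted,
        # or with some probability so customers and orders interleave.
        if customers_left and (not known_ids or not orders_left or rng.random() < 0.3):
            customer_id = str(uuid.UUID(int=rng.getrandbits(128), version=4))
            known_ids.append(customer_id)
            customers_left -= 1
            yield "customers", {"customerId": customer_id, "name": rng.choice(names)}
        else:
            orders_left -= 1
            yield "orders", {
                "orderId": str(uuid.UUID(int=rng.getrandbits(128), version=4)),
                "customerId": rng.choice(known_ids),   # only keys that already exist
                "amount": round(rng.uniform(5, 500), 2),
            }


for topic, record in retail_stream(n_customers=3, n_orders=5, seed=42):
    print(topic, record)
```

Growing `n_customers` widens the key space, and raising the interleaving probability front-loads customers; either way, the invariant that orders only point at known IDs holds.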
Eric Dodds 15:42
Yeah, can we dig in a little bit on timestamps specifically? From a very practical standpoint, that’s super challenging. Let’s say you generate a set of data. A very common way to do this, and something we’ve done in the past, is to just write a script. Okay, that doesn’t seem that hard, right? But it can actually take a lot of work if you’re trying to do it properly, as it were: representing the cardinality appropriately, maintaining sequencing, and all that sort of stuff. A lot of that work has to do with timestamps: the way you generate data and the way you have to sequence timestamps, especially when strong ordering is very important for a downstream application or analytics use case. So let’s say you go through all the work to do that, and then it’s like, okay, I need to do this again and again. It’s so annoying to go back through, recalculate all the timestamps, and make changes, because you realize, oh, all the timestamp stuff I did is off. Even if you try to randomize stuff, it’s really hard to make it seem real. That sounds so dumb, but it’s very hard.
Michael Drogalis
Yeah, and the thing you kind of want at the end of the day is something that sits at the front door of your architecture and just blasts data through as if it were your real customer data, with a set of knobs, so you don’t have to feel like you’re programming in C or assembly. Instead, you have very high-level parameters that let you say what this data looks like and what its non-functional characteristics are, and then have it act as shadow traffic. I mean, that’s the name: it acts as a shadow of your actual customer data. I think that kind of simulation testing is the right answer.
John Wessel 17:32
I’m curious; I want to dig in a little bit on architecture. If you told me, here’s the problem, how would you solve it? My immediate instinct would be to go to production data and scrub it. Let’s hash stuff; there’s PII. Let’s essentially scrub production data. We haven’t talked about it, but I don’t sense that’s the way you went about solving this. Maybe?
Michael Drogalis 17:55
I think there are two ways you can build a product in this space. You could do what you’re saying, which is to take existing data and use machine learning or some kind of procedure to derive a safe copy of it. There are many companies that do this really well, particularly in the relational database space, where you have thousands of tables that are very static, and all you need to do is find all the addresses and rip them out. That’s not a trivial problem, but it’s kind of its own thing. And then there’s the approach I took, which is to say: what if you instead had a very high-level language that let you describe what the data is directly? You could bootstrap off a schema, or use an LLM to help you write it. That has the advantage that you don’t have to modify anything, because the data is fully fresh. It also has the advantage that, many times, the data doesn’t exist yet. If you’ve never been to production, there’s no production data, right?
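Eric’s timestamp problem and Michael’s checkout example share one requirement: events within a session must follow a valid order with increasing timestamps, while many sessions interleave in a single stream. The Python sketch below assumes a made-up cart funnel and uniform 1 to 90 second gaps between clicks (both illustrative, not taken from ShadowTraffic); it generates each session in order and then merges the sessions by timestamp.

```python
import heapq
import random
from datetime import datetime, timedelta, timezone

# Hypothetical cart funnel: which event types may legally follow each type.
NEXT_STEPS = {
    "view_item": ["view_item", "add_to_cart"],
    "add_to_cart": ["view_item", "remove_from_cart", "checkout"],
    "remove_from_cart": ["view_item"],
    "checkout": [],                      # terminal: the session ends here
}


def session(user: str, start: datetime, rng: random.Random, max_events: int = 10) -> list:
    """One user's session: valid funnel order, strictly increasing timestamps."""
    step, ts = "view_item", start
    events = [{"user": user, "type": step, "ts": ts}]
    while NEXT_STEPS[step] and len(events) < max_events:
        step = rng.choice(NEXT_STEPS[step])
        ts += timedelta(seconds=rng.uniform(1, 90))   # think time between clicks
        events.append({"user": user, "type": step, "ts": ts})
    return events


def shopping_stream(users, start: datetime, seed: int = 0) -> list:
    """Interleave many sessions into one stream ordered by event time."""
    rng = random.Random(seed)
    sessions = [session(u, start + timedelta(seconds=rng.uniform(0, 600)), rng) for u in users]
    return list(heapq.merge(*sessions, key=lambda e: e["ts"]))


def is_valid_session(events: list) -> bool:
    """Reject sessions that start mid-funnel, skip steps, or go back in time."""
    return bool(events) and events[0]["type"] == "view_item" and all(
        b["type"] in NEXT_STEPS[a["type"]] and a["ts"] < b["ts"]
        for a, b in zip(events, events[1:])
    )


stream = shopping_stream(["John", "Eric", "Brooks"], datetime(2024, 1, 1, tzinfo=timezone.utc), seed=7)
for event in stream:
    print(event["ts"].isoformat(), event["user"], event["type"])
```

Regenerating with a different `seed` or start time recomputes every timestamp consistently, which removes the manual rework Eric describes; `is_valid_session` is the kind of check that catches a checkout arriving before the item view.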
John Wessel 18:43
It’s blank tables or blank fields or whatever.
Michael Drogalis 18:46
Yeah, exactly. And then you could speculate about the future. You could say, let’s not just take a copy of production data; let’s use similar characteristics and then triple the volume over time, or make the traffic much spikier over time, or stuff like that. It’s probably the smaller market of the two, but in some ways, it’s the harder one to solve, and I’ve had a bit of traction with it.
Eric Dodds 19:06
Yeah. This is a super practical question: how do your customers use it? I’m guessing the general workflow is that I set up in dev or QA, point ShadowTraffic at it, and generate this highly realistic stream that lets me get as close to testing in production as I can, as far as the data goes. But that’s a lot of data. Sometimes you’ll try to run tests with a small data set or a small sample batch, but if you’re testing a bunch of data at scale, it creates a couple of interesting challenges. One is cost, and two is the downstream system you’re sending it to. You’re provisioning something as part of your dev environment, but you probably don’t ultimately want the data in there. So are you just dropping it? Can you explain the typical workflow for a ShadowTraffic customer in terms of the environment, what they do with the data, all that stuff?
Michael Drogalis 20:09
Yeah, it depends on intent, as you say. If you’re an engineering team, your goal may be to make sure a bunch of systems integrate. So you may have a smoke test where you run a minimal set of traffic through your system, but with all components online, to make sure your schemas are working and your web sockets are connecting well together. ShadowTraffic works really well in that setting. The same team may take that set of ShadowTraffic files, use the knobs I described, since it’s a very high-level DSL, and say, no, let’s crank up the volume and do a stress test. A good example: a customer, mine raft, published yesterday that they did 100-terabyte tests on their systems. They have to serve very low-latency queries over historical and streaming data together, which is a tricky problem. Internally, they use ShadowTraffic for more minimal testing, but for this particular case, they turned it way up and generated 100 terabytes of data, 50 gigabytes a minute. They were able to do a short-lived, somewhat more expensive test, I imagine, tear the whole thing down, be confident, checkpoint, and move on. So there’s a set of use cases where it applies, and the fact that it’s parameterizable helps people move from problem to problem.
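The knobs Michael mentions (tripling volume over time, making traffic spikier, cranking a smoke test up into a stress test) boil down to a rate schedule layered on top of the data description. Here is a minimal Python sketch of such a schedule; the parameter names are hypothetical and are not ShadowTraffic’s actual configuration options.

```python
import random


def rate_schedule(duration_s: int, base_eps: float, volume_multiplier: float = 1.0,
                  ramp_to: float = 1.0, spike_prob: float = 0.0, spike_factor: float = 1.0,
                  seed: int = 0) -> list:
    """Target events per second for each second of a test run.

    volume_multiplier: scales everything (3.0 = "triple the volume").
    ramp_to:           linearly grows the rate to ramp_to times its start by the end.
    spike_prob/factor: chance per second of a burst, and how large the burst is.
    """
    rng = random.Random(seed)
    rates = []
    for t in range(duration_s):
        progress = t / max(duration_s - 1, 1)        # 0.0 at the start, 1.0 at the end
        rate = base_eps * volume_multiplier * (1 + (ramp_to - 1) * progress)
        if rng.random() < spike_prob:
            rate *= spike_factor
        rates.append(round(rate))
    return rates


smoke = rate_schedule(60, base_eps=10)                        # minimal integration traffic
stress = rate_schedule(600, base_eps=10, volume_multiplier=3.0, ramp_to=2.0,
                       spike_prob=0.02, spike_factor=8.0, seed=1)
print(sum(smoke), max(stress))
```

Keeping the schedule separate from the record shape is what lets one data description serve both the smoke test and the stress test.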
Eric Dodds 21:11
Yeah, super interesting. I mean, it’s expensive to test, but probably not relative to the cost of breaking something in production at 100-terabyte scale.
Michael Drogalis 21:23
It’s worth it every time to do the testing before the customer finds the problem.
John Wessel 21:27
Yeah, yeah, for sure. So interesting. Moving further down this journey: this makes a lot of sense if I can control it. But what about testing with some kind of external dependency, like an API? I don’t know the space that well, so I don’t know if you’re solving this problem or others are. Is there a spot here where you could mimic an endpoint and dial in how you think that endpoint would behave, almost like a fake external API? Let’s call it Stripe. We don’t actually want to crush Stripe with all this traffic. We just want a dummy stand-in that behaves about like the Stripe API, with the same rate limits, etc. Is that part of the scope of what you do?
Michael Drogalis 22:13
That’s a bit of an orthogonal problem. It reminds me of a company, whose name escapes me, that mimics AWS services: they give you a set of containers, or maybe they host the services, and those behave in a similar way for testing purposes. ShadowTraffic is meant to sit in a place where things are as realistic as possible. Whether you fake out the rest of your downstream systems is up to you. If you want to use test containers, that’s totally fine. It’s meant to give you the degrees of freedom to make those choices. Yeah.
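John’s question, how to stand in for an external API like Stripe while keeping its rate limits, is, as Michael says, orthogonal to generating traffic. For completeness, here is a hypothetical Python sketch of one common approach: an in-process fake that enforces a token-bucket limit and returns canned responses. The class, the response shape, and the numbers are assumptions for illustration; they do not mirror Stripe’s actual API or limits.

```python
import time


class FakeRateLimitedAPI:
    """Hypothetical in-process stand-in for a third-party HTTP API.

    Enforces a token-bucket rate limit (refill `rate_per_s`, capacity `burst`)
    and returns canned responses instead of touching the real service.
    """

    def __init__(self, rate_per_s: float, burst: int, clock=time.monotonic):
        self.rate = rate_per_s
        self.capacity = float(burst)
        self.tokens = float(burst)        # start with a full bucket
        self.clock = clock                # injectable so tests can control time
        self.last = clock()

    def call(self, payload: dict) -> tuple:
        now = self.clock()
        # Refill tokens for the elapsed time, never beyond the bucket's capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens < 1:
            return 429, {"error": "rate_limited"}   # what a throttled client would see
        self.tokens -= 1
        return 200, {"status": "succeeded", "echo": payload}


api = FakeRateLimitedAPI(rate_per_s=5, burst=10)
codes = [api.call({"amount": 1999})[0] for _ in range(12)]
print(codes.count(200), codes.count(429))
```

Injecting the clock keeps the fake deterministic in tests, and the same fake can be driven by synthetic traffic to check how the application handles 429 responses.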
John Wessel 22:43
One thing you mentioned: I want to make sure we have plenty of time to talk about you being a solopreneur, but this is just such a fascinating problem. One of the really interesting challenges, and it sounds so funny, is that it’s actually very difficult to be creative enough to generate data that mimics reality. I think part of that is because human behavior is a very complex thing generally, and you see that in streaming data specifically, or even sometimes in system behavior. But to write a script that generates a bunch of data, you have to think in a pretty structured way and enforce a lot of concepts around taxonomy and stuff. So you have these two competing things, and it makes it very difficult for a human to generate something that’s highly realistic. How do you do that at ShadowTraffic? You even mentioned that tools like LLMs can help you express what you’re trying to do. But how do you get close to the bullseye, in terms of this feels like real production data?
Michael Drogalis
Yeah, reality helps, because people usually come to me not when they’re bored or just trying to do something new; they have a problem to solve. They’re like, okay, our customer needs to do this. We have this schema. We may not have their data, but we know what their data looks like. And then the 80/20 rule applies: we need to get it good enough along these particular characteristics. Usually they can dial it in, the problem is solved, and they move on. So they’re not imagining all possible dimensions. The other case you mentioned is when you really are starting from scratch, as many developer advocates would be if they’re building demos to promote their software. I have a custom-trained GPT, which is awesome. You can just say, hey, I’m thinking about these domains; what kind of examples could you give me about data streams? It’ll give you some lists, and then you say, write the ShadowTraffic file for me. It’s not perfect, but 90% of the time it gives you a great baseline that you can pick up and just start moving with. That’s a perfect marriage of AI and high-level programming languages: you use AI to be creative, then take the thing it generates, check it into Git, share it with your team, modularize it, and go from there.
John Wessel
Yeah, totally. It’s like fast-tracking it. I mean, you can even run tests and then say, okay, let’s go in and tweak these things to get the last 20%.
Michael Drogalis
Exactly.
John Wessel
Super interesting. Okay, one last question before we dive into solopreneur stuff: why a DSL? That’s always an interesting choice, especially for a startup, and there are tons of different thoughts on it. A classic one in data and analytics is people trying to write a DSL that will eventually replace SQL, and a lot of people say that’s never going to happen. There are all sorts of interesting tensions and different philosophies there. But I’d love to know why a DSL is the choice for ShadowTraffic.
Michael Drogalis 25:35
Yeah. When I say a DSL, what you actually program in is JSON, and the reason it’s advantageous is this: imagine you have a super deeply nested, gnarly record, which is probably more common for your listeners than not. You need some way to work with that without juggling all the different inner attributes and checking whether things line up. The 30-second explanation of ShadowTraffic’s API is that you take a specimen of your data and look at all the concrete values: all the strings, all the Booleans, all the integers, even all the inner collections you want to change. You root them out and put in little function markers that say what to put there instead of that specific value. If you were to build that with a programming language, you would have to do all that juggling yourself, and you would have to figure out the infrastructure. Do I need Maven? Do I need a JVM? What do I need to run this? I package all of that into a Docker container, so all you need is an editor to write JSON and the container, and it takes care of all the complexity of compiling, running, garbage collecting efficiently, all of that.
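Michael’s 30-second description (take a specimen record, root out the concrete values, and put function markers in their place) maps naturally onto a recursive template walk. The Python sketch below illustrates that idea only; the `$gen` key and the generator names `uuid`, `oneOf`, `uniformInt`, and `repeat` are hypothetical stand-ins, not ShadowTraffic’s actual JSON syntax or runtime.

```python
import json
import random
import uuid

# A specimen record with concrete values replaced by generator markers.
# The "$gen" marker syntax is hypothetical, not ShadowTraffic's real format.
SPEC = json.loads("""
{
  "orderId": {"$gen": "uuid"},
  "customer": {"tier": {"$gen": "oneOf", "choices": ["free", "pro"]}, "country": "US"},
  "items": {"$gen": "repeat", "min": 1, "max": 3,
            "of": {"sku": {"$gen": "oneOf", "choices": ["A1", "B2", "C3"]},
                   "qty": {"$gen": "uniformInt", "min": 1, "max": 5}}}
}
""")


def expand(node, rng: random.Random):
    """Walk the specimen: keep literals, replace marker objects with generated values."""
    if isinstance(node, dict):
        if "$gen" in node:
            return generate(node, rng)
        return {key: expand(value, rng) for key, value in node.items()}
    if isinstance(node, list):
        return [expand(value, rng) for value in node]
    return node  # a concrete value the user chose to keep


def generate(marker: dict, rng: random.Random):
    kind = marker["$gen"]
    if kind == "uuid":
        return str(uuid.UUID(int=rng.getrandbits(128), version=4))
    if kind == "oneOf":
        return rng.choice(marker["choices"])
    if kind == "uniformInt":
        return rng.randint(marker["min"], marker["max"])
    if kind == "repeat":
        count = rng.randint(marker["min"], marker["max"])
        return [expand(marker["of"], rng) for _ in range(count)]
    raise ValueError(f"unknown generator: {kind}")


rng = random.Random(0)
for _ in range(2):
    print(json.dumps(expand(SPEC, rng)))
```

Because literals pass through untouched, a deeply nested record only needs markers where values should vary, which is the ergonomic argument Michael makes for a JSON-based DSL.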
Eric Dodds 26:37
yeah. So it’s Yeah, more a tool set for JSON interface, I guess, yeah, that’s probably more accurate. That’s right, yeah, yeah, very cool. Okay, man, that’s super interesting. I’m, I can’t wait. I didn’t get a chance to use it, but I totally want to go play with it now. Okay, John, you were, you wanted to dig into solopreneur so I have a million questions. Yeah, dominating the game. Yeah?
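Michael’s 30-second explanation (take a specimen record, root out the concrete values you want to vary, and swap each one for a function marker) can be illustrated with a small Python sketch. The `$fn` marker key, the generator names, and the `generate` helper are hypothetical and invented for this example; they are not Shadow Traffic’s actual JSON syntax or API. The sketch also ignores concerns raised elsewhere in the conversation, such as related data, sequencing, and shipping records to Kafka.

```python
import json
import random
import string
import uuid

# Hypothetical marker syntax invented for this sketch (NOT Shadow Traffic's
# real format): any JSON object with a "$fn" key is replaced by generated data.
GENERATORS = {
    "uuid": lambda spec, rng: str(uuid.UUID(int=rng.getrandbits(128), version=4)),
    "int": lambda spec, rng: rng.randint(spec.get("min", 0), spec.get("max", 100)),
    "bool": lambda spec, rng: rng.random() < 0.5,
    "choice": lambda spec, rng: rng.choice(spec["options"]),
    "letters": lambda spec, rng: "".join(
        rng.choices(string.ascii_lowercase, k=spec.get("length", 8))
    ),
}


def generate(template, rng):
    """Recursively walk a template, swapping each marker for a generated value."""
    if isinstance(template, dict):
        if "$fn" in template:
            return GENERATORS[template["$fn"]](template, rng)
        return {key: generate(value, rng) for key, value in template.items()}
    if isinstance(template, list):
        return [generate(item, rng) for item in template]
    # Concrete values with no marker (like "USD" below) pass through unchanged.
    return template


# A nested "order" specimen with its concrete values rooted out and replaced
# by markers; "currency" is deliberately left as a fixed value.
TEMPLATE_JSON = """
{
  "orderId": {"$fn": "uuid"},
  "customer": {
    "name": {"$fn": "letters", "length": 6},
    "isMember": {"$fn": "bool"}
  },
  "items": [
    {"sku": {"$fn": "choice", "options": ["A-100", "B-200", "C-300"]},
     "quantity": {"$fn": "int", "min": 1, "max": 5}}
  ],
  "currency": "USD"
}
"""

if __name__ == "__main__":
    rng = random.Random(42)  # seeded so repeated runs print the same records
    template = json.loads(TEMPLATE_JSON)
    for _ in range(3):
        print(json.dumps(generate(template, rng)))
```

Because the template is itself plain JSON, it can be version-controlled and shared like any other config file. One cost of a reserved marker key is that a real field named `$fn` would be misread as a marker, so a real tool would likely need some escaping rule.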
John Wessel 26:58
Yeah, I’m a solopreneur, so this is super interesting. We could go a couple of directions here, but the thing that comes to mind first is: say I’m a software engineer. I’ve worked on streaming solutions like you have, or maybe in another sector; it doesn’t really matter. What kind of mental framework do you have when you’re thinking about ideas? Because for a lot of us, we can have 10 ideas a day. Do you have a mental checklist or framework to decide, oh, I should pursue that a little bit? Just walk us through the thought process.
Michael Drogalis 27:29
It’s a little bit more like: what kind of lifestyle do you want to live? You can think of really big problems, go raise money, and live that sort of life where you have to hire people and scale really fast. Or, in my case, you can try to find problems that could be solved by one person or just a few people, and run a slower-growth business. So my thinking was this: I was at Confluent, and I tried to relax my entrepreneurial drive, but it just wouldn’t stop. This is where I learned something about myself: I am built to make and sell things, and I will be for the rest of my life. I can’t turn it off. I worked on a new presentation tool idea on the side. It was a VC-fundable idea, but it never really went anywhere, because I wasn’t really comfortable doing another investor-backed company. It just felt like the wrong time in life for me. I wanted to do something a bit more lifestyle-driven. So when I left Confluent, I put up a blog post that said, I’m launching four startups in four quarters. I outlined my thesis: hey, I have a list of 10 ideas, and I’m going to burn through them, one a quarter. I’m obviously not going to run four startups; I’m going to find one that works. I went through a 12-week process that ended in launching Shadow Traffic, and by week six I was pretty confident that I had a winner, so I called the rest of it off. The approach was that I’m going to burn through ideas until one works, and I’m not going to work on each one for years. I’m going to work on each one for 12 weeks at most, and that should be enough to tell me, right?
John Wessel 28:48
So tell us about the process to get to the 10 ideas. And let’s stay on the non-VC-backed route: we’re going the solopreneur or early bootstrapped route. Is there any process behind the 10 ideas? Or for you, is it just that you always have ideas in the back of your mind?
Michael Drogalis 29:06
I don’t usually have ideas. I sort of forced it. I stood in my backyard on a summer day, and I was like, okay, I’m going to do this. What the heck am I going to do? I just started writing down things that I’d observed over time. The first one that came to mind was, well, everybody needs test data. Maybe I could do something with that. Then I had some other ideas that were maybe of lesser quality, like, child care is really annoying in my particular area, and I can’t remember whatever other ideas I had, but I just forced it. I was like, 10 ideas, now. And that was helpful for just stepping into the creative mindset.
John Wessel 29:36
Yep, yeah, that makes sense. That’s interesting.
Eric Dodds 29:38
I met a really successful entrepreneur who was very similar. People would say, whoa, you just seem to have these ideas that are great. I met up with him for lunch, and I said, I’m interested: how have you come up with multiple very successful ideas? It’s so funny to think about. He said, I just do the ABC thing. And I was like, what do you mean? And he said, you just write down all the letters of the alphabet, and then you try to come up with a company idea or concept that starts with A, and then B, and then C. And he’s done that. I haven’t heard that before. That’s great, wow.
Michael Drogalis 30:15
But it’s the same thing. It’s like a forcing function, right, to get your wheels turning and think about problems. That’s really cool. I like that. So one thing that I’m really fascinated by is that 12 weeks is an extremely short amount of time to build and validate. What were your exit criteria across all the ideas? What needed to happen in 12 weeks for you to say, okay, I found the one that I’m going to focus on? Hopefully the winner, but at least the one I’m confident enough in to give it my full focus. Yeah, I think it’s only tight if your problem is too big. So what I did was, I said, okay, number one on the list, which I feel decent about: test data for Kafka. People in the Kafka community often have trouble doing demos. Small problem. So I opened my laptop and wrote a social media post. I called it the $10,000 demo problem; that was the title of the post. It was, hey, have you ever had to make test data? I bet it actually cost you $10,000 of your time, for these reasons: maybe you had to do related data, or that sequencing that I mentioned, or any of these other things. It didn’t hint at all at the solution. I just wrote a post that I thought was interesting. It got a bunch of reactions, real traction: comments and likes on LinkedIn and on Twitter. I reached out to every single person and said, hey, thanks for interacting with my post. Can you tell me a little bit more about your experience with this problem? Any hard details? Just tell me about it. I started to hear some real use cases, which is indicator number one: are people saying that’s cool, or are they saying, hey, that’s useful, and here is my background with this specific problem, with very hard details about what happened? I started to hear that, and I was like, okay, good. Step number one.
Next, I created a minimal landing page. I came up with a name, Shadow Traffic. It had a hero section that sketched just the beginnings of the solution, and a CTA that said join the wait list, which put you on an email list with me. I had maybe 100 people sign up. I reached out to every single one of them and ran the same process: hey, tell me about your experience with this problem. The details got even richer, which was really good. I felt like, okay, something’s happening here. During that process, two companies reached out that not only had the problem very critically, they had urgency about it. They had decision makers and they had a budget. And I was like, okay, at six weeks, this is very good. Got it. Yeah, that was enough of a checkbox for me to keep going. And had you started to build any product in that six-week period, or were you still spending most of your time just doing validation by talking to people? I did a little bit, because my sketch of the idea was pretty loose, so I had to fill in some gaps, like, well, how will this work? What would this do? But it was mostly calls. I can’t go look it up right now, but it was something like 60 customer calls in a couple of months. I did it pretty hard, and all of those conversations shaped what I eventually built. But once I had those two customers saying, yeah, we want to pay for this if you complete it, that was go time. I went heads down at the keyboard and banged out exactly what they needed, and that was the beginning of a real product.
Eric Dodds 33:23
Yeah, wow. That’s a pretty hard swing, right? You’re talking with one or more people every day for two months, you’re processing all of that, you’re trying to collate all the different patterns you’re seeing across all these conversations, and then you just go heads down and build a product. That’s a pretty crazy swing. Did you enjoy that? What was that experience like?
Michael Drogalis 33:51
I love
Eric Dodds 34:20
Yeah, I love that. One question I have, and this is going to be a totally selfish question from one product person to another. Well, I guess CEO, CTO, Chief Product Officer, Chief Marketing Officer, all of those things; a lot of titles in the early stages. You’re getting that feedback and building those direct solutions. Have you come across a situation yet where a customer asks for something, and because you have a vision for the product, you say, I’m actually not going to do that, or you push back because their specific need doesn’t necessarily reflect the larger picture of what you want to build?
Michael Drogalis 35:02
Yeah, that totally comes up once in a while. To connect to earlier in our conversation, I had a few people who were like, hey, it would be great if you took my production data and automatically did this for me. No LLM; you just have this black box that snapshots all my data and outputs it. I could eventually build that, but I feel like that’s something I want to grow into over time, maybe taking some VC funding to go after it. Another thing people constantly tempt me with is, hey, it would be great if you did unstructured text for AI. I always resist that, because nine out of 10 times they don’t have a real use case behind it. I can sense that the checkboxes aren’t there to really build a product that people are going to use sustainably. So yeah, sometimes you just have to decline.
Eric Dodds 35:39
It’s tough, but it’s true. Yeah, that makes total sense. Let’s talk about the ingredients to be a... we’ll narrow the scope to solopreneur, but you could probably extend it to entrepreneur as well. You were a software developer and a product manager. Not everyone can make the transition from being an IC, or even a manager, to actually starting and building a company. Can you talk a little bit about that? What are some of the ingredients you’ve noticed in your own experience? Not everyone’s designed to be an entrepreneur, and that’s totally okay. But I’m thinking about the people who are listening to this, maybe on their commute to work or on their way home, who’ve had that itch inside them, wondering, could I do that? Is it possible for me? So speak to that person: what do they need to hear to push them over the edge?
Michael Drogalis 36:42
I bet that’s nine out of 10 of your listeners. Since I started this, I’ve had so many people reach out to me saying, I would love to quit my job and pursue an idea that I feel is important. I think the first thing is to look at it objectively and say, okay, if I want to do this, there is a much larger range of skills that I need to develop to be good at it, and I was not born with them. And I say that about myself: I learned to code, and that took me many years. It’s the hardest skill I’ve ever developed. But as soon as I did that, if I wanted to build a company, I needed to learn how to do marketing, how to have a sales conversation, how to build pipelines, and how to treat customers during customer service. You really have to work at it. And what’s so hard mindset-wise is that when you start a company, everything is scary because there’s so much uncertainty, and you’re always going to want to go back to what’s comfortable: I’m going to code, because I’m good at coding. But the trap is that nine out of 10 times in the beginning, what you need to be doing is not coding but probably sales and marketing and working with customers. It’s hard to be that uncomfortable all the time, and it’s very frustrating. But you can push through it, and you can actively get mentors to help you, and study. You really can do this. It’s not impossible.
Eric Dodds 37:51
Yeah, I love that. In many ways, how simple that is is encouraging: just hearing that you work at it, and you can actually do those things. I want to extend the question a little bit and get super practical. There are a lot of administrative components to this. You have to set up a business, and a lot of things go into that; you even have to set up bank accounts. None of this is rocket science, but again, for someone who has never done it: how do I structure my organization, all of that. Was there anything you learned in that process? Did you use any tools, like Stripe Atlas or anything else, to accelerate that process for yourself? Anything you can share with people?
Michael Drogalis 38:34
Yeah. First of all, this is my second time around, so I made a lot of mistakes the first time: paying taxes, bank accounts. When we raised our first money, I didn’t know that if you raise a bunch of money, you should probably put it somewhere that bears interest and not just let it sit in a checking account. Things like that. I mean, who’s going to tell you that? This time around, there’s all the drudgery to get through in the beginning around legal registration, but so many people have done all this before you that you can look up the answers on the internet; you just have to figure out the right questions to ask. And then I use a whole bunch of different tools. I don’t use Atlas, but I use Stripe. I use Calendly for efficient scheduling. I use Obsidian for a lot of my tracking. You just have to find a system that lets you settle into a routine: how do I manage my sales pipeline? How do I know when to do outreach? How do I know how and when to build marketing content? How can I check its performance? You build up these little tools, and they don’t all cost money; most of them are free. You just have to figure out what works for you over time.
Eric Dodds 39:33
Yeah, totally. John, any questions from your end? I know I’ve been dominating this whole conversation.
John Wessel 39:38
Yeah. There are so many ways to go with this, and I’m going to stick with the solopreneur topic, because I think it’s such an interesting one. Eric was just asking about the basics, and I’ve actually done this over the last two years and found, like you’re saying, that most of it is Google-able. In fact, all of it is, as far as the basics go. And from there, I very much identify with the idea that everyone has their comfort space. Especially if you’re coming from a technical background, that’s probably building the product, the software engineering piece, or whatever. I very much identify with pushing into, hey, you’ve got to do marketing and sales. If you’re not comfortable doing marketing and sales, do it anyway; you’ve got to do it. And like you said, having mentors, having professionals in marketing and sales come in, I think is helpful. But here’s another spin on this: what’s a really practical thing? Say it’s Thursday today, and you’re like, I want to build, but I need to do marketing and sales. Walk me through that Thursday. What does it look like? Internally, you’re essentially talking to yourself: all right, I want to build, I want to add that cool new feature, but I know I need to work on marketing and sales.
Michael Drogalis 41:00
I think it comes down to a mindset of expected value. If you take yourself out of it and look at the situation objectively, you can ask: what is the thing I could do that has the highest probability of moving the ball forward and getting customers? If your goal is just to have fun and look cool, yeah, you could code. But if your goal is to actually get customers and build a company that can sustain your lifestyle, then the right expected-value move, if you have no customers or want more customers, is to go make more people aware of you. For me, believing that is enough to say, okay, I should go work on that. And once you see it start to work a little bit, you just believe. For me, Thursday is marketing day. After I hang up this call, I’m going to go work on my marketing content for the week. And I like doing that, because I know it results in people who become customers, pay me money, enjoy my software, and say nice things. I can’t wait to get there, so I’m motivated to do it.
John Wessel 42:08
Yeah, well, you actually just slipped in something that I think is super important. You just said, oh, I have a predefined time when I do that, and that’s what I should be doing: marketing. I think that can actually be a major thing too. Yeah, it’s time boxing. Because you have so many different roles, if you just mix them in and alternate every 15 minutes, that’s a nightmare, right? So even having a time box of, cool, I’m doing marketing, or I’m talking to customers, or whatever, and then switching hats, I imagine that’s helpful too.
Michael Drogalis 42:31
Yeah, it’s a great point. You can’t leave the week to its own devices and say, oh, I hope I do all the right things. You need some way of being accountable: did I work on sales enough? Did I work on marketing enough? Did I work on engineering enough? And then you balance that with the macro things that are going on. If all my customers are coming in with things they need immediately, I’m going to put marketing on pause. I’ll do rerun posts or something minimal to keep it afloat. So you need to play this balancing game. But as you say, having some system to keep you honest is really important.
John Wessel 43:01
Yeah, that’s so interesting. One other quick thing I’ll ask: you had the original startup with co-founders (I think you said there were four of you), and then obviously Confluent, which by the end was a fairly large company. What kind of additional efficiency do you think you have as a solopreneur versus working with even a small team? There’s the communication problem, right? It grows exponentially as you add people to a company, and you have essentially none of it, because it’s just you communicating with yourself. I think there’s an advantage there. How would you think about that advantage? How would you quantify the advantage you have because you can do it all?
Michael Drogalis 43:40
I think it lets you be more objective about what you’re doing. Even if you’re on a small team, and especially at a big company, good things can be happening across the company that are basically a rising tide for everyone: we closed the big deal, okay, that makes me feel good about whatever I’m doing over here. When it’s just you, your ego can’t be in the way. If your goal is to make money, do things that make you money, and don’t do things that don’t. All of your rewards are your own, and all of your failures are your own as well. So I think it puts you in an extremely fast learning loop. And it’s true, the trade-off is that I can’t solve problems as big as a 10-person or a 1,000-person company can, right? But I get to learn a whole lot faster. If I want to stay with what I’m doing, I’m getting better at it, and if I eventually want to pivot back to a bigger company, I can take all these learnings with me. I’ve probably accelerated my career fivefold in the last year, because I’m solving so many more problems faster. It’s like an accelerator in a certain way. Sure. I was just thinking about solopreneur board meetings. Yeah, those are called sleepless nights, wondering if I’m doing the right thing.
John Wessel 44:45
Yeah, but it is funny, though, because if you think through your day, Eric, or even mine with a very small team, every time you add somebody, there’s an extra layer of communication. And then if you raise money, you have that layer of communication. And if you have a board, it adds so many different stacked layers, and for what you’re doing, they’re just not there.
Michael Drogalis 45:11
Yeah, and it lets you stay extremely customer-focused in the beginning. I work with a bunch of other companies, advising or just mentoring, and they’re all very focused on raising money and doing all the things to get the company going. I think there’s a huge advantage in starting incredibly simply and asking, have I nailed the customer problem and signed one or two, or maybe even three, customers? And then go raise money. Because you have such a straighter path: if the investors aren’t aligned with you, leave them behind. You know what you’re doing and exactly the right way to go.
Eric Dodds 45:39
Yeah, I agree with that 100%. I think about maximizing for velocity as much as possible in the early part of a company. I love the idea of the solopreneur: when you haven’t raised money for this, you don’t have a choice other than to get to the pain point as quickly as you possibly can, and then solve it as quickly as you possibly can, if you want people to give you their money. Yeah, exactly.
Michael Drogalis 46:05
And then if you manage to do it, you’re in an awesome position. I’m 18 months in, and I’m at six figures of ARR. I can go raise money now on great terms. I can take whichever direction I want, because I was really patient. I endured a whole bunch of pain, and I continue to endure that pain. But it gives you more options. If you’ve figured out more of the space on your own, you can decide what you want to do.
Eric Dodds 46:24
I know we’re really close to the end here, but my last question will be about what you just said. Do you have a dream for Shadow Traffic, in terms of, I want to sell it to another company or raise money for it? Or do you just want to keep solving pain points in a way that people pay money for, and see what happens?
Michael Drogalis 46:27
I love not deciding; it’s really fun. The only thing that’s true for me is that I’m going to do it until it’s not fun anymore. I think I’ve gotten enough customers to a place where I feel like it’s sellable, both the product and the ideas that I’ve pioneered, or I could open source it or whatever. But it’s really fun not deciding, and just doing this for myself. My goal at this point is to build it into a long-term business and keep going until it’s not fun anymore, and today it’s still fun.
Eric Dodds 47:14
Man, that’s so great. I hope that’s encouraging to any of our listeners who are worried about the potential of jumping out on their own. What an encouraging note to end on: it is painful and difficult, but it’s also really fun, and you wouldn’t trade this year for anything at all. I love it. Awesome. Michael, thank you so much for joining us on the show. I learned so many lessons. You reminded me of so many good things about the value of diving in, facing your fears, putting hard work in, and really enjoying what you do. And we got to nerd out on streaming data, which is always a bonus.
Michael Drogalis 47:56
Thank you for having me. It was a lot of fun.
Eric Dodds 47:59
The Data Stack Show is brought to you by RudderStack, the warehouse-native customer data platform. RudderStack is purpose-built to help data teams turn customer data into competitive advantage. Learn more at rudderstack.com.
To keep up to date with our future episodes, subscribe to our podcast on Apple, Spotify, Google, or the player of your choice.
Get a monthly newsletter from The Data Stack Show team with a TL;DR of the previous month’s shows, a sneak peek at upcoming episodes, and curated links from Eric, John, & show guests. Follow on our Substack below.