This week on The Data Stack Show, Eric and Kostas chat with Pardis Noorzad, the CEO of General Folders. During the episode, Pardis shares her journey as a data scientist and as an entrepreneur founding a company. The conversation covers the importance of data collaboration and sharing, the challenges and complexities of data sharing across industries, the need for efficient and secure solutions, and the underlying definitions and dimensions of the data exchange problem, including infrastructure, security, economics, and user needs, and more.
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Eric Dodds 00:05
Welcome to The Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You’ll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, the CDP for developers. You can learn more at RudderStack.com. Welcome back to The Data Stack Show, Kostas. I’m so excited. We are talking with Pardis, who has done data science at some really big companies, such as huge retail providers in India and Twitter in the social networking space. And she recently started her own company in the data collaboration space. I am really excited because I don’t think we’ve covered data collaboration or data sharing, which is some sort of data transaction between two companies, in detail. So I actually want to talk with Pardis about how big that world is. I’ve done some of that in the past, of course, with advertising data and marketing. But that’s just one slice of it. And I think a much, much larger footprint of companies is doing it than we probably think, so that’s what I’m going to ask. How about you?
Kostas Pardalis 01:37
So Eric, from what I understand, when we click that button that says “do not sell my data,” it should really say “do not sell my data to Eric,” specifically, right? That’s exactly what was happening. You were that person?
Eric Dodds 01:51
Yeah, I was, in a former life. Yeah. A former life.
Kostas Pardalis 01:57
And now you’re on, you know, the path of light.
Eric Dodds 02:01
That’s exactly it. The light. Yes. Yes. I don’t know. Maybe I’ll have to pay for those sins one day. Getting weird with data rarely leads to getting out.
Kostas Pardalis 02:15
Yeah, I can guarantee you are reformed. You’re a good citizen today. Okay. So, first of all, I’m also very excited that we have Pardis on the show. I’ve known her for a while now, and she seems like an amazing person. Outside of the problems and the technical solutions that she’s going after, she also had a very interesting journey starting a company, and I’d love to learn more about that. I’d love to hear about her experience and what her company does, and we’ll take it from there. And then, of course, we are also going to ask more technical and more product-related questions.
Eric Dodds 03:09
Well, let’s dig in with Pardis. Yeah, let’s do it. Pardis, welcome to The Data Stack Show. We are so excited to chat with you.
Pardis Noorzad 03:16
Thank you, Eric. Thanks for having me on the show. I’m excited to be here.
Eric Dodds 03:21
Very cool. Well, let’s start where we always do. Give us your background. How did you get into data? Did you start working on data in school or after school, and what has your journey been like?
Pardis Noorzad 03:32
Definitely. So I did my undergrad in software engineering. Back in high school, and even elementary school, I was super interested in how the Internet worked. I thought it was super fascinating, and I wanted a degree in a program where I could learn more about how computers are connected to each other and how you could share information that way, so it wasn’t a difficult decision. Then in the ECE program that I was in, we also had a robotics team. I would always pass by the room where they were testing out these robots, and so I started getting interested in AI, smart software, and things like that. I thought the evolution of regular software is, you know, smart software. So I started reading up on AI, took some courses, started with reinforcement learning, and then actually did a master’s in AI where I learned more about deep learning and machine learning algorithms and stuff like that. From there, as I was getting more interested in ML and we were comparing various methods to train models, I got into a lot of math, and I thought, wow, wouldn’t it be nice to have a math degree. I wasn’t really ready to commit to a PhD at that point, but I applied to a couple of master’s programs in math and applied math. I got into this program and started doing some of what I wanted to do, which was work on the theoretical foundations of these ML models. But then I met this professor, and he told me all about his work on social networks and graph theory, which changed what I was working on, and I got into a lot of that, which was super fascinating. At the same time, I started taking some courses in the MBA department. My school was super entrepreneurial, and they would push everyone to go and, you know, start a company. 
That’s where the world is going, and that’s how you should get your job, and things like that. So it was awesome. And in the MBA department, there was this magazine that I picked up one day, a marketing magazine, and they talked about your area of interest.
Eric Dodds 06:37
I was gonna say, we’re getting into very treacherous waters here, but I like it. Let’s keep going.
Pardis Noorzad 06:43
Yeah. They did a profile on Hilary Mason and the work she was doing at bitly. I then did a Google search and saw some of her videos talking about writing Hadoop jobs and things like that. And I was like, this is so cool. It’s all the things that I’ve learned coming together in one place, in this type of job. So I started looking for that sort of job, and luckily, I found this really early-stage startup that was looking for its first data person. They were doing retail analytics, building a new AI analytics platform for retailers online and offline, and I got to work on super interesting problems. The rest is history. That was my path to data science.
Eric Dodds 07:44
Very cool. Now, I have a question. I want to talk about what you’re doing today, but let’s just take a brief detour. You were doing data science in the retail space, and later in the social networking space, before it was as hot of a news item as it is today, right? And of course, AI, and LLMs in particular, are creating a huge amount of news, and everyone’s talking about it. But back then, to your point, it was data science, it was machine learning, right? Maybe some people called it AI. You worked in it for years before it hit fever pitch, so can you provide us some perspective on that? Has it changed that much, actually? Or is it just the next manifestation of things that have been happening for a long time?
Pardis Noorzad 08:51
Oh, totally. I think it was around 2010, maybe a little earlier, when I started to get into AI, and I would say it was already becoming a very interesting topic. A lot of people were talking about it. There were professors at the University of Alberta and the University of Montreal, and then, of course, a lot of interesting stuff happening at Stanford, Berkeley, and MIT. I was following all of these professors and their students’ research. So I would say even at that time it was pretty hot, and deep networks were gaining a lot of attention. Sparse learning was super hot at the time; all the papers in this area would get thousands of citations. So even at that time, I was kind of following the crowd to some extent. But in terms of how it has changed, I would say it’s definitely much more part of the public conversation today than it was then. At that time, within the university it was a huge deal, but outside of it, maybe people wouldn’t talk about it in as much detail. Whereas right now, you might go out with some friends who are in other industries, and they will ask, “Hey, tell me more about ChatGPT. How does it work?” and things like that. So, definitely more public conversation today.
Eric Dodds 10:47
Yeah, for sure. That’s a helpful perspective, because I agree, a lot of times news cycles can make things feel very new. But these are things that have been around for a long time. It’s just that now maybe your mom is texting you, “Have you heard of ChatGPT?” and that just makes it feel a little bit closer to home.
Pardis Noorzad 11:16
For sure, yeah. And I think this textual interface, the ability to reach more people and make it more accessible, has really helped everyone feel the magic of AI.
Eric Dodds 11:32
Right, as opposed to like, having to have a pretty significant amount of domain knowledge in order to see the magic. Totally.
Pardis Noorzad 11:41
And, you know, Google has been doing this forever with its search and things like that. But it’s really only now that people are feeling the magic, which is really interesting and nice. I’m definitely happy to see AI being talked about all the time.
Eric Dodds 12:11
I love it. Well, thank you for indulging my little historical AI interlude. What are you doing today? You’ve done a huge amount of data science and AI work at multiple different types of companies, but you recently founded a company, which, congratulations, is very exciting.
Pardis Noorzad 12:36
Thank you. Yeah, I appreciate that. So I started General Folders because I have seen this recurring problem in every single job I’ve had. Up until my last job, I didn’t really think that this should be a tool, because up until that point, I wasn’t responsible for buying and making informed decisions. At my last job, it was really my responsibility to think about how we could make our team more efficient, how we could make our data more secure, how we could drive down the cost of managing this infrastructure and build things faster, through the decisions we made about what to buy, what to build and invest in, and how to collaborate with engineering, DevOps, and other teams to build these things and connect them together. That was an interesting role to have, because now I could see that this is something that needs to turn into a company. I was really looking for people to tell, “Hey, this is something I want, I don’t see it on the market. Can you build it?” When I left the company, I had a couple of ideas for things to build. This one involves managing other companies’ data, so initially it’s a harder thing to build: there are more things you need to pay attention to and more infrastructure you need to build to make something that works well. But out of all the ideas, this is the one where I truly believe the need exists, and it’s something that I, and later we as a team, can sell well. 
And so that’s why I started building this and started talking to a lot of people from various industries, just to make sure that this is not just something that I have been seeing, but that people are seeing it across various industries and roles. So yeah, I can talk about that problem as well.
Eric Dodds 15:31
Yeah, I’d love to zoom in. Well, actually, why don’t we do this: in just a couple of sentences, describe what General Folders does, or the mission of the company, just to level set at the simplest level.
Pardis Noorzad 15:47
Totally. At the highest level, General Folders is a tool to make business collaboration easy and secure. One very important aspect of business collaboration and partnerships is data transfer and data collaboration. Every time you sign a contract with another company, there’s always an aspect of data sharing involved. You notice this when you become responsible for all the data for the company, and you’re like, oh wow, every time we signed a contract, there was this aspect of it that we needed to pay attention to. So at the highest level, this is what General Folders makes easy and secure. That’s the mission of the company.
Eric Dodds 16:37
Let’s talk about data collaboration, or data sharing, or maybe data transfer; there are probably a bunch of different modes in which this actually occurs, one-directional, bi-directional. Kostas, have we even talked about this on the show? Maybe Brooks, with his encyclopedic knowledge of the show, would know. But I don’t know if we’ve ever actually dug into two businesses sharing data on the show, so I’d love to talk about how big a footprint that has among businesses. I have done some of this in past roles, and it’s a way larger world than I ever would have imagined. It’s almost like looking at the ocean, then putting your head under and seeing how big the coral reef is: there’s way more happening under the surface than I ever would have thought, right? And there are all sorts of crazy ways that companies share data.
Pardis Noorzad 17:50
Absolutely. Part of the reason it’s hard to find out online how big this market is: one of the investors I was talking to had a word for it, a gray area. What that means is that a lot of the work that happens in this area is do-it-yourself. The moment companies see a problem like this, they start building ad hoc solutions. A lot of times, as a company, you might not even know how big this is going to become or what type of headache it is going to cause you down the line. It was the same for us. We started working with hospitals and started sending data. Then they had more requests, and the type of contract we had with them changed over time, so we needed to change the pipeline. For some of these places, we didn’t really have a bigger team to work with; there was only one person, who wasn’t necessarily well versed in helping us manage these pipelines. And it was causing so many problems for us in fulfilling what we had promised in these contracts. So I think a bit of a challenge in calculating the size of the problem is that most people don’t even know how big of a problem it is inside their own company. You only realize it when you have a data team that’s responsible for all the data in the company, and you see, wow, there’s so much of this type of activity happening.
Eric Dodds 19:55
Yep. So you mentioned one example. Let’s say you’re performing some sort of healthcare service, or you’re a software provider whose software is used in a healthcare context and needs to be rolled up into a larger hospital system; maybe it’s a subsidiary, whatever the context is. So you have some data that needs to get rolled up into a larger entity and shared. That could be one context. Another context that comes to mind: let’s say you’re a payment processor that processes transactions or some other financial information on behalf of an end-user-facing application, where users are interacting with the app, but it needs a back-end service provider. That’s a huge market, right? And so you have data transfer between two businesses in that context. What are a couple of others? Because one thing that’s really interesting to me about the nature of this problem is that it’s so varied. We’ve talked about healthcare data and financial data, two crazy realms in and of themselves. But I think there are probably hundreds more.
Pardis Noorzad 21:13
Definitely. Those two are super important, especially because the data in these spaces is very regulated. Security and privacy are extremely important there, so more modern approaches to data transfer are absolutely needed in those spaces. However, there’s so much happening elsewhere. Another interesting one that I’ve encountered: we were working with a company that would provide data for clients wanting to find out where to build their next clinic. So we needed data on all the clinics in a certain city, state, or country, and on the demographics in those areas. You go to a data marketplace, you want to buy data, you want to explore that data, and finally you want to transfer that data into your warehouse. None of those steps are really easy; there’s not a great tool for them. Of course, in recent years, a lot of major cloud service providers have started offering really good services in this area. But we certainly didn’t have that, even in my last role, where we were seeing a lot of Excel sheets getting sent via email, and the procurement process for getting to that data took forever, many months. So that’s one. Another one that I’m particularly interested in is working with AI companies. I really like to work with these people because there’s a lot to talk about, and, especially in these earlier stages of a company, I can get their help in evolving the product as well. AI companies need their customers’ data to build their models. They all have various ways of doing this; if you go to their websites, they offer different options for how companies can deploy. 
But probably a lot of them prefer SaaS deployment. Software deployed with this approach is more efficient and easier to run, and especially for AI, where you want hardware optimization, it makes sense to bring the data to their own warehouse, where they are building these models. So they want to move data to their warehouse, and it’s extremely important for them that these pipelines are reliable. What they end up doing is managing these end-to-end pipelines themselves: asking for their customers’ credentials, connecting to their customers’ warehouses, and bringing the data in. What we want to be able to do is tell these companies, “Hey, we can manage these pipelines for you, and we can monitor them.” No matter what your stack or your customers’ stack is, we’ll be able to offer a stack-agnostic solution with monitoring, and also add some additional features that I think are extremely important, like data validation on both sides. So many times, customers will mistakenly send PII or PHI that they weren’t supposed to send. These are really simple validation steps that you can do to avoid that sort of thing happening between companies.
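To make the “data validation on both sides” idea concrete, here is a minimal sketch, not General Folders’ implementation, of a check that either party could run on a batch: once before it leaves the sender and again when it lands at the receiver. It assumes rows arrive as dictionaries and that the contract names an allowlist of columns; the function name `validate_batch` and the two regex patterns are hypothetical and cover only a sliver of real PII/PHI detection.

```python
import re
from typing import Dict, Iterable, List, Set

# Hypothetical detectors for illustration only; real PII/PHI detection needs
# far broader coverage (names, addresses, medical record numbers, etc.).
PII_PATTERNS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def validate_batch(rows: Iterable[Dict[str, str]], allowed_columns: Set[str]) -> List[str]:
    """Return human-readable problems found in a batch (empty list means clean)."""
    problems: List[str] = []
    for i, row in enumerate(rows):
        # Flag columns that the data-sharing agreement did not cover.
        extra = set(row) - allowed_columns
        if extra:
            problems.append(f"row {i}: unexpected columns {sorted(extra)}")
        # Flag values that look like PII, whatever the column is called.
        for column, value in row.items():
            for label, pattern in PII_PATTERNS.items():
                if pattern.search(str(value)):
                    problems.append(f"row {i}: column '{column}' looks like {label}")
    return problems


if __name__ == "__main__":
    batch = [
        {"txn_id": "t1", "amount": "12.50"},
        {"txn_id": "t2", "amount": "8.00", "note": "contact jane@example.com"},
    ]
    for problem in validate_batch(batch, allowed_columns={"txn_id", "amount"}):
        print(problem)
```

Running the same function on both sides means the sender can block a bad batch before it leaves, and the receiver gets an independent check instead of trusting the sender’s.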
Eric Dodds 25:35
I love it. Okay, I have a question, and this is actually for you and for Kostas. I’m going to give you three scenarios of data sharing, and then I’ll pose the question. This is great, I love this; we don’t do this enough. So, scenario number one, and let’s file this under primarily infrastructure. Let’s say two companies have a partnership, or a company has a customer, and it needs to transfer data to that customer. It’s primarily an infrastructure question, a question of pipelines. Pardis, I think you said that well. Do we just give you credentials to our database? Hopefully not. So someone’s running pipelines, which means that I have to send data and you need to receive that data. That creates a host of infrastructure issues around scheduling, pipelines, differences in infrastructure, validation, and all that sort of stuff, right? But you’re my customer, so it’s really just a question of managing infrastructure: how do I get the data to you? In the middle, let’s talk about the concept of a clean room, which is where two companies have data that they want to share, but there’s PII or PHI. There’s some benefit to be gained from joining some aspects of the datasets, but it needs to happen in a neutral environment where it’s impossible for either side to get information that they shouldn’t have. Let’s call that the security category. So we have pure infrastructure: how do I get the data to you? And we have the security side: we need to share, but security is our top priority. 
And then the third scenario is what I’m going to call a transactional or contractual exchange. The example here is something I did a lot of, which is really crazy to think about, in the advertising space. There were certain audiences that these companies had that I wanted to advertise to, but they didn’t want to send them through a traditional cookie or data brokerage. So I literally just sent them a check and said, send me some screenshots or a CSV of the ad performance, the most primitive approach. Maybe we can call this one economic exchange: I’m exchanging money, and it just so happens that there’s data governing the terms of the transaction. Maybe I’m being long-winded. But what I want to ask you and Kostas is: which of those categories, infrastructure, security, or economics, is the primary underlying definition of this problem? Because, like you said early on, Pardis, there’s some sort of contract that governs data exchange, whether it’s official or not. So I guess another way to ask it is: are infrastructure and security ways to describe some sort of spoken or unspoken contract? Or is it primarily a security problem, or primarily an infrastructure problem?
Pardis Noorzad 29:31
I would say it’s primarily on the infrastructure side. You have a lot of tools right now, improving every day, that are creating stack-agnostic data transfer between any two sources, whether it’s batch data replication or streaming, and a lot of these problems are actively being worked on by a lot of great companies. I think the reason that in 2020, when I was working on this problem at my company, I still didn’t have a tool to help me was the credential management layer: because we’re leaving the boundaries of one company, it’s not just an infra problem anymore, it’s a problem of how to manage security for two parties. That was one. The other is maybe higher up the stack, on the application layer side: problems of validation, but also cost accounting. Because when you have two companies involved, who’s paying for egress? Who’s paying for the pipeline? Who’s paying for any compute that either party is going to incur through the transformations they want?
Eric Dodds 31:09
Like, the cost of the transaction?
Pardis Noorzad 31:12
Yes, the cost of the transaction. So there are all of these bits and pieces that need to be accounted for, and it’s not always clear who the motivated party is. That’s a complexity that’s added when there are two parties involved, or more. So you want something that will help you, you know, split the check, and give you some flexibility around who should pay. So yeah, it’s definitely a security problem on one side, and then some bits and pieces on the application side, around costs, data validation, and things like that.
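The “split the check” idea can be sketched as cost accounting over line items. This is a toy model under stated assumptions, not a description of any real billing system: each cost (egress, pipeline, transformation compute) is assumed to be known in cents, and the contract is assumed to assign a fixed fraction of each item to the sender. `CostItem` and `split_check` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CostItem:
    name: str            # hypothetical line item, e.g. "egress" or "transform_compute"
    cents: int           # total cost of this item, in cents
    sender_share: float  # fraction the sending party agreed to pay (0.0 to 1.0)


def split_check(items: List[CostItem]) -> Dict[str, int]:
    """Split each cost item between sender and receiver according to its agreed share."""
    totals = {"sender": 0, "receiver": 0}
    for item in items:
        if not 0.0 <= item.sender_share <= 1.0:
            raise ValueError(f"{item.name}: sender_share must be between 0 and 1")
        sender_part = round(item.cents * item.sender_share)
        totals["sender"] += sender_part
        # The receiver pays the remainder, so no cent is lost to rounding.
        totals["receiver"] += item.cents - sender_part
    return totals


if __name__ == "__main__":
    bill = [
        CostItem("egress", 1_000, sender_share=1.0),           # sender pays egress
        CostItem("pipeline", 3_000, sender_share=0.5),         # split the pipeline
        CostItem("transform_compute", 2_000, sender_share=0.0) # receiver's own transforms
    ]
    print(split_check(bill))
```

Keeping the share per line item, rather than one global ratio, reflects the point that the motivated party can differ for egress, pipeline, and transformation costs.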
Kostas Pardalis 32:06
Yeah. I totally agree with all that. I would probably add one more dimension to what both of you have said: I think the problem is primarily driven by market conditions, so it’s primarily an economics issue. What do I mean by that? Today is not the first time in history that we have to share data, right? We have literally been doing that since the inception of the internet. We had protocols; we had FTP, and then we made SFTP because we wanted it to be secure. Security was always an issue. And in the end, if you want to be super, super secure, you can always mail hard drives to the other person, right? But in terms of market conditions, the economics problem is that, as markets mature, the need to exchange data becomes important enough that figuring out that transaction becomes important. Today, for many reasons that we can talk about, and obviously AI is a catalyst, more companies need to share data with other parties as part of how they grow and the things that they’re doing. So it’s not a problem anymore where we can just be scrappy and everything is fine. I hear both of you saying that you’ve had this problem forever, but at the same time, it wasn’t an important enough problem for the whole market to try to find a solution, build businesses on top of it, and optimize it as much as possible. 
So what I find very fascinating about today is that we’ve reached the critical point where the market demands that our industry go out there, find the solutions, and turn them into actual, scalable products. So that’s my contribution to your question. I don’t know if I made you any wiser or less wise.
Eric Dodds 34:40
No, it’s interesting. It was certainly a bit of a loaded question, but it’s fascinating to think about the friction in transactions, right? It really crosses all three of those vectors. And I didn’t even think about the fact that we’re paying for compute: how much of that are we responsible for, and how much are you responsible for? Hence the splitting-the-check analogy. It’s a very interesting multi-dimensional problem, and it seems like transaction friction and the security piece are really driving a lot of it. Security breaches today are far more costly than they’ve ever been. And then, of course, AI is also driving a ton of it. So yeah, it’s super fascinating to be at a critical point for that. Thank you, that was very helpful. I feel much more educated.
Kostas Pardalis 35:47
You’re welcome. That’s why we are here, both me and Pardis, to educate you. All right, I have a question. All this time I’ve been hearing you talk about the problems you are excited about, excited to the point that you started a company. And I keep hearing about two things. One is the technical side, which has to do with how we move the data around, how we can have specific guarantees around that, security, and all the things we’ve already asked about. The other thing is that you talk a lot about what I would call the product, or the experience, right? That is driven by the needs of the business and of the people out there. You mentioned that when you’re making a contract, part of that contract covers the data, or there are requirements around the data that have to be met. All these things obviously have a technical dimension, because you have to be able to automate these processes, but they are primarily driven by what a user needs. Can you tell us a little bit more about that and describe what the experience looks like? Let’s say I am an AI company, and I need to go to Eric and get his marketing data, user behavior or marketing data. Forget about the technology. What does this interaction look like? How do we do it?
Pardis Noorzad 37:50
Definitely. So as an AI company, you’re probably offering some sort of modeling approach that helps, so that Eric doesn’t have to build it himself. As an example, given all my experience working in FinTech, one area that comes up when you start working directly with customers is fraud detection. Fraud detection is a super complex problem. It needs so much experience of, you know, human behavior for you to understand how to solve for fraud detection with ML and AI and things like that. Even what data to collect, and how to structure that data, needs a lot of domain expertise. And so there are companies that say, hey, I can build really great fraud detection models. If you’re, you know, a financial services company, you can just start using this model. So in this case, financial services companies are highly motivated to use or try out these kinds of AI platform companies, right? And so there’s an initial evaluation phase where, even before I sign a contract, I want to transfer some data and see how it works on, you know, the past two days of data. Are you able to use this tool to train a model that captures at least, you know, 80% or 90% of fraudulent activity? And then I can give you more data so you can get higher accuracy on that type of data, and things like that. So there’s evaluation. But then, the moment we decide, okay, this is actually working really well, it’s way better than something I can build in the amount of time that I have, I sign a contract. Now I want to send my data on a recurring basis for you all to build that model.
And it kind of depends on my volume of transactions and your needs, like how often you need to update the model, for us to come up with the cadence at which data should be sent. If I’m low volume, I might not even need the model to be trained more than once a day. But if I’m very high volume, and the amount of stuff that can happen in a day is just too much, I might need a higher cadence, or things like that. So it really depends on the type and the size of the business, in terms of how often they set the cadence of these pipelines. But essentially, I think that’s the workflow.
Kostas Pardalis 41:01
100%. Okay, what you said is interesting, because you also touched on another question that I have. So, okay, Eric gives me his data, and I train a model. And I have to somehow expose this model back to him, right? So there’s the input that I need from him, and there’s the output of my work, and the model is what I’m getting paid for, right? I don’t think it makes that much sense right now to get into the logistics around that, like how the model ends up back with Eric and what Eric does with it; let’s consider that a solved problem, right? But you mentioned, for example, how often these data transfers should happen, right? Which means it’s not a one-time process. It’s not like Eric will send me a dataset, I’ll train on it, and then we say goodbye, right? We have to iterate on these things. Which reminds me of what a pipeline usually does in data infrastructure. The difference that I see here is that when we talk about data infra, like pipelines, we are always talking about something in-house, right? It’s my data infrastructure, my pipelines, I run them. And that has a lot of implications, both in terms of governance and the technology itself, right? It’s a different thing to build a pipeline when I know exactly where it runs, even down to which data center it runs in. The software we would build for that is completely different compared to, oh, now I need a pipeline that is going to connect two entities wherever they are, without me controlling anything in between; it goes over the internet.
So from a technical perspective, what does it mean to establish these relationships between me and Eric, send the data at a regular pace, and make sure that we have specific guarantees around things like quality, both of the infrastructure and of the data itself?
Pardis Noorzad 43:23
Definitely. I can talk a little bit about the initial thing you mentioned, how to serve the model back. Usually, these companies just expose an API, so for every new data point, you just call the API with the trained model and evaluate that new data point, to say, like, whether it’s fraud or not, or things like that. And with respect to governance, I think something that’s really important to consider about a tool like General Folders and its customers is that it’s very similar to every other kind of cloud-based tool. So when, let’s say, we sign a contract with any cloud service provider, right, we have some control: okay, I want to be in this country, I want these particular regions, and things like that. And then we sign these contracts on the privacy side, the security side, and all that, and this would be very similar. So when we go into business with any of these customers, the two sides already have a business contract. That relationship already exists, where one side trusts the other side to manage their data and to follow whatever limitations or requirements or regulations that this business has. So they’ve already signed that type of contract together, and that’s where we go in and try to adhere to those same rules. There might be cases, though, where the two sides don’t have a contract signed; let’s say one side is trying to evaluate the other side’s product. But we have a contract signed with both sides, for example, to set up a third-party place, a trusted place where the two can collaborate. I’m that friend, you know. And we adhere to whatever requirements there may be: hey, I want to be in this country, in this region, on this cloud, or whatever. So it’s very much like all the other kinds of data tools, and we follow those same processes.
Kostas Pardalis 45:58
Makes sense. And what’s the difference between what you have in mind as the solution to this problem, compared to, let’s say, products like the data sharing capabilities that Snowflake has, right? They’re also trying, or they’ve already succeeded; I don’t know exactly how successful it is. But they have a kind of marketplace, right? Because in the end, what you’re describing reminds me a lot of, let’s say, a two-sided marketplace transaction where you are the broker in between, right? You are making it easy for these parties to transact. Obviously, in your case, it’s not like buying something on eBay, where it’s a physical object; it’s data, right? So how do you see these attempts to solve a similar problem? That’s the first question, and then I’ll follow up with another one. But let’s talk a little bit about the competition, let’s say.
Pardis Noorzad 47:09
Yeah, definitely. I think, with Snowflake, first of all, they’ve been very public about this particular business unit for data sharing and data marketplaces; they released their data, and it looks really good. For me, looking at that is actually kind of confidence building, because this is a company that knows what they’re doing. They know their customers. So for them to be public about this particular business unit and to keep investing in this side is definitely a good sign. In terms of data marketplaces, it’s certainly a one-to-many kind of relationship, which I think is very important in data collaboration in general. It’s not the main focus for us, though. Early on, we definitely want the one-to-one kind of connections; we really want to work on that type of experience first and see how that goes. That’s one thing. The other is the approach that companies like Snowflake take on the zero-ETL side, right? They’re saying, hey, don’t move data around, there’s no need to replicate data from one place to the other. If you’re both customers of Snowflake, we can provide you a view of the data where you have real-time access, you see the exact changes, there’s only one copy of the data, and it’s pretty efficient. From the perspective of our company, we believe there are all these other needs that are not captured by this particular approach. Although it’s super efficient and makes so much sense, and we should be doing it when the situation allows, we want to be flexible and offer other ways of doing that. So I would say that’s how we differentiate: being more flexible in terms of what is possible.
Kostas Pardalis 49:30
Yep, yep, 100%, makes sense. Is there a feature of these platforms that you find really interesting? And before that, to make it a little bit easier for you, I’ll give you an example. I always found it very fascinating how BigQuery has used the public datasets they have for marketing purposes, right? They expose, for example, the daily GitHub data, which is a huge dataset, right? And it’s an amazing way for someone to get exposed to BigQuery. I might be looking for something completely unrelated, just for the data, and in the end I hear about BigQuery. I’ve always found that very interesting, and I think this marketing activity that BigQuery does with data sharing has been successful, in a way. But okay, you’re much more into these products, so I’d love to hear from you about something that someone else has built that really excites you.
Pardis Noorzad 50:50
I’m definitely very excited about Snowflake. I used the product and bought it at my old company, and I really like it. I think it really helps the team quickly become very efficient and kind of independent. As a data team, it felt really good to independently manage our infra and move as fast as we needed to, while it was also possible to manage our costs; we had dashboards and monitoring and all of that to manage those. So, definitely excited about that. I’m trying to think of other products in the data space that I’m paying for. Did you want one in particular in the data collaboration space, or in general?
Kostas Pardalis 51:49
I think you answered my question, to be honest. And I’d like to close, because it’s my last question and then I have to give the microphone back to Eric, by asking the same question, but for the things that you are building: something that you are really captivated by. I mean, obviously you’re proud of everything that has been built, and I can relate to that myself. But if there’s something, like a feature, or even something that you’ve learned by interacting and trying to solve the problem, that has a special place in your heart or mind, let’s say, I’d love to hear that: something that was surprising in a very positive way for you through this journey of building and starting a company to solve this particular problem.
Pardis Noorzad 52:43
Sure. I mean, I can think of so many things. One of the things I would say is about people rather than product. Over this journey, ever since I started the company, it’s been so heartwarming to see that, even pre-product, I’ve been able to talk to so many people who give such good feedback and provide hope and support in so many different ways. It makes me feel so thankful to be part of this community and to have access to that. So that, I would say, is the number one super interesting and exciting thing that I’ve experienced.
Kostas Pardalis 53:41
Okay, you have to choose only one, so...
Pardis Noorzad 53:44
Yeah, definitely. The rest is...
Kostas Pardalis 53:47
That’s a way for us to make sure that you come back, because we have to hear them. Okay, so that’s all from my side. Eric, the microphone is yours again.
Eric Dodds 53:58
Yes. Well, we’re at the buzzer, as we say, but Pardis, I have a question. I was thinking about how many different contexts you’ve worked in within a similar problem space. You studied software engineering, you got into data science type stuff, the mathematics behind that. You did that in several forms at several different companies. Now, as a founder, you’re starting a company in a related problem space. Is there a lesson or a principle that you feel has served you really well throughout all of those contexts, or a piece of advice that you’ve returned to over the past decade or so that has been a theme for you throughout?
Pardis Noorzad 54:54
Um, I wish there was something that I knew all along, and there definitely wasn’t. There were so many ups and downs, and I’m sure there will be many more in my future. But something that I’ve learned, and that I want to take with me on this journey, is to put people first. No matter what happens on the business side or the product side, or if something doesn’t work out, there are ups and downs and all of that, but really put people and relationships first; that’s super important. And always try to pay attention to yourself as a human: exercise, eat well, go out with friends, and spend a lot of time with other people, because we need all of that to function well. So that’s how I think about managing myself and, as I’ve become a manager of more people, about ensuring that everyone has a well-rounded lifestyle, because it’s super important for the long run.
Eric Dodds 56:28
Yes, yes, very wise words. I think those will certainly serve you well as a founder. Well, Pardis, thank you so much for joining us on the show. It’s been a pleasure. There are so many things we didn’t cover, so we’d love to have you back sometime.
Pardis Noorzad 56:41
Thank you so much. This was so fun. I appreciate you having me on the show.
Eric Dodds 56:48
Man, Kostas, this episode with Pardis from General Folders really got me thinking about economics a lot. I know that sounds weird. But think about the idea of two businesses, or two entities, collaborating around data, whatever that looks like: one-way sharing, bidirectional, or whatever. It’s a huge infrastructure problem, right? The fragmentation makes it a nightmare for anyone who’s had to do that. But it’s really interesting: the more Pardis talked about it, the more you realize that the fragmentation on the transactional side is actually far worse than it is on the infrastructure side. The infrastructure is almost a proxy for the way that companies are trying to interact. So if you think about the Snowflake marketplace, it has an interesting economic layer on top of their ecosystem that gives them a lot of market power. But if you zoom out and think about the economics of that in an infrastructure-agnostic way, that’s pretty crazy. So I’m going to be thinking about that a whole lot. It’s such a compelling idea in general from Pardis. So that was my big takeaway.
Kostas Pardalis 58:13
Yeah, 100%. I totally agree with you. There are some very interesting economic implications in all the stuff that we discussed with Pardis, and that’s what I’m going to keep too, to be honest, because it is, let’s say, an extremely strong signal that data is becoming a kind of commodity; we will start transacting on top of data. I mean, we were doing it already, but it was more of, let’s say, a niche kind of market, right? Financial markets are known for transacting over data, because data is actually the moat, right? Probably something similar is happening already in the medical space. But what we actually see happening here is that we are entering a new era, in a way, where we’re going to see a lot of economic activity on top of data. It is accelerated by this whole AI thing that is happening, but that’s just a catalyst that makes things go faster. It’s not the enabler; it was there already. People predicted that it would come, and it seems that it’s coming faster than we thought. So that’s my takeaway, and I would encourage everyone to go and listen to the episode, because there are some very interesting insights around what the future will look like.
Eric Dodds 59:57
Yeah, we should do a Shop Talk on that. I hadn’t thought about this, but you bring up the FinTech industry transacting over data, right? And one thing that’s really interesting is that what makes that possible is that you assign value to data in a way that allows for transactions on a very wide scale, across a very large number of people in the market. And I didn’t even think about this being the beginning of an era where you have all this fragmented data across all this fragmented infrastructure across all these different companies, and you begin to see value being assigned to it even through the fragmentation. That’s crazy. Great Shop Talk topic; that’s going to be wild.
Kostas Pardalis 1:00:42
Yeah, 100%. And just to make something clear, it’s not just FinTech, but even traditional financial markets. If you go and look at the traders on Wall Street, all these HFT things, or the hedge funds, it’s all about data in the end. I had an interesting conversation about that with someone who came from there; he’s not doing that anymore. But he was saying that the strategies, the algorithms, they all become known at some point, so all that is left in the end that’s consistent is the data. All right. Yeah, I think we should definitely do a Shop Talk, and also maybe find people to have a panel around that stuff. But before that, everyone should listen to the episode, because there’s a lot of discussion around that with Pardis.
Eric Dodds 1:01:48
Yes. And I got to come clean on some past sins in terms of data collaboration with advertising data, which honestly made me feel really clean. So thank you for that opportunity.
Kostas Pardalis 1:02:02
Yes, you’re forgiven.
Eric Dodds 1:02:05
Thank you, father. That’s right. Well, thanks for listening to The Data Stack Show. Definitely listen to this episode; it’s a great one. Subscribe if you haven’t, tell a friend, and we’ll catch you in the next one. We hope you enjoyed this episode of The Data Stack Show. Be sure to subscribe on your favorite podcast app to get notified about new episodes every week. We’d also love your feedback. You can email me, Eric Dodds, at eric@datastackshow.com. That’s E-R-I-C at datastackshow.com. The show is brought to you by RudderStack, the CDP for developers. Learn how to build a CDP on your data warehouse at RudderStack.com.
To keep up to date with our future episodes, subscribe to our podcast on Apple, Spotify, Google, or the player of your choice.
Get a monthly newsletter from The Data Stack Show team with a TL;DR of the previous month’s shows, a sneak peek at upcoming episodes, and curated links from Eric, John, & show guests. Follow on our Substack below.