Episode 169:

Data Models: From Warehouse to Business Impact with Tasso Argyros of ActionIQ

December 20, 2023

This week on The Data Stack Show, Eric and Kostas chat with Tasso Argyros, the Founder & CEO of ActionIQ. During the episode, Tasso shares his journey from Stanford to founding ActionIQ. He discusses the evolution of database architecture, the challenges of handling customer data at scale, and the importance of making data accessible to business users. Tasso also discusses his experience selling to MySpace as his first customer and how ActionIQ bridges the gap between data engineering and marketing. He emphasizes the importance of customer interaction, understanding business user needs, creating a flexible, workload-agnostic infrastructure, the future of customer data platforms, and more.


Highlights from this week’s conversation include:

  • The Evolution of Databases and Data Systems (2:33)
  • Abstracting Data for Business Users (4:31)
  • Building a Database for Google-like Search (7:58)
  • The Big Data Explosion (11:10)
  • Selling to MySpace as First Customer (13:14)
  • Starting ActionIQ (16:57)
  • The Customer-Centric Organization (22:46)
  • Transitioning to Customer Data Focus (23:53)
  • Understanding Business Users’ Needs (28:30)
  • Supporting Arbitrary Queries and Data Models (34:42)
  • Unique Technical Perspective of Clickstream Data (37:01)
  • The Value per Terabyte of Data (46:45)
  • Building a Product for Multiple Personas (50:45)
  • Composability and Benefits (58:05)
  • Evolution of Storage and Compute (1:00:09)
  • Composability and Treasure Data (1:02:10)


The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.


Eric Dodds 00:05
Welcome to The Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You’ll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, the CDP for developers. You can learn more at RudderStack.com. We are here on The Data Stack Show with Tasso Argyros. Welcome to the show, Tasso. We’re so excited to have you.

Tasso Argyros 00:33
Great to be here. I’ve been looking forward to it. Thank you for having me.

Eric Dodds 00:38
All right, well, give us an abbreviated background. You’re the CEO of ActionIQ, but you’ve done lots of database stuff. So just give us a brief background.

Tasso Argyros 00:48
Yeah, so I’ve been a database guy more or less my whole professional life. I grew up in Greece, where I studied engineering, and then I came to the US. I started a PhD at Stanford to study databases and distributed systems. About a couple of years into that, I dropped out and started one of the first massively scalable database companies at the time. That was back in the 2000s, and it was called Aster Data. Aster was one of the first companies that could deploy very large databases on commodity hardware, so it was much lower cost to store and analyze big amounts of data. That was pre-Hadoop; it was around the same time that MapReduce was coming out. I sold that to Teradata, which is a big data warehouse company. At the time, it was the largest enterprise data warehouse company. I spent a few years there, and it was definitely a great school in databases and in the business of databases. Then I wanted to do something slightly different, so I left and started ActionIQ, which is a customer data platform. There’s definitely a bunch of database technology involved, but at the end of the day we have a UI, and our goal is to empower the business users along with the data engineers. Databases were a technical product, and ActionIQ serves a dual purpose, as I like to think about it. Which kind of brings us to today. The CDP space is a big, exciting market, and I’m sure we’ll talk about it in the show.

Kostas Pardalis 02:33
Yeah, 100%. And by the way, one of the things that really excites me about the conversation we are going to have today is this connection between data systems at scale and the business use case. I think you chose kind of an extreme use case here. From my experience, at least, when you are dealing with customer data at scale, the data platform you’re using and how you interact with it can be hard. But at the same time, you have one of the most demanding customers out there, which is marketing people, right? They have to use this, and they have to use it in a way that provably brings value to the company. So I’d love to get more into connecting the dots there: how data systems and their evolution led to supporting these kinds of use cases, and also how you solve this very hard product problem. It’s one thing to build a database with a terminal and SQL. It’s another thing to build something for someone who needs to slice and dice data for marketing campaigns. So that’s something I’m really excited about. What about you? What’s on your mind? What would you love to get deeper into today?

Tasso Argyros 04:06
So I think what you say is spot on, right? With ActionIQ, we had our work cut out for us. First of all, for the business user, you need to abstract things enough that they can do stuff without understanding all the underlying data. They shouldn’t need to know SQL, and they shouldn’t need to know what every table column means to do their work. So you need to abstract things enough for the business user to do the work, but not so much that they can’t really do much anymore, because you can abstract things to the point of elimination. The other thing I think is interesting is that it’s not just the business users. We have the business user persona, and we also have the data engineer as an end user. With a database, your users are engineers, everybody knows at least SQL, and people understand data structures and what the data means. In our world, some of our users do, but some of our users don’t. So you also have this mix of users. It was definitely an interesting problem, which is kind of what I was looking for. Beyond that, I think it’s interesting to think about how the CDP and database worlds have been intertwined. Some of the latest trends in the CDP world, like composability, were enabled and created because of how cloud databases have evolved in the past few years. So I think database architecture evolution and CDP evolution go hand in hand, even though they’re separate spaces, and it would be very interesting to talk about that. And what is a CDP, right? That is still a question where hours of debate can take place, along with what it means to be composable. So all this stuff is fascinating to discuss.

Kostas Pardalis 06:06
Yep. That’s super interesting. I can’t wait to get into the details here.

Eric Dodds 06:09
Yeah, let’s dig in.

Kostas Pardalis 06:11
Let’s do it.

Eric Dodds 06:14
Okay, so let’s start where your story begins at Stanford, and then kind of go from there, because you sort of wound your way through databases, and then sort of ended up at the business user. Can you just trace that path for us a little bit?

Tasso Argyros 06:30
Yeah. So I landed at Stanford, and it was really such a fascinating time for me. I got into the PhD program in computer science, which is obviously a very highly esteemed program that so many great people have come out of. Before Stanford, I had done some research in data mining, and my intention was to go study databases. But what happened was I ended up meeting Professor David Cheriton, this really brilliant Canadian professor and researcher. David wrote the first check into Google, so he gave them the first seed money. I think he ended up owning 1% of Google or something like that, which is pretty good. I don’t know if he still has it or not.

Eric Dodds 07:20
Pretty good is probably an understatement.

Tasso Argyros 07:23
Well, yeah. That was a good ROI for a seed investment. He did it together with a couple of others, like Rajeev Motwani, who unfortunately passed away. And if you recall, at the time, Google had implemented search using commodity boxes. AltaVista before them was using these big mainframes, which were very expensive. Google would take these cheap boxes and deploy search on them. So David came to me and he was like, “Hey, you’re a database guy. Could you build a database the way Google built its search?” That was the initial problem statement, as I would put it. Then I met up with a couple of other students who were looking at the same problem from different perspectives; there was some peer-to-peer database research at the time that was relevant. And so we started Aster Data out of Stanford. My advisor put in some money, and we had angel investors. There was no formal seed investment back then; it was individuals who would do that. And we did end up developing a database where, essentially, storage and compute were together in commodity boxes. We would buy Dell or HP servers, and many of them, hundreds of them. Our first customer was MySpace, which at the time was as big as Facebook is today. And we would deploy at massive scale, initially for MySpace’s customer data. And what’s interesting about our approach...

Eric Dodds 09:07
Can you talk just very briefly about what MySpace was doing before, and what they were migrating onto Aster? What specific workloads? Because obviously they didn’t do everything at once, or I would guess they didn’t.

Tasso Argyros 09:22
Yeah. So essentially, MySpace was a Microsoft shop, so they were using SQL Server at the time. What SQL Server couldn’t do was all the clickstream data. You could do the profile data in SQL Server, and you could do operational analytics on that. But it was the event data that was massive in scale, because the MySpace users were all over the place, and that was something they couldn’t handle. So they used us initially, specifically for that: any event or behavioral information was in Aster, and the profile, the more static information about the customers, was in SQL Server. Over time, they expanded the usage and did more and more. I remember MySpace had one of the first revenue-sharing agreements for music, so all their royalties would be computed through Aster Data, because how much of a song you listened to had to do with how much money they would pay to the labels. That was also based on event data, so it had to be computed in our systems. It was very stressful, by the way, to be running queries that determined what were huge amounts of money at the time. What’s interesting, I think, about that architecture is that before Aster did for databases what Google did with search, if you had a large-scale database, you had to buy a mainframe. You would buy a multi-CPU server from IBM, you would buy a disk array from HP, and you would have to spend 10 million bucks on the hardware just to get started, just to build a 10-terabyte data warehouse. The whole idea with Aster was: we bring storage and processing together, you partition the data, and you partition the workload. Today that’s obvious, but at the time that was a very new approach, believe it or not, and we were one of the very first teams to do it as a product.
And so that kind of led us to when big data exploded. In 2010 there was this whole explosion, and big data was everywhere. It was on the cover of The Economist. Very quickly after that, the more legacy database vendors like Teradata saw the big interest, and that’s when they acquired us. Subsequently, there were a couple more iterations of database architecture. There was Hadoop and the whole NoSQL movement, which didn’t go very far. Then it came back to SQL, but in the cloud, which resulted in what we know today as the Snowflake and Databricks type of architecture, which, ironically, separated processing and data again. So we went from data and processing being separate in the mainframe and disk array world, to coming together in the MPP database world and Hadoop, and then they got separated again in the cloud, just because network interconnects became so efficient that you could actually afford to do what you couldn’t do before. That was the quick story on Aster.
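To make the “partition the data, partition the workload” idea concrete, here is a minimal sketch in Python. It is not Aster’s implementation; the function names, the CRC32 hashing choice, and the sample events are all assumptions made for illustration. Event rows are hash-partitioned by user ID across a few simulated nodes, each node aggregates only its own slice, and a coordinator merges the partial results.

```python
import zlib
from collections import Counter

# Hypothetical sample of clickstream events: (user_id, action).
EVENTS = [
    ("alice", "play"),
    ("bob", "play"),
    ("alice", "skip"),
    ("carol", "play"),
    ("alice", "play"),
]


def node_for(user_id: str, num_nodes: int) -> int:
    """Pick a node with a stable hash so one user's rows always land together."""
    return zlib.crc32(user_id.encode("utf-8")) % num_nodes


def partition_events(events, num_nodes):
    """Partition the data: each simulated node stores a disjoint slice of rows."""
    nodes = [[] for _ in range(num_nodes)]
    for user_id, action in events:
        nodes[node_for(user_id, num_nodes)].append((user_id, action))
    return nodes


def local_count(rows):
    """Partition the workload: a node aggregates only the rows it stores."""
    return Counter(user_id for user_id, _ in rows)


def events_per_user(events, num_nodes=4):
    """Coordinator: run the local step on every node, then merge the partials."""
    total = Counter()
    for rows in partition_events(events, num_nodes):
        # On a real cluster these local steps would run in parallel.
        total.update(local_count(rows))
    return dict(total)


if __name__ == "__main__":
    print(events_per_user(EVENTS, num_nodes=3))
```

Because all rows for a given user live on one node, a per-user aggregate needs no data movement between nodes before the final merge, which is the property that lets this style of system scale out on commodity servers.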

Eric Dodds 12:35
Okay, I have a ton of questions there. And then, of course, I want to ask about ActionIQ. But just one quick one, if you will indulge me. Can you talk about selling to MySpace as your first customer? That was some time ago in the tech world, although actually not that long ago. It would be kind of like a database startup today saying Facebook is our first customer, which is a really big sale. So can you give us that story for the entrepreneurs in the audience? Because I need to know that.

Tasso Argyros 13:13
Yeah. First of all, it was a huge deal. I think the deal itself was 10x the money we had raised at the time, just to give you a sense. I think it was almost like seven days or something crazy. The way it happened was that Ron Conway was one of my seed investors, and he connected me to Adam Bain, who ended up being the COO of Twitter later on. Adam Bain at the time was running Fox Interactive, and Fox had just bought MySpace. So I was very fortunate to be dealing with a very entrepreneurial person. For those of you who know Adam, he is a super smart, super entrepreneurial guy. He knew he had to scale MySpace. He was very growth-minded, very ambitious, and for MySpace to scale, MySpace’s data infrastructure had to scale. So you had a very technically aware, ambitious business owner. Adam was the person we interacted with, through Ron Conway. And the reality is that, at the time, Adam and MySpace didn’t have that many options. Their options were SQL Server on one side or Teradata on the other. SQL Server couldn’t scale, and Teradata, for the amount of data we’re talking about, would probably have cost, if I had to guess, $100 million. That’s the amount of money we’re talking about. So they had to do something, and we were right in between in terms of cost and what we could handle. Again, they didn’t have many options, which takes us back to what I think is true every time you close a big deal: the reason a massive deal happens is that the customer absolutely needs what you’re selling, it’s vital for them, and there’s no alternative that comes close. That transaction met both criteria.
It was critical that MySpace could scale their data operations, and there was no alternative at the time. And it paid off for them. We did a lot of what we promised we would do, and I’m not sure what they would have done otherwise at that time. Later on, 10 years later, there were a lot more options. So that’s how this whole thing came around. But to be very honest with you, I was out of school, I was 24 years old, and if I had the experience I have today, I would never have asked for so much money. That was crazy. I was completely inexperienced, and the amount I asked him for at the time was completely out there.

Eric Dodds 16:05
Well, like a true technical founder. You sound simultaneously like someone who has a deep grasp of the combination and separation of storage and compute, and of how to make an enterprise sale.

Tasso Argyros 16:22
You have to, right? You have to learn this stuff. And our approach to pricing was very rational. We ran the math, and we were like, all right, that’s a reasonable price. But if you just looked at the price, you would get scared. The math was correct in the end, though.

Eric Dodds 16:38
Yeah, I love hearing the phrase “the math is correct.” Okay, so tell us about ActionIQ. We’ll go back a little bit afterward, because I want to talk about databases in different flavors, since you’ve had an interesting journey. But what is ActionIQ, and why did you decide to start it after working in databases?

Tasso Argyros 16:56
Yeah. So I think by now it’s probably obvious that Aster was a generic database, so we had a lot of use cases. Beyond the MySpace use case, we were working with healthcare companies, financial companies, a lot of big banks globally, telcos. But we ended up getting used for a lot of customer data, because at the time it was the event data that couldn’t be processed by the traditional databases. So, almost by accident, a lot of our use cases were around customer data. And subsequently, at Teradata, I was fascinated by the vast amounts of customer data that would live in the IT systems: Aster Data, Teradata, whatever your data warehouse was. Massive amounts of customer data. And then you look at what happens with the business, which is where the value of data is supposed to be created. At the end of the day, you don’t store customer data for the sake of storing it; you store it to power business use cases or product use cases. If you looked at the business systems, they could store 1%, 0.1%, 0.01% of that customer data at best. The reason was this bifurcation. You could either buy a product for engineers that scales, like Aster, or you would buy something like an email tool for the business that has almost no data infrastructure behind it. And there’s this huge gap in between. So I started thinking a lot about that gap. In my experience at Aster and Teradata, we would often put a lot of customer data in those databases, and we would succeed in doing what we said we would do, but unless the business got direct access to it, the value wouldn’t be there. I used to joke that the operation was successful, but the patient died.
The data got into the lake, but the value was not there, because the people who were supposed to create the value weren’t technical. They didn’t know SQL, so there had to be people in between, things were very slow, the systems were not connected, et cetera. So I got fascinated by this problem of how you bridge not just the systems but the two worlds, because we’re talking about different cultures. The data engineering culture, which I was part of, is one culture. And then, let’s say, the marketing culture is a completely different culture that values different things and speaks a different language. The intersection of these two worlds was fascinating to me, and I decided to start a company to solve this problem. When I was starting the company, I wasn’t sure exactly what it would look like. But I knew it would have to scale with data as much as Aster Data did, and it would have to have a UI that you wouldn’t have to be a data engineer to use. Those were my two criteria when I started the company, and that’s how ActionIQ was created.

Eric Dodds 20:00
Okay, and just give us the pitch on ActionIQ. What does it do? Obviously, it’s a UI on some sort of database. But why do people buy ActionIQ?

Tasso Argyros 20:11
There are three things. First, we connect data on the data side. We can plug into a cloud data warehouse or data lakes, or multiple of them, so we can do data federation. We used to bring the data over, but now we push the queries down with a composable model, which we can discuss. On the application side, we connect multiple applications. Theoretically, every business system you have that’s touching the customer should be connected to ActionIQ.

Eric Dodds 20:42
Email, CRM...

Tasso Argyros 20:45
Email, CRM, web personalization, call center, direct mail for retail, decisioning systems, next-best-action systems, product. Because the product is customer-facing, right? So there’s really a very long tail. An enterprise probably has 100 to 200 of these systems, at least.

Eric Dodds 21:12
And those are integrations that you support?

Tasso Argyros 21:16
Those are integrations we support, which can be push or pull. And then there’s the interface itself. Part of the interface is for the business user, and part is for data engineering. For the business users, what we want is for them to get access to an abstracted version of the data and be able to say: who do I want to target, why, and what do I want to do with these people, through what channel? And to be able to deploy a new experience. The marketing person may call it a campaign, but you can go beyond marketing with this. You can deploy something new and experiment, running your test with customers, without having to write SQL, without having to know what this column means or which table it is in the data warehouse, none of that stuff. So we offer self-service that you didn’t have before. We offer agility, so you can do things in a day that would have taken you a month before when it comes to creating this new experience. And we offer orchestration, which simply means email doesn’t do its own thing while the web does its own thing. Now you have something to coordinate what this one customer sees through any channel they may happen to interact with.

Eric Dodds 22:27
Yeah, I kind of think about that as, you know, marketing is kind of like a DAG anyway, right? It’s just that the nodes are different tools that are emitting something out of an API. But whatever, that’s my nerdiness.

Tasso Argyros 22:46
And what’s also interesting is that we say marketing, and I say marketing a lot as well, but marketing in many places has become kind of the ambassador of the customer. If you think about it, much of what we’re talking about is not marketing. For instance, we have a lot of B2B customers and technology customers; Atlassian is a good example. A lot of how you interact with your users there almost looks like customer success, which is not really marketing, but it is a user interaction. Most companies are not organized around the customer or the user. They’re organized around functions, and the more functionally organized they are, the more marketing ends up taking the lead in many places and saying, how do we align around this one customer? But it really becomes a very cross-functional thing, because most functions, if you think about product, marketing, customer success, and support, are touching the customer. In theory, the experience should be one, or if it’s not one, it should be tightly coordinated and orchestrated in some way.

Eric Dodds 23:53
Yeah, makes total sense. Okay, so I want to dig into databases a little bit here. When you built Aster, you built what you called a multipurpose, or let’s say workload-agnostic, piece of infrastructure. You could use it in healthcare, and it ended up being used heavily for clickstream data, just because the infrastructure around more traditional SQL-based databases wasn’t optimized for that. That’s a pretty different problem to solve, at least from my perception, than building a database that’s geared toward essentially driving customer experiences. When you think about workload-agnostic, you need to think about optimizations on a more general level: figuring out particular data types, handling a large variety of data types, lots of edge cases. When you built ActionIQ, what was that transition like? Because now you’re really focusing on customer data, so you can be much more opinionated. Can you talk about how you approached that?

Tasso Argyros 25:23
Yeah. So this is really something that, on the database side, you struggle with all the time, because you have to support the long tail of use cases. You get into all this esoteric functionality that a small part of your use cases needs, and you have to support it, and every use case has to be almost an equal citizen. That becomes very difficult. With ActionIQ, from the day we started, and both my co-founder and I had database backgrounds, the feeling I would describe is relief: we could say, screw that 90%, how can we further optimize for this? It was truly a relief. I can give you a few very basic examples. We support arbitrary data models, but at the end of the day, you have some customer identifier and you have some event timestamps. Even simple things like that allow you to make decisions that optimize performance and optimize storage in ways you could never do in a general database, because it would break a lot of things. To give you another example, we do a lot of segmentation-like operations in the UI. Segmentation, it turns out, is a left outer join in SQL at the end of the day. And guess what: we have a really fast, optimized left outer join. That is one of a hundred different types of joins that a database has to support. But in our case, we knew that was what we would be using more than anything else, so we could optimize it on day one. I knew we could do all these things in the past, but we wouldn’t do them, because you had to support all these other things. And it’s really a blessing and a curse, right?
I mean, the reason Databricks and Snowflake have these huge valuations is that they can support all these use cases, but then the technical complexity that gets created is kind of unnerving. So I would say we were careful not to put too many constraints on the ActionIQ data, the queries, and the whole way we structured the system. But even with very basic constraints, we were able to do this, because we knew we were dealing with customer data that has some very basic properties, and that allowed us to do things we would never have done in a database. So that’s the database side. For me personally, that was the easy part, because that was my expertise. The difficult part was understanding how the business users wanted to use the system. That was a learning curve. What we did there was this: our first customer was an e-commerce company in New York, Gilt Groupe, that used to be an Aster customer. They had a great, very technical, very sophisticated team at the time, very high-flying. The whole space kind of fizzled away, but there wasn’t anything they could do about that. For the first year of our life as ActionIQ, we got to sit next to Gilt Groupe’s marketing team, so our actual physical desks were right next to our customers for a whole year. I could see them from my desk using our product. We would talk to them, we would have lunch with them, and they would tell us what they were doing and how they were doing it. A lot of these people came from the financial services sector, so they brought a lot of very mature best practices for CRM. And one of our very first hires was a UX designer, because, again, we’re database people. We’re infrastructure engineers.
Our expertise was writing code in SQL and C++, and in Scala, which is what we used to build the product. I wrote code by hand for the first year, and it was fun for me to get back to it for a little bit. But we knew that UX was another thing. So we made a UX hire super early, and we ended up hiring great people. We sat next to our business users very early. We forced ourselves to get close to where I felt we were the weakest, because ActionIQ is all about bringing data infrastructure and business applications together in one company. And again, this is not about the technology; it’s about the culture and the mentality. That’s why I started the company. It was fascinating to me to do something like that. But you had to force yourself to get uncomfortable early on to bring these two things together.
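As an illustration of the point that segmentation boils down to a left outer join, here is a minimal sketch using Python’s built-in sqlite3 module. The table names, columns, and sample rows are hypothetical and are not ActionIQ’s schema. The query builds the segment “customers who have no event of a given type”: it left-joins customers to matching events and keeps the customers for whom the join found nothing.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a customer table and an event table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE events (customer_id INTEGER, event_type TEXT, event_ts TEXT);
        INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cy');
        INSERT INTO events VALUES
            (1, 'purchase',  '2023-12-01'),
            (1, 'page_view', '2023-12-02'),
            (2, 'page_view', '2023-12-03');
        """
    )
    return conn


def customers_without_event(conn, event_type):
    """Segment: customers with no event of the given type.

    The LEFT OUTER JOIN keeps every customer; customers with no matching
    event come back with NULL event columns, and the WHERE clause keeps
    exactly those rows.
    """
    rows = conn.execute(
        """
        SELECT c.name
        FROM customers AS c
        LEFT OUTER JOIN events AS e
          ON e.customer_id = c.customer_id AND e.event_type = ?
        WHERE e.customer_id IS NULL
        ORDER BY c.name
        """,
        (event_type,),
    ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    conn = build_demo_db()
    print(customers_without_event(conn, "purchase"))
```

With the sample rows above, only Ana has a purchase event, so the “no purchase” segment contains Ben and Cy. Both tables are keyed by a customer identifier, which is the kind of basic property the discussion says a customer-data system can rely on when optimizing this one join shape.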

Eric Dodds 30:20
Yeah, I love it. What a great story. I think that’s actually just sage advice in general, having a desk next to your first customer. I was worried that you were going to say your first customer at ActionIQ was MySpace.

Tasso Argyros 30:45
Yeah, no. We needed a local customer to do that, right? So I tried really hard to find a customer that was local to us in New York City.

Eric Dodds 30:56
Okay, so I know Kostas has a ton of questions, but one more question from me, maybe two. Can we talk about the data model? You said that you worked really hard not to put constraints on the queries that ActionIQ is able to execute from a segmentation standpoint. I get that in theory. The world kind of runs off of the Salesforce model, which is lead, contact, account, that basic data model. And at the end of the day, there’s an end user, whatever their relationship is within the hierarchy of other entities in your business, and you’re sending a message or an advertisement to a particular user or group of users. How do you deal with that? One thing we used to joke about all the time is: have you ever seen a Salesforce instance that is not a Frankenstein? No one’s ever seen that. And the reason is that the data model they have does not actually afford the flexibility to represent, let’s say, a business model or data model like Gilt Groupe’s. It’s really hard to represent that in the rigid Salesforce data model. How do you approach that from the database standpoint? You want flexibility, but you also need some sort of underlying data model that allows the UI to create a sense of logic and predictability for an end user. How do you reconcile those?

Tasso Argyros 32:36
Yeah, so first, maybe, for some context, right? To talk a little bit about where our users and customers started: it was a retail company, right? Ecommerce, but retail. So we started in retail, but since then we’ve expanded and, you know, we do all kinds of different B2C. So we do a lot of media, right? Folks like News Corp, Washington Post, Sony. We do a lot of financial services. And we do a lot of B2B, right? We do a lot of combinations of B2B and B2C, folks like Dell, HP, and others. All big enterprises, right, B2C to B2B. When I started ActionIQ, I had, again, all my experience in the data space, right? And my observation was that setting up the data model and the data pipelines sucked. It was work, things took a lot of time, right? And it would get complicated. So my first criterion for ActionIQ was that I wanted to be able to reuse the exact same model that existed in the data warehouse. Which, by the way, now, in the time of composability, is obvious. At the time, I think we were like, sure, five to ten years ahead of our time, right, when we said that, but Oh,

Eric Dodds 33:57
no question. Like, marketers weren’t thinking about the data warehouse 10 years ago. Right, and many still aren’t today, actually. But that’s probably another topic. Yeah. And

Tasso Argyros 34:07
given vendors, right. I mean, if you look at the big vendors, right, every vendor has its own data model. And they expect someone to take the data from wherever it is, maybe the data warehouse, and ETL and load it into their own data model. But when we started ActionIQ, you know, like eight years ago now, we said, we have to be able to reuse the same data model. We can maybe augment it, or we can put metadata on the data model, right? But we want to use exactly the same data model as the original data warehouse, because again, my goal was not to build a new data mart with customer data. My goal was to take all the data that exists on the IT side and make it accessible by the business side. So for that, right, the approach we took early on was to say we’re going to support whatever data model is there, and we’re gonna allow the users to tag the data model to tell us what is an identity, what is an event, what’s the timestamp, and also what are the join graphs, right, in the data. We would lift the data model from the data warehouse. Essentially, it’s more like caching the data in ActionIQ rather than loading and transforming the data; we would keep a cache of the data within ActionIQ. Interesting. Yeah. And then you could tag the data model on top of it. But we had to be able to support it. That actually led us to implement our back-end database as an in-memory database, because we had to support arbitrary queries and arbitrary models with interactive response times, right, which is pretty hard to do, generally speaking. But essentially it was a lift, right? Like lift it and move it versus transform it. And so since then, right, we have expanded that, and I can talk about how that ties to the UI and everything else if you’re interested. But now, for example, with the B2B use cases, right, which tend to have more complex identity, as you were saying, where we supported one identity, now we can support a hierarchy of identities, right?
So you can have a user that’s part of an account, that’s part of, you know, a big client, whatever that may be, right? So we expanded support for the concept of more hierarchical identities, and a whole bunch of other stuff. But the fundamental principle, that one requirement, right, was for ourselves to be very open about what kind of data models we support.
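
Editor’s note: the tagging approach described above can be pictured with a small sketch. This is an illustration under stated assumptions, not ActionIQ’s implementation: the class names, table names, and column names are all hypothetical. It shows metadata being layered on existing warehouse tables (which table is an identity, which is an event, where the timestamp lives, how tables join) and a hierarchy of identities being walked from user to account.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class TableTag:
    """Metadata layered on an existing warehouse table (hypothetical shape)."""
    table: str
    role: str                               # "identity" or "event"
    key_column: str                         # column that identifies the entity
    timestamp_column: Optional[str] = None  # set for event tables
    parent: Optional[str] = None            # parent identity table, if any


@dataclass
class TaggedModel:
    """Tags plus join edges; the warehouse tables themselves are not changed."""
    tags: Dict[str, TableTag] = field(default_factory=dict)
    joins: List[Tuple[str, str, str]] = field(default_factory=list)

    def tag(self, entry: TableTag) -> None:
        self.tags[entry.table] = entry

    def join(self, left: str, right: str, key: str) -> None:
        self.joins.append((left, right, key))

    def identity_chain(self, table: str) -> List[str]:
        """Walk from one identity table up through its parents."""
        chain = []
        current: Optional[str] = table
        while current is not None:
            chain.append(current)
            current = self.tags[current].parent
        return chain


# All table and column names below are made up for illustration.
model = TaggedModel()
model.tag(TableTag("accounts", "identity", "account_id"))
model.tag(TableTag("users", "identity", "user_id", parent="accounts"))
model.tag(TableTag("page_views", "event", "user_id", timestamp_column="viewed_at"))
model.join("page_views", "users", "user_id")
model.join("users", "accounts", "account_id")

print(model.identity_chain("users"))  # ['users', 'accounts']
```

The point of the design is that nothing is reshaped: the tags describe the model that is already in the warehouse, so a new identity level is one more tag with a `parent`, not a new pipeline.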

Eric Dodds 36:36
Fascinating. I mean, I have 100 more questions, and supporting arbitrary queries makes the whole thing make a lot more sense. And having a caching layer makes the whole thing make a lot more sense. Because traditional SaaS vendors force you to basically input your data into pre-existing queries that they run already. Okay, I will stop. Kostas, please jump in. I’m going to hand the mic to you.

Kostas Pardalis 37:01
Yeah. Thank you, Eric. So Tasso, you mentioned that you started seeing these use cases of clickstream data, or user event data, since your time at Aster Data, right? I want to ask you, first of all, what’s unique about this data from a technical perspective? What makes it so challenging, or maybe not that challenging, we’ll see, to accommodate, let’s say, the processing of this data at scale in traditional databases? And how have these things changed through time? Because a lot of progress has happened by today. But tell us a little bit about that, because my feeling is that it’s also pretty unique, in some ways, the type of data that you have to work with. That’s right. And it challenges the systems themselves. So tell us a little bit more about that.

Tasso Argyros 37:59
Yes. So from an Aster perspective, first of all, I’ll answer your question very quickly. But just to state maybe the obvious, the reason we worked with clickstream data was a business reason, primarily, which is that the dollar per terabyte value of the data was low. Right? So if you’re a bank, and you have data about your customers’ accounts and their balances, that data is worth a ton of money. The value of clicks is very low in comparison: you don’t even know if it’s valuable at all until you analyze it, right? And maybe it’s not. So you can almost draw a two-by-two: large scale with low dollar per terabyte. That’s the data we dealt with, right? You had 100 terabytes of low-value data that would come to us because we could support the cost structure, right, to make it economical. But to answer your question, I think what people don’t realize is that clickstream data is time series data. Yeah. And that’s where a lot of the complexity comes from, right? So a lot of what people were interested in doing with Aster was not just searching the data, but saying, okay, what is the sequence of events that leads to something good or bad, right? So, I remember, right, we had a big grocery chain as a customer that was trying to figure out, what are the gateway products? What’s the kind of product that, if a customer buys it, where they used to buy only, you know, groceries from me, now they’re buying all their meat or whatever, right? So what are the paths in that clickstream data that lead to positive or negative outcomes? And that’s time series data. Now SQL is a really bad language for time series data, because SQL essentially is a way to model set theory. It’s very good with sets, intersections, unions. That’s what SQL is at the end of the day.
But time series is not that, right? So we ended up building a lot of custom functions, back in the Aster days, right, and it was still SQL-like, which would allow our users to do time series queries on top of clickstream data. So the way we would organize the data, store the data, partition the data, and the way we expanded SQL to support time series queries: there were a lot of innovations we did in addition to the basic architecture, right? But that can get very complicated. Because unless you know exactly what you’re doing from an implementation perspective, you know, if you try to do time series analysis with basic SQL, it’s just extremely slow. You really have to move a ton of data around the system for it to work. So that is one difference. Yeah. Okay.

Kostas Pardalis 40:44
That’s awesome. By the way, a question now, maybe a naive question. With time series data, right, in the industry we pretty much have a dedicated type of database for time series data, right? Especially for things in the observability space, right? Because all these things in the end look like time series data. I do have my own opinion about what the difference is with customer data, but I’d like to hear yours. Why not go and use one of these solutions, right? Technically, at least, they are supposed to work well for data that is time series, and also, as you said, data that has a very low, let’s say, value per terabyte. It could be data from a data center, like, okay, whatever. Yeah, it’s interesting when things start breaking, but until then, it’s just a lot of noise. Right? So, yeah,

Tasso Argyros 41:44
Well, back then, first of all, most of these systems didn’t exist, right? Like, when Aster existed, there were no time series databases. I mean, I shouldn’t say that. Let’s say they weren’t popular, right? Or it wasn’t something we were aware of at the time, or were looking at. I think today you have the option of using that for some of this. But also, the customer queries are a little bit different. So I’ll give you a very concrete example. And I know we have a technical audience, right, so just to go for one minute a little bit more technical. One of the things we created at Aster was a function called nPath: you would give it a regular expression of events. It could be A, B star, C. And we would map that to the clickstream data. And you could define what A, B, and C are, right? So it could be: you enter the website on this page, then you do zero or more of these things, and then you end up in this checkout thing. We would map this regular expression onto the time series data to help you find patterns across your customers. That’s not what time series databases do, right? Time series databases, for the most part, are concerned with calculating aggregates and other metrics, right, on top of the data. Here, we’re looking for behavioral patterns that potentially span weeks of data. So I would argue, even today, it’s probably a different problem. But at the time, there was not even the option of time series databases to be considered.
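
Editor’s note: the “A, B star, C” idea can be sketched in a few lines of Python. This is not the Aster function or its syntax; it is a minimal illustration that assumes each event type can be encoded as a single character, so that an ordinary regular expression can be run over each user’s time-ordered event string. The event names, the `SYMBOLS` mapping, and `users_matching` are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical mapping from event names to one-character symbols.
SYMBOLS = {"landing": "a", "browse": "b", "checkout": "c"}


def users_matching(events, pattern):
    """Return the users whose time-ordered event sequence matches `pattern`.

    `events` is an iterable of (user_id, timestamp, event_name) tuples.
    """
    by_user = defaultdict(list)
    for user_id, timestamp, name in events:
        by_user[user_id].append((timestamp, name))

    compiled = re.compile(pattern)
    matched = []
    for user_id, rows in by_user.items():
        rows.sort()  # order each user's events by timestamp
        # Unknown events become "x" so they cannot satisfy a, b, or c.
        encoded = "".join(SYMBOLS.get(name, "x") for _, name in rows)
        if compiled.search(encoded):
            matched.append(user_id)
    return sorted(matched)


events = [
    ("u1", 1, "landing"), ("u1", 2, "browse"), ("u1", 3, "browse"), ("u1", 4, "checkout"),
    ("u2", 1, "landing"), ("u2", 2, "browse"),
    ("u3", 1, "landing"), ("u3", 2, "checkout"),
]

# "ab*c": a landing, zero or more browses, then a checkout.
print(users_matching(events, "ab*c"))  # ['u1', 'u3']
```

Note what the query asks: not an aggregate over a time window, but whether an ordered path exists in one user’s history. That is the difference from metric-style time series workloads drawn in the conversation.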

Kostas Pardalis 43:15
Yeah, 100%, I totally agree with you. Actually, I think user event data and clickstream data have this very unique characteristic of being like a time series, but there are quite a few dimensions to each data point. That’s right. That’s right. That makes the problem quite different from, let’s say, calculating CPU usage, right? That’s a completely different type. Yeah, when you’re collecting, I don’t know, CPU signals from a data center, you probably have orders of magnitude more data. But the dimensionality of the data is much lower. And that makes a huge difference, right?

Tasso Argyros 43:55
That’s exactly right. And I think subtle differences in the actual order of events or sequence of events matter with behavior, right? So again, I think in observability a lot of it is about the aggregate metrics, or when you hit a certain threshold, and this and that. With customer behavior, right, you’re looking at why people are dropping off. Right? It’s a why, right? It’s not a what or a when, it’s why people drop off my website at a certain point, for example. And, for example, a big part of this is how you visualize the data. Yep. To give people the opportunity to notice patterns and figure out what questions they should be asking there. Right. So you have all these, you know, fancy diagrams, right? Actually, again, we didn’t do much of that, but we had implemented some light UI on top of it. Or to give you another example, one of the interesting use cases of Aster was People You May Know on LinkedIn. So LinkedIn was an early customer. The lead data scientist at the time was a brilliant guy called Jonathan Goldman. He worked under DJ Patil, who then became the chief data scientist of the United States later on. And Jonathan used us, you know, this technology, to create the first version of People You May Know, right, which now is ubiquitous. This has nothing to do with metrics, right? You’re trying to see how people connect with each other, and what the graph says about who you may or may not know. That’s not typically what time series databases would be concerned with. But it is an event-based or network-based problem, for example. So yeah, we had some really fascinating early use cases. Now there’s probably more diverse tooling; I don’t think today you would use a single platform, maybe, to do everything. Yep.
But still, there’s much of the stuff we were doing back then that I’m not sure there’s a clear replacement for in that type of operational analytics space. I think today what people end up doing is, you know, you dump the events in a data lake, right? And then you can deploy something like Databricks that, you know, has a more flexible language, right, beyond SQL. And you essentially write custom analytics and custom code to do what you have to do. So I think the modern approach has a lot more processing power available and is more customized, but it is less abstracted, right? So I would say the world has moved on, probably for good reason. But it’s not necessarily simpler to do today the kind of things we were doing back then. Yeah,

Kostas Pardalis 46:45
yeah. 100%. And you said something very interesting about the value per terabyte of data, right, back then. And someone who, let’s say, doesn’t usually follow what’s going on in the industry would say: but the data you are talking about here, like with ActionIQ, right, it’s customer behavior. Isn’t this the most important type of data you have in a company, right? Especially today, with all this ML and AI stuff, the workloads are shifting a little bit more toward building predictive power on top of behaviors and all these things that, okay, 15 years ago were probably more edge cases. Do you think that this idea of dollar value, sorry, per terabyte is changing because of these new use cases and technologies? Or has it remained kind of like dealing with logs, let’s say? Yeah,

Tasso Argyros 47:57
I mean, I think today the cost of processing data has come down so low, right, storing and processing data is so much cheaper today than what it used to be, that I think people look more at the aggregate value of data. Because even when the dollar per terabyte is low, you have so many terabytes that in aggregate it could be super valuable data, right? So I feel today the conversation has moved on, again for good reason. It is less about the dollar per terabyte, and it’s more about what’s the aggregate value. And the other thing we see is that the cost is more in the processing; it’s less about how much data you have. It’s more like, how expensive is the processing you want to do with it? You can store almost any data that you have today for very little, and you can run simple processing on top of it for very little. But as we’ve seen with LLM training, for example, right, if you try to do some very complex processing, it can get extremely expensive extremely quickly. So now I feel the metric is dollar per model, right, the cost of the model: dollars to train the model over the value of the model. Yeah, yeah, it’s not about data size anymore. It’s not about storing; it’s about, okay, how expensive is the processing? And then human labor is super expensive, right? Again, part of why ActionIQ is so successful is that every time you try to have a human interface between the business and the data, that becomes a bottleneck very fast, right? There are not enough competent people that you can insert in between the business and the data to make it happen. So if you can make the business even a little bit more self-service, a little bit more agile, that’s a huge win. And it’s not that you’re saving money. I mean, you’re still gonna hire as many data engineers as you can. It’s that, you know, instead of something taking a month because everybody’s waiting on everybody else,
now you can do it in a day. It allows the business to move at a much higher speed, right? So I think these are the modern problems, I would say, or challenges. So things have moved on. But people still ask the question, right? It’s like, you know, how much is it worth for a model or an insight? But it’s a different ratio that people are using to think about it. Yeah,

Kostas Pardalis 50:17
no, that’s a nice way to put it. So a product question now. It’s one thing to go and build a product for one persona, right? So when you were at Aster Data you were, I mean, younger, but also fortunate enough to primarily build a product for a very specific type of persona, which is, let’s say, the systems engineer or whatever. It’s a completely different set of problems when you’re trying to build something that has to be good for multiple personas. And not just multiple personas: here we have some people that in some cases pretty much hate each other because of how different they are, right? So we have a data platform. So naturally, you need the involvement of some data engineering people at least, and then you have the marketers, right, the people who are actually interacting with the data and creating value out of it. And these two personas are very different. In many organizations they probably don’t even talk to each other, not because they hate each other, but just because of the different functions. How do you build a product when you have to keep both of them happy, right, and build user experiences for both of them to succeed in the end? Yeah,

Tasso Argyros 51:47
It’s a great question. Great question. So, you know, for context, at ActionIQ we’re very enterprise focused, right? And usually, when we go into an enterprise, what we find is that there is already a structure to do ActionIQ-like things. How does this work? Right? There’s a team that sometimes is called an analytics team or a marketing operations team or something. But these are people that understand data, and they know how to write SQL. And then there’s a business team that’s using this marketing ops team as a concierge team, right? So you have the business folks, and they’ll send an email, they’ll submit a ticket, they’ll send it out to these people, and they’re like, can you please pull me a list with these people? Or can you please help me understand how many customers we have that meet these criteria? So we assume that something like that is there, with some collaboration. Why is this important? Because these marketing ops folks already know the data, and they know what the business wants and what language they’re using, right? So these are our allies in essentially implementing and deploying ActionIQ. And the way I think about it is that we want to take these marketing ops folks, who, by the way, are very competent, they have a lot of skills, right, as analysts, and turn them from, you know, responders to requests into the administrators, the configurators, and the power users of ActionIQ. And take 90% of those requests and, once they configure ActionIQ, push them to the higher-level interfaces we have and give them over to marketers, right? So in ActionIQ, for example, there’s a translation layer where you can take database concepts, right, like a table or a column, and rename it, reformat it, or do something to make it presentable to the business. Right? It’s almost like a dictionary that translates database terminology to business terminology.
And because these marketing ops, data analytics, customer analytics teams, whatever they’re called, right, center of excellence, sometimes have been going back and forth servicing requests for years, they know how to do the translation already, right? And then it’s a matter of giving them the tools where they can point ActionIQ to the data sources, right, presumably one or more data warehouses or data marts, build that dictionary of terms, right, and then expose that to the business user. And then they only get involved when there are new terms, new data, new requirements, or something that’s so complicated that the business needs someone to double-check or whatever. But 90% of the stuff gets automated. And then, you know, we’re an enterprise company, right? So we support a lot of governance things. So you can have, you know, the analysts approving things before they go out, they can double-check, so you can have checks and balances, right? If they want to oversee what the business is doing, they can do that as well. We try to teach people how to fish, right? I mean, that’s the idea: instead of, you know, trying to feed them one fish at a time, teach them how to fish. And it’s also a huge improvement in quality of life, really, for analysts. Because these high-urgency requests come from the business. You wake up in the morning and you have like five emails because somebody needs something today. It’s really not the fun part of the job for most people, data engineers or data analysts.
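
Editor’s note: the “dictionary” between business terms and database terms can be illustrated with a small sketch. Everything here is hypothetical (the glossary entries, the table and column names, and `count_query`), and it is not ActionIQ’s translation layer; it only shows how a business-level criterion could be resolved to a warehouse column and turned into a parameterized count query.

```python
# Hypothetical glossary maintained by the analytics team:
# business label -> "table.column" in the warehouse.
GLOSSARY = {
    "Lifetime spend": "cust_agg.ltv_usd",
    "Days since last order": "cust_agg.recency_days",
}

ALLOWED_OPS = {"<", "<=", "=", ">=", ">"}


def count_query(label, op, value):
    """Translate a business criterion into (sql, params) for a count query."""
    if op not in ALLOWED_OPS:
        raise ValueError(f"unsupported operator: {op}")
    table, column = GLOSSARY[label].split(".")
    # The value travels as a bound parameter, never as interpolated text.
    sql = f"SELECT COUNT(*) FROM {table} WHERE {column} {op} ?"
    return sql, (value,)


# "How many customers have a lifetime spend over 500?"
print(count_query("Lifetime spend", ">", 500))
```

A business user only ever sees the left-hand labels; the analysts who already do this translation by hand own the right-hand side and get pulled in only when a new term is needed.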

Kostas Pardalis 55:18
Yeah. 100%. Okay, that was awesome. So, I would like to spend the last couple of minutes that we have here talking a little bit more about CDPs as a category of products, and also about something that we hear a lot lately, which is composability, right? So what does it mean for a CDP, a customer data platform, to be composable? What are the semantics behind that? And there are two questions, actually: one is from a technical point of view, and one is from the customer, like the user, point of view, right?

Tasso Argyros 55:56
Yeah, exactly. So composability in general, right, to start there and then go on to CDP: what it means is that essentially it’s a different word for specialization, right, and optionality. So instead of having one thing that does everything, you have one thing that may be doing a lot of things, but that technology gives you the opportunity to delegate certain parts of its functionality to other systems. Specifically in CDP, the biggest thing composability means is that instead of copying data over from the data warehouse to the CDP, and doing the processing in the CDP, right, in the CDP vendor’s cloud, the CDP sends the queries down to the data warehouse. Right, it sees the data model, it doesn’t copy the data; it pushes the query down, it doesn’t pull the data out. Now, like the term CDP itself, right, composability is being used loosely today by certain vendors. So certain vendors use the word composability to mean, you know, they will talk, for example, about having a connector with a cloud data warehouse, but that connector doesn’t push anything down, right, it just gets some data out, but they may call this composable. So you have to be a little bit careful with, you know, the poetic license that marketing always has, as I’m sure Eric and all of us know. But, you know, composability for CDPs means the cloud data warehouse, or the data warehouse, is your processing engine, essentially. The way we mean it specifically at ActionIQ, we have what we call hybrid compute, which means you can have most of your data in Databricks, right, or Snowflake or Teradata, but you can still have some data in ActionIQ, if that makes sense. It’s completely up to the user. And you can have multiple systems that we access to get the data. So we essentially support a query federation layer as the base layer of ActionIQ. And on top of that we have maintained our own ability to store and process data. Right.
Now the beauty of this is it’s completely up to, you know, the analysts and the data engineers, right, to decide how they want to configure this. You can have a single cloud data warehouse where all the data is, and that’s what we use. Or you can have two or three, maybe a cloud data warehouse and a couple of analytics data marts that have the data that we access. Or you can have the same thing, but also have some data in ActionIQ. As far as the user is concerned, they do not know and they should not need to know, right? It’s completely transparent. They’re in the UI, they click buttons, and the queries are routed appropriately, whether it’s to our customers’ IT systems and data systems, or to ActionIQ’s own systems. The query gets composed properly, and the results are presented in the UI. So we provide a lot of flexibility there. The benefit of composability, right, is that you don’t have to move data; it’s governance and security, right, largely. The moment you copy data, you have to build pipelines, and data copies can run out of sync. They shouldn’t run out of sync, but, you know, it’s an “if you know, you know” kind of thing, right? If you have the data stored in many places, it can get out of sync, and you can have definition problems. And then more and more, there’s more concern about security and privacy among our customers, right? There’s legislation, you know, GDPR, all these things, and more awareness around information security and risks. So our customers love to not have to move data, not only for governance, but also for security purposes. And so, you know, there’s a lot of benefits that come with composability. And this is something we have developed, I would say, in the last few years. We would have started there, but when we started ActionIQ, the problem was that the database technology at the time did not scale well enough to support an interactive UI, right? The reason why this works today is because storage and compute are separated.
Compute is now more elastic with modern technologies, right? Again, Databricks, Snowflake, even Teradata, right, Redshift, everybody’s evolving to separate compute and storage and make compute elastic. So you can support a much more diverse mix of workloads on top of the systems today versus what you could do before. Ten years ago, if you were accessing directly, like, an Oracle system, right, or whatnot, even the smallest query would probably take half an hour, right? Things would tend to take a lot of time. The system would say, listen, we have other workloads that are high priority running. And that just could not work. Yeah, but being database people, right, the moment those systems became able to support this type of workload, we immediately said, this is it. That’s the future. We’ve seen it, we know it, and we built the product to support essentially this hybrid architecture that can do either or, yeah,
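
Editor’s note: a minimal sketch of the routing idea behind the hybrid setup described above, under loud assumptions. The `Engine` and `QueryRouter` classes, the engine names, and the table names are hypothetical, and `run` is only a stand-in for dispatching SQL to a real system. The sketch shows one thing: the query is sent to whichever system already holds the table, rather than the table being copied to where the query runs.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Engine:
    """A place where data lives and queries can run (hypothetical)."""
    name: str
    tables: FrozenSet[str]

    def run(self, sql: str) -> str:
        # Stand-in for sending the query to the engine; no data is copied.
        return f"[{self.name}] {sql}"


class QueryRouter:
    """Routes each query to the engine that already holds the table."""

    def __init__(self, engines: List[Engine]) -> None:
        self.engines = engines

    def engine_for(self, table: str) -> Engine:
        for engine in self.engines:
            if table in engine.tables:
                return engine
        raise LookupError(f"no engine holds table {table!r}")

    def execute(self, table: str, sql: str) -> str:
        return self.engine_for(table).run(sql)


# Made-up layout: most tables in a cloud warehouse, one in a local store.
router = QueryRouter([
    Engine("warehouse", frozenset({"orders", "customers"})),
    Engine("local_store", frozenset({"web_sessions"})),
])

result = router.execute("orders", "SELECT COUNT(*) FROM orders")
print(result)  # [warehouse] SELECT COUNT(*) FROM orders
```

From the caller’s side the location of a table is invisible, which mirrors the point that the business user clicks buttons in a UI and does not need to know where the query ran.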

Kostas Pardalis 1:00:51
That’s so interesting. Actually, we’d probably need a full episode just to talk about that. And myself, okay, coming from Starburst and working with a federated query engine, what I find extremely fascinating here is how much the workload actually matters in making federation work or not, or how federation can work. That’s right. And one last thing before I give the microphone back to Eric. I don’t know, I find it fascinating how things make cycles, in a way. The first time that I heard the term customer data platform was in relation to Treasure Data. And it’s funny because Treasure Data, and I’m pretty sure I’m not wrong here, built the first version of the platform on Presto, which is a query engine. So it’s interesting to see how the concepts go back and forth, and how things need to mature to make things actually work, right? Because probably they were kind of too early in what they were doing back then, in terms of making, let’s say, the technology work. But it’s, it’s yeah, it’s

Tasso Argyros 1:02:10
a bit different. It’s a little bit different. I mean, different businesses do slightly different things. Also, I think it makes it very interesting. But I mean, Treasure Data doesn’t talk about composability a lot, right? They support it or they plan to support it, but part of the reason is that a lot of what they’re doing is bringing data together. Like, there are some CDPs whose job is to build data marts, like a Customer 360 data mart, rather than to access a Customer 360 that lives somewhere else. Yeah. And if you’re building the Customer 360, it doesn’t make sense for you to be composable, right? I mean, you’re competing with a cloud data warehouse, essentially, right, if you’re that type of CDP. But if you’re a CDP like us that’s accessing data, and, as I mentioned before, our first founding principle was to use whatever data model is in place in the data warehouse, then composability makes a ton of sense, and it fits really well into our model.

Kostas Pardalis 1:03:04
Yeah, 100%. Anyway, we need to definitely find more time to talk about that stuff. It’s like fascinating,

Tasso Argyros 1:03:09
super fast.

Kostas Pardalis 1:03:12
Eric, the microphone’s back to you, all yours. Yes,

Eric Dodds 1:03:16
well, we’re at the buzzer, as we like to say, but Tasso, okay, here’s my question. This is more of a personal question. So you have had a very unique journey, in that you founded a data infrastructure company and sold it, which is extremely difficult to do in its own right. And now you’ve built a successful company that serves business users. Okay, so if you had to start something new, but it could not be in SaaS at all, what would you do?

Tasso Argyros 1:03:56
Oh, man, you know, that’s a great question. You know, the reason why I started ActionIQ is because I love learning new things, and that, you know, was such a big new challenge. So I haven’t thought about it. You know, I’m so obsessed with what I’m doing right now that I haven’t thought about it. But it would probably be something that could benefit from data, but that has nothing to do with either SaaS or data infrastructure, right? I would probably take my skills. I mean, I’m a huge believer in interdisciplinary opportunity. I think the opportunities are in these Venn diagrams, right, where lots of people do A and lots of people do B, but very few people understand A and B together. So I would ask myself, right, now that I have done A and B, what would be that C thing, right, that would benefit from everything? That’s how I’d think about it. But maybe I’ll think about that question for the next time we talk, Eric, and oh, yeah,

Eric Dodds 1:04:55
absolutely. Yeah, we’ll do another episode in the future. They’re so big.

Tasso Argyros 1:05:02
There’s so many opportunities. Yeah, yes,

Eric Dodds 1:05:04
indeed. I thank you so much for giving us the time today. What a great episode.

Tasso Argyros 1:05:09
Yeah, I really enjoyed it, guys. Thank you so much for having me here. Really fun.

Eric Dodds 1:05:14
We hope you enjoyed this episode of The Data Stack Show. Be sure to subscribe on your favorite podcast app to get notified about new episodes every week. We’d also love your feedback. You can email me, Eric Dodds, at eric@datastackshow.com. That’s E-R-I-C at datastackshow.com. The show is brought to you by RudderStack, the CDP for developers. Learn how to build a CDP on your data warehouse at RudderStack.com.