Episode 98:

Category Theory and the Mathematical Foundation of the Technologies We Use with Eric Daimler of Conexus

August 3, 2022

This week on The Data Stack Show, Eric and Kostas chat with Eric Daimler, CEO and Co-Founder of Conexus. During the episode, Eric discusses how to present complex data matters to those outside the industry, category theory, and how Conexus is bridging gaps at scale.

Notes:

Highlights from this week’s conversation include:

  • Eric’s background and career journey (3:30)
  • Presenting to people without knowledge of AI (11:04)
  • Why math was chosen over AI (19:03)
  • From compilers to databases (25:42)
  • The contribution of category theory (30:09)
  • The Conexus customer experience (37:45)
  • The primary user of Conexus (46:33)
  • Interacting with 300,000 databases (51:07)
  • When Conexus begins to add value (54:02)
  • The best way to learn this mathematical approach (55:46) 

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Transcription:

Eric Dodds 0:05
Welcome to The Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You’ll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, the CDP for developers. You can learn more at RudderStack.com.

Brooks Patterson 0:24
Hey, Data Stack Show listeners. Brooks here. Usually, I’m behind the scenes keeping things rolling for the show. But today, I’m coming out of hiding to share some exciting news. We have another live show coming up and we want you to join us for the recording. This time, we’re bringing back Tristan from Continual and Willem from Tecton to talk about the future of machine learning. We’ll record the show on August 10, at two o’clock Eastern, 11 o’clock Pacific. So mark your calendars and visit datastackshow.com/live to register today.

Eric Dodds 1:01
Welcome to The Data Stack Show. We have an exciting episode because we are going to talk about the White House, we are going to talk about math, and we are going to talk about a data company that solves really complex data problems, all in one conversation with Eric, who is from Conexus, which is a fascinating company, and he’s a fascinating person. Kostas, of course, I have to ask him what it was like to be an advisor for AI at the White House. I think typically when we think of the government, we don’t necessarily think about people solving issues or thinking deeply about the subject of AI. But that’s exactly what he did. And so I want to know what that meant practically, day to day. So that’s what I’m going to ask. How about you?

Kostas Pardalis 1:48
Absolutely. I really want to hear all the stories he has to share from trying to help the government understand the implications of all this state-of-the-art technology, and help them introduce the right legislation, and how these things happen. All these interactions are very, very different from what we are used to when building businesses or building products. So I definitely have a lot of questions around that. But at the same time, he’s also representing a company, one of those companies whose product is very directly connected to very foundational research, especially in mathematics. So we’ll have one of these rare opportunities where we can talk with someone and go through the whole thing, from the product itself, the experience of the customer, and the problem that it solves, down to, let’s say, the core mathematics that are used to actually deliver this value. So let’s see what he has to say.

Eric Dodds 2:53
All right, let’s do it.

Eric, thank you so much for giving us some time and joining us on The Data Stack Show. We can’t wait to chat.

Eric Daimler 3:01
It’s good to be here, Eric.

Eric Dodds 3:04
All right, well, you have an absolutely fascinating background. You’ve probably sat at almost every seat at the table that someone could think of when you think about a technology company in the data space, and then some that you wouldn’t think about. So can you just give us a quick history of all the different things you’ve done and roles that you’ve played? And then what you’re doing today?

Eric Daimler 3:28
Sure, yeah, I’ve been told that I have a rare, if not unique, perspective in having exposure to AI from the perspective of being a researcher, an entrepreneur, a venture capitalist, and even spending time in Washington, DC. And that’s often how people will know me, if they know my name: as acting as an AI authority during the last year of the Obama administration. Before that, I had spent time as a professor in computer science in academia, as a venture capitalist on Sand Hill Road, and as an entrepreneur multiple times. So yeah, I’ve seen AI from a lot of different perspectives. And today, after the stint with the White House, I’m running a spin-out from MIT’s math department and sitting on a couple of other boards, one of which was SoftBank’s largest investment into AI, Petuum.

Eric Dodds 4:33
Wow. Kostas, I don’t even know where to start. This is so exciting. But, Eric, let’s dig in, I think where my mind went, and I think a lot of our listeners, so an advisor in the White House on the topic of AI, can you just tell us what was that like? What were your responsibilities? And then I think I’m so interested in what were some of the really specific things that came up that you worked through in that role?

Eric Daimler 5:04
Sure, I can say that it is a very privileged position. I was really grateful for the time and to have worked with some really smart and dedicated people, and I hope to do it again someday. The role itself actually has been elevated. There’s now an AI office inside what’s colloquially known as the science advisory group. I know the person that leads it, and some of the people that work inside of it; they are all super smart, competent, the right people there. That job is not a cabinet-level job, but the job senior to that one is now a cabinet-level job. So, lovely people working very hard on behalf of the American people. When I was there, there were other people in other areas of expertise, from space to health care; there was an expert on soil and agriculture. I just happened to be the authority on AI during my time. There was another in computer science, a Princeton professor who was an expert on computer security. Before me, the person was more of an expert on very large computing systems. But I am very happy to have been there when I was there. It was a hot time to be there around AI. The group, colloquially known as the science advisory group, is really nonpartisan, and that was my experience. So this was not a whole bunch of West Wing people in a screenplay or something. These are nerds. They’re science people. We did not talk about politics. And really, for all I know (and actually, I did know in a couple of cases), people had different political views than the President that we served. I know that the person I reported to said, “I would serve a lot of presidents, but this president I served with enthusiasm.” And that’s how I felt.
The job, as it was expressed during my time, showed up as coordinating on behalf of the President, humbly speaking on behalf of the President, the goals of the White House and coordinating the executive branch. The executive branch being State and Defense, of course, but also Health and Human Services, and Transportation is a big one. Coordinating those efforts in AI meant generally the funding of research, but also a coordination of the goals and outlook that the federal government might have for the coming years. This got expressed then in written reports. Many of them were public; obviously, some of the work we did with the DOD and the intelligence community is not, but much of the work was public. I think you can even see this still in the White House archives. It really was helpful in coordinating a conversation that we could then share with Congress, who would then allocate funds: Where do we want to go? What do we want to fund? What do we see happening in the future? We would take some lessons from what some of our allies would be doing, and vice versa. It was a wonderful experience where you have a very high-level perspective of AI initiatives. And this was actually a bigger deal than I had expected. Obviously, everybody knows the federal government is big, but one of the wonderful parts about that job (I get goosebumps even thinking about it, recalling that experience) was that it is bigger than any other organization. So I got to see: oh, this is where people are experiencing roadblocks today that will become a lot worse, and these are some of the big scale difficulties people are going to be running into. Everybody knows about doubling every two years, or every 18 months, for computing power. Data is also doubling every two years or some such thing; the exponential growth in data is well understood.
But the equally exponential, or quadratic to be more precise, explosion in data sources is less appreciated. So the real complexity is that you have a combination of data and data sources. That’s an unfathomably large number of data relationships, and it is just breaking systems, because when you flip from millions and billions to billions and trillions, you have to be thinking about your systems in a fundamentally different way. It’s really a phase change, like ice to water or water to gas: a fundamentally different way to be interacting with that scale of data relationships. And that’s what we saw begin to happen in the federal government.
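To make the "quadratic" point concrete, here is a minimal sketch, assuming that a "data relationship" simply means one unordered pair of data sources. The function name is illustrative and not from the episode; the 300,000 figure is the database count mentioned elsewhere in this conversation.

```python
from math import comb


def pairwise_relationships(num_sources: int) -> int:
    """Count the distinct unordered pairs among num_sources data sources."""
    return comb(num_sources, 2)  # n * (n - 1) / 2, which grows quadratically


for n in (10, 1_000, 300_000):
    print(f"{n:>7,} sources -> {pairwise_relationships(n):,} possible pairwise relationships")
```

Under that assumption, ten sources give 45 pairs, while 300,000 sources give roughly 45 billion, which is the kind of jump the "phase change" remark describes.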

Eric Dodds 10:25
Fascinating. Okay, one more question to satisfy my curiosity and then I’ll hand it over to Kostas because I know his mind is buzzing with questions. And this is just more of a curiosity in terms of taking research, papers, discoveries, recommendations, and say, presenting those to people who may not have a good understanding of what AI is. We all work in the data industry. And even then people, a lot of times will misuse the term AI or speak about it in a way that’s ambiguous. If you think about the wide audience of people who were exposed to the work that you and your team did, was it difficult say, if you were presenting a research paper to Congress, or they were digesting that, how did you approach the problem of not everyone has a baseline understanding of what AI is at the fundamental level? Was that a challenge?

Eric Daimler 11:30
I’ll actually say that’s not a challenge. You don’t present research papers to members of Congress; it just doesn’t work. Some of the senators are very, very smart, some of them less so. But they still may not understand, and they shouldn’t be expected to understand, the ins and outs of the stack. So it’s a big part of the job, actually, to work with my peers in the State Department, or the Defense Department, or the Transportation Department, or Energy, which has some super smart people, and then go back to members of Congress and try to simplify it. You don’t say “dumb down,” because many of these people are super smart in their own right, but you try to simplify it in a way that is meaningful, so that they can have a grasp to make more effective policy. I have a couple of conclusions from that experience. And it really was daily, if not hourly. A lot of times I would have a briefing to make. The social calendar wasn’t really a social calendar, because I would often be the entertainment at dinner, talking about AI at some ambassador’s residence or with some members of Congress. So the first lesson I took is: tell a simplified version of AI, and I can even share with you how I told it. Stories can often be helpful. And the second lesson is that we really need to bring more people into the conversation around AI. Because even if the members of Congress and senators at the federal level understand this, we still have every state government, to say nothing of other governments, allies around the world, from whom we also take some direction and whose laws our companies are often subject to, GDPR being a perfect example. We could talk about Europe’s implementation and their modifications of GDPR.
Over the last few years, despite having very, very smart people, they’ve often been misguided in their modifications to GDPR. So those are the two lessons: one is to have a good definition, which I’ll share; the other is to work generally to bring more people into the conversation about AI, what it is, and how we want it to be implemented. The definition that I work with, that I found to resonate with members of Congress, is that AI is a system, a system that senses data. That could be from the LIDAR on top of your car; it could be from the air quality sensor in your home. Through that sensor, it takes the data into a system that then cogitates about it, thinks about it, and plans for action. That’s the traditional place that people would think of AI. And notice I try not to get too pedantic about saying, well, there is deterministic and probabilistic AI, a subset of which is machine learning, a subset of which is deep learning. That’s not for people that aren’t researchers day to day. But it’s a system that senses, plans, and then acts on those decisions, learning from the experience. So we take that whole system and apply it to how ordinary non-AI professionals can get engaged. We talk about an automated car driving down the street, seeing something. Is it a crosswalk? On the crosswalk, is it a person? Is it a tumbleweed? Is it a shadow? What do I do: slow, stop, or keep going? Do I ask for driver intervention? That’s a point that everybody can get: we as a society need to make a decision. We as a society will need to determine where we put that liability: on the driver, on the manufacturer, or on the owner. We will have litigation around this to make that determination. Mercedes, for example, makes really no bones about biasing towards the safety of the driver.
So you think, if I see an automated Mercedes coming out here, I might back off. We’re all part of Tesla’s beta test, whether we like it or not. They very clearly break the law, that’s kind of just their mode of operation for testing their autonomous software. So we need to engage more people in that conversation, use the definition, and work to engage more people.
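The sense, plan, act, and learn definition above can be sketched as a tiny loop. This is only an illustration under stated assumptions: the class name, the `person_probability` reading, and the threshold values are all hypothetical, and nothing here reflects how any real vehicle works.

```python
from dataclasses import dataclass, field


@dataclass
class CrosswalkAgent:
    """Toy sense-plan-act-learn loop; every name and number is illustrative."""

    stop_threshold: float = 0.5  # hypothetical confidence needed to stop
    log: list = field(default_factory=list)

    def sense(self, reading: dict) -> float:
        # Sense: reduce a raw sensor reading to "how likely is a person ahead?"
        return float(reading["person_probability"])

    def plan(self, person_probability: float) -> str:
        # Plan: pick one of the three actions from the crosswalk example.
        if person_probability >= self.stop_threshold:
            return "stop"
        if person_probability >= self.stop_threshold / 2:
            return "slow"
        return "keep_going"

    def act(self, decision: str) -> str:
        # Act: record the decision (a real system would drive actuators).
        self.log.append(decision)
        return decision

    def learn(self, person_probability: float, person_was_present: bool) -> None:
        # Learn: if a person was there but we did not stop, become more cautious.
        if person_was_present and person_probability < self.stop_threshold:
            self.stop_threshold = max(0.1, self.stop_threshold - 0.05)

    def step(self, reading: dict) -> str:
        return self.act(self.plan(self.sense(reading)))


agent = CrosswalkAgent()
for reading in ({"person_probability": 0.9}, {"person_probability": 0.3}, {"person_probability": 0.1}):
    print(agent.step(reading))
agent.learn(0.3, person_was_present=True)  # feedback lowers the stop threshold
```

The policy question raised in the conversation lives in `plan` and `learn`: someone has to choose the thresholds, and that choice is where questions of liability attach.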

Eric Dodds 16:19
Super helpful and super fascinating. I could keep going, but Kostas, please jump in.

Kostas Pardalis 16:27
I have a feeling you were keeping a lot of notes to share with the sales enablement team or something, right?

Eric Dodds 16:34
That’s right. Or actually, I will say, Eric, though, that is a very helpful definition. It’s the classic you’re at a cocktail party and, to your point. It’s not that people aren’t intelligent, it’s just that distilling a subject like AI, with all of the various componentry is kind of hard. And so I really appreciate that. I’m going to, I’m going to paraphrase that definition. In the future, if you don’t mind at the next cocktail party where AI comes up.

Eric Daimler 17:08
Please use it.

Kostas Pardalis 17:10
Yeah, that was pretty amazing. To be honest, it was one of the best examples of taking some very, very complex concepts and distilling them down to things that everyone can understand. That’s a very, very rare skill, so I definitely understand why you had the position that you had there. And I think it’s one of the skills that anyone working with technology should work more on improving, to be honest, because it’s a big problem that we have, especially when we introduce these new pieces of technology for which we pretty much also have to invent, let’s say, a new language, new words. People are just not ready. It takes time, and it doesn’t matter how smart you are, right? You need to rewire your brain and start thinking in different ways. So that was amazing. I don’t know if you have a blog, or if you plan to write a book at some point, but please do. I think many people would thank you. All right, so having said that, and thank you so much for this amazing introduction, I’d like to chat a little bit more about the company, Conexus. We were talking until now about AI, which is, let’s say, the holy grail of data: we collect all these massive amounts of data so that at some point we can build these models that use the data and help automate big parts of our life in a very positive way. But Conexus works at a much lower level of this, let’s say, journey, at the level of the data itself. So what made you, after working with AI for so long, go and build a company that works on something much more, let’s say, boring, in a way? And don’t take this wrong; it’s not boring for me. But I’m pretty sure that you were discussing AI much more than you were discussing how to create connections between data with the people that you were meeting there. I think there’s a very good reason, and I’d love to hear more about that.

Eric Daimler 19:33
Yeah, thanks for that. There are a lot of different levels at which we could talk about this. But I can take the last point, which is that, to a non-nerd, it ain’t sexy, that’s for sure. It’s gonna be difficult to write a Hollywood screenplay about math.

Kostas Pardalis 19:55
Unless there are aliens involved. If there are aliens out there that you’re trying to communicate with, I think that script works most of the time.

Eric Dodds 20:05
There is. I will pay for this movie.

Eric Daimler 20:07
There’s a brilliant woman, Eugenia Cheng, who does a fascinating job bringing math to life through the metaphor of baking. Baking pi is obviously kind of a punny way of saying this. She even went so far as to write a children’s book that I bought and read to one of my nieces, explaining math, and specifically categorical algebra, at a level I can read to a four-year-old. So she’s brilliant; I’m a fan of hers, I can say that. The math is where it’s going. Even as I was studying computer science, the more I advanced in that domain, the more we got away from the syntax of the different languages, of course, and the more we got into the mathematics. What I have come to believe, not to be hyperbolic, is that we’re entering a new epoch, where we are shifting the framework from that of logic, which helped our current computing infrastructure create itself, to another epoch, which is that of composability. You see expressions of the concept of composability in such things as quantum computing, specifically in quantum compilers, where we as humans would not be able to understand the output of quantum computers without the math of categorical algebra, or category theory, or type theory. You see other expressions of composability with smart contracts, the structure of which would not be able to exist without categorical algebra or category theory. That math helps you understand and analyze these increasingly complex systems, and there’s really no other way to do that. The math that we all grew up with is the math of the 20th century, or I might even say the math of the 19th century: calculus, geometry, trigonometry. It’s going to become a little bit like Latin, which is intellectually interesting, but less and less relevant.
In today’s digital age, those are the maths that we will use for aerospace engineering or mechanical engineering, but for digital applications and the emerging compositional systems, we will be relying on the math of category theory and type theory and the operations of categorical algebra. So that’s where we’re going, and that’s what Conexus is building. Conexus is built on a mathematical discovery, and that’s as foundational as you get. That’s a law of nature. That’s better than physics. Math is a strange thing. I will say, as a little aside, just to point out how nerdy some of these math professors are, one of whom is our co-founder, David Spivak: you go to MIT’s math department, and you’ll be able to tell you’re in the math department two ways. One is no computers on the desks. That’s weird. The second is blackboards, not whiteboards. So these are hardcore. If you went to Central Casting and you said, “Give me a mathematician,” our co-founder would pop up. And that’s what that math department looks like. So I get a little excited about that. But the MIT math department had a discovery, where this sort of meta math of categorical algebra was applied to databases, so that translation of problems between spaces can now be done with databases. That was expressed in software by Dr. Wisnesky, and it then began to have a commercial expression. That’s when I found out about it, initially put money into it, and then decided to jump in full time. The reason is that this is going to be the biggest trend we see over the next 10 to 20 years. Other people can be fascinated by other domains they may read about in the press. But this, although it’s not telegenic, and the math is not as easy a story to tell, is foundational; it’s going to make a very big difference. Category theory, categorical algebra, will be the math that our kids learn.
In the future, I might say the more math the better. But if I was to choose, I would be replacing calculus, geometry, trigonometry with statistics, probability, and category theory.

Kostas Pardalis 24:55
That’s pretty interesting. So okay, let’s step back a little bit and talk about category theory. I was aware of the impact that category theory and type theory have had, especially on compilers and functional languages, right? It’s a big part of the conversation that’s happening, especially among people who have been working with Haskell and functional programming. So how did we go from these applications, with compilers and computer languages, to databases? What happened in between, and how did this happen? I’d love to learn about that.

Eric Daimler 25:41
The easiest way to get at that is just from what our customers tell us, and that’s the best that we can do. We as engineers might really enjoy programming in Haskell; that’s a fun place to be. But commercial expressions of Haskell are kind of few and far between. We even worked with Uber, and I’m not saying anything out of school here to say that they didn’t even want to open source the Haskell code we created with them, because it was Haskell. It’s just not something they want to be affiliated with; they don’t want to be part of it. It’s fun; maybe we’ll all do that in retirement, just sitting and programming in Haskell. What the clients of Conexus tell us, when they come to us (like Uber), is: hey, we tried solving this problem other ways, and we reached a dead end. Uber has an interesting story where they, like many companies, have some very smart people, really exceptionally smart people, in their technical functions, and an effectively infinite balance sheet with which to fund an optimal IT strategy. But neither of those allowed them to actually create an optimal IT infrastructure. Despite having smart people and an infinite balance sheet, they grew up respecting the business. They grew up, in that case, as a ride-sharing company, city by city, jurisdiction by jurisdiction. And the output of that was that they were then prevented from easily respecting a privacy lattice, so that, for instance, driver’s licenses might have a different privacy sensitivity than license plates, depending on the jurisdiction. Theoretically easy business questions, such as, hey, there’s a sporting event coming up, how will driver supply or rider demand be affected, could be answered for Richmond, Virginia, or Charlotte. But they couldn’t be answered for the whole state, or the whole eastern seaboard of the US, let alone the whole country or the whole world.
So Uber then looked at how to solve this problem: how do I integrate 300,000 databases, in their particular case? This is what I meant about the scale, the phase change, of 300,000 databases. They realized, and this is to your point, Kostas, that this needs to be solved at a deeper level. The commercial expressions are broken; they don’t extend, whatever the marketing language that you’ll often hear. They don’t extend to 300,000 databases in a way that’s feasible, so we need to look deeper. They came up with the solution of categorical algebra. Then they looked around the world for who the leaders in that are, and they found Conexus. Conexus happens to be 40 miles north of them, so that worked out. We then co-developed software with them to solve that problem of bringing together 300,000 databases, essentially 3,000 data models, in a way that was guaranteed to maintain its integrity, which is really the point of category theory. We can’t have four mutate into approximately four; we need four to equal four every single time. Uber could have done this without category theory, but the budget for that, I think they computed, would have been roughly $2 trillion. So they wouldn’t do it, because of all the connections that category theory handles. That’s actually the same as in quantum computing: you can use traditional methods of quantum mechanics to run the compiler, but then you start having to use imaginary numbers, which can be done, but it’s not pleasant. It’s not as easy. It’s not as dependable. In high-consequence contexts, you want to have math that is less susceptible to human error. And that’s where category theory comes in. That’s how it’s evolved, as a way of answering your question.

Kostas Pardalis 29:52
And how does category theory change the way that we solve this problem? What’s the contribution of category theory that makes these problems tractable when they were not before? What’s the secret sauce, let’s say, of category theory?

Eric Daimler 30:08
Yeah, look, category theory is a sort of meta math. If you’re already talking about formal methods, then you’re already most of the way there. It’s really giving the ability to define, in a very precise way, a reasoning and general rules engine that is then encoded and shared with others. One of the clients of Conexus came to us and described it this way. Have you ever been in a room where you wanted to ensure that you heard everybody’s viewpoint, and you wanted to make sure that the loudest didn’t dominate the conversation and the quietest got heard, such that everybody left the room feeling that their opinion was represented exactly as they had said it? That’s how they said it. Their telling of it is that that can happen sometimes in a consensus. So up to 30 people, 30 engineers, get together and they have a consensus about, well, what a wellhead would look like and how you define it. But then you add the 31st person, and it all breaks again; you have to do it all over again. And that’s exactly the sort of problem that engineering and manufacturing, to say nothing of healthcare and logistics and financial services, run into with some degree of regularity. What category theory allows is a logical composition that respects the different definitions or meanings that each engineer, or more generally each subject matter expert, represents. Category theory just supplies that language. You don’t have to worry about the syntax, about how you might encode that; the math allows for the semantics to be respected in any data transformation.

Kostas Pardalis 32:03
Okay, so how do we go from the business problem that we have? Let’s make it simple. Let’s say we have two databases; let’s forget the 300,000 databases. And we want to align the two data models there. How are we doing that using category theory? Could you describe it a little bit, so we can understand, let’s say, the experience that a developer would have trying to do that using category theory or the Conexus product?

Eric Daimler 32:35
Yeah, yeah. Well, today, this is the crux of the problem is a lot of existing solutions, you’ll deploy as a sort of proof of concept, hey, let’s deploy to, and now let’s do 200. And that’s exactly where they break. So Conexus is engaged with customers who’ve had this experience. And the people say, oh, yeah, I can do it with category theory. And then they scale up to even 10. And they start fudging it, who among us hasn’t ever hard coded a cell in Excel? We’ve done it before. And that’s what, that’s what often these people will do if they don’t have a foundation on which their product is built in category theory. So to databases isn’t, isn’t you’re not going to see a big difference. It’s really differentiated quadratic scaling, and a linear scaling for the complexity. So let’s just say five, and you don’t know doesn’t have to get terribly big. We have this experience Conexus did with a health care system in New York. So I didn’t know this was possible. But there’s one health care system, one, one system had different definitions of diabetes, between the group serving and maybe this is familiar to you, or some of the listeners, but for me, I thought, well, I can just look in the dictionary for a definition of diabetes, but know how that expressed itself is that one group would say, diabetes in the table, like the attribute would be diabetes and then in the row would be yes, no. And then in another division, they this might be in research. So it might be temporal, or it might be in the use clinic, clinician versus research. But the next, the next table might say, diabetes, how you treating it, or the next one might say diabetes, how long ago or the next one might have well-meaning clinicians that would say, Wow, they’re kind of diabetes, and then we treat it and it doesn’t appear to be showing up anymore for some reason, like, whatever. So you have different definitions of diabetes across the organization. What Conexus has seen is clients will do one of a few things. 
They’ll either normalize all of that, just combine it, squash it into diabetes, yes or no, and lose the fidelity or the semantics from the subject matter experts. As we talked about with the meeting, everybody’s meaning is lost because it’s just the lowest common denominator. Or they will pick a couple of the ones that are easier to integrate, maybe diabetes, yes/no, and diabetes, how long ago, and then they’ll ignore the rest. And that’s often what happens. Or they’ll just ignore all the other ones; they’ll just have one. And the others will remain dark, to use a Gartner term for it: dark data, data you collected but aren’t going to use. It’s really funny, actually, as an aside: I feel like companies have gotten the memo about data collection. Data is the new oil and all that. There’s a lot of data now being collected. But I really am curious about how much of that data is actually being used, because my experience is not a lot. They’re collecting it, but it’s not being put to good use by the data scientists. And it’s because of that difficulty in the layer between data science and data engineering. So back to the example: we can either ignore it, we can normalize it, or we can spend a whole bunch of money on ETL tools, and then on Tata or Wipro or Tibco or Infosys or Accenture combining that data over a period of months, if not years. That’s all suboptimal. So what category theory provides, what Conexus provides for its clients, is a way to have a mathematical representation of those different traits, of those different columns in those different tables. So you can say, for example, that diabetes, yes/no, is related to diabetes, how long ago, and you can keep the fidelity of it: diabetes, how long ago, equals diabetes, yes or no, plus some other attribute. And then you can just keep going.
So that’s actually how this gets expressed: you have a mathematical relationship that is able to be captured. So you can then begin to realize, oh, this is a little bit like graph theory, which is what my Ph.D. is in. It’s a little bit like graph theory, but it has a richness that is unavailable in graph theory. And that’s why it’s a little bit more like type theory, because you can have an infinite amount of expression inside of every edge and node, and category theory has its own vernacular for that.
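
The relationship described above, where “diabetes, how long ago” carries the yes/no answer plus an extra attribute, can be sketched in a few lines of Python. This is only an illustration under assumptions: the table shapes, the class names, and the `to_clinic_view` function are hypothetical, and this is not how Conexus or CQL expresses it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ClinicRow:
    """Hypothetical clinic table: diabetes recorded only as yes/no."""
    patient_id: int
    has_diabetes: bool


@dataclass(frozen=True)
class ResearchRow:
    """Hypothetical research table: diabetes recorded as time since diagnosis."""
    patient_id: int
    years_since_diagnosis: Optional[float]  # None means no diagnosis on record


def to_clinic_view(row: ResearchRow) -> ClinicRow:
    """Derive the coarse yes/no answer from the richer research column."""
    return ClinicRow(row.patient_id, row.years_since_diagnosis is not None)


research = [ResearchRow(1, 3.5), ResearchRow(2, None)]
clinic_view = [to_clinic_view(r) for r in research]
print(clinic_view)
```

The mapping only goes one way: the yes/no view can be computed from the richer table, and the richer table is kept as it is rather than squashed, which is the fidelity point made above.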

Kostas Pardalis 37:13
Yeah. Okay, let’s look a little bit at the experience that the customer has. Let’s say the hospital comes to you. What’s the journey that they have to go through together with Conexus to build this rule engine and represent all the semantics that they have around their data using category theory?

Eric Daimler 37:41
Yeah, it’s a great question. And I’ll tell you, it’s really an easy start for engineers, accountants, lawyers, people that are often trained to be at least aware of the precise meanings of their words. Other people, those that haven’t necessarily been trained in those disciplines, can get this too, but I find that people in those disciplines find this a little easier. As a subject matter expert of any flavor, you want to define what we will call an ontology log. So it’s not just a graph database, where you just have a whole bunch of data. It’s the data plus the data relationships. So every person has a name: person has name. That’s it. That’s the simplest part of an ontology log. And here’s where it gets important. You say, well, whatever, Eric, that’s trivial. Yes, it is. But here’s how this gets messed up all the time. So just to use a common example: for me, you’d say, well, Eric has a nose and eyes, Eric has ears and eyes. So where in there did I say singular, and where did I say plural? That can matter. It matters a lot. You want to be super, super clear when you’re writing down, as a subject matter expert, my knowledge of my face: nose, eyes, ears. So you’ve got to write that down in the ontology log. Every head has two eyes, maybe both working; one nose, maybe working; two ears, maybe working. That you’ve got to write down with that level of precision. Any subject matter expert that kind of lives with their work has this implicit knowledge. They then need to represent that implicit knowledge explicitly. And that’s what Conexus helps its clients do, but Conexus isn’t going to read anybody’s mind, and it’s not magic. So that has to be captured by the subject matter expert, and we facilitate that and are working overtime to make that easier.
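
The “every head has two eyes, one nose, two ears” example can be written down as explicit, checkable statements. The sketch below is plain Python, not olog notation or CQL; the `Aspect` class, the `OLOG` list, and the `check` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Aspect:
    """One explicit statement: every <source> <label> exactly <count> <target>(s)."""
    source: str
    label: str
    target: str
    count: int


# The implicit knowledge about a face, written down with singular/plural made precise.
OLOG = [
    Aspect("head", "has", "eye", 2),
    Aspect("head", "has", "nose", 1),
    Aspect("head", "has", "ear", 2),
]


def check(instance: Dict[str, List[str]], olog: List[Aspect]) -> List[str]:
    """Return one message for each statement the instance does not satisfy."""
    problems = []
    for aspect in olog:
        found = len(instance.get(aspect.target, []))
        if found != aspect.count:
            problems.append(
                f"{aspect.source} {aspect.label} {aspect.count} "
                f"{aspect.target}(s), found {found}"
            )
    return problems


record = {"eye": ["left", "right"], "nose": ["nose"], "ear": ["left"]}
print(check(record, OLOG))
```

Writing the counts down is the point: a record with one ear is flagged only because someone stated “two ears” explicitly instead of leaving it as unstated knowledge.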

Kostas Pardalis 39:58
Yeah. So, okay, I have a couple of follow-up questions here. First, when it comes to domain experts, usually domain experts don’t know category theory because they spend their time getting really deep into something else, right? For example, someone might be a medical doctor, or they might be, I don’t know, an expert in finance or whatever. How do we help these people encode the knowledge that they have into something that can then be accurately represented using category theory?

Eric Daimler 40:33
Right. This is great. In order to do this, we would be at a huge disadvantage if somehow we were requiring people to learn any math, let alone abstract math. This is probably not bringing up warm memories for people that studied abstract math at school. I’d say category theory, I think, is easier than calculus, frankly. So I think people that are getting into it might enjoy it, especially with the easy on-ramps available from the likes of Eugenia Cheng and others. One of our co-founders wrote two books on category theory; they’re excellent. So people may enjoy it, but they don’t have to learn it in order to do what we are describing here. In order to capture diabetes, yes/no; diabetes, how long ago; diabetes, how are we treating it, you just have to start with a logical diagram. We’ll call it an olog, an ontology log. You just have to create the ontology log. Transferring that olog with the right syntax into the software, that’s the job that Conexus will do. And that’s super easy for us. But that’s all it is. It’s just a syntax, not unlike SQL, because we’re dealing with databases; that’s what Conexus does. So it’s really super easy. Depending on the level of sophistication or interest, your people can pick up this little modification, CQL, in a long weekend. That’s how it happens. That’s what happens next: ontology log, syntax, and into building a Conexus.

Kostas Pardalis 42:11
Okay, and now the tricky question: how do we deal with changes in the semantics? Because what we know today about diabetes might change in the future, right? And that’s one of the reasons that we have all these different versions of what diabetes might be. So what’s the process of going back and refining the ontology log, and reflecting this back in the rules that we have? How does this happen? And how do we find out that we have to do that in the first place?

Eric Daimler 42:50
The knowledge that any subject matter expert represents in an ontology log gets captured in the software. If the software needs to be modified to represent a change in the ontology log, that would be driven by the subject matter expert. The software is only an automatic reasoning engine; it just looks for all the possible connections. This is a factorial explosion of possible connections, or combinatorial is the term that people are often familiar with: a combinatorial explosion of relationships. So it’s a reasoning engine about those relationships, but it’s not going to read anybody’s mind about the changing reality that mandated a change in the design of the flange on the wellhead. That’s the subject matter expert’s responsibility.

Kostas Pardalis 43:48
Okay. And, okay, we do these things, we create the rules. And then what’s the experience that we’re having there? Are we able to query this rule engine instead of going and querying all the different databases? How does this work in, let’s say, the analytics environment that the company has? How can we integrate this as part of the data warehouses and the databases that we have?

Eric Daimler 44:27
Yeah. The experience of a user is really going to be transparent. And to build on your last question, you’ll get some benefits, such as contradiction detection. So that’s another answer to your question that also addresses the last one, which is: what are the benefits and how do they show up? It will tell you whether you have contradictions, which can often happen if you have 30 engineers in a room, all analyzing a complex system. This is what happens today in many situations: you may have to go through many, many iterations to expose the contradictions in the different viewpoints of the subject matter experts. With this automatic reasoning engine that is powered by creating a Conexus, that’s available immediately. You push the button, and then you will see the contradictions in the data models, which will then perhaps require a change in semantics or a change in the ontology that you might otherwise have taken years to discover, and perhaps only after something bad happened. The experience day to day should really be transparent, because you’re no longer needing to develop a consensus and abide by a consensus with others. Once you have a Conexus, you’re just experiencing it the same way you’d interact with any other sort of database or with SQL. You have your version, and that remains the case, and you’re accessing it in relation to the automatically created linkage between all of the other definitions.
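
As a rough picture of what contradiction detection between subject matter experts can mean, the toy sketch below compares how several groups define the same term and reports the disagreements. It is a stand-in under assumptions, not the Conexus reasoning engine, which this conversation does not describe in detail; the `models` data and the `find_contradictions` function are hypothetical.

```python
from typing import Dict, List, Tuple


def find_contradictions(models: Dict[str, Dict[str, str]]) -> List[Tuple[str, str, str]]:
    """Return (term, first_group, conflicting_group) for every clashing definition."""
    first_seen: Dict[str, Tuple[str, str]] = {}
    conflicts: List[Tuple[str, str, str]] = []
    for group, model in models.items():
        for term, definition in model.items():
            if term in first_seen and first_seen[term][1] != definition:
                conflicts.append((term, first_seen[term][0], group))
            else:
                # Remember the first definition seen for this term.
                first_seen.setdefault(term, (group, definition))
    return conflicts


# Hypothetical viewpoints from three groups inside one organization.
models = {
    "clinic": {"diabetes": "yes/no"},
    "research": {"diabetes": "years since diagnosis"},
    "billing": {"diabetes": "yes/no"},
}
print(find_contradictions(models))
```

Here the clash between the clinic and research definitions surfaces as soon as the models are put side by side, instead of after many rounds of meetings.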

Kostas Pardalis 46:16
Okay, so Conexus is a tool that is used primarily by data engineers while they are maintaining, let’s say, the databases and the data lakes of the company? Who’s the primary user?

Eric Daimler 46:32
Yeah, it operates somewhat at the level between a data scientist and a data engineer. You’d often find that data engineers, although they’re maintaining a complex data infrastructure, often have less autonomy than they may wish; they’re told what to implement, whereas data scientists maybe have more flexibility to get done what they want to get done. This is often driven by engineers operating in a business group that need to guarantee the meaning of their semantics. That’s how it was driven at Uber; that’s what’s driven most of our other clients. It’s engineers in a business unit, not driven by IT, that need to collaborate with their teammates and not spend a couple of hours a day, as we’ve heard is sometimes spent, exchanging Excel docs.

Kostas Pardalis 47:27
One last question before I give the stage to Eric: how long does it usually take from day one, when someone wants to build this ontology log and populate the reasoning engine, until they have everything in place and can start using it?

Eric Daimler 47:48
The big issue is developing the ontology log. That actually can take someone moving their job from something that’s implicit to something explicit, and there’s a large time variance in how long it takes to create one of those. But I’ll tell you, this last week we had somebody stay up after we told them about this over dinner. At 10:30 at night, they sent us an ontology log of their job. And then we coded it up in the semantics of a Conexus in 30 minutes and sent it back to them. And then they can start reviewing that and say, oh, is this the ontology log I meant, just in the language of SQL, really, and they can be good to go. The magic comes, of course, when you’re combining these different subject matter experts, these different ontology logs, into one Conexus. That’s where the power comes from, so the more the better, but it can go pretty quick. The time is in developing the ontology log. And then the weakness? It’s a question you didn’t ask, but it’s an important one. Where’s the weakness? Where does it fall down? There are some jobs that we can’t actually define easily. An example was told to me last week, where there was a person that walked around a big manufacturing plant and listened to the motors, and would designate parts: pull that part out, pull that part out. And how I heard the story was that nine out of 10 times the person was right. It was good preventative maintenance. And when that person retired, there was a large increase in the preventative maintenance budget. So they pulled the person out of retirement for an hour a week just to walk around and listen to the plant. That maybe can be replicated with machine learning and a microphone, but if you can’t take that implicit knowledge and put it down in a logical diagram, Conexus doesn’t have a lot to say.

Kostas Pardalis 49:41
Yeah, so I guess we’re not going to see somebody creating ontology logs for that kind of experience anytime soon. But I guess that’s fine, that’s okay. Eric, all yours.

Eric Dodds 49:57
Eric, a couple of questions here. I think your average person working in data probably doesn’t get to see the scale of 300,000 databases. 100 databases is a lot. 1,000 is a lot. So I’m interested to know: I feel like I have a good handle on the ontological log and how you prepare to do the math that’s required to reveal the relationships. But when it comes down to interacting with 300,000 databases, it’s hard for me to wrap my mind around what’s required on your end in order to actually interact with that many databases. Is that a significant infrastructural problem? Is that something that the customer enables on their infrastructure, which you have access to? How does that work practically when you engage, say, with an Uber that has 300,000 databases?

Eric Daimler 51:07
Yeah, Conexus is not a bit store company. We never will see the data in that way. So we’re not storing petabytes of data, unless maybe there are different security protocols. We have applications in defense and the intelligence community that may require a different sort of security protocol. But generally, that’s a complication of the individual implementation, just in respect of that sort of framework for deployment. The deployment is really a pretty light one because it’s all in the code. This isn’t Oracle that you’re deploying. It’s cloud-native, but it works with any of the cloud providers in some fashion. What Conexus does is really complementary to all of that. Conexus is just a way of doing what was previously difficult, if not impossible. I’ve got to say, some things were just infeasible. That’s what I meant to say: infeasible, if not impossible. And Conexus just enables that.
Your 300,000 databases is where we’re going in many cases. But that is just an example of a sophisticated system. Conexus works with a financial services company that had a goal of taking 86 databases down to one. That was their vision. Now, this is a bank, and this is actually the story that they told us, this particular well-known US bank: 86 databases down to one. They budgeted $20 million in years for this project. Five years and $120 million later, after people got fired, they went back and scaled down that problem: 86 databases to 16. They then budgeted another $100 million and five years, and exceeded it. But that’s the point where we then get involved, because they say, well, in the end, we still are left with a super fragile system, in that every time we acquire another company or divest one of our assets, we have to do it all over again, in some sense, or we introduce that degree of fragility. And that’s what developing a Conexus can provide for these firms that really can’t afford to have their data models be mutated.

Eric Dodds 53:34
Yep. A question on the discussion around two databases, five, 100, 300,000. At what point do companies start feeling the pain? What’s the breakpoint of complexity where Conexus really starts to add value?

Eric Daimler 54:02
There are two ways in which Conexus begins to provide value. One, the general proxy, is the number of databases, and there you can just do the simple arithmetic of a linear versus a quadratic explosion until you say, well, three, four, or five is when you really start to notice a difference. But the real answer, the more nuanced answer, is the sophistication of the data models. If you have homogeneous data models, then there’s not as much to say. But where you have a broad heterogeneity of data models, such as in engineering, energy, transportation, and manufacturing, then you really have a very big problem, where the consequence of failure is large, and that’s where Conexus provides the most value. That’s where people actually will come to us unsolicited.
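
The “simple arithmetic” of linear versus quadratic growth can be made concrete. The sketch below assumes, since it is not spelled out in the conversation, that the quadratic case means one mapping for every pair of databases and the linear case means one mapping per database into a shared model; the function names are hypothetical.

```python
def pairwise_mappings(n: int) -> int:
    """Mappings needed if every pair of databases is integrated directly."""
    return n * (n - 1) // 2


def shared_model_mappings(n: int) -> int:
    """Mappings needed if each database is mapped once to a shared model."""
    return n


for n in (2, 3, 5, 86, 300_000):
    print(n, pairwise_mappings(n), shared_model_mappings(n))
```

Under these assumptions the two approaches cost about the same at two or three databases, pairwise integration is already double at five, and by 86 or 300,000 databases the gap is several orders of magnitude.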

Eric Dodds 55:00
Yeah, absolutely. And we’re closing in on time here. But one thing that I would love for you to provide our listeners: we’ve talked about AI on the show, but we’ve never talked about the mathematical componentry of it, or certainly how that’s influencing a technology like the one you’re building. What would you say to someone who wants to begin to study this now, as technology like this experiences wide adoption? What are the best ways for them to start to learn this mathematically focused approach that you’re using?

Eric Daimler 55:43
Sure. Categorical algebra has been around since 1948. The mathematical discovery upon which Conexus builds happened in 2011, out of MIT, so I might look for texts and videos on applied category theory since that time. We mentioned Eugenia Cheng. One of our co-founders, Dr. Spivak, is another one, and one of his co-founders, Brendan Fong, in academia, and his collaborators. Those people are excellent resources to learn more about category theory and categorical algebra. That’s a great place for the mathematical foundation. You don’t really need to learn that, though. Often, as a programmer, as a coder, as a computer scientist, formal methods are maybe a nice, easy gateway into that. Some people might be more comfortable with type theory as a gateway into that. I often will say that category theory is like graph theory but with more structure, just because that’s my background. But I think people may also benefit from just remembering that there is the theory, category theory, and then there’s applied category theory, or what we call categorical algebra, which can often sound a little easier for people because you can immediately just think of examples that might be helpful day to day.

Eric Dodds 57:06
Very cool. Well, Eric, this has been just absolutely fascinating. Thank you for enlightening us on so many subjects. It’s not often that you get to talk about the White House and mathematical theory in the same conversation, but you have brought those worlds together. And we’ve learned a ton. So thank you, again, for spending some time with us today.

Eric Daimler 57:27
It’s just been fun. Thank you, Eric. Thank you, Kostas.

Eric Dodds 57:31
Kostas, I have three takeaways. I guess I keep breaking the rules more and more. The problem’s getting worse because we’re supposed to have one major takeaway, but here are my three. I’m just on a roll after Brooks not being here for a couple episodes and so I’m still sowing my wild oats, going off script. So one is just how smart people are. I felt like every five minutes we covered a subject where it was clear that Eric was deeply knowledgeable, probably on an expert, fundamental level, in all these different topics, which is really fascinating. My second one is that it was just a good reminder that a lot of times, we take it for granted that programming or working with data or solving problems around that has a foundation in math. It almost seems obvious, but I forget about that a lot. And so I think it was fun to see him really draw the direct connection between mathematics and some of the things that we do day to day, or the problems we solve. And then the last one, which is perhaps my favorite, was talking about the mathematics department at MIT, where you go in there and there are no computers and only blackboards, and it’s like staying true to mathematics. And I just love that mental picture. So how about you?

Kostas Pardalis 59:03
Yeah, what I’ll keep from the conversation that we’ve had is how much more work can be done in introducing new technologies and new ways of doing computation, and how this constrains the type of problems that we can solve. We might think that everything that could be solved is solved, but actually, that’s not the case. At the same time, I get this feeling of fascination at how such abstract concepts, like category theory, which started as a very abstract and technical mathematical tool, where they were trying to build the foundation for the rest of mathematics, let’s say, end up solving real-life problems that affect us when we call an Uber or Lyft to come and pick us up. Right? So one of the biggest joys that I have doing the work that I’m doing, and also the show here, is that I keep learning and relearning this again and again. It’s so, so fascinating for me. And it’s one of the reasons that I really love doing the stuff that I’m doing.

Eric Dodds 1:00:26
I agree. I agree. Yeah, I think when you think about the practical nature of the example he gave around something like diabetes, you realize, wow, not to be too dramatic, but on some level lives can be on the line if you get certain things wrong. So definitely some weighty stuff, and a fascinating guy. Many more great episodes coming up, and we will catch you on the next one.

We hope you enjoyed this episode of The Data Stack Show. Be sure to subscribe on your favorite podcast app to get notified about new episodes every week. We’d also love your feedback. You can email me, Eric Dodds, at eric@datastackshow.com. That’s E-R-I-C at datastackshow.com. The show is brought to you by RudderStack, the CDP for developers. Learn how to build a CDP on your data warehouse at RudderStack.com.