This week on The Data Stack Show, Eric and Kostas chat with Prateek Joshi, the CEO of Plutoshift. During the episode, Prateek discusses data that’s emitted from the physical world by sensors attached to all sorts of different devices, specifically in manufacturing, and data’s responsibility when it comes to sustainability.
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Eric Dodds 0:05
Welcome to The Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You’ll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, the CDP for developers. You can learn more at RudderStack.com.
Alright, Kostas, this is going to be such a fun show. One, I’m losing my voice for some reason. I think I talked too much in meetings yesterday. But two, we’re going to talk about a type of data and a context for data that we’ve never talked about on the show. And that is data that’s emitted from the physical world by sensors attached to all sorts of different devices, specifically in manufacturing. We’re going to talk with Prateek. He runs a platform that collects all of this data from all of these different physical devices and sensors in factories, which is fascinating. My big question is going to be about the data itself. When we talk about data in the context of a normal business, you have your transactional data, you have your analytics-type data, and we talk a ton about that on the show. But I’m just really curious to know what it’s like to work with data that’s being emitted from sensors. So that’s what I’m going to ask. How about you?
Kostas Pardalis 1:26
Yeah, I’d love to learn more about what it means to instrument the physical world, how we can collect this data, and what the challenges are. And do we need a different kind of stack to work with this data? I think we have the right person today to have this discussion. So let’s jump in.
Eric Dodds 1:47
Prateek, welcome to The Data Stack Show. We are so excited to have you.
Prateek Joshi 1:51
Thanks. It’s great to be here.
Eric Dodds 1:54
All right. Okay, so I have so many questions about data in the physical world and sustainability. But of course, let’s start where we always do, give us your background and what you do today at Plutoshift.
Prateek Joshi 2:09
Sure. I am the founder and CEO of Plutoshift. Plutoshift is a data platform for industrial sustainability. We help industrial companies achieve their sustainability goals and lead smarter operations. What that means is operational data, like pressure, temperature, and flow rate, flows into the platform, and what comes out are metrics that the operations teams can use to monitor that physical infrastructure. We help companies that make physical products like beer, ketchup, and cheese. These operations are very continuous, so keeping an eye on them at all times, and detecting and predicting events of interest, is of utmost importance to our customers. So that’s what we do. At the end of it, we help reduce the consumption of resources like electricity, chemicals, and water, which are very critical for carbon footprint. With regards to my journey into machine learning: machine learning is pretty much all I’ve done in my career. I started my career at Nvidia, and over the last few years ended up building systems for mobile, cloud, and edge. I’ve worked across a number of data types like images, text, and time series data. And along the way, I ended up publishing a few technical books on the topic, mostly oriented towards developers, on how to build production machine learning systems. How do you think about applications data? How do you make sure that it addresses a variety of use cases? So yeah, that’s been my journey into this field.
Eric Dodds 3:59
Very cool. Okay. I’d like to start with something that you mentioned when we were chatting briefly before the show. You said that in terms of data tooling, so for every 100 tools in the world of software, there’s like one tool for the physical world. So, which is interesting, right? The world of software is really young, but there are new data tools every day. The space is just completely exploding. So I’m interested to know why you think that is because intuitively, I would think, Okay, well, manufacturing has been around for way longer than software and so why are there so few sort of advanced data tools for the physical world, especially when you think about manufacturing and physical processes like that.
Prateek Joshi 4:51
That’s a wonderful question. When we talk about tools, we mean software tools, right? For the world of software, we have many, many tools, precisely because they are software native, meaning software, by definition, is native to itself. When you build a tool for software, they play very well together by design. So let’s say you have a data center, or a large number of servers you need to keep an eye on, and you build a tool that can take in that data and provide you with the metrics. All of that is in the same game, the same framework. The physical world has been around for way longer, and at least in the past, if you go back 10, 15, 20 years, it wasn’t instrumented in a way that all the tools could play nicely with it. So any infrastructure we use to build software lends itself nicely, by design, to all the tooling. But in the physical world, the infra was not commoditized, meaning collecting the data, storing it, retrieving it, connecting to the physical source, and making sure it’s continuous and doesn’t break. Also, the people who run the physical systems have their own requirements. They’re not software engineers. Because of this gap, the software world really went way, way ahead. Now, in the last 10 years, physical infrastructure has been getting instrumented a lot. In fact, today, I would say most of it is commoditized. Meaning, if you have a large number of pumps, or membranes, or filters, you install sensors that are connected to the internet, they beam data to the gateway, then to the cloud, and then to a nice data warehouse. From there, you can connect to it via API and build amazing applications. All of that happened in the last 10 years. For any company or any developer to build any kind of tooling, this infrastructure needs to exist.
And that’s what’s happening now. So that’s how I look at the difference in the number of tools that exist, and physical infrastructure is now definitely catching up.
Eric Dodds 7:26
Yeah, that’s interesting. Another thing and I’d love because I really don’t know a ton about physical manufacturing. But the other side of it is your point on sort of software sort of being native to itself, almost right. Like it, it naturally produces data, right, or, like, interacts with or relies on data. And so there’s sort of a fundamental principle there. Whereas in the physical world, let’s say there’s a process where there’s canning or bottling or something like that, right. And I know, this is changing, as you said, but the machinery is designed to put caps on bottles, right? Not necessarily, like produce data, right? It’s designed to, like, do a very specific thing at a certain speed consistently and you need enough data to like make sure that the machine keeps running. But it’s not necessarily like a fundamental layer in the structure of the thing itself.
Prateek Joshi 8:25
Right. That’s actually a very good way of putting it. And that’s precisely the point: by design, as you said, a process in the physical world is designed to make that product. In this case, let’s say the product is a beverage. The process is designed to handle the steps in between and produce the result, which in this case is the beverage. Along the way, if you want to keep an eye on it and use that data to make decisions, you need to instrument the process. By itself, it’s not going to just produce the data. And actually, it works in a similar way in software as well. Let’s say you have a large number of servers or applications. Unless they’re instrumented to beam data to you, they’re not going to do it by themselves. But you almost do it because it’s there. It’s easy to do, and it feels native to just do it. It’s not like a separate task. You write a function, you do a quick test, you want it to beam data. So yeah, it’s a different framework.
Eric Dodds 9:42
Yeah, absolutely. Instead of opening a code editor, if you want to instrument a physical machine, someone has to go take the machine apart and install something. I’m going to steal one more question, Kostas, just because I’m loving this so much. Let’s talk about the types of data. In the world of software, there are certainly various types of data, but you have your usual suspects, right? You have customer issue data, call logs, customer records, clickstream events, blah, blah, blah, financial transactions, inventory, all that sort of stuff. You talked about pressure, temperature, flow rate. Those are all very different units, and I’m sure the instrumentation for capturing those is different. What is it like dealing with that data? Is it just a totally different world than working with your standard data that drives, say, BI for a consumer business?
Prateek Joshi 10:53
In the physical world, the analogy I like to use is that there’s process and there’s asset. The equivalent of that in the cyber world is server and application. One is a real thing you can touch and feel. The application is abstract: you define the application and it does a certain thing. Similarly, in the physical world, an asset is a pump or a membrane, and a process is, hey, I need the 79 membranes to do X, Y, and Z, and here are seven steps to do it. So that’s what we work with in the physical world. The first type of data that ends up coming out is the operational data, the sensor data I mentioned: temperature, pressure, flow rate. All of that is measured by sensors that are installed at the right location. Then we have ERP data. ERP is enterprise resource planning. Basically, it contains data about how many resources were consumed, when, and what the scheduling was. All of that information comes from ERP. And the third type is CMMS, maintenance. Basically, the infrastructure is being managed by the operations teams. And what do they do on a daily basis? If they do something, they have to make a note, because if your shift ends and the next person comes in, they need to know: hey, membrane number 72 underwent a maintenance event, or pump number nine has been consuming 4x the amount of electricity, so we need to take a look. The CMMS system stores all these maintenance events. These are the three main types of data we end up working with. And again, within each type, there is so much variety. Now, building a product in the physical world works very similar to a workflow management tool, meaning you have to define a workflow, and you have to define what data and what format it accepts.
Once you define that, the workflow becomes well defined, the model plays a role, and it outputs the specific metrics in real time. That’s what we end up doing. Because you’re right: if you don’t define anything, the variety is so great that it becomes a custom project every time. So to build a product, we’ll say, okay, let’s say there’s a workflow called membrane monitoring. By definition, it’s going to monitor membranes. And it’s going to accept, say, 14 columns of data. Column number one should be temperature, column number two is pressure, and so on and so forth. So basically, you define the types of data that can enter and the right format. It goes through the workflow, which is what we design. It’s the design of the product. And then it outputs the metrics that can be consumed on mobile or cloud. And yeah, that’s how we ended up productizing this offering.
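To make the idea of a workflow with a fixed input contract concrete, here is a minimal Python sketch. It is only an illustration: the column names, the `EXPECTED_COLUMNS` list, and the `validate_row` helper are assumptions made for this example and are not Plutoshift's actual schema or API. Only three of the "say, 14" columns are shown.

```python
from dataclasses import dataclass

# Hypothetical column contract for a "membrane monitoring" workflow.
# The conversation mentions temperature first and pressure second;
# flow_rate and all identifiers here are illustrative assumptions.
EXPECTED_COLUMNS = ["temperature", "pressure", "flow_rate"]


@dataclass
class ValidationResult:
    ok: bool
    problems: list


def validate_row(row: dict) -> ValidationResult:
    """Check that a row has every expected column and that each value is numeric."""
    problems = []
    for name in EXPECTED_COLUMNS:
        if name not in row:
            problems.append(f"missing column: {name}")
        elif isinstance(row[name], bool) or not isinstance(row[name], (int, float)):
            problems.append(f"non-numeric value in column: {name}")
    return ValidationResult(ok=not problems, problems=problems)
```

A row that satisfies the contract passes through to the workflow, and anything else is rejected with a list of reasons, which is one way to avoid every deployment turning into a custom project.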
Eric Dodds 14:00
Yeah, super interesting. Okay. Kostas, I’ve been holding the mic too long.
Kostas Pardalis 14:07
Oh, no, no. That was awesome. So okay, a couple of questions, Prateek, but I’d like to start with an observation, because I’ve been hearing you talk all this time, and you’re describing processes that engineers are probably familiar with: how we instrument and collect the data, and use the data to figure out what works and what doesn’t, and all that stuff. And there is something similar happening in the physical world. Now, in the digital world, in cyberspace, we have very specific technologies and stacks that we are using, right? We have the Snowflakes of the world out there. We have data warehouses, we have time series databases, we have all the things that you can find in the AWS catalog, which has like 179 or I don’t know how many products. Is there any equivalent? What’s the stack in the physical world? You mentioned a few things, like sensors, the ERP tools, and all that stuff. But if someone wanted to go out there today and instrument a factory, what do they need? And what are the similarities to the digital world?
Prateek Joshi 15:28
Yeah. So actually, I’m going to divide the stack into two big parts, and we’ll dive into each. One is instrumenting a physical system and getting the data to a data warehouse. Let’s call that one part. The other part is, once you get it into a data warehouse, how do you process that? How do you structure it, extract knowledge, and deliver it to the end user? The second part works very similar to how it works in cyberspace. Actually, that’s a good thing, because once it gets to a Snowflake, we can use very good, well-defined, well-accepted tools to process that data and deliver a great software product. But the first half is where it gets very interesting, because that has no equivalent in cyberspace. If you are running a facility and it’s not instrumented in any way, it’s not doing anything in terms of data. Let’s take a simple example. Let’s say you are running 45 membranes and you want to collect some basic data. What you want to do is install basic sensors that can talk to the internet. Big companies have started manufacturing these at a very low cost. Honeywell is a good example, but many, many hardware companies sell sensors that can talk to the internet, and they’re very well-defined. Okay, so you installed a whole bunch of sensors. Now, in the facility, those sensors are not powerful enough to stand on their own, or they could be, but it’s going to be very expensive. So you install low-cost sensors that are connected to a gateway locally. Intel makes gateways, for example. Really, it’s like a big black box that can talk to the 100 sensors in the facility. The sensors beam the data to this gateway. The gateway is very powerful, with a lot of processing power. It can connect to the internet, and it can beam the data to the cloud, let’s say AWS.
Once the raw data comes in, the next step is we pre-process it and put it in a data warehouse or a data lake, depending on your choice. Once it gets there, after that it’s pretty standard. You connect to it via API, you run your query, you can do whatever you want, like build a good data intelligence product. At a high level, that’s how it works. And there are many firms called integrators. You just go to them and tell them, hey, I want to instrument my physical system. They’ll go out and shop for the hardware and the software you need for the sensors to talk to the gateway. They’ll figure out all of that for you, and they’ll just build it, and your factory is now instrumented. So that’s how it works here.
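The sensor-to-gateway-to-cloud path can be sketched in a few lines of Python. This is a toy model under stated assumptions: the `Gateway` class, its `upload` callback, and the batch size are hypothetical names chosen for illustration, and the upload function stands in for whatever cloud ingestion call a real deployment would use.

```python
from typing import Callable, List, Tuple

# One reading: (sensor_id, unix_timestamp, value). Hypothetical shape.
Reading = Tuple[str, float, float]


class Gateway:
    """Buffers readings from local sensors and forwards them upstream in batches."""

    def __init__(self, upload: Callable[[List[Reading]], None], batch_size: int = 3):
        self.upload = upload          # stand-in for a cloud upload call
        self.batch_size = batch_size
        self.buffer: List[Reading] = []

    def receive(self, sensor_id: str, timestamp: float, value: float) -> None:
        """Accept one reading from a sensor; flush when the batch is full."""
        self.buffer.append((sensor_id, timestamp, value))
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Send any buffered readings upstream and empty the buffer."""
        if self.buffer:
            self.upload(list(self.buffer))
            self.buffer.clear()
```

For a quick local experiment, passing `batches.append` (where `batches` is a plain list) as `upload` collects each forwarded batch so it can be inspected.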
Kostas Pardalis 18:24
How long does a project like this usually take, let’s say to instrument a factory? Because in software, we’re experienced enough to be like, okay, we’re starting a new project, let’s start the testing from the beginning, let’s write the unit tests, let’s do the instrumentation from the beginning. But a factory obviously does not work like this. You might have been a bottler for, I don’t know, decades, and you decide to go and instrument. How long does it usually take, and what kind of investment in resources is required from a company for something like that?
Prateek Joshi 19:04
Yeah, the way I’ve seen this work is that companies that are doing this for the first time choose a facility, and within that facility, they choose a specific part of it, just to see how it works. They want to get familiar with buying sensors, installing them, running them, and then having a steady stream of data. I’ve seen timelines vary a lot, so there isn’t an exact timeline every company will get to. But on average, considering all the people who need to sign off and the time it takes to get something running, I’ve seen something from six to nine months. And again, installing the sensors is very fast. The point is, if you’ve never done this before, you need to understand, okay, I have membranes from this company and I need this type of data, so who makes that sensor? It’s not exactly a copy-paste solution. So if you haven’t done it before, that’s what it takes. Going from there to the next facility and the one after that would be pretty fast, mostly because you know how to do it, you know who does what, and you know how the system works. Also, the good thing is that the hardware being sold in more recent years already comes equipped with all the sensors. So if you built or upgraded the facility in the last decade, you’re fine. But as you said, if you have been running it for a few decades and you’re doing this for the first time, that’s when you need to think about how to instrument it so that you can collect the data and get this running. You also asked about investment. Investment really depends on how many sensors you buy. Sensors are very, very low cost. But let’s say you have 45 membranes. Each membrane can be instrumented with just one sensor or with 100 sensors, right?
100 sensors obviously means you will have amazing data, with a very high level of detail on every aspect of it. So companies decide. Let’s say they have a budget that allows them to install only four different types of sensors on a membrane. Then they’ll say, okay, since I only get four, I’m going to choose the critical ones, like temperature, pressure, and flow. So that’s how it usually plays out. I’ve seen companies that just want very, very rich data, and they installed a whole bunch of sensors because they had a lot of budget. And I have seen companies where there’s only one sensor per big device. The problem is, if you don’t instrument it enough, it will feel like, oh, I invested all this money, where’s the outcome? There’s a certain minimum level of data richness you need to actually make something happen. So that’s where it gets interesting. People plan and make sure that they don’t invest a million dollars and have nothing come out of it.
Eric Dodds 22:04
Kostas, can I interrupt? Just one quick question. Does the data format vary by sensor manufacturer? Or is there sort of like an open data standard for say temperature or something?
Prateek Joshi 22:20
Luckily, they’ve standardized the way in which data flows. It’s basically time series data. People have mutually agreed upon this format, where there’s a timestamp and there’s a value. So if you ping a sensor, you’ll get a stream of data. In terms of frequency, again, it’s customizable. But that’s the good part: if you go to their data store and download the last 12 months of data, it’s pretty nice. It’s timestamp and value. And parsing that is easy, obviously, which is fantastic. There are so many time-series-oriented databases that you can use to explore that data.
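Because each reading is just a timestamp and a value, parsing an export is short. The sketch below assumes a CSV export with one `timestamp,value` pair per line and ISO 8601 timestamps. Both the file layout and the `parse_series` helper are assumptions for illustration, since actual export formats vary by vendor.

```python
import csv
import io
from datetime import datetime


def parse_series(text: str):
    """Parse 'timestamp,value' rows into (datetime, float) pairs sorted by time."""
    points = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        ts, val = row
        points.append((datetime.fromisoformat(ts.strip()), float(val)))
    return sorted(points)
```

Sorting by timestamp is a small safeguard, since readings that pass through a gateway and a cloud queue are not guaranteed to arrive in order.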
Kostas Pardalis 23:00
That’s interesting, actually. It’s good that we have consensus around data in at least one place in the world. A time series is a time series: you have a timestamp and some kind of numerical value. But okay, back to what we were discussing. Let’s say we go and instrument, and we start getting rich enough data on all these things. We store it in a data warehouse. What is the lowest-hanging fruit that you have seen companies go after as soon as they have the first data flowing out of their facilities? What’s the first thing they want, to get a return on investment from that?
Prateek Joshi 23:59
Yeah, resource consumption is the lowest-hanging fruit. And the reason is, let’s say that you installed sensors and you made this investment. These machine learning tools are built to enhance, at least in this case, the work of the operators. So you cannot let people go. That’s not productive. So apart from personnel cost, there’s resource usage. If you are a company that makes a physical product, you need to buy electricity from the city, you need to buy chemicals from another vendor, and you need to buy water, again from the city. These are resources where, if you consume less and still meet your throughput, nobody’s going to be angry. In fact, people are happy if their electricity bill goes down, or if the chemicals bill goes down. So the lowest-hanging fruit is: hey, we use X amount of resources to make one bottle of this beverage, so can we reduce that bill by 20%? Because it goes directly to the bottom line. So resource consumption is the first problem that gets attacked. Within that, chemicals and electricity are the most expensive line items. And also there’s no objection. If you go and present this to your entire team and the C-suite, nobody’s going to object, “How dare you reduce our electricity bill?” Nobody does that. That’s why we’ve seen a lot of people get excited about pursuing resource consumption as a problem.
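The arithmetic behind the 20% example is simple enough to write down. In this Python sketch the `annual_savings` function and every number passed to it are made up for illustration. None of the figures come from the conversation except the 20% reduction.

```python
def annual_savings(cost_per_unit: float, units_per_year: int, reduction: float) -> float:
    """Money saved per year if the resource cost per unit drops by `reduction`.

    `reduction` is a fraction between 0 and 1, so 0.20 means 20%.
    """
    if not 0.0 <= reduction <= 1.0:
        raise ValueError("reduction must be between 0 and 1")
    return cost_per_unit * units_per_year * reduction


# Hypothetical plant: 0.05 of resource cost per bottle, one million bottles a year.
savings = annual_savings(cost_per_unit=0.05, units_per_year=1_000_000, reduction=0.20)
```

Because the saving scales with production volume, even a small per-bottle cost adds up, which matches the point that the reduction goes straight to the bottom line.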
Kostas Pardalis 25:35
And what are some challenges around this infrastructure? Okay, obviously, the structure of the data is not such a big challenge. But what are the challenges of working with this data? Is it the volume of the data? Is it the reliability of the sensors? What are some unique challenges there?
Prateek Joshi 25:55
Yeah. One would be data standardization, or lack thereof. What I mean by that is, let’s say you build a workflow, and it’s supposed to get 14 columns of data. It works for one company, you go to the next one, and they say, oh, we don’t have columns number seven and 10, we only have 12. So what do you do then? And you go to a third company, and suddenly three columns stop beaming data, out of the blue. So how do you handle these edge cases? And also, this is a solvable problem, but the names are all over the place. Company one can call it temp underscore 562 underscore ABC, and company two can call it temperature underscore Batman. We can do some automation to figure that out, or put some people behind it to do it manually. But again, it’s not standardized. So that’s one. Two would be that a lot of the physical infra that is very critical to them is located at a distance, meaning it’s not in the office, and you cannot log into it remotely. So a lot of times, connectivity becomes an issue. There are many advanced facilities where connectivity is fine. But we do see this as an issue, where the data just stops coming in, and to fix that, somebody has to drive all the way to the site, somebody from their company, because obviously they have the permissions to do it. So connectivity is definitely a key issue. And third is the output of the workflow. Let’s say we have productized this and the workflow outputs all these key metrics. Let’s say those metrics are event detections for X, Y, and Z. Now, every company wants something slightly different, like, hey, I want to detect that event. Oh, no, no, I want to detect this other event.
One way to solve that is to let them stitch together what they need. We have many workflow tools in the software world where you, as a developer, go and stitch it together, and it works exactly the way you want. But in this world, since they’re not software people, you don’t want to overwhelm them with all this freedom and options so that nobody ends up using it. So it’s a bit of a trade-off on the product side: we want it to work out of the box, and at the same time, we want them to have the freedom to customize. These are some of the challenges that we face when it comes to physical instrumentation.
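The naming problem (one site's `temp_562_ABC` versus another's `temperature_Batman`) and the missing-column problem can both be illustrated with a small keyword-based mapper. This is a rough Python sketch and not the automation Plutoshift uses: the `CANONICAL` table, its keywords, and both helper functions are assumptions for this example.

```python
import re

# Hypothetical canonical column names and the keywords that map to them.
CANONICAL = {
    "temperature": ("temperature", "temp"),
    "pressure": ("pressure", "press"),
    "flow_rate": ("flow",),
}


def canonical_name(raw: str):
    """Return the canonical column name for a raw tag, or None if unrecognized."""
    # Split on anything that is not a letter, so "temp_562_ABC" -> ["temp", "abc"].
    tokens = re.split(r"[^a-z]+", raw.lower())
    for canonical, keywords in CANONICAL.items():
        if any(tok in keywords for tok in tokens):
            return canonical
    return None


def map_columns(raw_columns):
    """Map raw names to canonical ones and list expected columns that are missing."""
    mapping = {raw: canonical_name(raw) for raw in raw_columns}
    found = {name for name in mapping.values() if name}
    missing = sorted(set(CANONICAL) - found)
    return mapping, missing
```

Anything that maps to `None`, or any expected column reported as missing, would be handed to a person for manual review, which mirrors the mix of automation and manual work described above.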
Kostas Pardalis 28:42
All right. And okay, we have the data stored in the data warehouse. What do we do with the data? What are the tools that we are using there? And where does ML get into the picture?
Prateek Joshi 28:57
Yeah, in this case, the data goes through a few steps like pre-processing. Then it goes through a model, and we extract information to be presented to the operators. Where machine learning gets infused is when we have to detect the events of interest, when we have to prioritize all the different alarms, and when we have to predict or estimate an event that’s coming up soon. A simple example would be, let’s say you’re a beverage company, and you have 200 membranes working in series and in parallel. Raw water goes in, clean water comes out, and you’re supposed to maintain that infrastructure of 200 membranes. Usually what happens without any kind of software tooling, if you have to do it manually, is that you do a round robin, meaning you go to the first one, you check if it’s doing okay, then you go to the next one, and so on and so forth. By the time you come back to the first one, it’ll be weeks or months. And during that time, if it starts consuming 5x more electricity, and there are many reasons why it might start doing that, you won’t know. You’ll just get a giant bill at the end of it. So if you want to detect an event of interest across these thousands of data streams, you need machine learning to show that, hey, out of the 10,000 events, here’s the event that is anomalous. We detected it because it matters to you. Now go to membrane number 72 to fix it, as opposed to doing a round robin. Another example is predicting what’s going to happen, meaning, if you don’t do anything, in about 24 hours, here’s what’s going to happen. So predicting an upcoming event is another key issue. And this is where machine learning comes in very handy, because the baseline of the system itself keeps changing. When you first build a factory, everything is new and things are great.
As you operate it, the assets degrade, the processes change, the baseline keeps shifting, and you have to adapt to the new reality. So retraining becomes a part of your offering. Let’s say every 15 days or every 30 days, you have to update your own reality so that you don’t call out something as anomalous when it turns out it’s not anomalous and it’s just a false alarm. That’s another situation where machine learning is super useful. So yeah, a lot of this is centered on extracting event information and presenting it to the right people.
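The idea of flagging the one stream that misbehaves, against a baseline that keeps shifting, can be illustrated with a much simpler technique than a learned model. The Python sketch below uses a trailing-window z-score, which is a plain statistical stand-in chosen for brevity and not the approach described in the conversation. The function name, window size, threshold, and sample readings are all assumptions.

```python
from collections import deque
from statistics import mean, pstdev


def detect_anomalies(values, window=5, threshold=3.0):
    """Return indexes whose value is more than `threshold` standard deviations
    away from the mean of the previous `window` readings.

    The window slides forward, so the baseline follows gradual change in the
    equipment instead of staying fixed at commissioning-day behavior.
    """
    recent = deque(maxlen=window)
    flagged = []
    for i, v in enumerate(values):
        if len(recent) == window:
            mu = mean(recent)
            sigma = pstdev(recent)
            if sigma > 0 and abs(v - mu) > threshold * sigma:
                flagged.append(i)
        recent.append(v)
    return flagged


# Hypothetical electricity readings for one membrane, with one sudden 5x jump.
readings = [10.0, 10.2, 9.9, 10.1, 10.0, 50.0, 10.1]
alerts = detect_anomalies(readings)
```

One limitation worth noting is that the spike itself enters the window and inflates the baseline for the next few readings. A production system would need to handle that, which is part of why periodic retraining and more capable models come up in the discussion.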
Kostas Pardalis 31:36
Okay, so are we talking here about more of the same methodologies that have to do with working with time series data? Can you share with us a little bit more about the techniques and the ML tools that are useful in this kind of environment?
Prateek Joshi 31:55
Yes, a lot of this work is centered on time series analysis. With regards to the tools that we end up using, I’d say when it comes to machine learning, a lot of the modeling work is centered on recurrent neural nets. We have different flavors of that model, but recurrent neural nets are what we end up using quite a bit, because we deal with time series data a lot. In terms of the stack, it’s Python everywhere. That’s our team’s favorite language. On top of that, we use a lot of TensorFlow. In terms of databases, obviously, we use time series databases to handle a lot of the data. And in terms of the actual models, we’ve built modules that do different tasks. Detecting an event is one ML module, and predicting an event is another module. Both could be recurrent neural nets trained on different datasets. And that’s what we end up deploying out in the wild. When we work with a new customer, we start with the historical data, usually about two years’ worth. All of that is timestamped, and it goes through a standardization step, meaning we want to convert all of that data into something that our platform can read and understand. After that, we have automated the process of building the model, testing it, deploying it, periodically retraining it, and monitoring it for things like drift. So these are some of the tools. Some of them are built in-house, like for monitoring. For others, obviously, we use publicly available libraries that have been tested in the wild.
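Monitoring for drift can take many forms. As one deliberately simple illustration, the Python sketch below compares the mean of recent readings with the mean of the data a model was trained on and signals when the gap exceeds a tolerance. The `needs_retraining` function and the 10% tolerance are assumptions for this example and say nothing about how the in-house monitoring mentioned above actually works.

```python
from statistics import mean


def needs_retraining(training_values, recent_values, tolerance=0.10):
    """Return True when the recent mean has moved more than `tolerance`
    (as a fraction of the training mean) away from the training mean."""
    baseline = mean(training_values)
    if baseline == 0:
        raise ValueError("baseline mean is zero; relative drift is undefined")
    drift = abs(mean(recent_values) - baseline) / abs(baseline)
    return drift > tolerance
```

A check like this could run on a schedule, for example every 15 or 30 days, and trigger the automated rebuild, test, and deploy steps only when the baseline has actually moved.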
Kostas Pardalis 33:47
Yeah, and how important is the customer’s domain expertise for setting up the platform?
Prateek Joshi 33:55
Yeah, in this case, we take responsibility for the setup work, because it’s software work. When you work with a customer that makes physical products, they’re not software people, so we don’t want to burden them with implementation. Where the customer’s domain knowledge comes in handy is when the power users want to customize the platform for themselves. We have provided that ability to the user. Out of the box, it works, but if you are one of the power users, you’re free to drag and drop the modules and create new metrics and new graphs. And that’s what has worked well. I’d say a majority of the users just want it to work out of the box. They don’t want to play around too much. But there are those power users who definitely want the freedom and the flexibility to add or drop metrics, and they don’t want to call us for a simple change. That’s a fair ask, which is why we built it.
Kostas Pardalis 35:09
Cool. That’s all super interesting. To me at least, it’s very fascinating to identify both the similarities and the differences between instrumenting the physical world and instrumenting software. So it’s great. I couldn’t stop thinking that maybe I should just buy a few sensors, put them in my house, start measuring things, and pretend I’m doing something important. But that gets me to the next question, because it seems like we live in a time where I could do what I’m describing, right? I could go and buy a bunch of sensors for a very low price and, I don’t know, just monitor the quality of the air in my house, right? It’s pretty accessible to do that. So what do you see going on when it comes to instrumenting the world out there? And what kind of impact do you feel this will have on, let’s say, how we interact with the environment? There are a lot of conversations lately about climate change and all these things, right? So how do you see this physical instrumentation as a big part of this conversation about how humans like us interact and work with our environment?
Prateek Joshi 36:43
Yeah, the very fact that you start measuring something means you will start noticing it, and you want to improve it, right? You feel like you should do something about it. And that’s what instrumentation does. If you don’t even know how much carbon dioxide you emit, it’s out of sight, out of mind; it just wouldn’t matter. But if you instrument the physical infra, then the data is in front of you. At least now you know: okay, to make a bottle of water, or a bottle of beverage, or a bottle of ketchup, we consume this many resources, and each part of it has this associated carbon footprint. Just that fact will make you want to think: okay, among all these parts, I’ll attack the lowest-hanging fruit first, meaning that part consumes so much carbon that if I just take out that part, this can move the needle, right? So you will be able to identify needle movers for a company. And that’s pretty much the big part of it. If you’re a publicly traded company, and your job is to make shoes or tires or beer, that’s your primary focus. As a company, that’s your responsibility: to make that product, get it to customers, and generate revenue. So along the way, every other initiative falls to the sidelines. That’s what instrumentation does: once you instrument your physical infrastructure, a lot of this will stay front and center, which will make you want to do something about some critical problems, like how much am I consuming, my consumption of chemicals, my consumption of electricity. You know, big companies use a lot of water to make products. Now, once you use that, you’ve got to discard the wastewater, which means you have to treat it first before you throw it out in the lake, or else the government is going to come after you and many people will file lawsuits. So I think just knowing what’s happening within your company’s infra drives a lot of action.
And that’s what’s happening around the world. For all the forward-thinking companies, sustainability has become part of the practice, part of how they run their business, because not only do they get to meet the climate goals, but at the same time they get to reduce costs as well. Nobody wants to have giant inefficient parts in their supply chain. It’s just that instrumenting everything is expensive and slow. Or it used to be, but now people are doing more and more of that.
Kostas Pardalis 39:33
Yeah, so it’s interesting, because we started this conversation with the metaphor of Datadog for the physical world. Do you see space there for something that could be called the Fitbit for the environment, for example? Something that lets us see and actually measure the kind of impact that we have on the environment, and consequently on the quality of life that we have, and react to that.
Prateek Joshi 40:02
Yeah, actually it’s a very interesting avenue, and many companies are building their own versions of this product. Basically, if you run a facility, how do you keep track of its health, right? Health from many, many different angles. There are startups, there are big companies; many people are building amazing solutions. And I think it is definitely a key part of doing this. One of the things is, when we look at consumer products, something that you and I can just go to a store and buy, versus what’s available to companies, right? When we buy a device and put it in the house, I call it IoT. And the equivalent of that in the industry is industrial IoT, IIoT, right? The product experience is different. For us as consumers, things just work out of the box. We get a thing, we put it here, it starts working. It’s amazing. So I think a lot of the newer companies, like startups, are bringing that product experience to the industrial world, which is driving adoption, because nobody wants the headache of sitting and figuring out the 500 steps you need to get one temperature sensor to start beaming data, right? So I think that out-of-the-box product experience is very important, especially when it comes to installing hardware for physical facilities. I’m very excited to see a lot of the newer companies building amazing hardware that just works like magic. But obviously, if you’re a Fortune 500 company, you want a supplier that can meet all your needs, right? A startup will build the world’s most amazing sensor. But what about the 5,000 other types of sensors that a big company would need? So that’s where there’s a bit of a balancing act going on.
But I’m pretty sure that pretty soon some of these companies will grow up and be able to handle a lot of what’s required. Take Samsara, for example: as a company, they built amazing hardware, they went public, and now they have very, very big customers. So it takes time to build an amazing product suite. But yeah, I think the sooner that product experience comes to the industrial world, the more people will adopt it.
Kostas Pardalis 42:24
Absolutely. All right. So one last question from me, and then I’ll hand the microphone back to Eric. In terms of instrumenting the physical world out there, where do we stand in terms of adoption right now? Would you say that 5%, 10%, 50% of the industrial world, at least, has been instrumented, or is in the process of doing it? What’s going on out there?
Prateek Joshi 42:49
Yeah, we’re definitely past the single digits for sure. It’s not 50% yet, because the physical infra economy is mind-boggling. It’s spectacularly huge. And I’m just talking about America, right? In America, many companies, the big companies, have a lot of revenue; they have bandwidth and time and resources to innovate, right? But in many, many places around the world, first of all, there’s no budget, there’s no time, there’s no patience. There just isn’t enough money available to innovate. So I think we’re still in the early innings. At least in America, it’s definitely trending in the right direction. But when it comes to a lot of the physical infrastructure in Asia, where a large part of the world’s manufacturing happens, we are still in the very, very early innings. Only the richest and the most innovative companies can afford to do that, right? But the good thing is that innovation always starts this way: in the early days the tech is expensive, right? You build it, you deploy it, you innovate, and then the cost goes down, the quality goes up, and hopefully everyone else can catch up. So that’s where we are. But I think the market is so spectacularly big that even though we are in the early innings, there’s a lot to do just in the US. And obviously, across the US, Europe, South America, and Asia, it’s an interesting dynamic that’s playing out. But yeah, I think we are trending in the right direction.
Kostas Pardalis 44:36
Awesome. Eric, all yours.
Eric Dodds 44:41
All right. Well, one last question for you here, because we’re getting close to the end. But it may take up the whole time, because I have the sense that you’re passionate about this. It seems like you started Plutoshift in part to have a larger impact, beyond just getting metrics to someone who’s running or operating a machine. When it comes to sustainability, that’s a huge topic, right? In a number of ways, it cuts across multiple vectors of society, from the way that we live our daily lives and recycling all the way up to public policy, et cetera. So I’d love to know, when you think about data and what you’re doing at Plutoshift, what do you see as the responsibility of data when it comes to sustainability? Or maybe not responsibility, but in your mind, what’s your vision for how it could potentially impact sustainability?
Prateek Joshi 45:51
Yeah, sustainability, as you said, is so vast. First of all, all of this falls under ESG, and there are so many things you can count under that. But when it comes to industrial sustainability, which is what we’re talking about here, data is a way to measure what’s happening and be aware of it. And also, I think, to extract information such that you can take operational actions on a day-to-day basis to make the change. What I mean by that is, if you’re running a facility and, let’s say, you have a goal to reduce your carbon footprint by 20% in the next 12 months, that only happens if you take action on a day-to-day basis. Meaning you cannot let the pump run for two months on 5x electricity, right? You cannot let the membrane eat up a lot of electricity, and you cannot let that clarifier eat up 5x more chemicals, right? So it’s about hunting down these spikes, predicting these spikes, and pushing them down. That’s just one example of how to impact carbon footprint. So I would say data enables humans to take these daily actions, right, so that at the end of the year, all those things add up and you achieve a pretty substantial result. The key element to understand is that it’s not a big bang event, where you do something on this one day of the year and you’ll meet the goal. No, it has to be part of your day-to-day; it has to be part of your processes, right? Operations teams have to make it part of how they get work done. And the data is a fabric that connects all of this.
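The idea of "hunting down these spikes" can be illustrated with a short sketch. This is an assumption-laden example rather than a description of Plutoshift's models (which, per the conversation, are recurrent neural nets): it simply flags a reading when it exceeds a multiple of the median of the readings just before it. The function name `find_spikes`, the window size, the factor, and the pump data are hypothetical.

```python
from statistics import median


def find_spikes(readings, window=5, factor=3.0):
    """Return indices of readings above `factor` times the median of the preceding `window` readings."""
    spikes = []
    for i in range(window, len(readings)):
        typical = median(readings[i - window:i])
        if typical > 0 and readings[i] > factor * typical:
            spikes.append(i)
    return spikes


# Hypothetical hourly electricity use of one pump, in kWh.
pump_kwh = [2.0, 2.1, 1.9, 2.0, 2.2, 2.1, 10.5, 2.0, 2.1]

print(find_spikes(pump_kwh))  # → [6]
```

Because the reference level is a rolling median, this catches sudden jumps but would adapt to a pump that stays at 5x for weeks. Catching that sustained case would need a fixed baseline or a model of expected consumption.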
Eric Dodds 47:51
Sure, a follow-up question to that. One thing that’s interesting, when you think about having a better understanding of the systems or the machines that you’re operating: do you think that the type of work that you’re doing at Plutoshift is driving a lot of awareness? I’m just trying to put myself in the shoes of someone operating a machine or a process in a plant, right? Even if I care a lot about sustainability, I get a paycheck because I keep the machine and the process running without problems, right? And so of course I would want things to be more sustainable, but my primary objective is going to be operational excellence. Do you see that dynamic a lot?
Prateek Joshi 48:42
Oh, yeah, absolutely. We see that quite a bit. And it’s a fair point, right? Because at the end of the day, if the operations team meets the daily throughput requirements, they’ve done their job, right? Then there’s the person who is, for example, a global VP or another C-level person. They have to make things sustainable; they have to meet the sustainability goals. But for the person running the facility, running the machines, as you said, their job is to meet the throughput requirements. So to make this practical, a product shouldn’t introduce a new process. Meaning, if the operator is doing these seven steps in their day-to-day job, you shouldn’t introduce a new process, because, as you said, "I don’t need that. I know how to do this job and I’m going to do it, and I don’t have the luxury of focusing on sustainability." So the way we look at it is: you look at the seven steps, you look at the status quo, how the work gets done, and you accelerate one of those parts, or four of those parts, or hopefully all of the parts. You make it easier, better, faster for them. So to them, they’re not following a new process; it’s the same process. But now they get to save electricity, they get to save chemicals, they get to save water, and meet the sustainability goals. And that’s where the trick is, right? A simple thing: their job is to make sure all the membranes are running, right? Instead of round robin, we’ll just tell them, on their phone: go to membrane number 17, because that’s the one. I think they’re more than happy with that, right? They don’t want round robin. Nobody wants to do that. They want to know: when I’m coming to my shift at nine o’clock, what needs my attention? If there is the right message with the right reason, they’ll go do it. And that’s how we are addressing this. This is the key product adoption element, I would say.
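The contrast between round robin and "go to membrane number 17" comes down to ranking assets by how much they need attention. Here is a minimal sketch, assuming each asset already has some numeric attention score, for example from an anomaly model. The scores, the asset names, and the function `next_to_inspect` are hypothetical, not Plutoshift's API.

```python
def next_to_inspect(scores, top_n=1):
    """Return the `top_n` asset IDs with the highest attention scores, most urgent first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [asset_id for asset_id, _ in ranked[:top_n]]


# Hypothetical attention scores at the start of a shift (higher = more urgent).
scores = {
    "membrane-03": 0.12,
    "membrane-17": 0.91,
    "membrane-22": 0.40,
}

print(next_to_inspect(scores))           # → ['membrane-17']
print(next_to_inspect(scores, top_n=2))  # → ['membrane-17', 'membrane-22']
```

The operator's process stays the same: they still check membranes, only in a better order. That matches the point above about not introducing a new process.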
Eric Dodds 50:55
Love it. Well, this has been such a wonderful show. Thank you, again, for sharing your time with us. And what a treat to learn about a completely new world of data that we haven’t talked about on the show. So thank you for bringing a completely new topic to us.
Prateek Joshi 51:09
Of course. Thank you so much for having me on the show, Eric and Kostas. It’s been a wonderful discussion. I’m glad we got a chance to cover this topic.
Kostas Pardalis 51:19
Absolutely. Thank you so much.
Eric Dodds 51:21
Okay, Kostas, my takeaway is directly related to my big question from the beginning, which is the standardized data format coming from these sensors. It’s just so wonderful to hear that that’s not an issue. He almost breezed over it. He said, well, it’s a timestamp and a value, right? And everyone agreed that that’s how we’re going to do it. And he just kind of moved on. And I was like, man, if all data were like that, it’d be a much easier world. So yeah, how about you?
Kostas Pardalis 52:00
Yeah, absolutely. Part of the beauty of time series data is that it’s simple enough that humans can agree on how to represent it, which is great. But again, things start getting complicated after you start including other data that is also needed. Prateek talked about logging, ERP data, maintenance data; all that stuff is obviously not as simple in its structure as time series data. But regardless of that, it was very, very fascinating for me to hear how much progress we have made in being able to instrument physical processes, extract data, and reuse the existing stacks, technologies, and methodologies that we have in software to make some kind of sense out of this data, which is great. I think there’s this element of universality that software engineering might have, in how you can take a methodology and apply it to many different things out there. So those were very interesting parts of the conversation that we had.
Eric Dodds 53:14
Absolutely. What a fascinating show. Thanks for joining us again, and we will catch you on the next one.
We hope you enjoyed this episode of The Data Stack Show. Be sure to subscribe on your favorite podcast app to get notified about new episodes every week. We’d also love your feedback. You can email me, Eric Dodds, at eric@datastackshow.com. That’s E-R-I-C at datastackshow.com. The show is brought to you by RudderStack, the CDP for developers. Learn how to build a CDP on your data warehouse at RudderStack.com.