In this week’s episode of The Data Stack Show, Kostas Pardalis and Eric Dodds connect with IFTTT data scientist Peter Darche. IFTTT is a free platform that helps all your products and services work better together through automated tasks. Their discussion covered a lot of ground involving their data stack, their use cases, and clearing up once and for all how to pronounce the company’s name.
The Data Stack Show is a weekly podcast powered by RudderStack. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Kostas Pardalis: [00:00:00] Hello, everyone. Welcome to another episode of The Data Stack Show. This time I'm very excited because we will be interviewing Pete from IFTTT. Pete is one of the leading data engineers and data scientists there, and we will talk with him about the data stack that they have. We will learn how to correctly pronounce IFTTT, which I know is something many people wonder about.
And we will also learn more about the very interesting product that they have. I know that thousands of people out there are using it to automate tasks in their everyday life. So what do you think, Eric? What are you excited about from this episode?
Eric Dodds: [00:00:51] It's interesting. If you think about their product, it's not necessarily the type of product [00:01:00] that has constant in-app interaction, right? If I set up an automation to send me a notification every time something happens, I get the notification, but the jobs that are running in the app are running all the time. So what I'm interested in is a use case where they have business users and consumer users, but it seems like there would be more data generated by the jobs that users set up than by the users themselves. I'm just interested to know how they handle it, because that's a little bit of a different type of data. It's an event in a sense, but it's really a job that's running. So that's what I'm most interested in.
Kostas Pardalis: [00:01:47] Me too. Let's move forward and see what Pete has to say about all this. Hello, and welcome to another episode of The Data Stack Show. [00:02:00] Today we have Pete from IFTTT with us. Hello, Pete, would you like to quickly introduce yourself and the company?
Peter Darche: [00:02:12] Yeah, sure. Hi, it's great to be here. My name is Peter. I'm a data scientist at IFTTT, and I support the company in a number of different areas related to data within If This Then That. IFTTT is a tool for connecting things on the internet together. That's a pretty broad way of putting it, but it basically allows users to connect internet-connected services, whether they're software products like their email or social network, or hardware products like a smart home light bulb or [00:03:00] a smart plug, so that they can be more functional and can do things together that they can't do alone.
Kostas Pardalis: [00:03:08] That's great. So before we move on to the rest of the questions: I know that IFTTT has been around for a while, and based on the quick introduction that you've made, I'm pretty sure there are many interesting things that someone can do with a tool like IFTTT. Is there something interesting and not commonly known about IFTTT that you would like to mention before we start getting deeper into the technical stuff?
Peter Darche: [00:03:34] Let's see. Yeah, IFTTT has been around for a while, and one thing that repeatedly comes up, less a feature of the service than a recurring thing around it, is that people aren't totally sure how to pronounce the name.
They pronounce it I-F-T-T-T or IF-Tee. We pronounce it "IFT". So for anyone who's been [00:04:00] unsure, who has seen the logo but wasn't sure what all those Ts were or what the pronunciation sounds like: you pronounce it "IFT". In terms of the service itself and how people use it, I think people sometimes think of it either for automating their social media workflows around posting content to different services, or for their smart home automations.
But there's a huge variety of different services that exist on IFTTT and different things that you can connect together. Lots of people use webhooks to make web requests, and they connect fitness gadgets, all kinds of wearables, and other things like that.
Basically, if you think of all the different things that are connected to the internet, many of those types of things are on IFTTT, and people use them together.
Kostas Pardalis: [00:04:53] Yeah, that's great. I think it's very good that you mentioned the name, because it's a very common issue that I also see [00:05:00] with people I talk to. By the way, for me, the way that you pronounce IFTTT feels very natural, because my mother tongue is actually not English.
I'm Greek, so that's pretty much how we would say the name. And I found it very interesting, in my interactions with people here in the States, that they actually have difficulty figuring out how to pronounce it. Anyway, I think it was very good that you mentioned that. Moving forward, let's talk a little bit more about the product itself and the technology behind it.
I know that's not your responsibility at the company as a data scientist, and we'll get more into the data stack that you are using later on, but I'm pretty sure you have a very open architecture. It's pretty fascinating, the different technologies, tools, and applications that you can connect together.
Can you give a very high-level [00:06:00] description of the architecture of IFTTT and some key technologies that you are aware of and think have been important in the realization of IFTTT as a product?
Peter Darche: [00:06:14] So, the overall structure of the application: we have the user-facing apps, a web app and mobile apps, iOS and Android, and the web app is a Ruby app. We also have the infrastructure that we use for running all of the Applets, as we call them, which are the "if this then thats". That's also Ruby, and we use Sidekiq for queuing all those jobs. Over the [00:07:00] development of the product, pieces of the system have been broken out into smaller services: we have different services for handling realtime notifications from some partner services, which we use for executing our triggers instead of a polling system, and some other services in Scala, that kind of thing. All of that is running on Mesos and Marathon, which is what we use for container orchestration; everything is containerized. So that's the core of the app itself. Otherwise, we're on AWS, so we use RDS, S3, and the other basic building blocks people use at that layer.
Kostas Pardalis: [00:07:47] Yeah, that's great. Okay, let's start moving towards the stuff that you are doing at IFTTT. Can you give us a very quick, high-level introduction to the importance of data and analytics at IFTTT, which [00:08:00] is part of the work that you do there?
Peter Darche: [00:08:04] Yeah, data is very important to IFTTT, and we use it in all the places you would expect. We use it for our internal analytics and reporting, for monitoring our business and product metrics and how we're progressing towards various internal objectives. We use data for customer-facing analytics for services on our platform: we give them information about how their services are performing, how users are engaging with their services, what they're connecting to, etc. So we have some analytics products that are customer-facing in that way. We use it for search in our application; users need to find Applets, and there are lots of things you can do on IFTTT and lots of services, so the data team supports our search efforts. We also use data in our [00:09:00] data products, like our recommendation services for Applets, services, and service combinations.
We also use data for A/B testing and experimentation internally, and for other kinds of internal statistical analyses, or ad hoc analyses that we want to do to better understand our users, better understand usage of the service, et cetera.
Kostas Pardalis: [00:09:25] Okay, makes sense. Sounds like you are certainly a data-driven company. Data is driving almost every aspect of the company, from operations to the product itself, because of course building a service like IFTTT requires operating on and using a lot of data from different sources.
Peter Darche: [00:09:44] Yeah absolutely.
Kostas Pardalis: [00:09:46] Can you give us a little bit more information about the data stack and the technologies that you are using to support all the operations around data? I assume that, because you mentioned earlier that the product is hosted on [00:10:00] AWS, that's also the cloud infrastructure where you do all the operations around data. Can you tell us a little bit more about the technologies that you're using?
Peter Darche: [00:10:11] Definitely.
So we use a lot of the AWS tools. We stream in data from our primary data sources around the application, around client events and things like that, using Kinesis, and that gets written to S3. So we have an S3-based data lake. We use Spark for batch ETL and our recommendations, Airflow for orchestrating all of that, and Redshift for analytics and data warehousing. As I mentioned, we use Kinesis for streaming, and not just for ingestion of data; we use it in search as well and in a few other places. [00:11:00] S3 handles all of our object storage. We also run a number of internal services, often Flask microservices, that use either DynamoDB backends or Redis caches for various things too.
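As an editorial aside: the ingestion pattern Pete describes, events streaming through Kinesis and landing in an S3-based data lake, is often paired with date-partitioned object keys so each day's raw data is an immutable, reprocessable unit. Here is a minimal Python sketch of that layout; the `datalake/raw` prefix and the event field names are invented for illustration, not IFTTT's actual scheme.

```python
import json
from datetime import datetime, timezone

def partition_key(stream_name, event, prefix="datalake/raw"):
    """Build a date-partitioned object-store key for one raw event.

    Partitioning raw data by event date keeps each batch job's input
    immutable and makes any single day easy to reprocess.
    """
    ts = datetime.fromtimestamp(event["ts"], tz=timezone.utc)
    return (f"{prefix}/{stream_name}/"
            f"year={ts.year:04d}/month={ts.month:02d}/day={ts.day:02d}/"
            f"{event['id']}.json")

def serialize(event):
    """Serialize the event as the JSON payload stored at that key."""
    return json.dumps(event, sort_keys=True)
```

A consumer reading from the stream would call `partition_key` per record and write `serialize(event)` to that key, so downstream Spark jobs can read exactly one day's partition.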
Kostas Pardalis: [00:11:18] That's great. What are the types of data sources you collect data from? And can you also give us an estimate of the volume of the data that you're working with? You have mentioned technologies like Spark, Kinesis, and Redshift, so these are typical big data technologies. You don't have to be very precise, but it would be great to get an idea of what kind of data you're working with, and also what kind of sources you're collecting the data from every day.
Peter Darche: [00:11:55] Definitely.
So we have four [00:12:00] primary sources of data that we are ingesting into the data lake. One of them is the application data from the IFTTT apps. That has everything about our users, the Applets they're turning on, the Applets they're creating, the services being created, etc. We used to just do nightly snapshots of our database for that, but we switched over to reading from the binlog of our application database and streaming that data in. That gives us change data capture at a finer grain and reduces the window for producing insights based on that data. Otherwise, the other major data sources are from our system that does all of the checking and running of those Applets. We [00:13:00] have a large piece of our infrastructure dedicated to checking whether users have new events for their Applets and then running the actions, the "if this" and the "then that", which is the core of the service. All of the checks and transactions around that generate a lot of data themselves: we're doing something around half a billion of those transactions a day.
That's generating hundreds of gigabytes from that service. We also make a lot of web requests in the process of all of those transactions, so we have that request data as well, going through the different services that are part of IFTTT. That's another hundreds of gigabytes of data a day, and again hundreds of millions of transactions. So those are the primary sources in [00:14:00] terms of volume. And then we also have events from clients, which go through RudderStack, get sent to Kinesis, and then get streamed in as well.
So we're dealing with on the order of low terabytes per day of data being generated from those sources.
Kostas Pardalis: Oh, that's very interesting. You mentioned reading the binlog from your databases. Do you use a technology like Debezium for that, or some kind of service? How have you implemented that?
Peter Darche: We're using Maxwell. I think it was created at Zendesk. Maxwell is a daemon that runs and listens to the binlog, picking up the changes from the MySQL database, and it basically just streams them as JSON documents into Kinesis.
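To make the CDC idea concrete: Maxwell emits one JSON document per row change, with fields like `type` (insert/update/delete) and `data` (the row values). The sketch below shows, in pure Python, how a downstream consumer might fold such events into an in-memory snapshot of a table. It follows the general shape of Maxwell's output, but treat the exact field names and the single `id` primary key as illustrative assumptions.

```python
import json

def apply_change(state, raw_event):
    """Apply one Maxwell-style change event to a table snapshot.

    `state` maps primary-key id -> row dict. Inserts and updates merge
    the new row values over any existing row; deletes remove the row.
    """
    event = json.loads(raw_event)
    row = event.get("data", {})
    key = row.get("id")
    if event["type"] in ("insert", "update"):
        state[key] = {**state.get(key, {}), **row}
    elif event["type"] == "delete":
        state.pop(key, None)
    return state
```

Replaying the binlog events in order through a function like this reconstructs the current table state, which is exactly what makes CDC a finer-grained alternative to nightly snapshots.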
Kostas Pardalis: [00:14:52] That's great, really interesting. The change data capture model is becoming more and more [00:15:00] widely used lately; that's why I'm very interested to hear about it. Any challenges that you're facing right now with the data stack that you have and the data that you have to work with? What is the biggest problem, let's say, that you're trying to solve?
Peter Darche: [00:15:20] Well, let's see. With data, there are always lots of challenges. Let me think of some of the big ones. There are the perennial challenges around documentation, around knowing what data you have available, what data is there.
The IFTTT data team has been around for a while and has been producing metrics and reports for a long period of time. So we have lots of tables with lots of different reports, and knowing which data is available [00:16:00] where is always a challenge.
Other challenges: when you're computing a lot of metrics, you need to check that you aren't introducing errors into the computations, or that errors aren't being introduced by code changes happening somewhere in the system. Both the system that does all of the checking of our Applets and the IFTTT applications themselves are under heavy development, and there are dozens of code changes that we deploy every day for those. So it's pretty easy for something to happen there and end up affecting the data.
So seeing drift or other issues in some of the metrics, and making sure that all the data is correct, that's a perennial challenge. Data is difficult to monitor, [00:17:00] and it's hard to make sure that every metric you're computing is the way it should be.
Often the way problems get surfaced is when somebody, less often a customer, more often somebody internal, looks at a dashboard and says, "This number is much lower than it seems like it should be, what's going on here?"
And then you have to look into it. So being proactive about that, and being able to monitor all of those metrics, is a challenge. Those are definitely two big ones.
Kostas Pardalis: [00:17:43] That makes total sense. That's also my experience, to be honest. I think we spent the past few years building all the infrastructure and technologies to collect the data and to make sure that we can access all the data that's generated by the company.
[00:18:00] Now we're entering a phase where we need to start, let's say, operationalizing the data, and this is a completely different set of challenges, more related to questions like: is the data correct? If there's a mistake, where is it happening? If there's an error, how do you figure it out and how do you fix it? How do you communicate the structure of the data, and the processes around the data, to the rest of the team? These are very interesting problems, and I think the industry is still trying to figure out how to address them. Alright, let's start moving a little bit away from technologies and talk more about the organizational aspects around data. You mentioned at the beginning that IFTTT is a very data-driven company; data is driving almost every aspect [00:19:00] of the operations of the company. Can you tell us a little bit more about who, internally at least, the consumers are of the data, the results of the analyses you generate, and the data products you create inside the company, at least the most important ones that you can think of?
Peter Darche: [00:19:22] Yeah, for internal audiences there are a few. Product will use data: they'll be interested in how users are interacting with the product, what's working and what isn't. That's definitely one primary constituency we have internally.
They'll use data for experimentation: [00:20:00] if we're going to test out a change to a given feature, which variant will perform better. They'll also have KPIs that they're tracking in terms of new feature performance and other things like that.
Otherwise, the business team is a big consumer of the data that we have. We have a lot of customers who are services, and they're interested in the performance of their service, in engagement, in what they can do to improve their service, etc.
So our business team and our customer success team will want insights about how a particular customer's service is performing, whether there are new kinds of Applets they could develop to increase engagement with IFTTT users, or other ways they could modify or improve their service to increase engagement.
[00:21:00] Otherwise, the marketing team is interested in how various marketing initiatives are doing. We try to make sure that we can connect information from, say, our recommendation system to the emails that trigger outreach to users when they take certain actions, like connecting a given service for the first time; and if there are recommendations for a set of Applets, we might send those to them as well.
That kind of system then allows for meaningful reengagement with users. Internally, those are the main constituencies. Otherwise, our primary data products are more external: the analytics that we give to customers on our platform, and the recommendations and search [00:22:00] that we offer users through the apps.
Kostas Pardalis: [00:22:03] Yeah, I think by definition data teams have to interact a lot with many other teams inside the company, because the primary consumers of a data team's output are usually other teams inside the company.
I know that in many cases this can cause friction, and communication is always an issue, especially when we are talking about data and things that will help someone achieve their goals or make a decision.
So, do you have any best practices to share around that? I think it's very useful, because we always tend to focus more on the technology side of things when we're talking about data, but I think the people are also important. Any best practices or things you have learned from your experience at IFTTT on how to [00:23:00] operate a data team and communicate with other teams inside the company?
Peter Darche: [00:23:04] Yeah, let me see, there have been some things that have worked well. One internal constituency that I missed earlier is the engineering team: data is often used internally to support our understanding of the performance of various systems we have. One of the things that's worked well is having processes, or standing arrangements, where it's clear when certain data is used and how people are going to take action based on what's seen in that data. Internally, we'll review on a regular basis what the performance of various services looks like, so we can see on a regular cadence if we've had increases in error rates, or an increase in the number of transactions, or if a certain service has [00:24:00] gone up or down in performance significantly. Having that kind of process is helpful, because we know we're going to look at the data, and built into that there are actions that can be taken: we'll create issues or assign work to people to make improvements if anything comes up. So having those kinds of processes has been helpful for us.
Also, focusing on requirements gathering and usage before doing work has been valuable. We're a relatively small team, so it's important that we prioritize appropriately, which means it's important for us to do work that matters to people. When working with other teams internally, say the business team when they get an external request, we really push on clarifying [00:25:00] and talking through the data and the kind of insight we would provide before spending a bunch of time generating insights that might not be exactly what they're looking for. Those are two things, at least so far, that have been useful.
Kostas Pardalis: [00:25:23] That’s great.
So, moving to the last part of our conversation, and this time going back to technology: I would like to ask you to do something similar to what you did for the organizational side of things, and share with us some lessons that you've learned at IFTTT around building and maintaining data infrastructure at scale.
And when I say data infrastructure, I mean the technology, as we said, but also everything around generating, analyzing, and consuming the data, and any [00:26:00] best practices that you would again like to share around that.
Peter Darche: [00:26:05] Yeah, I think we keep learning some of the lessons around data engineering that are commonly considered best practices: making sure that jobs are deterministic, and being careful about the way you set up your ETL. A lot of the data that we have comes in raw, and a lot of the insights are derived from our initial computations and aggregations and how we manipulate the data.
The creator of Airflow, Maxime Beauchemin, wrote a really good blog post called "Functional [00:27:00] Data Engineering", and it goes through a number of principles around those things: having an immutable staging area where the raw log data lands, so that you can process any downstream data from it and be confident it hasn't changed, that it's in the state it was in when it was generated, keeping the computations deterministic, and so on. We've had some really good data engineers at IFTTT, and they've set things up that way, and it's definitely helped us a number of times when we've had to come back and reprocess data because there was a failure somewhere.
So that's definitely a lesson we keep learning: the value of making sure that you're following those good practices, because otherwise you can run into really big issues if you have to piece datasets back together [00:28:00] from a bunch of different sources and it's unclear where the data came from.
So that's one thing. The next is thinking about how people within the organization are going to interact with the data. We have engineers who want to query our data or access and generate reports, we have business people, we have a lot of internal stakeholders using it. So thinking about the tools you choose, how you can create new jobs, what those interfaces are like, and how easy it is for people to access things is important.
For example, we used to compute new metrics in Airflow using standard SQL templates, and everything would come from and go back into Redshift. Then we moved to the Spark-based data lake, and most of the [00:29:00] new metrics computations were being written in Scala and Spark. That was really good from the data product perspective, because it made it much easier for us to do things like expand our recommendation systems and add more machine learning to what we do, but it added more complexity to the process of creating a new daily metric.
So thinking about how you structure the way you make data available to different people within the engineering workflow is something else to think about. I'll pause there for now.
Kostas Pardalis: [00:29:44] Yeah, these are great points, and very interesting. I think it's also very interesting to hear that, in the end, there's always a trade-off. You always have to consider these trade-offs whenever you build complex engineering systems in [00:30:00] general, and this is even more true with data because of its nature. Great. So, last question: any interesting tool or technology that you would like to share, something that you have used lately or that you are anticipating using in the future?
Peter Darche: [00:30:17] Yeah, as I was mentioning previously, one of the challenges that we face is being able to monitor whether the data jobs and the metrics they generate have issues, whether they're failing silently in some way, or similarly with the machine learning models, whether there's been some degradation in performance. Something that we've started using recently, for a separate purpose, has been valuable, and I'd be interested in exploring it more.
We've [00:31:00] been using some of the anomaly detection functionality that newer versions of Elasticsearch's X-Pack make available in Kibana, and that's been helpful. We've been using it in the context of monitoring metrics around service performance, so we can get more real-time insight into when the services on IFTTT are experiencing some kind of problem. We'll use that either to alert the service owners, if the service is run by someone outside of IFTTT, or to notify engineers internally. I've been thinking about potentially using some of that functionality to monitor some of our important metrics, in case we see a number dropping on a given day, either for some service or somewhere else. Because there are so many different services to monitor simultaneously, it's hard to just look at a [00:32:00] chart and pick out the fact that something's going wrong.
So using that kind of automated anomaly detection around metrics is something that I'm interested in using more and looking into.
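X-Pack's anomaly detection fits learned models to each metric; as a much simpler stand-in that illustrates the same monitoring idea, here is a trailing z-score check in Python. The window size and threshold are arbitrary choices, and real metric monitoring would need to handle seasonality that this sketch ignores.

```python
from statistics import mean, stdev

def anomalies(series, window=7, threshold=3.0):
    """Return indexes where a metric deviates sharply from its trailing window.

    For each point, compare it to the mean/std of the preceding `window`
    values; flag it when the z-score exceeds `threshold`. Windows with
    zero variance are skipped since no z-score is defined there.
    """
    flagged = []
    for i in range(window, len(series)):
        hist = series[i - window:i]
        sd = stdev(hist)
        if sd == 0:
            continue
        z = abs(series[i] - mean(hist)) / sd
        if z > threshold:
            flagged.append(i)
    return flagged
```

Run over a daily metric per service, a check like this turns "stare at hundreds of charts" into "alert on the few services whose numbers jumped".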
Kostas Pardalis: [00:32:15] Yeah, that's great. I'd be very interested to hear in the future how this works for you, and what you learn from using these, let's say, engineering tools as part of data management and engineering. Thank you so much, it was great having you today. I hope we will have the opportunity to chat again in the future and learn more about what is happening at IFTTT and the amazing new stuff that you're going to be building there.
Thanks so much.
Peter Darche: [00:32:43] Thank you.
It was great.
Eric Dodds: [00:32:46] That was fascinating. I think a common theme that we've seen in the last several episodes is a discussion around how to get meaningful data out of a database [00:33:00] itself. We talked about change data capture with Meroxa, but IFTTT is doing something pretty interesting too. What do you think, Kostas?
Kostas Pardalis: [00:33:08] Yeah, absolutely. I think CDC is becoming a very recurring theme in these conversations. Pretty much most of the guests we have talked with so far have implemented this pattern: using CDC to capture, on time and in almost real time, all the data that their own application generates, which is quite fascinating. I think we will hear more about CDC in the future. What I found extremely interesting in the conversation with Pete is also everything we discussed about best practices, about how someone should approach working with data. It's one thing to collect all the different data, and all these new and [00:34:00] fascinating technologies like CDC and these patterns help with that. But on the other hand, the big question is: how do you use the data? Can we trust the data? How can we make sure that our infrastructure does not introduce issues into the data that we have to work with? That becomes even more important in organizations that are as data-driven as IFTTT, and I think he had some amazing insights to share with us about these best practices.
And I feel we will have many reasons in the future to chat again with him and delve deeper into these kinds of topics. So I'm very excited to talk with him again in the future.
Eric Dodds: [00:34:45] I agree. And I'm excited that I now don't have to work as hard to say the name of the company, because I was saying I-F-T-T-T, which is quite a mouthful. I hope all of our listeners feel the same way. We'll catch you next [00:35:00] time on The Data Stack Show. Thanks for joining us.
To keep up to date with our future episodes, subscribe to our podcast on Apple, Spotify, Google, or the player of your choice.
Get a monthly newsletter from The Data Stack Show team with a TL;DR of the previous month’s shows, a sneak peek at upcoming episodes, and curated links from Eric, John, & show guests. Follow on our Substack below.