Episode 11:

Why Modern Cyber Security is a Data Problem with Jack Naglieri of Panther Labs

October 21, 2020

This week’s episode of The Data Stack Show features a conversation with hosts Kostas Pardalis and Eric Dodds and guest Jack Naglieri, founder and CEO of Panther Labs. Panther, a San Francisco-based startup, is an open platform that helps security teams detect and respond to breaches in cloud-native environments, providing a modern alternative to traditional SIEMs.

Notes:

Highlights from this week’s episode include:

  • Introduction to Jack and Panther Labs (2:33)
  • The different pillars of data security (10:24)
  • Onboarding process for a company using Panther (18:40)
  • Thinking of security as a data problem (24:55)
  • Using S3 and other infrastructure suggestions that will be helpful in the long run (32:16)
  • Use cases for analyzing past and real-time data (39:20)
  • Panther’s data stack (42:54)
  • Open source technology being helpful for the community (47:57)
  • The future for Panther (54:39)

The Data Stack Show is a weekly podcast powered by RudderStack. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Transcription:

Eric Dodds  00:06

Welcome back to The Data Stack Show. Today we are talking with Jack from Panther. They are a tool that helps security teams automate the collection and analysis of security data, and I’m really interested in this conversation. Security is one of those subjects inside of companies that is kind of funny: everyone should and does care about it, but, you know, it’s one of those things that can kind of be a pain to deal with. So I’m interested to see what Panther’s building and how they fit into the stack. Kostas, from a technical perspective, what do you want to ask Jack?

Kostas Pardalis  00:49

Yeah, I’m very interested in this episode today for a couple of different reasons. One is, of course, the technical side of things and learning more about the amazing technology that they have built with Panther. But also because I’m lucky enough, I mean, I met Jack about a year and a half ago or something for the first time. It was like a couple of months after they started Panther. And I had the pleasure to see how quickly they grew, what kind of team they managed to build there, and the product that they have today. It’s amazing what kind of progress they have managed to make, which of course has a lot to do with the people involved, but also with the kind of product and the problem that they’re solving. It’s going to be extremely interesting to talk with Jack today and learn more about security. Security is one of these things that everyone has heard about, but I don’t think that many people really know what security involves in companies today. It’s going to be extremely interesting to hear how security is performed today, what kind of problems there are, how Panther is actually addressing them, and what kind of technologies they are using. So really looking forward to it. Let’s see what Jack has to say about it.

Eric Dodds  02:06

Sounds good. Let’s jump in.

Kostas Pardalis  02:08

Hi, Jack, and welcome to The Data Stack Show. It’s really nice to have you here today. How are you?

Jack Naglieri  02:13

I’m good. I’m good. How are you doing?

Kostas Pardalis  02:15

Very good. We also have Eric here with us. Hi, Eric.

Eric Dodds  02:19

Hello, good to be here after missing out on a couple of episodes.

Kostas Pardalis  02:24

Yeah, it’s great to have you back. So Jack, would you like to start by giving us a bit of background about yourself and also Panther Labs?

Jack Naglieri  02:33

Yeah, of course, of course. So I’m Jack Naglieri. I’m the founder and CEO of Panther Labs. It’s a cybersecurity company founded in 2018, based out of San Francisco, California. And what we do is we build software to help security teams detect and prevent security breaches. My background is in security engineering. I was an analyst, I’ve done forensics, a little bit of everything, some application security, and probably the last five years have just been spent all on detection. So this involves taking a huge amount of security data, which could live really anywhere in a company’s environment, putting it all into a single place, and then analyzing it programmatically, looking for, quote unquote, security threats, or some type of suspicious behavior that could lead to a breach.

Kostas Pardalis  03:22

That’s great. So can you share a little bit more information about how the team is right now? I mean, you said that you started here in San Francisco. I know that you’re growing very rapidly, so can you tell me a bit more about the people involved? How is the team right now, whether you’re hiring, and all that stuff?

Jack Naglieri  03:43

Yeah, I guess I’ll give you some more info on the origin of the company. So we started back in 2018. That same year, I was actually working at Airbnb as a security engineer. I joined Airbnb in 2016, and my whole task was really to just build the detection out, you know, with the other members of the security team, and really focus it around Amazon: how do we protect, you know, everybody’s production environment from attacks? But more importantly, how do we get visibility and detect when, you know, really anything bad starts to happen? So I had an open source project called StreamAlert, and I was the lead engineer on that for several years. We built it internally starting in 2016. I open sourced it at a conference called Enigma in January 2017, and then from there on, it was just continuing to develop it as a community project. I eventually also became a technical lead on the team, and we hired more engineers. And then I became a manager, and I had the idea, really around that same time of becoming a manager, that this could be something that we just build into a company. There were a lot of problems that I wanted to continue to solve, just working on it full time, and Panther was really the chance to do that. So in 2018 I left to start the company. And, you know, we started with just a really small, agile team of like three people. And then later on, we had some great inbound candidates, like some who came from Amazon. So that ended up being great, and they brought a lot of helpful experience to the team, because they had been building the same tools internally at Amazon. And actually, a pattern that you’ll see really often is that a lot of advanced security teams end up building some version of what Panther is internally. But the problem with that, obviously, is that you have to maintain it.
It also takes a huge amount of engineering effort to really get a system like this running and sustainable over time. So, you know, they had amazing experience, and we’re really happy to have them on the team. And since then, obviously, we’ve grown considerably. We’re over 20 people now, and we just raised our Series A. It was a $15 million Series A led by Lightspeed Venture Partners. And, you know, we’ve definitely come a long way since the first three back in 2018. So it’s been a really awesome journey so far.

Eric Dodds  06:07

Hey, Jack, I’m interested to know: you said you were doing security work at Airbnb in 2016, and obviously they’ve been around longer than that. But I’d be interested to know, just from your experience there, and really from your experience interacting with your customers at Panther, at what point of scale and/or complexity does the security problem become material in the way that Panther solves it? You know, I mean, there are lots of startups out there, right. In the early stages, of course, they’re concerned about security, but not in a way that they need dedicated infrastructure, because their stack is probably incredibly simple. But is there an inflection point? How do you think about that at Panther, or just in general, having that experience?

Jack Naglieri  07:01

That’s a really good question. When I generally answer this question, it comes down to what industry the company’s in. So highly regulated industries, like finance, and, you know, anything else that has that emphasis on the data that you’re collecting, probably need a program like this earlier, because they need to have confidence that the sensitive data they’re collecting is secure, and that it’s not being accessed incorrectly or by some potential unwanted party. So for them, they want a program like this much earlier, whereas other teams only start to see it as they start to scale. So that’s in their growth stage as a company, or maybe at the point where they, you know, hire a CSO. Maybe it’s organizational. Actually, that’s probably the most common: we see that a company’s grown to a size where they hire a CSO, and then they start to bring in infrastructure people who are like, okay, let’s just take inventory of where we’re at, and let’s start to check these really table-stakes boxes off of our list. Like, do we know what’s happening in our production environment? Do we know when people are making changes to our AWS account? Do we know when people are logging in as admins? Do we know when those admins are getting assigned? You know, it’s really just going from zero to one. And a product like Panther really helps them be successful in that from the get-go, because at the end of the day, the company is going to continue to grow, and you need to be able to scale with that. And you need to be able to scale in a way that isn’t gonna break the bank, which is the problem today, and that also isn’t gonna operationally overload the team, which is another problem today.

Eric Dodds  08:46

Yeah, I was gonna say, you know, security is one of those things where it can be a little bit funny to talk about it, because even bringing up the question around the importance of security is weird, because of course it’s important, right? You’re dealing with customer data, and you have to have security measures in there. But it seems like it’s something that companies would want to start doing as early as possible, especially if they don’t have to build it internally, which seems just like a huge limiter.

Jack Naglieri  09:18

Yeah, the thing with security, though, is you really need dedicated people. It’s not one of those things where you can just kind of buy a SaaS product and call it a day, even though a lot of executives would want that. It’s just unfortunately not the reality today, and I don’t know if it ever will be. You know, it’s really a thing that you have to look at constantly, continually tune, and give a lot of support.

Kostas Pardalis  09:45

So, Jack, quick question, because that’s something that I also personally struggled a little bit to understand with security. Okay, I’m aware of the complexity of it, but I’m not really into it in terms of how it’s usually implemented in companies. And based also on what you said about always needing to have people there, can you give us a quick, let’s say, introduction to what security is in the end? How does it materialize inside the company? What are the basic components that make up the security infrastructure of companies today?

Jack Naglieri  10:24

Yeah, sure. So I mean, there’s a lot of different pillars of security, right. So you have application security, which is looking at the application of your production-level product, right. So at a company like Airbnb, the application security team is looking at the changes that they’re making to the Ruby on Rails app, and I’m sure they’ve probably migrated to something more advanced now. But they’re really responsible for that interaction between the customer and, you know, the main production application. So they handle that layer, right? Then there will sometimes be another team that’s dedicated to data security, where they’re just looking at the access to the production data, which could have PII, it could have financial data, it could be really whatever the company is storing. So making sure that it’s encrypted, making sure that the access is secure, and that there’s enough auditing and accountability there. And then there’s generally an incident response team, and this is more of the area that I’ve worked in historically. Incident response has a lot of different components as well, but basically, we want to be able to detect breaches and respond to them. And responding to breaches means, you know, we got an alert for something bad happening, we need to go in and understand exactly what happened, we need to contain the incident, we need to do a report, and then we need to put controls back into our environment to prevent that from happening again. So it really also depends on the organization, and it really depends on the company and what you’re trying to secure. But I would say, you know, application security, a data security team, and you’ll also really often see a compliance team, and their whole thing is just making sure that the company is meeting whatever compliance frameworks they need in order to either, you know, do certain types of business or go public, things like that.
So there’s a whole team dedicated to that, very normally. And then on the IR side, the teams that I’d worked in historically, I was in an IR team at Yahoo, and Airbnb, and also at Verisign. There’s a lot of components there. So you could have a forensics team whose whole job is just to take images off of systems and analyze what happened. This is really common if you have malware that has, like, exfiltrated data, and you need to understand what exactly the attackers did. There’s a whole team dedicated to that. And then there’s also infrastructure security, which is highly related to IR. This is the idea of collecting all this telemetry and doing this core detection, and this is always the area that I’ve been in. So they’re responsible for deploying tools to the production environment to collect this really helpful telemetry, like who’s logging into our production systems, or now it’s more abstracted to Kubernetes, like who’s logging in and making API calls for Kubernetes, or making API calls for GCP or AWS, and making these changes. And it’s really about understanding the state of that infrastructure, but also the activity within it. And that’s exactly what Panther is designed to collect. So Panther is a platform that collects all that data, puts it in a format that is very structured, and gives analysts and engineers the ability to write Python and do analysis on the data as it streams in. And that’s really what we feel is the best way to do that function today. And, you know, we’ve seen that for a lot of teams, as they’ve moved to the cloud, the amount of data has increased so much that they need a new platform. And that’s really what Panther’s designed for. So that’s kind of the lay of the land in security teams.
And, you know, I may have forgotten a couple other specialized teams, but I think those are the main core ones that you would see in those companies, with obviously a CSO above them kind of managing and directing everything.

Eric Dodds  13:56

Jack, it’s so interesting to hear you talk about the two separate disciplines of collection and detection. And this may not be true in Panther’s world, but just thinking about a similar paradigm with the companies we work with at RudderStack, people try to build our kind of customer behavior event streaming infrastructure internally as well. And one of the challenges that we see is that the collection piece is phenomenally difficult. If they’re trying to do it internally, most companies spend so much time just getting the collection done well that they don’t actually end up spending any time analyzing the data, right. So the most valuable piece of the puzzle is ignored. Is there a similar paradigm in Panther’s world of security? Like, is the collection piece pretty difficult, and so detection sort of gets neglected if people are trying to do this internally?

Jack Naglieri  14:58

Good question. So I think that collection and detection kind of go hand in hand today. They don’t always, and I think it also depends on the data source that you’re collecting. So there are certain data sources that are really well understood, and then there are others that are fairly new. The well understood ones are the ones that have just been around for longer. So you have things like event logs off of systems, right? Historically, that’s just been around for so long, and we have so much signal to determine, you know, what a breach looks like there. But then as we transition to cloud, I think there’s a little less certainty around, you know, how do we monitor SaaS applications for breaches? It’s just a different mindset. So I think collection and detection kind of just go with how well we understand the data source. In terms of the complexity, I think everything in Amazon is fairly easy to collect. They’ve done a good job of centralizing into S3, and that’s actually one of the primary ways that we recommend our customers get data into Panther, because S3 is so highly scalable, it’s very reliable, and it’s probably the most cost effective way at scale. And this is actually how they did it at Amazon. So I trust my engineers who spent, you know, over five years doing this at Amazon to give us that sort of intuition.

Eric Dodds  16:14

Sure.

Jack Naglieri  16:15

Yeah, I think the challenge actually comes when you try to collect data that is very unique to the organization. So something that is an internal application, for example, generally has its own format. And actually, even collecting syslog data can be really, really difficult, because you can arbitrarily format it. So it makes it really hard for us to say, like, oh yeah, just send your syslog data, because it could be in any format possible. So that’s also a challenge.
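
The syslog problem Jack describes can be sketched in a few lines of Python. The two sample lines, the regular expression, and the function names below are all assumptions made for illustration and are not taken from Panther; the point is only that the same event can arrive in different layouts, so a collector needs one parser per layout and a fallback for anything it does not recognize.

```python
import re

# Two hypothetical syslog lines carrying the same event in different layouts.
RFC3164_LINE = "<34>Oct 21 10:15:02 web-01 sshd[4721]: Failed password for root from 10.0.0.5"
CUSTOM_LINE = "2020-10-21T10:15:02Z|web-01|sshd|Failed password for root from 10.0.0.5"

# Pattern for the classic BSD-style layout: <pri>timestamp host app[pid]: message
RFC3164 = re.compile(
    r"^<(?P<pri>\d+)>(?P<timestamp>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<app>[^\[:]+)(?:\[\d+\])?: (?P<message>.*)$"
)


def parse_rfc3164(line):
    """Parse the BSD-style layout, or return None if the line does not match."""
    match = RFC3164.match(line)
    return match.groupdict() if match else None


def parse_pipe_delimited(line):
    """Parse a made-up in-house layout: timestamp|host|app|message."""
    parts = line.split("|", 3)
    if len(parts) != 4:
        return None
    timestamp, host, app, message = parts
    return {"timestamp": timestamp, "host": host, "app": app, "message": message}


def parse_any(line):
    """Try each known parser in turn; None means the format is unknown."""
    for parser in (parse_rfc3164, parse_pipe_delimited):
        result = parser(line)
        if result is not None:
            return result
    return None


print(parse_any(RFC3164_LINE)["host"], parse_any(CUSTOM_LINE)["host"])
```

Every new in-house format means another parser in that tuple, which is why "just send your syslog data" is hard to promise.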

Eric Dodds  16:44

Interesting. Yeah.

Jack Naglieri  16:45

I think overall, the huge advantage of just cloud and SaaS in general, this trend that the whole industry is moving towards, makes our job much easier, because it’s very highly structured and it’s really predictable, right. It’s very different from on prem, where, like I was just saying, syslog data can be in any format possible, and your entire application logs can be in any format possible. But as we move more to the cloud, there’s just so much more predictability, so it makes our job easier. And then what we can do, and this is actually what exists in Panther today, is we have detections that ship with Panther for a lot of these common log sources that we understand really well and the community understands really well, right. And there’s a lot of basic checks that everyone should be doing, like, you know, send me an alert if an admin gets created or something. There’s a lot of this really helpful signal that could help find privilege escalation or exfiltration. There are these common patterns across everything that we can identify. So we ship with some base rules that look for all of those basic things, and then the teams can go through and write their own, based on their own internal logic. Because another thing that we haven’t really discussed is that every company is completely different. They have different threat models, they have different infrastructure. And as a result of that, you need a system that can be highly customized to be able to, you know, work for their own business logic. And that’s a huge value prop, because you can define all these rules in Python, which is highly testable.
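
As an illustration of the "send me an alert if an admin gets created" check, here is a minimal sketch of a detection written as a plain Python function. The function names `rule` and `title`, and the choice of a CloudTrail-style `AttachUserPolicy` event as the signal, are assumptions for this example and are not presented as Panther's exact interface. The sample event is invented.

```python
# AWS-managed policy that grants full administrative access.
ADMIN_POLICY_ARN = "arn:aws:iam::aws:policy/AdministratorAccess"


def rule(event):
    """Return True when the event attaches the admin policy to a user."""
    if event.get("eventName") != "AttachUserPolicy":
        return False
    params = event.get("requestParameters") or {}
    return params.get("policyArn") == ADMIN_POLICY_ARN


def title(event):
    """Build a human-readable alert title from the matched event."""
    params = event.get("requestParameters") or {}
    actor = (event.get("userIdentity") or {}).get("arn", "unknown actor")
    target = params.get("userName", "unknown user")
    return f"Admin policy attached to {target} by {actor}"


# Hypothetical event used to exercise the rule.
sample_event = {
    "eventName": "AttachUserPolicy",
    "userIdentity": {"arn": "arn:aws:iam::123456789012:user/ops"},
    "requestParameters": {"userName": "new-hire", "policyArn": ADMIN_POLICY_ARN},
}

if rule(sample_event):
    print(title(sample_event))
```

Because the rule is an ordinary function over a dictionary, it can be unit tested with hand-built events like `sample_event`, which is the testability point made above.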

Kostas Pardalis  18:10

So Jack, can you give us a description of a typical onboarding or setup process for Panther? Let’s say a company comes today, and they would like to evaluate the product and see how it can fit into the security processes that they have. How does this usually happen with Panther? What’s the typical process that someone has to go through to set it up and start seeing the value out of it?

Jack Naglieri  18:40

Yeah, that’s a great question. So like I was saying a second ago, the main way that we recommend people get their logs into Panther is to get them into S3 somehow. And it’s not just limited to S3, it could be, you know, SQS, or SNS, or EventBridge, or something similar. But I think the core thing is that it gets into Amazon somehow, and then we can pick it up. So generally, what you’ll see is, for a new team who’s just really starting from scratch, they want to get their CloudTrail data first. It’s probably the data source that covers the most ground, and it’s probably the most valuable one they can collect. It basically has a really high return on their investment. So CloudTrail natively sends to S3, which is awesome. What you do is you go into the Panther console and you add a new data source, and then you basically give your Panther installation the ability to pull from that bucket. And we do all this with configuration-as-code. So we use CloudFormation and Terraform. You can choose whichever to run in your own internal workflows. And then we create some IAM roles and we set some things up so that when your data gets into that bucket, Panther gets those notifications, pulls it, and then parses it, normalizes it, runs all the detections, and then puts it into the data lake, and it will also send alerts if anything happens that, you know, we find. So that’s more the S3 side. And that covers things like VPC flow logs and CloudTrail; GuardDuty can write to S3, and there’s a lot of internal Amazon services that can write to S3. So that covers your groundwork, right. And then you have SaaS. So in our enterprise version of Panther, we can pull down the SaaS logs periodically, like every minute, I think, or maybe even sooner than that, depending on the source. And we hit all these different APIs and pull the data into Panther that way.
So it could be your G Suite, your Box, your Okta, your OneLogin, and the list goes on. And we’re continually adding more and more to this, based off of the feedback that we’re getting from customers, and really what the most highly valuable data sources are. So that’s sort of the second piece. And honestly, between both of those, you cover most use cases. And then if you want to pull in your on-prem data or your data from your laptops, all you really need to do is use a logging framework, like Fluentd or Logstash or anything that can write out to S3, and then you get the ability to pull that into Panther too. And actually, a really exciting feature we just shipped is the ability to define custom schemas. So what you can do is you can have a YAML file within the Panther UI that says, this is the structure of my custom data, like my internal application logs, or some internal tool that I wrote that gives us really helpful telemetry. You put that in, and then you basically say, this S3 bucket has these logs, and then we classify it, we put it into the data lake in a very structured format, and then, you know, we can analyze it as normal. So that’s a feature that just came recently into Panther. Yeah, the whole data aspect of Panther, I think, is also so interesting that, yeah, I would love to go deeper.
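
The flow described here (pull, parse, classify against a declared schema, normalize, run detections, store, alert) can be sketched as a small in-memory pipeline. Everything in this block is hypothetical: the schema registry is a plain Python dictionary standing in for the YAML custom schemas, the log type names are invented, and lists stand in for the data lake and the alert destination. It is a sketch of the idea, not Panther's implementation.

```python
import json

# Hypothetical schema registry: log type -> {field name: expected Python type}.
SCHEMAS = {
    "Custom.AppLogin": {"ts": str, "user": str, "success": bool},
    "Custom.AppDeploy": {"ts": str, "service": str, "version": str},
}


def classify(record):
    """Return the first log type whose declared fields all appear with the expected type."""
    for log_type, fields in SCHEMAS.items():
        if all(isinstance(record.get(name), typ) for name, typ in fields.items()):
            return log_type
    return None


def normalize(record, log_type):
    """Keep only the declared fields and tag the row with its log type."""
    row = {name: record[name] for name in SCHEMAS[log_type]}
    row["log_type"] = log_type
    return row


def failed_login(row):
    """Example detection: a login attempt that did not succeed."""
    return row["log_type"] == "Custom.AppLogin" and not row["success"]


def process(raw_lines, detections):
    """Parse, classify, normalize, and run detections over JSON log lines."""
    data_lake, alerts, unclassified = [], [], []
    for line in raw_lines:
        record = json.loads(line)
        log_type = classify(record)
        if log_type is None:
            unclassified.append(record)  # no schema matched; set aside for review
            continue
        row = normalize(record, log_type)
        data_lake.append(row)
        alerts.extend(d.__name__ for d in detections if d(row))
    return data_lake, alerts, unclassified


lines = [
    '{"ts": "2020-10-21T10:00:00Z", "user": "alice", "success": false, "extra": 1}',
    '{"ts": "2020-10-21T10:01:00Z", "service": "api", "version": "1.2.3"}',
    '{"foo": "bar"}',
]
data_lake, alerts, unclassified = process(lines, [failed_login])
print(len(data_lake), alerts, unclassified)
```

Keeping classification separate from detection means a new log source only needs a new schema entry; existing detections keep reading the same normalized fields.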

Kostas Pardalis  21:42

Yeah, we are definitely going to discuss more about this, because the whole feeling that I get all this time that we are talking is that security is actually turning into a data problem. I hear many, many terms that are usually used in other data-related products, like schemas, how you format the data and all that stuff, and also processing the data. So we will get to that, but before we go there, just a quick question. If I understand correctly, Panther right now is mainly working on AWS. Is this right?

Jack Naglieri  22:18

Yeah, so the infrastructure itself runs on AWS. That’s correct. We also host it as a SaaS, so it’s kind of abstracted away for most people.

Kostas Pardalis  22:26

Do you see the need there to also support other cloud infrastructures, for compliance reasons or whatever, because moving the data around is part of security and all that stuff? And do you plan to do that? Is that the plan for Panther, to give your customers the option to store the data in other cloud infrastructures?

Jack Naglieri  22:47

So the goal really is around being able to run components of Panther in other clouds. I don’t know if we’ll ever fully run Panther in other clouds, just because of the complexity of translating serverless into, you know, other clouds. That could be a little difficult. I think what I want to get to eventually is the ability to say, okay, I have data in Azure, I have data in GCP, I have data in Amazon, and we can have a single instance of Panther that takes all that data in. So maybe it stores some of it in some of the other clouds, and maybe it puts it all into one. It’s really interesting. I think the multi-cloud strategy is something that hasn’t even really been hashed out yet. You know, I think there’s a lot of different approaches, so we’re still figuring that out. But I definitely have the aspiration to support the other clouds.

Kostas Pardalis  23:36

Yeah, yeah. That’s interesting. That’s one of the reasons that I was asking you about this, because I feel there’s this kind of trend around multi-cloud deployments and all that stuff. So it’s interesting to see how this also affects security in general, and also how security has to operate in this environment. Alright, that’s great. So let’s go back to the security problem and how it relates to data in general. For me, as an outsider, to be honest, security was always something related to compliance, like many rules, things that we need to follow. That’s what I traditionally had in my mind around security, although I’m pretty sure that’s not accurate. But I’m also sure that I’m not the only person who thinks like this about security. But as you describe the product, and in general also how security works, it sounds like in the end it turns into a data problem. Can you say a little bit more about that? First of all, do you think this is true, and if so, why is it true, and why do we need to approach it as a data problem in order to succeed with security?

Jack NaglieriĀ  24:55

Yeah, it’s absolutely a data problem. It’s been a data problem for years. You know, I’ve been trying to solve this for, you know, over five, six years, and we always felt that like the, you know, the ideal solution for us is that all the data goes into some big data warehouse, right? This is how the teams have been doing it for years, you know, they collect their production level data, they put it into something like Hadoop, or, you know, they put it into some really scalable data warehouse, and then they search over it that way. They have, you know, workflows like like tools like Airflow, things like that. And really like that was the northstar that I always wanted to see in security, right? What you see in reality is someone deploying into like Splunk, or Elasticsearch, that has a very different way of handling the data. And it’s not to say that they’re bad, it’s just to say that like, at the certain scale that we’ve hit, they just become ineffective. And what you really need is you need that structured, scalable data warehouse, or data lake is now more widely acceptable to do our security job. Because in security, it’s very common not to detect something for, you know, maybe three to six months, maybe you get a letter from the FBI one day or an email saying, Hey, you got breached. And we’re like, oh, okay, we need to go back and look eight months ago, but we only have 90 days of hot storage in our Splunk. So we’re kind of out of luck there. You know, so the scale of the cloud is really restricted, a lot of people even just get the most basic monitoring done, just because the scale so high, you know, we’ve heard some people who want to collect, you know, 50 terabytes of data per day or hundred terabytes of data per day. And you know, that that’s an astronomical scale that you just cannot do with Splunk and Elasticsearch, you know, unless you’re gonna be willing to spend millions of dollars and having having like a huge team of individuals. 
But the beauty of Panther is that, because we’re using so much serverless, and because, you know, Amazon has built so many, so many of these great tools, we can take advantage of them, and we get the byproduct of big data, right, they’re just for free, basically, you know, the only challenge is, you have to know how to set it up and configure it, right. So to get it into a data warehouse, you need to structure the data that really like is the core problem, right? So I built this a long time ago, it was ability to like classify the data, guess its schema, and then like, force it into a schema that you defined, and then you use that to put it into like a segmented part of your data lake in S3 in that exact format. And then you have a table that you can use to to do schema on read. And then you can actually read the data. And you can do like, you know, huge searches over like terabytes of data. So, at the end of the day, like, that’s what we need as security teams, we need really structured data, we need it in a way that can go to petabytes, and it’s only going to continue to get worse, right? In terms of data volumes. It’s just, it’s just increasing more and more every year, you know, so this problem is definitely not going anywhere. But also, you know, the thing for processing data with Python, you’d need this strict schema, because we’re looking at certain fields. And, you know, if we don’t have it in this format, then we can’t do our Python rules. So there’s a lot of different reasons for why we need this in this format. But I’d say like the core one, excuse me, the core one is really around just sustainability and having a way to actually go through and search your data, six to 12 months back, and you know, not pull your hair out. Because you know, a lot of teams have this problem where like, it would take days to get a response. And then you realize that you you search the wrong thing. 
And that you have to go back and do it again, you know, so we’re just trying to avoid that pain that we felt so much in the past. SQL adds a little bit of complexity, like I said. I think those tools, Splunk and Elastic, they’re not bad products by any means; they definitely helped at a time when they were really needed. But I think just for the scale that we’re at, they’re just not working anymore. And the thing with SQL is that, yeah, there’s a little bit of a learning curve, but it’s extremely, extremely powerful. And when you learn how to really use it, and do these joins, and do the statistics, it’s going to change the game for security.
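
The classify, normalize, and partition flow Jack describes can be sketched roughly as follows. This is a minimal illustration and not Panther’s actual implementation: the schema names, fields, and the key layout are all hypothetical, chosen only to show how forcing events into a declared schema makes a partitioned S3 layout (and later schema-on-read queries) possible.

```python
import json
from datetime import datetime, timezone

# Hypothetical schemas: log type -> required fields and their expected types.
SCHEMAS = {
    "ssh_login": {"user": str, "host": str, "event_time": str},
    "s3_access": {"bucket": str, "requester": str, "event_time": str},
}


def classify(raw_line):
    """Return (log_type, normalized_event) for the first schema that fits, else None."""
    try:
        record = json.loads(raw_line)
    except json.JSONDecodeError:
        return None
    if not isinstance(record, dict):
        return None
    for log_type, fields in SCHEMAS.items():
        if all(isinstance(record.get(name), ftype) for name, ftype in fields.items()):
            # Keep only the schema's fields so every stored row has the same shape.
            return log_type, {name: record[name] for name in fields}
    return None


def partition_key(log_type, event):
    """Build an hour-partitioned object key (the layout is illustrative only)."""
    ts = datetime.fromisoformat(event["event_time"]).astimezone(timezone.utc)
    return (
        f"logs/{log_type}/year={ts:%Y}/month={ts:%m}/"
        f"day={ts:%d}/hour={ts:%H}/events.json"
    )


line = '{"user": "kostas", "host": "prod-1", "event_time": "2020-10-21T14:05:00+00:00", "extra": 1}'
classified = classify(line)
if classified is not None:
    print(partition_key(*classified))
```

A line that matches no schema returns `None`; a caller could set such lines aside for later reprocessing once the schema has been extended.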

Kostas Pardalis  29:10

That’s very interesting. So from what I understand, a big part of the problem, at this stage at least, is a data modeling problem, right? You have to take all these different sources, you might also have very ad hoc data coming in, and then you have to model them into one model that Panther defines. And then you can query the data with very specific semantics. So, I mean, in the data world in general, and what I’m talking about here is data analytics and data science and all that stuff, data quality is a big thing. It’s a big issue, right? Especially when you have to work with many different data sources, and where you have many different people who are actually affecting the data that will be transmitted. This is something that we also see at RudderStack: okay, we’ve collected data, that’s one thing. But then to operationalize this data, it’s a completely different story, and things like the quality of the data, how you can monitor the quality of the data, how you can enforce rules around how the data should be transmitted, and how to react when things go wrong, that’s a big thing. So is data quality also something that is important in your case? Or is it a bit different, because the data that you’re collecting is probably coming from, let’s say, more domain-specific data sources in a way? Is this a problem that you have? And if it is, how do you deal with it?

Jack Naglieri  30:41

Yeah, that’s a great question. So it’s definitely less of a problem for us, because by Panther’s nature, we are forcing all the data into that schema. And like I was saying before, a lot of SaaS data or cloud-based data is so highly predictable that we don’t have to really worry about the format very often. So generally, if there is a problem, it’s that we couldn’t classify it. And then it goes into a queue, and we can reprocess it once we’ve fixed the issue. But it’s pretty rare. I mean, honestly, once you onboard a data source and you use it for about a week, you can figure out all these little variations in the data, and then we can add that into our schema and call it a day. But it’s usually something we don’t have to touch very often; it’s just kind of a one-time cost. And then we get the benefit of just being able to repeat that across deployments.

Eric Dodds  31:31

Jack, I’m interested to know: you talk a lot about how, if you try to do this yourself, there are severe infrastructure costs, or there can be. Are there things that earlier stage companies could think about, just in terms of their architecture, that would make the jump to a higher level of security easier? I mean, it sounds like Panther makes that way easier, just in and of itself. But you’ve mentioned architecture multiple times, and I’m just interested to know, from your experience, are there things where you say, you know, I wish I would have done this that way, or looked more closely at this type of infrastructure, from a security standpoint?

Jack Naglieri  32:16

There are a lot of elements to that. For one, I would say at least collect your data and put it in S3, because then you at least have a forensic record of it, you know, and that’s something that is really easy to configure and that has a ton of value. But really, down the line, right? Let’s say you’re just an infra team, you’re getting set up. You don’t have a security team yet. But once you do have a security team, if all the data that they need is already in S3, it makes their life so easy. And let’s say they deploy…

Eric Dodds  32:44

A gift to them in the future.

Jack Naglieri  32:45

Absolutely. And then let’s say they deploy something like Panther that will actually use that data for detection, for storage, for analytics, things like that. Then they basically can go from zero to one very quickly, like within an hour, you know, it’s really quick.

Eric Dodds  33:01

Wow, that’s incredible.

Jack Naglieri  33:02

Yeah. And honestly, I’m always amazed at what we’ve created in Panther, because this type of system was impossible for us to create as security practitioners; we just never had the skill set at the time, you know. And that’s actually one of the things I’m really thankful for having the opportunity to do: to be in a startup, and run a startup rather, that is building this for other security teams to get value out of it. That’s a really fulfilling part of my job, right? And it’s going to make a huge difference in the next five to 10 years, in this next wave of security tooling. So yeah, I would definitely say, if you’re an infra team listening, put your logs in S3, please.

Eric Dodds  33:48

Jack, you talked about your job being rewarding. And I want to acknowledge the sensitivity of your customer relationships. But is there any way you could share maybe a story around how Panther caught something that could have been bad for a company? Obviously, you don’t need to use the company’s name, but I’m just interested if there are any stories like that, where someone running your technology was saved from a potentially big pain funnel because of some sort of security breach.

Jack Naglieri  34:21

So I can’t speak to specific alerting or intrusions because of, you know, privacy and respect for that. I mean, honestly, what we hear often is just that it’s very easy to get set up. And teams love the fact that they can write Python, and I think, in a sense, the Python is what enables them to detect new types of things that they couldn’t otherwise. So that’s a huge advantage for a system like Panther. We’ve heard that pretty repeatedly, and also the time to value’s been pretty quick. Once you get the data in, of course, which can sometimes be challenging in large organizations. But again, if you have all your data in S3, it’s pretty easy just to get going. It’s really more around the capabilities and the platform that we get that feedback on, versus, oh, it detects this specific type of thing.

Kostas Pardalis  35:16

Sure. So I have a bit of a more technical, architectural question. In data infrastructure in general, there are two main, let’s say, models that we usually think of, right? One is more the data warehouse or data lake centric system, where data is delivered to the destination, and then, in more of a batch mode, let’s say, we go there and process it and come up with insights, we can analyze the data, et cetera. And then there is also the streaming model, right, with stream processing on something like Apache Kafka, or what technologies like Spark from Databricks are doing. Of course, these two are complementary in the end, right? I mean, in many cases you will see them living together in a company. So in the security context, is one of these approaches more prevalent than the other, or are both needed? And how does Panther relate to these approaches, if it does?

Jack Naglieri  36:24

I mean, funny enough, Panther is one of the first tools that I’ve ever seen, I mean, aside from StreamAlert, my other project that I worked on, that actually uses both of these technologies together. Every other solution in security has been some homegrown or index-based searching system. So, you know, the status quo, and I actually just wrote a blog post about this that kind of goes into detail, but really, the status quo in security is log analytics platforms like Splunk, Elastic, Sumo Logic. And they don’t really utilize any of the tech that we just talked about, right? It’s their own sort of internal proprietary search engine, indexing engine, things like that. So you have Lucene for Elastic, which is the query syntax. And then for Splunk and Sumo, you have their own search syntax that they created, right? And under the hood, they’re using, again, their own proprietary solutions for searching and handling the data. But when we did StreamAlert, that was really the first time we even had the opportunity to use tools like Kinesis and Lambda and things like that, and get that really highly sophisticated data pipeline, because it’s serverless, and because Amazon exposed the service to allow us to do it without needing a huge ops team. You know, I went and did a talk at Netflix in, I don’t remember what year it was, I think it was 2017 or 20. Yeah, 2017. And I got a question from them, like, Well, why don’t you just do this in Spark? And I’m like, because I’m a team of two people. And we don’t have this massive infrastructure that we can just manage. But what serverless did is it made it really accessible. And a lot of these things that I’m talking about are pretty recent. Like, Lambda came out in, what, 2016, 2015? And Kinesis was fairly new as well. So we’ve basically been living on the bleeding edge for a while.
And we continue to do that at Panther. So all the Lambdas are using Go, and I think Go Lambda support even came out in 2018 or so; that’s the year that we founded the company. We continue to do stuff like that; we just use the bleeding-edge, Amazon-based solution. And that’s actually one of the things that makes it a little bit difficult for us to go multi-cloud, but we get the advantage that we can run it at a crazy scale, and we can do it at a fairly low cost. So that’s the trade-off that we’re making. But the fact that it makes it viable to do security at this cloud scale, that’s a good trade-off.

Kostas Pardalis  38:54

Well, that’s amazing. So with Panther, what are the most common use cases for the teams working with the data? Is it more about digging in the past and trying to address something that might have happened in the past, or is it more about having real-time alerts and trying to react as fast as possible to a threat or something that has happened? Or both? I mean, what do you see as the most common practice right now?

Jack Naglieri  39:20

Yeah, that’s a great question. There’s a couple pieces to it. So I think real time is really, really great at finding things that are happening right now. And a big thing in security, like I was saying before, is you sometimes don’t know that something’s happened until months after. Or maybe it takes a long time to issue a search on your data, because it’s so much data, right? So real time really helps you get that really quick feedback loop. The disadvantage with it right now is that it’s looking at a very limited window of time; it’s generally only looking at a handful of events, right? And to really detect certain behavioral patterns, you need to look at batches of events together. So that’s where the SQL comes into play. So it’s really interesting to combine both of them. Maybe you have a real-time detection that said, Hey, Kostas just logged into this prod box, and he is not supposed to be doing that, right? This is a really sensitive box that only this one team should have access to. That’s suspicious. So what we can do is go back in our data lake and say, show me all the other boxes that Kostas logged into. Maybe we don’t have a rule for that, but we have the data, because we’re collecting it. And that’s the other thing that makes the data collection and storage piece so important: you may not have a detection for everything, but you probably have the data. So if you can go and search the data really efficiently, then you can answer the questions that you couldn’t answer in real time at that moment. So they’re definitely very complementary solutions. And I think it kind of goes back to teams wanting that real-time response; they want to be able, within a minute, to go in and fix something if it’s needed.
Like, there are a lot of organizations that I’ve worked with that want to have everything fully automated. They want to know literally within a minute: if someone logs into a system, they want to get an alert and remediate it as soon as possible. And systems like Panther allow you to do that, because we can plug the alerts into things like SQS queues, or into webhooks, or into support platforms, and really take advantage of the automation.
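
The pairing Jack describes, a real-time rule on one event followed by a historical search over stored data, can be sketched as below. This is only an illustration under stated assumptions: the `rule(event)` shape, the `logins` table, its column names, and the policy dictionary are hypothetical, and an in-memory SQLite database stands in for the data lake.

```python
import sqlite3

# Hypothetical policy: the only users allowed on each sensitive host.
SENSITIVE_HOSTS = {"prod-db-1": {"alice"}}


def rule(event):
    """Real-time check on a single event: a login to a sensitive host by a non-allowed user."""
    allowed = SENSITIVE_HOSTS.get(event["host"])
    return allowed is not None and event["username"] not in allowed


def other_hosts(conn, username, flagged_host):
    """Historical follow-up: every other host this user has logged into."""
    rows = conn.execute(
        "SELECT DISTINCT host FROM logins "
        "WHERE username = ? AND host != ? ORDER BY host",
        (username, flagged_host),
    )
    return [host for (host,) in rows]


def build_demo_lake():
    """Create a tiny in-memory table that stands in for months of stored logs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE logins (username TEXT, host TEXT, event_time TEXT)")
    conn.executemany(
        "INSERT INTO logins VALUES (?, ?, ?)",
        [
            ("kostas", "web-7", "2020-03-02T09:00:00Z"),
            ("kostas", "staging-2", "2020-06-11T17:30:00Z"),
            ("kostas", "prod-db-1", "2020-10-21T14:05:00Z"),
            ("alice", "prod-db-1", "2020-10-21T14:06:00Z"),
        ],
    )
    return conn


event = {"username": "kostas", "host": "prod-db-1"}
if rule(event):
    # The alert fires on one event; the SQL search supplies the wider context.
    print(other_hosts(build_demo_lake(), event["username"], event["host"]))
```

The rule only ever sees one event, so it cannot answer "where else did this user go?"; that question is left to the SQL query over retained data, which is the division of labor described above.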

Kostas Pardalis  41:26

It’s amazing. It’s also very interesting, exactly because it sounds like security is the kind of context where real time and batch really need to coexist, and you need to have access to both of them, which is a very interesting use case. Usually, with other data problems, what I’ve seen is that you might need both, but for very different needs, and probably also different teams are involved there. So you have this situation where, for example, stream processing is more about detecting something that’s happening now, and you want to send out alerts, but your BI team, for example, probably never messes with that, right? They’re probably working only on batch processing and on your data warehouse. So that’s pretty unique, I think, in security. And I find it, from a data problem perspective, let’s say, very, very interesting. So Jack, you have talked a little bit about the kind of technologies that you are using. You mentioned serverless, you mentioned Lambdas, and how all these technologies have allowed you to deliver the product and the experience that you have right now. Would you like to go into a little bit more detail about your technology, the stack that you are using? I mean, okay, AWS, as we mentioned, but I think it would be super interesting to hear a little bit more about the architecture of the product and the technology behind it.

Jack Naglieri  42:54

I’d love to. I’m always really impressed, actually, with how the team has been able to evolve the architecture as well. So Panther started just as a set of Lambda functions connected together with some queues, some S3 buckets, things like that. So the first thing I’ll say is Panther’s fully serverless, meaning there’s not a single virtual machine that we manage when we deploy Panther. It’s all managed services. So if we start with the web front end, the web front end runs as a Fargate container. And if you’re unfamiliar with Fargate, it’s basically just a, quote unquote, serverless version of ECS. So you give Fargate a container, or an image, rather, and then it runs it as a container, and you don’t have to manage the underlying VM. So that’s the first piece. Our front end is a React app. Our middleware is GraphQL. And then our back ends are Golang Lambda functions, and we back them with things like DynamoDB and S3 and things like that. So again, the stack is completely serverless, which is amazing, because we get the advantage of scale, of low cost, and just low operational overhead. So that’s been really incredible, especially with the GraphQL layer; that is such a cool component of our architecture.

Kostas Pardalis  44:13

Yeah, it’s bleeding-edge technology, as we mentioned before, that’s very interesting. Quick question: why did you choose Go as the language for your Lambdas? I know that you can build Lambdas in many different languages. Why Golang?

Jack Naglieri  44:26

Good question. So with Go, when you’re stream processing, you get so much benefit from the performance of Go, and also the type safety aspects too. I think just in general, it’s better for the use case that we have, and that’s really the reason. We’d written it in Python in StreamAlert, and then we’d found that at certain scales it just was a bottleneck. So in order for us to run at, like, a 10x scale, which was one of the goals with Panther, I decided that we really should write it in a language that is more performant. That’s been really helpful for us, actually.

Kostas Pardalis  45:02

Makes sense. That’s good. And then you support Python as the language for the rules, right? Which is also a very interesting differentiator compared to other tools, as you mentioned, like Splunk, where you need to learn their query language, and then you have Elastic, which again, I mean, they have their own language there, and that’s one of the things that I always disliked about Elastic. But why did you choose Python? And how does this interoperate with the rest of the technologies that you have there?

Jack Naglieri  45:32

Great question. So Python’s probably the most widely understood language in security. So when we wrote StreamAlert, that was one of the huge value adds there, right? It was like, security engineers love to code in Python, so we’re going to allow them to write the detections in Python. And the second part of it is really that you get a ton of accessibility with Python. Now you don’t have to write some crazy long, proprietary search in Splunk, right? You can use things like classes, and you can use things like helper functions, and these really widely understood programming concepts you can apply to security now, which is awesome. So that was the reason that we kept that in Panther; it’s just such a highly powerful feature. Eventually, I think what will end up happening is we’ll have an option where you can define the rules in a YAML format, or some simple format. And then, for the teams who don’t write code, it allows them to still do their job and do it effectively. But I love the mantra of simple but powerful, right? At the end of the day, we’re allowing teams to write Python on classified events, right? On events that have gone through parsing and normalization, I should say. And in its essence, it’s pretty simple, right? We take some data, we put it in a format, and then you can write some Python on it. But I want to keep that going forward, as well as expand it to where you can choose how advanced you want to get. And that’s really what I aspire to have in the product eventually.
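
To make the point about helper functions concrete, here is a small sketch of a detection written as plain Python over a normalized event. It is an assumed shape for illustration and not Panther’s documented rule interface; the `rule` function name and the `user` and `source_ip` fields are hypothetical.

```python
import ipaddress


def is_internal(ip):
    """Helper that several rules could share: True for private network addresses."""
    return ipaddress.ip_address(ip).is_private


def rule(event):
    """Flag root logins that originate outside the internal network."""
    return event.get("user") == "root" and not is_internal(event["source_ip"])


print(rule({"user": "root", "source_ip": "8.8.8.8"}))
```

Because the helper is an ordinary function, the same check can be reused and unit-tested across many detections, which is harder to do with a long search string in a proprietary query language.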

Kostas Pardalis  47:03

Yeah, that’s great. And actually, as you were talking about it, I was thinking that, because you said that Python is probably the most widely used language in the security space, I think that’s another indication that security is probably a data problem in the end, because Python is the de facto language that pretty much every data professional is using. So yeah, that’s also interesting. So Jack, you mentioned at the beginning of our conversation your involvement in open source. You mentioned that, back when you were at Airbnb, you open sourced a project. I know that Panther is also pretty active in open source, and you have open sourced part of your product. Why do you think open source is important? What’s the value that Panther gets from the open source community? And what’s the value that Panther gives back to the open source community?

Jack Naglieri  47:57

Yeah, that’s an awesome question. So I think, all in all, open source has been so helpful for the security community in general. There are a lot of tools, for example, osquery, and Google has a tool called Santa. And now there’s a bunch of other really popular security tools; there’s one called ElastAlert for Elasticsearch. And they’ve really just helped to push the whole industry forward quite a lot. And more importantly, it allows the security engineer who’s starting from nothing to go on GitHub and pull a lot of their tool chain down for free and try it out, and get a ton of value out of it from the get-go. So I think it’s extremely valuable for new teams. But also, just in general, open source is helpful. Because, one, it adds a lot of transparency into the projects. It, I think, promotes better code quality too, because it’s all out in the open, and you have a lot of eyes on it. And also part of that is, you can have people who are very security-minded developers look at the code and say, oh, actually, no, this could be a problematic thing. This could be a vulnerability in your code, you should fix this, or this thing’s too permissive, you should fix this, or whatever. So you get that sort of free security consulting with it too, which is awesome. So I think it strengthens the project overall, with that added level of transparency, and then you really get the opportunity to have a bunch of people test your software on all these different types of infrastructure. And that actually makes it more resilient. So I love that aspect of it, too. That’s how I think about open source in general: it’s helpful for the community, and it’s helpful for us, who are working on the problem every day. And it helps us think about the problem maybe in slightly new ways. You get that diversity of feedback, I think, which is amazing, right?
Because when you’re a company, you have all the same people with all the same perspective looking at the problem every day, but then you get someone completely fresh, and they look at it and they say, Hey, have you thought about this? Have you thought about doing it this way? And we’re like, actually, no, we didn’t consider that. But now we will. And that, I think, pushes the whole project forward more. The last thing I think that’s also really important is you can kind of standardize on a lot of different methods of doing security, right? So osquery kind of became the de facto for getting data off of systems, right? And ElastAlert became kind of the de facto for searching data in your Elasticsearch for security purposes. And then StreamAlert became the de facto for people who were starting with nothing and had AWS infra. And now Panther is kind of the successor of StreamAlert, with the UI and all these other added features that teams really need. So, you know, it just felt right. And at the end of the day, we want the security engineer, the person who’s tinkering with the system, to just be able to deploy it right away and get that value immediately. We also open-sourced all our detections as well. We used to have some packs that were internal, and then we just decided, you know, we want to have everything out in the open, and we want to get feedback on all this. So Panther is open core, where you get the core cloud security log analysis features, and then there are a couple things that we have in our hosted SaaS that allow you to search the data more easily. We have an indicator search, which is really cool. We have custom log support, SSO support, RBAC; all the normal enterprise stuff is available with a license from us. That’s really how we trade off open source and our proprietary version.

Kostas Pardalis  51:20

Oh, that’s great. So I know that you’re shipping a number of alerts and ways of processing the data, and you also allow the customer, as you said, with Python, to create their own rules there. How do you come up with these rules? I mean, that’s pretty much the actual work of security, right? And how do you see this kind of work fitting together with open source? Do you see people submitting rules publicly? And how are you reviewing these rules? Because, okay, it’s not something that you just take and run and see what happens. I mean, it’s about security. So how do you manage this?

Jack Naglieri  52:05

So the detections are really based on commonly accepted frameworks. And that kind of goes back to one of the advantages of open source in general that I was mentioning, right? So there are frameworks, like the ones by MITRE; MITRE ATT&CK is a very common one. And there are others very similar to that. They basically lay out the common types of attacks that exist in every environment, or every type of environment. One example is privilege escalation: maybe there’s a vulnerability on a system that lets you get root. That’s a very common type of attack, and we can write a detection that’s very generic and that will detect it in a lot of different types of scenarios. So following frameworks like that is really the source of our detections, how we are informed and how we create them. And then it also really comes down to just research, right? There are a lot of new attacks happening every day. There are new breaches all the time. So, for example, with the Capital One breach, we’ll read reports like that and then say, okay, this is how we would detect it in Panther. I actually did a webinar with Snowflake on that, where I broke down how you would find a breach similar to that. And then we write those detections, put them in open source, and then anyone who deploys Panther gets those rule sets. So stuff like that’s really helpful. In terms of sharing them in open source, I think it kind of goes back to getting more eyes on the problem, and maybe having them review, but also getting other teams to contribute their own, right? As security practitioners, we’ve all worked in different companies, and we’ve seen so many different things. And part of the goal of Panther was really to democratize a lot of that knowledge. And the way you can do that is by committing rules back to our source repo that are widely applicable to other companies as well.
So, for example, we had one of the engineers we know at HashiCorp commit a rule back that looks for a signal from Amazon if you commit something into GitHub. There’s some scanning that happens where Amazon can proactively detect credentials that were leaked, and there’s a CloudTrail event you can get that would alert a team. So that stuff’s really helpful to get, again, the different perspective, the thing I was mentioning before, right? So we can get all that feedback into Panther just as a byproduct of it being open source.

Kostas Pardalis  54:17

Yeah, it’s super interesting. It’s a very interesting kind of community built around that, and I’m really curious to see how it will evolve in the future. So Jack, one last question. I mean, I know we could keep chatting about this stuff forever, but what’s next for Panther? Is there anything exciting that you’d like to share?

Jack Naglieri  54:39

Yeah, so it’s been a pretty interesting year, obviously, especially with the pandemic. But we’ve had a really exciting year. We’ve grown; we’ve, I think, more than doubled as a company. And we’re continuing to hire engineers and non-engineers, and really just grow the company on both the technical and non-technical sides. As always, we’re working on big features all the time, and we’re trying to ship stuff that’s super impactful, that teams can jump in right away and start using. So we just shipped some more log support for things like Cloudflare, stuff like that. And then some bigger features are coming down the line that allow you to do more of the behavioral-based analysis that I was talking about before. So being able to take advantage of this data lake that we’ve created with S3 and the data classification process, and look at windows of time, and do more around automation. So that’s really what’s coming down the pipe with Panther. And yeah, just continuing to support more use cases and push these features in open source, and also get super helpful feedback from engineers.

Kostas Pardalis  55:48

That’s great. I’m really looking forward to chatting with you in the future, and also to seeing what other amazing stuff you’re going to build at Panther. Thank you so much, Jack, have a good day. And as I said, I’m looking forward to chatting again.

Eric Dodds  56:04

Well, that was a super interesting conversation. I think security, to me, is, like you said when we were talking before we started the interview with Jack, one of those things that can be a little bit ambiguous. But it’s amazing just to hear how he built on the work that he did at Airbnb and an open-source project to turn the technology into something that’s a big and growing company. That’s really exciting. What do you think, Kostas?

Kostas Pardalis  56:33

It’s amazing, I mean, what they have managed to achieve so far at Panther, and with the product they’ve built. I think what I found extremely interesting, and a bit surprising, to be honest, is how security has evolved all these years. I mean, from the first time that I started working in technology, I remember chatting about security, but how security was performed a couple of years ago, compared with what security is today, is something completely different. And what I find extremely interesting, and that also gets us to why it was interesting to have this conversation on a show that’s called The Data Stack Show, is that increasingly the security problem is becoming a data problem. It’s about how to collect the right data, how to search huge amounts of data, and how you can react in real time when it’s needed, but at the same time have access to a huge amount of historical data that you can browse and query effectively. And it looks like Panther has done an amazing job in addressing these requirements of the security problem of today. I’m pretty sure that we will see an amazing growth trajectory for this company. And I’m looking forward to chatting again with Jack in the future and learning more about what they are doing and what exciting stuff they’re building.

Eric Dodds  57:58

Me too. I think, you know, it’s interesting, thinking back on our conversation with Slapdash and how they achieved so much with search by approaching the problem differently. I think that Panther has done the same thing in many ways, thinking about a serverless architecture that allows you to manage petabytes of data. It’s neat to see people approaching problems differently. So we’ll be excited to catch up with Jack in another couple of months, and we will catch you next time on The Data Stack Show.