Aligning Data Security With Business Productivity To Deploy Analytics Safely And At Speed

Hello, and welcome to the Data Engineering Podcast, the show about modern data management.

Are you tired of dealing with the headache that is the modern data stack?

It's supposed to make building smarter, faster, and more flexible data infrastructure a breeze. It ends up being anything but that. Setting it up, integrating it, maintaining it: it's all kind of a nightmare. And let's not even get started on all the extra tools you have to buy to get it to work properly. But don't worry, there is a better way. TimeXtender takes a holistic approach to data integration that focuses on agility rather than fragmentation.

By bringing all the layers of the data stack together, TimeXtender helps you build data solutions up to 10 times faster and saves you 70 to 80% on costs.

If you're fed up with the modern data stack, give TimeXtender a try. Head over to dataengineeringpodcast.com/timextender, where you can do two things: watch them build a data estate in 15 minutes and start for free today.

Businesses that adapt well to change grow 3 times faster than the industry average.

As your business adapts, so should your data.

RudderStack Transformations lets you customize your event data in real time with your own JavaScript or Python code.

Join the RudderStack Transformation Challenge today for a chance to win a $1,000 cash prize just by submitting a transformation to the open source RudderStack transformation library.

Visit dataengineeringpodcast.com/rudderstack today to learn more.

Your host is Tobias Macey, and today I'm interviewing Yoav Cohen about the challenges that data teams face in securing their data platforms and how that impacts productivity and adoption within the organization. So, Yoav, for anybody who hasn't listened to your previous episode, can you give a brief introduction?

Sure. Hi, Tobias. Thanks for having me. My name is Yoav. I'm the cofounder and CTO of a data security startup called Satori. What we do is help companies streamline access to their data by helping them implement just-in-time access to data and resolving a lot of the bottlenecks around getting the right data to the right people at the right time in a secure and easy way.

And again, for folks who didn't listen to your previous appearance on the show, can you share how you first got involved in working in data?

Sure. So, I've been fascinated with data and databases for as long as I can remember. I think I was in 3rd grade or something like that when I went to this computer class, and they taught us about relations and tables and SQL and all that stuff. So it's been an area of interest of mine for a very long time.

But more professionally, in my previous role at a company called Imperva, I got the chance to build a globally distributed, highly scalable data platform back in the days when it was a lot about do-it-yourself: composing all these open source components together and figuring out all sorts of challenges that later on became architectures that we know today, separation of compute and storage and all that stuff. So anyway, I got acquainted with the challenges around data management and securing access to sensitive data specifically. That's been a lot of fun, but it did get me exposed to all of the challenges in that area.

And so in terms of the topic at hand today, we're discussing some of the elements around data security, and that's a very broad term. As with a lot of things in the data ecosystem, it means different things to different people. So I'm wondering if you can just start by enumerating some of the different concerns that are encapsulated in that term of data security, and maybe even some of the ways that term has grown to mean more and different things over the past few years.

Yeah, sure. That's so true. It's a very convoluted term, and it can mean different things to different people. So some of the concerns of data security: the basics is, I have this data, and it's sitting somewhere in a database or on a disk or in cloud storage. How do I secure it just there and then? Call this data at rest. Folks talk about encryption and access and all of these different things that can help secure data as it just sits there waiting to be used. Other aspects are securing data as it moves between systems and goes to users. And there's also securing access to the data.

You could also think about concerns like database security, which is basically: how do I make sure that my data system is up to date, patched, and so on? But if you think about it, that specific concern is mostly being handled today at the infrastructure level by the data platform vendor. A lot of us are migrating to more managed services, so we don't have to take care of patching our database software to make sure that we have the latest security update.

I do think that managing access to the data itself is a concern that is going to be with us for the long term, because I don't see how that can get solved at the infrastructure level. There's no one-size-fits-all, and a database vendor cannot make a decision about who needs access to what data. That's a customer decision. So I think that's going to be a concern that we're going to have to deal with for a long time.

Another interesting element of this question of data security is that, particularly back in the nineties and early 2000s, it largely just meant: can I control access to my data warehouse, at least from the perspective of an analytical workflow? Now that we have proliferated the different types of data systems, the ways of working with data, and the ways that it's being used, that has drastically increased the overall surface area of the problem. In your experience as somebody who is working in this space of managing data security and data access control, I'm curious what you have seen as a typical order of magnitude of the number of data locations that an organization is trying to manage access and permissions within, and some of the challenges that they're facing in terms of being able to get a holistic approach to that data security and access problem?

Yeah. It's a great question. I totally agree. If you go back 10 or 20 years, things were much simpler. Organizations didn't have as many different data platforms as they have today. You had BI. You had reporting. You had transactional databases supporting applications, but that's just what you had. And use cases around consumption of data were quite limited to, again, BI and reporting. Also, the amounts of data that companies process, store, collect, and use have grown significantly.

What we see from our customer base is, on average, many thousands of, let's call them tables (they could be buckets and files and other stuff), but thousands upon thousands of data assets to which access needs to be managed. There's obviously the regulatory environment, which has grown significantly more complex and restrictive.

And when you combine all of these things together, it's a lot to handle. If you think about our average customer: multiple data platforms, hundreds or thousands of users who need access to thousands of data assets, and all of that needs to conform to various types of regulatory systems and frameworks. It's a pretty tall order for any organization to be able to handle effectively.

And another aspect of the problem space that has grown in complexity recently is the question of the regulatory environment. For a while, you were mainly concerned about the regulatory aspects if you were working in health care or finance. Otherwise, you wanted to be a good steward of the data, but you didn't have as many legal concerns to think about. Whereas now, with the advent of GDPR and CCPA and just the broader awareness and understanding of data privacy and the impacts thereof, it has made this overall space of security, and the ways that data is being applied and where and by whom, a much more complicated problem space. I'm wondering, what are some of the main buckets of challenges that teams are facing in trying to prioritize what security controls they need, what security controls are a nice-to-have and beneficial but not absolutely critical, and some of the ways that they're thinking about how to tackle the overall problem space of data security, data privacy, and access control, and the ways that those factor into the compliance and regulation aspects?

Yeah. So I think you can think about this in two main layers or levels. The first one is more high level, and that's basically, I would say, that organizations have to do the right thing. And I don't think it's hard to know what the right thing is. It starts by implementing the table stakes of data security. As I said before, all the vendors today provide data encryption at rest. They provide encryption of data in transit. They provide pretty good basic security controls for managing access to data. So I think the first thing that organizations have to implement are all of these pretty basic controls, and most of them they get out of the box.

Then they have to take into account why they're doing what they're doing with the data, and whether that's aligned with what they're allowed to do and their purpose. I don't think the regulation is that complex. I think it aims to protect data subjects from their data being misused. And I think organizations that want to do good and want to use the data in a responsible way can really do that, as long as they match the purpose of why they're doing things to the appropriate control. So for example, if I'm in customer success, I may not have to see all of your personal data, or maybe just some of it. That's something that companies need to take into account and need to implement.

I think where the challenge is, is how easy it is to go and implement those more complex controls. That's where there's a gap between what the technology provides and what organizations need in order to be able to move fast with their data but still remain compliant, secure, and responsible.

The other interesting element of this is that there's always the gradation of how much of the problem is technical versus how much of it is organizational, and trying to map some of these technical concerns onto the business requirements, and the requirement of ensuring that you're not blocking progress and productivity for the people who are supposed to be just getting their work done. I'm curious what you see as some of the areas of trade-off, or some of the areas where data security needs to have some measure of compromise in the interest of ensuring that the business has the access that it needs and is able to get its work done. Because if it's just a technical problem, then, yeah, we can add all the security we want, and it'll be perfect. Nobody will access anything. There are no questions about data leakage, but then it's no use to anybody. So I'm curious, what are some of the gray areas of compromise that organizations need to work through on their own?

Yeah. So I think it's a great question. I think at the root, it is technology, and I can talk all day about why the technology is lacking. But what it creates is an organizational problem. It creates a procedural problem, because the technology available to secure access to data, or data in general, lacks the ability to be rolled out in a safe and easy way. So the technical problem creates a change management problem for organizations.

As I said, some of the controls are already there. But as you said, you can implement them, and then no one is going to have access. So I think the thing to crack here is how organizations can roll out these controls in a way that doesn't stop everything they have to do with data. They have to rethink it. It's almost like replacing the tires on your car while driving at 60 miles per hour, right? Data is used all the time, like billions or millions of queries a day in any sizable organization. The big question is how do you roll out these controls without stopping those businesses from doing what they need to do? That's where innovation has to come and help us, and that's where we focus.

And as a vendor in the space of data security, and somebody who's working very closely with some of these organizations to help them figure out how to manage that balance, what do you see as some of the broad categories and the effective boundary lines for those different elements of data security that we were discussing earlier? There are questions of database security, of who has permissions on what objects, versus where you're spanning multiple discrete storage or compute locations and you need to figure out what the RBAC or ABAC policies are, and where and how you need to apply masking. What are some of the ways to think about how to bucket those concerns so that you don't lose your mind trying to figure out an end-to-end solution for everything all at once?

Yeah. So that's actually one of the problems that data security suffers from today: despite being an age-old practice, there's still no playbook. In other areas of computing, IT, or security, you have playbooks. Everyone knows that they need single sign-on. That's the way to manage access to applications at scale. Everyone knows they need to protect their websites from all these different types of attacks. In the data security space, we see this playbook emerging as we speak.

Gartner has a very good perspective on this. They actually coined a term called data security platform. Data security platforms are solutions that bring together several of these currently somewhat disparate capabilities in the area of data security into a single product. So that's good. If 5 years ago you had to buy 5 products to do 5 different things, today you can get one product and get these 5 integrated into one solution. And usually these products, data security platforms, don't support just one type of data platform vendor. They support multiple. So that's definitely a bright spot in how this space is evolving.

What Gartner suggests, and it also follows the zero trust architecture concepts, is that it's best to focus on implementing late binding controls. What I mean by late binding controls is controls that come into effect, that are enforced or applied, as late in the data access lifecycle as possible. To give you an example, think about static masking versus dynamic masking. Static masking is an early binding control. The control is enforced way before anyone is even accessing the data, as opposed to dynamic masking, which is a control that is enforced when someone is accessing the data. Obviously, dynamic masking is much more flexible and appropriate for today's environment than static masking, because it doesn't require a lot of upfront investment in going and creating copies of your data and masking all the data upfront, which could be a lot of data. And because use cases change, because access patterns change, dynamic masking is more appropriate because it's more flexible.
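To make the early-binding versus late-binding distinction concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the `CUSTOMERS` rows, the `SENSITIVE_COLUMNS` set, and the function names are hypothetical and do not represent any vendor's implementation.

```python
# Hypothetical in-memory "table"; a real system would sit in front of a database.
CUSTOMERS = [
    {"id": 1, "name": "Ada Park", "email": "ada@example.com", "plan": "pro"},
    {"id": 2, "name": "Lin Osei", "email": "lin@example.com", "plan": "free"},
]

# Assumed classification of which columns are sensitive.
SENSITIVE_COLUMNS = {"name", "email"}


def mask_value(value: str) -> str:
    """Keep the first character and replace the rest with asterisks."""
    return value[:1] + "*" * (len(value) - 1) if value else value


def mask_row(row: dict) -> dict:
    """Return a copy of the row with sensitive columns masked."""
    return {k: mask_value(v) if k in SENSITIVE_COLUMNS else v for k, v in row.items()}


def static_mask(rows: list) -> list:
    """Early binding: build a masked copy once, before anyone queries it."""
    return [mask_row(row) for row in rows]


def dynamic_read(rows: list, can_see_sensitive: bool):
    """Late binding: decide per request whether to mask; no extra copy is stored."""
    for row in rows:
        yield dict(row) if can_see_sensitive else mask_row(row)
```

With the static approach, every new use case needs another masked copy. With the dynamic approach, the same rows serve both `dynamic_read(CUSTOMERS, can_see_sensitive=False)` and `dynamic_read(CUSTOMERS, can_see_sensitive=True)`, and changing the rule only means changing the decision made at read time.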

You can create new dynamic masking rules to fit your new use cases rather than go and create more copies of your data with static masking.

And I think the same applies to role-based access control and attribute-based access control, RBAC and ABAC in short. RBAC is more: let's plan everything upfront. Let's create our roles. Let's grant these roles to our users. And then, by my birthright of having this role, I'm going to have different access levels to data. Whereas attribute-based access control, and if you take it to the extreme, zero trust architecture, basically says that there's no birthright access. Your level of access will be determined by different attributes that will be evaluated when you access data.

And you can take into account a lot of different attributes. It could be the department in which I work in the company. It could be my office location. It could be which network I'm using to access data, or which client tool I'm using to access data. You can also have behavioral aspects. If you have a flexible enough policy engine, you can say: if this user has been consuming a lot of sensitive data in this session, maybe it's a good idea to validate that user's authenticity or raise an alert or something like that.

All of these things are more late binding than early binding. And I think when companies start thinking about how to build their data security program, that's where they need to focus: on those late binding controls.
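The attribute-based evaluation described above can be sketched as a small policy function that runs on every request. This is a hedged illustration, not a real policy engine: the attribute names, the `"corp-vpn"` network label, the 10,000-row threshold, and the rule that finance sees unmasked data are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    """Attributes evaluated at access time (all names are hypothetical)."""
    department: str
    office: str
    network: str                    # e.g. "corp-vpn" or "public"
    client_tool: str
    sensitive_rows_in_session: int  # behavioral signal for this session
    dataset_sensitivity: str        # "public" or "sensitive"


def decide(req: AccessRequest) -> str:
    """Return "allow", "mask", "step_up", or "deny" for a single request."""
    if req.dataset_sensitivity == "public":
        return "allow"
    if req.network != "corp-vpn":
        return "deny"       # sensitive data only over the trusted network
    if req.sensitive_rows_in_session > 10_000:
        return "step_up"    # heavy consumption: re-validate the user or alert
    if req.department == "finance":
        return "allow"      # assumed rule: finance may see unmasked values
    return "mask"           # everyone else gets dynamically masked data
```

Nothing here is granted in advance to a role. The same user can get a different answer depending on the network they are on or how much sensitive data they have already read in the session.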

Another interesting element of the security problem is that we're talking about data security platforms, but security has also been a concern for IT and application systems for many years. I'm curious what you see as some of the reasons that those two have largely evolved along separate tracks, what you see as some of the potential for unification of those concerns, and some of the ways that they are distinct problems that need to be solved in their own way?

So despite securing data being a very hard problem, I think it has more chance of being solved universally than application access. The reason I say this is that if you think about two different applications, they can have totally different domains of knowledge and domains of expertise. One application could be a financial application and another could be an HR application. Controlling access to both of these applications in a unified way is a bit impossible, because they don't have the same objects. They don't have the same concepts.

However, when you think about data, data is actually more organized and more uniform. Yes, you have a Snowflake table and you have an S3 bucket and you have a collection in MongoDB. But you can derive similarities between these different data assets, and the types of operations you would want to perform on them are somewhat similar. You can update a table, you can modify a collection, you can write a file into a cloud storage bucket. So with data, there's actually a chance of building this universal language, this universal set of concepts, that would be applied to many data systems, maybe all of them or almost all of them. And that's where I think there's an opportunity for data security platforms to play a role.

Whereas if you think about the application space, the furthest we got is just doing authentication, while authorization is largely kept at the application level. If you think about any employee in any reasonable company today, they have a single sign-on solution. It could be Azure AD or Okta or any of these systems. Basically, what these systems do is authentication into applications. They might transfer some data about the user to the application, but then the application has its own security engine, its own authorization engine, to control what actions that user can perform on its objects. With data, because you can derive these similarities, there's a real chance here to unify the space. And that's what gets me excited, because I'm an infrastructure guy. I like things to be organized.
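One way to picture the shared vocabulary described above is a small model that maps platform-specific operations onto common actions. The sketch below is an assumption-laden illustration: `DataAsset`, `Action`, and the tiny mapping table are hypothetical, and a real platform has many more operations and finer-grained objects.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    READ = "read"
    WRITE = "write"


@dataclass(frozen=True)
class DataAsset:
    platform: str  # "snowflake", "mongodb", or "s3" in this sketch
    name: str      # table, collection, or bucket/prefix


# Illustrative mapping from native operations to the shared actions.
NATIVE_TO_ACTION = {
    ("snowflake", "SELECT"): Action.READ,
    ("snowflake", "UPDATE"): Action.WRITE,
    ("mongodb", "find"): Action.READ,
    ("mongodb", "updateMany"): Action.WRITE,
    ("s3", "GetObject"): Action.READ,
    ("s3", "PutObject"): Action.WRITE,
}


def is_allowed(grants: set, user: str, asset: DataAsset, native_operation: str) -> bool:
    """Check one grant set against any platform using the shared actions."""
    action = NATIVE_TO_ACTION[(asset.platform, native_operation)]
    return (user, asset, action) in grants
```

Because a table, a collection, and a bucket all reduce to "an asset you can read or write", a single grant set and a single check can cover all three, which is much harder to do across unrelated applications.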

I want things to be as uniform as possible, because that's where we derive efficiencies. That's where companies can move faster with the data that they have.

And another thread that we've started to touch on is the question of productivity in working with data and some of the ways that it can start to become at odds with data security controls. I'm curious how you have seen that manifest, both in your own work and in the ways that you are working with some of your customers at Satori, and some of the ways that tension can lead to bad data practices?

Yeah. So one of our customers is the biggest fintech in Canada. One of the folks that we work with there has this saying: you need to make the secure way the easy way to use data. Otherwise, folks will just try to go around it and avoid using the secure way, or just get stuck. So there's definitely a trade-off between productivity and security if you don't use the right technology. Our goal is to eliminate that trade-off. We strongly believe that companies can be both secure and productive.

And I'll tell you another story from one of our other customers, another well-known SaaS technology provider. Before using Satori, they told us that their Slack channel with the DevOps team, who manage access to the hundreds and hundreds of different databases they have on Amazon, was filled all day long with requests. Hey, can you grant me access to this database because I need to troubleshoot this ticket in Jira? Hey, can you open access for me to this system? That company processes a lot of sensitive data. They're very responsible in how they see their role as processors and custodians of that data, but they didn't have the right technology to secure it. So they were suffering from a severe productivity problem.

When they started using Satori, we helped them automate all of those manual access requests, and the implementation of these requests, into something that is very convenient for their data consumers to use. They have a Slack app they can request access from, or they can even grant themselves access in a self-service way as long as their purpose for accessing the data is legitimate. And the way they mitigate that is by having Satori dynamically mask sensitive data, for example. So we helped them create these really simple yet very powerful workflows that help them both protect the data and get the right data to the right people immediately, just in time, and not have them wait for data. That was a really big win for both them and us, seeing that happen.
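The self-service, just-in-time workflow in this story can be sketched as a grant that is issued on request, masked by default, and expires on its own. This is a hypothetical illustration and not Satori's API: the `Grant` fields, the four-hour default, and the rule that unmasked access needs data owner approval are assumptions for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Grant:
    user: str
    dataset: str
    purpose: str
    masked: bool          # True means sensitive columns are dynamically masked
    expires_at: datetime


def request_access(user: str, dataset: str, purpose: str, now: datetime,
                   want_unmasked: bool = False, approved_by_owner: bool = False,
                   ttl: timedelta = timedelta(hours=4)) -> Optional[Grant]:
    """Self-service request: masked access is granted immediately,
    unmasked access waits for data owner approval (returns None until then)."""
    if not purpose.strip():
        raise ValueError("a purpose is required for every access request")
    if want_unmasked and not approved_by_owner:
        return None
    return Grant(user, dataset, purpose, masked=not want_unmasked,
                 expires_at=now + ttl)


def is_active(grant: Grant, now: datetime) -> bool:
    """Just-in-time access lapses automatically after the time window."""
    return now < grant.expires_at
```

The point of the design is that nobody on the data or DevOps team has to act on the common case: the requester states a purpose, gets masked access right away, and the access removes itself when the window closes.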

And you mentioned that the goal is to make the secure way the easy way. I'm wondering if you can talk a bit more about some of the ways that education about the motivation behind the security controls can help to encourage people who maybe hit a little bit of friction and just say, I just want to go and directly access the database instead of going through this proxy or going through this procedure to get the appropriate access. What role does education play in that overall environment of making the secure way the easy way, but also encouraging people to understand what the secure way is and why it matters?

Yeah. I think it's a great question, because like any other area involving security, awareness is key in mitigating the risk. Unless people are aware of the risks to the company and to themselves in not using the tools and information that they have been given by the company in a responsible way, I would argue that it would be hard to find a security system that would still be effective if people are completely irresponsible. It's always us humans who are the weakest link.

So I think, first of all, you have to educate your employees on what this data is that we're collecting. Is it patient information? Is it personal information? Is it financial information? And talk to them about what could happen, not just to the company, but also to the people that data belongs to. What can happen to these people if we're irresponsible with handling the data? We can talk about identity theft. It could be insider trading. It could be health-related issues. It could be fraud. It could be a lot of bad things.

One thing that hasn't changed is that bad actors will always try to leverage information, companies, and systems to gain financial advantage. And all that sensitive data that a lot of companies collect today has a lot of value in some markets. So I think talking to people about what's right and what's wrong is very important, because you can do all sorts of things if you really want to get access to sensitive data. Or maybe you do things the right way, but then you're a bit careless with other things. So I think it's very important to have these conversations. Make sure your employees or data consumers are aware, and that they act in a responsible way with the data and the tools that they've been given.

And another way that security can become challenging is when you try to bolt it onto an existing system or try to implement it as an afterthought. I'm wondering what you have seen as some of the impact of incorporating security in the early design and implementation phases of a platform, versus just focusing on the functional aspects of, I need to be able to get data from here to here, process it this way, and then send it over to this other place, and only then saying, okay, now I need to figure out the security protocols around these data flows.

The alternative is saying upfront: I need to be able to get this data from here to here. How do I make sure that it is properly secured? How do I make sure that I'm only pulling the data that I want into this other system, versus all of the data that might have PII or other sensitive information? How do I process it in a way that is cognizant of which fields have that PII data? How do I make sure that I'm only exposing the attributes of this data, or only processing the aggregate attributes of this data, in a way that I'm not going to be violating any sort of compliance, data governance, or policy requirements? What is the overall impact on the effectiveness of those security controls when they are designed upfront versus bolted on afterwards, and what impact can it have on the total delivery time of a given data project?

Yeah. So, I think it's obviously better to plan more upfront than to have security as an afterthought. The way I think that principle needs to be applied in the modern data infrastructure is not necessarily by baking all of the security concerns into your data engineering concerns. I think the best way to address it is to have a component in your data stack that is responsible for these data security aspects across the board, because your data intake is going to change. You might introduce more data platforms or tools into your environment. Having something that is decoupled from the data layer, something that can adjust to your changing needs, I think is the better way of implementing what you mentioned, which is planning upfront.

If you don't do that, it also puts a lot of dependency on your data teams, because they are the ones who would have to go and implement all of these controls in all of these different systems. They're going to spend a lot of time on that, which is going to take them away from their core activities, which are to generate more data products and deliver more data to more people. So I do think that it's a big challenge, and that's why companies like us are trying to offer this alternative and decouple those security concerns from the actual data layer.

Join in with the event for the global data community, Data Council Austin. From March 28th to 30th, 2023, they'll play host to hundreds of attendees, 100 top speakers, and dozens of startups that are advancing data science, engineering, and AI.

Data Council attendees are amazing founders, data scientists, lead engineers, CTOs, heads of data, investors, and community organizers who are all working together to build the future of data.

As a listener to the Data Engineering Podcast, you can get a special discount of 20% off your ticket by using the promo code dataengpod20.

Don't miss out on their only event this year. Visit dataengineeringpodcast.com/data-council today.

And as far as the ways that data teams are thinking about security controls and data privacy, what have you seen as some of the notable shifts or evolutions in the space from when you first started, I think it was about 3 years ago, to where you are today, and in the overall visibility and understanding of the challenges that are involved, how to address them, and what the necessary controls are for ensuring that their data is being used appropriately?

Yeah. So a lot has changed,

in the past 3 years. I think when we started having conversations with

data driven organizations,

it was

more educational.

They felt like they needed to understand

and learn from us what are the best practices

and what they need

to, to take care of and what's the most important thing to take care of. I think if you look at conversations we have today

driven by the maturity of the market,

and the complexity that is growing

is, you know, companies are much more

educated.

They

understand that

the first thing that they need to take care of

is the access piece.

And what I what I call the access piece is who can access what data and and how,

and

making sure that their

admins,

DBAs,

data engineers

are not do not have to get involved in those on a daily basis. So that's the first thing that they,

you know, they know they need to take care of. And on top of that,

it's,

how can we protect that sensitive data and and, you know, deliver more data

as we and and desensitize it. So,

for example, I wanna give very broad access

to a sensitive dataset

as long as

the the sensitive data is being, you know, properly handled, for example, dynamically masked.

But then,

if someone needs access to the actual sensitive data,

then,

you know, let's have them go through an approval process with the data owner to make sure that, you know, they're doing it,

according to the right purpose and everything is, you know, they check all the boxes. So, yeah, I think companies

today realize that, you know, they need this component. They suffer.

Data teams that we talk to, they suffer from

being overloaded

with managing

access.

They, you know, they have different workarounds and different systems.

And 1 of the funny things that I didn't expect coming into this space, but I learned by operating in this space, is that when you talk about really basic

security controls or paradigms,

let's say, role based access control, they're implemented completely differently

in different systems.

Like Snowflake has RBAC, BigQuery has RBAC, other systems have RBAC. These are completely different implementations.

It's very hard to derive similarities between these implementations.

And, you know, guess what? Data teams then have to go and and deal with all that complexity

about how

a certain engineer

7 years ago at a certain company decided to implement what they thought is RBAC.

And it's obviously not aligned with, you know, the other engineer at that other company,

8 years ago that decided to do something,

that sounds the same, but it's actually very much different. So

that's a funny little nugget.

Absolutely. Yeah. It's definitely funny how the same word or the same term can come to mean so many different things as you span engineering organizations

and particularly

the kind of generational technology shifts. And then another aspect of the kind of data security industry that's interesting over recent years is that it has kind of grown to encapsulate more

considerations where, as we were saying, it, you know, was just, can you access the data warehouse and what tables can you access? And now it's also all the data masking and,

you know, auditing. And I'm wondering

what impact the

recent growth in metadata systems has had on the

capabilities and effectiveness of security controls where you are more able to identify,

you know, at ingest, this field has

personal address data. And so now I'm going to propagate that labeling

throughout the,

you know, different transformations and different stages of the data life cycle so that I can maintain visibility of what fields do I need to be concerned with from a security and governance perspective,

and what are the fields that are, you know, in the clear and I don't need to worry about, and some of the ways that those metadata and lineage views are being integrated into the security and compliance regime.
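To make the idea of carrying a label from ingest through later transformations concrete, here is a minimal sketch in Python. It is not taken from any particular catalog or lineage tool: the function name `propagate_labels`, the column names, and the `PII:address` label are all hypothetical, and the lineage graph is assumed to be a plain mapping from each upstream column to the columns derived from it.

```python
from collections import defaultdict, deque


def propagate_labels(lineage, source_labels):
    """Push sensitivity labels from labeled columns to everything derived from them.

    lineage:       dict of upstream column -> list of downstream columns (assumed shape)
    source_labels: dict of column -> set of labels assigned at ingest
    Returns a dict of column -> set of labels after propagation.
    """
    labels = defaultdict(set)
    for column, tags in source_labels.items():
        labels[column] |= set(tags)

    queue = deque(source_labels)
    while queue:
        column = queue.popleft()
        for downstream in lineage.get(column, []):
            before = len(labels[downstream])
            labels[downstream] |= labels[column]
            # Revisit a column only when it gained a label, so cycles terminate.
            if len(labels[downstream]) > before:
                queue.append(downstream)
    return dict(labels)


# Hypothetical lineage: an address column flows through staging into a mart.
lineage = {
    "raw.customers.address": ["staging.customers.address"],
    "staging.customers.address": ["marts.orders.shipping_address"],
    "raw.customers.signup_date": ["marts.orders.customer_tenure_days"],
}
source_labels = {"raw.customers.address": {"PII:address"}}

labels = propagate_labels(lineage, source_labels)
sensitive_columns = sorted(labels)
```

Columns that end up in `labels` are the ones a security or governance control would need to care about; columns that never inherit a label, such as the tenure column here, stay in the clear.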

So we definitely have seen

advances

in metadata management in recent years.

3 years ago, if I were to ask a a 2 1, 000 person organization

whether they are thinking about implementing the data catalog. They would say, no. That's not for us. It's only for, like, really big companies, and you you need, like, a 10 person team to manage that. And that's not something that we're gonna do. You see more and more companies adopting metadata

management solutions for different purposes.

Some just want to have really good documentation

on their datasets

so folks can just know what they're accessing.

Some,

are more interested, as you mentioned, in being able to classify the data that they have and understand

where they have sensitive data. You know, the higher the quality of the metadata that's available in the environment, the better

security tools can become.

Obviously,

there's no dependency

between implementing a security tool and a metadata

solution. Most security tools know how to create their own metadata,

classify their own data, and so on, you know, without having you supply that externally. But if that's available, it's a plus. Also from a workflow perspective,

metadata

solutions

typically

don't get into managing

access to data,

although that's shifting a bit.

They

feel like they do want to create or be involved in more parts of the data life cycle than just, you know, the documentation piece. You do see an interesting,

like, subspace

in the data security

market, which is called data security posture management, which is a very hot topic right now in our industry,

which basically

it's aimed more for the security leaders

than the data leaders and data teams. And it's all about, hey. We need to understand, like,

where are our data assets? What type of databases do we have? Are we properly securing these databases at the infrastructure level? Do we have the encryption checkbox turned on and so on? So that's something that, you see, many players are

addressing. It's not a big problem to solve, but if I were a consultant going into a company today and, you know, trying to help them

improve their data security, that's the first thing I would do. I would, you know, go and understand where the data is, what type of platforms, whether it's sensitive or not, and whether the table stakes, the checklist, is covered.

And in your experience of working in the space and working with customers, what are some of the most interesting or innovative or unexpected ways that you've seen teams approach the challenge of data security and aligning that with the productivity needs of the organization?

So I have a really good example from 1 of our customers. I think it was last week or the week before that I had a conversation

with them, and

they told me about their use case and what they're doing. So that's a US based

technology company that is processing a lot of patient data from really big US based hospitals.

And they have

hundreds

of

databases

on prem.

They have twice as many databases,

in the cloud, and they have

to manage access to all of these,

databases in

something that would be, you know, considered a uniform way. Otherwise,

it's very hard to both stay productive and secure. So what they actually built was this

system on top of Satori

that meets their users

in their existing business flows.

And what I mean by that is, for example,

let's say 1 of their customers opens a support ticket, and that support ticket gets assigned to,

a support engineer in Salesforce. What they implemented, they implemented a hook into Salesforce that automatically grants

access to the relevant data

for that specific customer to the support engineer,

assuming they would, you know, need that level of access to go and troubleshoot the issue. That happens automatically

as the ticket gets assigned.

You know, if the ticket gets assigned to someone else, that person loses their access and, you know, the new person gets access. And when that ticket is resolved, they all lose their access. So if you think about this, they didn't really have to go and modify how they work with data.

They didn't

have to implement

a lot of new

business processes,

for their employees. Obviously, they invested a lot in their back end to make this happen, and we,

you know, helped them with technology and the tooling around that. But they basically

effectively eliminated

all of the risk around overprivileged

access to patient information

from their users because they implemented this flow and they're as productive as they were before,

but taking on much, much, much less risk.
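The ticket-driven flow described above can be sketched roughly as follows. This is an illustrative Python model only, not Satori's or Salesforce's API: the class `TicketAccessManager`, its method names, and the ticket, engineer, and customer identifiers are all hypothetical, and a real system would translate these events into calls to whatever access-control layer is in place.

```python
class TicketAccessManager:
    """Tracks which engineer may read which customer's data, keyed by open ticket."""

    def __init__(self):
        # ticket_id -> (engineer, customer_id); one active grant per ticket
        self._grants = {}

    def on_ticket_assigned(self, ticket_id, engineer, customer_id):
        # Reassignment overwrites the grant, so the previous assignee loses access.
        self._grants[ticket_id] = (engineer, customer_id)

    def on_ticket_resolved(self, ticket_id):
        # Resolving the ticket removes the grant for whoever held it.
        self._grants.pop(ticket_id, None)

    def can_access(self, engineer, customer_id):
        return (engineer, customer_id) in self._grants.values()


manager = TicketAccessManager()
manager.on_ticket_assigned("T-1", "alice", "hospital-a")
alice_while_assigned = manager.can_access("alice", "hospital-a")

manager.on_ticket_assigned("T-1", "bob", "hospital-a")  # ticket reassigned
alice_after_reassign = manager.can_access("alice", "hospital-a")
bob_after_reassign = manager.can_access("bob", "hospital-a")

manager.on_ticket_resolved("T-1")
bob_after_resolve = manager.can_access("bob", "hospital-a")
```

The point of the design is that access follows the lifecycle of the business object (the ticket), so nobody has to file or approve a separate access request and no grant outlives the work it was needed for.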

And in your experience

of building Satori and working in this space of data security and the technology controls around it, what are some of the most interesting

security controls, which I mentioned before, and how things that you would expect to be quite similar are actually quite different even though, you know, they have the same name. I think 1 of the

things that keeps on surprising me is how

maybe it shouldn't surprise me because it's human nature, but how data teams

prefer to be self sufficient and independent

as they implement their projects. And I'll give you an example. We have this

attribute based access control feature where you can have

attributes on users, and you can use that in your policies. Let's say that I have an attribute that says that I'm part of the Israeli office, and then I get access to a certain database based on that. And you would think that, you know, the source of these user attributes would be your identity system

where all of the, you know, system of record about your users, employees, and all that stuff is being managed.

However,

we hear from data teams that they don't necessarily

have access

or cooperation

from the teams managing those identity systems

to go and implement all the things that they need. And sometimes they get pushback.

Like, we don't want to put these attributes in our system because, you know, it's not aligned with how we see the world, and it's not an identity concern. You go and figure this out.

And so,

in some cases, they come to us and ask, hey. Can you help us, like, manage these attributes on the Satori side? Because

we don't have anywhere else to go.

We're being blocked. So that keeps on surprising me. When we started out, I thought

mistakenly that the world is going to be very organized, like identity in identity

systems, metadata in metadata systems.

You know, but,

I think it's a bit messier.

And, obviously, there's the human and organizational

aspects that come into play that require

capabilities that I didn't think we had to build, but we're building. So that's good.

Yeah. And then you run into the situation where the identity and the metadata system is all just an Excel file somewhere.

I ran into that last week.

Exactly.

Exactly that.
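As a rough illustration of the attribute based access control discussion above, including the case where some attributes have to live outside the identity system, here is a minimal Python sketch. The names `Policy`, `resolve_attributes`, and `is_allowed`, the user names, and the attribute keys are hypothetical and do not describe any vendor's actual feature; the sketch simply assumes attributes are key-value pairs and that attributes managed by the data team are layered on top of those from the identity system.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    resource: str
    # attribute name -> value the user must have to be granted access
    required: dict = field(default_factory=dict)


def resolve_attributes(user, identity_attrs, local_attrs):
    """Merge attributes from the identity system with ones the data team manages.

    Locally managed attributes win on conflict; that precedence is an assumption.
    """
    merged = dict(identity_attrs.get(user, {}))
    merged.update(local_attrs.get(user, {}))
    return merged


def is_allowed(user_attrs, policy):
    return all(user_attrs.get(name) == value for name, value in policy.required.items())


# The identity system only knows the department; the office attribute is kept locally.
identity_attrs = {"user_a": {"department": "engineering"}, "user_b": {"department": "sales"}}
local_attrs = {"user_a": {"office": "israel"}}

policy = Policy(resource="db.israel_office", required={"office": "israel"})

user_a_allowed = is_allowed(resolve_attributes("user_a", identity_attrs, local_attrs), policy)
user_b_allowed = is_allowed(resolve_attributes("user_b", identity_attrs, local_attrs), policy)
```

Here `user_a` gets access because of an attribute the identity team never had to store, which is the workaround data teams ask for when they are blocked.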

And from your perspective

as somebody who is building a data security

platform, I'm wondering what you see as some of the areas

of improvement that the industry needs to focus on and some of the ways that we can help to unify these concerns and make it less of a point to point to point to point solution?

Yeah. So we've made a lot of progress as an industry, and I think that things like managing permissions to data,

with the help of data security platforms, obviously, and things like dynamic masking

are largely being handled in a pretty good way for a pretty

good amount of the data platforms and use cases out there. 1 of the things that

we hear from

more forward

thinking organizations

is encryption of data at rest, but not at the infrastructure level, meaning not at the disk level,

but at the application

or, you know, sometimes they call it client side encryption. What it basically means is that let's say that I use

pick any,

you know, SaaS based,

data platform vendor today.

They offer encryption. But if I don't want to trust them

with access to my sensitive data, then I would have to go and encrypt it on my side before I load it into their system to prevent them from

somehow,

you know, getting breached or

accidentally

or

otherwise

have the option of accessing my data. It's a pretty advanced use case. And while the technology around encryption

is super advanced,

the technology around integrating

encryption

into the data layer

is still very much in its infancy

and it's still very complex.

It slows things down

and that's something that I think would be a focus area. Well, not now, but maybe in a few years. But we do hear from,

you know, the larger financial institutions

and more forward looking companies that that's something that they would like to do.

Are there any other aspects of the

overall practice of data security and data security platforms and some of the ways that they factor into productivity

of data teams and organizations that we didn't discuss yet that you would like to cover before we close out the show?

I think the, you know, the main takeaway

for me, for our audience, with all these different options on how to secure data,

is to think about this concept of early binding and late binding.

And the fact that if you secure the point of access to the data, that's the most effective

and flexible

way of solving the problem.

And looks like this is the playbook that's being, you know,

commonly developed,

more in the industry. There are some really good Gartner write ups about data security platforms. I encourage you

to check them out and educate yourself about

this new component that is

materializing itself into the modern data stack. And,

you know, there are solutions out there, and common problems

and common solutions, so do your research. And,

I think the space today is much more

advanced than it was a few years ago, and you don't have to solve everything by yourself.

Not everything has to be solved in SQL in some view or a database. There are better tools today.

Alright. Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today.

So,

the biggest gap is

I think integrating

all of these

things together.

You see a lot of vendors in the data space

suggesting

and,

promoting

different solutions to the same problems. There's no standardization.

I like to talk about the application space where, as I said, there's not much overlap, but then

you do have some areas where you have overlap, like single sign on with

or other types of protocols.

That's not something we see

on the data space so much.

Like, I haven't seen a unified way of managing access. I haven't seen

unified

security protocols.

I think that's a really big gap in data management when, you know, when you talk about data security.

And

I think the industry's

leaders

should really step up and work together

on building these protocols

and not just compete on price performance and on features

and help data teams

improve the interoperability

of the different systems that they're

using. Because at the end of the day, and I think most vendors understand this, there's no 1 size fits all. And then, you know, companies will always want to select best of breed. But then they want all these things to be

reasonably integrated and working well together

so, you know, they don't suffer from

productivity losses or security issues or other

negative

consequences

of their choices.

Absolutely.

Well, thank you very much for taking the time today to join me and share the work that you're doing at Satori and your perspective on the overall space of data security and some of the ways that it can be aligned with and not at odds with

productivity. So appreciate all of the time and energy that you're putting into,

making that a tractable problem, and I hope you enjoy the rest of your day. Thank you so much. Thanks for having me, and you too. Have a great rest of your day.

Thank you for listening. Don't forget to check out our other shows, Podcast.__init__, which covers the Python language, its community, and the innovative ways it is being used, and the Machine Learning Podcast,

which helps you go from idea to production with machine learning. Visit the site at dataengineeringpodcast.com

to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com

with your story.

And to help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.
