Cloud Native Data Security As Code With Cyral

Hello, and welcome to the Data Engineering Podcast, the show about modern data management.

Are you bogged down by having to manually manage data access controls, repeatedly move and copy data, and create audit reports to prove compliance?

How much time could you save if those tasks were automated across your cloud platforms?

Immuta is an automated data governance solution that enables safe and easy data analytics in the cloud.

Their comprehensive data-level security, auditing, and de-identification features eliminate the need for time-consuming manual processes, and their focus on data and compliance team collaboration empowers you to deliver quick and valuable data analytics on the most sensitive data to unlock the full potential of your cloud data platforms.

Learn how they streamline and accelerate manual processes to help you derive real results from your data at dataengineeringpodcast.com/immuta, that's I-M-M-U-T-A, and get a 14-day free trial.

And when you're ready to build your next pipeline or want to test out the projects you hear about on the show, you'll need somewhere to deploy it. So check out our friends over at Linode. With their managed Kubernetes platform, it's now even easier to deploy and scale your workflow, so try out the latest Helm charts from tools like Pulsar, Pachyderm, and Dagster.

With simple pricing, fast networking, object storage, and worldwide data centers, you've got everything you need to run a bulletproof data platform.

Go to dataengineeringpodcast.com/linode, that's L-I-N-O-D-E, today and get a $60 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show.

Your host is Tobias Macey. And today, I'm interviewing Manav Mital about the challenges involved in securing your data and the work that he is doing at Cyral to help address those problems. So, Manav, can you start by introducing yourself?

Hi. I'm Manav Mital, the founder and CEO of Cyral. Cyral is the first-of-its-kind cloud-native solution that makes it easy to observe, control, and protect the Data Cloud. Cyral intercepts all activity across all your data repositories, applies granular access controls, and stops anomalous behavior, all with zero impact on performance.

And do you remember how you first got involved in the area of data management?

Oh, yeah. Of course. So there was an early big data startup back in the day, which I joined in 2008 shortly after graduating from school. The name of the company was Aster Data. It was eventually acquired by Teradata, and that's where I met my cofounder and the CTO of Cyral, Srini. So that's where both of us got our start in data management, databases, and data warehousing. And then after Aster was acquired, we followed different paths, but we landed together again here at Cyral, protecting data management systems this time.

And so you mentioned that at Cyral, you're working to help with managing security and observability of data resources in cloud environments. I'm wondering if you can give a bit more of a description of what you're building there and what motivated you to build a business focused on those aspects of data security in the cloud specifically.

Sure thing. So what we've seen in the last few years that prompted us to build Cyral, Tobias, was the emergence of this Data Cloud. All the operational data, all the business intelligence data, the system-of-record type data that companies used to host in their databases, their data pipelines, and their data warehouses: we are seeing all of this data move to the cloud, where it now lives in services like Snowflake, BigQuery, Redshift, and S3, gets analyzed using tools like Looker and Tableau, and gets processed using tools like Fivetran, Databricks, Kafka, etcetera. These are all different third-party SaaS services where the crown jewels of every organization now live.

This adoption of the Data Cloud has made things very simple, very manageable, and very agile for engineering teams, data engineering teams, and DevOps teams. Businesses benefit from it in lots of different ways. However, it has really complicated things for security teams. We saw an opportunity to build a tool that made it easy and simple for security teams to guarantee better security and better overall manageability over all this data splattered everywhere in the Data Cloud.

That's why we built Cyral. There's an interesting anecdote, actually. The name Cyral is derived from a word in my native tongue, saral, which means simple. That was our North Star: the cloud is making everybody's life simple, and we should leverage the same trends and the same constructs to make it very simple for security teams to collaborate with their peers and secure their data.

And in terms of the data resources specifically, what is it about cloud environments that introduces so much complexity in terms of being able to maintain the security of those resources? And what are some of the common issues that arise when trying to work with databases, object stores, data lakes, or data warehouses in cloud environments?

So there are two issues at the core of it. One has to do with accessibility, and the other one has to do with tooling. Let me explain both.

If you're a traditional large enterprise, think of a large bank, you would have thousands of people in your IT and technology department, and only a very small number of them would actually know where the database is. Now fast forward to this new world where all the assets, all the applications, everything is moving to the cloud: you just have to know the name of the bank to know where their database is in Snowflake, where it is in BigQuery, etc.

These data repositories have become a lot more accessible, and that was by design. That was to enable data democratization, and to enable agility for application development. But what that does is take away the various layers of defense that had been implemented in front of these databases and data warehouses, which are now irrelevant. That increases the likelihood that this data can be breached or stolen. That's one, Tobias.

The second one, like I said, has to do with tooling. Companies are moving to the cloud. Their DevOps teams and development teams are adopting infrastructure-as-code models for development and deployment. The goal of all this is to iterate very quickly on development, on how they want to set up their infrastructure stack, and on how they want the different services to talk to each other. Historically, security teams used to come in after the fact: once the engineering and IT teams had set up a deployment, they would come in and put all the security controls and policies in place to make sure it could not be compromised. But now the cycle time of this deployment is several orders of magnitude faster, and that makes it really, really hard for the security teams to keep pace with the engineering and IT teams working in the cloud. That's the other piece of this problem.

And in terms of the breakdown of responsibilities between security of the resources, provisioning of them, and the data engineers who are working on moving information into and out of these systems, how does Cyral help with unifying those different roles and responsibilities, to align them on ensuring that the databases and data warehouses are operational and easy to deploy, maintain, and interact with, while also still maintaining the necessary security elements like encryption, TLS, and auditability?

Yeah. So that's a very interesting question, Tobias. What we saw was a very latent need for truly enabling security to be a shared responsibility between the development, the DevOps, the IT, and the security teams. What they lacked, or still lack actually, are the right tools and the right processes around which they can collaborate. That has been a general area of investment and innovation, and we take the same approach but for the Data Cloud: any application, any service, any user that wants to talk to a database or a data warehouse or a data pipeline, we can help secure that interaction and that communication.

The way we enable that is by championing this methodology called Security as Code. The idea here is that, just like you're using infrastructure-as-code based workflows for application and infrastructure deployment, you can use a security-as-code model for deploying all your security tools, all your security policies, and all your governance constraints into the same workflow. As the engineering, dev, and IT teams release new applications, upgrade their infrastructure, or change their deployments, the security controls stay in place, in lockstep with all those changes. That is what really enables companies to adopt our solution very quickly and roll it out very quickly.
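To make the Security as Code idea a little more concrete, here is a minimal sketch in Python. It is not Cyral's policy format or API: the policy structure, the field names, and the `validate_policy` helper are all hypothetical. It only illustrates that a policy kept in version control can be checked automatically by the same pipeline that deploys infrastructure.

```python
# Hypothetical policy document, as it might be stored in version control
# next to infrastructure code. All field names are illustrative assumptions.
POLICY = {
    "data_label": "CREDIT_CARD",
    "rules": [
        {"identity_group": "billing-service", "actions": ["read"], "max_rows": 100},
        {"identity_group": "fraud-analysts", "actions": ["read"], "max_rows": 1000},
    ],
}

ALLOWED_ACTIONS = {"read", "update", "delete"}


def validate_policy(policy: dict) -> list:
    """Return a list of problems; an empty list means the policy passes."""
    problems = []
    if not policy.get("data_label"):
        problems.append("policy must name a data_label")
    for i, rule in enumerate(policy.get("rules", [])):
        if not rule.get("identity_group"):
            problems.append(f"rule {i}: missing identity_group")
        unknown = set(rule.get("actions", [])) - ALLOWED_ACTIONS
        if unknown:
            problems.append(f"rule {i}: unknown actions {sorted(unknown)}")
        if rule.get("max_rows", 0) <= 0:
            problems.append(f"rule {i}: max_rows must be positive")
    return problems


if __name__ == "__main__":
    # In a CI job, any reported problem would fail the build before deployment.
    for problem in validate_policy(POLICY):
        print(problem)
```

Because the check is an ordinary function over an ordinary data structure, it could run on every pull request, so a policy change would be reviewed and tested the same way an infrastructure change is.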

In terms of organizations that have already deployed resources, they've got databases running either on dedicated virtual machines, or they're using something like RDS or Google Cloud SQL, or they've already got a large amount of data in various S3 buckets. How do they go about using something like Cyral to identify those resources and the information that they contain, and to apply appropriate security policies to those existing resources?

A lot of these companies that we are working with, Tobias, we're actually catching them at the right point. Three years ago, if I came on this podcast and threw around the term Data Cloud, very few people would actually understand it. Three years ago, if I went outside of a few select pockets and threw out the term Snowflake, they would not understand that I was referring to a cloud-based data warehouse. Just in the last year, all these models, all these tools, all these solutions have really caught fire.

The larger enterprises that we work with are all, at this point, either just beginning to move to the Data Cloud or planning to move to the Data Cloud in a pretty big way, and that is where we start working with them. For all the legacy databases and data warehouses that they have, which are on-prem or have been deployed and secured using their existing investments, they're happy with that. They work with us on this next generation, this new architecture of data engineering components that they're deploying.

Did I answer your question, Tobias?

Yeah. I think the main thing I was driving at is just whether the complexity arises from existing resources and being able to discover them, where the primary issue that you're seeing is around sprawl of data and how it's being propagated across different systems that you don't necessarily have visibility into, or if it's for people who are deploying resources and want to ensure that there is an appropriate amount of security applied at the time of creation.

Oh, I see. Yeah. So the way to think about it is there's a lot of sprawl, like you used the word, at the data layer, where companies would be using different kinds of databases and different kinds of data warehouses, and different teams inside the same organization would be using different stacks for accessing, managing, manipulating, and analyzing the data. The approach we have taken is that we're going to embrace all these tools and all these different repositories and work uniformly across them. It sounds very audacious and very grandiose, but the key insight here is that because we are focused on these data repositories, there's a small set of grammars, a very small set of grammars that we have to be effective for, which gives us very wide coverage across all the different components in a typical customer's data stack.

Now with Cyral, when they think of security policies, when they think of access control policies, they can almost forget about whether behind the scenes it's MongoDB or whether it's Snowflake or whether it's S3. They just think in terms of information types. For example, if it's an ecommerce company, they would say that, look, the really valuable information for me is credit card data associated with my customers, or their name and shipping address. Regardless of where it is, I want only these people inside the organization to be able to access it, and only under these circumstances should an application be able to query it. And Cyral takes care of that diversity, that heterogeneity, in a very uniform way.
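The idea of writing a policy once per information type, rather than once per repository, can be sketched as follows. This is an illustration only, not how Cyral implements it: the labels, repository names, locations, and group names are invented, and treating unlabeled data as allowed is a simplifying assumption.

```python
from dataclasses import dataclass

# Hypothetical mapping from an information type to where it physically lives.
# Repository names and locations are made up for illustration.
LABEL_LOCATIONS = {
    "CREDIT_CARD": [
        ("mongodb", "orders.payments.card_number"),
        ("snowflake", "ANALYTICS.BILLING.CARD_NO"),
        ("s3", "exports/billing/cards.csv"),
    ],
}

# The policy is written once per information type, not once per repository.
LABEL_POLICY = {
    "CREDIT_CARD": {"allowed_groups": {"billing-service", "fraud-analysts"}},
}


@dataclass(frozen=True)
class AccessRequest:
    group: str       # identity group of the caller
    repository: str  # e.g. "snowflake"
    location: str    # column, field, or object path being read


def label_for(request: AccessRequest):
    """Find the information type stored at the requested location, if any."""
    for label, locations in LABEL_LOCATIONS.items():
        if (request.repository, request.location) in locations:
            return label
    return None


def is_allowed(request: AccessRequest) -> bool:
    """Unlabeled data is allowed; labeled data requires group membership."""
    label = label_for(request)
    if label is None:
        return True
    return request.group in LABEL_POLICY[label]["allowed_groups"]
```

The same `CREDIT_CARD` rule then applies whether the request targets the MongoDB field, the Snowflake column, or the S3 object, which is the uniformity described above.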

And as far as the types of security issues that you're seeing, what do you see as being the most problematic for organizations that are moving to the cloud and weren't natively deployed there in the first place? And what are some of the common points of confusion or lack of understanding as to the impact of what they're building and the particular settings that they need to take advantage of and maintain observability for?

A really big issue that they see is that when they move to the cloud, oftentimes the transition, Tobias, will be driven by agility. They want to invest in a very modern microservices-based application. It's going to be all deployed using infrastructure as code, and that's driven by a lot of different business reasons. It could be digital transformation, or just because they want to really enable data democratization. Companies say that data is the new oil, and the way we are going to really extract value out of it is by putting it at the fingertips of everybody in the organization. Whatever the drivers may be, they end up with a very fast-changing technology stack where application services, and oftentimes databases also, spin up, do something, and then spin down. A lot of these services become ephemeral.

Across the board, these companies invest a lot in observability. They make sure that they're collecting traces and logs and metrics from everywhere, from all the different systems and components that they have deployed in the cloud. If you've seen the recent investment in tools like Datadog, SignalFx, Splunk, etc., it has really shot up for this particular reason.

However, when it comes to their databases, their data warehouses, and their data pipelines, there are no good, easy, simple ways for them to even log that information. For example, if you turn on logging in a database that your application is talking to, it really sinks the performance of the database, which has a downstream impact on the performance of all your applications and all your infrastructure. Getting even very basic observability out of your database, your pipelines, and all these systems is very hard. It starts from there, because if you can't really see what's going on, then protecting it, securing it, and managing it becomes that much harder.

That's one of the big drivers of adoption of Cyral with our customers.

And so in terms of Cyral itself, can you talk through the way that the platform is architected and what's involved in actually adding it to an existing environment and integrating with data sources like databases or object storage?

Yes. So at Cyral, like I said earlier, a North Star for us is simplicity. We want to build a service which is very ergonomic in nature, very simple to deploy, and easy for the teams to adopt.

The underlying technology that we have built, which powers all of this, is what we call stateless interception. It allows us to intercept requests to any SQL database, NoSQL database, data warehouse, or data pipeline without impacting performance or scalability. This interception service is something that customers run locally in their own environment. We call it a sidecar. The sidecar runs locally, it intercepts requests to any data endpoint that they may have on-prem, in the cloud, or as a third-party SaaS service, and all application, service, user, and tool requests get routed to a sidecar.

Cyral provides a SaaS-based control plane from which a customer can see all the sidecars deployed in their environment, and centrally manage what policies they want to enforce and who can access what data, or get observability metrics routed from the sidecar to the tool of their choice.

Because the sidecar is a very simple containerized service, customers are able to deploy it almost however they like. Some customers deploy it as a Kubernetes service, and some of them run it as a self-managed hosted service in their own cloud. Some customers deploy it using Terraform or CloudFormation. We've seen all kinds of deployments.
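The sidecar pattern described above can be sketched as a small request handler that sits between callers and a data endpoint. This is a toy illustration and not Cyral's implementation: `make_sidecar`, the request shape, and the three callables are hypothetical. "Stateless" here only means that each decision depends on the incoming request and the configured policy, with no per-session state kept in the handler.

```python
import json
from typing import Callable


def make_sidecar(policy: Callable[[dict], bool],
                 backend: Callable[[str], list],
                 emit_log: Callable[[str], None]) -> Callable[[dict], list]:
    """Build a request handler that sits between callers and a data endpoint."""

    def handle(request: dict) -> list:
        allowed = policy(request)
        # Emit an activity record for every request, allowed or not,
        # so observability does not depend on database-side logging.
        emit_log(json.dumps({
            "user": request["user"],
            "query": request["query"],
            "allowed": allowed,
        }, sort_keys=True))
        if not allowed:
            raise PermissionError(f"blocked by policy: {request['user']}")
        # Forward the unchanged query to the real repository.
        return backend(request["query"])

    return handle


if __name__ == "__main__":
    sidecar = make_sidecar(
        policy=lambda req: req["user"] != "intern",  # stand-in policy
        backend=lambda query: [("row",)],            # stand-in repository
        emit_log=print,                              # stand-in log sink
    )
    print(sidecar({"user": "app", "query": "SELECT 1"}))
```

In this sketch the log sink is just a callable, which mirrors the idea of routing activity records to whichever observability tool a team already uses.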

I know that cloud native has often been associated with Kubernetes, and that's one of the driving factors in that overall space of containerization. What are the challenges for organizations who haven't already adopted things like Kubernetes in terms of being able to keep up with the rate of change, as far as getting databases deployed and maintained and keeping them up to date, and being able to leverage something like Cyral if they're just using the previous generation of cloud management tools like Terraform and Ansible or SaltStack, versus fully leveraging the capabilities of container orchestration?

Kubernetes and container orchestration certainly complicate the problem organizations already have with visibility and with security. But even if you take all that away and think only about accessibility, there's a customer that we are working with that has been growing really, really fast. They are very heavily invested in data analytics. They run their data warehouse in Snowflake, and every week they would have people joining the organization who would get access to Snowflake. And because a lot of what this company does is analyze their existing consumers' data to figure out how to best engage them and what kinds of offers to present to them, all sorts of data ends up in Snowflake, and most of the organization has access to the data.

And at some point, it ended up becoming a big concern for the CTO: look, I'm reasonably sure that we have good hygiene in terms of what data will end up in Snowflake and what types of applications and services are transmitting what kind of information to Snowflake. However, he still wanted assurance and visibility and some guarantees that the data would not be misused and that PII would not be visible to any user by accident.

This has almost nothing to do with Kubernetes. This just has to do with the general rate of change, where there's a cloud-based data warehouse, namely Snowflake in this case, a large number of people are getting access to it at a rapid clip, and their analytics workload is changing at a fast rate. Even in that scenario, Cyral can be extremely useful, because all requests are monitored by Cyral and it can help organizations like these make sure that data is never inappropriately accessed, even if by accident.

And that brings up an interesting point too, as far as how to effectively prevent things like leakage of PII when you don't necessarily know what is contained in a given data repository, particularly for things like Snowflake or S3 where anybody can land semi-structured or unstructured data into that source. How do you prevent the leakage of PII when you don't necessarily know what the schema is going to be ahead of time, or what the rules will be for eliding a given record when it's not clear upfront what exists in the dataset?

Yeah. No. That's exactly right. Great. And this is exactly why you need a solution that works seamlessly across all data repositories. What will happen in a typical organization is that there will be some tribal knowledge, if you will, around what data is sensitive and where it is stored. The challenge is that data will be living in some database. It will then move to S3. From S3, it will move somewhere else. Then it will land in Snowflake. Then somebody will read the data and push it somewhere else. Very quickly, people lose control over where the data is going and where it has landed, and it is impossible to express any kind of governance policy at the application layer or in a disaggregated way, saying that this will be the governance policy for this database or this will be the governance policy for this tool. You have to think more fundamentally. You have to think about it in terms of the data itself. That is exactly what Cyral enables.

What customers will do is say, let's put Cyral in front of a few data repositories where we understand what the PII data is, where it is stored, and what the structure is. Once Cyral starts seeing all the interaction, we can very quickly inspect all the different data flows and help customers map out where the data is and where it is flowing. Very soon, they start getting a lot of confidence in their own assessment of where the data is flowing across the organization, and then they can start putting in policies.

Look, it's okay if a developer comes in and spins up a new database in AWS and has an application talking to the database and pulling in data from somewhere else to populate it. However, for this dev/test type of thing, under no circumstances should you be reading PII about our customers. And that is where Cyral is very valuable.

The other thing is, for the case where you do know what the schema is ahead of time and you map out the fields that need to be elided for particular roles, how do you address things like schema updates and being able to identify when those underlying columns or table structures change, so that the rules that you have defined upfront are no longer going to be valid and need to be updated? And how do you raise that awareness for people who are managing that infrastructure?

Yeah. So there are two ways that we help with this. One is, again, we enable our customers to think in terms of data types. A customer says that, look, the information that I really care about could be somebody's age, it could be somebody's phone number, and it could live in this column, or it could live in this field, or it could live in this bucket. That schema, of course, you're right, could be changing all the time. So first of all, the policies are defined on these information types.

Then for the enforcement of these policies, because it's a security-as-code approach, we stay in line with all requests going to a data endpoint as the applications, the underlying schema, and all that information evolve.

For schema updates, for example, the way customers use us is that if they see a service or a human being or a DBA update a schema or create a new table, they want to be flagged about it. Then they start tracking that new schema or that new column or that new table that was created, and they can start inspecting the data which is showing up over there. If the data is coming in from what used to be some sensitive data that you had tagged, then all of a sudden you know that you need to automatically extend your policies, your visibility, and your constraints to that new data location as well. And that's how Cyral helps customers constantly stay on top of their evolving data models, schemas, tables, and whatnot.
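The step of extending policies to a newly created column when its data comes from an already tagged source can be sketched like this. It is a simplified illustration under stated assumptions, not Cyral's mechanism: the catalog contents, column names, and the `propagate_labels` helper are hypothetical, and lineage is assumed to be known as a list of source columns.

```python
# Hypothetical catalog of columns already tagged as sensitive.
SENSITIVE = {"customers.phone": "PHONE_NUMBER", "customers.age": "AGE"}


def propagate_labels(sensitive: dict, new_column: str, source_columns: list) -> dict:
    """Return an updated catalog after `new_column` was populated from `source_columns`.

    If any source column carries a label, the new column inherits it.
    In this simplified sketch the first labeled source wins.
    """
    updated = dict(sensitive)  # leave the caller's catalog untouched
    for source in source_columns:
        if source in sensitive:
            updated[new_column] = sensitive[source]
            break
    return updated


if __name__ == "__main__":
    # A new table column is observed being filled from a tagged column.
    catalog = propagate_labels(SENSITIVE, "leads.contact_phone", ["customers.phone"])
    print(catalog["leads.contact_phone"])
```

Once the new location carries the same label, a policy written against the information type covers it without being rewritten for the new schema.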

Today's episode of the Data Engineering Podcast is sponsored by Datadog, a SaaS-based monitoring and analytics platform for cloud-scale infrastructure, applications, logs, and more. Datadog uses machine-learning-based algorithms to detect errors and anomalies across your entire stack, which reduces the time it takes to detect and address outages and helps promote collaboration between data engineering, operations, and the rest of the company. Go to dataengineeringpodcast.com/datadog today to start your free 14-day trial. And if you start a trial and install Datadog's agent, they'll send you a free t-shirt.

And going back to the concept of sprawl, where before I was talking about the preexistence of various different sources of data and locations where they might be residing, the other aspect of sprawl is where you have a well-known place where most of your data is located, and then you're building various reports or data extracts, and those get sent to different people and then copied into things like Excel. How does Cyral help with identifying, mitigating, or removing the need for that type of sprawl, where the different elements of data are being copied into multiple different locations and you don't necessarily know how they're being maintained or if they're being kept in compliance with any sort of regulatory or security regimes?

This has to do with basic visibility. I was talking to a CISO recently, and he had a really interesting perspective on this. It was, like, look, let's just focus on solving even the most simple problems that are the highest bang for the buck, and all these super complicated problems we'll get to later.

The challenge that a lot of these security leaders have is that there are no tools even for solving simple problems. Let's take this Excel scenario as an example. Let's say you have a company which is hosting a bunch of data in their database, they're partnering with other companies, and an employee in a partner organization wants to hook up to the database and pull data out for Excel. Today, it's hard for companies to answer a very basic question: which employee in which partner accessed what data? In a lot of cases there is no way for them to do that. This is where Cyral can be helpful, where we can sit in front of all this activity, hook into their identity provider, and give them real-time visibility into who is accessing what data. Then they can implement some very simple policies, saying that, look, I have a gold partner. If there's some analyst at a gold partner, they should be able to read data, but only data that is relevant to that partner. They should not be able to read data that belongs to another partner, and show me exactly all the data that has been read.

Another way this becomes valuable is that now you can see, based on attributes of different users, what type of data they have been reading. For example, say you see that your support engineers are generally reading data between 8 AM and 6 PM. But at midnight, some support engineer decided to read data associated with one specific individual, even though there was no case or issue triggered against that user for that time period. That quickly becomes a red flag. So just by providing this visibility, Tobias, you can catch a lot of issues and nip them in the bud before you end up on the front page of the newspaper because 100,000,000 user records were stolen or 200,000,000 credit card records were stolen.

This is what we enable security leaders and organizations to do.

To your point too about the data breaches and unauthorized access to large volumes of data, a lot of that has to do with things like S3 or other object storage platforms where the security controls aren't properly configured. Does Cyral provide any way to gain insight into that, where you can say, here are the buckets where I have information stored, and then use the cloud APIs to introspect what the security policies are and determine whether the access is too permissive and what other systems are being used to read and write data to and from that?

Yeah. No. That's exactly right. So we can certainly help with that. In fact, think of data governance frameworks. Data governance frameworks typically have different aspects to them. One big pillar is discovery: finding out where all my data is, what the different data sources are that I have, and where they are kept. Another one is classification, which classifies your data into one or many different kinds of categories. Depending on the category, you can assign severity and policies around who can access how much of the data, under what context, etcetera.

Then there's another one, a very important one, which is access control. Now that you understand where the data is and what type of data it is, you have to enforce access control to that data in real time. That is what is unique and special about Cyral. We are able to sit in line with all requests and implement that access control.

Because we are seeing all this activity, we can integrate with other tools that companies may be using for classification or cataloging or discovery, or we can just start from there and help them build that out. And most importantly, whether they build something themselves or use a third-party solution or use us for that, we can make sure it is kept up to date, because we don't require our customers to do offline scans and discovery anymore. All the activity is now in front of them, at their fingertips. We can make sure that catalogs and discovery engines are always kept up to date.

Another element of the challenge of being able to build something like this, where you are monitoring access and gaining observability into the interaction patterns of these systems, is the variance in APIs that exist for different data sources. So I'm wondering how you decided which platforms to prioritize as you were building out Cyral. And what are some of the systems that you found to be most challenging to work with?

As we were starting Cyral, we were building a first-of-its-kind product. For most of the customers that we work with, this is a new spend or a new budget item. We're not replacing something that existed before. The big bet that we made, Tobias, is that as companies move to the cloud, they are going to use these cloud-native, cloud-first, cloud-friendly data repositories like Snowflake, BigQuery, MongoDB Atlas, etcetera.

For our first cut of going to market, that's the set of data repositories that we prioritized. Call it luck, call it foresight, it actually worked out really well for us, because for most of the customers that we work with, we discover that 90% of their workload involves these early repositories that we had decided to undertake.

And as the company has been growing, we try to stay very close to our customers, very determined to solve their most pressing needs. And then, of course, we keep on adding coverage for new types of repositories or tools that they want us to be effective for. So that's the strategy that we've taken.

As far as your interactions with your customers, and as you are building out the product and figuring out which directions to take, what are some of the biggest security challenges that organizations are dealing with, or the things that are most likely to keep them up at night and lead to breaking compliance regimes?

Yeah. So there are two big areas, two big vectors broadly, in terms of how we think about designing and prioritizing a solution and in terms of the customer needs that we tackle.

One is adoption of the Data Cloud, where a lot of companies for the first time have their really sensitive, mission-critical, company-critical data in a data repository where they cannot put their arms around the infrastructure or the server where the data is stored. And this data is being analyzed using third-party SaaS tools like Looker, Tableau, and other similar tools, and these organizations have basically no control over the data which is flowing from a repository to a tool and back again. So that's an area of concern for a lot of security, technology, and CIO leaders inside organizations, and it's one that we focus on.

The other one is around agility and simplicity of deployment. You can solve the grandest challenges if you don't get overwhelmed by them, and that's how we approach this. Look, data protection, heterogeneity of data, applications spinning up and down, data flowing from one place to another: it seems like a really daunting problem. But by being maniacally focused on building a product which is simple to use and simple to deploy, with very intuitive workflows, that's how we are enabling customers to step up and solve these issues.

And as far as the challenges that you're facing or some of the most interesting or unexpected lessons that you've learned while building Cyral, what are the things that stand out most to you?

Well, if you had asked me last year... Since you're asking me now, I would say that life can throw any kind of curveball at you, including a global pandemic. So you have to be agile as hell and responsive as hell to adapt to a changing environment of customers' and investors' and employees' needs and all that, right? But I suppose I'm not alone in this. Every organization is dealing with it.

But really the most interesting challenge that we had to solve at Cyral was that of education. How do we explain what we are trying to do, the product that we're trying to build, and the benefits of this product to an audience that has not used a product like this before? And that is both daunting and fun, because we really have to think from first principles. How do I describe the problem? Is "data cloud" the right word? Is there some other way to capture this? Is "security as code" the right way to explain how we operationalize that, or is there a different term? These have been some of the most fun, yet the most challenging, aspects of building Cyral so far. But like I said, we got very lucky that we've had the right team and the right market trends behind us, and we were able to come to a place where the Cyral positioning and the Cyral story does actually resonate quickly and easily with most people that we talk to.

Particularly for security engineers who might be familiar with software applications and the traditional access patterns and life cycles of things like a web app, what have you found to be some of the challenges that they're facing, and some of the gaps in knowledge and appreciation for the complexity, when they are being tasked with understanding what policies to implement and how to manage the overall accessibility and access patterns of these larger repositories of data that have multiple stakeholders and much different access patterns?

So they are in a bit of a bind, right? The business is forcing the development and IT teams to run at the speed of light. And the security teams are still trying to figure out how to best work out their operating relationship with these teams, how to best interject themselves into their release management processes, and how to best communicate with these teams about what the right policies to enforce are, where they should be enforced, at what frequency they will be updated, and who will own that.

These new models, these new cloud-native architectures, have thrown a wrench in the agreement that all these teams had over the last many, many decades. And a lot of companies are still trying to figure out how to work together and how to operationalize the security of their overall cloud platforms in a way that makes sense for everyone, so that they can continue to move fast while making sure that they're not opening themselves up to some massive breach, something that really erodes the trust of their customers in the business. So that's basically what we're seeing. And again, I don't think this is unique or specific to any particular vertical or any particular segment. It's a fairly hot topic inside most organizations, and one that we are all collectively trying to figure out from different angles.

And similarly for data engineers who have, up to now, largely been working in their own silos and maybe not necessarily as directly integrated with security and development teams or the overall operations workflows, what are some of the areas where they're being challenged and some of the gaps of knowledge that they might be presented with as they become more integrated into the responsibility of...

...stay productive, stay above water, and make sure that they can keep up with growth in both the volume of the data and the complexity of their own jobs, right? They have to support an increasing number of business use cases that are imposed upon them. They have to continuously adopt the latest technologies that show up in the data processing world and continuously improve their toolkit, and sometimes their own skills, going for different certifications, etcetera, to make sure things are running smoothly. And they're all very security conscious, but they will often look to the security team for guidance: what are the best practices to follow over here? What is the right tooling to follow over here? This is exactly the gap that Cyral helps cover, because ours is a service that can be adopted by these engineering teams, recommended by the security teams, and helps both of them collaborate with each other and make sure that their data stays safe.

And so for teams who are trying to improve their overall security posture and their capabilities to be able to move fast while keeping things appropriately locked down, what are the cases where Cyral is the wrong choice, and what are some of the alternative options for being able to manage this, either via in-house tooling or some other type of service?

So a lot of companies are working on the more traditional, very on-prem-centric workloads, right? And there's still a very large number of them. For them, it's not that Cyral is the wrong choice. It's more that Cyral is probably not a pressing need for them. The reason is that they have a security model that has been working for them for many years, and they probably have better things to work on.

Another scenario is where the sensitive data in the organization lives on file servers or inside some of these cloud storage services like Box and Dropbox, which is completely free format, sitting in the form of files used by information workers. That is where Cyral is irrelevant for them as well. But whenever organizations have data which is highly proprietary or very sensitive, which is very core to their business and a competitive advantage for them, and which is sitting in these databases, data warehouses, and data pipelines, accessed by an increasing number of stakeholders in the company, that is where Cyral is a really valuable service for them.

Another element of what you're building at Cyral is the question of performance, where you are sitting in line with the data resources and need to be able to block requests that aren't allowed, elide certain elements of the data sources, or obfuscate things like credit cards. I know that proxies can sometimes be a performance issue. So I'm wondering how you have tackled that particular problem, and the issues that you've seen in terms of people who may have had bad experiences with that type of system, in trying to encourage them to try out Cyral?

Yes. So that's a great question. At Cyral, we came up with this technology called stateless interception, and it allowed us to build what we call a sidecar, a data layer sidecar. This is something that can sit in front of your data endpoint, but does not come with the traditional inflexibility, performance penalty, and scalability challenges that you have with proxies, or the manageability issues and applicability restrictions that you get with agents. It's a very unique and interesting thing that our engineering team came up with, which allows us to intercept requests to any of these databases and data warehouses without impacting performance and scalability.

And when we go present our solution to a large enterprise, the security team would obviously be very receptive to something like us, and then we would get introduced to the data engineering teams. What we saw was that they were very interested in our design considerations, in how we can be in line without having the traditional challenges that people associated with these proxies, and in how it actually further simplified their lives by giving them real-time observability into their database connection tools that they did not have before. So we started getting more and more pull into these accounts as we started working with engineering teams.

What we decided to do, Tobias, to answer the second part of your question, was to completely embrace this, and we decided to be very open with our overall architecture and our design principles. If you go to our website, www.cyral.com, in the technology section you can actually read about how we designed the sidecar and what the key insights are that we applied to do away with this whole notion of a proxy and build something which is very new, very lightweight, and very effective. So, as opposed to other companies that try to hide their high-IP stuff, we actually just put it out there on the web for everybody to read, and the reception for that has been fantastic.

As you continue to build out the platform and try to bring on new customers and support new data systems, what are some of the things that you have planned for the future of Cyral, both technologically and from the business side?

For us, what we have seen at Cyral is a really greenfield opportunity, where companies are increasingly moving to the data cloud and the engineering teams are increasingly adopting infrastructure as code. We'll continue to invest in making sure we play well with the various ecosystem tools that these companies use for data management, for data analytics, for deployment, for orchestration, for monitoring, for log collection, etcetera. That's going to be our big focus for a long time: to make sure that we play well with these tools and that we remain the simplest, the easiest-to-use security product for these security teams. I think as long as we keep doing that, there's a very large untapped market for us to go after.

Well, for anybody who wants to follow along with you or get in touch, I'll have you add your preferred contact information to the show notes. And as a final question, I would just like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today.

What I have seen is, look, tooling for data management is a very fluid and very rapidly evolving space. Like I said, my cofounder Srini and I started working together in the big data space back in 2008. And in the last 12 years, right, the number of tools, the number of companies that have been built and that are still growing up and still coming to the fore, it is a very, very large number of choices that data management and data engineering professionals have today versus 15 years ago, where it was just Oracle, DB2, SQL Server. Remember, there were a small number of players that completely dominated data engineers' headspace, and today there are so many alternatives in front of them, a lot of them fairly established and a lot of them still just bubbling up. So I can't think of any gap. In fact, I think we are at a point where you're looking at three different alternatives for doing anything that anybody wants, which actually creates the complexity that Cyral is trying to address.

Well, thank you again for taking the time today to join me and discuss the work that you've been doing with Cyral. It's definitely a very interesting project and one that is addressing a particular need for data platforms and data teams, because security is something that is and always will be challenging. So I appreciate your efforts to simplify that. So thank you again for taking the time, and I hope you enjoy the rest of your day.

Tobias, thank you so much for having me. It was a real pleasure coming here, talking to you, and I hope you have a great day as well and have a great weekend.

Thank you for listening. Don't forget to check out our other show, Podcast.__init__ at pythonpodcast.com, to learn about the Python language, its community, and the innovative ways it is being used. And visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
