Summary
This episode features an insightful conversation with Petr Janda, the CEO and founder of Synq. Petr shares his journey from being an engineer to founding Synq, emphasizing the importance of treating data systems with the same rigor as engineering systems. He discusses the challenges and solutions in data reliability, including the need for transparency and ownership in data systems. Synq's platform helps data teams manage incidents, understand data dependencies, and ensure data quality by providing insights and automation capabilities. Petr emphasizes the need for a holistic approach to data reliability, integrating data systems into broader business processes. He highlights the role of data teams in modern organizations and how Synq is empowering them to achieve this.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst is an end-to-end data lakehouse platform built on Trino, the query engine Apache Iceberg was designed for, with complete support for all table formats including Apache Iceberg, Hive, and Delta Lake. Trusted by teams of all sizes, including Comcast and DoorDash. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.
- Your host is Tobias Macey and today I'm interviewing Petr Janda about Synq, a data reliability platform focused on leveling up data teams by supporting a culture of engineering rigor
- Introduction
- How did you get involved in the area of data management?
- Can you describe what Synq is and the story behind it?
- Data observability/reliability is a category that grew rapidly over the past ~5 years and has several vendors focused on different elements of the problem. What are the capabilities that you saw as lacking in the ecosystem which you are looking to address?
- Operational/infrastructure engineers have spent the past decade honing their approach to incident management and uptime commitments. How do those concepts map to the responsibilities and workflows of data teams?
- Tooling only plays a small part in SLAs and incident management. How does Synq help to support the cultural transformation that is necessary?
- What does an on-call rotation for a data engineer/data platform engineer look like as compared with an application-focused team?
- How does the focus on data assets/data products shift your approach to observability as compared to a table/pipeline centric approach?
- With the focus on sharing ownership beyond the boundaries of the data team there is a strong correlation with data governance principles. How do you see organizations incorporating Synq into their approach to data governance/compliance?
- Can you describe how Synq is designed/implemented?
- How have the scope and goals of the product changed since you first started working on it?
- For a team who is onboarding onto Synq, what are the steps required to get it integrated into their technology stack and workflows?
- What are the types of incidents/errors that you are able to identify and alert on?
- What does a typical incident/error resolution process look like with Synq?
- What are the most interesting, innovative, or unexpected ways that you have seen Synq used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Synq?
- When is Synq the wrong choice?
- What do you have planned for the future of Synq?
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
- Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
- Synq
- Incident Management
- SLA == Service Level Agreement
- Data Governance
- PagerDuty
- OpsGenie
- ClickHouse
- dbt
- SQLMesh
[00:00:11]
Tobias Macey:
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst is an end-to-end data lakehouse platform built on Trino, the query engine Apache Iceberg was designed for. Starburst has complete support for all table formats, including Apache Iceberg, Hive, and Delta Lake. And Starburst is trusted by teams of all sizes, including Comcast and DoorDash. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst today and get $500 in credits to try Starburst Galaxy. Your host is Tobias Macey, and today I'm interviewing Petr Janda about Synq, a data reliability platform focused on leveling up data teams by supporting a culture of engineering rigor. So, Petr, can you start by introducing yourself?
[00:01:00] Petr Janda:
Yeah. Hi, Tobias. So my name is Petr. I'm an engineer by background. At this point, I've spent about 2 decades building different technology solutions. And especially the last 10 years, I spent a lot of energy on building not just kinda typical engineering teams, but also technology which spans engineering and data, mainly in, kinda, scale-up organizations. So I had a chance to scale a few teams somewhere in the range of a few people to about 150, which gave me a lot of learnings and experiences of building both engineering and data systems together. And, most recently, 2 years ago, I started a company called Synq, where I started as a CTO.
And, here we're kinda solving some of the challenges around data reliability, which are inspired by a lot of these lessons learned over the last decade or so. And do you remember how you first got started working in data? Yeah. So I think it goes back to, I would say, like, 2014, when I joined, actually, a market research company as an engineer. And there is something, like, special about it, because you realize that the entire company exists around data, because market research is fundamentally about collecting data from the market and the industry and trying to really understand what's going on. So in that case, we focused on a lot of surveys and website tracking and really, like, putting all this data together.
And back then, 2014, that's kinda pre modern data cloud. So you didn't have all the modern cloud technologies, so we had to build a lot of the data processing and engineering solutions ourselves. And so, like, that was, I would say, quite a strong transition from, you know, typical engineering to a company where data is at the very heart. And suddenly, you're gonna spend a lot of energy solving the challenges around data, which is kind of the fundamental of the company.
[00:03:01] Tobias Macey:
And now in terms of the Synq project and business that you're building, can you give a bit more overview about what it is that it does, the problem that you're solving, and some of the story behind how it came to be and why you decided that this is where you wanted to spend your time and energy?
[00:03:17] Petr Janda:
Yeah. So I think it goes back to my time at Pleo, which, at the time when I was there as a CTO, was a fintech. Think about, like, a 500 people organization, about, like, 100 engineers, 10, 12 people working in data. And I had responsibility for both the engineering and data side of the company. And there was this moment where we had a few incidents. One of them was on the engineering side, and another one was on the data side. When I worked with the team on the engineering side, we saw that incident resolved in, I think, like, 15, 20 minutes, because it was something related to our cards. If card transactions at a financial institution don't work, it's really high priority, and we basically deal with it almost immediately.
And then I remember the issue which we found a couple days later in our data analytics stack. We didn't resolve it for, like, I think, more than a week, and that was very frustrating. And I kinda looked at it as, like, it's all technology. So how come the approach is so vastly different? And so that was kinda the trigger when I felt like there is something that has to be done, and, essentially, that started Synq. And, like, the underlying, kinda, main mission which we felt we should go for is to really close this gap and bridge the tooling from the engineering world and bring it to the data analytics world.
And so, essentially, what we're building towards is that we'll work with data teams who are powering business critical systems as much as engineers. And, hopefully, we'll look back at some of these experiences and say, like, yeah, this is almost, like, ridiculous. It shouldn't happen. And we kinda treat building data systems with the same rigor as engineers do. So that's kinda, like, the broader picture, and there's a lot of kinda nuances and technology solutions we have to build to make that happen.
[00:05:26] Tobias Macey:
In the space of data reliability, data quality, data observability, that is a product category that's been growing in terms of overall investment and companies and solutions for the past 3 to 5 years. And a number of the vendors are focusing on different aspects of that problem space. Some of them are trying to do more of a horizontal view of it. Some of them are focused on point solutions. What are the capabilities that you saw as lacking in that overall ecosystem that you're trying to address, whether it's specific point solutions that were missing or the overall experience for data teams? I'm just wondering why is it that you saw the need to add another data reliability, data engineering rigor product and some of the solutions that you're trying to provide to solve for that.
[00:06:23] Petr Janda:
So I think it's about, like, looking back. About 2 years ago, there really were just a couple of startups, all, I think, in a very early stage. And I think at that point, almost everyone was focusing on the problem of detecting issues. Like, how can we discover that something actually went wrong? And to a degree, like, this is still, like, the core of data observability solutions. Right? We have to kinda uncover that something is not working in a data stack. But where I felt there is, like, a lot of space to innovate and a lot of opportunity is actually what happens after that. So once we detect that a certain table is missing data, or that a certain test is now failing because some business validation is not met.
What happens afterwards? Like, what is the workflow from the perspective of finding the right team to deal with that issue, assessing what's even the impact on the company? And is this even an issue we have to deal with right now, or is that something that can wait and be dealt with later? And then ultimately driving the whole resolution, bringing the system back to normal operations, and communicating that with the rest of the company. I feel like there's, like, a range of solutions, or a range of problems, which need to be solved. I would say to a degree, even in engineering, this is still being solved, even though, like, engineering is, of course, like, way ahead from the perspective of managing business critical systems and incident management, etcetera.
But I believe there's just, like, so many problems to be solved. And ultimately, especially in, like, today's economy, I kinda don't believe that we could be a point solution which does just one of these things. So we're very much focused on building a platform which goes end to end, from helping customers even set up the right testing strategy, to detecting issues, and then all the way to the resolution and kind of the entire workflow.
[00:08:29] Tobias Macey:
To that point of incident management, on-call management, incident resolution in the operational and infrastructure realm, that is a problem that has been very thoroughly explored. Obviously, there's always room for improvement, but it is something that is part of the default and assumed characteristics of a team who is operating infrastructure. For data teams, that is something that is coming to be more widely accepted, more widely understood, but there isn't necessarily a clear playbook for any given team to know what constitutes an incident, what it means to be on call as a data engineer or as a data platform engineer, what resolution looks like, how to think about SLAs, who to notify, and how often to notify them. I'm just wondering what are some of the ways that you see those concepts being mapped into the ecosystem of data engineering and data systems?
[00:09:28] Petr Janda:
So I guess, like, the first thing to say is that it's relatively new, especially when I think of, like, a traditional data analytics team, that they even think about incident management. I think this is a good thing, because it also means that the team is probably powering something a lot more business critical than they used to before. Because, otherwise, what's the point of doing incident management in the middle of the evening or night if it actually could have waited till the day after? So I think it's almost, like, I would say it's coming to the data platforms. In terms of what actually has to be done, I kinda believe that data systems should be looked at as software.
And in that case, I almost think there is very little that we should do differently from what we already know from engineering. So in that sense, all the kinda process, from even declaring an incident, and escalations, and the communication around handling the incidents towards the rest of the business or customers, I think that should be almost identical. The difference maybe here is in terms of the nature of the system, where if you look at a typical data platform in a company, I think one very defining factor is that that platform is integrated with almost every system in the business. So that means that you might be dealing with hundreds of sources of data and hundreds of use cases on top of this data, and the platform sits somewhere in the middle.
When things break, you don't even know if it's inside of the platform or if it's somewhere upstream. And so where we also focus with Synq is helping understand, like, what is actually going on and what's happening. And I think this is a critical part of incident management: when something triggers, how do I go from that moment to understanding this is a business impacting issue that we really have to deal with right now versus later? That, I think, is sufficiently different when you look at a traditional software system versus a data system, mainly due to the nature of these, like, very rich dependencies which go across the company. But the workflow which follows after that, I think, should be largely similar.
[00:11:58] Tobias Macey:
To the point of incident management, SLAs in particular, and what it means to be on call for a given company, only a portion of that is a technical problem. A lot of it is organizational and cultural. And I'm wondering how you see Synq playing a role in that corporate and organizational and cultural transformation that is necessary to build that competency as a team that does incident management and incident resolution, where the uptime of the data and analytics systems is maybe not the core focus, but a core focus?
[00:12:38] Petr Janda:
Yeah. So this is really, I think, one of the toughest things to solve for many data teams. It's almost like, how do we do that transition, which definitely is to a large degree also cultural? One of the, again, anecdotes I always go back to is when I look at my time at Pleo and, eventually, seeing data teams building a lot of complex systems on top of our production systems' data. And there was that point where I took a lineage of that system and put it on the screen in front of my engineering management. I just saw how surprised almost all of them were that this thing even existed.
And so, like, from that point, I realized that, like, a really big part of solving this challenge is actually increasing the transparency across the company of what is actually happening. Because, in a way, I think there is sometimes, like, this bad reputation, almost like, engineers are breaking data and that's bad. And I think that's definitely happening, and it's true. And they also have their own kinda agenda in terms of building a product and roadmaps and very tight deadlines. But I fundamentally believe that if you tell that engineer before they push code, if you do this, this very critical thing in the business will break, they'll not just say, like, yeah, don't care, I'm gonna press the button and go, and so I have my kind of work done. So I think, like, even helping map what is actually critical in the company and in the data stack, to me, is, like, something the tooling can help with.
And then once that's mapped, how do I transparently communicate that across the company in the right moments and in the right workflows? So I think that's where the tooling can really help to do all that work, which otherwise is really hard. I saw teams, in the time of, let's say, an outage, where they went into some sort of lineage solution, and they went model by model to find who the owner is. And then, like, it took really, like, half an hour just to assemble the list of people they should talk to, versus things like that just being completely available with one click, where you almost, like, have all this information to understand what's happening in the company readily available. I think that's definitely one aspect, and I think it's all around, like, working on implementing the solution, but putting it in the hands of people so they actually use it.
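To make the contrast concrete, here is a minimal sketch of the kind of lineage walk that turns that half-hour manual hunt into a single lookup. The `LINEAGE` and `OWNERS` structures are hypothetical stand-ins for metadata that would, in practice, come from a catalog, a dbt manifest, or an observability platform; this is an illustration, not Synq's implementation.

```python
# Hypothetical lineage and ownership metadata; in practice this would be
# pulled from a catalog, a dbt manifest, or an observability platform.
LINEAGE = {  # asset -> assets directly downstream of it
    "raw.payments": ["stg.payments"],
    "stg.payments": ["mart.revenue", "mart.ltv"],
    "mart.ltv": ["dash.bidding"],
}
OWNERS = {  # asset -> owning team
    "stg.payments": "data-platform",
    "mart.revenue": "finance-analytics",
    "mart.ltv": "growth-analytics",
    "dash.bidding": "marketing",
}

def impacted_owners(failed_asset: str) -> set[str]:
    """Walk the lineage graph from the failed asset and collect every
    team that owns the asset or anything downstream of it."""
    seen, stack, teams = set(), [failed_asset], set()
    while stack:
        asset = stack.pop()
        if asset in seen:
            continue
        seen.add(asset)
        if asset in OWNERS:
            teams.add(OWNERS[asset])
        stack.extend(LINEAGE.get(asset, []))
    return teams

print(impacted_owners("stg.payments"))
# e.g. {'data-platform', 'finance-analytics', 'growth-analytics', 'marketing'}
```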
[00:15:14] Tobias Macey:
On that point of incident management, uptime for data teams, on-call rotations, the broader question is, what does it mean for a component or an overall analytics system to be down? When does it merit being paged in the middle of the night versus waiting until business hours? What does it mean for a data engineer or a data platform team to be on an on-call rotation? I'm just wondering some of the ways that you're seeing teams tackle that set of questions and some of the ways that you think about that as a product that is trying to support them in facilitating that functionality.
[00:15:56] Petr Janda:
Yeah. So I think, like, one of the ways we look at data systems, at our customers but in general, is that, like, the key thing we really wanna help them with first is to, like, codify which parts of the data stack drive a business critical use case. So one example could be, you might be automating your advertising bidding based on a customer lifetime value model. Customer lifetime value models typically have a lot of inputs, which means they would be pulling data from a lot of parts of the company. And so I'd argue that that model is one of your critical data products.
And I think, like, the first thing is, it's even helpful to codify it into a platform like Synq to say, that thing is a critical data product. If it's somehow affected by an issue, this is a P1 issue, and it has to be escalated accordingly. And then when something happens anywhere in the data stack which might have impact on this P1 data product, the person who gets that alert, who might be on a completely different side of the company, will get an alert which is not just saying, here is a log record of an issue which happened, but it also automatically does the assessment of what's actually downstream from here, and could this type of issue have an impact on that customer lifetime value model.
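A rough sketch of the automated assessment described here, under the assumption that critical data products have been declared with a tier: when something fails, walk downstream, collect any declared products, and escalate to the highest tier hit. All names are illustrative, not Synq's API.

```python
# Illustrative only: declared data products with a criticality tier.
DATA_PRODUCTS = {"mart.ltv": "P1", "dash.weekly_kpis": "P3"}
LINEAGE = {
    "raw.events": ["stg.events"],
    "stg.events": ["mart.ltv", "dash.weekly_kpis"],
}

def downstream(asset, graph):
    """All transitive downstream assets of the given asset."""
    out, stack = set(), list(graph.get(asset, []))
    while stack:
        a = stack.pop()
        if a not in out:
            out.add(a)
            stack.extend(graph.get(a, []))
    return out

def classify_alert(failed_asset):
    """Attach the highest downstream product criticality to the alert,
    so the recipient sees impact, not just a log record."""
    hit = {a: DATA_PRODUCTS[a] for a in downstream(failed_asset, LINEAGE)
           if a in DATA_PRODUCTS}
    severity = min(hit.values(), default="P4")  # "P1" sorts before "P3"
    return {"asset": failed_asset, "impacted_products": hit, "severity": severity}

print(classify_alert("raw.events"))
# {'asset': 'raw.events', 'impacted_products': {...}, 'severity': 'P1'}
```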
And so I think the whole challenge is, again, like, understanding what is critical, which will ultimately help me answer the question of, am I looking at a failure which might be bringing down this critical component, or am I looking at a test failure which happens on a model which was just created yesterday and no one's really using for anything in production? So, like, even differentiating these two alerts must be helpful from the perspective of, if I look at this without knowing the data stack by heart, I might not really know the difference. And then in terms of, like, actually being on call in data teams, I see this varies a lot across companies, where I actually see a fair bit of companies who are running, like, almost, like, engineering-like incident management systems, where there is an escalation to the point that someone gets woken up at night if something's broken.
But the most common I see is this kinda, like, in-hours on-call rotation, which is typically called, like, a goalie, or some sort of, like, person appointed for that week to be, like, a first-level diagnostics for all the issues coming to the team. And I think this is, like, a good approach in smaller organizations. When I see larger organizations, with many data teams and many teams contributing to the data stack, I try to almost say, do we really need that? Or can we route the alerts automatically to the relevant people directly? So, actually, in some companies, we managed to reduce that role and almost, like, you know, make the relevant person aware of the issue immediately.
But, again, like, it's something that's still coming into the experience. I think everyone is learning. I think it really then should be tailored to the business. So I'm kind of reluctant to say there is the right way to do it, and I think every business has to assess it. And the best way I always look at it is kinda going backwards from these kinda critical systems and understanding, like, okay, if this breaks, do we need to solve it at 2 AM, or does it wait till the morning? And that's kinda the decision, like, I think every business has to make on their own.
[00:19:51] Tobias Macey:
In the operational realm, there's the analogous use case of different services having different levels of criticality. So you've got different gradations of how severe particular outages are, which maps to the idea of, page me in the middle of the night, I don't care what I'm doing, down to, I don't care about this unless it's during business hours, and I can take my time with it. And given the observability functionality that you have in Synq and the core focus on data products and data assets versus the individual table or individual pipeline approach, I'm wondering how you see that shift the thinking in data teams around how to map those different products to that level of priority of, oh, this product is something that is used every day.
The observability data supports that. Or, based on the observability, I can see that is a quarterly report that doesn't matter unless it's the, you know, close of quarter at the end of the month, in which case I do need to address it immediately. And some of the ways that that asset-centric focus shifts the ways that data teams approach their work.
[00:21:01] Petr Janda:
So, like, this is quite interesting, in a way that, even from the point where we started the company, we, like, always thought and built the entire system around this notion of data assets, which means that we purposely didn't wanna build a system which, let's say, revolves around tables. And I think, before solutions like dbt and the modern data stack and the analytics engineering workflows, it was right that, like, a lot of the data stack revolved around tables. But now we have tables and metrics and models, and now we're talking about data products, which is a little bit of an overloaded term in terms of what it really is, and I think everyone has their own version. But, ultimately, in my mind, it goes back to, again, defining the critical parts of the data stack. I think you mentioned the concept of tiering, which I see, like, a lot in terms of companies figuring out, how do I define which of our models are critical, like a P1 or P2, P3.
By the way, I'm a little bit skeptical about some of the technical indicators. Right? Because sometimes you could look at observability data and see, this has a lot of downstream dependencies, this looks important, or there's a lot of queries happening. And then there could be one asset used by, like, a CFO for some really critical decision, used, like, once a month, and that really shouldn't go wrong. And so, like, I always like to combine some of the technical indicators, but ultimately have the customer say, that thing over here is really critical. And that's exactly what we did with the data products. And that's, to me, in a way, how we also built it technology-wise. We can create data products from a group of dashboards or a group of models or a set of metrics. It doesn't really matter, but it was always around, almost, like, the ability to let our customer express, this group of things is important. These things together have some sort of meaning which goes beyond their kind of physical manifestation in our data stack.
And we attach a certain criticality to this data product. And then when things break, we wanna communicate that to an engineer related to that. So I'm a really big fan of, you know, data product thinking, and of, like, really defining it as, this is a critical output which leads to some sort of outcome in the business. And so I hope that, almost, like, at some point, we will look at tables and it will be the same as, like, files and containers in an engineering system. It's like, yeah, it's there, it's doing its job, but we're not talking about files when we build systems. We're talking about the system, which does something.
[00:23:57] Tobias Macey:
That's a good analogy. I like that idea of tables being just the files. We don't necessarily care about the tables in and of themselves. We only care about them insofar as they are useful for something else.
[00:24:07] Petr Janda:
Exactly. Yeah. Exactly.
[00:24:09] Tobias Macey:
The other interesting aspect of treating these data assets as a product, and something that is consumed and relied on by the overall business, is that it accentuates the fact that data is a team sport, and it's not just you as a data engineer doing table transformations and pipeline management. Your efforts are in this broader context of the overall purpose of the organization and how it's going to be used. And I'm wondering how you are seeing that change the ways that data teams approach their work, both technically, but also, more importantly, organizationally, and some of the ways that the rest of the organization is being brought into the work that's being done for that data engineering and those data product definitions, and how you see data governance coming more to the fore because of the fact that it is a collaborative and cross-functional problem and not something that is purely technical.
[00:25:08] Petr Janda:
So ownership actually was one of the first things we focused on solving when we started Synq. And, again, the reason for that was I've seen exactly this problem, where even I, as a leader of both sides of a technology organization, realized that it's really hard for me and for all the teams to even communicate and understand, you know, what's happening across this increasingly complex data stack. And so especially when you start to see, like, multiple analytics teams, some central engineering team, dozens of engineering teams which are producing data, and commercial teams which are producing data, it becomes really opaque as an ecosystem.
And so to me, like, solving for ownership as a concept across this whole structure is really important. And almost without it, any observability will be almost, like, not actionable. Because, like, if I don't know who the owner is, or how other owners are impacted by issues, then how can I really help this organization solve them? And so, I think, luckily, a lot of companies are realizing that, and you see a lot of different approaches where, you know, one way or another, companies are starting to define who the owner is.
The biggest problem I've seen, and still to a degree I see across the industry, is that that information about ownership is a bit, let's say, not actionable. So it could be anything from, we maybe are tagging our models in something like dbt, but also I've seen versions where there is a spreadsheet saying that these folders in this project are owned by the team over here, which is probably good to some degree at a very high level, but I think this is really hard to action. And so the way we approached it is that our goal was to bring this ownership into the path of solving issues, so that it is projected across the entire platform.
And, of course, part of it is understanding that if something happens, the right owner is the first one to be notified. But then also telling that owner, well, based on this issue, we see that these are the teams which are owners of assets downstream from this failure. So to me, like, layering ownership onto the concept of lineage means that I'm starting to look at an observability system from, like, dependencies between teams rather than dependencies between tables or data assets. So I think that's one way we definitely like to work with ownership: we use it almost like a map, where we're layering the teams on the data assets.
And then as we do impact assessment, or, like, some different queries across the system, the ownership is always part of it, if that makes sense.
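One way to picture "dependencies between teams rather than between tables" is to project asset-level lineage through the ownership map into a team-level graph. A hypothetical sketch, with invented asset and team names:

```python
# Asset-level edges (upstream -> downstream) and an ownership map,
# both hypothetical stand-ins for real metadata.
EDGES = [("stg.orders", "mart.revenue"), ("stg.orders", "mart.churn"),
         ("mart.churn", "dash.exec")]
OWNER = {"stg.orders": "data-platform", "mart.revenue": "finance",
         "mart.churn": "growth", "dash.exec": "bi"}

def team_graph(edges, owner):
    """Collapse asset lineage into edges between owning teams."""
    teams = set()
    for up, down in edges:
        a, b = owner.get(up), owner.get(down)
        if a and b and a != b:
            teams.add((a, b))
    return sorted(teams)

print(team_graph(EDGES, OWNER))
# [('data-platform', 'finance'), ('data-platform', 'growth'), ('growth', 'bi')]
```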
[00:28:19] Tobias Macey:
Digging more into the Synq product itself and the technical details of how it integrates with the data systems and the organization, some of the things that are coming to mind are the lineage tracking and observability that you get from hooking into the data warehouse, looking at the table transformation logs to see what came from where. It also brings to mind the idea of these metadata platforms where you have a cross-cutting view of all the different ways that data is being transferred across different system boundaries. And I'm wondering if you can just describe a bit about how Synq itself is designed and implemented and the integration points that it has into a company's overall data suite?
[00:29:06] Petr Janda:
So I guess, like, maybe one way to explain it would be to start at a little bit higher level, where we've built Synq around, almost, like, three key concepts. One, which we already discussed, is that everything in a data stack is modeled as assets. So whether it's a dbt model, data warehouse table, BI dashboard, ETL pipeline, all of that is, to us, an asset. Then the second pillar is that we model the relationships between these assets, because that ultimately allows us to build almost, like, a map of the entire ecosystem. And the third concept is something we call executions, which means that each of these assets is doing something, whether it's transforming data, or creating a table, or a query which is going in front of the user.
And if you think about these three concepts and you build your entire experience on top of them, then building integrations becomes a lot easier, because, for us, like, everything is a first-class citizen. So whether it's a warehouse or transaction database or transformation tool, we can model everything into these concepts. And so the first thing we've done, we focused on the heart of data platforms, which is the data warehousing. We invested heavily into dbt. The reason is obvious: it's becoming almost, like, a standard for data transformations, which means that we see a lot of teams using it. And then we started expanding the coverage into the BI world. We're now working a lot on APIs and a push towards, let's say, the data sources and the ways we can tap into the ecosystem which is upstream of warehouses.
And from the perspective of what it means in terms of integration from a customer perspective, we very much believe that a lot of this should work out of the box with, like, minimal configuration. So for most of the systems, we're building, more or less, off-the-shelf connectors, where what we really need is the access credentials and a security review, let's say, so we can actually go and connect to these systems. But from that point, everything is automated. And then we felt that in order to build the best possible solution on the market, we should do some kind of strategic investments at the infrastructure level, which means that we, for example, built our own parser for SQL, which understands a lot of, kinda, specific dialects of different warehouses. We understand where exactly in the files the logic for a different column is. So that ultimately allows us to build, like, new experiences where we're, for example, starting to blend workflows across lineage and code into a lot more unified experience, which I at least haven't seen on the market.
And so, like, the goal really here is to make sure that many of these solutions are automated. And then on top of this, we're building some of the kinda unique capabilities, which ultimately allows us to go deeper and help the practitioners take their systems to the next level.
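The three pillars described above map naturally onto a small data model. This is a hypothetical rendering for illustration, not Synq's actual schema; each integration then reduces to emitting these three record types.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Asset:
    # Anything in the stack: a dbt model, warehouse table, BI dashboard,
    # or ETL pipeline -- all first-class citizens of the same type.
    id: str
    kind: str  # e.g. "dbt_model", "table", "dashboard", "pipeline"

@dataclass
class Relationship:
    upstream: str    # Asset.id
    downstream: str  # Asset.id

@dataclass
class Execution:
    # Each asset "does something": a run, a materialization, a query.
    asset_id: str
    started_at: datetime
    status: str  # "success" | "failed"

# A connector then only has to emit assets, edges, and executions:
assets = [Asset("dbt.stg_orders", "dbt_model"), Asset("wh.stg_orders", "table")]
edges = [Relationship("dbt.stg_orders", "wh.stg_orders")]
runs = [Execution("dbt.stg_orders", datetime.now(), "success")]
```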
[00:32:29] Tobias Macey:
In your work on Synq, from when you first started working on it to where you are today and working with some of your early customers, what are some of the ways that the scope and goals of the product have changed since you first started working on it?
[00:32:45] Petr Janda:
I think the hard truth is that they keep expanding. So we definitely started with the approach of potentially being more of a point solution. But as you know, the market is a bit more demanding nowadays. So I think the biggest change is that we keep kinda expanding the scope, from the initial focus on ownership and critical assets into now basically building the entire reliability platform, which has a component of observability as well. That's largely not because we necessarily wanted to build a data observability company, but more so because the market started to see us that way.
And so in order to be competitive and win deals, we have built a lot of that functionality. And I still believe there's a lot of ways to almost, like, redefine what observability is or where it can go. It's still relatively young, and we're still talking about, like, monitoring and lineage and schema detections, which is all fine. They're, like, important features. But, ultimately, we're kinda thinking of, like, where does it expand next? So now that we have, let's say, a solid foundation in the kind of data ecosystem, what would be the next thing which can be done on top of data observability? That's what we're thinking about as well.
[00:34:13] Tobias Macey:
For teams who are looking to bring Synq into their ecosystem, they want to start using it for all the features that we've been discussing. I'm wondering if you can talk through the workflow of actually getting it set up, starting the onboarding process, and, given the breadth of functionality that it supports, maybe what is the first entry point that you see as being either most common or most effective for that broader adoption?
[00:34:41] Petr Janda:
So to a degree, this really depends on the customer's kind of pain point, and we are still at the stage where we love to work with our customers. So the first step is to get in touch with us. And from that perspective, there really are a few different avenues. And because of the depth of the platform, there are a number of different ways we can lead with different functionality, if that makes sense. And so one example of a use case is companies where the biggest challenge is the detection of issues. One kinda type of company is typically teams who have to work with third-party data. Right? So they are ingesting data from even different companies, and they don't really have control over testing them. So in that case, the leading functionality is anomaly monitoring, which means that we integrate their data warehouse, we discuss where the critical aspects actually are, and then eventually deploy a set of monitors which are, let's say, fitting the type of issues which might happen with the data.
In other companies, it's a lot more around uncovering the structure of the system. So we have one customer who has multiple dbt projects. Now we're working with a customer who has multiple SQLMesh projects. And for both of these, the first goal was to, you know, like, understand the end-to-end picture of this ecosystem. So, again, in this case, it's all around, like, lineage and impact assessment of issues which might happen across these systems. And, again, the first step always is integrating into the data platforms and then, potentially, in this case, onboarding through the use cases around, like, uncovering the structure of the data stack through lineage, for example.
So it really depends. Of course, like, the common step here is integration with the data platform, but that's also why we've really invested in making these connectors work off the shelf. So there's, like, not really much, if any, manual work in terms of, like, tagging assets, etcetera. This kinda is all picked up automatically.
[00:37:07] Tobias Macey:
And once a team has Synq deployed and integrated, they have all of their assets modeled, they understand what are the data products, who are the end users, I'm wondering if you can talk through what a workflow or an incident resolution process looks like with Synq as the hub of that activity.
[00:37:29] Petr Janda:
So once we finish the base level integration, we now have cross-system lineage. We might have, like, a basic level of monitors deployed. The next typical step is to codify a few of these additional concepts. So one of them is codifying ownership. We do that in many ways. The most typical one is that we work with metadata from dbt, where, if the team already defined owners in dbt, we simply lift that metadata and set up, let's say, a mirror of the structure inside of Synq. In other teams, this could be done by specific data sets or folders in the dbt project, so we have a lot of ways to do that. The second one is setting up data products. So, again, a typical deployment starts with, like, a handful of data products which we focus on. That's where we kinda start the deployment.
And then every customer takes it in their own different ways. We have customers with a hundred or so data products because they actually wanted to make them more granular. Some of them stay in the range of a handful. So it really depends. And so once we have data products defined and we have ownership defined, we typically define alerting, which is kind of mapped to these owners. We can then run, let's say, more powerful incident management. And what that means is that the typical process is, something fails in a data stack. It could be either a test or it could be a monitor, which could come from another system like dbt, or it could be our own.
This essentially triggers what we call an issue, which means that an issue recognizes that something failed, but it's not yet clear how critical that is. And so our issues end up, first of all, alerted into business systems like Slack and Microsoft Teams or email. But then we also bring the list of all issues into a view we call triage. So you can think of it as, this is a list of things which failed. And in this triage, we're giving the person who is responsible to go through that list as much context as possible in order to quickly assess: Is this business critical? Yes or no. Are there critical products impacted? Yes or no. Is there an important team impacted? So all of that information is collected into one single screen, so someone can go through the list and triage the issues.
In some cases, the decision could be no action needed, this is gonna be, let's say, quietly fixed in the next build. In other cases, this could be declared an incident. And once an incident is declared from a subset of issues, this triggers the incident management workflow. And we were thinking a lot about where the boundary between Synq and more traditional incident management workflow tools is. And we've decided that, essentially, post incident declaration, we still wanna give the data practitioner this kinda single page view where you see the lineage of assets in the incident, the list of all the issues which are included, the list of teams, the list of products, basically the whole kinda impact assessment on one screen.
But ultimately, at this point, the incident is linked to an external incident management system where, let's say, the traditional incident management process can happen. In some teams, this is simply a Jira ticket to deal with. In other teams, this is PagerDuty or Opsgenie or a system like that, where they actually manage, let's say, wider incidents also from engineering. And so what we really wanna be is that bridge. But the critical thing which we've built is that concept of promoting an issue to an incident. Because what this actually does is that it allows teams to almost, like, reason about all of their, let's say, quality from both perspectives.
One is on the issue level, which is saying, okay, we have this asset which is firing a lot of issues at us, and maybe we have to do something about it. But the second level is, we have declared some business impact in incidents, which are typically originating from somewhere in the data stack. And this is also very important from an analytics perspective, for governance, because I think that if teams report on issues, it's almost, like, a very negative picture they might be creating. Because to me, it's the same as if engineering would be reporting on every single issue which happens in their system.
But I can almost guarantee that in any sufficiently large system, there is something failing all the time. But most of the time, it either recovers in a few minutes or almost immediately. In some cases, it could be left like that because it's not critical. And so creating this abstraction where you differentiate issues and incidents is really important, and that's really, like, a big part of our incident management process in Synq. And then, again, the actual incident is handed over to the tools which are designed for this, like PagerDuty, incident.io, etcetera.
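A toy sketch of the issue-versus-incident distinction, with invented names; the key step is the promotion, after which handling moves to an external tool such as PagerDuty or a Jira ticket (the hand-off call below is hypothetical):

```python
from dataclasses import dataclass

@dataclass
class Issue:
    # "Something failed, but it's not yet clear how critical" --
    # a failed test, a tripped monitor, or an alert from dbt.
    id: str
    asset: str
    impacts_critical_product: bool

def open_external_incident(issue_ids):
    # Stand-in for handing off to PagerDuty, Opsgenie, Jira, etc.
    print(f"Declared incident from issues: {issue_ids}")

def triage(issues):
    """Split the triage list: promote business-impacting issues to an
    incident, leave the rest to be quietly fixed in the next build."""
    incident, no_action = [], []
    for issue in issues:
        (incident if issue.impacts_critical_product else no_action).append(issue)
    if incident:
        open_external_incident([i.id for i in incident])
    return incident, no_action

triage([Issue("i1", "mart.ltv", True), Issue("i2", "scratch.tmp", False)])
```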
[00:42:55] Tobias Macey:
And in your work of building Synq, working with these different data teams to understand how they think about their roles, how they fit in the broader organization, how to get everybody working in the same direction, what are some of the most interesting or innovative or unexpected ways that you've seen the Synq platform used?
[00:43:14] Petr Janda:
Yeah. So I don't know if I have, like, one big anecdote, but I almost, like, like to be surprised by our customers every now and then. It's almost like a privilege to have engineers as customers, because they are creative. So there's a couple things which come to mind. I think one of them was, like, a great surprise. It goes back a couple months now, where we had one of the engineers who connected their warehouse into two separate dbt projects. And we didn't even realize that it was gonna work out of the box, where, essentially, because we resolved lineage from one project to the warehouse, and then from the warehouse to another project, it just worked out of the box. So that was definitely, like, a surprise that we were very happy with. And another example is, I guess, one internal tool we have, which I never thought we would have exposed to customers: we've built almost, like, a query engine for assets, where you can do a query such as, find all the dashboards, and then find all the assets which are upstream of these dashboards, which are also of the type dbt source and have a certain tag.
And we built this internally because we wanted to express some of the concepts which are very hard to do in the UI and through, like, drop-down boxes. But the information that it existed somehow leaked to our customers, and they started to write these queries and, like, build some of the functionality which I hadn't thought of. So we can use these rules to deploy monitors. We had a customer who said, deploy monitors on all dbt sources, but only if there is a certain type of model downstream. And so that basically used some of the things which we never really designed in the first place. So that's always a kinda good surprise. And in that sense, I realized that it's really fun to build an almost, I guess, more sales-led team, because you get to work with customers very closely. So you get to uncover also a lot of these interesting use cases they find themselves.
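The customer-written rule described here ("all dbt sources with a certain type of model downstream") could be expressed as a small filter chain over asset records. The following is a hypothetical flavor of such a query, not Synq's actual syntax:

```python
# Hypothetical asset records and a filter chain over them.
ASSETS = [
    {"id": "dash.kpis", "type": "dashboard", "tags": [], "upstream": ["src.stripe"]},
    {"id": "src.stripe", "type": "dbt_source", "tags": ["pii"], "upstream": []},
    {"id": "src.ga", "type": "dbt_source", "tags": [], "upstream": []},
]
BY_ID = {a["id"]: a for a in ASSETS}

def upstream_of(asset):
    """All transitive upstream assets of the given asset."""
    out, stack = [], list(asset["upstream"])
    while stack:
        a = BY_ID[stack.pop()]
        out.append(a)
        stack.extend(a["upstream"])
    return out

# "Find all dashboards, then their upstream dbt sources with tag 'pii'" --
# the kind of rule that could also drive monitor deployment.
dashboards = [a for a in ASSETS if a["type"] == "dashboard"]
targets = [u["id"] for d in dashboards for u in upstream_of(d)
           if u["type"] == "dbt_source" and "pii" in u["tags"]]
print(targets)  # ['src.stripe']
```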
[00:45:23] Tobias Macey:
And in your experience of building this business, building this product, operating in this ecosystem, what are some of the most interesting or unexpected or challenging lessons that you've learned personally?
[00:45:35] Petr Janda:
So for me, it would be, like, two things. One of them we just touched on, which is, sales is actually fun. And, you know, being two decades in engineering, it's maybe not the most expected learning I would have, but I really realized it's fun to work with teams across different companies. And I realized it's almost, like, just a different type of problem solving. So this is really fun. And the second one is almost, like, confirming that what we believed could be true around the data ecosystem is actually doable, and that relates to how we're actually building Synq, where under the hood of our technology, at the very heart of the system, we have a data warehouse, or we have data warehousing technology. We use ClickHouse.
And so what we've done is that we've built the entire product around a data platform, which means that when customers send data to us, the first thing which happens is it gets stored to ClickHouse, and then all sorts of processes kick off inside of ClickHouse and within our microservices built around it. But I really wanted to challenge that aspect that, you know, the operational systems are built with Postgres and transaction databases, and then there is some other system which is focused on data. And so the learning, I guess, was that before that, I never had a chance to do that. It was kind of, like, a theoretical thing which could be done. But I always had a team where you had a warehouse managed by a data team and an operational system managed by engineers. And I always felt that the barrier was a bit artificial. And so we kinda delivered on that, where we put that data system at the heart of the company. We, of course, monitor it with Synq itself.
So there was, like, really a lot of kind of problems we had to solve. But, ultimately, we have lifted the kinda data platform to be an operational system, and that's exactly what we believe should happen across different teams. And maybe a final point on this is that I didn't realize until building Synq how powerful the concept of testing data actually could be. And the example which consistently comes to mind is that we ingest data from a lot of companies, from about 30 different types of systems, such as, like, Looker, dbt, a bunch of warehouses, etcetera.
And, again, we are very reliant on these data streams coming to us as a company. And you can't really test with unit tests, from the software perspective, that all these streams are actually working. So we also run anomaly monitoring at the ingest of the data warehouse. And we had many, many cases where we notify the customer a couple hours after an outage, saying, hey, I think you misconfigured your Looker, we're no longer ingesting data. And in that way, we're testing data, but we're detecting issues in, you know, almost, like, the operations of our actual product. And so maybe if I ever went back to running an engineering team, I would think about the power of some of these kinds of data testing techniques and how they can be enriching to some of the different ways engineers test their systems. So that's definitely something I discovered on the way.
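The "testing data instead of unit-testing the stream" idea boils down to monitoring the ingest itself. Below is a toy volume-based check under obvious simplifying assumptions (a fixed z-score threshold over hourly row counts); a real anomaly monitor would learn seasonality rather than hard-code it.

```python
from statistics import mean, stdev

def ingest_anomaly(hourly_row_counts, latest_count, z_threshold=3.0):
    """Flag the latest ingest volume if it deviates strongly from recent
    history -- e.g. a misconfigured Looker silently sending nothing."""
    mu, sigma = mean(hourly_row_counts), stdev(hourly_row_counts)
    if sigma == 0:
        return latest_count != mu
    return abs(latest_count - mu) / sigma > z_threshold

history = [980, 1020, 1005, 990, 1010, 995]  # rows ingested per hour
print(ingest_anomaly(history, 0))     # True  -> alert: stream likely down
print(ingest_anomaly(history, 1000))  # False -> normal operations
```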
[00:49:09] Tobias Macey:
For teams who are trying to get a better handle on their data systems, and maybe they're interested in incident monitoring, maybe not, I'm just wondering what are the cases where Synq is the wrong choice?
[00:49:23] Petr Janda:
So I think that if I even look at our deals where we didn't close, I think the common pattern is that we realized that there just wasn't such a need to bring that engineering rigor, or that reliability rigor, into the data stack. And so I actually have this kinda qualifying question for many companies we speak with, which is focused on, like, what is the most critical use case for data in the company? And I'm, like, trying to understand where the team is on the basis of that. And I even think, like, this is a very good question that data teams or data leaders should ask themselves.
Like, what is the most critical thing I am powering in the company? Because that kinda is a good proxy, let's say, to the value you provide to the business. And so we basically need to work with the teams where that question is answered well, because we're ultimately helping them. So we can't really go beyond that. So I'd say that's definitely one, which means that for the teams who maybe haven't yet found these business critical use cases, we might end up being a nice-to-have, which might be too early. Maybe it comes later, but that's definitely the most common: maybe the use cases for data just aren't there.
[00:50:51] Tobias Macey:
And as you continue to build and invest in and scale the product and the business of Synq, I'm just wondering what are some of the things you have planned for the near to medium term, or any particular projects or problem areas you're excited to explore?
[00:51:05] Petr Janda:
So I touched on that a little bit earlier in terms of there being kind of different areas where I think we can take this. And I think this also goes back to this question of business use cases and business critical use cases, where what I hope we will do is that we will bring the data reliability problem closer to the business. What that means is that maybe we break away a little bit from being a technical data tool which is focused on detecting issues in data tables, and gradually get closer to a tool which is a platform that is underpinning business workflows.
And so we have examples of this, where we've been asked by our customers to send various different alerts into non-data teams. And I really like this trend, where, for example, we work with a financial institution where we are detecting issues inside of the data warehouse, but we're, let's say, surfacing these issues back in front of operational teams and compliance, or even in treasury, where we are essentially becoming part of the critical business workflows. And so I hope, maybe, like we discussed, that maybe tables are becoming files. And in the same sense, the data will still be, like, the technology anchor of the solution.
But, really, we will gradually talk a lot more about, like, solving for reliable business processes, which might or might not be just the data team's problem. It might actually involve the wider company. And I think this is super important, because the data doesn't originate or end in the data platform. So it would be a real leap to make data quality or reliability a problem of the wider company. And I think there is a lot of new solutions which have to be built in order to make this topic engaging to non-data, more operational teams. Maybe they don't necessarily want to see, like, deep technical alerts, but they might wanna see very specific types of alerts, such as, we've detected these issues with your sales data, go here to Salesforce and fix it in this and this way. You know? And then suddenly, it's not really data reliability. It's just, we're working with the business people and helping them kinda run their business.
[00:53:42] Tobias Macey:
Yeah. That's a very interesting point as well, as the data ecosystem and the organizational data economy become cyclical with things like reverse ETL or operationalizing your data, however you wanna term it, where the data doesn't end in the warehouse. It gets fed back into those operational systems or application systems, and then fed back into the data warehouse and enriched, etcetera. And so the fact that you're looking at reaching out into those other systems to understand what is the downstream impact of this transformation that I'm making, that is going to then feed back into the Salesforce or the HubSpot or the operational application that is feeding back into the warehouse, brings it more back to that cycle of it being a team sport and not a solo event.
[00:54:29] Petr Janda:
Exactly. Yeah. Like, the point is, how can we bring non-data teams into the mix? Because I think that's where the really, really meaningful change could happen.
[00:54:40] Tobias Macey:
Absolutely. Are there any other aspects of the work that you're doing at Synq, or this overall concept of incident management, data being a team sport, the organizational transformation involved in bringing data into these business critical use cases, that we didn't discuss yet that you'd like to cover before we close out the show?
[00:55:01] Petr Janda:
Yeah. So I guess, like, maybe one parting thought I have is that one of the biggest learnings I have goes back to this kind of thinking of how we built Synq, and I really hope that what we will see is a lot more of, like, data platforms and data teams being much more integrated into, like, wider business processes. This could be anything from, like, data getting much closer to engineering and actually powering a lot more user-facing systems, or the notion you just outlined in terms of data powering more business critical systems. But I guess what I would find a shame is if data practitioners and data teams stay in this world of, like, we're maintaining company reporting, and we don't wanna be woken up at night, we don't really wanna be dealing with all that.
Because I ultimately think that hinders the value, or the potential, of the data in the company. And so I really hope that we will see more data teams going into this business critical world. Of course, there are solutions, I guess, ours and others, which will help them have the right tooling to operate in that world. But I also think that's where there's just so much potential to use data in new ways in front of customers, or in very business critical systems, where it just needs that higher rigor and higher focus on reliability. So that's what I'm really excited about. I see it happening in companies, and I hope it happens more and more, because that's ultimately where the power of a data-driven business really is.
[00:56:52] Tobias Macey:
Alright. Well, for anybody who wants to get in touch with you and follow along with the work that you and your team are doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today.
[00:57:09] Petr Janda:
Yeah. So I guess my point goes back to this: I think there is enough technology to do the right things, and the biggest gap I see is the mindset. So I really hope we see a lot more integration of data teams into the wider organization. I also, by the way, think this goes both ways. I still think there are a lot of engineers who maybe don't think about data and analytics to the extent that they should. And so I hope to see a lot more teams being cross-functional in that way, where I remember how we removed the boundaries between frontend, backend, and infra teams and created these cross-functional units.
And I wonder if we should do the same with data, where we integrate the organizational structure, and we integrate data platforms with engineering platforms into wider technology platforms. All of that, to me, is the biggest kind of barrier. So I think it's less about technology, because we have great storage and processing, and observability, cataloging, all of these solutions, I think, are now sufficient. But it's the mindset of seeing data as this thing on the side, not as an operational thing. I think that's the biggest gap, and if that's solved, it's going to really open up the potential of where data can go.
[00:58:40] Tobias Macey:
Alright. Well, thank you very much for taking the time today to join me and share the work that you and your team are doing on Synq and the ways that you're thinking about the cross-functional aspects of data and how it impacts the organization and the broader business case. I appreciate the time and energy that you folks are putting into that, and I hope you enjoy the rest of your day.
[00:58:59] Petr Janda:
Thanks for having me.
[00:59:07] Tobias Macey:
Thank you for listening. Don't forget to check out our other shows: Podcast.__init__, which covers the Python language, its community, and the innovative ways it is being used, and the Machine Learning Podcast, which helps you go from idea to production with machine learning. Visit the site at dataengineeringpodcast.com. Subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a product from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.
Introduction to the Data Engineering Podcast
Interview with Petr Janda: Background and Experience
Overview of Synq and Its Mission
Challenges in Data Reliability and Observability
Incident Management and On-Call Strategies for Data Teams
Criticality of Data Assets and Incident Prioritization
Data Products and Asset-Centric Focus
Technical Details of Synq and Integration Points
Evolution of Synq and Market Demands
Onboarding and Workflow with Synq
Incident Resolution Process with Synq
Innovative Uses and Lessons Learned
When Synq Might Not Be the Right Choice
Future Plans and Exciting Projects
Final Thoughts and Closing Remarks