Summary
This episode features an insightful conversation with Petr Janda, the CEO and founder of Synq. Petr shares his journey from being an engineer to founding Synq, emphasizing the importance of treating data systems with the same rigor as engineering systems. He discusses the challenges and solutions in data reliability, including the need for transparency and ownership in data systems. Synq's platform helps data teams manage incidents, understand data dependencies, and ensure data quality by providing insights and automation capabilities. Petr emphasizes the need for a holistic approach to data reliability, integrating data systems into broader business processes. He highlights the role of data teams in modern organizations and how Synq is empowering them to achieve this.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst is an end-to-end data lakehouse platform built on Trino, the query engine Apache Iceberg was designed for, with complete support for all table formats including Apache Iceberg, Hive, and Delta Lake. Trusted by teams of all sizes, including Comcast and DoorDash. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.
- Your host is Tobias Macey and today I'm interviewing Petr Janda about Synq, a data reliability platform focused on leveling up data teams by supporting a culture of engineering rigor
- Introduction
- How did you get involved in the area of data management?
- Can you describe what Synq is and the story behind it?
- Data observability/reliability is a category that grew rapidly over the past ~5 years and has several vendors focused on different elements of the problem. What are the capabilities that you saw as lacking in the ecosystem which you are looking to address?
- Operational/infrastructure engineers have spent the past decade honing their approach to incident management and uptime commitments. How do those concepts map to the responsibilities and workflows of data teams?
- Tooling only plays a small part in SLAs and incident management. How does Synq help to support the cultural transformation that is necessary?
- What does an on-call rotation for a data engineer/data platform engineer look like as compared with an application-focused team?
- How does the focus on data assets/data products shift your approach to observability as compared to a table/pipeline centric approach?
- With the focus on sharing ownership beyond the boundaries of the data team there is a strong correlation with data governance principles. How do you see organizations incorporating Synq into their approach to data governance/compliance?
- Can you describe how Synq is designed/implemented?
- How have the scope and goals of the product changed since you first started working on it?
- For a team who is onboarding onto Synq, what are the steps required to get it integrated into their technology stack and workflows?
- What are the types of incidents/errors that you are able to identify and alert on?
- What does a typical incident/error resolution process look like with Synq?
- What are the most interesting, innovative, or unexpected ways that you have seen Synq used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Synq?
- When is Synq the wrong choice?
- What do you have planned for the future of Synq?
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
- Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
- Synq
- Incident Management
- SLA == Service Level Agreement
- Data Governance
- PagerDuty
- OpsGenie
- ClickHouse
- dbt
- SQLMesh
[00:00:11]
Tobias Macey:
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst is an end-to-end data lakehouse platform built on Trino, the query engine Apache Iceberg was designed for. Starburst has complete support for all table formats, including Apache Iceberg, Hive, and Delta Lake. And Starburst is trusted by teams of all sizes, including Comcast and DoorDash. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst today and get $500 in credits to try Starburst Galaxy. Your host is Tobias Macey, and today I'm interviewing Petr Janda about Synq, a data reliability platform focused on leveling up data teams by supporting a culture of engineering rigor. So, Petr, can you start by introducing yourself?
[00:01:00] Petr Janda:
Yeah. Hi, Tobias. So my name is Petr. I'm an engineer by background. At this point, I've spent about 2 decades building different technology solutions. And especially the last 10 years, I spent a lot of energy on building not just kinda typical engineering teams, but also technology which spans engineering and data, mainly in, kinda, scale-up organizations. So I had a chance to scale a few teams somewhere in the range of a few people to about 150, which gave me a lot of learnings and experiences of building both engineering and data systems together. And, most recently, 2 years ago, I started a company called Synq, where I started as a CTO.
And, here we're kinda solving some of the challenges around data reliability, which are inspired by a lot of these lessons learned over the last decade or so. And do you remember how you first got started working in data? Yeah. So I think it goes back to, I would say, like, 2014, when I joined, actually, a market research company as an engineer. And there is something, like, special about it, because you realize that the entire company exists around data, because market research is fundamentally about collecting data from the market and the industry and trying to really understand what's going on. So in that case, we focused on a lot of surveys and website tracking and really, like, putting all this data together.
And back then, 2014, that's kinda pre modern data cloud. So you didn't have all the modern cloud technologies, so we had to build a lot of the data processing and engineering solutions ourselves. And so, like, that was, I would say, quite a strong transition from, you know, typical engineering to a company where data is at the very heart. And suddenly, you're gonna spend a lot of energy solving the challenges around data, which is kind of the fundamental of the company.
[00:03:01] Tobias Macey:
And now in terms of the Synq project and business that you're building, can you give a bit more overview about what it is that it does, the problem that you're solving, and some of the story behind how it came to be and why you decided that this is where you wanted to spend your time and energy?
[00:03:17] Petr Janda:
Yeah. So I think it goes back to my time at Pleo, which, at the time when I was there as a CTO, was a fintech. Think about, like, a 500 people organization, about, like, 100 engineers, 10, 12 people working in data. And I had responsibility for both the engineering and data side of the company. And there was this moment where we had a few incidents. One of them was on the engineering side, and another one was on the data side. When I worked with the team on the engineering side, we saw that incident resolved in, I think, like, 15, 20 minutes, because it was something related to our cards. If card transactions at a financial institution don't work, it's really high priority, and we basically deal with it almost immediately.
And then I remember the issue which we found a couple days later in our data analytics stack. We didn't resolve it for, like, I think, more than a week, and that was very frustrating. And I kinda looked at it as, like, it's all technology. So how come the approach is so vastly different? And so that was kinda the trigger when I felt like there is something that has to be done, and, essentially, that started Synq. And, like, the underlying, kinda, main mission which we felt we should go for is to really close this gap and bridge the tooling from the engineering world and bring it to the data analytics world.
And so, essentially, what we're building towards is that we'll work with data teams who are powering business critical systems as much as engineers. And, hopefully, we'll look back at some of these experiences and say, like, yeah, this is almost, like, ridiculous. It shouldn't happen. And we kinda treat building data systems with the same rigor as engineers do. So that's kinda, like, the broader picture, and there's a lot of kinda nuances and technology solutions we have to build to make that happen.
[00:05:26] Tobias Macey:
In the space of data reliability, data quality, data observability, that is a product category that's been growing in terms of overall investment and companies and solutions for the past 3 to 5 years. And a number of the vendors are focusing on different aspects of that problem space. Some of them are trying to do more of a horizontal view of it. Some of them are focused on point solutions. What are the capabilities that you saw as lacking in that overall ecosystem that you're trying to address, whether it's specific point solutions that were missing or the overall experience for data teams? I'm just wondering why is it that you saw the need to add another data reliability, data engineering rigor product and some of the solutions that you're trying to provide to solve for that.
[00:06:23] Petr Janda:
So I think it's about, like, looking back. About 2 years ago, there really were just a couple of startups, all, I think, in a very early stage. And I think at that point, almost everyone was focusing on the problem of detecting issues. Like, how can we discover that something actually went wrong? And to a degree, like, this is still, like, the core of data observability solutions. Right? We have to kinda uncover that something is not working in a data stack. But where I felt there is, like, a lot of space to innovate and a lot of opportunity is actually what happens after that. So once we detect that a certain table is missing data, or that a certain test is now failing because some business validation is not met.
What happens afterwards? Like, what is the workflow from the perspective of finding the right team to deal with that issue, assessing what's even the impact on the company? And is this even an issue we have to deal with right now, or is that something that can wait and be dealt with later? And then ultimately driving the whole resolution, bringing the system back to normal operations, and communicating that with the rest of the company. I feel like there's, like, a range of solutions, or a range of problems, which need to be solved. I would say to a degree, even in engineering, this is still being solved, even though, like, engineering is, of course, like, way ahead from the perspective of managing business critical systems and incident management, etcetera.
But I believe there's just, like, so many problems to be solved. And ultimately, especially in, like, today's economy, I kinda don't believe that we could be a point solution which does just one of these things. So we're very much focused on building a platform which goes end to end, from helping customers even set up the right testing strategy, to detecting issues, and then all the way to the resolution and kind of the entire workflow.
[00:08:29] Tobias Macey:
To that point of incident management, on-call management, incident resolution in the operational and infrastructure realm, that is a problem that has been very thoroughly explored. Obviously, there's always room for improvement, but it is something that is part of the default and assumed characteristics of a team who is operating infrastructure. For data teams, that is something that is coming to be more widely accepted, more widely understood, but there isn't necessarily a clear playbook for any given team to know what constitutes an incident, what it means to be on call as a data engineer or as a data platform engineer, what resolution looks like, how to think about SLAs, who to notify, and how often to notify them. I'm just wondering what are some of the ways that you see those concepts being mapped into the ecosystem of data engineering and data systems?
[00:09:28] Petr Janda:
So I guess, like, the first thing to say is that it's relatively new, especially when I think of, like, a traditional data analytics team, that they even think about incident management. I think this is a good thing, because it also means that the team is probably powering something a lot more business critical than they used to before. Because, otherwise, what's the point of doing incident management in the middle of the evening or night if it actually could have waited till the day after? So I think it's almost, like, I would say it's coming to the data platforms. In terms of what actually has to be done, I kinda believe that data systems should be looked at as software.
And in that case, I almost think there is very little that we should do differently from what we already know from engineering. So in that sense, all the kinda process, from even declaring an incident, and escalations, and the communication around handling the incidents towards the rest of the business or customers, I think that should be almost identical. The difference maybe here is in terms of the nature of the system, where if you look at a typical data platform in a company, I think one very defining factor is that that platform is integrated with almost every system in the business. So that means that you might be dealing with hundreds of sources of data and hundreds of use cases on top of this data, and the platform sits somewhere in the middle.
When things break, you don't even know if it's inside of the platform or if it's somewhere upstream. And so where we also focus with Synq is helping understand, like, what is actually going on and what's happening. And I think this is a critical part of incident management: when something triggers, how do I go from that moment to understanding this is a business impacting issue that we really have to deal with right now versus later? That, I think, is sufficiently different when you look at a traditional software system versus a data system, mainly due to the nature of these, like, very rich dependencies which go across the company. But the workflow which follows after that, I think, should be largely similar.
[00:11:58] Tobias Macey:
To the point of incident management, SLAs in particular, and what it means to be on call for a given company, only a portion of that is a technical problem. A lot of it is organizational and cultural. And I'm wondering how you see Synq playing a role in that corporate and organizational and cultural transformation that is necessary to build that competency as a team that does incident management and incident resolution, where the uptime of the data and analytics systems is maybe not the core focus, but a core focus?
[00:12:38] Petr Janda:
Yeah. So this is really, I think, one of the toughest things to solve for many data teams. It's almost like, how do we do that transition, which definitely is to a large degree also cultural? One of the, again, anecdotes I always go back to is when I look at my time at Pleo and, eventually, seeing data teams building a lot of complex systems on top of our production systems' data. And there was that point where I took a lineage of that system and put it on the screen in front of my engineering management. I just saw how surprised almost all of them were that this thing even existed.
And so, like, from that point, I realized that, like, a really big part of solving this challenge is actually increasing the transparency across the company of what is actually happening. Because, in a way, I think there is sometimes, like, this bad reputation, almost like, engineers are breaking data and that's bad. And I think that's definitely happening, and it's true. And they also have their own kinda agenda in terms of building a product and roadmaps and very tight deadlines. But I fundamentally believe that if you tell that engineer before they push code, if you do this, this very critical thing in the business will break, they'll not just say, like, yeah, don't care, I'm gonna press the button and go, and so I have my kind of work done. So I think, like, even helping map what is actually critical in the company and in the data stack, to me, is, like, something the tooling can help with.
And then once that's mapped, how do I transparently communicate that across the company in the right moments and in the right workflows? So I think that's where the tooling can really help to do all that work, which otherwise is really hard. I saw teams, in the time of, let's say, an outage, where they went into some sort of lineage solution, and they went model by model to find who the owner is. And then, like, it took really, like, half an hour just to assemble the list of people they should talk to, versus things like that just being completely available with one click, where you almost, like, have all this information to understand what's happening in the company readily available. I think that's definitely one aspect, and I think it's all around, like, working on implementing the solution, but putting it in the hands of people so they actually use it.
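To make the contrast concrete, here is a minimal sketch of the kind of lineage walk that turns that half-hour manual hunt into a single lookup. The `LINEAGE` and `OWNERS` structures are hypothetical stand-ins for metadata that would, in practice, come from a catalog, a dbt manifest, or an observability platform; this is an illustration, not Synq's implementation.

```python
# Hypothetical lineage and ownership metadata; in practice this would be
# pulled from a catalog, a dbt manifest, or an observability platform.
LINEAGE = {  # asset -> assets directly downstream of it
    "raw.payments": ["stg.payments"],
    "stg.payments": ["mart.revenue", "mart.ltv"],
    "mart.ltv": ["dash.bidding"],
}
OWNERS = {  # asset -> owning team
    "stg.payments": "data-platform",
    "mart.revenue": "finance-analytics",
    "mart.ltv": "growth-analytics",
    "dash.bidding": "marketing",
}

def impacted_owners(failed_asset: str) -> set[str]:
    """Walk the lineage graph from the failed asset and collect every
    team that owns the asset or anything downstream of it."""
    seen, stack, teams = set(), [failed_asset], set()
    while stack:
        asset = stack.pop()
        if asset in seen:
            continue
        seen.add(asset)
        if asset in OWNERS:
            teams.add(OWNERS[asset])
        stack.extend(LINEAGE.get(asset, []))
    return teams

print(impacted_owners("stg.payments"))
# e.g. {'data-platform', 'finance-analytics', 'growth-analytics', 'marketing'}
```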
[00:15:14] Tobias Macey:
On that point of incident management, uptime for data teams, on-call rotations, the broader question is, what does it mean for a component or an overall analytics system to be down? When does it merit being paged in the middle of the night versus waiting until business hours? What does it mean for a data engineer or a data platform team to be on an on-call rotation? I'm just wondering some of the ways that you're seeing teams tackle that set of questions and some of the ways that you think about that as a product that is trying to support them in facilitating that functionality.
[00:15:56] Petr Janda:
Yeah. So I think, like, one of the ways we look at data systems, at our customers but in general, is that, like, the key thing we really wanna help them with first is to, like, codify which parts of the data stack drive a business critical use case. So one example could be, you might be automating your advertising bidding based on a customer lifetime value model. Customer lifetime value models typically have a lot of inputs, which means they would be pulling data from a lot of parts of the company. And so I'd argue that that model is one of your critical data products.
And I think, like, the first thing is, it's even helpful to codify it into a platform like Synq to say, that thing is a critical data product. If it's somehow affected by an issue, this is a P1 issue, and it has to be escalated accordingly. And then when something happens anywhere in the data stack which might have impact on this P1 data product, the person who gets that alert, who might be on a completely different side of the company, will get an alert which is not just saying, here is a log record of an issue which happened, but it also automatically does the assessment of what's actually downstream from here, and could this type of issue have an impact on that customer lifetime value model.
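A rough sketch of the automated assessment described here, under the assumption that critical data products have been declared with a tier: when something fails, walk downstream, collect any declared products, and escalate to the highest tier hit. All names are illustrative, not Synq's API.

```python
# Illustrative only: declared data products with a criticality tier.
DATA_PRODUCTS = {"mart.ltv": "P1", "dash.weekly_kpis": "P3"}
LINEAGE = {
    "raw.events": ["stg.events"],
    "stg.events": ["mart.ltv", "dash.weekly_kpis"],
}

def downstream(asset, graph):
    """All transitive downstream assets of the given asset."""
    out, stack = set(), list(graph.get(asset, []))
    while stack:
        a = stack.pop()
        if a not in out:
            out.add(a)
            stack.extend(graph.get(a, []))
    return out

def classify_alert(failed_asset):
    """Attach the highest downstream product criticality to the alert,
    so the recipient sees impact, not just a log record."""
    hit = {a: DATA_PRODUCTS[a] for a in downstream(failed_asset, LINEAGE)
           if a in DATA_PRODUCTS}
    severity = min(hit.values(), default="P4")  # "P1" sorts before "P3"
    return {"asset": failed_asset, "impacted_products": hit, "severity": severity}

print(classify_alert("raw.events"))
# {'asset': 'raw.events', 'impacted_products': {...}, 'severity': 'P1'}
```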
And so I think the whole challenge is, again, like, understanding what is critical, which will ultimately help me answer the question of, am I looking at a failure which might be bringing down this critical component, or am I looking at a test failure which happens on a model which was just created yesterday and no one's really using for anything in production? So, like, even differentiating these two alerts must be helpful from the perspective of, if I look at this without knowing the data stack by heart, I might not really know the difference. And then in terms of, like, actually being on call in data teams, I see this varies a lot across companies, where I actually see a fair bit of companies who are running, like, almost, like, engineering-like incident management systems, where there is an escalation to the point that someone gets woken up at night if something's broken.
But the most common I see is this kinda, like, in-hours on-call rotation, which is typically called, like, a goalie, or some sort of, like, person appointed for that week to be, like, a first-level diagnostics for all the issues coming to the team. And I think this is, like, a good approach in smaller organizations. When I see larger organizations, with many data teams and many teams contributing to the data stack, I try to almost say, do we really need that? Or can we route the alerts automatically to the relevant people directly? So, actually, in some companies, we managed to reduce that role and almost, like, you know, make the relevant person aware of the issue immediately.
But, again, like, it's something that's still coming into the experience. I think everyone is learning. I think it really then should be tailored to the business. So I'm kind of reluctant to say there is the right way to do it, and I think every business has to assess it. And the best way I always look at it is kinda going backwards from these kinda critical systems and understanding, like, okay, if this breaks, do we need to solve it at 2 AM, or does it wait till the morning? And that's kinda the decision, like, I think every business has to make on their own.
[00:19:51] Tobias Macey:
In the operational realm, there's the analogous use case of different services having different levels of criticality. So you've got different gradations of how severe particular outages are, which maps to the idea of, page me in the middle of the night, I don't care what I'm doing, down to, I don't care about this unless it's during business hours, and I can take my time with it. And given the observability functionality that you have in Synq and the core focus on data products and data assets versus the individual table or individual pipeline approach, I'm wondering how you see that shift the thinking in data teams around how to map those different products to that level of priority of, oh, this product is something that is used every day.
The observability data supports that. Or, based on the observability, I can see that is a quarterly report that doesn't matter unless it's the, you know, close of quarter at the end of the month, in which case I do need to address it immediately. And some of the ways that that asset-centric focus shifts the ways that data teams approach their work.
[00:21:01] Petr Janda:
So, like, this is quite interesting, in a way that, even from the point where we started the company, we, like, always thought and built the entire system around this notion of data assets, which means that we purposely didn't wanna build a system which, let's say, revolves around tables. And I think, before solutions like dbt and the modern data stack and the analytics engineering workflows, it was right that, like, a lot of the data stack revolved around tables. But now we have tables and metrics and models, and now we're talking about data products, which is a little bit of an overloaded term in terms of what it really is, and I think everyone has their own version. But, ultimately, in my mind, it goes back to, again, defining the critical parts of the data stack. I think you mentioned the concept of tiering, which I see, like, a lot in terms of companies figuring out, how do I define which of our models are critical, like a P1 or P2, P3.
By the way, I'm a little bit skeptical about some of the technical indicators. Right? Because sometimes you could look at observability data and see, this has a lot of downstream dependencies, this looks important, or there's a lot of queries happening. And then there could be one asset used by, like, a CFO for some really critical decision, used, like, once a month, and that really shouldn't go wrong. And so, like, I always like to combine some of the technical indicators, but ultimately have the customer say, that thing over here is really critical. And that's exactly what we did with the data products. And that's, to me, in a way, how we also built it technology-wise. We can create data products from a group of dashboards or a group of models or a set of metrics. It doesn't really matter, but it was always around, almost, like, the ability to let our customer express, this group of things is important. These things together have some sort of meaning which goes beyond their kind of physical manifestation in our data stack.
And we attach a certain criticality to this data product. And then when things break, we wanna communicate that to an engineer related to that. So I'm a really big fan of, you know, data product thinking, and of, like, really defining it as, this is a critical output which leads to some sort of outcome in the business. And so I hope that, almost, like, at some point, we will look at tables and it will be the same as, like, files and containers in an engineering system. It's like, yeah, it's there, it's doing its job, but we're not talking about files when we build systems. We're talking about the system, which does something.
[00:23:57] Tobias Macey:
That's a good analogy. I like that idea of tables being just the files. We don't necessarily care about the tables in and of themselves. We only care about them insofar as they are useful for something else.
[00:24:07] Petr Janda:
Exactly. Yeah. Exactly.
[00:24:09] Tobias Macey:
The other interesting aspect of treating these data assets as a product, and something that is consumed and relied on by the overall business, is that it accentuates the fact that data is a team sport, and it's not just you as a data engineer doing table transformations and pipeline management. Your efforts are in this broader context of the overall purpose of the organization and how it's going to be used. And I'm wondering how you are seeing that change the ways that data teams approach their work, both technically, but also, more importantly, organizationally, and some of the ways that the rest of the organization is being brought into the work that's being done for that data engineering and those data product definitions, and how you see data governance coming more to the fore because of the fact that it is a collaborative and cross-functional problem and not something that is purely technical.
[00:25:08] Petr Janda:
So ownership actually was one of the first things we focused on solving when we started Synq. And, again, the reason for that was I've seen exactly this problem, where even I, as a leader of both sides of a technology organization, realized that it's really hard for me and for all the teams to even communicate and understand, you know, what's happening across this increasingly complex data stack. And so especially when you start to see, like, multiple analytics teams, some central engineering team, dozens of engineering teams which are producing data, and commercial teams which are producing data, it becomes really opaque as an ecosystem.
And so to me, like, solving for ownership as a concept across this whole structure is really important. And almost without it, any observability will be almost, like, not actionable. Because, like, if I don't know who the owner is, or how other owners are impacted by issues, then how can I really help this organization solve them? And so, I think, luckily, a lot of companies are realizing that, and you see a lot of different approaches where, you know, one way or another, companies are starting to define who the owner is.
The biggest problem I've seen, and still to a degree I see across the industry, is that that information about ownership is a bit, let's say, not actionable. So it could be anything from, we maybe are tagging our models in something like dbt, but also I've seen versions where there is a spreadsheet saying that these folders in this project are owned by the team over here, which is probably good to some degree at a very high level, but I think this is really hard to action. And so the way we approached it is that our goal was to bring this ownership into the path of solving issues, so that it is projected across the entire platform.
And, of course, part of it is understanding that if something happens, the right owner is the first one to be notified. But then also telling that owner, well, based on this issue, we see that these are the teams which are owners of assets downstream from this failure. So to me, like, layering ownership onto the concept of lineage means that I'm starting to look at an observability system from, like, dependencies between teams rather than dependencies between tables or data assets. So I think that's one way we definitely like to work with ownership: we use it almost like a map, where we're layering the teams on the data assets.
And then as we do impact assessment, or, like, some different queries across the system, the ownership is always part of it, if that makes sense.
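One way to picture "dependencies between teams rather than between tables" is to project asset-level lineage through the ownership map into a team-level graph. A hypothetical sketch, with invented asset and team names:

```python
# Asset-level edges (upstream -> downstream) and an ownership map,
# both hypothetical stand-ins for real metadata.
EDGES = [("stg.orders", "mart.revenue"), ("stg.orders", "mart.churn"),
         ("mart.churn", "dash.exec")]
OWNER = {"stg.orders": "data-platform", "mart.revenue": "finance",
         "mart.churn": "growth", "dash.exec": "bi"}

def team_graph(edges, owner):
    """Collapse asset lineage into edges between owning teams."""
    teams = set()
    for up, down in edges:
        a, b = owner.get(up), owner.get(down)
        if a and b and a != b:
            teams.add((a, b))
    return sorted(teams)

print(team_graph(EDGES, OWNER))
# [('data-platform', 'finance'), ('data-platform', 'growth'), ('growth', 'bi')]
```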
[00:28:19] Tobias Macey:
Digging more into the Synq product itself and the technical details of how it integrates with the data systems and the organization, some of the things that are coming to mind are the lineage tracking and observability that you get from hooking into the data warehouse, looking at the table transformation logs to see what came from where. It also brings to mind the idea of these metadata platforms where you have a cross-cutting view of all the different ways that data is being transferred across different system boundaries. And I'm wondering if you can just describe a bit about how Synq itself is designed and implemented and the integration points that it has into a company's overall data suite?
[00:29:06] Petr Janda:
So I guess, like, maybe one way to explain it would be to start at a little bit higher level, where we've built Synq around, almost, like, three key concepts. One, which we already discussed, is that everything in a data stack is modeled as assets. So whether it's a dbt model, data warehouse table, BI dashboard, ETL pipeline, all of that is, to us, an asset. Then the second pillar is that we model the relationships between these assets, because that ultimately allows us to build almost, like, a map of the entire ecosystem. And the third concept is something we call executions, which means that each of these assets is doing something, whether it's transforming data, or creating a table, or a query which is going in front of the user.
And if you think about these three concepts and you build your entire experience on top of them, then building integrations becomes a lot easier, because, for us, like, everything is a first-class citizen. So whether it's a warehouse or transaction database or transformation tool, we can model everything into these concepts. And so the first thing we've done, we focused on the heart of data platforms, which is the data warehousing. We invested heavily into dbt. The reason is obvious: it's becoming almost, like, a standard for data transformations, which means that we see a lot of teams using it. And then we started expanding the coverage into the BI world. We're now working a lot on APIs and a push towards, let's say, the data sources and the ways we can tap into the ecosystem which is upstream of warehouses.
And from the perspective of what it means in terms of integration from a customer perspective, we very much believe that a lot of this should work out of the box with, like, minimal configuration. So for most of the systems, we're building, more or less, off-the-shelf connectors, where what we really need is the access credentials and a security review, let's say, so we can actually go and connect to these systems. But from that point, everything is automated. And then we felt that in order to build the best possible solution on the market, we should do some kind of strategic investments at the infrastructure level, which means that we, for example, built our own parser for SQL, which understands a lot of, kinda, specific dialects of different warehouses. We understand where exactly in the files the logic for a different column is. So that ultimately allows us to build, like, new experiences where we're, for example, starting to blend workflows across lineage and code into a lot more unified experience, which I at least haven't seen on the market.
And so, like, the goal really here is to make sure that many of these solutions are automated. And then on top of this, we're building some of the kinda unique capabilities, which ultimately allows us to go deeper and help the practitioners take their systems to the next level.
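The three pillars described above map naturally onto a small data model. This is a hypothetical rendering for illustration, not Synq's actual schema; each integration then reduces to emitting these three record types.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Asset:
    # Anything in the stack: a dbt model, warehouse table, BI dashboard,
    # or ETL pipeline -- all first-class citizens of the same type.
    id: str
    kind: str  # e.g. "dbt_model", "table", "dashboard", "pipeline"

@dataclass
class Relationship:
    upstream: str    # Asset.id
    downstream: str  # Asset.id

@dataclass
class Execution:
    # Each asset "does something": a run, a materialization, a query.
    asset_id: str
    started_at: datetime
    status: str  # "success" | "failed"

# A connector then only has to emit assets, edges, and executions:
assets = [Asset("dbt.stg_orders", "dbt_model"), Asset("wh.stg_orders", "table")]
edges = [Relationship("dbt.stg_orders", "wh.stg_orders")]
runs = [Execution("dbt.stg_orders", datetime.now(), "success")]
```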
[00:32:29] Tobias Macey:
In your work on Synq, from when you first started working on it to where you are today and working with some of your early customers, what are some of the ways that the scope and goals of the product have changed since you first started working on it?
[00:32:45] Petr Janda:
I think the hard truth is that they keep expanding. So we definitely started with the approach of potentially being more of a point solution. But as you know, the market is a bit more demanding nowadays. So I think the biggest change is that we keep kinda expanding the scope, from the initial focus on ownership and critical assets into now basically building the entire reliability platform, which has a component of observability as well. That's largely not because we necessarily wanted to build a data observability company, but more so because the market started to see us that way.
And so in order to be competitive and win deals, we have built a lot of that functionality. And I still believe there's a lot of ways to almost, like, redefine what observability is or where it can go. It's still relatively young, and we're still talking about, like, monitoring and lineage and schema detections, which is all fine. They're, like, important features. But, ultimately, we're kinda thinking of, like, where does it expand next? So now that we have, let's say, a solid foundation in the kind of data ecosystem, what would be the next thing which can be done on top of data observability? That's what we're thinking about as well.
[00:34:13] Tobias Macey:
For teams who are looking to bring Synq into their ecosystem, they want to start using it for all the features that we've been discussing. I'm wondering if you can talk through the workflow of actually getting it set up, starting the onboarding process, and, given the breadth of functionality that it supports, maybe what is the first entry point that you see as being either most common or most effective for that broader adoption?
[00:34:41] Petr Janda:
So to a degree, this really depends on the customer's kind of pain point, and we are still at the stage where we love to work with our customers. So the first step is to get in touch with us. And from that perspective, there really are a few different avenues. And because of the depth of the platform, there are a number of different ways we can lead with different functionality, if that makes sense. And so one example of a use case is companies where the biggest challenge is the detection of issues. One kinda type of company is typically teams who have to work with third-party data. Right? So they are ingesting data from even different companies, and they don't really have control over testing them. So in that case, the leading functionality is anomaly monitoring, which means that we integrate their data warehouse, we discuss where the critical aspects actually are, and then eventually deploy a set of monitors which are, let's say, fitting the type of issues which might happen with the data.
In other companies, it's a lot more around uncovering the structure of the system. So we have one customer who has multiple dbt projects. Now we're working with a customer who has multiple SQLMesh projects. And for both of these, the first goal was to, you know, like, understand the end-to-end picture of this ecosystem. So, again, in this case, it's all around, like, lineage and impact assessment of issues which might happen across these systems. And, again, the first step always is integrating into the data platforms and then, potentially, in this case, onboarding through the use cases around, like, uncovering the structure of the data stack through lineage, for example.
So it really depends. Of course, like, the common step here is integration with the data platform, but that's also why we've really invested in making these connectors work off the shelf. So there's, like, not really much, if any, manual work in terms of, like, tagging assets, etcetera. This kinda is all picked up automatically.
[00:37:07] Tobias Macey:
And once a team has Synq deployed and integrated, they have all of their assets modeled, they understand what are the data products, who are the end users, I'm wondering if you can talk through what a workflow or an incident resolution process looks like with Synq as the hub of that activity.
[00:37:29] Petr Janda:
So once we finish the base level integration, we now have cross-system lineage. We might have, like, a basic level of monitors deployed. The next typical step is to codify a few of these additional concepts. So one of them is codifying ownership. We do that in many ways. The most typical one is that we work with metadata from dbt, where, if the team already defined owners in dbt, we simply lift that metadata and set up, let's say, a mirror of the structure inside of Synq. In other teams, this could be done by specific data sets or folders in the dbt project, so we have a lot of ways to do that. The second one is setting up data products. So, again, a typical deployment starts with, like, a handful of data products which we focus on. That's where we kinda start the deployment.
And then every customer takes it in their own different ways. We have customers with a hundred or so data products because they actually wanted to make them more granular. Some of them stay in the range of a handful. So it really depends. And so once we have data products defined and we have ownership defined, we typically define alerting, which is kind of mapped to these owners. We can then run, let's say, more powerful incident management. And what that means is that the typical process is, something fails in a data stack. It could be either a test or it could be a monitor, which could come from another system like dbt, or it could be our own.
This essentially triggers what we call an issue, which means that an issue recognizes that something failed, but it's not yet clear how critical that is. And so our issues end up, first of all, alerted into business systems like Slack and Microsoft Teams or email. But then we also bring the list of all issues into a view we call triage. So you can think of it as, this is a list of things which failed. And in this triage, we're giving the person who is responsible to go through that list as much context as possible in order to quickly assess: Is this business critical? Yes or no. Are there critical products impacted? Yes or no. Is there an important team impacted? So all of that information is collected into one single screen, so someone can go through the list and triage the issues.
In some cases, the decision could be no action needed, this is gonna be, let's say, quietly fixed in the next build. In other cases, this could be declared an incident. And once an incident is declared from a subset of issues, this triggers the incident management workflow. And we were thinking a lot about where the boundary between Synq and more traditional incident management workflow tools is. And we've decided that, essentially, post incident declaration, we still wanna give the data practitioner this kinda single page view where you see the lineage of assets in the incident, the list of all the issues which are included, the list of teams, the list of products, basically the whole kinda impact assessment on one screen.
But ultimately, at this point, the incident is linked to an external incident management system where, let's say, the traditional incident management process can happen. In some teams, this is simply a Jira ticket to deal with. In other teams, this is PagerDuty or Opsgenie or a system like that, where they actually manage, let's say, wider incidents also from engineering. And so what we really wanna be is that bridge. But the critical thing which we've built is that concept of promoting an issue to an incident. Because what this actually does is that it allows teams to almost, like, reason about all of their, let's say, quality from both perspectives.
One is on the issue level, which is saying, okay, we have this asset which is firing a lot of issues at us, and maybe we have to do something about it. But the second level is, we have declared some business impact in incidents, which are typically originating from somewhere in the data stack. And this is also very important from an analytics perspective, for governance, because I think that if teams report on issues, it's almost, like, a very negative picture they might be creating. Because to me, it's the same as if engineering would be reporting on every single issue which happens in their system.
But I can almost guarantee that in any sufficiently large system, there is something failing all the time. But most of the time, it either recovers in a few minutes or almost immediately. In some cases, it could be left like that because it's not critical. And so creating this abstraction where you differentiate issues and incidents is really important, and that's really, like, a big part of our incident management process in Synq. And then, again, the actual incident is handed over to the tools which are designed for this, like PagerDuty, incident.io, etcetera.
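A toy sketch of the issue-versus-incident distinction, with invented names; the key step is the promotion, after which handling moves to an external tool such as PagerDuty or a Jira ticket (the hand-off call below is hypothetical):

```python
from dataclasses import dataclass

@dataclass
class Issue:
    # "Something failed, but it's not yet clear how critical" --
    # a failed test, a tripped monitor, or an alert from dbt.
    id: str
    asset: str
    impacts_critical_product: bool

def open_external_incident(issue_ids):
    # Stand-in for handing off to PagerDuty, Opsgenie, Jira, etc.
    print(f"Declared incident from issues: {issue_ids}")

def triage(issues):
    """Split the triage list: promote business-impacting issues to an
    incident, leave the rest to be quietly fixed in the next build."""
    incident, no_action = [], []
    for issue in issues:
        (incident if issue.impacts_critical_product else no_action).append(issue)
    if incident:
        open_external_incident([i.id for i in incident])
    return incident, no_action

triage([Issue("i1", "mart.ltv", True), Issue("i2", "scratch.tmp", False)])
```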
[00:42:55] Tobias Macey:
And in your work of building Synq, working with these different data teams to understand how they think about their roles, how they fit in the broader organization, how to get everybody working in the same direction, what are some of the most interesting or innovative or unexpected ways that you've seen the Synq platform used?
[00:43:14] Petr Janda:
Yeah. So I don't know if I have, like, one big anecdote, but I almost, like, like to be surprised by our customers every now and then. It's almost like a privilege to have engineers as customers, because they are creative. So there's a couple things which come to mind. I think one of them was, like, a great surprise. It goes back a couple months now, where we had one of the engineers who connected their warehouse into two separate dbt projects. And we didn't even realize that it was gonna work out of the box, where, essentially, because we resolved lineage from one project to the warehouse, and then from the warehouse to another project, it just worked out of the box. So that was definitely, like, a surprise that we were very happy with. And another example is, I guess, one internal tool we have, which I never thought we would have exposed to customers: we've built almost, like, a query engine for assets, where you can do a query such as, find all the dashboards, and then find all the assets which are upstream of these dashboards, which are also of the type dbt source and have a certain tag.
And we built this internally because we wanted to express some of the concepts which are very hard to do in the UI and through, like, drop-down boxes. But the information that it existed somehow leaked to our customers, and they started to write these queries and, like, build some of the functionality which I hadn't thought of. So we can use these rules to deploy monitors. We had a customer who said, deploy monitors on all dbt sources, but only if there is a certain type of model downstream. And so that basically used some of the things which we never really designed in the first place. So that's always a kinda good surprise. And in that sense, I realized that it's really fun to build an almost, I guess, more sales-led team, because you get to work with customers very closely. So you get to uncover also a lot of these interesting use cases they find themselves.
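The customer-written rule described here ("all dbt sources with a certain type of model downstream") could be expressed as a small filter chain over asset records. The following is a hypothetical flavor of such a query, not Synq's actual syntax:

```python
# Hypothetical asset records and a filter chain over them.
ASSETS = [
    {"id": "dash.kpis", "type": "dashboard", "tags": [], "upstream": ["src.stripe"]},
    {"id": "src.stripe", "type": "dbt_source", "tags": ["pii"], "upstream": []},
    {"id": "src.ga", "type": "dbt_source", "tags": [], "upstream": []},
]
BY_ID = {a["id"]: a for a in ASSETS}

def upstream_of(asset):
    """All transitive upstream assets of the given asset."""
    out, stack = [], list(asset["upstream"])
    while stack:
        a = BY_ID[stack.pop()]
        out.append(a)
        stack.extend(a["upstream"])
    return out

# "Find all dashboards, then their upstream dbt sources with tag 'pii'" --
# the kind of rule that could also drive monitor deployment.
dashboards = [a for a in ASSETS if a["type"] == "dashboard"]
targets = [u["id"] for d in dashboards for u in upstream_of(d)
           if u["type"] == "dbt_source" and "pii" in u["tags"]]
print(targets)  # ['src.stripe']
```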
[00:45:23] Tobias Macey:
And in your experience of building this business, building this product, operating in this ecosystem, what are some of the most interesting or unexpected or challenging lessons that you've learned personally?
[00:45:35] Petr Janda:
So for me, it would be, like, two things. One of them we just touched on, which is, sales is actually fun. And, you know, being two decades in engineering, it's maybe not the most expected learning I would have, but I really realized it's fun to work with teams across different companies. And I realized it's almost, like, just a different type of problem solving. So this is really fun. And the second one is almost, like, confirming that what we believed could be true around the data ecosystem is actually doable, and that relates to how we're actually building Synq, where under the hood of our technology, at the very heart of the system, we have a data warehouse, or we have data warehousing technology. We use ClickHouse.
And so what we've done is that we've built the entire product around a data platform, which means that when customers send data to us, the first thing which happens is it gets stored to ClickHouse, and then all sorts of processes kick off inside of ClickHouse and within our microservices built around it. But I really wanted to challenge that aspect that, you know, the operational systems are built with Postgres and transaction databases, and then there is some other system which is focused on data. And so the learning, I guess, was that before that, I never had a chance to do that. It was kind of, like, a theoretical thing which could be done. But I always had a team where you had a warehouse managed by a data team and an operational system managed by engineers. And I always felt that the barrier was a bit artificial. And so we kinda delivered on that, where we put that data system at the heart of the company. We, of course, monitor it with Synq itself.
So there was, like, really a lot of kind of problems we had to solve. But, ultimately, we have lifted the kinda data platform to be an operational system, and that's exactly what we believe should happen across different teams. And maybe a final point on this is that I didn't realize until building Synq how powerful the concept of testing data actually could be. And the example which consistently comes to mind is that we ingest data from a lot of companies, from about 30 different types of systems, such as, like, Looker, dbt, a bunch of warehouses, etcetera.
And, again, we are very reliant on these data streams coming to us as a company. And you can't really test with unit tests, from the software perspective, that all these streams are actually working. So we also run anomaly monitoring at the ingest of the data warehouse. And we had many, many cases where we notify the customer a couple hours after an outage, saying, hey, I think you misconfigured your Looker, we're no longer ingesting data. And in that way, we're testing data, but we're detecting issues in, you know, almost, like, the operations of our actual product. And so maybe if I ever went back to running an engineering team, I would think about the power of some of these kinds of data testing techniques and how they can be enriching to some of the different ways engineers test their systems. So that's definitely something I discovered on the way.
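The "testing data instead of unit-testing the stream" idea boils down to monitoring the ingest itself. Below is a toy volume-based check under obvious simplifying assumptions (a fixed z-score threshold over hourly row counts); a real anomaly monitor would learn seasonality rather than hard-code it.

```python
from statistics import mean, stdev

def ingest_anomaly(hourly_row_counts, latest_count, z_threshold=3.0):
    """Flag the latest ingest volume if it deviates strongly from recent
    history -- e.g. a misconfigured Looker silently sending nothing."""
    mu, sigma = mean(hourly_row_counts), stdev(hourly_row_counts)
    if sigma == 0:
        return latest_count != mu
    return abs(latest_count - mu) / sigma > z_threshold

history = [980, 1020, 1005, 990, 1010, 995]  # rows ingested per hour
print(ingest_anomaly(history, 0))     # True  -> alert: stream likely down
print(ingest_anomaly(history, 1000))  # False -> normal operations
```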
[00:49:09] Tobias Macey:
For teams who are trying to get a better handle on their data systems, and maybe they're interested in incident monitoring, maybe not, I'm just wondering what are the cases where Synq is the wrong choice?
[00:49:23] Petr Janda:
So I think that if I even look at our deals where we didn't close, I think the common pattern is that we realized that there just wasn't such a need to bring that engineering rigor, or that reliability rigor, into the data stack. And so I actually have this kinda qualifying question for many companies we speak with, which is focused on, like, what is the most critical use case for data in the company? And I'm, like, trying to understand where the team is on the basis of that. And I even think, like, this is a very good question that data teams or data leaders should ask themselves.
Like, what is the most critical thing I am powering in the company? Because that kinda is a good proxy, let's say, to the value you provide to the business. And so we basically need to work with the teams where that question is answered well, because we're ultimately helping them. So we can't really go beyond that. So I'd say that's definitely one, which means that for the teams who maybe haven't yet found these business critical use cases, we might end up being a nice-to-have, which might be too early. Maybe it comes later, but that's definitely the most common: maybe the use cases for data just aren't there.
[00:50:51] Tobias Macey:
And as you continue to build and invest in and scale the product and the business of Synq, I'm just wondering what are some of the things you have planned for the near to medium term, or any particular projects or problem areas you're excited to explore?
[00:51:05] Petr Janda:
So I touched on that a little bit earlier in terms of there being kind of different areas where I think we can take this. And I think this also goes back to this question of business use cases and business critical use cases, where what I hope we will do is that we will bring the data reliability problem closer to the business. What that means is that maybe we break away a little bit from being a technical data tool which is focused on detecting issues in data tables, and gradually get closer to a tool which is a platform that is underpinning business workflows.
And so we have examples of this, where we've been asked by our customers to send various different alerts into non-data teams. And I really like this trend, where, for example, we work with a financial institution where we are detecting issues inside of the data warehouse, but we're, let's say, surfacing these issues back in front of operational teams and compliance, or even in treasury, where we are essentially becoming part of the critical business workflows. And so I hope, maybe, like we discussed, that maybe tables are becoming files. And in the same sense, the data will still be, like, the technology anchor of the solution.
But, really, we will gradually talk a lot more about, like, solving for reliable business processes, which might or might not be just the data team's problem. It might actually involve the wider company. And I think this is super important, because the data doesn't originate or end in the data platform. So it would be a real leap to make data quality or reliability a problem of the wider company. And I think there is a lot of new solutions which have to be built in order to make this topic engaging to non-data, more operational teams. Maybe they don't necessarily want to see, like, deep technical alerts, but they might wanna see very specific types of alerts, such as, we've detected these issues with your sales data, go here to Salesforce and fix it in this and this way. You know? And then suddenly, it's not really data reliability. It's just, we're working with the business people and helping them kinda run their business.
[00:53:42] Tobias Macey:
Yeah. That's a very interesting point as well, as the data ecosystem and the organizational data economy become cyclical with things like reverse ETL or operationalizing your data, however you wanna term it, where the data doesn't end in the warehouse. It gets fed back into those operational systems or application systems, and then fed back into the data warehouse and enriched, etcetera. And so the fact that you're looking at reaching out into those other systems to understand what is the downstream impact of this transformation that I'm making, that is going to then feed back into the Salesforce or the HubSpot or the operational application that is feeding back into the warehouse, brings it more back to that cycle of it being a team sport and not a solo event.
[00:54:29] Petr Janda:
Exactly. Yeah. Like, the point is, how can we bring non-data teams into the mix? Because I think that's where the really, really meaningful change could happen.
[00:54:40] Tobias Macey:
Absolutely. Are there any other aspects of the work that you're doing at Synq, or this overall concept of incident management, data being a team sport, the organizational transformation involved in bringing data into these business critical use cases, that we didn't discuss yet that you'd like to cover before we close out the show?
[00:55:01] Petr Janda:
Yeah. So I guess, like, maybe one parting thought I have is that one of the biggest learnings I have goes back to this kind of thinking of how we built Synq, and I really hope that what we will see is a lot more of, like, data platforms and data teams being much more integrated into, like, wider business processes. This could be anything from, like, data getting much closer to engineering and actually powering a lot more user-facing systems, or the notion you just outlined in terms of data powering more business critical systems. But I guess what I would find a shame is if data practitioners and data teams stay in this world of, like, we're maintaining company reporting, and we don't wanna be woken up at night, we don't really wanna be dealing with all that.
Because I ultimately think that hinders the value, or the potential, of the data in the company. And so I really hope that we will see more data teams going into this business critical world. Of course, there are solutions, I guess, ours and others, which will help them have the right tooling to operate in that world. But I also think that's where there's just so much potential to use data in new ways in front of customers, or in very business critical systems, where it just needs that higher rigor and higher focus on reliability. So that's what I'm really excited about. I see it happening in companies, and I hope it happens more and more, because that's ultimately where the power of a data-driven business really is.
[00:56:52] Tobias Macey:
Alright. Well, for anybody who wants to get in touch with you and follow along with the work that you and your team are doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today.
[00:57:09] Petr Janda:
Yeah. So I guess my point goes back to this: I think there is enough technology to do the right things, and the biggest gap I see is the mindset. So I really hope we see a lot more integration of data teams into the wider organization. I also, by the way, think this goes both ways. I still think there are a lot of engineers who maybe don't think about data and analytics to the extent that they should. And so I hope to see a lot more teams being cross-functional in that way, where I remember how we removed the boundaries between frontend, backend, and infra teams and created these cross-functional units.
And I wonder if we should do the same with data, where we integrate the organizational structure, and we integrate data platforms with engineering platforms into wider technology platforms. All of that, to me, is the biggest kind of barrier. So I think it's less about technology, because we have great storage and processing, and observability, cataloging, all of these solutions, I think, are now sufficient. But it's the mindset of seeing data as this thing on the side, not as an operational thing. I think that's the biggest gap, and if that's solved, it's going to really open up the potential of where data can go.
[00:58:40] Tobias Macey:
Alright. Well, thank you very much for taking the time today to join me and share the work that you and your team are doing on Synq and the ways that you're thinking about the cross-functional aspects of data and how it impacts the organization and the broader business case. I appreciate the time and energy that you folks are putting into that, and I hope you enjoy the rest of your day.
[00:58:59] Petr Janda:
Thanks for having me.
[00:59:07] Tobias Macey:
Thank you for listening. Don't forget to check out our other shows: Podcast.__init__, which covers the Python language, its community, and the innovative ways it is being used, and the Machine Learning Podcast, which helps you go from idea to production with machine learning. Visit the site at dataengineeringpodcast.com. Subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a product from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.
Introduction to the Data Engineering Podcast
Interview with Petr Janda: Background and Experience
Overview of Synq and Its Mission
Challenges in Data Reliability and Observability
Incident Management and On-Call Strategies for Data Teams
Criticality of Data Assets and Incident Prioritization
Data Products and Asset-Centric Focus
Technical Details of Synq and Integration Points
Evolution of Synq and Market Demands
Onboarding and Workflow with Synq
Incident Resolution Process with Synq
Innovative Uses and Lessons Learned
When Synq Might Not Be the Right Choice
Future Plans and Exciting Projects
Final Thoughts and Closing Remarks