Summary
Streaming data processing enables new categories of data products and analytics. Unfortunately, reasoning about stream processing engines is complex and lacks sufficient tooling. To address this shortcoming, Datorios created an observability platform for Flink that brings visibility to the internals of this popular stream processing system. In this episode Ronen Korman and Stav Elkayam discuss how the increased understanding provided by purpose-built observability improves the usefulness of Flink.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- This episode is supported by Code Comments, an original podcast from Red Hat. As someone who listens to the Data Engineering Podcast, you know that the road from tool selection to production readiness is anything but smooth or straight. In Code Comments, host Jamie Parker, Red Hatter and experienced engineer, shares the journey of technologists from across the industry and their hard-won lessons in implementing new technologies. I listened to the recent episode "Transforming Your Database" and appreciated the valuable advice on how to approach the selection and integration of new databases in applications and the impact on team dynamics. There are 3 seasons of great episodes and new ones landing everywhere you listen to podcasts. Search for "Code Comments" in your podcast player or go to dataengineeringpodcast.com/codecomments today to subscribe. My thanks to the team at Code Comments for their support.
- Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst is an end-to-end data lakehouse platform built on Trino, the query engine Apache Iceberg was designed for, with complete support for all table formats including Apache Iceberg, Hive, and Delta Lake. Trusted by teams of all sizes, including Comcast and Doordash. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.
- Your host is Tobias Macey and today I'm interviewing Ronen Korman and Stav Elkayam about pulling back the curtain on your real-time data streams by bringing intuitive observability to Flink streams
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you describe what Datorios is and the story behind it?
- Data observability has been gaining adoption for a number of years now, with a large focus on data warehouses. What are some of the unique challenges posed by Flink?
- How much of the complexity is due to the nature of streaming data vs. the architectural realities of Flink?
- How has the lack of visibility into the flow of data in Flink impacted the ways that teams think about where/when/how to apply it?
- How have the requirements of generative AI shifted the demand for streaming data systems?
- What role does Flink play in the architecture of generative AI systems?
- Can you describe how Datorios is implemented?
- How has the design and goals of Datorios changed since you first started working on it?
- How much of the Datorios architecture and functionality is specific to Flink and how are you thinking about its potential application to other streaming platforms?
- Can you describe how Datorios is used in a day-to-day workflow for someone building streaming applications on Flink?
- What are the most interesting, innovative, or unexpected ways that you have seen Datorios used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Datorios?
- When is Datorios the wrong choice?
- What do you have planned for the future of Datorios?
Contact Info
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
- Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
Links
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Sponsored By:
- Starburst: ![Starburst Logo](https://files.fireside.fm/file/fireside-uploads/images/c/c6161a3f-a67b-48ef-b087-52f1f1573292/UpvN7wDT.png) This episode is brought to you by Starburst - an end-to-end data lakehouse platform for data engineers who are battling to build and scale high quality data pipelines on the data lake. Powered by Trino, the query engine Apache Iceberg was designed for, Starburst is an open platform with support for all table formats including Apache Iceberg, Hive, and Delta Lake. Trusted by the teams at Comcast and Doordash, Starburst delivers the adaptability and flexibility a lakehouse ecosystem promises, while providing a single point of access for your data and all your data governance allowing you to discover, transform, govern, and secure all in one place. Want to see Starburst in action? Try Starburst Galaxy today, the easiest and fastest way to get started using Trino, and get $500 of credits free. Go to [dataengineeringpodcast.com/starburst](https://www.dataengineeringpodcast.com/starburst)
- Red Hat Code Comments Podcast: ![Code Comments Podcast Logo](https://files.fireside.fm/file/fireside-uploads/images/c/c6161a3f-a67b-48ef-b087-52f1f1573292/A-ygm_NM.jpg) Putting new technology to use is an exciting prospect. But going from purchase to production isn’t always smooth—even when it’s something everyone is looking forward to. Code Comments covers the bumps, the hiccups, and the setbacks teams face when adjusting to new technology—and the triumphs they pull off once they really get going. Follow Code Comments [anywhere you listen to podcasts](https://link.chtbl.com/codecomments?sid=podcast.dataengineering).
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. This episode is supported by Code Comments, an original podcast from Red Hat. As someone who listens to the Data Engineering Podcast, you know that the road from tool selection to production readiness is anything but smooth or straight. In Code Comments, host Jamie Parker, Red Hatter and experienced engineer, shares the journey of technologists from across the industry and their hard-won lessons in implementing new technologies. I listened to the recent episode, "Transforming Your Database", and appreciated the valuable advice on how to approach the selection and integration of new databases in applications and the impact on team dynamics.
There are 3 seasons of great episodes and new ones landing everywhere you listen to podcasts. Search for Code Comments in your podcast player or go to dataengineeringpodcast.com/codecomments today to subscribe. My thanks to the team at Code Comments for their support. Your host is Tobias Macey, and today I'm interviewing Ronen Korman and Stav Elkayam about pulling back the curtain on your real-time data streams by bringing intuitive observability to Flink. So, Ronen, can you start by introducing yourself?
[00:01:16] Unknown:
Yeah. So I'm Ronen. I'm 53, married with 3 wonderful children, and I really, really like technology of all kinds. I'm occupied with thinking about and learning technology, and amazed at how this market is, you know, just moving us and intriguing us. And data is one of the things that I really love, partly from my background. I will tell about it in a second, but data is something very serious.
[00:01:47] Unknown:
And, Stav, how about yourself?
[00:01:49] Unknown:
Yeah. So my name is Stav. I'm the vice president of marketing here at Datorios. My background started with technical go-to-market at B2B-focused companies. And, yeah, here at Datorios I started out as the 10th employee, I think, right about the beginning. And just glad to be here.
[00:02:16] Unknown:
And going back to you, Ronen, do you remember how you first got started working in data?
[00:02:20] Unknown:
Data, you know, goes way back. I've been serving in the Israeli security agencies for about 30 years, collecting data and analyzing data for a very long time. Data was actually our life. And I think that if normal data was the bread, real-time data was the butter. With data, you just could know things. With real-time data, you could react to things. And I think that was the way of, you know, actually understanding the power of data and the differences between the kinds of data, but also the challenges with data. Real-time data is very fragile, very hard to handle.
So you learn, you know, how to handle that data in operational missions. And I think the other thing is that we had to change the data use on a day-to-day basis. I think it's kind of the equivalent of staying relevant in the market. You usually think of data pipelines as something that you build once and they never change. But we had to develop systems and ways of thinking and tools in order to adapt the data that we had to constantly changing missions, to build new data products that changed sometimes daily in order to meet the demand for a specific mission. And I think that's the background of, you know, getting into that data world, and it had a lot of influence on the foundation on which Datorios was built.
[00:03:59] Unknown:
And, Stav, do you remember how you got started working in data?
[00:04:02] Unknown:
Yeah. I think for me, I was always curious about how to get things better, and how to get them better fast. And I think that pretty much summarizes the real aim of data from a business and marketing perspective. We, and when I say we, I mean the business side of the company, the organization, are huge data consumers. Usually when a pain point hits, when someone feels this initial challenge of how to get over something as fast as possible, this is where they usually come across data challenges, and these usually arrive late. By that, I mean I felt like an observer of the data domain for quite a long time. So for me, joining a company that advocates and promotes especially real-time data was very exciting.
[00:05:08] Unknown:
And so digging into Datorios, I'm wondering if you can just start by giving an overview about what it is and some of the story behind how it got started and why it is that you decided to invest your time and energy on this problem.
[00:05:22] Unknown:
Yeah. So I'll start it. Deciding to deal with real-time data was easy. We appreciated real-time data, and the founding team was, you know, totally aligned that we were going to do something with real-time data. We were amazed how many times in the early days we heard the phrase, "we don't do real-time data." So we saw a market that was vastly lagging behind, you know, not even doing real-time data, not understanding the power of real-time data. And that, you know, gave us a huge push to deal with that domain.
And the other thing I talked about is the ability to make development and life cycles around data faster. So, actually, Datorios started as a company that developed a real-time processing engine, and on top of it an observability solution that combined the ability to process real-time data with that development experience on top of it. We did that for a time. And as, you know, we were doing our thing, we saw that real-time data started materializing, and that's where Flink started emerging, and we saw that it was actually becoming the de facto solution for processing real-time data. And we made the decision to stop developing our own processing engine and to use Apache Flink as our core processing engine, but to continue to do the observability solution on top of it. And I think that, looking back, it was a huge, very important decision.
We see Flink as a major building block in real-time data, but also in real-time AI. I think it's the ability to feed data to, you know, models in real time. That's the whole thing, and Flink is very strong there. And, yeah, we focus on observability around that to make that, you know, experience, visibility, quality much more fluent and, you know, more productive and better quality.
[00:07:38] Unknown:
And you mentioned data observability in particular in the context of Flink. And data observability in general has been gaining a lot of attention and adoption for the past few years. Most of that has been focused on the batch processing, data warehouse side of things, and to some extent, data pipelining, data orchestration. And I'm wondering what you see as some of the unique challenges that are posed by streaming data in general and Flink in particular.
[00:08:07] Unknown:
Yeah. That's a great question. I think the problem with Flink is that it's a black box. There's an amazing contrast between, you know, its processing powers and the ability to understand what is going on under the hood. This creates a huge barrier for developers to, you know, really understand what is going on with the development and the result of the development. And as the pipeline becomes more complex, that complexity, you know, grows very, very fast. Now let's take one of Flink's major powers, its stateful processing.
Flink actually hides state from the user. So you're taking the most powerful thing in Flink, but you can't actually, you know, see the evolution of state inside that engine. So that's one little example. And I think that there are other problems with Flink, as with other distributed solutions, in that it becomes very complex to understand when things start to go sideways. Take, for example, a situation where your memory consumption goes up or your throughput has problems. It can be, you know, that your clusters need more resources, or it can be that your code is lagging, or you have some kind of bug in your code, or in your state handling and your checkpoint building. And if you don't have an end-to-end ability to observe Flink from all directions, it becomes very, very hard to pinpoint that problem and solve it, you know, when the sirens start shouting.
And in production, that means downtime. So those are, you know, the drawbacks of Flink and the importance of observability. Complexity comes with the need for observability. I think another layer is data quality. I think that quality comes with context. And if you can't see and understand the context of your data, you can't produce high quality data products. I think it's relevant to any data pipeline, any data solution, but let's talk for a moment about AI. I think that AI is very, very sensitive to missed context and missed quality. And when putting Flink in AI solutions, I think the ability to see the context of the data and analyze where it starts shifting is very, very important, and we try to be there and give a solution to those problems.
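The hidden-state problem Ronen describes can be illustrated with a toy simulation in plain Python. To be clear, this is not Flink's API; the names here (`run_keyed_count`, `keyed_state`) are invented for illustration. The point is that per-key state lives inside the engine's state backend, not in the user's hands, which is the visibility gap under discussion.

```python
from collections import defaultdict

def run_keyed_count(events):
    """Toy stand-in for Flink's keyed state: a per-key running count.

    In a real Flink job, this map would live inside the engine's
    state backend (for example RocksDB) and would not be directly
    visible to the developer while the job runs.
    """
    keyed_state = defaultdict(int)  # hypothetical; illustration only
    for key, _value in events:
        keyed_state[key] += 1
    return dict(keyed_state)

events = [("user-a", 1), ("user-b", 7), ("user-a", 3)]
print(run_keyed_count(events))  # {'user-a': 2, 'user-b': 1}
```

In this sketch the state is a plain dictionary we can print at will; in a running Flink job the equivalent structure is internal to the engine, which is why purpose-built observability tooling matters.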
[00:10:58] Unknown:
You mentioned some of the complexity in Flink specifically because of the way that it manages statefulness. I'm wondering too how much of the challenge is purely because of the fact that you're dealing with unbounded streams of data rather than being able to treat everything as an isolated context as a single batch and that you have to have this continuous view of data and monitor its quality and understand the ways that it's changing and being mutated?
[00:11:26] Unknown:
Yeah. So it's both. First of all, you know, the complexity of the infrastructure is one thing, I talked about it. But I think that dealing with real-time data is like doing mechanical adjustments to a running train. You know, with batch processing, you can stop the data or rerun the data. The data will be the same. You can continue doing that as many times as you want. With real-time data, you can, you know, catch the data, retain the data, and rerun it, but the data doesn't stop flowing. So that builds the complexity of, you know, lagging behind with processing. Especially in production, when something goes wrong, okay, you can deal with the problem, but you are also building up a backlog of data that you didn't treat in the meantime. And that builds the need, you know, to fix the problem and have the ability to process all the data that piled up, which goes back to pinpointing the problem very fast and fixing it very fast, in order to, you know, catch up with that running train and, you know, deliver relevant and accurate data all the time.
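The "running train" problem has a simple arithmetic core: while a streaming job is down or lagging, unprocessed events keep piling up, and catching up takes longer the closer your normal throughput is to the input rate. A back-of-the-envelope sketch (the rates below are made-up numbers, purely for illustration):

```python
def catch_up_seconds(input_rate, processing_rate, downtime_s):
    """Time to drain the backlog accumulated during an outage.

    The backlog grows at input_rate * downtime_s events; once the
    job is healthy again, it drains at the surplus throughput of
    (processing_rate - input_rate) events per second.
    """
    if processing_rate <= input_rate:
        return float("inf")  # no surplus capacity: never catches up
    backlog = input_rate * downtime_s
    return backlog / (processing_rate - input_rate)

# 10k events/s in, 12k events/s max throughput, 5 minutes of downtime:
print(catch_up_seconds(10_000, 12_000, 300))  # 1500.0 -> 25 minutes
```

Five minutes of downtime costs twenty-five minutes of catch-up in this example, which is why fast pinpointing of the root cause matters so much more in streaming than in batch.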
[00:12:38] Unknown:
The other interesting aspect of data quality in the streaming context is that the specific measures of quality, I imagine, are different. Whereas in a data warehouse, you can say, okay. Well, this is the distribution of data. These are the, you know, allowable ranges. You have an anomaly because this value is outside of that range, but you're dealing with a static set of data. Whereas within streaming, a lot of the data, I'm sure, is issues of out of order, time stamps, missing data, or late arriving data. And you say, oh, well, here's an anomaly because this time stamp is out of sequence, but it's actually still valid. And I need to figure out how to process that, figuring out appropriate windowing of the data, checkpointing of the data. I'm wondering if you can talk to some of the specific aspects of streaming data and some of the ways that that changes the measures of quality and the types of checks that you're monitoring for.
[00:13:30] Unknown:
Yeah. I think that if you want to explain the difference, it's like the difference between driving a car and flying a plane. There's another dimension, you know, the height, the speed. Let's talk about breakpoints, you know, the basic ability to debug your sessions. A breakpoint breaks the stream. It's a breakpoint. If you take streaming data and add a breakpoint, you actually disturb the flow of data. So the very ability to debug the data affects that free and correct flow of data. The time window changes; the result will be different with the breakpoint and without the breakpoint.
So I think there are, you know, incomparable ways of describing the problem between that kind of data and this kind of data. It goes back to the amount of data, the fact that it's unbounded and keeps flowing, the dimension of window and context. Because, you know, windowing is a lot about time, and it's sometimes about sessions, and sessions have, as you said... it's not the same session if one event came before the other or it was the other way around. The solution, or the result, will be totally different. So it's very, very specific and very, very influential on anything you do.
Yeah. And this puts huge, huge challenges on the table. Yeah. Complex and, you know, exciting.
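The out-of-order question raised above can be made concrete with a toy event-time windowing sketch in plain Python. This is not Flink's API, and it deliberately ignores watermarks and allowed lateness; it only shows the core idea that window membership is decided by the event's own timestamp, so an event arriving late and out of order is still valid data for an already-opened window.

```python
def assign_tumbling_windows(events, size_s):
    """Assign (timestamp, value) events to tumbling event-time windows.

    Membership depends only on each event's timestamp, not on its
    arrival order, so the out-of-order event below still lands in
    the first window.
    """
    windows = {}
    for ts, value in events:
        start = (ts // size_s) * size_s  # window covers [start, start + size_s)
        windows.setdefault(start, []).append(value)
    return windows

# The event stamped t=3 arrives last (out of order), but it
# belongs to the [0, 10) window, not the [10, 20) one:
events = [(1, "a"), (12, "b"), (3, "late")]
print(assign_tumbling_windows(events, 10))  # {0: ['a', 'late'], 10: ['b']}
```

In a real engine the hard part, which this sketch skips, is deciding when a window can be closed at all, which is exactly where watermarks, allowed lateness, and the observability gaps discussed in this episode come in.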
[00:15:13] Unknown:
1 of the interesting aspects of streaming data and streaming systems is that for many years now, there have been advocates saying that it's really the only true way to work with data and that batch systems are really just a special case of streaming and that you're taking snapshots. But despite that, batch is still the predominant method of working with data because it's logically and conceptually simpler to deal with, but you don't get a lot of the benefits that you have from real time. And I'm curious how the lack of visibility into the the flow of data in Flink in particular, but more generally too, has impacted the ways that teams think about when and where and how to apply streaming systems and whether to even attempt to adopt streaming because of this additional layer of complexity?
[00:16:01] Unknown:
Yeah. That's a great question. I think that at the end of the day, when you're implementing Flink in production, you put your company's revenue and reputation on the line, and you want to make sure that you get high quality data at all times. So being able to understand what is happening, having the feeling of control, the recognition that you'd be able to solve problems when they occur, and they will occur, there's no system that hasn't got problems, has a huge impact on adoption. And I think the lack of observability into Flink slows the adoption rate of Flink. I think it sometimes seems risky, and I think that's exactly what Datorios comes to do: to reduce that risk and let more companies, you know, enjoy the power of real-time data, basically because it's a market must. I think that users won't accept an experience that is not real-time anymore. So there will always be, you know, situations where batch is good enough, but you can't imagine Uber, you know, without watching that car coming to you. You can't imagine, you know, doing a transaction and not getting real-time feedback for that transaction.
So companies will have to go to real time because of users' demands. But I really, really understand the risk they are, you know, potentially taking, and that's exactly what the tool comes to do: reduce that risk.
[00:17:40] Unknown:
Data lakes are notoriously complex. For data engineers who battle to build and scale high quality data workflows on the data lake, Starburst is an end-to-end data lakehouse platform built on Trino, the query engine Apache Iceberg was designed for. Starburst has complete support for all table formats, including Apache Iceberg, Hive, and Delta Lake. And Starburst is trusted by teams of all sizes, including Comcast and DoorDash. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst today and get $500 in credits to try Starburst Galaxy, the easiest and fastest way to get started using Trino.
There's also a difference in I don't know if it's if skill set is the right term for it, but you certainly have to have experience working in streaming systems to understand the challenges that they present. And I'm wondering how that too has impacted the general adoption of streaming as the default mode of operation and some of the ways that you see that changing over the years as streaming becomes more prevalent, more, demanded by customers and end users, but also as the technology has evolved to become more operationally capable and just the the general awareness of its capabilities and awareness of how to operate it has grown.
[00:19:02] Unknown:
Yeah. I think in this manner, something good is happening, because there's a kind of evolution. There are maybe a thousand ways to say what streaming data is. Everybody is streaming data, but if you ask somebody what streaming data is, you'll get a thousand answers to that question. And I think there has been a kind of evolution of, you know, doing batch processing in a more rapid way, and after that, doing micro-batches. So I think that the market and developers started to understand the potential and the risk and the complexity of real-time data step by step, not by doing real real-time data, but by doing, you know, near-real-time processing with other systems. And I think that in that manner, the developers especially are, you know, going up the stack of understanding real-time processing. And then it comes back to the specific understanding of Flink itself, rather than, you know, the fundamentals of understanding the challenges of real-time data or the conceptual understanding of real-time data. So I think that the market is building itself in a good way to adopt real-time data. But, again, it's risky, and that risk needs to be, you know, reduced.
[00:20:25] Unknown:
You mentioned too the requirement for streaming data in the context of real time AI systems. AI has definitely been gaining massive attention, if not necessarily adoption. And I'm wondering how the data hungry aspect of AI and the types of applications that it's being applied to have increased the demand for streaming systems and the role that Flink has been playing in that market?
[00:20:54] Unknown:
Yeah. So I think AI is moving very fast from offline applications to real-time applications. We all saw the ChatGPT-4o demo of, you know, showing if the king is in the palace or not, and that will be the standard. Nobody will use models that were trained two years ago and not one hour ago. So we will need to use AI applications which are up to date to the second, and that will need real-time processing of data inside that AI infrastructure. And I think that Flink is becoming a really, you know, relevant building block in that system because of its processing abilities, first its processing performance, and then because of its flexibility in dealing with complexities around different kinds of data and use cases.
Yeah. And I think that Flink will be a major building block in, you know, building fresh data for real-time AI.
[00:22:01] Unknown:
And now digging into Datorios itself, I'm wondering if you can talk to some of the design and implementation of the product and some of the ways that the scope and architecture have evolved from when you first started working on it.
[00:22:16] Unknown:
Yeah. So it started, like I said, with us developing a processing engine and building that dev experience and visibility on top of it, and then we adopted Apache Flink and, you know, continued developing the observability layer. And we decided not to do, you know, another managed Flink solution. I think it's a crowded market with great players around it, Confluent, Ververica, and others. And we decided, by market validation, to divide our implementation into two parts. The first part is to be as close as possible to the way the development and DevOps teams already work. We understood that companies won't change just because Datorios came to the market.
So we implement our solution in two parts. The first part is adapting ourselves to the existing Flink environment and the development environment that the company already has. So we take your Flink and adapt it to, you know, extract the data that Datorios gives around data tracing, metrics, and other observability features. We don't touch your development cycle. We don't touch your CI/CD. You just, you know, do whatever you did before, we adapt to that, and we collect that data and metadata and send it to a SaaS solution where you consume the observability and investigation tools. So we enjoy the power of the cloud for fast development of functionalities and features, but stay aligned with your existing way of developing and, you know, debugging and managing production, you know, putting the fewest constraints possible on you. So that's the concept. And along with that, we take care of concerns around, you know, PII constraints and data security. We encrypt the data end to end.
So, you know, no data is exposed to Datorios, with end-to-end encryption and, you know, staying as close as possible to industry standards in the way we handle data, to lower the friction, you know, as much as possible.
[00:24:38] Unknown:
And then for people who are using Datorios, they have gone from "I'm running Flink, I have this deployment system, I'm using it, but I have questions" to "okay, I've got Datorios deployed and integrated, now I can actually answer those questions." I'm wondering if you can talk to some of the ways that they're using that information, some of the types of questions that they're asking, and some of the ways that the fact that they can now get answers to those questions has led to deeper and more nuanced questions as they continue to evolve their usage and adoption of Flink?
[00:25:14] Unknown:
Yeah. I think we see that in, you know, the whole cycle of code development and, you know, implementation. It starts with development. When you develop code, you get an immediate response from the data layer of that code. So when you run your code the first time, you see an actual and immediate response from your data in a visual way. And we have proven, you know, metrics to show that it shortens the dev cycle by about 40%, and, you know, it raises the quality. If you are building your code step by step, you validate the first step, you get the right data, you continue that flow all the time, and you get much faster development cycles and very high quality of development. So it starts with that. Then you need to, you know, debug the data and test the data before it goes to production.
And when you are running the test data and staging data, you can actually see if the results are correct. If they are not correct, where is the problem? Pinpoint it, go back to your code, and again shorten the test cycle and the debugging cycle. And in production, we talked about it before, when something goes wrong, it makes it very easy to pinpoint the problem, because we align all metrics around a common line, which is time. Flink is real time. Time is, you know, the way that data flows, and you can, you know, see that your CPU usage is very high, and you can immediately see what data was processed at that specific time, to see if it's a data problem or just, you know, a lack of resources. So the user can get, you know, benefits across the whole life cycle of their code and data pipeline by getting, you know, the same visibility of data, but for different questions.
So, yeah, I think that the ROI of using us is very high, you know, for the whole life cycle of developing. And I think that this way of thinking was quite new to us. We started with, you know, let's make development productivity higher. But we talked with the community and they said, no, no, no, the big money is in production. Development is very important, testing and debugging are very important, but you must touch production too. And, yeah, we reacted to that and added production support too. It's complex, because there is a balance between our influence on performance and the amount of verbosity we give. We have a nice solution for controlling that verbosity according to different metrics that you put thresholds on. Yeah. But we try to give benefits from the usage of Datorios around any aspect of developing data pipelines.
[00:28:17] Unknown:
Datorios, as you said, started its life as a custom-built stream processing engine. You decided to shift focus and build on top of Flink. Flink is definitely the market leader in the streaming ecosystem, but it's not the only player. And I'm wondering how you are thinking about the architecture of Datorios, how much to invest in the specifics of Flink versus being able to generalize across other streaming engines if and when they start to gain adoption, and how you're thinking about evolving the architecture and infrastructure of Datorios to be able to expand to those use cases?
[00:28:54] Unknown:
Yeah. That's an amazing question, because yesterday we had a manager meeting and we talked about it. You know, as an early stage startup, you need to be focused. And I think that right now we are focused on building the best observability solution for Flink. But we always look beyond the hill we are climbing, and we want to, in time, expand the observability solution to, you know, the more downstream applications of real-time data. We see a lot of uses of observability around the lakehouse, like with Paimon and Iceberg. And, we talked about context, I think that having the context of data, not only, you know, in the data pipelines alone, the data processing alone, or the warehouses alone, but having the context of the end-to-end stream of data will increase the ability to produce high quality data even further. So, yeah, Datorios right now is doing observability for Flink, but Datorios will stay the observability solution for real-time data.
[00:30:04] Unknown:
In terms of the adoption and use of Datorios, I'm wondering if you can talk to some of the most interesting or innovative or unexpected ways that you have seen it applied, or ways that it has given teams more understanding of how Flink is operating?
[00:30:24] Unknown:
I can say only 1 thing. We are hearing people say, "we never saw that." People see things that they never saw before, and I think that tells it all. You know, we are working with very, very strong Flink developers, and they tell us, "we never saw that this window behaves like that." I think that says it all. If you're adding value to the user, you are on the right course.
[00:31:00] Unknown:
And in your work of building and growing the Datorios product and the business, what are some of the most interesting or unexpected or challenging lessons that you've learned in the process?
[00:31:10] Unknown:
The first 1 is validation, validation, validation. I think that we tend to fall in love with what we think is a good idea, but in many cases it's not the right thing to do. And listening to the power of the community and its users, and validating your thoughts against them, is the number 1 lesson I learned. Maybe Stav can talk about it, but I think that the marketing challenge around users who are developers and data developers is massive; it's not the usual kind of marketing. And around that, combining the power of community, community leaders, data leaders, and marketing is something that is very, very challenging.
Maybe, Stav, say a few words about that.
[00:32:05] Unknown:
Yeah. Absolutely. First of all, going back to the previous question, I must say that it wasn't much of a challenge from the marketing side to come up with the reference for what we do. We just tell people it's like an X-ray machine for Flink, basically. It gives you the inside, deep look at everything. So that wasn't the challenge. What was challenging, and this was a very solid assumption we had back at the beginning of the project, is that we knew this was going to be a community-based solution, because the engine at the core of Flink is, in our opinion, the people who build it. So we started out building what we call the yellow pages: mapping the entire community around Flink, going from influencers through committers and PMC members.
We reached out to former CEOs of known companies within the data domain. We reached out to hundreds of people within the Flink community before we started specking this solution. So we had, in my opinion, 300 to 400 hours of calls accumulated and analyzed before we started developing anything. And that, I think, goes back to the power of Flink: it's the people, and if you listen to them, you get good results. That's pretty much it. The way I see it, it's not about what we think as much as what they think, and we try to put that at the core of things.
[00:34:06] Unknown:
And for people who are considering whether to use Flink at all, or who are already using Flink, what are the cases where Datorios is the wrong choice?
[00:34:16] Unknown:
If they're using Flink, it's not the wrong choice. I think that our mission is to make the coupling of Datorios and Flink the norm. You know, why walk in the dark when you have a flashlight? And because we are very community oriented, we have a free tier for most use cases. So, why walk in the dark? If you are using Flink, I think you should at least try it.
[00:34:52] Unknown:
And as you continue to build and iterate on the product and co-evolve with the Flink ecosystem while keeping an eye on the rest of the streaming market, what are some of the things you have planned for the near to medium term, or any particular projects or problem areas you're excited to explore?
[00:35:08] Unknown:
As I said before, I think it's basically starting to look downstream: the solutions that are connected to Flink, both around AI and the visibility needed for the specific real-time use cases there, and around the new solutions for ingesting real-time data, which I think are very complex and will have their own challenges. We want to give an end-to-end observability view on that whole stream of data, from the source to the destination, giving the ability to tune the system in the right way and to make sure that in the connections between the different solutions you don't lose quality or performance.
So yes, staying with observability, but expanding our span into other solutions in that world of streaming data, not staying on Flink itself.
[00:36:04] Unknown:
Are there any other aspects of the Flink streaming ecosystem, the challenges of observability for streaming data and Flink, or the work that you're doing at Datorios that we didn't discuss yet that you'd like to cover before we close out the show?
[00:36:18] Unknown:
We talked about the adoption of real-time data, and I'm putting all my effort and all my words into that. Going back to what I said at the beginning: we transform real-time data into human life. It's as simple as that. And I think that there's a huge connection between real-time data and revenue. The ability to use real-time data to give the needed experience, to connect people's interactions and things, what is called IoT, to connect the digital representations of people and things together in a real-time manner, and to build new products out of that: it's a must-have, and it's not moving at the speed that I think it should be. I think that Flink can help with that, but it's about human decisions, not technological ones; the technology is there. So, yeah, if I can use your platform, it's to encourage companies to try streaming data. You don't need to jump into the water full size: start with a use case, even if it's not in production, see the challenges, grow as you go, and shift the balance between batch processing and streaming data. You don't need to abandon that legacy way of thinking all at once. 1 of the advantages of Flink, for example, is that it can do both batch processing and real-time processing, even on the same platform. So that's basically what I can say about that. And we are in that ecosystem, giving the best aid that we can.
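The point about shifting the balance between batch and streaming on one engine can be illustrated with a toy sketch. This is plain Python, not Flink's API: the same tumbling-window aggregation function serves both a bounded (batch) input and an unbounded (streaming) input, and only the source differs. All names here are illustrative.

```python
from itertools import islice
from typing import Iterable, Iterator, Tuple

def windowed_sums(events: Iterable[Tuple[int, float]],
                  window_ms: int) -> Iterator[Tuple[int, float]]:
    """Sum values per tumbling event-time window, emitting each window
    as it closes. A conceptual sketch of the batch/streaming duality,
    not Flink's actual windowing API. Assumes timestamps arrive in order."""
    current, acc = None, 0.0
    for ts, value in events:
        win = ts // window_ms
        if current is None:
            current = win
        if win != current:
            yield current, acc       # window closed: emit its aggregate
            current, acc = win, 0.0
        acc += value
    if current is not None:
        yield current, acc           # end of bounded input: flush last window

# Batch: a bounded list is fully processed and the final window is flushed.
batch = [(100, 1.0), (900, 2.0), (1100, 3.0)]
print(list(windowed_sums(batch, 1000)))   # → [(0, 3.0), (1, 3.0)]

# Streaming: an unbounded generator; results are consumed as windows close.
def endless():
    t = 0
    while True:
        yield (t, 1.0)
        t += 250

print(list(islice(windowed_sums(endless(), 1000), 2)))  # → [(0, 4.0), (1, 4.0)]
```

In real Flink the analogous switch is a runtime execution mode on the same job graph; the sketch above only shows why one body of window logic can serve both shapes of input.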
[00:38:15] Unknown:
Alright. Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today.
[00:38:30] Unknown:
Wow. I think the data domain is too rigid. Companies still think about it with the concept of "measure 10 times, cut once," but data solutions and the data itself move much faster than that. You can't treat every data infrastructure decision as a big decision. It should be more oriented toward consuming services and changing them, not building architectures that are 100 percent precise in their fitness and uniqueness; something that is more general but more flexible. And I think that could give companies the ability, first of all, to save money.
There are huge processes of thinking and decision making around data infrastructure, when, in a cloud environment and cloud culture, you just change services and connect them to each other. If that becomes an enabler, standards will start to appear, things will connect to each other, and the data domain will become more mature. Right now it's too fragmented; things are very hard to connect to each other, and I think it starts with companies that treat data infrastructure as a very big decision.
I think that needs to change. Maybe the other thing is that data engineering tools need to move forward as fast as software development and DevOps tools have evolved. Data engineers lack some of the tools that would make their everyday job more productive and easier. Some of it is observability, but that's not all of it. There's a difference between software development and data development: of course data development uses software, but it's another layer of development, and it's missing the specific tools that data engineers need. And, yeah, both of those things are connected. The data domain needs to grow past the old-school DBA mindset, move to more cloud thinking, serverless thinking, and make things more rapid.
And I think that flexibility is another word for relevance and revenue.
[00:41:10] Unknown:
Alright. Well, thank you very much for taking the time today to join me, share the work that you've been doing on Datorios, and your experience of supporting the Flink ecosystem and bringing more visibility into what's happening there. I appreciate all the time and energy that you're putting into improving the observability of streaming systems, and I hope you enjoy the rest of your day.
[00:41:31] Unknown:
Thank you very much. Thank you, Sebastian. It was fun.
[00:41:41] Unknown:
Thank you for listening. Don't forget to check out our other shows: Podcast.__init__, which covers the Python language, its community, and the innovative ways it is being used, and the Machine Learning Podcast, which helps you go from idea to production with machine learning. Visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.
Introduction to Guests and Their Backgrounds
Overview of Datorios and Real-Time Data
Challenges in Data Observability with Flink
Adoption and Risks of Streaming Systems
Real-Time AI and Flink's Role
Design and Implementation of Datorios
Future Plans and Market Expansion
Encouraging Real-Time Data Adoption
Biggest Gaps in Data Management Tools