Summary
In this episode of the Data Engineering Podcast, Anna Geller talks about the integration of code and UI-driven interfaces for data orchestration. Anna defines data orchestration as automating the coordination of workflow nodes that interact with data across various business functions, discussing how it goes beyond ETL and analytics to enable real-time data processing across different internal systems. She explores the challenges of using existing scheduling tools for data-specific workflows, highlighting limitations and anti-patterns, and discusses Kestra's solution, a low-code orchestration platform that combines code-driven flexibility with UI-driven simplicity. Anna delves into Kestra's architectural design, API-first approach, and pluggable infrastructure, and shares insights on balancing UI and code-driven workflows, the challenges of open-core business models, and innovative user applications of Kestra's platform.
In this episode of the Data Engineering Podcast, host Tobias Macy interviews Anna Geller, a data engineer turned product manager, about the integration of code and UI-driven interfaces for data orchestration. Anna shares her journey from working with data during an internship at KPMG to her current role as a product lead at Kestra. She provides her insights into the concept of data orchestration, emphasizing its broader scope beyond just ETL and analytics, and discusses the challenges and anti-patterns that arise when using existing scheduling systems for data-specific workflows.
Anna explains the overlap between CI/CD, scheduling, and orchestration tools, and the limitations that occur when these tools are used for data workflows. She highlights the importance of visibility and governance at scale and the need for a dedicated orchestrator like Kestra. The conversation also delves into the challenges of using data orchestrators for non-data workflows and the benefits of combining code and UI-driven approaches.
Anna discusses Kestra's architecture, which supports both JDBC and Kafka backends, and its focus on API-first interactions. She explains how Kestra handles task granularity, inputs, and outputs, and the flexibility provided by its plugin system. The episode also explores Kestra's approach to data as assets, the target audience for Kestra, and how it bridges different workflows across organizational boundaries.
The discussion touches on Kestra's open-core model, the challenges of balancing open-source and enterprise features, and the innovative ways Kestra is being applied. Anna shares insights into Kestra's local development experience, the lessons learned in building the product, and the upcoming features and projects that Kestra is excited to explore.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- Data migrations are brutal. They drag on for months—sometimes years—burning through resources and crushing team morale. Datafold's AI-powered Migration Agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year-long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.
- As a listener of the Data Engineering Podcast you clearly care about data and how it affects your organization and the world. For even more perspective on the ways that data impacts everything around us you should listen to Data Citizens® Dialogues, the forward-thinking podcast from the folks at Collibra. You'll get further insights from industry leaders, innovators, and executives in the world's largest companies on the topics that are top of mind for everyone. They address questions around AI governance, data sharing, and working at global scale. In particular I appreciate the ability to hear about the challenges that enterprise scale businesses are tackling in this fast-moving field. While data is shaping our world, Data Citizens Dialogues is shaping the conversation. Subscribe to Data Citizens Dialogues on Apple, Spotify, Youtube, or wherever you get your podcasts.
- Your host is Tobias Macey and today I'm interviewing Anna Geller about incorporating both code and UI driven interfaces for data orchestration
- Introduction
- How did you get involved in the area of data management?
- Can you start by sharing a definition of what constitutes "data orchestration"?
- There are many orchestration and scheduling systems that exist in other contexts (e.g. CI/CD systems, Kubernetes, etc.). Those are often adapted to data workflows because they already exist in the organizational context. What are the anti-patterns and limitations that approach introduces in data workflows?
- What are the problems that exist in the opposite direction of using data orchestrators for CI/CD, etc.?
- Data orchestrators have been around for decades, with many different generations and opinions about how and by whom they are used. What do you see as the main motivation for UI vs. code-driven workflows?
- What are the benefits of combining code-driven and UI-driven capabilities in a single orchestrator?
- What constraints does it necessitate to allow for interoperability between those modalities?
- Data Orchestrators need to integrate with many external systems. How does Kestra approach building integrations and ensure governance for all their underlying configurations?
- Managing workflows at scale across teams can be challenging in terms of providing structure and visibility of dependencies across workflows and teams. What features does Kestra offer so that all pipelines and teams stay organised?
- What are the most interesting, innovative, or unexpected ways that you have seen Kestra used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on Kestra?
- When is Kestra the wrong choice?
- What do you have planned for the future of Kestra?
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
- Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The AI Engineering Podcast is your guide to the fast-moving world of building AI systems.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
[00:00:11] Tobias Macey:
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. Your host is Tobias Macey. And today, I'm interviewing Anna Geller about incorporating both code and UI driven interfaces for data orchestration. So, Anna, can you start by introducing yourself?
[00:00:26] Anna Geller:
Yes, of course. I'm Anna Geller. I'm a data engineer and technical writer turned product manager. I worked in many data engineering roles, including consulting, engineering, and later also DevRel, and currently I work as a product lead at Kestra. And, yeah, that's the subject of today's podcast. And do you remember how you first got started working in data? Yes. I think I started working with data during an internship at KPMG, processing data for year-end audits. There were a lot of Excel spreadsheets and queries to SQL Server. Yeah, that was how I started. I actually also studied, kind of, data engineering
[00:01:14] Tobias Macey:
as my master's. So, yeah. In terms of the scope of this conversation, can you start by giving your definition of what constitutes data orchestration and what is necessary for a system to be able to orchestrate data effectively?
[00:01:31] Anna Geller:
Yeah. So it's always a bit difficult to agree on definitions in the industry. The way I see data orchestration is that it's automated coordination of workflow nodes that touch data. This means that, essentially, any workflow nodes that interact with data, whether they produce data or consume data, all fall into this category. One misconception I see is that many people associate orchestration only with ETL and analytics. Instead, I think we should see it a bit more as a broader concept that covers how data moves across your entire business.
So I think every company has internal APIs that need to exchange data. You need to react to events, like sending an email or maybe updating inventory anytime there's a new shipment. You need to process data across ERP, CRM, PLM, all kinds of internal systems, and you often need to do that in real time rather than in just nightly ETL jobs. Yeah. So I think the distinction is whether you want to automate workflows for the entire IT department, with multiple teams, environments, and internal systems, or whether you just do it for the data team.
[00:02:56] Tobias Macey:
And another aspect of the challenge of trying to really pin down what data orchestration means and what you should use to execute those workflows is that in the technical arena and in organizations, there are numerous different scheduling systems, workflow systems, automation systems, in particular, things like CICD for software delivery. There is a scheduler in Kubernetes and other container orchestrators. There are things like CRON and various other time based scheduling or event based systems such as Kafka or different streaming engines.
And a lot of times, because something already exists within the organizational context, when a new task or requirement comes up, the teams will naturally just reach for what they already have even if it's maybe not necessarily designed for the specific task at hand. And I'm wondering if you can talk to some of the ways that those tendencies can lead to anti patterns and some of the limitations in the approach of using what they already have for data specific workflows.
[00:04:06] Anna Geller:
Yeah. So I believe there is a lot of overlap of functionality between all those CI/CD, scheduling, and orchestration tools. If we think about it, they all have a trigger. Right? So, for example, when a new pull request is opened or merged, you need to do something. They all have a list of jobs or tasks to run when some event is received. They also all have states, so they are all state machines in the end. If a given step fails, you want to maybe restart the entire run from a failed state. And many CI tools, maybe in the data space we don't realize it, but they also have things like notifications on failure.
They have ways to maybe pause after a build step to validate if the build was correct and to approve or reject a deployment. Right? So there's quite a large overlap, and I think it's quite natural for companies, instead of directly considering a dedicated orchestrator, to first try to use what they have and see if they can expand it to use cases like data workflows, automation of microservices, or automation of business processes. I think the limitations usually show up when you have true dependencies across workflows, across repositories, even across teams and infrastructure.
And also when you start running workflows at scale, because then you just lack visibility. It's kind of the same as with AWS Lambda: when you have tons of those different functions, at some point you are just confused. You have no overview of what the health of your platform actually is. And let's take GitHub Actions as one concrete example. GitHub Actions is great, but the moment you have complex dependencies or custom infrastructure requirements, GitHub Actions starts becoming maybe not the right solution. For example, you want to run this job on ECS Fargate, run this job on Kubernetes, and run another job on my on-prem machine to connect to my on-prem database to perform some data processing. Then you have patterns like, run this job only after those 4 jobs complete successfully, or run things at scale. And you want to manage concurrency.
You want to manage multiple code bases from multiple different teams. Already, managing secrets across all those multiple repositories, as you would have to do with GitHub Actions, can become a bit painful when you have multiple teams that maybe want to share them. This kind of visibility and governance at scale is something where I believe you may consider a true orchestrator.
[00:07:01] Tobias Macey:
Another challenge in the opposite direction is that teams that do invest in data orchestration will say, again, I already have something for doing orchestration. Why don't I also use that for CICD or whatever other task automation I have? And I'm curious what you have seen as some of the challenges in that opposite direction of using a data orchestrator for something that is not a data driven workflow?
[00:07:25] Anna Geller:
It depends on what we, in the end, consider a data orchestrator, because many data orchestrators will not be able to perform a task like triggering a CI/CD pipeline to deploy some containers. For example, dbt Cloud: if you consider dbt Cloud to be an orchestrator, you will not be able to, say, start some Terraform apply from dbt. It's obviously not its use case. For Python orchestrators, like Airflow and all the tools in this space, I think it's more feasible, but it can be a bit clunky to orchestrate CI just from Python, because mostly in CI what you do is run CLI commands.
If you do it from Airflow, you would need to have some HTTP sensor that listens to some event webhook, maybe after your pull request was merged or something like this. So it would be feasible, but it can be quite clunky and not easy to maintain. In Kestra, we try to make this pattern really easy, given that you can simply add a list of tasks with your CLI commands, then add a webhook trigger that can react, maybe, to your pull request event. And then it's very simple. I actually have one quote, I don't know if I should just read it out loud, from one user who is doing CI/CD in Kestra, and he mentioned that it was really refreshing. It's so simple yet powerfully flexible.
It really does allow you to create pretty much any flow you require. I have been migrating our pipelines from GitHub Actions to Kestra, and it's been so simple to replicate the logic. The ability to mix and match plugins with basic shell scripting or a script from a language is just amazing. So I think we have some good testimonies that kind of prove that the transition was fairly seamless.
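To make that concrete, here is a rough sketch of what such a CI flow could look like in Kestra's YAML. The plugin and trigger type names follow Kestra's naming conventions but may differ between versions, and the repository URL and commands are placeholders rather than anything mentioned in the episode.

```yaml
id: ci_pipeline
namespace: company.engineering

tasks:
  - id: build_and_test
    type: io.kestra.plugin.scripts.shell.Commands
    commands:
      - git clone https://github.com/acme/example-repo repo   # placeholder repository
      - cd repo && ./run_tests.sh                              # placeholder build/test script

triggers:
  - id: on_pull_request_merged
    type: io.kestra.plugin.core.trigger.Webhook   # point a GitHub webhook at this trigger's URL
    key: replace-with-a-random-key
```

The webhook trigger gives the flow an HTTP endpoint, so the Git host calls Kestra when the pull request event fires and the CLI tasks run in sequence.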
[00:09:16] Tobias Macey:
Another element of data orchestration is the way in which it's presented and controlled. There have been a number of generations of data orchestration, each focusing on the specific problems of the overall ecosystem at that time. And one of the main dichotomies that has existed throughout is the question of whether it's largely a UI driven or a low code approach where you're dragging and dropping different steps and connecting them up in a DAG or whether it's largely a code driven workflow where that also has some degrees of how code heavy it is or maybe it's a YAML description of what the different tasks are. Maybe it's pure code where a lot of times that will lock you into a particular language environment.
And I'm wondering what you see as some of the main motivators for those UI versus code driven workflows at the technical and the organizational level.
[00:10:11] Anna Geller:
The main motivation to combine code and UI driven approaches is to close the market gap. The way we see the orchestration and automation tool market is that, on the one hand, you have all those code-only frameworks, often requiring you to build your workflows in Python, JavaScript, or Java. And on the other end of the spectrum, you have all those drag and drop ETL or automation tools. In both of those categories, there are many solutions you can pick from. There are a bunch of Python-based orchestration frameworks, there are a bunch of no code drag and drop solutions, but there are very few tools in the middle. And this is the gap that Kestra tries to fill. In general, we believe that Kestra is the best among low code orchestration solutions. And if we make this claim that we are the best, why are we the best? With most tools in this no code UI space, you would first build something in the UI, and they will create a dump of a JSON schema and call it code. What Kestra does differently is that, with every new feature, we start with code and API first, and all those UI components come later. As a result, the YAML definition is readable. It has full auto completion and syntax validation.
You have great UX in terms of built-in documentation, revision history, and Git integration, so that you can iteratively start building everything in the UI. You can then push it to Git when you are ready, and you cover this whole spectrum of having a nice, intuitive UI to iteratively build workflows without compromising the engineering benefits of a framework. To maybe summarize: existing solutions are usually either too rigid, like all the no code tools, or too difficult, like all the frameworks. With Kestra, you have all the benefits of a code-based orchestration framework without the complexity of a framework. So you don't have to package and deploy your code. You can just go to the UI.
You quickly edit it, you run it to check if it's working, and you are done in just a few minutes.
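For reference, a minimal flow in that YAML syntax might look roughly like this; the Log task type is drawn from Kestra's core plugins, and exact type names can vary between versions.

```yaml
id: hello_world
namespace: company.team

tasks:
  - id: say_hello
    type: io.kestra.plugin.core.log.Log   # core plugin; logs a rendered message
    message: Hello from Kestra!
```

The same definition can be typed directly in the UI editor, validated on save, and later pushed to Git unchanged.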
[00:12:36] Tobias Macey:
One of the challenges of having a low code interface, even if there is a code driven workflow available, is that it imposes necessary constraints to be able to ensure that even if you do have a code element, you're able to visually represent it for people who are using that UI driven approach. And a lot of times, I've seen that lock the tool chain into a specific technology stack where maybe it is UI driven. It will generate code for you, which you can then edit, and it will translate that back to the UI, but only if you're running on Spark or only if you're running on Airflow. And I'm wondering if you can talk to some of the ways that that dual modality and the requirement to be able to move between those different interfaces and maintain parity between them imposes constraints as far as
[00:13:32] Anna Geller:
the interfaces or the workflow descriptions or the types of tasks or runtime environments that you're able to execute with? There are no constraints in terms of what you can orchestrate or which technology you want to integrate with. The only constraint is that Kestra has built-in syntax validation, which means that the API doesn't allow you to save the flow if it's invalid. So this is one constraint. There are obviously tons of benefits with this. There are no surprises at run time because the flow is validated during its creation, at build time. If you have invalid, let's say, indentation in your Kestra YAML, Kestra won't let you save that flow. In contrast, we can maybe compare it to how it's handled in Python, because I believe a lot of your audience use tools like Airflow. With a DAG defined in a Python script, your workflow logic can be potentially more flexible, but a wrong indentation in your Python script will be detected at run time. So in the end, it's more flexible, but also more fragile. And as with pretty much everything in technology, it comes down to the trade off of constraints and guarantees that we can offer. With Python, you can have potentially a bit more flexibility in how you define this workflow logic, but at the risk of having additional runtime issues if something is incorrect. And you also have the downside that you have to actually package and deploy that code. With the benefit of being in YAML, Kestra is a bit more constrained, but it's also portable and self contained. It's quite painless to deploy.
It's validated at build time, and you can be sure that everything is working. So, yeah, pretty much the only constraint is that you cannot save an invalid flow.
[00:15:27] Tobias Macey:
Data migrations are brutal. They drag on for months, sometimes years, burning through resources and crushing team morale. Datafold's AI powered migration agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year long migration into weeks? Visit dataengineeringpodcast.com/datafold today for the details.
So in order to explore a little bit further as far as the constraints and benefits, I think it's also worth discussing what the overall architecture of Kestra is and some of the primitives that it assumes for those orchestration tasks. And then we can dig more into some of the ways that you're able to use those primitives to build the specific logic that you need. So if you can just give a bit of an overview about how Kestra is implemented and some of the assumptions that it has about the level of granularity of the tasks and the types of inputs and outputs that it supports.
[00:16:37] Anna Geller:
Yes. So maybe let's start with the architecture. Kestra started with an architecture that relies on Kafka and Elasticsearch, and it was really great in terms of scalability, with no single point of failure. But at the same time, it made it more difficult for new users to get started with the product and to explore it. Many listeners probably know that maintaining Kafka in production can be difficult. So that's why Kestra added an architecture with a JDBC backend in the open source version. This means that you can use Postgres, MySQL, SQL Server, or H2 as your database. And on top of that, you have the typical server components you can expect from an orchestration tool: an executor, a scheduler, a web server, and workers. All of those components can be scaled independently of each other because they are all kind of like microservices.
And if you need more schedulers or more executors, you can just increase the number of replicas in your Kubernetes deployment and everything just works. So this is from the, let's say, DevOps backend perspective of the architecture. In terms of user experience, Kestra relies really heavily on the API. We are an API first product. This is not an orchestration framework where you would just define your code, run it locally, and then deploy the code; instead, everything interacts through the API. So the tasks and triggers you can use are restricted by the plugins that you have in your Kestra instance. You can have as many plugins as you want. By default, Kestra comes prepackaged with all plugins, so you don't need to install anything. This is kind of the main benefit you already get with an orchestration platform like Kestra: there's no need to pip install every dependency that you need to use all those different integrations.
Everything is prepackaged by default. And if you need a bit more flexibility, you can cherry pick which plugins are included. So let's say you are an AWS shop; you don't use Azure or GCP, and you don't want those extra plugins for those other cloud vendors. You simply don't include them in your plugins directory in Kestra, and you just cherry pick the plugins that you need. On top of that, you can build your custom plugins. The entire process is fairly easy. You have a template repository that you can simply fork and build your code on top of. Then you build your jar file, include it in the plugins directory, and you have the custom plugin. Then, in terms of governance that you can have on top of this, as a Kestra administrator you can set plugin defaults for each of the plugins that you added, to, for example, ensure that everybody is using the same AWS credentials.
Or, if you want to globally enforce some pattern that maybe everybody should use, you can enforce those properties globally using plugin defaults. And this pluggable infrastructure has some constraints, in that if you don't have a plugin for something, you will not be able to use it. But the benefit is that you have a lot of governance, and it scales really well with more plugins that you can always add. And we also have the possibility to create custom script tasks. So if some plugin is missing and you don't want to touch Java to build a custom plugin, you can do that, for example, in Python, R, or Node.js.
You can write your custom script, and you can just run it as a container. That's kind of how Kestra can support all those different kinds of integrations.
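As a sketch of the plugin defaults idea, an administrator might pin shared AWS credentials roughly like this; the property names and the secret() expression follow Kestra's documented conventions but should be treated as assumptions that depend on version and edition.

```yaml
# Plugin defaults applied to every task whose type starts with the given prefix
pluginDefaults:
  - type: io.kestra.plugin.aws              # matches all tasks from the AWS plugin group
    values:
      accessKeyId: "{{ secret('AWS_ACCESS_KEY_ID') }}"
      secretKeyId: "{{ secret('AWS_SECRET_KEY_ID') }}"
      region: eu-west-1
```

Because the defaults are applied centrally, individual flows can omit credentials entirely, which is the governance benefit Anna describes.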
[00:20:28] Tobias Macey:
And so in terms of the level of granularity of the tasks or the data assets that you're operating over, what are the assumptions of Kestra as far as the, I guess, scale of data, the types of inputs and outputs, and in particular, the level of detail that you're able to get to as far as what a given task or plugin is going to execute and how that passes off to the next task or plugin?
[00:20:59] Anna Geller:
That's mostly coordinated through inputs and outputs. Each workflow can have as many inputs as you want, and all inputs are strongly typed. So you can say, okay, this input is a boolean, this input should be an integer, and this input is a select, so you can only pick the value from a dropdown. Maybe this input is a multi-select, so you can only choose among the predefined values. You can have JSON, URL, all kinds of different inputs, and the benefit is that they are strongly typed, so the end user, who may not be as technical, will know what values they can pass into the workflow. Then the communication between tasks, to pass data between each other, mostly operates in terms of metadata and internal storage.
If you want to pass some data objects directly, you can do that if your plugin specifies that some data should be output. In addition, you also have input files and output files for script tasks. So you need to explicitly declare that, let's say, this Python task should output those two files, or maybe all JSON files, and then they will be captured and automatically persisted in Kestra's internal storage. You can think of internal storage as an S3 bucket. It can be S3, GCS, etcetera, or just local storage. People familiar with Airflow can think of internal storage as Airflow's XComs without the complexity of having to do the XCom push and pull. So, yeah, that's how tasks can pass data between each other, and you can even pass data across workflows. I think this is huge for governance. We have many users who use, for example, subflows to compose their workflows in a more modular way, so that you can have one parent flow that triggers multiple processes, and each of them is composed of subflows.
And the subflows can output some data as well, and they can pass it between each other, so that you have this way of exchanging data between different teams and different projects without having to hard code any dependencies and without having to rely on implicitly stored files somewhere locally.
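Putting the typed inputs and output passing together, a hedged sketch of a flow could look like the following; the input and task type names reflect Kestra's general conventions (they differ somewhat across versions), and the scripts are illustrative only.

```yaml
id: typed_inputs_and_outputs
namespace: company.team

inputs:
  - id: run_date
    type: DATE
  - id: environment
    type: SELECT              # end users pick from a dropdown instead of free text
    values: ["dev", "prod"]

tasks:
  - id: extract
    type: io.kestra.plugin.scripts.python.Script
    outputFiles:
      - "data.json"           # explicitly declared, then persisted in internal storage
    script: |
      import json
      with open("data.json", "w") as f:
          json.dump({"environment": "{{ inputs.environment }}", "rows": 42}, f)

  - id: load
    type: io.kestra.plugin.scripts.shell.Commands
    commands:
      - cat "{{ outputs.extract.outputFiles['data.json'] }}"   # downstream task reads the persisted file
```

The downstream task never touches the upstream worker's local disk; it only references the file that was captured into internal storage.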
[00:23:14] Tobias Macey:
Another trend that's been growing in the data orchestration space is the idea of rather than data as tasks, treating data as assets where one task might produce multiple assets. The canonical example largely being dbt where you might have one dbt command line execution that produces tens or hundreds of different tables as an output and being able to track those independently, particularly if there are downstream triggers that depend on one of those tables being updated or materialized. And I'm wondering how Kestra addresses or some of the ways that Kestra is thinking about that level of granularity in terms of a task producing multiple different outputs or assets as a result.
[00:24:02] Anna Geller:
Yeah, that's totally feasible. Each task can output as many results as you want. Maybe I wouldn't recommend outputting, like, 1,000 files because the UI could potentially break, but overall you can output as many things as you wish, and Kestra doesn't introduce any restrictions in terms of what your specific outputs can be. There is one really great feature that people really appreciate in Kestra, which is output preview. If your task run returns, say, a CSV file or a JSON file, you can easily preview it in the UI so that you know if the data format is right and if everything looks good. In the same way, if something fails, you can preview the data and see, okay, maybe what I have in this downstream task is some error in my code; maybe you didn't capture some edge cases. You can redeploy your workflow, so essentially you create a new revision, and you can rerun it only for this new, last task. This is a feature called replay. It's super useful for failure scenarios.
If you process data and something unexpected happens, you don't want to rerun all those previous things, right? Because everything else worked; only this single thing didn't work. So you can very easily reprocess things that don't work simply by fixing the code and pointing the execution to the updated revision.
[00:25:28] Tobias Macey:
In terms of the audience that you're targeting, given the fact that it has this UI and code driven approach, I'm wondering how you think about who the target market is, the types of users, and some of the ways that that dual modality appeals to different team or technical boundaries across the organization?
[00:25:48] Anna Geller:
Yeah, that's a great question. Our target audience currently is mostly engineers who build internal platforms. So usually you would build some workflow patterns, and you want to expose some workflows to less technical users or to external stakeholders. We have lots of architects, software architects, coming to Kestra to support them in replatforming. This usually means they want to move from on prem to cloud, or there's also the completely reversed pattern: there are many companies these days who move from cloud back to on prem because of additional compliance reasons. So, yeah, a lot of people using Kestra are those platform builders who then expose those workflows to less technical users for a variety of use cases. Kestra is not focused exclusively on data pipelines. We also support infrastructure and API automation and business process orchestration.
You have things like approval workflows. One very common scenario is that there are some IT automation tasks that, for example, provision resources, and some DevOps architect or manager needs to approve whether those resources can be deployed. So you have this approval process implemented in Kestra so that the right person can approve the workflow to continue. We also have all those event driven data processing use cases, where you receive events, for example, from Kafka, SQS, or Google Pub/Sub, and you want to trigger some microservice in response to this event.
That's also a perfect use case for Kestra. So it's not restricted to data pipelines. And I would say this is still data orchestration, because you react to some data changes in the business, and you want to run some data processing in response.
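For the approval scenario, a rough sketch of such a flow might be as follows; the Pause task is part of Kestra's core flow plugins (its exact type name varies by version), and the provisioning script and input values are purely illustrative.

```yaml
id: provision_with_approval
namespace: company.infra

inputs:
  - id: instance_type
    type: SELECT
    values: ["small", "medium", "large"]

tasks:
  - id: wait_for_approval
    type: io.kestra.plugin.core.flow.Pause    # execution pauses here until someone resumes it from the UI or API

  - id: provision
    type: io.kestra.plugin.scripts.shell.Commands
    commands:
      - ./provision.sh "{{ inputs.instance_type }}"   # hypothetical provisioning script
```

The manager reviews the paused execution and resumes (or kills) it, which is how the approve-or-reject step is modeled.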
[00:27:36] Tobias Macey:
As a listener of the Data Engineering Podcast, you clearly care about data and how it affects your organization and the world. For even more perspective on the ways that data impacts everything around us, you should listen to Data Citizens Dialogues, the forward thinking podcast from the folks at Collibra. You'll get further insights from industry leaders, innovators, and executives in the world's largest companies on the topics that are top of mind for everyone. They address questions around AI governance, data sharing, and working at global scale, among others. In particular, I appreciate the ability to hear about the challenges that enterprise scale businesses are tackling in this fast moving field.
While data is shaping our world, Data Citizens Dialogues is shaping the conversation. Subscribe to Data Citizens Dialogues on Apple, Spotify, YouTube, or wherever you get your podcasts. At the organizational level too, I'm interested in some of the ways that Kestra is able to implicitly bridge these different workflows without the different teams needing to know every detail of what the available data is and how it's produced. Where, for instance, I have a workflow that's taking a file from an SFTP server, processing it, generating some table in the data warehouse as a result, and then somebody else's workflow depends on the contents of that data warehouse table, does some analysis, and produces some data visualization or report, how do you manage the execution of the next task without the person who controls each task having to explicitly communicate between them or requiring that the workflows are directly built on top of each other?
[00:29:19] Anna Geller:
So Kestra supports that pattern using a flow trigger. This was, in fact, one of the most popular patterns from the very beginning of the product. The use case typically looks as follows. You have multiple teams that don't have tight dependencies between each other. So you would say, run this flow only when those 3 other workflows from different teams have successfully completed within the last 24 hours. You can easily define that as a condition for this flow, so that it only runs after those preconditions are true. And you can additionally add conditions for the actual data. So you can say, only if this data returned maybe a 200 status code, or if this data has this number of rows, do something, like trigger this workflow. Kestra doesn't introduce any new concept for this. We already have the concept of triggers.
So implementing those kinds of patterns is a matter of explicitly declaring in your YAML what the expectations are to trigger this workflow. And you can explicitly list all of those flow executions that should be completed within the given time frame, and then it will run. So I think the mindset is quite similar to how many other data orchestrators do that, but without
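A hedged sketch of such a flow trigger is shown below; the trigger and condition type names, and the time window syntax, are approximations that differ between Kestra versions, and the namespaces and flow IDs are placeholders.

```yaml
id: downstream_report
namespace: team.analytics

tasks:
  - id: build_report
    type: io.kestra.plugin.scripts.shell.Commands
    commands:
      - ./build_report.sh                     # placeholder reporting script

triggers:
  - id: after_upstreams
    type: io.kestra.plugin.core.trigger.Flow
    conditions:
      - id: all_upstreams_succeeded
        type: io.kestra.plugin.core.condition.MultipleCondition
        window: PT24H                         # only fire if all listed executions completed within 24 hours
        conditions:
          ingest_done:
            type: io.kestra.plugin.core.condition.ExecutionFlow
            namespace: team.ingestion
            flowId: sftp_to_warehouse
          quality_done:
            type: io.kestra.plugin.core.condition.ExecutionFlow
            namespace: team.quality
            flowId: warehouse_checks
```

Each team keeps publishing its own flow; the downstream team only declares which upstream executions it cares about.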
[00:30:44] Tobias Macey:
restricting it directly to only being data. Another aspect of what you're building with Kestra is the fact that it is an open core product with a paid service available on top of it. And I'm wondering if you can talk to some of the ways that you think about what are the elements that are available as open source, what are the things that are paid, and some of the ways that you think about the audiences across those two, and how you are working to keep the open source elements of it sustainable and active.
[00:31:17] Anna Geller:
That's the challenge every open core company is asking themselves every day, I'm pretty sure. We have this framework where all features that are about security, scalability, and governance go into the enterprise edition, and all features that are single player, core orchestration capabilities go into the open source version. That's how we try to balance it. I believe there is no single answer; every company tries to find the best solution. What we found out so far is that we have some prospects, some people coming to Kestra, who would prefer to have a fully managed service. And currently, Kestra doesn't offer that. We have open source and a self hostable enterprise solution. So that's something we'll be working on next year. It will be a big priority, especially to enable even more people to try the product, see how it's working, including trying even those paid enterprise grade features, without having to first talk to sales and start an official POC.
[00:32:22] Tobias Macey:
And as you have been building and working with Kestra, what are some of the most interesting or innovative or unexpected ways that you've seen it applied?
[00:32:29] Anna Geller:
Yeah. One of the most interesting is that we have one solopreneur who was automating their entire business with Kestra, including payment automation and categorizing customer support tickets using OpenAI. So, a super interesting use case, and great to see that Kestra can be applied for solopreneurs. As for more surprising and unexpected, I would have expected more people to want to write custom code. And what we have found out is that there are many, many users who purely use our plugins. If they need to have some transformations, they would often just add custom Pebble expressions. This is like Jinja in Python, where they transform some data on the fly without writing dedicated code, like extra Python functions. So, yeah, I was frankly a bit surprised. Sometimes it seemed to me personally easier to maybe write custom code for this aspect, but I see users just prefer to keep things simple: just a simple transformation function, and move to the next task. I was also a bit surprised how many users actually leverage the low code aspect of Kestra. Our default UI is to use the code interface, so you need to write your workflow. We have beautiful auto completion and syntax validation when you just type things in the UI. But many users still explicitly opt in to the topology view and just add things from the low code UI forms. So that's one aspect which was also surprising to me. And, overall, I think it's always surprising to see how broad a spectrum of users is coming to us. We have some who, as I mentioned, just prefer to keep things simple; they only use our plugins. And there are other people who just write custom code for everything, so, like, every task is maybe a Ruby or JavaScript or Python task. So the spectrum is really wide, and it's really interesting to see this.
[00:34:31] Tobias Macey:
Another aspect that I forgot to touch on earlier is given that Kestra is by default a platform service, what does the local development experience look like for people who are maybe iterating on a script or trying to test out a workflow before they push it to their production environment?
[00:34:49] Anna Geller:
Yeah, I believe the local development is really great. We have feedback from one user who mentioned that writing workflows in Kestra is fun, which is unheard of in the world of orchestration, that building workflows can be fun. So, essentially, to get started, you run a single Docker container. You open the UI, and you hit a single button to create a flow. From here, you add your ID for the flow, the namespace to which it belongs, the list of tasks that you want to orchestrate, and the triggers, so whether this should run on a schedule or based on some event, when a new file arrives in S3, etcetera. And then, when you start typing your tasks, you get this auto completion and built-in documentation.
You also have blueprints that will guide you through examples of how to leverage some usage pattern. So I believe the local development experience is really unique to Kestra. And as I mentioned, some users even consider this fun, which is very refreshing.
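As a sketch of that getting-started flow, a minimal local setup could be described in a Docker Compose file like the one below; the image name and the `server local` command come from Kestra's quickstart, while the port mapping and the Docker socket mount are assumptions about a typical local configuration (the official compose file in the docs is more complete).

```yaml
# docker-compose.yml: single-container local setup using Kestra's embedded database
services:
  kestra:
    image: kestra/kestra:latest
    command: server local                      # "local" mode runs everything in one process
    ports:
      - "8080:8080"                            # UI and API
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock   # assumed: lets script tasks run as containers
```

With the container up, the UI at localhost:8080 is where flows are created, validated, and run.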
[00:35:48] Tobias Macey:
And in your own work of building and using and communicating about Kestra, what are some of the most interesting or unexpected or challenging lessons that you've learned in the process? One of the most challenging or interesting lessons we've learned is that following the common VC advice of, you know, just start with a niche, then
[00:36:09] Anna Geller:
land and expand. I think this approach didn't work for Kestra as well as we would have wished. At first, Kestra targeted mostly the data engineering for analytics use case. And over time, we expanded to operational teams, and we focused on engineers who are orchestrating custom applications, reacting to events, building scheduled backup jobs, or building infrastructure with Kestra. And this is where the adoption really took off. So the lesson learned is that you don't always need to follow the VC advice. Sometimes following your own vision can be better. Then, in terms of product building, one lesson learned came from trying to use the VS Code editor within Kestra. Within one release, we launched an embedded VS Code editor within the UI. Over time, we found it was really difficult, and in the end it was much easier to build our own custom editor than to keep maintaining the one from VS Code, because you have so little control over how everything looks and how the VS Code extension and the UI interact. Yeah. So I think this was something that was surprising. We thought it would be easier. We also thought that VS Code would be more open and not as restricted.
So if you want to, for example, use GitHub Copilot in your product, you cannot do that. It's really restricted
[00:37:34] Tobias Macey:
to Microsoft only. And for individuals or teams who are evaluating orchestration engines and trying to decide what fits best into their stack, what are the cases where Kestra is the wrong choice?
[00:37:48] Anna Geller:
Yeah. So Kestra is the wrong choice if you build stateful workflows that implicitly depend on side effects produced by other tasks or by other workflows. To give you an example, let's say you have one Python function that writes data to a local file, and there is another task in another workflow that tries to read this local file. Technically speaking, if you use the worker group feature in Kestra, you could make this work. But we consider this implicitly stateful approach a bad practice. We prefer that you declaratively configure that this task outputs a file, and this file will then be persisted in internal storage. And then it can be accessed transparently by other tasks or even by other flows.
In general, we try to bring infrastructure as code best practices to all workflows. So we assume that your local development environment should be the same as what you are, in the end, doing in production. In prod, you usually run things in a distributed fashion, so you cannot guarantee that those two tasks will run on the same worker to access this local file. That's why we consider this an anti-pattern, and each execution in Kestra is by default considered stateless. Only if your tasks explicitly output some results are those results persisted and available to be processed.
[00:39:18] Tobias Macey:
And as you continue to build and iterate on and explore the market for orchestration engines in the data context, what are some of the things you have planned for the near to medium term, or any particular problem areas or projects you're excited to dig into? Yeah. We are really
[00:39:33] Anna Geller:
excited about the feature we will be releasing on December 3rd this year, and this will be apps. This will allow you to build custom applications directly from Kestra. So you can treat your workflows as a backend, and you build custom UIs directly from Kestra. Let's imagine that you have some business stakeholders who want to request some data. They can go to the UI. They can select from the inputs what type of data they want to request. Then your workflow can fetch and process and transform all the data in the way this end stakeholder needs it.
And it can then output this data directly from this custom application. So this eliminates this need, you know, and I think as data engineers, we all know this use case where a stakeholder comes in and asks, could you fetch this data for me? I just need this report. So effectively, they can fully self serve with this approach. Similarly, if you have patterns that need approval, right? Let's say somebody wants to request compute resources. You can fill those inputs in a custom form, then this will go to the manager or to the DevOps engineer who can look at the request.
They can approve it, and then you can maybe see the result. So in the end, those custom applications, I think, will be a feature that will unlock tons of different use cases, and we are very excited about this one. Similarly, since we follow this approach of everything as code, we are building a feature which is custom dashboards. So you can build custom dashboards that visualize how your execution data looks, and you can do that as code. Similarly to how you have workflows as code, you also have your custom dashboards as code, which you can version control. You can track revision history.
This is also another feature that will be launched in December. And long term, in terms of what is on our road map, it's a cloud launch. We need this fully managed service, as I mentioned before, and also some improvements to human in the loop. I think, to accommodate the AI driven world where AI generates some data, you need to have reliable human in the loop processes where a human can approve
[00:42:12] Tobias Macey:
the output generated by AI. So that's also something that we work on even more. Are there any other aspects of the work that you're doing at Kestra or the overall space of UI and code driven orchestration that we didn't discuss yet that you'd like to cover before we close out the show? No. I think we've covered a lot of ground. Thank you so much for inviting me to the show. It's been great. Yeah. I'm very grateful. Well, for anybody who wants to get in touch with you and follow along with the work that you and the rest of the Kestra team are doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today.
[00:42:54] Anna Geller:
We briefly mentioned the topic of everything as code. So far, dbt has brought this approach to analytics. We see BI tools catching up, so slowly you can start building dashboards as code, which
[00:43:10] Tobias Macey:
can follow the same engineering practices. I think we are still far away from the world where you can really have everything in the data engineering process managed as code, and I think we should probably close this gap at some point. Alright. Well, thank you very much for taking the time today to join me and share the work that you and the Kestra team are doing on bridging the gap between code and UI driven workflows and expanding beyond data only and ETL only execution. I appreciate the time and energy that you're all putting into that, and I hope you enjoy the rest of your day. Thanks so much. Thank you for listening, and don't forget to check out our other shows.
Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used, and the AI Engineering Podcast is your guide to the fast moving world of building AI systems. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story. Just to help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. Your host is Tobias Macy. And today, I'm interviewing Anna Geller about incorporating both code and UI driven interfaces for data orchestration. So, Anna, can you start by introducing yourself?
[00:00:26] Anna Geller:
Yes. Of course. I'm Anna Geller. I'm a data engineer and technical writer turned product manager. I worked in many data engineering roles, including, consulting, engineering, and later also DevRel. And currently, I worked as a product lead at Kestra. And, yeah, that's the subject of today's, podcast. And do you remember how you first got started working in data? Yes. So I I think I started working with data during internship at KPMG, processing data for a year end audits. So there there was a lot of Excel spreadsheets and queries to SQL Server. Yeah. That was how I started. I actually also studied, kind of data engineering,
[00:01:14] Tobias Macey:
as as my master. So, yeah. In terms of the scope of this conversation, can you start by giving your definition of what constitutes data orchestration and what is necessary for a system to be able to orchestrate data effectively?
[00:01:31] Anna Geller:
Yeah. So it's it's always, a bit difficult to agree, in the industry on definitions. The way I see data orchestration is that it's, automated coordination of workflow nodes that touch data. This means that, essentially, any workflow nodes that interact with data, whether they produce data or consume data, they all fall into this category. I think one misconception I see, that many people associate the orchestration only with, ETL and analytics. And instead, I think that, we should see it a bit more as a broader concept that, covers how DITA moves, across your entire business.
So I think, every company has their, internal APIs that need to exchange data. You need to react to events, like, sending an email, and maybe update inventory anytime there's a new shipment. You need to process data across ERP, CRM, PLM, all kinds of internal systems, and and you often needs need to do that in real time, rather than in just nightly ETL jobs. Yeah. So I think the distinction is, whether you want to automate workforce for the entire IT departments with multiple teams, environments, internal systems, or whether you just do it for the data team.
[00:02:56] Tobias Macey:
And another aspect of the challenge of trying to really pin down what data orchestration means and what you should use to execute those workflows is that in the technical arena and in organizations, there are numerous different scheduling systems, workflow systems, automation systems, in particular, things like CICD for software delivery. There is a scheduler in Kubernetes and other container orchestrators. There are things like CRON and various other time based scheduling or event based systems such as Kafka or different streaming engines.
And a lot of times, because something already exists within the organizational context, when a new task or requirement comes up, the teams will naturally just reach for what they already have even if it's maybe not necessarily designed for the specific task at hand. And I'm wondering if you can talk to some of the ways that those tendencies can lead to anti patterns and some of the limitations in the approach of using what they already have for data specific workflows.
[00:04:06] Anna Geller:
Yeah. So I believe there is a lot of overlap of functionality between all those CI/CD, scheduling, and orchestration tools. If we think about it, they all have a trigger, right? So for example, when a new pull request is opened or merged, you need to do something. They all have a list of jobs or tasks to run when some event is received. They also all have state, so they are all state machines in the end. If a given step fails, you want to maybe restart the entire run from the failed state. And many CI tools, maybe in the data space we don't realize it, but they also have things like notifications on failure.
They have ways to maybe pause after a build step to validate if the build was correct and to approve or reject a deployment. Right? So there's quite a large overlap, and I think it's quite natural for companies, instead of directly considering a dedicated orchestrator, to first try to use what they have and see if they can expand it to use cases like data workflows, automation of microservices, or automation of business processes. I think the limitations usually show up when you have true dependencies across workflows, across repositories, even across teams and infrastructure.
And also when you start running workflows at scale, because then you just lack visibility. It's kind of the same as with AWS Lambda: when you have tons of those different functions, at some point you are just confused. You have no overview of what the actual health of your platform is. Let's take GitHub Actions as one concrete example. GitHub Actions is great, but the moment you have complex dependencies or custom infrastructure requirements, GitHub Actions starts becoming maybe not the right solution. For example, you want to run this job on ECS Fargate, run this job on Kubernetes, and run this other job on my on-prem machine to connect to my on-prem database to perform some data processing. Then you have patterns like, run this job only after those 4 jobs complete successfully, or run things at scale. And you want to manage concurrency.
You want to manage multiple code bases from multiple different teams. Already, managing secrets across all those repositories, as you would have to do with GitHub Actions, can become a bit painful when you have multiple teams that maybe need to share them. This kind of visibility and governance at scale is where I believe you may want to consider a true orchestrator.
[00:07:01] Tobias Macey:
Another challenge, in the opposite direction, is that teams that do invest in data orchestration will say, again, I already have something for doing orchestration. Why don't I also use that for CI/CD or whatever other task automation I have? And I'm curious what you have seen as some of the challenges in that opposite direction of using a data orchestrator for something that is not a data-driven workflow.
[00:07:25] Anna Geller:
It depends on what we in the end consider a data orchestrator, because many data orchestrators will not be able to perform a task like triggering a CI/CD pipeline to deploy some containers. For example, dbt Cloud. If you consider dbt Cloud to be an orchestrator, you will not be able to start some Terraform apply from dbt. It's obviously not that use case. For Python orchestrators, like Airflow and all the tools in this space, I think it's more feasible, but it can be a bit clunky to orchestrate CI from Python, because mostly in CI what you do is run CLI commands.
If you do it from Airflow, you would need to have some HTTP sensor that listens to some event webhook, maybe after your pull request was merged or something like this. So it would be feasible, but it can be quite clunky and not easy to maintain. In Kestra, we try to make this pattern really easy: you simply add a list of tasks with your CLI commands, then you add a webhook trigger that can react to your pull request event. And then it's very simple. I actually have one quote, I don't know if I should just read it out loud, from one user who is doing CI/CD in Kestra, and he mentioned that it was really refreshing: "It's so simple yet powerfully flexible.
It really does allow you to create pretty much any flow you require. I have been migrating our pipelines from GitHub Actions to Kestra, and it's been so simple to replicate the logic. The ability to mix and match plugins with basic shell scripting or a script from a language is just amazing." So we have some good testimonials that kind of prove that the transition was fairly seamless.
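For readers who want to picture this, here is a minimal sketch of such a CI-style flow in Kestra YAML; the plugin type names, the webhook key property, and all identifiers below are illustrative assumptions and may differ between Kestra versions:

id: ci_on_merge
namespace: company.platform

tasks:
  - id: build_and_test
    type: io.kestra.plugin.scripts.shell.Commands
    commands:
      - ./scripts/run_tests.sh
      - docker build -t my-app:latest .

triggers:
  - id: pr_merged
    type: io.kestra.plugin.core.trigger.Webhook
    # the key becomes part of the webhook URL that your Git host calls on merge
    key: replace_with_a_secret_key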
[00:09:16] Tobias Macey:
Another element of data orchestration is the way in which it's presented and controlled. There have been a number of generations of data orchestration, each focusing on the specific problems of the overall ecosystem at that time. And one of the main dichotomies that has existed throughout is the question of whether it's largely a UI-driven or low-code approach, where you're dragging and dropping different steps and connecting them up in a DAG, or whether it's largely a code-driven workflow, which also has some degrees of how code-heavy it is: maybe it's a YAML description of what the different tasks are, or maybe it's pure code, where a lot of times that will lock you into a particular language environment.
And I'm wondering what you see as some of the main motivators for those UI versus code driven workflows at the technical and the organizational level.
[00:10:11] Anna Geller:
The main motivation to combine code- and UI-driven approaches is to close the market gap. The way we see the orchestration and automation tool market is that on the one hand you have all those code-only frameworks, often requiring you to build your workflows in Python, JavaScript, or Java. On the other end of the spectrum, you have all those drag-and-drop ETL or automation tools. In both of those categories there are many solutions you can pick from. There are a bunch of Python-style orchestration frameworks and a bunch of no-code, drag-and-drop solutions, but there are very few tools in the middle. This is the gap that Kestra tries to fill, and in general we believe that Kestra is the best among low-code orchestration solutions. And if we make the claim that we are the best, why are we the best? With most tools in this no-code UI space, you would first build something in the UI, and they will create a dump of JSON schema and call it code. What Kestra does differently is that with every new feature, we start with code and API first, and all those UI components come later. As a result, the YAML definition is readable. It has full auto-completion and syntax validation.
You have great UX in terms of built-in documentation, revision history, and Git integration, so that you can iteratively start building everything in the UI. You can then push it to Git when you are ready, and you cover this whole spectrum of having a nice, intuitive UI to iteratively build workflows without compromising the engineering benefits of a framework. To maybe summarize: existing solutions are usually either too rigid, like all the no-code tools, or too difficult, like all the frameworks. To some extent, with Kestra you have all the benefits of a code-based orchestration framework without the complexity of a framework. So you don't have to deploy and package your code. You can just go to the UI.
You quickly edit it, you run it to check if it's working, and you are done in just a few minutes.
[00:12:36] Tobias Macey:
One of the challenges of having a low-code interface, even if there is a code-driven workflow available, is that it imposes necessary constraints to be able to ensure that, even if you do have a code element, you're able to visually represent it for people who are using that UI-driven approach. And a lot of times, I've seen that lock the toolchain into a specific technology stack, where maybe it is UI driven, it will generate code for you, which you can then edit, and it will translate that back to the UI, but only if you're running on Spark or only if you're running on Airflow. And I'm wondering if you can talk to some of the ways that that bi-modality, and the requirement to be able to move between those different interfaces and maintain parity between them, imposes constraints as far as the interfaces, or the workflow descriptions, or the types of tasks or runtime environments that you're able to execute with?
[00:13:32] Anna Geller:
There are no constraints in terms of what you can orchestrate or which technology you want to integrate with. The only constraint is that Kestra has built-in syntax validation, which means that the API doesn't allow you to save the flow if it's invalid. So this is one constraint. There are obviously tons of benefits with this. There are no surprises at runtime, because the flow is validated during its creation, at build time. If you have invalid indentation in your Kestra YAML, Kestra won't let you save that flow. In contrast, we can compare it to how it's handled in Python, because I believe a lot of your audience uses tools like Airflow. With a DAG defined in a Python script, your workflow logic can potentially be more flexible, but a wrong indentation in your Python script will only be detected at runtime. So in the end it's more flexible, but also more fragile. And as with pretty much everything in technology, it comes down to the trade-off between constraints and guarantees that we can offer. With Python, you can have potentially a bit more flexibility in how you define the workflow logic, but at the risk of additional runtime issues if something is incorrect. You also have the downside that you have to actually package and deploy that code. With the benefits of being in YAML, Kestra is a bit more constrained, but it's also portable and self-contained. It's quite painless to deploy.
It's validated at build time, and you can be sure that everything is working. So, yeah, pretty much the only constraint is that you cannot save an invalid flow.
[00:15:27] Tobias Macey:
Data migrations are brutal. They drag on for months, sometimes years, burning through resources and crushing team morale. DataFold's AI powered migration agent changes all that. Their unique combination of AI code translation and automated data validation has helped companies complete migrations up to 10 times faster than manual approaches. And they're so confident in their solution, they'll actually guarantee your timeline in writing. Ready to turn your year long migration into weeks? Visit dataengineeringpodcast.com/datafolds today for the details.
So in order to explore a little bit further as far as the constraints and benefits, I think it's also worth discussing what the overall architecture of Kestra is and some of the primitives that it assumes for those orchestration tasks. And then we can dig more into some of the ways that you're able to use those primitives to build the specific logic that you need. So if you can just give a bit of an overview about how Kestra is implemented and some of the assumptions that it has about the level of granularity of the tasks and the types of inputs and outputs that it supports.
[00:16:37] Anna Geller:
Yes. So maybe let's start with the architecture. Kestra started with an architecture that relies on Kafka and Elasticsearch, and it was really great in terms of scalability, with no single point of failure. But at the same time, it made it more difficult for new users to get started with the product and to explore it. Many listeners probably know that maintaining Kafka in production can be difficult. So that's why Kestra added a JDBC backend to the architecture in the open source version. This means that you can use Postgres, MySQL, SQL Server, or H2 as your database. On top of that, you have the typical server components you can expect from an orchestration tool: the executor, scheduler, web server, and workers. All of those components can be scaled independently of each other because all of them are kind of like microservices.
If you need more schedulers or more executors, you can just increase the number of replicas in your Kubernetes deployment and everything just works. So that is the architecture from the DevOps and backend perspective. In terms of user experience, Kestra relies heavily on the API. We are an API-first product. This is not an orchestration framework where you would just define your code, run it locally, then deploy the code. Instead, everything interacts through the API. So the tasks and triggers you can use are restricted by the plugins that you have in your Kestra instance. You can have as many plugins as you want. By default, Kestra comes prepackaged with all plugins, so you don't need to install anything. This is kind of the main benefit you already get with an orchestration platform like Kestra: there's no need to pip install every dependency that you need to use all those different integrations.
Everything is prepackaged by default. And if you need a bit more flexibility, you can cherry-pick which plugins are included. So let's say you are an AWS shop; you don't use Azure or GCP, and you don't want those extra plugins for those other cloud vendors. You simply don't include them in your plugins directory in Kestra, and you just cherry-pick the plugins that you need. On top of that, you can build your custom plugins. The entire process is fairly easy. You have a template repository that you can simply fork and build your code on top of. Then you build your JAR file, include it in the plugins directory, and you have your custom plugin. Then, in terms of governance on top of this, as a Kestra administrator you can set plugin defaults for each of those plugins that you added, to, for example, ensure that everybody is using the same AWS credentials.
Or if you want to globally enforce some pattern that everybody should use, say a particular way of setting those properties, you can enforce them globally using plugin defaults. This pluggable infrastructure has some constraints, in the sense that if you don't have a plugin for something, you will not be able to use it. But the benefit is that you have a lot of governance, and it scales really well with more plugins that you can always add. We also have the possibility to create custom script tasks. So if some plugin is missing and you don't want to touch Java to build a custom plugin, you can do that, for example, in Python, R, or Node.js.
You can write your custom script, and you can just run it as a container. That's how Kestra can support all those different kinds of integrations.
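As a rough illustration of the plugin defaults idea, the snippet below assumes a pluginDefaults block and a secret() function of the kind Kestra documents; the exact property names, plugin types, and bucket are placeholders rather than a verified configuration:

id: aws_governed_flow
namespace: company.data

pluginDefaults:
  # applied to every task whose type starts with this prefix
  - type: io.kestra.plugin.aws
    values:
      accessKeyId: "{{ secret('AWS_ACCESS_KEY_ID') }}"
      secretKeyId: "{{ secret('AWS_SECRET_ACCESS_KEY') }}"
      region: eu-west-1

tasks:
  - id: list_exports
    type: io.kestra.plugin.aws.s3.List
    bucket: my-company-exports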
[00:20:28] Tobias Macey:
And so in terms of the level of granularity of the tasks or the data assets that you're operating over, what are the assumptions of Kestra as far as the, I guess, scale of data, the types of inputs and outputs, and in particular the level of detail that you're able to get to as far as what a given task or plugin is going to execute and how that passes off to the next task or plugin?
[00:20:59] Anna Geller:
That's mostly coordinated through inputs and outputs. Each workflow can have as many inputs as you want, and all inputs are strongly typed. So you can say, okay, this input is a boolean, this input should be an integer, and this input is a select, so you can only pick a value from a drop-down. Maybe an input is a multi-select, so you can only choose among predefined values. You can have JSON, URL, all kinds of different inputs, and that's already a benefit: because they are strongly typed, the end user, who may not be as technical, will know what values they can provide to the workflow. Then the communication between tasks, to pass data between each other, mostly operates in terms of metadata and internal storage.
If you want to pass some data objects directly, you can do that, if your plugin specifies that some data should be output. Similarly, you also have input files and output files for script tasks. So you need to explicitly declare that, let's say, this Python task should output those two files, or maybe all JSON files, and then they will be captured and automatically persisted in Kestra's internal storage. You can think of internal storage as an S3 bucket; it can be S3, GCS, etc., or just local storage. People familiar with Airflow can think of internal storage as Airflow's XComs without the complexity of having to do XCom push and pull. So that's how tasks can pass data between each other, and you can even pass data across workflows. I think this is huge for governance. We have many users who use, for example, subflows to compose their workflows in a more modular way, so that you can have one parent flow that triggers multiple processes, and each of them is encapsulated in a subflow.
And the subflows can output some data as well, and they can pass it between each other, so that you have this way of exchanging data between different teams and different projects, without having to hard-code any dependencies and without having to rely on files implicitly stored somewhere locally.
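To make the typed inputs and subflow discussion concrete, a hedged sketch follows; the input types, the Subflow task type, and the expression syntax are assumptions drawn from Kestra's general conventions and may not match every version:

id: typed_inputs_demo
namespace: company.data

inputs:
  - id: environment
    type: SELECT
    values: [dev, staging, prod]
  - id: full_refresh
    type: BOOLEAN
    defaults: false
  - id: batch_size
    type: INT
    defaults: 500

tasks:
  - id: extract
    type: io.kestra.plugin.core.log.Log
    message: "Extracting {{ inputs.batch_size }} rows in {{ inputs.environment }}"

  - id: transform_and_load
    type: io.kestra.plugin.core.flow.Subflow
    namespace: company.data
    flowId: transform_and_load
    # values passed here become the child flow's strongly typed inputs
    inputs:
      environment: "{{ inputs.environment }}"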
[00:23:14] Tobias Macey:
Another trend that's been growing in the data orchestration space is the idea of, rather than data as tasks, treating data as assets, where one task might produce multiple assets. The canonical example is largely dbt, where you might have one dbt command-line execution that produces tens or hundreds of different tables as an output, and you want to be able to track those independently, particularly if there are downstream triggers that depend on one of those tables being updated or materialized. And I'm wondering how Kestra addresses, or some of the ways that Kestra is thinking about, that level of granularity in terms of a task producing multiple different outputs or assets as a result.
[00:24:02] Anna Geller:
Yeah. That's totally feasible. Each task can output as many results as you want. Maybe I wouldn't recommend outputting, say, 1,000 files, because the UI could potentially break, but overall you can output as many things as you wish, and Kestra doesn't introduce any restrictions in terms of what your specific outputs can be. There is one really great feature that people really appreciate in Kestra, which is outputs preview. If your task run returns, say, a CSV file or a JSON file, you can easily preview it in the UI, so that you know if the data format is right and everything looks good. In the same way, if something fails, you can preview the data and see, okay, maybe what I have in this downstream task is some error in my code, maybe I didn't capture some edge cases. You can redeploy your workflow, so essentially you create a new revision, and you can rerun it for only this last task. This is a feature called replay, and it's super useful for failure scenarios.
If you process data and something unexpected happens, you don't want to rerun all those previous steps, right? Because everything else worked; only this single thing didn't. So you can very easily reprocess the things that didn't work, simply by fixing the code and pointing the execution to the updated revision.
[00:25:28] Tobias Macey:
In terms of the audience that you're targeting, given the fact that it has this UI- and code-driven approach, I'm wondering how you think about who the target market is, the types of users, and some of the ways that that dual modality appeals to different team or technical boundaries across the organization.
[00:25:48] Anna Geller:
Yeah, that's a great question. Our target audience currently is mostly engineers who build internal platforms. Usually you would build some workflow patterns, and you want to expose some workflows to less technical users or to external stakeholders. We have lots of software architects coming to Kestra to support them in replatforming. This usually means they want to move from on-prem to cloud, or there's also the completely reversed pattern: there are many companies these days moving from cloud back to on-prem because of additional compliance reasons. So a lot of people using Kestra are those platform builders who then expose those workflows to less technical users for a variety of use cases. Kestra is not focused exclusively on data pipelines. We also support infrastructure and API automation and business process orchestration.
You have things like approval workflows. One very common scenario is that there are some IT automation tasks that, for example, provision resources, and some DevOps architect or manager needs to approve whether those resources can be deployed. So you have this approval process implemented in Kestra so that the right person can approve the workflow to continue. We also have all those event-driven data processing use cases, where you receive events, for example, from Kafka, SQS, or Google Pub/Sub, and you want to trigger some microservice in response to that event.
That's also a perfect use case for Kestra. So it's not restricted to data pipelines. And I would still call this data orchestration, because you react to some data changes in the business, and you want to run some data processing in response.
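A hedged sketch of that approval pattern, assuming a Pause-style core task that a manager resumes from the UI before provisioning continues; the type names and the Terraform command are illustrative only:

id: provision_with_approval
namespace: company.it

inputs:
  - id: instance_type
    type: SELECT
    values: [t3.medium, m5.large, m5.xlarge]

tasks:
  - id: wait_for_approval
    type: io.kestra.plugin.core.flow.Pause
    # the execution stays paused here until someone resumes it from the UI or the API

  - id: provision
    type: io.kestra.plugin.scripts.shell.Commands
    commands:
      - terraform apply -auto-approve -var "instance_type={{ inputs.instance_type }}"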
[00:27:36] Tobias Macey:
As a listener of the Data Engineering Podcast, you clearly care about data and how it affects your organization and the world. For even more perspective on the ways that data impacts everything around us, you should listen to Data Citizens Dialogues, the forward-thinking podcast from the folks at Collibra. You'll get further insights from industry leaders, innovators, and executives in the world's largest companies on the topics that are top of mind for everyone. They address questions around AI governance, data sharing, and working at global scale, among others. In particular, I appreciate the ability to hear about the challenges that enterprise-scale businesses are tackling in this fast-moving field.
While data is shaping our world, Data Citizens Dialogues is shaping the conversation. Subscribe to Data Citizens Dialogues on Apple, Spotify, YouTube, or wherever you get your podcasts. At the organizational level too, I'm interested in some of the ways that Kestra is able to implicitly bridge these different workflows without the different teams needing to know every detail of what the available data is and how it's produced. Where, for instance, I have a workflow that's taking a file from an SFTP server, processing it, and generating some table in the data warehouse as a result, and then somebody else's workflow depends on the contents of that data warehouse table, does some analysis, and produces some data visualization or report generation, how are you able to trigger execution of the next task without the person who controls each task having to explicitly communicate, or requiring that the workflows are directly built on top of each other?
[00:29:19] Anna Geller:
Kestra supports that pattern using a flow trigger. This was in fact one of the most popular patterns from the very beginning of the product. The use case typically looks as follows: you have multiple teams that don't have tight dependencies between each other, so you would say, run this flow only when those three other workflows from different teams have successfully completed within the last 24 hours. You can easily define that as a condition for this flow, so that it only runs after those preconditions are true. And you can additionally add conditions on the actual data. So you can say, only if this data returned maybe a 200 status code, or if this data has this number of rows, do something, like trigger this workflow. Kestra doesn't introduce any new concept for this; we already have the concept of triggers.
So implementing those kinds of patterns is a matter of explicitly declaring in your YAML what the expectations are to trigger this workflow. You can explicitly list all of those flow executions that should be completed within the given time frame, and then it will run. So I think the mindset is quite similar to how many other data orchestrators do that, but without restricting it to only being about data.
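As an illustration of the flow trigger just described, the sketch below shows a flow that fires only after one upstream flow from another team succeeds; the trigger and condition type names are assumptions, and newer Kestra versions express the multiple-flows-within-a-time-window case with dedicated preconditions:

id: downstream_report
namespace: team.analytics

tasks:
  - id: build_report
    type: io.kestra.plugin.core.log.Log
    message: "Upstream flows finished, building the report"

triggers:
  - id: after_upstream_flows
    type: io.kestra.plugin.core.trigger.Flow
    conditions:
      # fire only for executions of this upstream flow that ended in SUCCESS
      - type: io.kestra.plugin.core.condition.ExecutionFlow
        namespace: team.ingestion
        flowId: load_orders
      - type: io.kestra.plugin.core.condition.ExecutionStatus
        in:
          - SUCCESS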
[00:30:44] Tobias Macey:
Another aspect of what you're building with Kestra is the fact that it is an open-core product with a paid service available on top of it. And I'm wondering if you can talk to some of the ways that you think about what the elements are that are available as open source, what the things are that are paid, and some of the ways that you think about the audiences across those two and how you are working to keep the open source elements of it sustainable and active.
[00:31:17] Anna Geller:
That's the challenge every open-core company asks themselves every day, I'm pretty sure. We have this framework: all features that are about security, scalability, and governance go into the enterprise edition, and all features that are single-player, core orchestration capabilities go into the open source version. That's how we try to balance it. I believe there is no single answer; every company tries to find the best solution. What we have found out so far is that we have some prospects, some people coming to Kestra, who would prefer to have a fully managed service. Currently, Kestra doesn't offer that; we have open source and a self-hostable enterprise solution. So that's something we'll be working on next year. It will be a big priority, especially to enable even more people to try the product, see how it's working, including trying those paid enterprise-grade features, without having to first talk to sales and start an official POC.
[00:32:22] Tobias Macey:
And as you have been building and working with Kestra, what are some of the most interesting or innovative or unexpected ways that you've seen it applied?
[00:32:29] Anna Geller:
Yeah. Some of the interesting ones: we have one solopreneur who was automating their entire business with Kestra, including payment automation and categorizing customer support tickets using OpenAI. So a super interesting use case, and great to see that Kestra can be applied for solopreneurs. As for surprising and unexpected, I would have expected more people to write custom code. What we have found out is that there are many, many users who purely use our plugins. If they need some transformations, they will often just add custom Pebble expressions. This is similar to Jinja in Python: they transform some data on the fly without writing dedicated code, like extra Python functions. So, yeah, I was frankly a bit surprised. Sometimes it seemed to me personally easier to write custom code for this kind of thing, but I see users just prefer to keep things simple: just a simple transformation expression, and move to the next task. I was also a bit surprised how many users actually leverage the low-code aspect of Kestra. Our default UI is the code interface, so you need to write your workflow. We have beautiful auto-completion and syntax validation when you type things in the UI. But many users still explicitly opt in to the topology view and just add things from the low-code UI forms. So that's one aspect which was also surprising to me. And overall, I think it's always surprising to see how broad the spectrum of users coming to us is. We have some who, as I mentioned, just prefer to keep things simple and only use our plugins. And there are other people who just write custom code for everything, so every task is maybe a Ruby or JavaScript or Python task. So the spectrum is really wide, and it's really interesting to see this.
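For readers unfamiliar with Pebble, a small hedged example of what such an inline expression might look like inside a task; the upper and date filters are common Pebble filters, but availability can vary by version:

id: pebble_inline_transform
namespace: company.data

inputs:
  - id: customer_name
    type: STRING
    defaults: acme corp

tasks:
  - id: log_transformed
    type: io.kestra.plugin.core.log.Log
    # the templating engine transforms values inline, without a dedicated Python task
    message: "{{ inputs.customer_name | upper }} processed on {{ execution.startDate | date('yyyy-MM-dd') }}"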
[00:34:31] Tobias Macey:
Another aspect that I forgot to touch on earlier is given that Kestra is by default a platform service, what does the local development experience look like for people who are maybe iterating on a script or trying to test out a workflow before they push it to their production environment?
[00:34:49] Anna Geller:
Yeah. I believe the local development experience is really great. We have feedback from one user who mentioned that writing workflows in Kestra is fun, which is unheard of in the world of orchestration, that building workflows can be fun. So, essentially, to get started, you run a single Docker container. You open the UI, and you hit a single button to create a flow. From here, you add your ID for the flow, the namespace to which it belongs, the list of tasks that you want to orchestrate, and the triggers, so whether this should run on a schedule or based on some event, like when a new file arrives, etc. And then when you start typing your tasks, you get auto-completion and built-in documentation.
You also have blueprints that will guide you through examples of how to apply some usage pattern. So I believe the local development experience is really unique to Kestra. And as I mentioned, some users even consider it fun, which is very refreshing.
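To ground those getting-started steps, here is the kind of minimal flow you might type into the editor after launching Kestra locally, with an ID, a namespace, one task, and a schedule trigger; the type names are illustrative and may vary by version:

id: hello_world
namespace: company.sandbox

tasks:
  - id: say_hello
    type: io.kestra.plugin.core.log.Log
    message: Hello from a locally running Kestra

triggers:
  - id: every_morning
    type: io.kestra.plugin.core.trigger.Schedule
    cron: "0 9 * * *"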
[00:35:48] Tobias Macey:
And in your own work of building and using and communicating about Kestra, what are some of the most interesting or unexpected or challenging lessons that you've learned in the process?
[00:36:09] Anna Geller:
One of the most challenging or interesting lessons we've learned is that following the common VC advice of, you know, start with a niche, then land and expand, didn't work for Kestra as well as we would have wished. At first, Kestra targeted mostly the data engineering for analytics use case. Over time, we expanded to operational teams, and we focus on engineers who are orchestrating custom applications, reacting to events, building scheduled backup jobs, or building infrastructure with Kestra. And this is where the adoption really took off. So the lesson learned is that you don't always need to follow the VC advice; sometimes following your own vision can be better. Then, in terms of product building, one lesson learned is that we were trying to use the VS Code editor within Kestra. Within one release, we launched an embedded VS Code editor in the UI. Over time, we found it was really difficult, and in the end it was much easier to build our own custom editor than to keep maintaining the one from VS Code, because you have so little control over how everything looks and how the VS Code extension and the UI interact. So I think this was something that was surprising. We thought it would be easier. We also thought that VS Code would be more open and not as restricted.
So if you want to, for example, use GitHub Copilot in your product, you cannot do that. It's really restricted to Microsoft only.
[00:37:34] Tobias Macey:
And for individuals or teams who are evaluating orchestration engines and trying to decide what fits best into their stack, what are the cases where Kestra is the wrong choice?
[00:37:48] Anna Geller:
Yeah. So Kestra is the wrong choice if you build stateful workflows that implicitly depend on side effects produced by other tasks or by other workflows. To give you an example, let's say you have one Python function that writes data to a local file, and there is another task in another workflow that tries to read this local file. Technically speaking, if you use the worker group feature in Kestra, you could make this work, but we consider this implicitly stateful approach a bad practice. We prefer that you declaratively configure that this task outputs a file, and this file will then be persisted in internal storage. Then it can be accessed transparently by other tasks or even by other flows.
In general, we try to bring infrastructure-as-code best practices to all workflows. So we assume that your local development environment should be the same as what you end up running in production. In prod, you usually run things in a distributed fashion, so you cannot guarantee that those two tasks will run on the same worker to access that local file. That's why we consider this an anti-pattern, and each execution in Kestra is by default considered stateless. Only if your tasks explicitly output some results are those results persisted and available to be processed.
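A rough sketch of the explicitly declared alternative described here: the first task declares an output file that is persisted to internal storage, and the second task pulls it back in through a declared input file instead of a shared local path; property names such as outputFiles and inputFiles are assumptions based on Kestra's script task conventions:

id: explicit_outputs
namespace: company.data

tasks:
  - id: producer
    type: io.kestra.plugin.scripts.python.Script
    # files declared here are uploaded to internal storage when the task finishes
    outputFiles:
      - data.csv
    script: |
      with open("data.csv", "w") as f:
          f.write("id,value\n1,42\n")

  - id: consumer
    type: io.kestra.plugin.scripts.python.Script
    # the declared input file is fetched from internal storage, not from a shared local disk
    inputFiles:
      data.csv: "{{ outputs.producer.outputFiles['data.csv'] }}"
    script: |
      with open("data.csv") as f:
          print(f.read())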
[00:39:18] Tobias Macey:
And as you continue to build and iterate on and explore the market for orchestration engines in the data context, what are some of the things you have planned for the near to medium term, or any particular problem areas or projects you're excited to dig into?
[00:39:33] Anna Geller:
Yeah. We are really excited about the feature we will be releasing on December 3rd this year, and this will be Apps. This will allow you to build custom applications directly from Kestra. So you can treat your workflows as a backend, and you build custom UIs directly from Kestra. Let's imagine that you have some business stakeholders who want to request some data. They can go to the UI, they can select from the inputs what type of data they want to request, and then your workflow can fetch, process, and transform all the data in the way this end stakeholder needs it.
It can then output this data directly from this custom application. This eliminates the need that, I think, as data engineers we all know: the use case where a stakeholder comes in and asks, could you fetch this data for me? I just need this report. So effectively they can fully self-serve with this approach. Similarly, if you have patterns that need approval. Let's say somebody wants to request compute resources. You can fill in those inputs in a custom form, then this will go to the manager or to the DevOps engineer who can look at the request.
They can approve it, and then you can see the result. So in the end, those custom applications, I think, will be a feature that unlocks tons of different use cases, and we are very excited about this one. Similarly, since we follow this approach of everything as code, we are building a feature which is custom dashboards. You can build custom dashboards that visualize how your execution data should look, and you can do that as code. So similarly to how you have workflows as YAML, you also have your custom dashboards as YAML, which you can version control and track revision history for.
This is another feature that will be launched in December. And long term, in terms of what is on our roadmap, it's the cloud launch. We need this fully managed service, as I mentioned before, and also some improvements to human-in-the-loop. I think, to accommodate an AI-driven world where AI generates some data, you need to have reliable human-in-the-loop processes where a human can approve the output generated by AI. So that's also something that we will work on even more.
[00:42:12] Tobias Macey:
Are there any other aspects of the work that you're doing at Kestra, or the overall space of UI- and code-driven orchestration, that we didn't discuss yet that you'd like to cover before we close out the show? No, I think we've covered a lot of ground. Thank you so much for inviting me to the show. It's been great. Yeah, I'm very grateful. Well, for anybody who wants to get in touch with you and follow along with the work that you and the rest of the Kestra team are doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today.
[00:42:54] Anna Geller:
We briefly mentioned the topic of everything as code. So far, dbt has brought this approach to analytics, and we see BI tools catching up, so slowly you can start building dashboards as code, which can follow the same engineering practices. I think we are still far away from a world where you can really have everything in the data engineering process managed as code, and I think we should probably close this gap at some point.
[00:43:10] Tobias Macey:
Alright. Well, thank you very much for taking the time today to join me and share the work that you and the Kestra team are doing on bridging the gap between code- and UI-driven workflows and expanding beyond data-only and ETL-only execution. I appreciate the time and energy that you're all putting into that, and I hope you enjoy the rest of your day. Thanks so much. Thank you for listening, and don't forget to check out our other shows.
Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used, and the AI Engineering Podcast is your guide to the fast-moving world of building AI systems. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.
Introduction and Guest Introduction
Defining Data Orchestration
Challenges in Workflow Systems
UI vs Code Driven Workflows
Kestra's Architecture and Features
Target Audience and Use Cases
Open Source and Enterprise Strategy
Future Plans and Innovations