Summary
As more organizations gain experience with data management and incorporate analytics into their decision making, their next move is to adopt machine learning. In order to make those efforts sustainable, the core capability they need is for data scientists and analysts to be able to build and deploy features in a self-service manner. As a result, the feature store is becoming a required piece of the data platform. To fill that need Kevin Stumpf and the team at Tecton are building an enterprise feature store as a service. In this episode he explains how his experience building the Michelangelo platform at Uber has informed the design and architecture of Tecton, how it integrates with your existing data systems, and the elements that are required for a well-engineered feature store.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $60 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- Do you want to get better at Python? Now is an excellent time to take an online course. Whether you’re just learning Python or you’re looking for deep dives on topics like APIs, memory management, async and await, and more, our friends at Talk Python Training have a top-notch course for you. If you’re just getting started, be sure to check out the Python for Absolute Beginners course. It’s like the first year of computer science that you never took compressed into 10 fun hours of Python coding and problem solving. Go to dataengineeringpodcast.com/talkpython today and get 10% off the course that will help you find your next level. That’s dataengineeringpodcast.com/talkpython, and don’t forget to thank them for supporting the show.
- You invest so much in your data infrastructure – you simply can’t afford to settle for unreliable data. Fortunately, there’s hope: in the same way that New Relic, DataDog, and other Application Performance Management solutions ensure reliable software and keep application downtime at bay, Monte Carlo solves the costly problem of broken data pipelines. Monte Carlo’s end-to-end Data Observability Platform monitors and alerts for data issues across your data warehouses, data lakes, ETL, and business intelligence. The platform uses machine learning to infer and learn your data, proactively identify data issues, assess its impact through lineage, and notify those who need to know before it impacts the business. By empowering data teams with end-to-end data reliability, Monte Carlo helps organizations save time, increase revenue, and restore trust in their data. Visit dataengineeringpodcast.com/montecarlo today to request a demo and see how Monte Carlo delivers data observability across your data infrastructure. The first 25 will receive a free, limited edition Monte Carlo hat!
- Your host is Tobias Macey and today I’m interviewing Kevin Stumpf about Tecton and the role that the feature store plays in a modern MLOps platform
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you start by describing what you are building at Tecton and your motivation for starting the business?
- For anyone who isn’t familiar with the concept, what is an example of a feature?
- How do you define what a feature store is?
- What role does a feature store play in the overall lifecycle of a machine learning project?
- How would you characterize the current landscape of feature stores?
- What are the other components that are necessary for a complete ML operations platform?
- At what points in the lifecycle of data does the feature store get integrated?
- What types of data can feature stores manage? (e.g. text vs. image/binary vs. spatial, etc.)
- How is the Tecton platform implemented?
- How has the design evolved since you first began building it?
- How did your work on Uber’s Michelangelo inform your work on Tecton?
- What is the workflow and lifecycle of developing, testing, and deploying a feature to a feature store?
- What aspects of a feature do you monitor to determine whether it has drifted?
- How do you define drift in the context of a feature?
- How does that differ from drift in an ML model?
- How does Tecton handle versioning of features and associating those different versions with the models that are using them?
- What are some of the most interesting, innovative, or unexpected projects that you have seen built with Tecton?
- When is Tecton the wrong choice?
- What do you have planned for the future of the product?
Contact Info
- kevinstumpf on GitHub
- @kevinstumpf on Twitter
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
- Thank you for listening! Don’t forget to check out our other show, Podcast.__init__ to learn about the Python language, its community, and the innovative ways it is being used.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers
- Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat
Links
- Tecton
- Uber Michelangelo
- MLOps
- Feature Store
- Blog: What Is A Feature Store
- StreamSQL
- AWS Feature Store
- Logical Clocks
- EMR
- Kotlin
- DynamoDB
- scikit-learn
- TensorFlow
- MLflow
- Algorithmia
- SageMaker
- Feast open source feature store
- Jaeger
- OpenTelemetry
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. When you're ready to build your next pipeline and want to test out the projects you hear about on the show, you'll need somewhere to deploy it. So check out our friends over at Linode. With their managed Kubernetes platform, it's now even easier to deploy and scale your workflows or try out the latest Helm charts from tools like Pulsar, Pachyderm, and Dagster. With simple pricing, fast networking, object storage, and worldwide data centers, you've got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today. That's L-I-N-O-D-E, and get a $100 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show.
Do you want to get better at Python? Now is an excellent time to take an online course. Whether you're just learning Python or you're looking for deep dives on topics like APIs, memory management, async and await, and more, our friends at Talk Python Training have a top notch course for you. If you're just getting started, be sure to check out the Python for Absolute Beginners course. It's like the 1st year of computer science that you never took compressed into 10 fun hours of Python coding and problem solving. Go to dataengineeringpodcast.com/talkpython today and get 10% off the course that will help you find your next level. That's dataengineeringpodcast.com/talkpython, and don't forget to thank them for supporting the show.
Your host is Tobias Macey. And today, I'm interviewing Kevin Stumpf about Tecton and the role that the feature store plays in a modern MLOps platform. So, Kevin, can you start by introducing yourself?
[00:01:46] Unknown:
Yeah. For sure. And first of all, thanks very much for having me. Excited to be on here. I'm the cofounder and CTO of Tecton, which is the first enterprise grade feature store.
[00:01:58] Unknown:
And do you remember how you first got involved in the area of data management?
[00:02:02] Unknown:
So a couple years ago, I was 1 of the tech leads of Michelangelo, which is Uber's centralized end to end ML platform for operational ML. It got started in 2015, when we built this centralized system to make it really easy for data scientists and data engineers to get a machine learning model into production end to end, covering the entire workflow of cleaning your data, engineering the features, training the model, evaluating it, deploying it to production, and serving it in production. That's what this centralized platform allowed all the data scientists and data engineers to do, and it led to this Cambrian explosion of machine learning at Uber and eventually to thousands of models running in production to drive things like Uber's ETAs, Uber Eats restaurant recommendations, and a bunch of other use cases.
[00:02:56] Unknown:
From that work, you have built out the Tecton platform. You mentioned that it's an enterprise grade feature store. So I'm wondering if you can just describe a bit more about what it is that you're building at Tecton and the motivation for turning it into a business?
[00:03:10] Unknown:
So what we learned with Michelangelo at Uber, and also what we doubled down on as we talked to more enterprises, was really the understanding that building and deploying operational machine learning applications is really hard. Operational machine learning means ML apps that are running in production and typically drive products that the customer directly interacts with. These are models that need to make decisions within just a couple of milliseconds and that need to run at really, really high scale, which is different from offline machine learning models.
And what we've recognized is that building and deploying these applications is really hard because operational ML really consists of 3 different components: an application, an ML model, and data. Let me give you an example here. Let's say an operational ML use case is an Uber Eats recommendation. You've got the application, which needs to make a prediction, and that would be the Uber Eats application or its back end. Then you've got the model, which actually makes the prediction based on a bunch of data about the user and their preferences. And that leads to the 3rd component, which is the data, oftentimes referred to as the features, which are the predictive signals that the model actually makes a prediction on in order to recommend different restaurants to you. And what we've recognized is that building and deploying applications like the mobile app or a back end microservice is a fairly solved problem.
Even building and training ML models and getting them into production has become an increasingly solved problem with MLOps platforms. But what has not really been solved yet is the data problem for machine learning, meaning how do you turn raw data into features that are then fed to a model that's running in production, and how do you feed that data and those features consistently to the ML training pipeline that builds the model in the first place. At Uber, with Michelangelo, we solved the model and the data problem together, and the data problem itself we solved with Michelangelo's component which we call the feature store. And that is exactly what we focus on with Tecton: solving the data problem and all of the data problems around machine learning.
[00:05:37] Unknown:
And before we dig too much into Tekton itself and the ideas and components of a feature store, I'm wondering if you can just give a bit more detail about what a feature actually is in the context of machine learning and maybe a particular illustration of how it differs from just a discrete data point?
[00:05:57] Unknown:
Yeah. Absolutely. So a feature can really be interpreted as high density information that provides a signal to an ML model to make a prediction on, and it would typically be derived off of raw data. It could be aggregated raw data. For instance, for an Uber Eats recommendation, 1 feature that is very predictive of which restaurants to recommend to you is the type of cuisine that you've most frequently ordered from in the last 30 days, or the restaurant that you just clicked on 10 minutes ago. Those are all derived pieces of information that allow the ML model to come up with statistical correlations that then allow it to make a prediction for what you may really like to purchase next.
[00:06:53] Unknown:
To drive that home a bit more, the raw data that those particular features might be pulling from is, you know, click tracking events that are coming in through something like Segment or your customer data platform. And for the case of what purchases you've had in the past 30 days, you're looking at discrete order data so you can see what the restaurant is, and then what is the aggregation of those raw data points into that derived descriptor that you were mentioning.
[00:07:30] Unknown:
That's exactly right.
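To make that aggregation concrete, here is a minimal pandas sketch of deriving the "most frequently ordered cuisine in the last 30 days" feature from raw order events. The `orders` DataFrame, its column names, and the reference date are hypothetical, meant only to illustrate the raw-data-to-feature transformation described above.

```python
import pandas as pd

# Hypothetical raw order events: one row per completed order.
orders = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2],
    "cuisine": ["chinese", "chinese", "thai", "pizza", "pizza"],
    "ordered_at": pd.to_datetime([
        "2020-11-20", "2020-12-01", "2020-12-05", "2020-11-15", "2020-12-07",
    ]),
})

# Keep only the trailing 30-day window relative to "now".
now = pd.Timestamp("2020-12-08")
recent = orders[orders["ordered_at"] >= now - pd.Timedelta(days=30)]

# Feature: the cuisine each user ordered most frequently in the last 30 days.
favorite_cuisine = (
    recent.groupby("user_id")["cuisine"]
    .agg(lambda s: s.value_counts().idxmax())
    .rename("favorite_cuisine_30d")
)
print(favorite_cuisine)
```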
[00:07:32] Unknown:
Digging more into the feature store itself, can you discuss the different elements that go into it that give you the ability to take that raw data and build these derived aggregates that can then be fed into a machine learning model?
[00:07:47] Unknown:
So a feature store really manages all of your data for machine learning, and it makes it really easy for you to run data pipelines that transform the raw data and turn it into feature values. That raw data could come from your Kafka stream or your Kinesis stream, it could come from your data warehouse, like Snowflake or Redshift, or it could come from your data lake, S3, and whatnot. A feature store allows you to manage these transformations, to manage these data pipelines, and to automatically orchestrate and execute them. And then it also manages the storage of these feature values, where it typically makes it possible to store those feature values for offline consumption and for online consumption.
The offline consumption is necessary to drive batch predictions, like offline predictions, and to drive training processes that typically happen offline over large amounts of data. And then the online store is the 1 that is used to serve feature values in production to make these ultra low latency and very high scale types of predictions. And that's where the 3rd part of the feature store comes in: the serving of this feature data, to make that feature data consistently available for training purposes and for inference purposes. On the training side, it's typically very important that a feature store allows you to time travel to any point of time in the past and give a data scientist a snapshot of what the world looked like at any moment in the past, so that the model can learn, based on these individual data points from the past, what the world looked like and what happened, and with that information eventually train the ML model. And so, taken together, the feature store makes it really easy for a data scientist to productionize new features without requiring a ton of support from data engineers and whatnot.
It automates the feature computation, the backfills, and all of the logging around it. And then, of course, it also makes it very easy to share and reuse those features once you've contributed them to a feature store. It also tracks things like the feature versions, the lineage, and all the metadata around it. And then finally, it monitors the health of those feature pipelines in production. That is generally what feature stores do, and what we've layered on top of that core capability and feature set of a feature store.
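The "time travel" behavior described above is essentially a point-in-time join: for each training example, pick the feature value that was known as of that example's timestamp, so no future information leaks into training. A minimal pandas sketch, with hypothetical tables, might look like this.

```python
import pandas as pd

# Hypothetical feature values with the time at which each value became known.
features = pd.DataFrame({
    "user_id": [1, 1, 2],
    "feature_ts": pd.to_datetime(["2020-12-01", "2020-12-05", "2020-12-03"]),
    "favorite_cuisine_30d": ["thai", "chinese", "pizza"],
}).sort_values("feature_ts")

# Hypothetical training events (label rows) with their own timestamps.
events = pd.DataFrame({
    "user_id": [1, 1, 2],
    "event_ts": pd.to_datetime(["2020-12-02", "2020-12-06", "2020-12-04"]),
    "label": [1, 0, 1],
}).sort_values("event_ts")

# Point-in-time join: for each event, take the latest feature value that
# existed at or before the event timestamp (no future leakage).
training_set = pd.merge_asof(
    events, features,
    left_on="event_ts", right_on="feature_ts",
    by="user_id", direction="backward",
)
print(training_set)
```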
[00:10:23] Unknown:
Yeah. Definitely interested in digging a bit more into Tecton and some of the discrete elements that go into the feature store and how it shapes the interactions of data teams and the life cycle of machine learning. But before we get to that, I'm wondering if you can just give a bit of an overview of how you characterize the current state of the landscape for feature stores, because I know that it's a fairly recent category of product. And so I'm interested in understanding a bit more about your views on the level of maturity and the type of adoption that they're seeing right now.
[00:10:56] Unknown:
It's definitely an up and coming category that I'd say we first coined with Michelangelo a couple of years ago. And since we first blogged about it in 2017, we've seen a lot of other tech companies and other companies build their own in house feature stores. You've seen things at Airbnb, where they've built Zipline, Booking.com built their own feature store, Facebook has its own feature store, and so there are a lot of in house developments, particularly at the large engineering powerhouses. Now looking at what's been happening in the last 3 years, besides Tecton, there are other startups in the space. There's Logical Clocks and there's StreamSQL, to name 2 examples.
And now, since this morning actually, very apt timing, AWS has released its own feature store, which seems to really be a feature repo that allows you to store feature values and serve them in production. You still need to take care of the data preparation, the data cleaning, and all of the running of the data pipelines and whatnot yourself, but they do give you a way to store feature values and serve them consistently to training pipelines and to models that are running in production.
[00:12:11] Unknown:
In terms of the overall life cycle of machine learning operations, the feature store is definitely a core component of it. And you mentioned some of the other aspects as far as model serving and deployment and the monitoring capabilities that are necessary there. I'm wondering if you can just give a bit of an overview of how the feature store fits in and what the additional components are that are necessary to have a fully fledged machine learning operations life cycle?
[00:12:39] Unknown:
So the entire ML workflow typically consists of multiple stages. It begins with fetching the raw data, then cleaning the raw data and doing your feature engineering work to turn it into your derived feature values, then you train your model. Once you've got your trained model, you evaluate it. You back test it and see if it is any good. Then you manage the model artifact, meaning that you store the artifact for reproducibility purposes, for versioning, and whatnot. Then you deploy the model into production, where you have it behind, say, a microservice, and then that microservice is able to actually make predictions in production, and you monitor it.
Now all of these different workflow steps typically require you to have an MLOps platform, which allows you to train your model, manage the artifact, deploy it, and serve it in production, and then you need a feature store to solve the data problems around machine learning that I highlighted earlier. And so with an MLOps platform and a feature store, you have a pretty good setup in order to go through the entire ML life cycle end to end.
[00:13:54] Unknown:
Some of the other interesting aspects of a feature store are the maintenance of the actual pipeline that derives those features and some of the challenges that might come about as far as resource contention, where you want to ensure that everybody's able to build out the features that they want and that the features stay fresh, but that you also don't overtask the platform because of maybe some unoptimized code that's being used to aggregate the information, where perhaps you're going with a brute force loop as opposed to a more mature algorithm that might be able to achieve the same output with fewer cycles. And so I'm curious how you approach some of those types of challenges in things like Tecton for being able to ensure that you have this shared resource that's scalable and accessible, particularly for engineers who maybe aren't focused on the performance characteristics and are just trying to achieve a certain outcome, and they might be going for a naive approach, and just how that factors into the overall development life cycle of these features.
[00:15:00] Unknown:
Yeah. Definitely. So Tecton is a cloud native platform, and it takes full advantage of the horizontal scaling opportunities in the cloud. So with Tecton, you don't have this noisy neighbor type problem that you've just described. The way that we avoid this is that the different data pipelines that generate 1 or multiple features are actually all run on independent data processing clusters. And so what we would do is spin up, on AWS for instance, individual EMR jobs that then run Spark jobs, which process the feature values for a given feature pipeline.
And that 1 Spark cluster is completely independent of any other Spark clusters that may be producing other types of feature values at different freshnesses or that may be producing feature values off of a stream. So all the different pipelines are really completely independent from each other, which avoids the noisy neighbor problem and the resource contention that you mentioned.
[00:16:11] Unknown:
And so for data scientists or analysts who are actually building out these feature pipelines and trying to define the data aggregation to create these concrete features, what does the workflow look like, and what is their interface for being able to actually define those pipelines and manage them?
[00:16:32] Unknown:
So at Tecton, we're big believers that features need to be managed as code, which allows you to bring all the DevOps best practices to machine learning and specifically to machine learning data. And because of that fundamental belief, the interface for defining and managing your features is actually Python files that are laid out in a directory structure of your choosing, where you use Tecton's Python SDK to define individual feature groups. And so you define various different metadata for features, like the name of the feature, the entity it's assigned to, like a user or a transaction or something like that, and the owner in the company who's responsible for it. And then you also define, of course, the transformation code itself, which could be Pandas transformation code, or it could be SQL, or it could be PySpark code, and that is all defined in these Python files on disk. And what that allows you to do is, of course, back these Python files up in a Git repository, which gives you all the typical Git abilities, like doing code reviews and whatnot.
And then when you're happy with the definitions of your features, you would use Tecton's CLI, in a very similar style to Terraform, and run tecton plan, which looks at all the Python files and all the feature definitions, which are now your feature definition goal state, and compares it to what's currently running in production: what are all the feature pipelines that you've already configured that are running day in and day out? And then this plan shows you the delta of your change, like, what are the new features you're about to create? What are the existing features you're about to modify that may actually impact models that are running in production?
And it actually prevents you from making these changes unless you know exactly what you're doing. It is also able to show you expected costs, like, is that actually a change you want to be making, because it could be spinning up pretty expensive data pipelines. And then when you're happy with the plan that you're seeing, you run tecton apply in the CLI and actually apply these changes to the production system. And now Tecton would be running these pipelines for you on an automatic basis and storing the feature values for offline and online serving. And what that tooling allows you to do is integrate it with your existing CI and CD pipelines. So as mentioned earlier, you can use code reviews to have somebody actually approve your changes to the feature repository.
You can write and run unit tests. You can write and run integration tests to ensure that the feature transformations actually all make sense and are valid. And then your CI/CD pipeline would use the CLI to actually roll out those changes to production. And then if you wanted to, you could even monitor the changes, and if anything looks fishy, you could kick off an automatic rollback and whatnot. That is what the primary interface looks like for contributing new features to the feature store. Now on the consumption side, we have 2 main APIs. 1 is, of course, the API for online serving in production, which is a gRPC or a REST interface that you can query to fetch feature values at very low latency and very high scale.
And then there is a Python SDK that you can use in your Jupyter Notebooks, for instance, or on your laptop to fetch historical values from Tecton's offline feature store to generate your training dataset, off of which you're going to train your ML model.
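The exact Tecton SDK API isn't spelled out in this conversation, so the snippet below is only an illustrative sketch of the declarative, features-as-code style described here: a Python file in a repo that declares a feature's name, entity, owner, source, and transformation. Every class and field name is a hypothetical stand-in, not the actual SDK.

```python
# features/user_cuisine.py -- illustrative only; not the actual Tecton SDK.
from dataclasses import dataclass, field

@dataclass
class FeatureDefinition:
    name: str                 # human-readable feature name
    entity: str               # the entity the feature is keyed on, e.g. "user"
    owner: str                # who in the company is responsible for it
    batch_source: str         # where raw data comes from (warehouse table, stream, ...)
    transformation_sql: str   # the transformation, here expressed as SQL
    tags: dict = field(default_factory=dict)

favorite_cuisine_30d = FeatureDefinition(
    name="favorite_cuisine_30d",
    entity="user",
    owner="data-science@example.com",
    batch_source="warehouse.orders",
    transformation_sql="""
        SELECT user_id, MODE(cuisine) AS favorite_cuisine_30d
        FROM warehouse.orders
        WHERE ordered_at >= CURRENT_DATE - INTERVAL '30' DAY
        GROUP BY user_id
    """,
)
```

In the workflow described above, this declarative goal state lives in Git, so code review and CI/CD apply to feature changes, and the Terraform-like CLI (tecton plan, then tecton apply) diffs it against what is running in production before rolling anything out.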
[00:20:15] Unknown:
Digging a bit more into Tecton itself, can you discuss how the overall platform is architected and some of the ways that the design has evolved since you first began working on it?
[00:20:26] Unknown:
So the platform itself is a microservice oriented architecture. The brains of Tecton run on Kubernetes, and the main technologies that we use in our stack are Go for all of the low latency serving, and then we use Kotlin for all of the non latency sensitive back end processes. The SDK itself is implemented in Python. Some of the major changes that we've made in the little over 2 years since we've gotten started are on the customer facing front. 1 big change that we made was that, initially, the feature transformations could only be expressed using Tecton's DSL.
That is something that we did at Michelangelo beforehand as well, and it worked really well, but it is somewhat constraining. It works for a lot of different use cases, but it doesn't work for all use cases. And so we extended that interface to allow customers to also express features using SQL and PySpark directly, or just plain Python Pandas code. Then another change that we made was that, initially, we said, hey, Tecton itself really manages the end to end feature life cycle, and that means that it manages both the feature transformations and all the data pipelines, and it takes care of the serving of feature values.
But there are some customers who are actually quite happy running their own data pipelines using Airflow, using DAGs or whatever it is, and they mostly wanna use Tecton to serve feature values online or offline, or to leverage its monitoring abilities, or have its central catalog. And what we then did was actually separate out the transformation capability and the storage capability, so the customers really have the choice of only using the storage and the serving without the transforms, or using the entire platform fully integrated, where Tecton runs the transformations and pipelines for you as well. 1 of the biggest changes that we made earlier this year was adding this DevOps capability to Tecton that I mentioned earlier, where, initially, you could create features only through a Python SDK in a notebook in an imperative style.
But we said that in order to really support the DevOps best practices, you should be able to define your features in a declarative style in files that you can check into Git, where you can have the entire source of truth of your feature store laid out in a file system and managed in Git. And that's when we moved over to supporting this declarative framework, adding the CLI to run these rollouts and whatnot.
[00:23:09] Unknown:
You invest so much in your data infrastructure, you simply can't afford to settle for unreliable data. Fortunately, there's hope. In the same way that New Relic, Datadog, and other application performance management solutions ensure reliable software and keep application downtime at bay, Monte Carlo solves the costly problem of broken data pipelines. Monte Carlo's end to end data observability platform monitors and alerts for data issues across your data warehouses, data lakes, ETL, and business intelligence. The platform uses machine learning to infer and learn your data, identify data issues, assess its impact through lineage, and notify those who need to know before it impacts the business.
By empowering data teams with end to end data reliability, Monte Carlo helps organizations save time, increase revenue, and restore trust in their data. Visit dataengineeringpodcast.com/montecarlo today to request a demo and see how Monte Carlo delivers data observability across your data infrastructure. The first 25 people will receive a free limited edition Monte Carlo hat. And were there any particular lessons that you learned in the process of building out Michelangelo and interacting with the users at Uber that were useful? And what are some of the assumptions that you built up as a result of that work that proved to be invalid as you exposed the work at Tecton to a broader range of use cases and industries?
[00:24:33] Unknown:
1 of the things that was invalid is the 1 that I just mentioned, where at Michelangelo it was totally fine to leverage a DSL to express feature transformations, and we saw, by interacting with a much broader set of customers with Tecton, that we need to give customers more flexibility and allow them to write their transformations using Pandas code or PySpark code or SQL code, and not just constrain them to a DSL. 1 of the lessons that we learned at Michelangelo that definitely proved to be true was that having a centralized platform which can standardize these workflows really is a tremendous gain in efficiency, because data scientists and data engineers don't need to make as many decisions about how do I define a new feature, or how and where do I run a new data pipeline to run my feature transformation code. There's just 1 central place that you go to, where you know how to use the system, and it takes care of all the heavy lifting for you.
And another thing that we learned, actually just from running Michelangelo in production and managing all of its SLAs, was how important it is to use cloud native infrastructure as much as possible. So for instance, with Tecton, our key value store on AWS is DynamoDB, instead of having Tecton manage its own key value store, so that we can offload this burden to a managed service which deals with supporting strong SLAs. At Uber, for instance, we had a separate team within the company which was managing entire Cassandra clusters for teams that would require a key value store running in production.
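As an illustration of leaning on a managed key-value store for online serving, here is a minimal boto3 sketch of reading a precomputed feature row from a DynamoDB table. The table name and key schema are hypothetical; Tecton's actual storage layout is not described in this conversation.

```python
import boto3

# Hypothetical online feature table keyed by entity id.
dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
table = dynamodb.Table("online_feature_store")

def get_online_features(user_id: str) -> dict:
    """Fetch the latest precomputed feature values for one user."""
    response = table.get_item(Key={"user_id": user_id})
    # Missing users simply have no item; return an empty feature dict.
    return response.get("Item", {})

if __name__ == "__main__":
    print(get_online_features("user_123"))
```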
[00:26:33] Unknown:
Digging back into the workflow of the teams who are working with the feature store, both on the data engineering and data producing side and also on the analysis and data consuming side, how do you see the feature store changing the relationship between those different roles within the data organization?
[00:26:58] Unknown:
A feature store really fundamentally empowers the data scientists. They're now able to create new features on their own and go from idea to having them in production without having to depend on other teams. Whereas beforehand, if a data scientist had an idea for a new feature, they would typically have to write up feature engineering code in their Jupyter Notebook, test it offline, and then eventually throw this Jupyter Notebook over the wall to a data engineer or an ML engineer. And then that productionization work, where they would typically reimplement the pipeline in, like, Java or some other production ready language, would have to be prioritized.
The data scientists would have to wait several weeks or sometimes months to actually get this into production. And by the time everything's productionized, the data scientists may even just already be in a different team or have left the company altogether. And so with a feature store, data scientists are now really fundamentally empowered to go end to end from turning the raw data into features, turning those features into a trained model, deploying the model using their MLOps platform into production, and then serving the features in production without having to depend too much on other teams and their priorities.
[00:28:27] Unknown:
On the side of actually connecting to the raw data, I'm wondering if you can dig a bit more into how Tecton, and maybe feature stores generally, interface with things like a data warehouse or a data lake or some source of streaming data, and how you approach the unification of the interface for the end user so they can treat all of those different data sources as a generic store of raw data?
[00:28:56] Unknown:
Yeah. So Tecton itself integrates with data sources that are themselves already basically consolidated data sources. So for instance, a feature store shouldn't be confused with a massive data integrator that integrates across various different MySQL or Oracle databases across tons of different teams and organizations in a company. It really integrates with existing consolidated data sources like your data warehouse, like your centralized data lake, and your centralized streaming infrastructure, whether it's Kafka or Kinesis. And then those different data sources are onboarded onto Tecton, and the data scientists can then, without even having to know whether they're fetching raw data from a stream or from a data warehouse, write very simple SQL code or PySpark code that only deals with the raw data transformation into the feature values that they care about.
And then they can rely on Tecton to actually go back to the appropriate data source to fetch the raw values. To give you 1 example here: if you develop a feature which turns raw data from a stream into feature values, you typically have to be able to also run backfills. And so if you create a new feature, you wanna be able to bootstrap your feature store with historical feature values. Where do you get the raw data from in order to actually process and create these historical feature values?
The streaming infrastructure typically doesn't preserve all of the historical raw data, so you have to go back to a data lake or a data warehouse which persists all of the historical raw data. And with Tecton, you're actually able to tell the system to look in, say, a Hive table for the historical raw data and to look in a Kinesis stream or a Kafka stream for the online data. And so as you create a new feature, Tecton would automatically go back to that Hive table, load all the historical raw data, bootstrap the feature store with the processed feature values, and then have a seamless handoff to running the streaming data pipeline, which processes the raw data from your Kinesis stream or your Kafka stream.
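The backfill-then-stream handoff described here can be sketched roughly as follows. The source names, cutoff handling, and helper functions are hypothetical and only illustrate the idea of bootstrapping from a batch table before switching to the stream at the same point in time.

```python
from datetime import datetime, timezone

# Hypothetical paired sources for one feature: the batch table holds the full
# history, the stream only carries recent events.
BATCH_TABLE = "hive.orders_history"
STREAM_NAME = "orders-events"

def backfill_from_batch(table: str, until: datetime) -> None:
    """Recompute historical feature values from the batch table up to a cutoff."""
    print(f"backfilling features from {table} up to {until.isoformat()}")
    # ... run the same transformation over historical rows and write to the store

def consume_stream(stream: str, start_at: datetime) -> None:
    """Continue computing the same feature from the stream after the cutoff."""
    print(f"processing {stream} starting at {start_at.isoformat()}")
    # ... run the streaming pipeline and keep the online store fresh

# Seamless handoff: backfill first, then pick up the stream at the same cutoff
# so no events are missed or double-counted.
cutoff = datetime.now(timezone.utc)
backfill_from_batch(BATCH_TABLE, until=cutoff)
consume_stream(STREAM_NAME, start_at=cutoff)
```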
[00:31:23] Unknown:
Digging a bit more into the actual manifestation of the feature, you have this set of code that says perform these processes on the raw data to be able to create this derived attribute, where, going back to our earlier example, we have the order history of the last 30 days of this particular user, and so I know that they prefer Chinese food. For the actual training of the model and for its online operation, how does that actually manifest in terms of the API? Are they sending a parameter to the feature to say, here's the customer ID, now give me back their preferred style of cuisine? Or is it something where it uses this piece of code and then generates all the pairings of customer ID and preferred cuisine based on the historical data, and then as new information flows in, it keeps those values updated?
[00:32:18] Unknown:
On the training side, there is a Python SDK where you say, hey, Tecton, give me historical feature values for this set of features. And then you can optionally filter those feature values to a set of user IDs or transaction IDs or whatever entity your features are associated with. You can also filter the feature values that you want the feature store to return to you by a time range. And you can also, of course, say, hey, don't filter by anything for these 17 different features, just give me all the historical data that you have since the dawn of time for all of the users or transactions or whatever it is that you've ever ingested into the feature store. And then the feature store would return a data frame to you on which you do your training to generate your model. And then later on, when your model is running in production, you need to fetch feature values for a given entity, for a given instance on which you wanna make a prediction. So for instance, in the example of the Uber Eats recommendation, you have a user who opens up the app, and that user is associated with a unique ID, and then your back end system, which hosts the model and wants to make a prediction, will call out to Tecton and say, hey, give me feature values for user x, y, and z.
And you would then specify, like, which features do you actually care about. Is it the cuisine that this user has most frequently ordered from in the last 7 days, or is it what that user has just most recently clicked on?
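A minimal sketch of the online retrieval path just described: a serving-side service asks the feature store's REST endpoint for fresh values for one entity and a list of feature names right before calling the model. The endpoint URL, payload shape, and function name are hypothetical assumptions, not Tecton's actual API.

```python
import requests

# Hypothetical feature service endpoint exposed by the feature store.
FEATURE_SERVICE_URL = "https://features.example.com/v1/get-features"

def get_online_features(user_id: str, feature_names: list) -> dict:
    """Fetch fresh feature values for one entity right before a prediction."""
    response = requests.post(
        FEATURE_SERVICE_URL,
        json={"join_keys": {"user_id": user_id}, "features": feature_names},
        timeout=0.1,  # keep the call tightly bounded; serving targets milliseconds
    )
    response.raise_for_status()
    return response.json()

if __name__ == "__main__":
    # Example: the Uber Eats-style lookup described above.
    features = get_online_features(
        user_id="user_123",
        feature_names=["favorite_cuisine_7d", "last_clicked_restaurant"],
    )
    print(features)
```

The offline path is the mirror image: the Python SDK returns a DataFrame of historical values, optionally filtered by entity IDs and a time range, which becomes the training set.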
[00:33:58] Unknown:
And then going back to another element that you brought up in terms of managing the life cycle of machine learning operations is the concept of monitoring, where I know that there is monitoring that you would do on the actual deployed model to determine things like concept drift. But I also know that with feature definitions, there's the possibility of drift there and that you wanna monitor those. So I'm wondering if you can just give a bit of an overview about what types of information you're looking at when you're monitoring a feature within a feature store and some of the signals that you might be looking at to determine when you might need to update the definition of a model or redefine it entirely?
[00:34:38] Unknown:
So typically, what you wanna make sure is that the statistics of your features do not change too much over time as your model is running in production. So for instance, for numerical features, you would look at the mean or the standard deviation and stuff like that and make sure that those stay within certain bounds, typically within the bounds that you observed when you generated the training dataset in the first place. Because if there's massive drift across any of these statistics, then that would typically mean that you should retrain your model, because the world could have just changed and it just looks different now. And you've trained your model on the state of the world and what it looked like a couple of weeks or months or whatever it is ago.
And if the statistics and the data don't look anymore like what they did when you first trained the model, then your model may just make really poor predictions. And so it's super important to not just monitor the predictions themselves and check whether they're drifting, but to also monitor upstream how the features themselves are behaving and how they're changing over time. Because if you imagine that a model depends, say, on 1,000 features or so, and you were to only monitor the predictions themselves, then if anything looks off, it would be very hard to tell why it looks off and what the root cause is, and you may even notice way too late that something has changed, because it's oftentimes very minuscule, very tiny changes that may only be happening on, say, 1 or 2 features out of the 1,000 features. And so you really wanna be looking at all of the individual features in production and remain confident that their distributions and statistics are not changing too drastically over time.
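Here is a minimal sketch of the kind of per-feature statistics check described above, comparing the live mean of each numerical feature against the mean and standard deviation captured when the training set was generated. The data layout, column names, and the three-sigma threshold are illustrative assumptions, not Tecton's actual checks.

```python
import pandas as pd

def check_feature_drift(training_stats: pd.DataFrame,
                        live: pd.DataFrame,
                        max_shift: float = 3.0) -> list:
    """Flag features whose live mean drifts too far from the training-time mean.

    `training_stats` has one row per feature with 'mean' and 'std' columns;
    `live` is a recent window of served feature values with one column per feature.
    """
    drifted = []
    for feature in training_stats.index:
        baseline_mean = training_stats.loc[feature, "mean"]
        baseline_std = training_stats.loc[feature, "std"] or 1e-9  # guard zero std
        live_mean = live[feature].mean()
        # Alert when the live mean moves more than `max_shift` training
        # standard deviations away from the training mean.
        if abs(live_mean - baseline_mean) > max_shift * baseline_std:
            drifted.append(feature)
    return drifted

# Example with hypothetical numbers.
stats = pd.DataFrame({"mean": [12.0, 300.0], "std": [4.0, 50.0]},
                     index=["orders_last_30d", "minutes_since_last_click"])
live_window = pd.DataFrame({"orders_last_30d": [30, 28, 35],
                            "minutes_since_last_click": [310, 295, 290]})
print(check_feature_drift(stats, live_window))  # -> ['orders_last_30d']
```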
[00:36:34] Unknown:
Another aspect of the feature life cycle is the idea of versioning, where you have the original definition, you maybe then decide based on your monitoring that you need to tweak some of the parameters for this derivation function, and so then you redeploy it. If you have a machine learning model that's in production that was originally trained against the first definition of that feature, or maybe you're creating a slightly different function that has a slightly different intent, how do you associate the models that are running in production with the particular version of the feature that is necessary, doing things like version pinning and then managing releases and rollbacks for the actual feature code itself?
[00:37:20] Unknown:
So in Tecton, feature definitions themselves are immutable. Once you create a new feature and you deploy it to production, it has a unique version associated with it and a unique hash associated with it. And then typically, before you use features in production, you create what we call a feature service. A feature service is really a set of multiple different features that a model in production depends on. Say it's a selection of 10 different features; the feature service points at those 10 features and their respective immutable versions. And this feature service now provides a unique API endpoint that your model, and the microservices that host the model in production, would query in order to fetch the feature values.
And now if you, say, create a new or changed feature, that would be a new, modified feature with its own unique version. And then you would create a unique, separate API endpoint which points at that modified set of features. And once you've trained your model and once you've deployed it, you change that model to fetch feature values from this new, updated API endpoint.
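To illustrate the versioning scheme described here, this is a small sketch of immutable feature versions, identified by a content hash, grouped into feature services that each expose their own endpoint for a model to pin to. The types and naming are hypothetical, not Tecton's actual objects.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: a deployed feature definition is immutable
class FeatureVersion:
    name: str
    transformation_sql: str

    @property
    def version_hash(self) -> str:
        # A content hash uniquely identifies this exact definition.
        body = f"{self.name}:{self.transformation_sql}".encode()
        return hashlib.sha256(body).hexdigest()[:12]

@dataclass(frozen=True)
class FeatureService:
    """The set of pinned feature versions a production model depends on."""
    name: str
    features: tuple  # tuple of FeatureVersion, pinned by content hash

    @property
    def endpoint(self) -> str:
        # Each service gets its own unique API endpoint for the model to query.
        return f"/v1/feature-services/{self.name}"

v1 = FeatureVersion("favorite_cuisine_30d", "SELECT ...")                    # original
v2 = FeatureVersion("favorite_cuisine_30d", "SELECT ... -- tweaked window")  # modified

recs_v1 = FeatureService("eats_recs_v1", (v1,))
recs_v2 = FeatureService("eats_recs_v2", (v2,))  # new endpoint for the retrained model
print(v1.version_hash, v2.version_hash, recs_v2.endpoint)
```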
[00:38:36] Unknown:
For the actual process of getting the models from training based on the generated feature data into production and running against live data that's being generated from your online feature definitions, what is the integration path for you at Tecton for being able to work with the broader ecosystem of platforms that actually manage the serving and monitoring of the models themselves? And maybe some concrete examples of a workflow from defining the model, training it on the feature definition from Tecton, and then actually getting it into production on a hosted platform or a self managed platform?
[00:39:14] Unknown:
So the interfaces that Tecton exposes to integrate with other platforms are very generic, which makes it super easy to use Tecton with your modeling platform of choice or in your production system of choice. And so to give you a concrete example, say you use Databricks notebooks or Jupyter notebooks or EMR notebooks to train your ML models. Here, you would pip install and use Tecton's SDK, which is configured to connect to a specific Tecton cluster, and then use the SDK to generate training data. And then you would, say, use scikit-learn or TensorFlow to train your model, then you would use something like MLflow to package up that model artifact and make it deployable.
You'd also use something like MLflow as your model registry. And then from the notebook, you would communicate with your model serving system, whether it's Algorithmia or Seldon or SageMaker, to actually deploy your wrapped-up model bundle to, so that you now have a model endpoint that is available in production. And then that model endpoint would have to be called by some microservice which actually wants to make a prediction. That microservice would integrate with Tecton's gRPC or REST API to fetch feature values for a given set of features, and then pass those features into the request to the model endpoint on, say, Algorithmia or SageMaker to make the prediction.
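The notebook-side glue described here might look roughly like the sketch below: fetch a training frame, train with scikit-learn, and log the artifact with MLflow so a serving system can deploy it. The `get_historical_features` helper and the column names are hypothetical stand-ins for the feature store SDK call; the scikit-learn and MLflow calls are standard.

```python
import mlflow
import mlflow.sklearn
import pandas as pd
from sklearn.linear_model import LogisticRegression

def get_historical_features() -> pd.DataFrame:
    """Hypothetical stand-in for the feature store SDK's training-data call."""
    return pd.DataFrame({
        "orders_last_30d": [12, 3, 7, 0],
        "minutes_since_last_click": [5, 600, 42, 1440],
        "label": [1, 0, 1, 0],
    })

training_df = get_historical_features()
X = training_df.drop(columns=["label"])
y = training_df["label"]

with mlflow.start_run():
    model = LogisticRegression().fit(X, y)
    # Package the trained artifact so a serving system (Algorithmia, Seldon,
    # SageMaker, ...) can later deploy it behind an endpoint.
    mlflow.sklearn.log_model(model, artifact_path="model")
```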
[00:40:53] Unknown:
In terms of the actual uses of Tecton and some of the example workflows that you've seen people build, what are some of the most interesting or unexpected or innovative ways that you've seen it used?
[00:41:06] Unknown:
1 interesting use case we didn't foresee ahead of time was that we have some customers who don't only use Tecton to manage their data pipelines for machine learning purposes, but also for heuristic driven applications that are running in production, which want to query some feature about a user and then make a heuristic driven decision based on that. And in those instances, you don't need to fetch the historical data in an offline setting, because instead of training an ML model, you actually have somebody hand implement heuristics that run directly in, say, a microservice. And then they just integrate with Tecton's online serving system to fetch the fresh feature values for a given user or whatever they wanna make a prediction on.
[00:41:53] Unknown:
And in terms of your experience of building out Tecton and working with customers and turning it into a business, what are some of the most interesting or unexpected or challenging lessons that you've learned in that process?
[00:42:06] Unknown:
1 of the things that we've learned over time is that, of course, lots of different enterprises are very different, and their preferences may be vastly different. You have, for instance, some companies who really, really care about open source, and they really care about being able to customize whatever system they're going to use in production, or they wanna be able to just manage and host it on their own. And other companies really just wanna rely on a company like Tecton and on enterprise software which comes with governance, which comes with strong SLAs, which comes with support and whatnot.
And we really wanna be able to serve the entire market and serve both ends of that spectrum. And because of that, we partnered up with Feast a couple of weeks ago. We published an announcement about that last week, where the main creator of Feast, which is an open source feature store that was created at Gojek, has decided to join Tecton. He has contributed Feast to the Linux Foundation, and we're committed to investing significant resources to make Feast really the best open source feature store out in the industry, and to give enterprises the choice of whether they wanna manage their own open source feature store, run it on their own, maybe customize it, or whether they wanna rely on an enterprise proprietary product like Tecton.
[00:43:34] Unknown:
And so for people who are considering using a feature store and they're looking at open source or managed solutions or trying to understand if a feature store is the right choice for them, what are the cases where Tecton might be the wrong choice?
[00:43:49] Unknown:
Yeah. Definitely. So Feast is an excellent choice for you if you care a lot about hyper customizability, where you can go in there, read the code, make changes to it, or fork it, and if you really also care about just running it on your own infrastructure, maybe even on prem, and you wanna manage and scale it yourself. Tecton is the right choice when you care about having a hosted offering, if you care about having strong SLAs and support, and also, of course, if you care a lot about governance and enterprise security and ACLs and things like disaster recovery and high availability. That's when you would come to Tecton.
[00:44:31] Unknown:
And as you look to the near to medium term of the Tecton platform and the business, what are some of the things that you have planned for the future?
[00:44:40] Unknown:
Over the next couple of months, we'll be announcing support for Tecton not just on AWS, but we'll also be releasing it on GCP as well as Azure. Besides that, we'll also make it much easier for analysts to develop their own features, specifically analysts who are not super comfortable with PySpark or, let's say, Python and Pandas, but folks who really care much more about having a very intuitive and simple to use UI. And then finally, today, Tecton supports batch transformations and streaming transformations as well as real time transformations that are executed on demand in production when you wanna get a feature value.
And we are going to invest much more heavily in that area, where in the future you'll be able to execute entire DAGs of real time features that are executed in production and that can depend on each other.
[00:45:38] Unknown:
Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as a final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today.
[00:45:53] Unknown:
I think that really, for me, comes down to end to end data quality monitoring and lineage tracking in an organization. Very similar to how you have things like Jaeger to do end to end request tracing in a distributed microservice system, I think we need something like that to track the life of a piece of data. Where does it originate from? Which systems have touched it? How does the distribution of it change over time? Who are, say, the owners of the different pipeline steps? And something like that gets especially interesting when you're crossing the analytical offline and the operational online stack.
[00:46:32] Unknown:
Well, thank you very much for taking the time today to join me and discuss the work that you've been doing at Tecton and building out feature stores. It's definitely a very interesting area and 1 that I'm excited to see a lot of development on recently. So thank you for all the time and effort you've put into that, and I hope you enjoy the rest of your day. Thank you very much. Thank you for listening. Don't forget to check out our other show, Podcast.__init__ at pythonpodcast.com, to learn about the Python language, its community, and the innovative ways it is being used. And visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction to Kevin Stumpf and Tecton
Building Michelangelo at Uber
Understanding Features in Machine Learning
Components of a Feature Store
Current Landscape of Feature Stores
ML Operations Lifecycle
Scalability and Resource Management in Tecton
Workflow for Data Scientists and Analysts
Architecture of Tecton
Lessons from Michelangelo and Tecton's Evolution
Impact of Feature Stores on Data Teams
Integration with Data Sources
Feature Manifestation and API Usage
Monitoring and Feature Drift
Feature Versioning and Lifecycle Management
Integration with ML Platforms
Interesting Use Cases of Tecton
Challenges and Lessons in Building Tecton
Choosing the Right Feature Store
Future Plans for Tecton
Biggest Gap in Data Management Tooling