Summary
One of the most impactful technologies for data analytics in recent years has been dbt. It's hard to have a conversation about data engineering or analysis without mentioning it. Despite its widespread adoption, there are still rough edges in its workflow that cause friction for data analysts. To help simplify the adoption and management of dbt projects, Nandam Karthik helped create Optimus. In this episode he shares his experiences working with organizations to adopt analytics engineering patterns and the ways that Optimus and dbt were combined to let data analysts deliver insights without the roadblocks of complex pipeline management.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show!
- Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days or even weeks. By the time errors have made their way into production, it's often too late and damage is done. Datafold built automated regression testing to help data and analytics engineers deal with data quality in their pull requests. Datafold shows how a change in SQL code affects your data, both on a statistical level and down to individual rows and values before it gets merged to production. No more shipping and praying, you can now know exactly what will change in your database! Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Visit dataengineeringpodcast.com/datafold today to book a demo with Datafold.
- RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their extensive library of integrations enable you to automatically send data to hundreds of downstream tools. Sign up free at dataengineeringpodcast.com/rudder
- Data teams are increasingly under pressure to deliver. According to a recent survey by Ascend.io, 95% in fact reported being at or over capacity. With 72% of data experts reporting demands on their team going up faster than they can hire, it’s no surprise they are increasingly turning to automation. In fact, while only 3.5% report having current investments in automation, 85% of data teams plan on investing in automation in the next 12 months. 85%!!! That’s where our friends at Ascend.io come in. The Ascend Data Automation Cloud provides a unified platform for data ingestion, transformation, orchestration, and observability. Ascend users love its declarative pipelines, powerful SDK, elegant UI, and extensible plug-in architecture, as well as its support for Python, SQL, Scala, and Java. Ascend automates workloads on Snowflake, Databricks, BigQuery, and open source Spark, and can be deployed in AWS, Azure, or GCP. Go to dataengineeringpodcast.com/ascend and sign up for a free trial. If you’re a data engineering podcast listener, you get credits worth $5,000 when you become a customer.
- Your host is Tobias Macey and today I’m interviewing Nandam Karthik about his experiences building analytics projects with dbt and Optimus for his clients at Sigmoid.
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you describe what Sigmoid is and the types of projects that you are involved in?
- What are some of the core challenges that your clients are facing when they start working with you?
- An ELT workflow with dbt as the transformation utility has become a popular pattern for building analytics systems. Can you share some examples of projects that you have built with this approach?
- What are some of the ways that this pattern becomes bespoke as you start exploring a project more deeply?
- What are the sharp edges/white spaces that you encountered across those projects?
- Can you describe what Optimus is?
- How does Optimus improve the user experience of teams working in dbt?
- What are some of the tactical/organizational practices that you have found most helpful when building with dbt and Optimus?
- What are the most interesting, innovative, or unexpected ways that you have seen Optimus/dbt used?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on dbt/Optimus projects?
- When is Optimus/dbt the wrong choice?
- What are your predictions for how "best practices" for analytics projects will change/evolve in the near/medium term?
Contact Info
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
- Thank you for listening! Don’t forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
- To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers
Links
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Sponsored By:
- Datafold: ![Datafold](https://files.fireside.fm/file/fireside-uploads/images/c/c6161a3f-a67b-48ef-b087-52f1f1573292/zm6x2tFu.png) Datafold helps you deal with data quality in your pull request. It provides automated regression testing throughout your schema and pipelines so you can address quality issues before they affect production. No more shipping and praying, you can now know exactly what will change in your database ahead of time. Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI, so in a few minutes you can get from 0 to automated testing of your analytical code. Visit our site at [dataengineeringpodcast.com/datafold](https://www.dataengineeringpodcast.com/datafold) today to book a demo with Datafold.
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. When you're ready to build your next pipeline or want to test out the projects you hear about on the show, you'll need somewhere to deploy it. So check out our friends at Linode. With their new managed database service, you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes with automated backups, 40 gigabit connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don't forget to thank them for their continued support of this show.
Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days or even weeks. By the time errors have made their way into production, it's often too late and the damage is done. DataFold built automated regression testing to help data and analytics engineers deal with data quality in their pull requests. DataFold shows how a change in SQL code affects your data, both on a statistical level and down to individual rows and values before it gets merged to production. No more shipping and praying. You can now know exactly what will change in your database.
DataFold integrates with all major data warehouses as well as frameworks such as Airflow and dbt and seamlessly plugs into CI workflows. Visit dataengineeringpodcast.com/datafold today to book a demo with DataFold. Your host is Tobias Macey, and today I'm interviewing Nandam Karthik about his experiences building analytics projects with dbt and Optimus for his clients at Sigmoid. So, Nandam, can you start by introducing yourself?
[00:01:55] Unknown:
Hi, everyone. I'm working as a senior engineering manager at Sigmoid, and I have been working in the data space for close to 4 years. Prior to that, I worked in a few gaming companies and also on B2B products.
[00:02:11] Unknown:
And do you remember how you first got started working in data?
[00:02:14] Unknown:
Yeah. So this happened about 4 years ago. At the time, my experience had been on the product side. Around that time, I was working in gaming companies. I had also been working as a full-stack engineer, taking care of a couple of products end to end. So I had the opportunity to work with a business manager, trying to understand the business requirements for the product and then take care of the end-to-end life cycle of delivering on that, right from adding new features, deploying them to production, fixing live issues that were there in the game, and also improving performance.
So that gave me a flavor of, you know, what it feels like to work on a complete product end to end. So when this opportunity came, there were a couple of things that really excited me. One is, of course, the big data domain, which was new to me at the time. And the other is the opportunity to lead projects end to end. Because I had a flavor of how it is when you work on projects end to end, I could see that the scale of these projects is much bigger. So those are the two reasons why, you know, I got interested in the role, and that's how I landed in data.
[00:03:37] Unknown:
And in terms of the work that you're doing at Sigmoid, can you give a bit of background about what the company is and some of the types of projects that you're involved in?
[00:03:46] Unknown:
So Sigmoid is a data engineering and analytics company. It started off as a product company, building an analytics product engine and a front end for it, and slowly evolved into a consulting company as well. So currently, we take up analytics projects. We work on building custom solutions based on client requirements, and the requirements come from the data space. For any kind of data project, we understand the problem and build a custom solution. A few examples are cloud migrations, where customers who are on-prem want to migrate to the cloud. That is one type of project that we have. We also work on MLOps kinds of projects. We have worked on developing data models and productionizing them. We have also built ETL pipelines end to end and worked on governance.
So, different areas in the data space.
[00:04:47] Unknown:
When you start to engage with some of the different clients that you work with, I'm wondering what are some of the core challenges that they're facing when they reach out to you, and if there's any sort of commonality in terms of the stage of their kind of data maturity or particular industries or geographies that you tend to work with.
[00:05:07] Unknown:
We serve mostly North America. We have also worked with companies in South America and a few in Europe, and, of course, in India as well. So we have clients from everywhere in the world. In terms of the reasons why clients come to us: some of them are early in their data journey, where they are looking for an expert to come in, understand the problem, and build the foundation for their data lake, so that down the line a lot of analytics and intelligence, like AI projects, can be built on top of it. Some clients already have enough maturity; they're already somewhere in the journey.
And they're looking for experts like us to come in, understand the problem, and deliver quickly and with the best quality. So for some clients who are a little bit early in the journey, we help them a lot in terms of building it, and for some clients, we offer data engineering expertise in building solutions.
[00:06:18] Unknown:
For the conversation today, we're focusing on some of your experience of working with these clients to build out different DBT projects and then using another utility called Optimus to be able to handle the orchestration of that workflow. And in general, the overall paradigm of extract load transform with DBT being the transformation step has become fairly widespread and widely adopted particularly for analytics focused projects. And I'm wondering if you can give some examples of the types of projects that you've built with this approach and some of the types of analytics or types of questions that customers are trying to address when they engage with you and when you're working with them?
[00:07:00] Unknown:
In one of the projects where I have used dbt, the company was in mining, and they have different mining sites located across the world. At the mining sites, they have a lot of equipment, and there are a lot of sensors on them which generate events, which get collected by a system. Those events are used to understand any kind of issues on the site and to monitor the performance and efficiency of the various equipment. There was an existing Excel-based reporting process that happened at every site. When we started on this project, what we wanted to do is make that whole process more automated end to end and also add more best practices on top of it. The reports are site specific: there are Excel-based reports, very specific to each site, and there are about 10 or 12 sites, each site having its own site-specific format and logic for the report.
They are also created frequently, every few weeks, manually, which introduces the human error part into it. When we picked up this project, that was the state of how the reporting was done, and we wanted to standardize. Number one, we wanted to automate the whole reporting and also introduce a visual layer, because these are all Excel-based tabular reports, and bringing in the visual aspect gives a lot more understanding of the data as well. Right? It brings in a new perspective. So one goal was to automate, and the other was to visualize the data. And the third was to create some global reports as well, which are more at a global level, for somebody who wants to look at all the sites' data.
Those would basically be the global reports. These are the three requirements that we had, and the events data that we were using was coming into BigQuery on Google Cloud Platform as raw data. We have used dbt to write SQL, which processes that data and generates tables back in BigQuery, which are then queried to calculate different KPI metrics that help in analyzing the performance of the site equipment.
[00:09:37] Unknown:
As far as the overall kind of patterns or structures or project architecture that you work through, I'm wondering if there are any kind of core practices that you use as a basis across the board? And as you work with different customers, what are some of the types of changes or types of customizations that you have to add in as the particular requirements become bespoke for a given organization or use case?
[00:10:05] Unknown:
So in this particular project that I just explained, the way we were using dbt, we were using it to perform transformations where we already have the data in BigQuery. In terms of the patterns of how we have used dbt and also triggered these jobs on a daily basis, etcetera: we achieved this by Dockerizing the code and running it on Google Kubernetes Engine. The way we were triggering them was based on a schedule. For the tables that I was talking about, which are used in reporting, new data gets received every hour, and we run these jobs every hour. So we have a schedule-based trigger, and based on the schedule, an event gets generated. A process that is running as a daemon in the Kubernetes cluster picks up this event, and based on the message that is in the event, it recognizes which flow to trigger.
So accordingly, we run the dbt command with some parameters, which triggers a specific flow. The pattern here is that dbt is mainly used for transformation: write SQL and transform it to create a job. Then we use other services on the cloud platform to listen for any triggers. The trigger is schedule based; based on the schedule, we have a mechanism to pass the event and trigger the flow. And once the flow completes, again, we have a notification which can then trigger another job.
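(A minimal sketch of the trigger pattern described above, assuming Google Pub/Sub as the messaging layer: a daemon process in the Kubernetes cluster receives a scheduled event and maps it to a parameterized dbt invocation. The subscription name, message fields, and model selector are hypothetical, not taken from the project.)

```python
import json
import subprocess

from google.cloud import pubsub_v1  # requires the google-cloud-pubsub package

PROJECT_ID = "my-gcp-project"          # hypothetical project
SUBSCRIPTION_ID = "dbt-run-triggers"   # hypothetical subscription fed by the scheduler

def run_dbt_flow(message: pubsub_v1.subscriber.message.Message) -> None:
    """Map an incoming scheduler event to a parameterized dbt invocation."""
    event = json.loads(message.data.decode("utf-8"))
    # The event payload names which flow (dbt selector) to run; field names are illustrative.
    selector = event.get("flow", "hourly_kpis")
    result = subprocess.run(["dbt", "run", "--select", selector],
                            capture_output=True, text=True)
    if result.returncode == 0:
        message.ack()   # acknowledge only once the dbt run succeeded
    else:
        print(result.stdout, result.stderr)
        message.nack()  # let the message be redelivered / alert on repeated failures

def main() -> None:
    subscriber = pubsub_v1.SubscriberClient()
    subscription_path = subscriber.subscription_path(PROJECT_ID, SUBSCRIPTION_ID)
    future = subscriber.subscribe(subscription_path, callback=run_dbt_flow)
    print(f"Listening for schedule events on {subscription_path}")
    future.result()  # block forever; the pod runs this as a daemon

if __name__ == "__main__":
    main()
```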
[00:11:41] Unknown:
For the dbt code, I'm wondering, as you start to work with these different customers and get them kind of up to speed on the workflow of writing DBT and building with it, what are some of the, I guess, sharp edges or white spaces in the dbt utility itself that you start to encounter and some of the ways that you've had to address some of the issues or shortcomings as you work through?
[00:12:08] Unknown:
So I have been using dbt, and dbt as a tool is also evolving. For a given version of dbt, there may be some features that are not present which may be required in the project. I'll talk about one issue that I encountered. In the project that I was talking about, dbt works very well when you have very simple requirements or simple inputs to the pipeline. When you run the dbt command, it runs for all the tables that you have as a target. You can, of course, choose a specific table as a target, so you can call dbt run and then the specific table name. But when you have more complex requirements, like if you want to trigger the dbt pipeline based on some inputs like a start date, an end date, and some kind of boolean parameters, some versions of dbt won't support that.
At the time when I was using dbt, this feature was not available, so I had to compromise on that feature. This requirement of passing a start date, an end date, or any other user variables to the dbt command comes up depending on your needs. For example, typically, when we run the pipeline, we run for the previous day: we get the data for the previous day and run dbt to process it. Whenever any data correction happens to older data, in order to fix it, you have to run the pipeline again for the entire duration or window of data where the bad data was corrected.
Now, in order to fix it, we'll have to run the pipeline again. And if you don't have these kinds of additional capabilities built in, where you can specify a start date and an end date, right, if the issue was with data for the last 10 days, then you'll have to run dbt, like, 10 times. In fact, that may also not work depending on how you have configured your dbt pipeline. But having this feature of providing a start date and end date, where the start date can be, like, 10 days before and the end date can be yesterday, you can run once for all of the previous 10 days and fix the data. So when you want advanced capabilities and advanced features in your pipeline, dbt may not support them. I encountered this with the version of dbt at the time, and this feature became available in later releases.
So the thing to carefully look out for whenever we are using this tool is to understand the use cases and capabilities that we are looking for and evaluate the latest version of dbt to see that all the capabilities you want to use are offered by the latest version. If not, those are some of the features that you'll have to compromise on.
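(For reference, newer dbt versions support this pattern: variables are passed on the command line with --vars and read inside a model with var(). A minimal sketch of a backfill over a date window, with made-up model and variable names:)

```python
import json
import subprocess
from datetime import date, timedelta

def backfill(model: str, start_date: date, end_date: date) -> None:
    """Re-run a dbt model over a date window instead of one day at a time.

    Inside the model (SQL), the window would be consumed with something like:
      WHERE event_date BETWEEN '{{ var("start_date") }}' AND '{{ var("end_date") }}'
    (model and variable names here are illustrative, not from the project).
    """
    dbt_vars = {"start_date": start_date.isoformat(), "end_date": end_date.isoformat()}
    cmd = ["dbt", "run", "--select", model, "--vars", json.dumps(dbt_vars)]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # Fix the last 10 days of a KPI table in one run rather than 10 separate runs.
    today = date.today()
    backfill("site_equipment_kpis", today - timedelta(days=10), today - timedelta(days=1))
```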
[00:15:06] Unknown:
RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their extensive library of integrations enables you to automatically send data to hundreds of downstream tools. Sign up for free today at dataengineeringpodcast.com/rudder.
In order to handle the orchestration of some of these projects, I know that you're using a utility called Optimus. I'm wondering if you can give a bit of overview about what that project is and some of the story behind that and the role that it plays in this dbt workflow.
[00:15:53] Unknown:
So Optimus is more of a wrapper on top of Airflow. It is a custom tool built on top of Airflow. It is an orchestration tool: typically, when you want to build any kind of pipelines or jobs, we use Airflow operators to build them. Optimus, as a wrapper on top of Airflow, makes this easier and makes it configuration driven, as in this particular project that I was involved in. Optimus is built on the concept of configuration-driven jobs, and when you compile them, they translate into Airflow DAGs, which get deployed to the Airflow server.
The biggest advantage of a tool like this is that it makes it easy, and not just for engineers, to set up jobs, because it is configuration driven. Most of the jobs that we typically deal with are moving data from one table to another table. In the case of Google, the raw data is in Google Cloud Storage, and once you load the data into a raw BigQuery table, from there onwards all the transformation happens on BigQuery, from table to table. So the majority of the transformation logic is done through SQL.
Now, with tools like Optimus, which provide a plugin like bq2bq, it is configuration driven: you specify a BigQuery table as your source, you specify a target table as your destination, and you write your SQL. You don't have to worry about what Airflow is, what a DAG is, how to write an Airflow DAG, or how to do the deployment. It basically makes it very easy and broadens the skill sets of people who can create production-ready jobs. It's not just for engineering; it also enables data analysts and other experienced people to write jobs.
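(Optimus is open source and has its own job specification format and plugins, so the following is not its actual syntax. It is only a minimal sketch of the idea being described: a small, declarative bq-to-bq job config that a generator compiles into an Airflow DAG, so the job author only supplies SQL plus source/target settings. The config keys, table names, and project ID are hypothetical.)

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

# A made-up, Optimus-like job spec: the analyst only supplies SQL plus destination metadata.
JOB_CONFIG = {
    "name": "daily_equipment_kpis",
    "schedule": "0 2 * * *",
    "destination_table": "analytics.equipment_kpis",
    "sql": """
        SELECT site_id, equipment_id, AVG(runtime_minutes) AS avg_runtime
        FROM `raw.sensor_events`
        GROUP BY site_id, equipment_id
    """,
}

def build_dag(config: dict) -> DAG:
    """Compile one declarative job config into an Airflow DAG (the idea in miniature)."""
    project = "my-gcp-project"  # placeholder GCP project
    dataset, table = config["destination_table"].split(".")
    dag = DAG(
        dag_id=config["name"],
        schedule_interval=config["schedule"],
        start_date=datetime(2022, 1, 1),
        catchup=False,
    )
    BigQueryInsertJobOperator(
        task_id="bq_to_bq",
        dag=dag,
        configuration={
            "query": {
                "query": config["sql"],
                "useLegacySql": False,
                "destinationTable": {"projectId": project, "datasetId": dataset, "tableId": table},
                "writeDisposition": "WRITE_TRUNCATE",
            }
        },
    )
    return dag

# Airflow discovers DAGs at module level.
globals()[JOB_CONFIG["name"]] = build_dag(JOB_CONFIG)
```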
[00:18:02] Unknown:
As far as the implementation, I'm wondering if you can talk through some of the design of the Optimus tool itself and some of the ways that you thought about how to design the interfaces to allow for more of these roles to be able to interact with it?
[00:18:24] Unknown:
Typically, when we don't have tools like Optimus and you want to productionize any kind of job, like SQL that may come from an analyst, as a data engineer you have to figure out all the dependencies for the job, create Airflow DAGs, and deploy them. With tools like Optimus, a lot of that is taken care of. There is a particular pattern that we follow in terms of how we name the tables, etcetera, and when I define a job, Optimus underneath is able to identify the dependent tables automatically.
Based on the automatic detection of dependencies, it creates an Airflow DAG when we compile the code. This, again, is the intelligence: when you follow the patterns of Optimus and write a job, it automatically figures out the dependencies and deploys the Airflow DAGs in Airflow. This design automates a lot of the engineering-specific workload and allows you to define the job based on config. In addition, there are, of course, a lot of CI/CD steps. Whenever we have an Optimus job written, a lot of test cases run, and converting the Optimus job configs into Airflow DAGs happens behind the scenes when we deploy the solution to production. The whole of the Optimus code is compiled, and Airflow DAGs are generated.
Those DAGs then replace the existing ones in Airflow.
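(This is not how Optimus itself is implemented; it is only a toy sketch of the idea of inferring job dependencies from the tables a job's SQL reads, with made-up job names and a naive table-reference pattern.)

```python
import re

# Toy job specs: names and SQL are illustrative, not the project's.
JOBS = {
    "load_raw_events": {"produces": "raw.sensor_events", "sql": None},
    "daily_equipment_kpis": {
        "produces": "analytics.equipment_kpis",
        "sql": "SELECT * FROM `raw.sensor_events`",
    },
    "global_site_report": {
        "produces": "analytics.global_report",
        "sql": "SELECT * FROM `analytics.equipment_kpis` JOIN `raw.sensor_events` USING (site_id)",
    },
}

TABLE_REF = re.compile(r"`([\w-]+\.[\w-]+)`")  # naive: backtick-quoted dataset.table references

def infer_dependencies(jobs: dict) -> dict:
    """Map each job to the upstream jobs that produce the tables its SQL reads."""
    producers = {spec["produces"]: name for name, spec in jobs.items()}
    deps = {}
    for name, spec in jobs.items():
        referenced = TABLE_REF.findall(spec["sql"] or "")
        deps[name] = sorted({producers[t] for t in referenced
                             if t in producers and producers[t] != name})
    return deps

if __name__ == "__main__":
    for job, upstream in infer_dependencies(JOBS).items():
        print(f"{job} <- {upstream}")
    # In a real compiler these edges would become task dependencies in the generated Airflow DAG.
```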
[00:20:10] Unknown:
As far as the kind of tactical and organizational practices that you build up and encourage your clients to use as you're working through the implementation of these dbt projects, I'm wondering if you can describe some of the ways that that combination of dbt and Optimus influences the ways that you think about how to structure the teams and structure the work that's being done?
[00:20:36] Unknown:
Recently, there is a trend where a lot of customers, clients, and organizations are using data warehouse tools quite a lot, and this is primarily because most of these warehousing tools are becoming very capable and performant. Three to four years ago, these tools were not that popular or capable, and there were other technologies doing the job. Recently, these tools are becoming more and more mature; we have Snowflake, which separates storage and compute. Because these tools are becoming more performant and we are seeing good adoption by organizations, SQL is becoming a more popular language for writing transformations.
It is basically enabling a lot of data analysts to write analytics queries and easily productionize them as well. With data warehousing tools becoming more and more performant and capable, most organizations are moving towards using them more, and SQL is becoming more predominant. Now, with tools like dbt, which make the SQL code more modular and follow the software development life cycle, it is very easy to write code that is production ready. Prior to this kind of technology and these tools, when we wanted to productionize something, there was a collaboration between different teams, like business analysts and data engineers or data scientists, where information exchange and a lot of knowledge transfer happens, and data engineers are responsible for productionizing the code. With data warehousing tools becoming more popular, and tools like dbt and Optimus filling in a lot of the automation and creating these wrappers on top of typical tools, it is becoming very easy to write production-ready code at once. So, definitely, there is a huge benefit in using this technology and this way of developing pipelines.
So this is some of the trend that we see in organizations.
[00:22:57] Unknown:
For teams who are building with DBT, I'm wondering what you have found to be some of the useful kind of heuristics or strategies for understanding when dbt is the right set of tools to use to solve a particular analytical challenge and when you need to go a different route where maybe you need some more kind of custom coding or custom development or more complex transformations or pipelines to be able to achieve a particular outcome?
[00:23:28] Unknown:
When we have a typical transformation, simple to complex, which can be solved using SQL, we can do it with dbt. When you have advanced processing requirements or advanced techniques to be applied, where we have to use a different technology like Python or PySpark, that's where dbt does not help. Normal data processing is something we can do, but on top of it, if you want to run any kind of Python code or Spark code, etcetera, you'll have to move away from dbt. Right? So it works very well when you are doing SQL-based transformations: you can use dbt to make the code more manageable, write it once, and easily productionize it. But when you have more advanced and more complex requirements for processing data, enriching data, or applying AI techniques, dbt does not help there.
Optimus, as I said, is more of an orchestration wrapper, and we can create new plugins. We have plugins to load data from storage into BigQuery, to transform data from one BigQuery table to another, and even to export data from a BigQuery table back to a file system or cloud storage. Since it is written on top of Airflow, we can create plugins based on the automation requirement. And the advantage is that you create the plugin once and it is configuration driven, so you're not repeating the same work again and again: you create a plugin and you just reuse it. It's easy to reuse what is there and then quickly develop and productionize things.
[00:25:12] Unknown:
Data teams are increasingly under pressure to deliver. According to a recent survey by Ascend.io, 95% reported being at or over capacity. With 72% of data experts reporting demands on their team going up faster than they can hire, it's no surprise they are increasingly turning to automation. In fact, while only 3.5% report having current investments in automation, 85% of data teams plan on investing in automation in the next 12 months. That's where our friends at Ascend.io come in. The Ascend Data Automation Cloud provides a unified platform for data ingestion, transformation, orchestration, and observability.
Ascend users love its declarative pipelines, powerful SDK, elegant UI, and extensible plug-in architecture, as well as its support for Python, SQL, Scala, and Java. Ascend automates workloads on Snowflake, Databricks, BigQuery, and open source Spark, and can be deployed in AWS, Azure, or GCP. Go to dataengineeringpodcast.com/ascend and sign up for a free trial. If you're a data engineering podcast listener, you get credits worth $5,000 when you become a customer.
In terms of the applications of this combination of dbt and Optimus and some of the types of projects that it's enabled you to do, or some of the ways that it has empowered some of the different organizations that you've worked with, I'm wondering what are some of the most interesting or innovative or unexpected ways that you've seen them applied?
[00:26:37] Unknown:
So with dbt, in one of the projects, I'll give you a little bit of background about the project. It's a product with a UI where investors who want to invest in companies visit the website to understand different companies. As an investor, you may want to understand an organization: understand its revenue, what kind of skill sets people in the organization have, and how the company has grown over the years. So you want to understand a company.
And in addition, you also want to compare two or three organizations side by side, comparing revenues, skill sets, or other things. To build this product, we have used dbt in a slightly interesting way. To bring in the data about the organizations so that we can provide all of this information to the end user, we have been taking LinkedIn data, and that LinkedIn data is processed and stored into different Snowflake tables. You have a jobs table, a revenue table, an employee table, and so on. All of the different entities' data that can come out of LinkedIn was created from the raw data and stored in the Snowflake tables.
To do this transformation and processing, we have used dbt. That's one use. And whenever a user, who is typically an investor, comes to the website, logs in, and searches for a company, whenever the user triggers a query about a company from the UI, the request in the back end is also automated using Airflow and dbt. Here, the flow would basically query all the entities that were recently refreshed with the latest data by dbt, and user-specific tables are populated from those source tables.
So this flow is also done using dbt. One part is refreshing the data from LinkedIn every week to keep the data current. And based on the user commands or triggers from the UI, again, we have orchestrated generating additional data specific to the user's requirement using Airflow and dbt and populating the user-specific tables. So this is one of the ways that we can use dbt: it's not just for ETL. We have also used dbt to serve user requests from the website. It's, of course, not very quick; it takes a few minutes for the request to be served. But in this case, that wait time is acceptable.
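(The episode doesn't specify how the back end kicks off this on-demand flow; one common approach with Airflow 2 is its stable REST API. A minimal sketch, with a hypothetical DAG ID, host, and credentials:)

```python
import requests

AIRFLOW_URL = "https://airflow.example.com/api/v1"  # hypothetical Airflow host
DAG_ID = "refresh_company_profile"                  # hypothetical DAG that wraps the dbt run

def trigger_company_refresh(company_id: str, user_id: str) -> str:
    """Kick off an on-demand Airflow DAG run (which in turn runs dbt) for one user's request."""
    response = requests.post(
        f"{AIRFLOW_URL}/dags/{DAG_ID}/dagRuns",
        json={"conf": {"company_id": company_id, "requested_by": user_id}},
        auth=("api_user", "api_password"),  # basic auth for illustration only
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["dag_run_id"]

if __name__ == "__main__":
    run_id = trigger_company_refresh(company_id="acme-corp", user_id="investor-42")
    print(f"Triggered back-end refresh: {run_id}")
```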
[00:29:28] Unknown:
And as you have been working with your customers and working with DBT some more, I'm curious if there are any strategies or approaches to how to think about structuring DBT projects or ways to kind of manage some of the iteration or specific configuration approaches that have been most useful for being able to maybe optimize build times or reduce error rates or introduce useful tests as you iterate on these projects?
[00:29:59] Unknown:
In the dbt projects that I have worked on, we have a project template. We have created a template for dbt projects, so if we need to spin off a new project using dbt, we already have the boilerplate for how to set it up. That is one pattern that we have used successfully. The other is that dbt allows you to write test cases, which helps any time there is a code change; dbt also allows us to run those test cases as part of CI/CD. This again helps in quickly iterating on any improvements that we're making to the dbt-based products. Similarly, in Optimus as well, we have a CI/CD pipeline, so whenever we have a code change that does not meet the specifications of how we need to define jobs, or the configurations are incorrect,
then every time we do a code commit, the CI/CD pipeline automatically kicks in and catches it there. These kinds of typical practices that we have followed in both projects have been helpful in using this technology and finding any errors.
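(A minimal sketch of the kind of CI step described here, using standard dbt commands; the "ci" target name is a placeholder for whatever profile target the project defines.)

```python
import subprocess
import sys

# Commands a CI job might run for a dbt project on every commit.
STEPS = [
    ["dbt", "deps"],                       # install packages declared in packages.yml
    ["dbt", "compile", "--target", "ci"],  # catch SQL/Jinja errors before anything runs
    ["dbt", "test", "--target", "ci"],     # run the schema and data tests defined in the project
]

def main() -> int:
    for step in STEPS:
        print("Running:", " ".join(step))
        result = subprocess.run(step)
        if result.returncode != 0:
            # A non-zero exit fails the CI pipeline, blocking the merge.
            return result.returncode
    return 0

if __name__ == "__main__":
    sys.exit(main())
```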
[00:31:08] Unknown:
In your experience of working through these projects, what are some of the most interesting or unexpected or challenging lessons that you've learned in the process of combining Dbt and Optimus or helping companies to establish and iterate on their analytical approaches with these tools?
[00:31:23] Unknown:
One, as I said, is of course the version of dbt; that's important. One of the limitations that I've seen is that you have to evaluate the tool, look at what your needs are, and ensure that dbt offers those capabilities, like the example I gave before, right, the start date and end date kind of example. That is one limitation. And Optimus has the other limitation that I was talking about with the plugins: it is a wrapper on Airflow, but it is built for Google Cloud Platform. So if you want to use Optimus on a different cloud platform, we can take the concepts, but we have to write the plugins again for that cloud platform. Optimus is open source, and it has been built on Google Cloud Platform, so if you are using a different cloud, then you'll have to extend it and put in some effort to manage that.
That is one limitation, I would say. It's open source; if you're on Google Cloud, you can use it, but if you're not, then you'll have to extend the plugins. And dbt, as I said, is an evolving tool; more and more features and capabilities are being added. So, before using it, we have to, you know, approach it with a word of caution.
[00:32:39] Unknown:
For people who are starting to, I guess, iterate on their overall analytical approach, whether they have an existing sort of data workflow or they're trying to build something from scratch, what are the cases where Optimus and/or dbt are the wrong choice?
[00:32:58] Unknown:
dbt is the wrong choice if you're not using SQL. And dbt works with some of the popular data warehousing tools, like Snowflake, Redshift, BigQuery on Google Cloud Platform, and a few others. If you are using a less popular warehousing tool, then dbt may not support it. So it becomes the wrong choice if you're not on one of the popular data warehousing tools and not using SQL. Also, if you are using a different technology stack for your ETL pipelines for building your enterprise data lake, for example, if you're on AWS and you're using Glue to process data from S3, write back to S3, and expose the data through Athena,
then changing your tech stack to use tools like Redshift just so you can use dbt could be a huge step. It again depends on what stage of the journey you're in on your cloud and your data platform; if you're at an early stage, you can. It also depends on the type of data you're dealing with. Typically, for structured and semi-structured data, we can load it into the warehouse and then use SQL, so dbt can be used. But if you have more unstructured data, then, of course, we won't be able to use this technology.
[00:34:11] Unknown:
In terms of your kind of predictions or forward looking assessment for the evolving target of what constitutes best practices for analytics projects, I'm wondering what you see as some of the, I guess, influences that might impact change or some of the ways that the kind of evolving set of best practices is going to continue to change or shift in the near to medium term?
[00:34:38] Unknown:
A few years ago, there were a lot more technology options and tools to process big data. Now, as we are seeing, with the warehousing tools becoming more performant and capable, tools like dbt and Optimus have come in, and they're making it easy to write transformations in SQL and productionize them very quickly. As this kind of technology becomes more and more predominant, best practice expectations like traceability will become common practice, because everything is in tables and we are using SQL, so it's easy to capture the lineage using tools like Informatica and others.
A few years ago, when we had more technology options, traceability was a bit of a challenge. But with everything done in the data warehouse using SQL and tables, it's easy to have lineage and traceability. So whenever there is any data issue, it is easy to trace back, narrow it down to the origin of the issue, and quickly fix it. That is one best practice I see becoming more and more common and easy to apply. Data quality is also something that is used and applied in ETL pipelines, and this, again, is becoming a more and more common best practice. We recommend it to all our clients, and we also enforce and apply data quality checks on the source data that comes in. We initially profile the data, and we set up some rules to monitor the quality.
And through the ETL, once we have the processed data as well, we have data rules that again check the quality of the data before it is consumed for any reporting that is used by the business to take decisions, and also for any other analytics use case. So, again, with tools like this, data quality is also becoming easier to apply and more common, along with data traceability.
[00:36:46] Unknown:
Are there any other aspects of your work with dbt and Optimus and some of the ways that you're engaging with your clients to help them build out their analytical workflows with those tools that we didn't discuss yet that you'd like to cover before we close out the show? Any other ways we are using dbt and Optimus? So, Optimus
[00:37:03] Unknown:
is an open source tool, but it was built in one of the organizations that I'm working with. There is no widespread adoption of the tool; it's currently open source but built for specific automation requirements. dbt is, of course, a more predominant tool, and many companies, I believe 1,000-plus companies, are using dbt in production. In terms of other ways, I have seen the two ways that we have used dbt: one in the mining company that I spoke about, and the other in the investment product that I was talking about, that investors would use. So those are the two places where I've seen dbt used in a particular way that I have worked on.
[00:37:44] Unknown:
Well, for anybody who wants to get in touch with you and follow with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today.
[00:37:59] Unknown:
The biggest gap is that a lot of the technology and tools available today create a lot of dependency on data engineering. That takes a lot of time and effort: when some kind of input comes in from different teams, the data engineer becomes a bottleneck, or there is a lot of dependency on data engineering. So with the warehouse tools and tools like dbt and Optimus becoming more and more popular, I see less dependency on data engineering teams, making a lot more teams capable and empowered to write SQL code that is production ready.
That is what I see as a trend going forward: more automation tools where it's easy to write code, and not just data engineers can productionize solutions;
[00:38:51] Unknown:
more and more teams can directly write production-ready code. Alright. Well, thank you very much for taking the time today to join me and share your experiences of working with dbt and Optimus and some of the ways that this combination of tools can be used to more easily and effectively allow analytics engineers and organizations to build out their different data products. I appreciate all the time and energy that you're putting into that work and into helping to support this new open source utility. So thank you again for that, and I hope you enjoy the rest of your day. Thank
[00:39:26] Unknown:
you.
[00:39:29] Unknown:
Thank you for listening. Don't forget to check out our other shows: Podcast.__init__, which covers the Python language, its community, and the innovative ways it is being used, and the Machine Learning Podcast, which helps you go from idea to production with machine learning. Visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.
Challenges in Modern Data Pipelines
Introduction to Nandam Karthik and His Journey
Overview of Sigmoid and Client Projects
Building DBT Projects for Analytics
Using Optimus for Orchestration
Structuring Teams and Workflows with DBT and Optimus
Innovative Applications of DBT and Optimus
Strategies for Structuring DBT Projects
Future Trends in Analytics Best Practices
Final Thoughts and Contact Information