Building The Foundations For Data Driven Businesses at 5xData

Hello, and welcome to the Data Engineering Podcast, the show about modern data management.

When you're ready to build your next pipeline and want to test out the projects you hear about on the show, you'll need somewhere to deploy it. So check out our friends over at Linode.

With our managed Kubernetes platform, it's now even easier to deploy and scale your workflows or try out the latest Helm charts from tools like Pulsar, Pachyderm, and Dagster.

With simple pricing, fast networking, object storage, and worldwide data centers, you've got everything you need to run a bulletproof data platform.

Go to dataengineeringpodcast.com/linode today. That's L-I-N-O-D-E, and get a $100 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show.

Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code.

Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours or days.

DataFold helps data teams gain visibility and confidence in the quality of their analytical data through data profiling, column level lineage, and intelligent anomaly detection.

DataFold also helps automate regression testing of ETL code with its data diff feature that instantly shows how a change in ETL or BI code affects the produced data, both on a statistical level and down to individual rows and values.

DataFold integrates with all major data warehouses as well as frameworks such as Airflow and dbt and seamlessly plugs into CI workflows.

Go to dataengineeringpodcast.com/datafold today to start a 30 day trial of DataFold.

Once you sign up and create an alert in DataFold for your company data, they'll send you a cool waterflask.

Your host is Tobias Macey. And today, I'm interviewing Tarush Aggarwal about his mission at 5x Data to teach companies how to build solid foundations for their data capabilities.

So, Tarush, can you start by introducing yourself?

Yeah. Absolutely.

First of all, thank you, Tobias, for having me on the show. Super excited to be here.

Just a little bit about myself. I've been in the data space for the last 10 years.

I started off in Silicon Valley at salesforce.com.

Back then, 10 years ago, Salesforce didn't really have a data team, so I got to be the first data engineer on the analytics team over there.

That team is probably now massive.

Most recently, I was at WeWork, which is a super colorful company, but we did some really cool stuff around data. And I got to scale the data team up from 2 to 100 people. So I've really spent the last 10 years in data.

Yes. I'm super passionate about what's happening, and I think this is a really, really exciting time for the data space in general.

Yeah. It's definitely pretty remarkable how much the overall ecosystem has grown just in the past year or 2, let alone the past decade.

Yeah. You know, one of the things right now is just seeing what remote work is doing to the data space. The fact that people are not in the office is putting a highlight on what visibility a company needs around its systems and around its people.

So I think we're one of the few industries which has been sort of positively affected.

Yeah. It's definitely an interesting time for everybody and an interesting time to be in data. And you mentioned how you first got involved in data management, but can you give a bit of an overview about what it is that you're building now at 5x Data and some of the story behind your decision to go that direction?

Just a little bit of context: I left WeWork around 6 months ago, and I was still advising a few companies on their data strategies.

And what we really realized is it doesn't really matter what you're trying to do. It doesn't matter if you're a fintech company or real estate or marketing or ecommerce or a traditional SaaS company.

If you're serious about scaling your business, at some point, you are going to need to build a data foundation.

You're gonna want to have visibility into your go-to-market strategy. And more importantly, you are going to want to be able to leverage data to build products that your customers love and to discover hidden insights in your data. So what we've seen is the only difference is when you invest in it. If you're an ecommerce business, you might go do this at 7 figures. If you're a traditional SaaS company, you might do this even pre-revenue.

But what we realized is all companies need to have this foundation.

And the second thing is, it's not very easy to go invest in data. Right? Data hires are expensive.

And more often than not, what we find is that companies rush to gain insights from the data. So they might go hire data scientists or data analysts, and these folks are going to go focus on the insights layer.

And to start off with, that works, though what we see is that at some point, without a data foundation, everything comes crashing down. So think of it like a skyscraper. Right? If you wanna build an iconic skyscraper, you need to spend some time building a foundation.

Otherwise, it doesn't matter how much steel and cement you have. Without the foundation, it just doesn't scale.

What we focus on at 5x Data is how we teach companies to build a foundation that they can build on top of.

Yeah. And I think that another interesting element in this overall equation is the availability of a number of different hosted and managed systems for being able to do all kinds of data operations that were previously either very complex or very expensive to do in house.

And so a lot of companies, particularly startups who are early stage, out of the box go with things like Fivetran and Snowflake and Looker for being able to get their full end-to-end visibility solution in place, but they don't necessarily have the expertise in house to understand some of the complexity that accrues around the data ecosystem, where they'll just start pushing data into the data warehouse, and then they might do some transformations or modeling on it, but maybe not in ways that are scalable or maintainable in the long term.

And then they might hit a point of complexity where they spend a lot of their time paying down the technical debt rather than being able to move forward. And I'm wondering what your experience has been in terms of working with or seeing some of the approaches that companies take who might go down this path of using these managed services and some of the incidental complexity that grows up around it.

And I really like that you mentioned Looker, Fivetran, Snowflake. Obviously, these are really, really good tools, and putting them together and working with them is a lot of the secret sauce.

I like going back to really the fundamentals. Right? Like, why are we doing this? What's the goal from all of this stuff?

And what we're really convinced by is this idea that if you can answer 80% of your questions in a self-service manner, then you're far more likely to succeed. So what does that really mean? Right? Like, if an intern at your company, someone who's just joined you, can answer really complicated questions: How effective was this campaign versus the previous campaign we ran last Christmas season? What is the LTV of those users?

If anyone in your company can start to answer these questions in a purely self-service manner, number one is you're giving your employees autonomy to answer questions for themselves.

This whole idea of fail fast at a startup is built on this model that employees can answer questions.

They can come up with their own hypotheses, test them, and iterate on them. So the first goal is really to give employees as much autonomy as possible.

And what this really does is it frees up your data team so that they can focus on needle-moving work. I think data teams are really well positioned to find gold in your data and to focus on what products you should be building next and what the interesting areas are.

And if your teams are really bogged down by, as you said, building models and answering questions and backlogs and all of the typical stuff which teams are spending time on right now, it doesn't quite work.

So the goal has always been around autonomy for every employee in your company, as well as focus for the data team on the high-value stuff.

And the way we see it is we break it down into 3 different pillars, which we teach in this fundamentals program. The first pillar is automated data ingestion.

Traditionally, what we find is that most companies spend a lot of their time building pipelines and managing these pipelines.

Whereas with Fivetran, what we're seeing now is really the rise of ELT.

So the first thing we focus on is moving away from ETL into ELT. And what that allows you to do is completely automate the EL steps, which is a big advantage.

The second thing we focus on is the data modeling layer.

Now a lot of your source systems have really been structured for the different applications. Right? So your databases are structured for your application. Your front-end tracking is structured in a way which makes sense for those tools.

We help you go back to what are the questions you're trying to answer, which again goes back to how your employees are gonna be using this, and how you design a data modeling layer which can answer all of those questions in a pretty robust way. So instead of having thousands or hundreds of source tables, how do you do this with a much smaller set, which becomes a lot easier to manage?

And the third area is really the self-service part. Right? So what are the BI tools? Again, there's no need to reinvent the wheel. There is a playbook which has made sense.

The big companies have used it to scale to tens of thousands of employees.

So how do you set up your BI tools in a way such that anyone can answer questions?

So just to recap, 3 pillars: ingestion, data modeling, as well as self-service.

And what we're really good at doing now is making this into a playbook, which we can teach in 12 weeks. So we help companies build this foundation from scratch in 12 weeks.
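To make the ETL to ELT shift from the first pillar concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the in-memory SQLite database stands in for a cloud warehouse, the hard-coded `RAW_ORDERS` rows stand in for whatever a managed connector would extract, and the table and column names are hypothetical. The point it shows is the ordering: the rows are landed untouched first, and the transformation runs afterwards as SQL inside the warehouse.

```python
import sqlite3

# Hypothetical source rows, standing in for what a managed connector would extract.
# Amounts arrive as text, exactly as the (imaginary) source system exposes them.
RAW_ORDERS = [
    ("o-1", "alice@example.com", "2021-01-03", "1999"),
    ("o-2", "bob@example.com", "2021-01-04", "500"),
    ("o-3", "alice@example.com", "2021-02-10", "2500"),
]


def extract_and_load(conn, rows):
    """E and L: land the source rows in a raw table without reshaping them."""
    conn.execute(
        "CREATE TABLE raw_orders "
        "(order_id TEXT, email TEXT, ordered_at TEXT, amount_cents TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", rows)


def transform(conn):
    """T: reshape inside the warehouse with SQL, after the data is loaded."""
    conn.execute(
        """
        CREATE TABLE customer_revenue AS
        SELECT email,
               COUNT(*) AS orders,
               SUM(CAST(amount_cents AS INTEGER)) AS revenue_cents
        FROM raw_orders
        GROUP BY email
        """
    )


def run_elt():
    conn = sqlite3.connect(":memory:")  # stand-in for the warehouse
    extract_and_load(conn, RAW_ORDERS)
    transform(conn)
    return conn.execute(
        "SELECT email, orders, revenue_cents FROM customer_revenue ORDER BY email"
    ).fetchall()


if __name__ == "__main__":
    for row in run_elt():
        print(row)
```

Because the raw table is kept as loaded, the transform step can be changed and rerun without touching the extract and load steps, which is the part a managed ingestion tool would own.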

That covers one of the challenges in the space, which is that because there are so many changes happening so fast, it can be difficult to keep up and to understand what choices to make and how to structure your systems, because there's the fear of missing out, and then there's also the fear of picking the wrong tool. So you're kind of torn between, I wanna move fast and just get something working, but then I also wanna make sure that I don't move too fast and make a mistake that's going to hamstring me 6 months or 6 years down the line.

Yeah. Absolutely.

Very often when companies are hiring or assigning someone to build out a data team, they're solving for immediate problems. Right? We wanna get value from our data.

And if you're just getting started, you're probably not gonna go hire a VP of data or someone with 15 years of experience. You're gonna start with someone with a few years of experience who probably understands one part of the stack, but is solving for a local maximum instead of the big picture.

So we work on this idea that there's no need to reinvent the wheel. There are certain best practices needed, and we wanna make it super, super easy, almost like a no-brainer: here's the program, this is how you do it, and you pick a project. Right? So if your goal is to add visibility into your go-to-market strategy or to find parts of your product which customers are using, great. Pick that as a project.

And while you implement the foundations which we teach you, you work towards that project. So at the end of 12 weeks, you've solved that problem, and now you really know how to go do this in future. So instead of selling you a fish, which is what a lot of traditional consulting companies do, our goal is really to teach you how to fish so that you can go do this yourself.

And once you get to this foundational layer, that's where really all the fun stuff starts to happen. That's when you can really go a lot deeper, and machine learning or more in-depth analysis start to really come alive for you. But you can't really do that unless you're at a certain level of maturity, which is what our goal is.

And so in working with these different companies, one question is, is there a particular profile of company that you're specifically looking to work with? And if so, what are the characteristics?

And then also in terms of the companies that you've worked with either at 5x Data or in previous roles, what do you see as being the impact of the particular industry or vertical that they're in on the types of challenges that they're facing in being able to reach this goal of being data driven?

So, you know, I'll start with the first question.

As I mentioned earlier, it doesn't really matter what industry you're in. At some point, you are going to need to have a data foundation in order to take things to the next level. We go after Series A or B companies.

These companies are at a point where they have proven out they have a business, and now they've raised money to basically go scale out the business. So at this point, they start to go hire a data team, which is the perfect time to do a program like ours, which just gives you the best practices and sets you up for success. So a lot of the companies we work with are just getting started with data.

They have a data team of under 10 people,

and they're either just getting started or they have

basically somewhere down the stack of the data train. We also work with a lot of companies

that are traditionally not tech companies, but they could be in other industries like real estate or education. Coaching has been a big one recently for us.

Ecommerce is another one where they want to leverage data in order to take things to the next level. So we find that at some level, a lot of companies start to plateau.

Hustle from entrepreneurs gets you from 0 to 1, but at some point, it starts to plateau, and you lose visibility into what's working for your business and what's not. And without these systems, it's really difficult to give your employees clear directions and measure their success.

And vague input leads to vague output.

So we work with a lot of companies which have hit a plateau and now need to go take things to the next level, and this is very often the first investment in data.

One of the things we've seen is that data hires are expensive. Right? Especially in America, where average cost to company is $100,000 to $150,000.

It's an expensive sport. Obviously, our programs are much cheaper than that. So for a lot of our companies, we're the first investment in data. So I think that's the first part of the question. And the second part was around what value these companies are trying to get to and what obstacles they are facing.

I think I did speak about both of these a little bit earlier, but number one, very often, the obstacle these companies are facing is that they are making these hires.

These hires start to go build out some really interesting stuff. But what happens is without these foundations, at some point, everything starts to get slower.

So how many people have found themselves in situations where easy analyses start to take longer and decision making starts to get bottlenecked by data instead of being enabled by data? And stakeholders start losing trust in numbers because you have multiple sources of truth. And often, small mistakes start to enter our stack, and we enter a world where we have prioritization based on who screams the loudest.

So all of these are really telltale signs that what you are building and the way you're building it is not scalable, and you really need to go invest in solid foundations.

And hiring new engineers or a quick rearchitecture of the stack is fixing the symptom, not the fundamental problem, which is that you haven't thought through this holistically, and you haven't implemented a system which is really scalable.

So this is a big problem which companies trying to invest in data are facing. The other one is around those companies who just don't know any better. Right?

Entrepreneurs build companies.

The entrepreneurs were experts at what they did. But, again, you don't know what you don't know. And at some point, if you don't know that you need to invest in this, you hit a plateau.

And we help go in there and add all this visibility and give people autonomy.

And at that point, they can take things to the next level.

The first step that most of these companies need to take is actually bringing on some capacity for people who are able to actually build and manage these systems, either by bringing someone who's already internal up to speed with the technologies and with the needs of the business or by hiring externally.

And there's sort of the catch-22 there: you need someone who has expertise to be able to evaluate the potential of somebody who you're looking at to fill this position, but then you need to fill this position because you don't have the expertise. And I'm curious what you see as some of the useful strategies for businesses to be able to attract talent or identify talent internally, and then how best to evaluate their potential for being able to help the organization succeed in their data projects?

We are big believers in this concept of doing data in house.

I think data is a core competitive advantage to your business. So out of anything, this is one of the things that you do wanna keep in house.

Obviously, working externally is great because that helps you accelerate your timelines towards projects, which makes sense.

But we really push for teams investing in internal data resources.

Now, this has been pretty tricky for companies just given the fact that data is relatively new. Right? I think probably 9 or 10 years ago is when data engineers and data scientists started being recognized as real professions.

So because of this, we're still figuring out what are some of these best practices and what is the best way to go organize this stuff. So I think that makes hiring for data particularly challenging.

Where we help is our program really gives you a lot of the foundational stuff. Right? So no need to reinvent the wheel. You probably get started with someone who has experience but hasn't put this together end to end. So, again, depending on what you can hire and where you are in this journey, we're looking for a mid-level data hire as a minimum to come do our program. So very typically, you will look for a full-stack data hire who can do some data engineering stuff, but also some analysis stuff. And our program will really get into the specifics of exactly what to do, what are the step-by-step instructions, and what are the best practices, and it also gives access to our faculty. So as these companies are implementing this and they get stuck, we can go in there and help them.

So if you do wanna start cheaper, our minimum requirements would be a junior analyst who understands some basic Python and SQL.

But what we really recommend is a mid-level full-stack data hire, which would give you the best ROI in a program like ours.

Once the organization has one or multiple people on staff who are able to manage the data systems and maintain them, what are the core components or the foundational layers of the data platform that you have found to be most useful or most broadly applicable and that you recommend for these different organizations who are just starting on the journey of being data driven?

I think there's sort of 2 parts to it. Right? Number one is, what is the infrastructure stack? And as you mentioned earlier, Looker, Fivetran, Snowflake, dbt, all of these are what I call best-in-class vendors.

We are partnered with all of these guys. If you're on Google Cloud or you have a slightly different BI setup, that's fine. A lot of our stuff is built on first principles.

So while there are certain tools which we recommend, there are multiple ways to go do this.

So a lot of the time historically has been spent on just operating and maintaining these tools and building analysis on the tools. We find that, typically, companies are spending 80% of the time on just the ad hoc stuff, maintenance, backlog, and answering questions for the business, and 15 to 20%, if that, on the needle-moving work.

We really wanna flip that around, so that your data team can spend about 20% of its time keeping these tools alive and maintaining them. And every time an ad hoc question comes from the business, this is an opportunity to add it on to the data modeling layer and expose it as self-service for the business, so that more and more people can start to answer questions off of this. And instead of your data scientists, as you get more advanced, going all the way back to the raw data, if they can focus on this modeled layer, which we call the business layer, everyone in the company can start to use this. It's gonna make maintenance a lot easier.

Instead of having these massive fan-out problems and combining application logic inside your transforms, you start to build a very clean layer where both your stakeholders as well as your data team can go to start consuming data. So we wanna shift that. We wanna shift the 80/20 into the 20/80 so that now the data teams can actually spend 80% of their time on going deeper into insights, on working with the product teams and figuring out what is the market research, what are our customers using, and what are some of the features which we should build next.

So that's a lot of the data science and the analytics work, which, frankly, a lot of companies want to do but in reality never end up doing. And that's where we want them to be focusing their time.

And then on the self-serve aspect of things, what have you found to be some of the context or training that's needed for different users within the business to be able to effectively ask and answer questions of the data that they have and understand how to apply the information that they receive from that, particularly given the potential for things like conflicting concepts of how to distill a given metric, or different data sources that might represent different information separately, where they might have different scales or different contexts or representations of the data, and just how best to handle the modeling in the warehouse layer to ensure that the self-serve layer is actually able to be effective and not cause any confusion?

So just to recap, how do you ensure success in the self-service layer, and what are the tactics over there? And what we find is that self-service isn't a new concept. Right? We've been using this all the time now. Right? Every time you go to a kiosk at an airport and print your boarding pass, or every time you pay for your own groceries at Whole Foods or at any of these stores, you are using self-service.

What's happening is that this has become so much more complicated because the data modeling layer hasn't been set up in a way to answer business questions. It's still modeled in a way for the different applications, and it's just stitched together.

So when you combine Mixpanel with your CRM and with your application databases and just expose it inside Looker, it becomes really, really complicated to answer basic questions because you need to have the context as an engineer on how this is all stitched together.

What we focus on is: don't expose the raw data inside your BI tools. Actually work backwards and try and figure out, hey, what are the 80% of questions we're trying to answer? What does the marketing team wanna answer? What does the sales team wanna do? And what does the engineering team wanna do? And work backwards and design a model which can answer these questions.

And then what the data team focuses on is building out the transforms from the raw data into this layer. And at that point, when you expose this clean layer inside your self-service tools, it actually becomes pretty easy and intuitive to go use this. We see a lot of companies, and even Looker, for instance, talking about models like train the trainer, where you have these people in every department who know how to use these dashboards, and they become the role models and pushers for using self-service. Some of that kinda makes sense, but I believe that success in the data modeling layer is a much, much better indicator of how easily self-service will be adopted than any training strategies which you can go do later.
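The idea of working backwards from a business question to a clean modeled layer can be sketched roughly as follows. This is an illustration only, with assumptions stated up front: the raw table schemas, the `signups` view, and the sample rows are all hypothetical, and SQLite stands in for the warehouse. The question being worked backwards from is "how many signups did each campaign drive?". The stitching between the CRM and the application database is done once, in the view, and the self-service query never touches the raw tables.

```python
import sqlite3


def build_warehouse():
    conn = sqlite3.connect(":memory:")  # stand-in for the warehouse
    conn.executescript(
        """
        -- Raw tables, shaped for the source applications (hypothetical schemas).
        CREATE TABLE raw_crm_contacts (contact_id TEXT, email TEXT, utm_campaign TEXT);
        CREATE TABLE raw_app_users (user_id INTEGER, email TEXT, created_at TEXT);

        INSERT INTO raw_crm_contacts VALUES
            ('c1', 'alice@example.com', 'spring_sale'),
            ('c2', 'bob@example.com',   'spring_sale'),
            ('c3', 'carol@example.com', 'holiday');
        INSERT INTO raw_app_users VALUES
            (1, 'alice@example.com', '2021-03-01'),
            (2, 'carol@example.com', '2021-03-05');

        -- Modeled layer: one row per signup, named in business terms.
        -- The join between CRM and application data lives here, once.
        CREATE VIEW signups AS
        SELECT u.user_id,
               u.created_at   AS signed_up_at,
               c.utm_campaign AS campaign
        FROM raw_app_users AS u
        JOIN raw_crm_contacts AS c ON c.email = u.email;
        """
    )
    return conn


def signups_by_campaign(conn):
    """What a self-service user would ask: only the modeled view is queried."""
    return conn.execute(
        "SELECT campaign, COUNT(*) FROM signups GROUP BY campaign ORDER BY campaign"
    ).fetchall()


if __name__ == "__main__":
    print(signups_by_campaign(build_warehouse()))
```

Only the `signups` view would be exposed in the BI tool here, so someone answering the question does not need to know that emails are the join key between the two source systems.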

RudderStack is the smart customer data pipeline.

Easily build pipelines connecting your whole customer data stack, then make them smarter by ingesting and activating enriched data from your warehouse,

enabling identity stitching and advanced use cases like lead scoring and in app personalization.

Start building a smarter customer data pipeline today. Sign up for free at dataengineeringpodcast.com/rudder.

As you mentioned, specifically, not just dumping a bunch of raw data into the business intelligence tool, because then there are just too many areas for misinterpreting that. And I think that, too, working backwards from the questions that you're trying to answer is helpful because if you're starting from the raw data and then trying to figure out how to model everything, then it can be confusing to try to figure out what are the high-priority items and how you can structure this data in a way that is going to be useful. Just understanding what are the questions that are actually already being asked and then answering those, rather than trying to design things to be what you might need rather than what you concretely need right now, is a useful way to frame the problem.

Absolutely. I think that overcomplication of your modeling layer is probably the worst thing you can do. It makes it so much harder for the business to just answer the basic stuff.

Going back to WeWork, for a long time we had 3 core activity streams, which helped answer the majority of the questions. We had one which focused on all the activities someone can do before they become a member. The second one was all the activities they do after they become a member. And the third one was information on our buildings: capacity, opening up, occupancy, all of that fun stuff. And through these 3 tables, we could answer most questions, even super complicated stuff like how effective was one marketing campaign in driving users, and how long did those users stay for once they signed up, and how many posts did they make on our internal social network within the first month. Right?

That is 50 different data sources.

If you go back to raw data, that's probably a few thousand lines of code. We could answer that in, like, 50 lines or less. So it all boils down to this: for most businesses, you shouldn't need to have more than 5 tables in your data modeling layer, which can answer the majority of the stuff. And, obviously, use these tables as the core tables. Join them with some of your fact tables to get more details.

But just from a raw capability standpoint, you can do a lot of cool stuff with activity streams.

And so that's how you should be thinking about modeling your data.
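As a rough sketch of what querying activity streams might look like, the example below builds two small streams (before and after becoming a member) and answers a cut-down version of the campaign question in one query. Everything here is an assumption made for illustration: the table layout, the activity names, and the sample rows are hypothetical and are not WeWork's actual model, the buildings stream is left out, and SQLite stands in for the warehouse.

```python
import sqlite3

# Hypothetical layout: every stream has the same four columns.
SCHEMA = """
CREATE TABLE pre_member_activities (person_id TEXT, activity TEXT, occurred_at TEXT, detail TEXT);
CREATE TABLE member_activities     (person_id TEXT, activity TEXT, occurred_at TEXT, detail TEXT);
"""

PRE_MEMBER = [
    ("p1", "campaign_click", "2021-01-02", "winter_promo"),
    ("p2", "campaign_click", "2021-01-03", "winter_promo"),  # never signs up
    ("p3", "campaign_click", "2021-01-04", "other_promo"),
]

MEMBER = [
    ("p1", "signed_up", "2021-01-10", None),
    ("p1", "posted", "2021-01-15", None),
    ("p1", "posted", "2021-01-20", None),
    ("p1", "posted", "2021-03-20", None),  # outside the first 30 days
    ("p3", "signed_up", "2021-01-12", None),
    ("p3", "posted", "2021-01-13", None),
]

# Simplification: assumes at most one click per person per campaign.
QUERY = """
SELECT COUNT(DISTINCT s.person_id) AS signups,
       COUNT(p.person_id)          AS posts_in_first_30_days
FROM pre_member_activities AS c
JOIN member_activities AS s
  ON s.person_id = c.person_id
 AND s.activity = 'signed_up'
LEFT JOIN member_activities AS p
  ON p.person_id = s.person_id
 AND p.activity = 'posted'
 AND julianday(p.occurred_at) - julianday(s.occurred_at) BETWEEN 0 AND 30
WHERE c.activity = 'campaign_click'
  AND c.detail = ?
"""


def build_streams():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO pre_member_activities VALUES (?, ?, ?, ?)", PRE_MEMBER)
    conn.executemany("INSERT INTO member_activities VALUES (?, ?, ?, ?)", MEMBER)
    return conn


def campaign_impact(conn, campaign):
    """Return (signups, posts made within 30 days of signup) for one campaign."""
    return conn.execute(QUERY, (campaign,)).fetchone()


if __name__ == "__main__":
    print(campaign_impact(build_streams(), "winter_promo"))
```

Because every stream shares the same shape, new questions mostly become new filters and joins on the same few tables rather than new pipelines back to the raw sources.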

Once you have these foundational layers of internal staff to be able to handle the data projects, somewhere to store the data and be able to load source data into the data warehouse, and a self-service layer for business users to be able to ask and answer their own questions, how do you go about identifying and prioritizing areas of work for data engineers in particular, but other data professionals as well, to be able to understand how to have the most impact on the business so that your time is well spent and that you're not chasing down a project that's interesting but isn't necessarily going to add any real value to the business?

That links in really well to the other part of our program.

So we offer this fundamentals training, which is a 12-week program, which helps you build foundations from scratch.

Once you have these foundations, that's a really cool place to be, because that's when you can really move fast and execute quickly.

So the second thing which we offer is this mastermind, which we really bundle together.

And are you familiar with the concept of a mastermind?

Yeah. It's definitely a useful concept. And for people who aren't familiar, it's the idea that you have a group of people who are experts in their respective fields, but not necessarily within the same field, who gather together on a periodic basis to ask questions of each other and get answers from their peer group so that they're able to teach and learn from each other and be able to accelerate their ability to execute on their vision, rather than a more mentor-oriented approach where you have somebody who is a few steps ahead of you in a given area, and so you're learning from them, but it's not as much of a reciprocal exchange.

And the only reason I asked is I wasn't familiar with the concept of masterminds when I lived in America. I just hadn't heard of them. And I've been in Bali, which is another story, but when COVID hit, I was living in Shanghai.

I came to Bali, and I got stuck over here. Very quickly, China got locked down, so I was stuck in Bali, and there are a lot worse places to get stuck. But what I did is I joined this mastermind, and in 2020, that was the best decision that I made. It just accelerated my professional as well as my personal life. And this mastermind had nothing to do with data. It was just a general-purpose mastermind.

So so much so that I really started paying attention to, you know, masterminds, what they are, and, you know, how they use them. You know, it turns out that this is not a new concept. This has been done for the last 75

years.

Now most famously, Tony Robbins runs 1. She charges half a $1, 000, 000 for access to his and, you know, he's got Fortune 50 CEOs in there.

But masterminds are amazing tools for accelerating your timelines towards your business goals or your personal goals.

And as you said, it's this idea when you bring in a diverse group of people.

Our mastermind,

it's a business data mastermind. So, you know, we bring in data leaders,

data engineers,

data scientists

into a potent container

with the idea that if we bring in the right people,

now the group collective

is way smarter, way more balanced, and super beneficial to everyone inside the mastermind.

And going back to your question of, you know, at a strategic level, now that you have the fundamentals, what do you focus on? How do you prioritize?

This is where the mastermind really helps, you know. This idea that

being surrounded with this peer group, which is super diverse and well balanced,

it allows you

to strategize,

brainstorm,

find new perspectives,

very often

learn what not to do,

you know. If you wanna get to a certain level, this is what worked for us. This is what didn't work for us, and that's really where the medicine lies.

So our fundamentals program combined with this mastermind

is super potent: once these companies have these foundations, it's then

sort of what do they focus on, what do they prioritize, what to do, what not to do. And that is just, you know, the sort of fastest way of hitting your business goals and of being able to accelerate even your personal life. Like, what I found is that these people

in my mastermind are now just, you know, very, very dear friends of mine. A lot of us started to work together, so it's really helped 5x Data.

You know, what I find really interesting is,

especially

in, you know, New York and San Francisco and some of the biggest cities in America,

community is often

a word which is thrown around a lot. And at WeWork, we called ourselves a community company, and we acquired Meetup, so I have a lot of context around there. You know, the 2 areas

of community and these Meetups and open source, which I feel could be more potent,

which masterminds really do well at, is number 1 is this area of

consistency. Right?

In a mastermind group, in our group, you're meeting weekly for a period of 12 weeks.

Anything you do consistently

is when, you know, you start to see exponential results, and that's something

which a lot of communities and meetups

don't do a very good job in. And the second area is accountability,

which is a huge huge piece in any sort of personal or professional development, which is again lacking inside

our traditional

sort of definitions of community,

which masterminds do a really good job in. So, you know, the consistency and accountability

combined with

the

brainstorming and new perspectives and this peer group, which you're introduced to, is just, you know, an ultra important combination and really takes things to the next level. So I think

we are probably the first

company which has focused on

a mastermind which is purely focused on the data space.

And just for our customers so far and just what we're seeing in the market, this is something people are super, super, super excited about. So we are gonna be doing a lot more of that along with our fundamental programs.

It's definitely an interesting approach. And in a lot of ways, this podcast has become my own sort of personal mastermind group where every week I get to speak with professionals and leaders in the industry who are, you know, building the tools that I'm using or who have been using different

combinations of technologies that I can learn from to be able to understand how best to apply it to my own work. And so I could definitely

see the value that is available for just being able to ask a question of somebody

and be able to get immediate feedback rather than having to resort to some of these

sort of longer cycles of asking questions on Stack Overflow or hunting down the Slack group for something and then hoping that somebody with enough context can respond to answer your question in a satisfactory manner.

I think that makes a lot of sense. Right? And what really comes to life for me over here is

as you get immersed in these conversations,

many of them might not be directly relevant to you at this point. Right? Like, a lot of this stuff could be, you know, a few people talking about concepts which

might not be super relevant at the point you're at.

But listening into these conversations and listening into different people's perspectives

really strengthens

your overall understanding

of a topic.

So when these conversations do become relevant or when you start getting into pros and cons and really the more tactical stuff of there are 100 different ways to go do this. How should we approach it? That's when this overall

understanding

of topics and these new perspectives

are really shining. And I bet just with your experience and, you know, all these amazing people, data scientists and engineers and leaders that you have been speaking with, you're probably a very, very good person to go to if someone wants to get started with data. And I think a lot of your success

sort of comes from this fact that you now probably have, number 1, obviously, an extremely potent network,

but also, number 2, so much contextual knowledge of what are the problems out there and what are the pros and cons

of these different approaches

in solving these problems,

which allows you to pick the best tools

for the problem in the moment. Yeah. It's definitely been a great experience and one that I definitely wouldn't have anywhere near the amount of understanding of what tools to apply when if it was just acquired through

trial and error and working on the projects that come up in my day to day because I've got an exposure to

a much broader range of problem domains than I would in any single

occupation

or job role unless I was maybe switching jobs every day of the week and cycling back to a few of them

periodically. Yeah. For sure. I 100% agree. That makes sense.

For any companies that you're working with who are

maybe further along in the journey of building out data capacity,

and they have hit the point where they're stalling and they're not able to make meaningful forward progress because they're spending so much time trying to pay down technical debt from decisions they've made early on.

What are some of the

common mistakes that you see

them having made that are avoidable for other businesses who are starting out or some of the lessons that you've learned from those organizations that have hit that wall that you've applied to the way that you structured your

lessons in 5x Data?

So I think

with companies that have

already some traction or

or a lot of traction, you know, they already have data teams and they're answering questions and

all of this fun stuff,

what we find is that

they are probably doing it

by trial and error.

That's how most people learn. Like, to be very honest, that's how

I learned what to do, what not to do. Very often, it's what not to do. And what 5x Data's program is really based on is, you know, my experience in the last 10 years trying to get this right. So, you know, very often if companies have already

sort of

got started on this journey and they already have

maybe BI tools and ingestion and doing things in a certain way,

it's still super valuable to go do

a program like ours

for a few reasons. Right? Number 1 is we're structured

very roughly into 12 modules, and each of those modules focuses on a core competency.

So even if you're doing

BI really well and, you know, and you have your hand on ingestion,

optimizing

any of these areas, even if 2 or 3 of these modules are relevant

to you, then improving your efficiency by 15, 20%

is extremely valuable. Right? You have a data team of 10 people.

Improving your efficiency by 15% is an extra hire and a half. That's super, super valuable, especially when you start to think about what an average data hire costs, you know, $150,000

versus the cost of a program like ours, which is $15,000.
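
The hire-and-a-half arithmetic above can be sketched as a quick back-of-the-envelope calculation. This is only an illustration under stated assumptions: the function name `efficiency_payoff` and its parameters are hypothetical, the cost per hire and the program cost are assumed to cover the same time period, and an efficiency gain is assumed to convert linearly into hire-equivalents, which is the framing used in the conversation rather than a measured result.

```python
def efficiency_payoff(team_size, gain_percent, cost_per_hire, program_cost):
    """Back-of-the-envelope value of an efficiency gain (hypothetical helper).

    Assumes the gain converts linearly into hire-equivalents, and that
    cost_per_hire and program_cost cover the same time period.
    """
    extra_hires = team_size * gain_percent / 100   # e.g. 10 people * 15% = 1.5
    value = extra_hires * cost_per_hire            # dollar value of that capacity
    return extra_hires, value, value / program_cost


# Figures quoted in the conversation: 10 people, 15%, $150,000, $15,000.
extra_hires, value, ratio = efficiency_payoff(10, 15, 150_000, 15_000)
print(f"{extra_hires} extra hires, worth ${value:,.0f}, {ratio:.0f}x the program cost")
```

With the quoted numbers, this works out to 1.5 hire-equivalents, or roughly $225,000 of capacity against a $15,000 program.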

You know, even if you get 10, 15% efficiencies

from doing something like this, and you learn about how to model your data or you learn about some of the successful data models which other companies have used and you implement one of those to structure your data, again, there's no need to reinvent the wheel,

then there's a lot of value which just comes from

optimizing a few things. And what we see is that even with teams which are super advanced,

the industry is changing so so quickly.

And, you know, tools which were relevant 2 years ago very quickly become irrelevant.

So I spend a lot of my time constantly in conversation with, you know, senior data leaders or data engineers or data scientists figuring out, hey, what's working at this company? What are some of the pros of this approach? And with these pros, what are some of the cons?

Just to make sure that

we are a step ahead of what's happening out there, and we can change our programs

to better reflect the current state of the industry.

And I think the fundamental training and the mastermind are really, you know, the first few steps in the puzzle. What we would love to get deeper in is the next layer of programs.

You know, building data teams is something I'm super passionate about, especially with my experience in WeWork.

How do you organize data teams? What part of the business should they report in? How do they work with software engineers? How do they work with stakeholders

who are consumers of data? You know, that's a whole program which we're super excited about building. And then other programs like how do you build data products, you know. All these companies are starting to collect some really, really cool information,

But how do you take this information and then go build products around it with this concept of, you know, respecting privacy, which is where we're heading to. So this concept of privacy by design,

but still being able to leverage data to go build data products that your customers love and absolutely adore. So these are, you know, other programs which we'll start getting into later on this year, which will become the natural follow ups to our fundamental and mastermind programs.

And as you have started down this journey of building out the business and the team at 5x Data and starting to work with companies to help them level up their internal data capacity,

what are some of the most interesting or unexpected or challenging lessons that you've learned in that process?

What we've learned with companies across a few industries

it's actually

kind of a sort of beautiful insight is

ultimately

every company

wants to leverage data. Every company has seen the value in it. I think "data is the new oil," which is a famous article The Economist wrote a few years ago, is something people really relate to, and

everyone wants to be

leveraging data to take things to the next level. So what we've seen, which is really beautiful, is that companies want to do this, and very often they have no idea how to go do this.

So it doesn't really matter what industry they're in. What's been really cool to see is a lot of companies want to go do this.

What's been challenging is

this is really complicated stuff.

Am I doing it right? Like, how should I do this? Or we don't have the resources

to go do this. And a lot of that has been true. Right? Like, data hires are expensive, and getting it right is not as trivial as it seems. Just like in marketing, you know, it's no longer enough to go have a few posts on Instagram to do marketing. You know, there's a whole iceberg of things which happen underneath the surface.

The same thing really applies in data. So I think the challenging thing is really the mindset.

We love what you say you're gonna do.

Are we actually going to be able to get there? We're less technical than you think we are.

I think that's been the biggest obstacle so far.

It just allows us to focus on our program and, you know, making it accessible for

as many people

as possible. Our goal this year

is to help 500 companies

either through our mastermind or our foundations.

So, you know,

we've been built in a way where, you know, my purpose and the purpose of 5x Data is to serve as many companies as possible. And I'm a big believer

that, you know, we're living right now in this golden opportunity

where

if you leverage

data correctly,

you can use it for exponential

growth. And I believe that, you know, at some point in the next, I'm not sure if it's 2 years or 3 years or 5 years, with advancements in the tools and the ecosystem, companies will start to get more and more insights for free and more analytics.

So it's gonna start to level the playing field.

But right now, there exists a sort of golden opportunity, which if you do this,

then you can leverage data as a competitive advantage and you can grow faster.

So, you know, our purpose is to help 500 companies this year.

Being on shows like this really allows us to talk about how

we have built this program to make this as consumable and as

simple as possible.

And if you're on the fence about investing in data and you have no idea how to go do it,

then we can help educate you around that mindset and then give you all the resources you need to go make it happen.

As you continue to work with these businesses

and try to stay up to date with what's happening in the industry, what are some of the particular trends or specific technologies or groupings of technologies that you're keeping a close watch on for your own uses or for being able to leverage in these programs that you're offering to businesses?

The biggest I keep going back to this. I think

what

was, in my opinion, the biggest thing which happened to data was this concept of warehouses. Right? And sort of what Snowflake did recently where, you know, the idea of being able to separate out storage from compute.

And with storage becoming so cheap,

the idea that you can put in a lot of data

inside your storage layers,

and you can run compute jobs

separately.

And for those of you who are not familiar with Snowflake,

I think

why they found so much success is, you know, instead of having dedicated

hardware, which tools like Redshift, Vertica, all of these other data warehouses had, what Snowflake allows you to do is spin up resources on the fly, which makes it really, really affordable

to then use a warehouse

and sort of leverage data, and then the BI tools are a layer on top of that. So I think with Snowflake now IPO-ing and sort of getting all this sort of traction that it has got, it's gonna make the data warehouse architectural layer even

more appealing.

So there's gonna be a lot of advancements around

the layer on top of that, which is your BI, which is analytical tools, data science-y stuff

on top of this, and that's an area which we're sort of very closely following. I think Looker does a really good job on data discovery and reporting.

We're super interested, and I'm personally

sort of looking out for stuff around

data modeling

and around really surfacing

these insights

back into the business and making that process a lot easier.

Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as a final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today.

You know, there's a lot of great tooling out there. Obviously, there is room for improving what we have in all areas of it, ingestion, modeling, analytics,

machine learning, all of this stuff. But at this point, there's no need to go reinvent the wheel. Right? A lot of the tools which exist out there are

doing an extremely good job at it. We really help stitch all of that together to make it as easy and as consumable

as possible.

So, you know, I think the one thing I would focus on really is there's no need to reinvent the wheel yourself. Continue using

the awesome tools which work really, really well together.

And sort of focusing on that

is much better ROI than sort of trying to optimize

for a few things these tools might not be doing as well.

Well, thank you very much for taking the time today to join me and share the work that you're doing with 5x Data. It's definitely a useful pursuit to help more businesses

understand how to

implement their data stacks and the technical talent that they need to be able to be successful and become data driven and improve their efficiency and their ability to serve their customers. So thank you for all the time and energy you're putting into that, and I hope you have a good rest of your day. Awesome, man. Thank you so much for having me on the show.

I really appreciate

your time, and I look forward to, you know, helping our businesses and entrepreneurs

really take things to the next level with data.

Thank you for listening. Don't forget to check out our other show, Podcast.__init__ at pythonpodcast.com

to learn about the Python language, its community, and the innovative ways it is being used.

And visit the site at dataengineeringpodcast.com

to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com

with your story. And to help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
