Summary
Building a well-rounded and effective data team is an iterative process, and the first hire can set the stage for future success or failure. Trupti Natu has been the first data hire multiple times and has gone through the process of building teams across different stages of growth. In this episode she shares her thoughts and insights on how to be intentional about establishing your own data team.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show!
- Atlan is the metadata hub for your data ecosystem. Instead of locking all of that information into a new silo, unleash its transformative potential with Atlan’s active metadata capabilities. Push information about data freshness and quality to your business intelligence, automatically scale up and down your warehouse based on usage patterns, and let the bots answer those questions in Slack so that the humans can focus on delivering real value. Go to dataengineeringpodcast.com/atlan today to learn more about how you can take advantage of active metadata and escape the chaos.
- Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days or even weeks. By the time errors have made their way into production, it’s often too late and damage is done. Datafold built automated regression testing to help data and analytics engineers deal with data quality in their pull requests. Datafold shows how a change in SQL code affects your data, both on a statistical level and down to individual rows and values before it gets merged to production. No more shipping and praying, you can now know exactly what will change in your database! Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Visit dataengineeringpodcast.com/datafold today to book a demo with Datafold.
- Unstruk is the DataOps platform for your unstructured data. The options for ingesting, organizing, and curating unstructured files are complex, expensive, and bespoke. Unstruk Data is changing that equation with their platform approach to manage your unstructured assets. Built to handle all of your real-world data, from videos and images, to 3d point clouds and geospatial records, to industry specific file formats, Unstruk streamlines your workflow by converting human hours into machine minutes, and automatically alerting you to insights found in your dark data. Unstruk handles data versioning, lineage tracking, duplicate detection, consistency validation, as well as enrichment through sources including machine learning models, 3rd party data, and web APIs. Go to dataengineeringpodcast.com/unstruk today to transform your messy collection of unstructured data files into actionable assets that power your business.
- Your host is Tobias Macey and today I’m interviewing Trupti Natu about strategies for building your team, from the first data hire to post-acquisition
Interview
- Introduction
- How did you get involved in the area of FinTech & Data Science (management)?
- How would you describe your overall career trajectory in data?
- Can you describe what your experience has been as a data professional at different stages of company growth?
- What are the traits that you look for in a first or second data hire at an organization?
- What are useful metrics for success to help gauge the effectiveness of hires at this early stage of data capabilities?
- What are the broad goals and projects that early data hires should be focused on?
- What are the indicators that you look for to determine when to scale the team?
- As you are building a team of data professionals, what are the organizational topologies that you have found most effective? (e.g. centralized vs. embedded data pros, etc.)
- What are the recruiting and screening/interviewing techniques that you have found most helpful given the relative scarcity of experienced data practitioners?
- What are the organizational and technical structures that are helpful to establish early in the organization’s data journey to reduce the onboarding time for new hires?
- Your background has primarily been in FinTech. How does the business domain influence the types of background and domain expertise that you look for?
- You recently went through an acquisition at the startup you were with. Can you describe the data-related projects that were required during the merger?
- What are the impedance mismatches that you have had to resolve in your data systems, moving from a fast-moving startup into a larger, more established organization?
- Being a FinTech company, what are some of the categories of regulatory considerations that you had to deal with during the integration process?
- What are the most interesting, unexpected, or challenging lessons that you have learned along your career journey?
- What are some of the pieces of advice that you wished you knew at the beginning of your career, and that you would like to share with others in that situation?
Contact Info
- @truptinatu on Twitter
- Trupti is hiring for multiple product data science roles. Feel free to DM her on Twitter or LinkedIn to find out more
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
- Thank you for listening! Don’t forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers
Links
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. When you're ready to build your next pipeline or want to test out the projects you hear about on the show, you'll need somewhere to deploy it. So check out our friends at Linode. With their new managed database service, you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes with automated backups, 40 gigabit connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don't forget to thank them for their continued support of this show.
Your host is Tobias Macey, and today I'm interviewing Trupti Natu about strategies for building your team, from the first data hire to post acquisition. So, Trupti, can you start by introducing yourself?
[00:01:02] Unknown:
Yeah. Hi. I'm Trupti. I work at Block right now, and I've worked in FinTech my entire career. And I lead our merchant product data science team. And do you remember how you first got started working in the area of data? My background is in engineering, undergrad in computer science. And I did my master's in information systems management. And there, the course itself was about marrying information systems with business, presenting data-driven insights as a bridge between business and tech. And so that's where I got deep into just learning data and using data to tell a story, using data to derive insights and help the business make decisions.
And ever since, I've been in data.
[00:01:52] Unknown:
And so in terms of your experience working in the data ecosystem, can you just give a bit of an overall picture of the trajectory that you've taken in terms of the team sizes and organization scales that you've worked in, and some of the ways that that has influenced the way you think about the work that you do and how to interact with the organization as a data professional?
[00:02:14] Unknown:
Early on in my career, I was at a boutique management consulting firm. And so it was heavily data driven, putting things in frameworks to attack the problem and come up with a solution. From then on to various big organizations, in founding team member roles within a portfolio, for example, Uber Eats, or, like, Amazon in their gift cards domain. It was more like applying those problem solving skills, but using them to then build from scratch. And there was a stark contrast, where I quickly learned that I thrive more in the 0 to 1 space. It's harder, but it's also more gratifying when you see the results and you can say, I put my stamp on this, you know, end to end, the details of it.
[00:03:15] Unknown:
In terms of the 0 to 1 space that you said you have had the most fun with, and which is honestly probably what a large number of people listening are dealing with, because, you know, data engineering and data science are still relatively new professions and still growing into the mature phase of their life cycle. I'm curious if you can talk to some of the traits that you have found most useful in yourself working in that space, or as somebody who works with others at that stage or is hiring for, you know, the 0 to 1 or 1 to 2 stage, the types of capabilities that are most valuable to actually be successful as one of the first or second data hires in an organization?
[00:04:04] Unknown:
Yeah. That's a great question. What I personally realized is, in these situations, as the first or second hire, or just in the 0 to 1 space, the first of anything, building something, you have to fall in love with the problem. It really has to come from somewhere where you are obsessed with solving it. Because it's not necessarily limited to data and all the skill sets that you have learned in school or in your previous job. Sometimes you have to stretch beyond just, like, understanding how the data is inputted and knowing the finest SQL, which is tight and, like, efficient.
That's not the end goal. It's fine if your SQL is a little inefficient. But are you getting the job done by stretching the limits? One good example would be right now. So I joined Afterpay last year, which is now Block. Being a smaller organization, a lot of the data didn't really make it to the warehouse. A lot of the nuances of the data, or the verbose data, was still in logs. Sumo Logic is the tool we use. Engineers, first of all, write to the logs. So if it's not even written to the logs, you can't read it. And so they are more equipped to actually read it and, like, make sense out of it. But that's something my team does. I did that too because, you know, you just have to cut to the chase and, like, get to the end. So those might not be the familiar tools that you're used to, whether it's AWS, your cloud, your systems.
So that's what I mean by having the mindset to go beyond. And, you know, sometimes it could just be an Excel spreadsheet or your conventional tools, or, like I said, other tools that other job functions might be familiar with, but you just stretch and you get to the end. Yeah. One of the kind of phrases that I've always leaned on to characterize that stage is, you know, make it work, make it right, and then make it fast. Right. Yeah. That's a good way to put it. Just solve it, basically. What you're saying is, like, because no one has the time or the resources to teach you anything, you just solve it from what you have and put trust in your work product. And then you learn something along the way as well, but also get opportunities and build on it. And then you get to know the system beyond just your narrow focus, you know it in its entirety, which is immensely valuable, actually.
[00:06:36] Unknown:
And as somebody who is the 1st professional in a given problem domain within an organization or somebody who is managing somebody who is in that role of, you know, being the 1st data professional, whether it's the 1st data engineer, data scientist, etcetera. What are some of the useful metrics or kind of objectives to be able to measure against to determine whether or not somebody is being successful in the role or some of the ways that you can set those expectations in a way that's useful to let that person gauge their own progress on that journey to being successful?
[00:07:12] Unknown:
This is maybe a relatively new term, but I like to tell data professionals that you're the product manager of your own work. And I've played the PM role in the past as well. Like, you know, the smaller you are, the more you're the first person, the more roles you're playing. And I kinda like that because you're managing your own road map, so to speak. And I feel like, especially in the smaller space, if you're starting off at either a startup or taking on a big scope beyond what we just discussed, play the role of the product manager of your work product.
And what I mean by that is just take control of the narrative and, like, also the road map: what you're delivering, when people should expect it, what the underpinnings of it are. Instill trust in where it's coming from, and drive it backwards from the solution that you want to where you're getting at. And what tends to happen without that is you might get caught up in the rut of just, like, pulling data. And those two words, any data professional would hate those words. And so if you wanna take control of the narrative, then you wanna get away from those. Maybe make an 80/20 rule, so it's not a no-no. I will do that sometimes to, you know, unblock people.
But what I wanna focus on is being more solution driven. Like, what are we solving at the end of the day? And these are the data insights that I have that will help us move forward. And finding those opportunities in the business helps with the prioritization, keeping the end business goal in mind. Yeah. And in terms of the kind of steps that you
[00:08:57] Unknown:
might decide are acceptable shortcuts that, you know, if you're keeping your purist hat on, are absolutely, you know, verboten. It's, you know, oh, well, we need to be able to build a report on the number of transactions that we've done in this system. If you're doing it the right way, you're going to pull that data out, put it into an analytical store, and then do the analysis. But I need this done yesterday, so I'll just write a SQL query against the production database and, you know, just make sure that I write it in a way that isn't gonna take the system down, or that it's going against a read replica. You know, it's definitely a situation that I've found myself in. Not always happy about it, but it gets the job done, and it gives you the space to be able to say, okay, this works. Now I'm going to work on making it right. You know, I'm going to build up that extra infrastructure and the tooling and the workflows to be able to do this in a way that is more proper from an architectural and best practices standpoint.
And then, you know, once that's working, I can actually iterate and kind of build up that flywheel of capability.
[00:09:53] Unknown:
Yeah. You're absolutely right. So I think sometimes you just have to get the job done, like what you just mentioned. It's like, cut to the chase and find the solution. And then, if it's repeating, find avenues to maybe automate. How can it be done faster? And how can the audience just get to the solution by themselves? So it cuts you out as the middle person relaying that, and it also frees up the data person's bandwidth to work on other, more impactful things. But getting back to what should be the OKRs, if I understand the essence of it. And I think it differs. Right? Especially in a smaller environment, at the end of the day, everyone's job is to make sure the business is moving forward and you're providing valuable insights. So it's slightly different.
In a medium-sized to more established organization, you probably wanna look at the function, assuming that you have different functions and you're not full stack. In a smaller organization, you might be your own data engineer and also a business intelligence person and also a data scientist, machine learning, what have you. All of the full stack. Right? But assuming you're in one job function, based on where you are in the stack, the people above and below you in the stack are your stakeholders, and you wanna make sure they are happy. I think your OKRs have to derive from that. Like, in data engineering, you wanna make sure your tables are reliable. You know exactly when the batching is happening, when the downtime is. It's availability, basically, and governance and quality.
Because a lot of people are not gonna do that for you. They're just gonna rely on whatever is in the database. So don't make them do QC for you, for example. And so on and so forth. Because everyone has that footing in a way. Right? They have stakeholders on either side that they have to manage. So I feel like that could be one way to look at your work product and measure yourself, apart from other things like, you know, efficiency and time needed and so on and so forth. One of the other interesting things
[00:12:03] Unknown:
is that as far as being the first hire in an organization for, you know, data specifically, since that's what we're talking about, is that you're also kind of setting expectations for the capabilities that the organization will grow into. If an organization doesn't have anybody to produce analytical reports, most people are just gonna be fighting with Excel and doing the best that they can and getting conflicting results and, you know, dealing with issues of versioning and, you know, no data quality to speak of. And so as that first data hire, you can kind of set that baseline of these are some of the things that we can do right now because, you know, when somebody does hire a data professional, a lot of times, they're going to say, oh, I've got a data professional, so now we can do everything. We can, you know, do what Netflix does because we have somebody who does data. And so part of the responsibility too is being able to set the kind of appropriate expectations for the organization to say, these are the things that we can do right now. If you want to do, you know, a, b, and c, then these are the prerequisites to get there. And I'm wondering what you have seen as some of the useful ways to kind of convey that understanding and set those expectations and how you are able to work with the organization to kind of priority rank what your capabilities are at a given stage of sophistication.
[00:13:23] Unknown:
Yeah. That's also another good one. I think I would categorize it by who needs to understand that first, and whether it's a technical audience. What I mean by that is, if it's a heavy tech company, and I have seen both sides of it, they might themselves understand and know, oh, I know we don't have half of these things, and some of the data is being manually fed, so we will take that variance in the results. And so that's, like, your easiest stakeholder or audience to manage. Hopefully, if you're reporting into, say, your CTO, or you have a technical cofounder, that is probably the easiest route because they understand the challenges of gathering, cleansing, all of that we talked about. But on the other hand, if it's a more nontechnical audience that is expecting something of you and doesn't understand the underlying challenges, why things can have a variance, or why there are dupes, or what have you, then I would say explain it to them to the best of your ability, like, what goes on in the pipeline and the food chain that we talked about.
And that may or may not be well received or, like, easily digestible. And so just show them with an example. I think that's what works best, and this I've learned probably the hard way. Literally, let's take an example. You have some dashboard, and it shows two of the same items there. And people are now displaying this dashboard and they're like, oh, there are dupes. And you can then explain, saying, yes, this is exactly what I was talking about. There are dupes because that's how it's fed, because this is a manual input source, or whatever the reason might be. And that might be a more empathetic way for them to understand what you are dealing with, because they might not know the technical challenges that go into just putting a dashboard together.
So bring your audience along. And based on their background and expertise, tell them in the language they will understand. And sometimes just show the error. And I know we are all perfectionists and we wanna only provide the best outcome and work product out there. So this is the only scenario where I would advise: show it as it is. It helps make your case stronger. As long as you know exactly why it's happening, that will speak volumes. So, yes, you're right. Hopefully, that answers the question. But it's kind of like, yes, just because you have a data person doesn't mean you can boil the ocean. You have to be very strict with your stack ranking. And that's where, again, playing that PM, product manager, role is immensely valuable. It's like, you told me 10 things. What is the ranking on those? Because not all 10 can be important. If everything's important, nothing's important. And so it forces them to think about what is really crucial, and then you can give them a high level sizing: this can be done faster, this is a tremendous amount of work, and so on. And then negotiate: I can get you this faster with some high level assumptions. It might not be accurate, but what do you wanna use it for? So go back and forth with that negotiation. Not only will you instill empathy, so next time they know exactly what your role is and what you have to go through to even give one slide or one Excel spreadsheet, but they'll also know that you can work with them, and there's a negotiation happening. And that forces them to stack rank.
[00:17:01] Unknown:
And then once you have gotten to the point where you have a data hire, they're onboarded, they're, you know, making progress on their objectives. They've helped to educate the organization on what their capabilities actually are and what you can actually achieve in a reasonable amount of time given the resources that you have. What are the indicators that say this is the right time to add another data hire, or to what degree you should scale the team? And as you do move from, you know, one or two people on a data team to a midsized team, which, depending on the scale of the organization, can have very different meanings, you know, what are some of the ways that you can look for that signal to say, okay, now is the time to start scaling? And as I scale, these are the capabilities and backgrounds that I want to look for in those new hires.
[00:17:52] Unknown:
Yeah. So first of all, I think your first person should be that mission driven person, someone who can put multiple hats on and just gets the job done. Right? We already talked about that. After that, having that foundation, by that time, I think you kind of know what you're dealing with in terms of your work product, your organization scale, the richness of the data, the quality of the data, and the tools and so on. And then volume too. So I would say the second person typically, and there are obviously exceptions to every rule, shouldn't be, like, ML or a highly sophisticated, narrow skill set, because you might not have enough training data to even build models, and then they won't be 100% occupied. Or maybe they're used to having a clean dataset and working with that. So be very cautious of that. Your second hire, or, like, from then on, should probably be bandwidth well invested in infrastructure. So data engineering, making sure you have some level of taking the Kafka logs, the raw, even Sumo logs or whatever log that you have.
Working with your engineering team and however rich that infrastructure is, to make sure everything is now getting into the warehouse and flowing through that. Someone who has built and worked with cloud infrastructure, and then picking the right cloud infrastructure. And that can be very expensive too. And most companies, assuming it's a start up or a smaller company, are not gonna build their own clouds. Rarely have I seen it — I mean, obviously, Amazon, but Uber was the only exception I've seen that has their own cloud and data infrastructure.
It can be very expensive. So someone who has worked with these cloud infrastructures, picking the right tools and vendors and setting that up so that the analysis can now be done faster, but also more reliably. And now you're starting to build that pipeline. Initially, for the first eight people, I would say, try to make sure they can also wear multiple hats. But now you can start over-indexing on one skill versus the other. And if they have infrastructure capabilities, that will help you a lot.
[00:20:12] Unknown:
Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days or even weeks. By the time errors have made their way into production, it's often too late and the damage is done. Datafold built automated regression testing to help data and analytics engineers deal with data quality in their pull requests. Datafold shows how a change in SQL code affects your data, both on a statistical level and down to individual rows and values before it gets merged to production. No more shipping and praying. You can now know exactly what will change in your database.
Datafold integrates with all major data warehouses as well as frameworks such as Airflow and dbt and seamlessly plugs into CI workflows. Visit dataengineeringpodcast.com/datafold today to book a demo with Datafold. As you are hiring and starting to grow the team beyond just the first one or two people, there are definitely certain domains that have a certain degree of requirement to understand the specifics of that problem space to be able to be effective working with the associated data. I know that your background is largely in FinTech, and I'm wondering if you could just talk to some of the types of understanding of that problem space that are most necessary to actually be able to work with that data, understand the ramifications of different types of transformations or some of the restrictions at a regulatory or organizational level that you need to keep in mind as far as, you know, what data to propagate where, what data to obfuscate at different stages of the life cycle, and just some of the ways that the specific industry or problem domain that you're working with needs to be screened for in that hiring process to make sure that that data professional is going to be successful in that organization?
[00:22:05] Unknown:
That's a great question. So domain specific hiring, or rather skill set, is also important. In FinTech, what you're dealing with, especially when it comes to data, is customer data. And that typically includes payment information, their whole card information and ZIP code. These are considered PII fields, which is personally identifiable information, beyond other things. And there are laws around that: how you store them, how you encrypt them, and even access controls. Your organization might have people all the way from operations, sales, and marketing to engineers and product and data people. And even within that, there are risk professionals, and then marketing analytics, and then you have BI and whatnot. So every role needs to only be granted access to what they need, and not an overarching blanket read/write access, so to speak, because you're literally playing with sensitive information.
And there are laws around that. You take trainings regularly to make sure you're compliant with that and so on. So that's just the basic difference between storing, say, social media data, where you may not have real names and it's literally words, say tweets or whatnot, versus here, where you have transactions and money and bank information. Like, you know, if you save credit card information — a lot of websites, especially retail, will have, like, save my card information for next time, easy checkout and stuff. Right? So you might be saving that as well, and this is extremely sensitive.
And if there are any breaches or hacks, you have to report that, especially as a public company, but I believe also as a financial institution. So just having the knowledge that all this is important, what you are storing, and what the ramifications of a hack or leak are comes part and parcel of it. As you are
[00:24:18] Unknown:
recruiting for and screening people who are going to work in one of these domains that has that extra requirement of understanding the details of the data that you're working with, and the sort of regulations or the statistical requirements for being able to report effectively. In the event that you are talking to somebody who doesn't necessarily have that existing background, what are some of the ways that you can help them acquire that understanding post hire, or understand whether or not they will be able to effectively upskill into that arena?
[00:24:53] Unknown:
Yeah. I think they're maybe starting with orientation itself, you wanna, like, from emphasize from the very beginning that what are you dealing with when it comes to data over here. And then just, like, I've seen people, like, you know, just copy, paste, maybe an email thread, sometimes the whole 16 digit number of a credit card. Now that in itself may not be a PII field because there are other elements too, but that is a big no no and not allowed. So you have to, like, encrypt a lot of that. So simple things like that, which especially coming from different fields, they might just think, oh, this is just something that I found in the column called credit card and I copy pasted that. You have to be very careful about what's going in any form of message within the organization, let alone outside the organization.
And therefore, I think storing it might also have to be encrypted. On top of that, if you're in a risk team, and I've worked extensively in risk teams, you're doing manual reviews to understand the reason why a transaction is fraudulent. So you're looking at the profile of the user, and you have way more information than the user has probably knowingly shared with the organization, and you can't use it for any other purposes. So the risk team has the highest data access capabilities, I would say, but also the most sensitivity.
And the training should be geared towards that, because the account could just happen to be your friend or nemesis, and you can't do anything about that. Working at Uber, we had very strict policies: when you get into someone's account, first of all, you have to document why it was necessary. Maybe all you got was a UUID, and you went down the rabbit hole of figuring out why the fraud happened, and it happened to be a celebrity. You still have to report and document everything about why you even went there, or it would be an immediate termination. The consequences could be very high, so you are playing with fire there. But that was an extreme case; not every FinTech needs to do that. But data is important, and it's sensitive, and you can triangulate based on little facts, so just be mindful of that.
[00:27:31] Unknown:
On the other aspect of growing the team and scaling is the ways that you think about the organizational topologies and how you wanna structure the team, where with data professionals in particular, it's challenging because there have been some teams and some organizations that say, we want a centralized team where all of my data professionals live in 1 group. They all work on data across the entire organization, whatever that might mean. Whereas in other scenarios, they say, actually, we want to have maybe a core data platform team, but then we want to have data analysts or data scientists embedded in all of the different business units in the organization. And I'm wondering what you have seen in your own work of how you've approached that topological question and some of the different pressures that might pull or push you in 1 direction or the other?
[00:28:21] Unknown:
Yeah. I've seen both. In the pure model, if you have a chief analytics officer or chief data officer, which is rare, maybe it rolls into a CTO, there's emphasis on best practices and making sure your data skill set is rich and the bar is high. You can learn, because you're in that peer space of everyone working on similar things, so you learn from each other quickly and build up that skill set fast. I personally think it all depends on what you wanna build in your career and how you wanna progress. The other end of it: I've worked a lot in the GM based model, where a business unit has its product, engineers, data scientists, marketing, what have you.
And that's nice, because everyone has the same mission, and you come in with your focus area and concentrate on that. We have that right now, and what we do is all the DS people then meet, more like a hub and spoke model, to make sure we are sharing best practices and maintaining a bar while hiring. So that's more loosely tied, a dotted line, and the mission is very business focused. So it depends on whether it's a Databricks or Snowflake, that kind of pure tech SaaS organization where everyone is supporting the tech underneath it, or it's a Fintech or business organization where the business function drives it more.
And there's no right or wrong answer; I feel like it just depends. If it's a GM model, you have to understand that the path might cut off after a certain level in the DS function, and then your next step might be GM. Are you willing to take on that function? Are you more business focused? Because those are the conversations that will be happening at a tighter level. Versus, do you wanna stay in the technical track, where eventually you could become the chief data officer or CTO, and you wanna be around like minded data folks, enriching that knowledge, but you might be a little away from the business.
I don't think I've seen any downsides of either model. It just depends on what you wanna do with your career and where you see yourself. Like, the 2 axes, right? Do you wanna go deeper into the technology and keep learning newer features and functions? Or do you wanna apply that now, see how the business responds, help the business get better, and probably take on other roles and functions underneath you?
[00:31:20] Unknown:
And then, as far as that actual hiring approach, I'm wondering what are some of the useful strategies that you have found for being able to recruit and screen candidates, particularly given that data professionals are generally in fairly short supply, so there is going to be a lot of competition for their skills. Being considerate of the fact that you're not their only option, but also making sure that you don't just hire somebody because they have data experience. So understanding the nuance of: they might have some of the technical background, but they won't be a good fit because of the specific problem domain we're working in; or they have the domain expertise, but not as much technical expertise as I would want. Just understanding what are the ways to gain the attention of those individuals and then determine whether or not they will be a good fit for what you're looking for?
[00:32:17] Unknown:
Yeah. I think since everyone plays with data to a certain degree, anyone could be a data professional, or no 1 can; it's very vague that way. So I would say hire people for their strengths and what fits the particular use case. Especially, again, when starting smaller, you might wanna just get a go getter, someone who is willing to learn. If they have a basic understanding of how to get the job done and pull the data that needs to be pulled, that might be enough; your bar might be different there in terms of knowing different statistical functions or what have you. A good example is where we leveraged a lot of operations people who came with an understanding of where the fraudulent activity might be, knew the tools needed, and used those plus common sense and an understanding of the space, like a risk first model.
And you don't necessarily have to have a PhD in machine learning for that. There is a time and place for that, for specific functions. If you wanna build a model, now that you probably have a lot of data and you are at the point where you wanna build sophisticated models, keep them retrained and in production with the latest and greatest results and the right accuracy, then you don't just take someone and say, try it out. That needs a very specific skill set, people who have done it in the past. So for anyone who is hiring, and I am at the moment, so definitely putting in that plug: don't start only when you get the headcount or when you think about expanding the team. At any given moment, you might be the first person joining with an intention to grow the team.
So have a decent network and pipeline where you just know people and what their strengths are, so that you can plug them in when the use case comes about, and have that deep relationship. Maintain that relationship as well; that goes a long way. It could be your ex coworkers, your ex teammates, people you have worked with on a peer basis or just collaborated with.
[00:34:48] Unknown:
In the situation where you are hiring somebody who might be junior or mid level from a data professional perspective and you want to give them the opportunity to grow into a more senior role, or somebody who's maybe senior level and you wanna bring them up to the principal level, what are some of the ways that you have found to provide a safe space for learning, where they can make mistakes and fail at different tasks and use those as learning opportunities, so that they don't just stick to the core of what they know and instead feel supported in reaching outside of their comfort zone and growing into those capabilities?
[00:35:27] Unknown:
Yeah. So for that, you need to know your team really well, first of all. Establish that relationship beyond work to understand where they wanna take their career. What do they self identify as their strengths and weaknesses? From day 1, you need to understand those basic aspects and then see them in the work. You might have a different point of view than what they think of as their superpowers and kryptonite, so to speak. Then show that with data: oh, you thought this, but I think you have more to offer than you thought, or vice versa. And then work with them to give them more opportunities in the areas where they're good and don't need any hand holding.
And then more opportunities to strengthen their weaknesses, or coach them through those; that's the point where they might need a little bit more hand holding. I think that's how you grow people holistically and with their own buy in, versus just dumping something top down because you've been given a template of what it takes to go from L4 to L5 and L5 to L6 and so on. That plays a huge role. If they have bought in and they are ambitious and they see that career trajectory growing, half the battle is done; they are going to put in the work. And show them with the data. I like to say that data people are the most notorious for not measuring their own data, whether that be the number of work products, or how many good projects or products came out versus some that needed work, and so on. So do that meticulously, and then show them: this went really well, here is where you could use help, and here is why. Once you agree on that, the plan can be put forward. I would say, initially, when you're growing and you have a young team, young in terms of tenure and level, obviously they wanna grow and grow fast. But you have to set the right expectations about who gets to grow and what the stipulations are around that.
A smaller organization might have looser rules around that. But once you get to maturity, not everyone even wants to move up. They might wanna just deepen their understanding, or go more horizontal, like we talked about: they worked on 1 domain, now they wanna figure out the back end of it, or they worked on consumer and now they wanna work on merchant or platforms. So be open to those opportunities too. And that might mean you lose someone from your team. But think about it like you're building long term relationships by helping them figure out what they wanna do and finding that path for them.
So the answer could be varied. But if you have a continuous dialogue and it's a 2 way street, I don't think anything is difficult. And the same goes for communicating to your higher ups as to who the qualified candidates are and why they are the best. That work goes in not when you're pitching at the promo cycle; that work goes on 6 months prior, setting them up. As a manager, you are actively doing that. So there's upward and downward management in both of those, and you have to balance that really well.
[00:38:56] Unknown:
Unstruk is the DataOps platform for your unstructured data. The options for ingesting, organizing, and curating unstructured files are complex, expensive, and bespoke. Unstruk Data is changing that equation with their platform approach to managing your unstructured assets. Built to handle all of your real world data, from videos and images to 3D point clouds, geospatial records, and industry specific file formats, Unstruk streamlines your workflow by converting human hours into machine minutes and automatically alerting you to insights found in your dark data.
Unstruk handles data versioning, lineage tracking, duplicate detection, and consistency validation, as well as enrichment through sources including machine learning models, 3rd party data, and web APIs. Go to dataengineeringpodcast.com/unstruk today. That's u-n-s-t-r-u-k. And transform your messy collection of unstructured data files into actionable assets that power your business. Another interesting transition that a team might go through is the process of being acquired by another organization and dealing with that merger and integration step. And I know that that's something that you are either going through or just recently went through. So I'm curious if you can talk through some of the ways that influences the patterns and practices of how your team functions, and some of the lessons that you've learned about how data lives in an organization and how it can be merged with another organization effectively, so that people who are either planning for that eventuality or might end up in that situation can factor it into the ways they think about their data architectures?
[00:40:40] Unknown:
Yeah. So this is my recent lived experience; I can definitely talk about that. Expanding on the Fintech aspect, since it was 2 Fintechs coming together, on top of everything else there's heavy FTC regulation, and mergers and acquisitions are super hard, especially in the United States with the latest on that. So our acquisition, I should say, was announced late August or September last year, and there was just a lot in between before it was finalized on February 1st. We did announce 1 product the day of the official merger announcement, February 1st, which my team worked on: all online Square sellers can now accept Afterpay buy now, pay later.
And, obviously, the industry, the analysts, the street welcomed it, and that was the intent. Right? The day of the merger, you're already announcing a joint product. It's not easy to pull off. The only way we could do that was treating Square like we would have any third party and just having integration dialogues and so on. So without getting into the details of it, it was hard. Any lessons from that: pre acquisition, obviously, you can't do much, because you're not 1 company, or you would jeopardize the process. That was not only the message from everyone, especially legal, but I would relay the same thing: don't jeopardize your chances of the acquisition going through. But post acquisition, I would say, 1 of the learnings I wanna pass on is definitely get your data teams involved day 1, as soon as it's possible legally and compliance wise, with all those conditions applied.
Okay. Then, to be honest, the data infrastructure teams should be talking to each other, because day 1, you also wanna give your teams access to both sides of the data. Literally matching in 1 database would be ideal, but if not, have a connector, some way to marry those 2 and look at it holistically, because that's the whole intent. And, yeah, if you have similar systems, that helps. But these are the behind the scenes dialogues that should be happening: how do we do the migration? When do these contracts end? On an interim basis, are there tools? We were using Jupyter notebooks to stitch those 2 together and get past it; again, it goes back to just focusing on the solution.
This was my first time, especially being in a smaller company that got acquired, and there are just so many learnings. But 1 of them is: make sure you make data, data infrastructure, and a smooth transition of that a priority. Because, like you touched upon not too long ago, the requests start coming in. Not even day 1; day minus 1, they'll be like, oh, now we're 1 company, right? So what is the holistic view? What are our common customers? Obviously, everyone wants to know. And these were 2 public companies coming together, to make it even harder.
So the questions keep coming. Every data person knows that the question list is always longer than the answers you can give. But, yeah, just make sure you emphasize to the leaders, and the leaders also understand, that this is important. Making sure the 2 data systems talk to each other is equally as important as the acquisition going through, along with a smooth process, quick ETAs, and some path to the end goal.
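The interim "connector" approach described above, landing extracts from both companies in one scratch database and joining on a normalized key, can be sketched in a few lines. The table names, columns, and sample rows below are invented for illustration; in practice each side's warehouse export would feed this, and the join key would be whatever identifier the 2 orgs can agree on:

```python
import sqlite3

# One shared scratch database standing in for "some way to marry those 2".
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE square_customers (email TEXT, gmv_usd REAL)")
con.execute("CREATE TABLE afterpay_customers (email TEXT, bnpl_orders INTEGER)")
con.executemany("INSERT INTO square_customers VALUES (?, ?)",
                [("a@x.com", 120.0), ("b@y.com", 80.0), ("c@z.com", 45.0)])
con.executemany("INSERT INTO afterpay_customers VALUES (?, ?)",
                [("B@Y.COM", 3), ("c@z.com", 1), ("d@w.com", 5)])

# Normalize the join key before matching; the 2 orgs rarely store
# identifiers the same way (case, whitespace, formatting).
common = con.execute("""
    SELECT lower(trim(s.email)) AS key, s.gmv_usd, a.bnpl_orders
    FROM square_customers s
    JOIN afterpay_customers a
      ON lower(trim(s.email)) = lower(trim(a.email))
""").fetchall()
print(f"common customers: {len(common)}")  # common customers: 2
```

The point is less the tooling than the discipline: a single place where "what are our common customers?" has one answer, even before the real migration lands.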
[00:44:49] Unknown:
As far as that overall process of integrating the teams and the data systems, what are some of the impedance mismatches that you had to deal with? Maybe there were different expectations of how data work is done and how you track it, or mismatches in how the 2 organizations structure the definition of a given metric, where this is how you calculate a customer in the canonical sense for company A, and in company B, we actually do it slightly differently and bring in these other factors. How do you reconcile those different views of the relevant domain objects, or the ways the teams' work is organized, where maybe you're using Jira in 1 place and GitHub issues in the other, and integrate those different ways of working and thinking together?
[00:45:42] Unknown:
That's a whole set of so many things, I think, but you nailed it. It kind of starts with definitions. Initially, most of the meetings would go: what you call this, we call that on our side, so let's get the lingo right. But, obviously, you have to adopt the bigger company's lingo, so learn that quickly. I also feel like the way they cut their data, in terms of sizes, for example: what counts as small, medium, or large could be very different in each organization's internal definition, so to speak. Right?
So adjust to that, or call out the differences quickly. Little things, even: we talked about it offline, but Afterpay was an Australian company, so our road maps all went from July 1st to June 30th. And Block is a US listed company headquartered in San Francisco, so obviously it's January 1st to December 31st; calendar year versus fiscal year. Just those mismatches can change so much. Which Q1 are you talking about, calendar year or fiscal year? Little things like that, all the way to how the data is collected, what it's called, and when you present it. Using the right terminology matters too, because these insights are what get taken into next steps. There's a lot of confusion initially; no 1 can avoid that. But the sooner you get past it, just having a data dictionary and checking the definitions behind the scenes is immensely valuable. Because some estimate, god forbid, taken out of context can have bigger ramifications down the road. But it's unavoidable, because there's no universal truth to data, so to speak. And within every industry too, there will always be an internal company definition.
And the smaller the company, in fact, the more hard coding there is in how it's piped through, as we all know. And that just shows. It shows greatly when these kinds of events happen, but it's a learning opportunity to make it more streamlined and better.
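The calendar versus fiscal year confusion above, Afterpay's July to June year against Block's calendar year, is exactly the kind of definition worth pinning down in that shared data dictionary. A minimal sketch of a quarter-labeling helper follows; the function and the FY naming convention (naming the fiscal year after the calendar year it ends in, the common Australian style) are assumptions for illustration:

```python
from datetime import date

def quarter(d: date, fy_start_month: int = 1) -> str:
    """Label a date with its quarter, given the month the fiscal year starts.

    fy_start_month=7 models a July-June year (Afterpay style); the
    default of 1 is a plain calendar year (Block style).
    """
    # Quarter index relative to the fiscal year start.
    q = ((d.month - fy_start_month) % 12) // 3 + 1
    # A July-June year is named after the calendar year it ends in.
    fy = d.year + (1 if fy_start_month != 1 and d.month >= fy_start_month else 0)
    return f"FY{fy} Q{q}"

print(quarter(date(2021, 7, 15), fy_start_month=7))  # FY2022 Q1
print(quarter(date(2021, 7, 15)))                    # FY2021 Q3
```

The same July date lands in 2 different years and 2 different quarters depending on the convention, which is why "which Q1?" has to be answered explicitly in every report.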
[00:48:15] Unknown:
To your point too about establishing what the things are that we're talking about and what the appropriate definition is, that also factors into questions about how you onboard newcomers to your team, so that when they come in and start encountering internal jargon or references to these business objects and want to understand, okay, what are we even talking about, it's useful to have that business catalog with relevant documentation. You can point somebody to it and say, this is what we mean, these are the assumptions we're making in constructing this definition, and just be very explicit about those views, which are all too often implicit, of what it is you're actually talking about. 100%. I agree with that in theory,
[00:49:03] Unknown:
but I feel like the smaller the company, the more you put documentation and these added extra benefits on the back burner. And there are very few people intrinsically motivated to do that. Luckily, I have someone on my team who's truly great at it, but there are very few; the rest of us have to push ourselves to do it. So, theoretically, I agree that would be ideal. And now I'm seeing the difference with a bigger company having all these things ready: templates, orientation, whole checklists. Smaller companies don't have that; it's mostly word-of-mouth and pinging 100 people to get an answer, and then the same thing happens to the next new person, and so on. But, yeah, if it's in the culture; I've never worked at Stripe, but I hear great things, and the person on my team that I was talking about is ex-Stripe.
They just have documentation so much embedded in their culture,
[00:49:59] Unknown:
something to learn from, and that helps immensely in these kinds of situations. Absolutely. And I will admit to being 1 of the guilty parties, not documenting things as often as I should. We all are, I feel like. Absolutely.
[00:50:13] Unknown:
We have to incentivize that. I feel like the right incentive will motivate it. 1 good thing we had at Uber was a buddy system. If you assign someone as a buddy and they've done it multiple times, they just have their own checklist and things like that. And also some OKRs, like citizenship goals, you know? This could be 1 of those. Not a business metric, but it helps the culture in getting the team ramped up quickly. So, again, it comes down to culture and what leadership emphasizes. Absolutely.
[00:50:49] Unknown:
I've been finding that being in a fully remote environment, which most people either are currently in or have at least gone through in the recent past, helps provide the space and motivation to turn more of that implicit knowledge into something written. It's just that, all too frequently, where it is written is somewhere like Slack or email, where it's easy to get buried and lost. So 1 of the practices that my team is starting to move towards is being very intentional about not keeping those communications in Slack. Any time something turns into a conversation about, oh, how does this work, or why is this this way, or how do we want to approach this problem, if it goes beyond 2 or 3 responses in Slack, then put it into a more durable location for having that conversation. We've been using things like GitHub discussions, but having some sort of internal forum, or, some people have polarized opinions about wikis, but even just putting it into a wiki, gives you that canonical reference to say, this is the description, and these are the conversations we've had around it.
And then it also encourages people to pause and think through and be more deliberate about the ways that they're communicating rather than just the very rapid back and forth that Slack encourages.
[00:52:09] Unknown:
Yeah. No. 100%. I think there are so many tools available. It's just a matter of focusing on it and not putting it off for later. As you're doing it, the knowledge is fresh. So I highly encourage people who are doing it and building things for the first time: just dump it somewhere, and I'm happy to put the finesse and a cherry on top if needed. As long as it's in Confluence, Jira, Git, what have you, even a Notion page, which a lot of startups are using, anything helps. As long as it's on the cloud somewhere the whole company can access, it saves you effort later.
[00:52:47] Unknown:
And as long as there is 1, or at least a very small number, of places it will go, so that you don't have the cognitive burden, every time you want to write something down, of stopping to think, okay, which is the right context to write this in so that somebody else can find it later? Because as soon as you go into that space, the entire incentive to write things down dissipates, and you say, well, it's too hard to even figure out where to put it, so I'll just put it down as a to do and never get to the documentation.
[00:53:18] Unknown:
I think Notion has done a pretty good job at that. Absolutely.
[00:53:22] Unknown:
And so in your experience of working in the data ecosystem and both being a very early hire at different organizations for the data team and helping to grow and scale data teams and also going through the experience of being acquired and integrating with a larger organization, what are some of the most interesting or unexpected or challenging lessons that you've learned?
[00:53:46] Unknown:
I mean, first of all, I learned about myself that I definitely thrive in, like I said, the 0 to 1 space, something new, something that we are building. So I like that building aspect of things, just figuring it out. From my learnings, I figured out that people are motivated by different things, and so figuring out how you hire based on people's tendencies is also important. And perfection is not the goal in this particular environment, when you're the first of anything or trying to do something for the first time as a company.
You have to move fast, so just solving problems and unblocking people or hurdles should be the aim there. The data finesse, or the data tools and efficiencies, takes a little bit of a back seat; the mission takes the more critical seat. In my case, it was obviously in the Fintech space, whether that was risk or gift card growth or a card launch or merchant in the buy now, pay later space. But it has always been something in a space that I loved and wanted to learn and add more skills in. So that's important. Or at least, I would say, that was a learning for me.
I think we already discussed most of these things, and I'm conscious of not repeating myself. But on scaling, again, making sure you're putting the right jigsaw piece in the right space is important. If that's not the case, then go with the mindset of making the person understand what they're getting into, so it's a mutually beneficial space for the person coming into your team and growing from there. And have empathy as much as possible for your new hires, up, down, and sideways; it goes a long way. You're building relationships at the end of the day more than you're actually building product, which just happens. People forget and get overly into 1 or the other. But as long as you're mindful of the whole ecosystem, it will be a wonderful journey.
[00:56:13] Unknown:
Are there any other aspects of team, organization, or evolution, or growth, or ways to foster data as a core capability in an organization that we didn't discuss yet that you'd like to cover before we close out the show or any other pieces of advice that you wish somebody had given you early in your career?
[00:56:33] Unknown:
I would say data has to be at the center of and important to your decision making. A lot of people say those words, but they don't actually act on them. What I mean is you can't say you have a data driven decision making org but not give the headcount, not focus on your data infrastructure, or not put your data people in charge of the decision making. So make sure you join organizations that are in sync on that: first of all, is it a very data centric, data driven organization? And if so, are they showing that in their data teams and empowering them to do everything they can? That's how I would answer the first part of your question. And, second, slightly unrelated, something that I learned over the course of my career that I wish I knew earlier was to play the long game. I feel like early professionals and young professionals, especially, I have seen get overly caught up in little things: promotion, my manager, my team, but I was hired for this, my scope. I would say figure out what you want.
Or if you don't know that, then focus on figuring that out, and just play the long game. Focus on the long term goals, and everything else will just be noise at the end of the day. If you focus on that end goal and keep focusing on it, the distractions will minimize, and you will get what you want. Absolutely.
[00:58:18] Unknown:
Well, for anybody who wants to get in touch with you, follow along with the work that you're doing, or learn more about the types of roles that you're hiring for, I'll have you add your preferred contact information to the show notes. And as a final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today.
[00:58:37] Unknown:
I would say industry specific data management tooling; I haven't seen it. I always say that Fintech infrastructure is gonna be big, and this could be part of it: making sure you're not giving a generic tool to a domain specific company. That's where I would wanna see more focus and niche expertise tools coming out. And then, no particular product comes to mind, but if anyone figures out the question that we all hate, which we touched upon, the pull the x, or pull the data for x, requests, that can be automated and made better.
Then that will make every data professional's life so much easier. Their focus will be more on the rich insights. So, hopefully, that will happen sooner rather than later.
[00:59:33] Unknown:
Alright. Well, thank you very much for taking the time today to join me and share your thoughts on how to effectively establish, grow, and evolve a data team and some of the skills and capabilities that are useful at the different stages of that journey. I appreciate all the time and energy you've put into sharing your experience and all the work that you've done to grow and scale your own teams and organizations. I hope you have a good rest of your day. Thank you. Thanks for having me. You too. Thank you for listening. Don't forget to check out our other show, Podcast.__init__ at pythonpodcast.com, to learn about the Python language, its community, and the innovative ways it is being used.
And visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction and Guest Introduction
Trupti's Career Journey in Data
Traits for Success in Early Data Roles
Metrics for Measuring Success in Data Roles
Setting Expectations for Data Capabilities
Scaling the Data Team
Domain-Specific Knowledge in Fintech
Organizational Topologies for Data Teams
Recruiting and Screening Data Professionals
Providing Growth Opportunities for Data Professionals
Integrating Data Teams Post-Acquisition
Onboarding and Documentation Practices
Lessons Learned in Data Team Growth and Acquisitions
Final Advice and Closing Remarks