Summary
Building a well-rounded and effective data team is an iterative process, and the first hire can set the stage for future success or failure. Trupti Natu has been the first data hire multiple times and has gone through the process of building teams across different stages of growth. In this episode she shares her thoughts and insights on how to be intentional about establishing your own data team.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show!
- Atlan is the metadata hub for your data ecosystem. Instead of locking all of that information into a new silo, unleash its transformative potential with Atlan’s active metadata capabilities. Push information about data freshness and quality to your business intelligence, automatically scale up and down your warehouse based on usage patterns, and let the bots answer those questions in Slack so that the humans can focus on delivering real value. Go to dataengineeringpodcast.com/atlan today to learn more about how you can take advantage of active metadata and escape the chaos.
- Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days or even weeks. By the time errors have made their way into production, it’s often too late and damage is done. Datafold built automated regression testing to help data and analytics engineers deal with data quality in their pull requests. Datafold shows how a change in SQL code affects your data, both on a statistical level and down to individual rows and values before it gets merged to production. No more shipping and praying, you can now know exactly what will change in your database! Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Visit dataengineeringpodcast.com/datafold today to book a demo with Datafold.
- Unstruk is the DataOps platform for your unstructured data. The options for ingesting, organizing, and curating unstructured files are complex, expensive, and bespoke. Unstruk Data is changing that equation with their platform approach to manage your unstructured assets. Built to handle all of your real-world data, from videos and images, to 3d point clouds and geospatial records, to industry specific file formats, Unstruk streamlines your workflow by converting human hours into machine minutes, and automatically alerting you to insights found in your dark data. Unstruk handles data versioning, lineage tracking, duplicate detection, consistency validation, as well as enrichment through sources including machine learning models, 3rd party data, and web APIs. Go to dataengineeringpodcast.com/unstruk today to transform your messy collection of unstructured data files into actionable assets that power your business.
- Your host is Tobias Macey and today I’m interviewing Trupti Natu about strategies for building your team, from the first data hire to post-acquisition
Interview
- Introduction
- How did you get involved in the area of FinTech & Data Science (management)?
- How would you describe your overall career trajectory in data?
- Can you describe what your experience has been as a data professional at different stages of company growth?
- What are the traits that you look for in a first or second data hire at an organization?
- What are useful metrics for success to help gauge the effectiveness of hires at this early stage of data capabilities?
- What are the broad goals and projects that early data hires should be focused on?
- What are the indicators that you look for to determine when to scale the team?
- As you are building a team of data professionals, what are the organizational topologies that you have found most effective? (e.g. centralized vs. embedded data pros, etc.)
- What are the recruiting and screening/interviewing techniques that you have found most helpful given the relative scarcity of experienced data practitioners?
- What are the organizational and technical structures that are helpful to establish early in the organization’s data journey to reduce the onboarding time for new hires?
- Your background has primarily been in FinTech. How does the business domain influence the types of background and domain expertise that you look for?
- You recently went through an acquisition at the startup you were with. Can you describe the data-related projects that were required during the merger?
- What are the impedance mismatches that you have had to resolve in your data systems, moving from a fast-moving startup into a larger, more established organization?
- Being a FinTech company, what are some of the categories of regulatory considerations that you had to deal with during the integration process?
- What are the most interesting, unexpected, or challenging lessons that you have learned along your career journey?
- What are some of the pieces of advice that you wished you knew at the beginning of your career, and that you would like to share with others in that situation?
Contact Info
- @truptinatu on Twitter
- Trupti is hiring for multiple product data science roles. Feel free to DM her on Twitter or LinkedIn to find out more
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
- Thank you for listening! Don’t forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers
Links
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. When you're ready to build your next pipeline or want to test out the projects you hear about on the show, you'll need somewhere to deploy it. So check out our friends at Linode. With their new managed database service, you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes with automated backups, 40 gigabit connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don't forget to thank them for their continued support of this show.
Your host is Tobias Macey, and today I'm interviewing Trupti Natu about strategies for building your team, from the first data hire to post acquisition. So, Trupti, can you start by introducing yourself?
[00:01:02] Unknown:
Yeah. Hi. I'm Trupti. I work at Block right now, and I've worked in FinTech my entire career. And I lead our merchant product data science team. And do you remember how you first got started working in the area of data? My background is in engineering, undergrad in computer science. And I did my master's in information systems management. And there, the course itself was about marrying information systems with business, presenting data-driven insights as a bridge between business and tech. And so that's where I got deep into just learning data and using data to tell a story, using data to derive insights and help the business make decisions.
And ever since, I've been in data.
[00:01:52] Unknown:
And so in terms of your experience working in the data ecosystem, can you just give a bit of an overall picture of the trajectory that you've taken in terms of the team sizes and organization scales that you've worked in, and some of the ways that that has influenced the way you think about the work that you do and how to interact with the organization as a data professional?
[00:02:14] Unknown:
Early on in my career, I was at a boutique management consulting firm. And so it was heavily data driven, putting things in frameworks to attack the problem and come up with a solution. From then on to various big organizations, in founding team member roles within a portfolio, for example, Uber Eats, or, like, Amazon in their gift cards domain. It was more like applying those problem solving skills, but using them to then build from scratch. And there was a stark contrast, where I quickly learned that I thrive more in the 0 to 1 space. It's harder, but it's also more gratifying when you see the results and you can say, I put my stamp on this, you know, end to end, the details of it.
[00:03:15] Unknown:
In terms of the 0 to 1 space that you said you have had the most fun with, and which is honestly probably what a large number of people listening are dealing with, because, you know, data engineering and data science are still relatively new professions and still growing into the mature phase of their life cycle. I'm curious if you can talk to some of the traits that you have found most useful in yourself working in that space, or as somebody who works with others at that stage or is hiring for, you know, the 0 to 1 or 1 to 2 stage, the types of capabilities that are most valuable to actually be successful as one of the first or second data hires in an organization?
[00:04:04] Unknown:
Yeah. That's a great question. What I personally realized is, in these situations, as the first or second hire, or just in the 0 to 1 space, the first of anything, building something, you have to fall in love with the problem. It really has to come from somewhere where you are obsessed with solving it. Because it's not necessarily limited to data and all the skill sets that you have learned in school or in your previous job. Sometimes you have to stretch beyond just, like, understanding how the data is inputted and knowing the finest SQL, which is tight and, like, efficient.
That's not the end goal. It's fine if your SQL is a little inefficient. But are you getting the job done by stretching the limits? One good example would be right now. So I joined Afterpay last year, which is now Block. Being a smaller organization, a lot of the data didn't really make it to the warehouse. A lot of the nuances of the data, or the verbose data, was still in logs. Sumo Logic is the tool we use. Engineers, first of all, write to the logs. So if it's not even written to the logs, you can't read it. And so they are more equipped to actually read it and, like, make sense out of it. But that's something my team does. I did that too because, you know, you just have to cut to the chase and, like, get to the end. So those might not be the familiar tools that you're used to, whether it's AWS, your cloud, your systems.
So that's what I mean by having the mindset to go beyond. And, you know, sometimes it could just be an Excel spreadsheet or your conventional tools, or, like I said, other tools that other job functions might be familiar with, but you just stretch and you get to the end. Yeah. One of the kind of phrases that I've always leaned on to characterize that stage is, you know, make it work, make it right, and then make it fast. Right. Yeah. That's a good way to put it. Just solve it, basically. What you're saying is, like, because no one has the time or the resources to teach you anything, you just solve it from what you have and put trust in your work product. And then you learn something along the way as well, but also get opportunities and build on it. And then you get to know the system beyond just your narrow focus, you know it in its entirety, which is immensely valuable, actually.
[00:06:36] Unknown:
And as somebody who is the 1st professional in a given problem domain within an organization or somebody who is managing somebody who is in that role of, you know, being the 1st data professional, whether it's the 1st data engineer, data scientist, etcetera. What are some of the useful metrics or kind of objectives to be able to measure against to determine whether or not somebody is being successful in the role or some of the ways that you can set those expectations in a way that's useful to let that person gauge their own progress on that journey to being successful?
[00:07:12] Unknown:
This is maybe a relatively new term, but I like to tell data professionals that you're the product manager of your own work. And I've played the PM role in the past as well. Like, you know, the smaller you are, the more you're the first person, the more roles you're playing. And I kinda like that because you're managing your own road map, so to speak. And I feel like, especially in the smaller space, if you're starting off at either a startup or taking on a big scope beyond what we just discussed, play the role of the product manager of your work product.
And what I mean by that is just take control of the narrative and, like, also the road map: what you're delivering, when people should expect it, what the underpinnings of it are. Instill trust in where it's coming from, and drive it backwards from the solution that you want to where you're getting at. And what tends to happen without that is you might get caught up in the rut of just, like, pulling data. And those two words, any data professional would hate those words. And so if you wanna take control of the narrative, then you wanna get away from those. Maybe make an 80/20 rule, so it's not a no-no. I will do that sometimes to, you know, unblock people.
But what I wanna focus on is being more solution driven. Like, what are we solving at the end of the day? And these are the data insights that I have that will help us move forward. And finding those opportunities in the business helps with the prioritization, keeping the end business goal in mind. Yeah. And in terms of the kind of steps that you
[00:08:57] Unknown:
might decide are acceptable shortcuts that, you know, if you're keeping your purist hat on, are absolutely, you know, verboten. It's, you know, oh, well, we need to be able to build a report on the number of transactions that we've done in this system. If you're doing it the right way, you're going to pull that data out, put it into an analytical store, and then do the analysis. But I need this done yesterday, so I'll just write a SQL query against the production database and, you know, just make sure that I write it in a way that isn't gonna take the system down, or that it's going against a read replica. You know, it's definitely a situation that I've found myself in. Not always happy about it, but it gets the job done, and it gives you the space to be able to say, okay, this works. Now I'm going to work on making it right. You know, I'm going to build up that extra infrastructure and the tooling and the workflows to be able to do this in a way that is more proper from an architectural and best practices standpoint.
And then, you know, once that's working, I can actually iterate and kind of build up that flywheel of capability.
[00:09:53] Unknown:
Yeah. You're absolutely right. So I think sometimes you just have to get the job done, like what you just mentioned. It's like, cut to the chase and find the solution. And then, if it's repeating, find avenues to maybe automate. How can it be done faster? And how can the audience just get to the solution by themselves? So it cuts you out as the middle person relaying that, and it also frees up the data person's bandwidth to work on other, more impactful things. But getting back to what should be the OKRs, if I understand the essence of it. And I think it differs. Right? Especially in a smaller environment, at the end of the day, everyone's job is to make sure the business is moving forward and you're providing valuable insights. So it's slightly different.
In a medium-sized to more established organization, you probably wanna look at the function, assuming that you have different functions and you're not full stack. In a smaller organization, you might be your own data engineer and also a business intelligence person and also a data scientist, machine learning, what have you. All of the full stack. Right? But assuming you're in one job function, based on where you are in the stack, the people above and below you in the stack are your stakeholders, and you wanna make sure they are happy. I think your OKRs have to derive from that. Like, in data engineering, you wanna make sure your tables are reliable. You know exactly when the batching is happening, when the downtime is. It's availability, basically, and governance and quality.
Because a lot of people are not gonna do that for you. They're just gonna rely on whatever is in the database. So don't make them do QC for you, for example. And so on and so forth. Because everyone has that footing in a way. Right? They have stakeholders on either side that they have to manage. So I feel like that could be one way to look at your work product and measure yourself, apart from other things like, you know, efficiency and time needed and so on and so forth. One of the other interesting things
[00:12:03] Unknown:
is that as far as being the first hire in an organization for, you know, data specifically, since that's what we're talking about, is that you're also kind of setting expectations for the capabilities that the organization will grow into. If an organization doesn't have anybody to produce analytical reports, most people are just gonna be fighting with Excel and doing the best that they can and getting conflicting results and, you know, dealing with issues of versioning and, you know, no data quality to speak of. And so as that first data hire, you can kind of set that baseline of these are some of the things that we can do right now because, you know, when somebody does hire a data professional, a lot of times, they're going to say, oh, I've got a data professional, so now we can do everything. We can, you know, do what Netflix does because we have somebody who does data. And so part of the responsibility too is being able to set the kind of appropriate expectations for the organization to say, these are the things that we can do right now. If you want to do, you know, a, b, and c, then these are the prerequisites to get there. And I'm wondering what you have seen as some of the useful ways to kind of convey that understanding and set those expectations and how you are able to work with the organization to kind of priority rank what your capabilities are at a given stage of sophistication.
[00:13:23] Unknown:
Yeah. That's also another good one. I think I would categorize it by who needs to understand that first, and whether it's a technical audience. What I mean by that is, if it's a heavy tech company, and I have seen both sides of it, they might themselves understand and know, oh, I know we don't have half of these things, and some of the data is being manually fed, so we will take that variance in the results. And so that's, like, your easiest stakeholder or audience to manage. Hopefully, if you're reporting into, say, your CTO, or you have a technical cofounder, that is probably the easiest route because they understand the challenges of gathering, cleansing, all of that we talked about. But on the other hand, if it's a more nontechnical audience that is expecting something of you and doesn't understand the underlying challenges, why things can have a variance, or why there are dupes, or what have you, then I would say explain it to them to the best of your ability, like, what goes on in the pipeline and the food chain that we talked about.
And that may or may not be well received or, like, easily digestible. And so just show them with an example. I think that's what works best, and this I've learned probably the hard way. Literally, let's take an example. You have some dashboard, and it shows two of the same items there. And people are now displaying this dashboard and they're like, oh, there are dupes. And you can then explain, saying, yes, this is exactly what I was talking about. There are dupes because that's how it's fed, because this is a manual input source, or whatever the reason might be. And that might be a more empathetic way for them to understand what you are dealing with, because they might not know the technical challenges that go into just putting a dashboard together.
So bring your audience along. And based on their background and expertise, tell them in the language they will understand. And sometimes just show the error. And I know we are all perfectionists and we wanna only provide the best outcome and work product out there. So this is the only scenario where I would advise: show it as it is. It helps make your case stronger. As long as you know exactly why it's happening, that will speak volumes. So, yes, you're right. Hopefully, that answers the question. But it's kind of like, yes, just because you have a data person doesn't mean you can boil the ocean. You have to be very strict with your stack ranking. And that's where, again, playing that PM, product manager, role is immensely valuable. It's like, you told me 10 things. What is the ranking on those? Because not all 10 can be important. If everything's important, nothing's important. And so it forces them to think about what is really crucial, and then you can give them a high level sizing: this can be done faster, this is a tremendous amount of work, and so on. And then negotiate: I can get you this faster with some high level assumptions. It might not be accurate, but what do you wanna use it for? So go back and forth with that negotiation. Not only will you instill empathy, so next time they know exactly what your role is and what you have to go through to even give one slide or one Excel spreadsheet, but they'll also know that you can work with them, and there's a negotiation happening. And that forces them to stack rank.
[00:17:01] Unknown:
And then once you have gotten to the point where you have a data hire, they're onboarded, they're, you know, making progress on their objectives. They've helped to educate the organization on what their capabilities actually are and what you can actually achieve in a reasonable amount of time given the resources that you have. What are the indicators that say this is the right time to add another data hire, or to what degree you should scale the team? And as you do move from, you know, one or two people on a data team to a midsized team, which, depending on the scale of the organization, can have very different meanings, you know, what are some of the ways that you can look for that signal to say, okay, now is the time to start scaling? And as I scale, these are the capabilities and backgrounds that I want to look for in those new hires.
[00:17:52] Unknown:
Yeah. So first of all, I think your first person should be that mission driven person, someone who can put multiple hats on and just gets the job done. Right? We already talked about that. After that, having that foundation, by that time, I think you kind of know what you're dealing with in terms of your work product, your organization scale, the richness of the data, the quality of the data, and the tools and so on. And then volume too. So I would say the second person typically, and there are obviously exceptions to every rule, shouldn't be, like, ML or a highly sophisticated, narrow skill set, because you might not have enough training data to even build models, and then they won't be 100% occupied. Or maybe they're used to having a clean dataset and working with that. So be very cautious of that. Your second hire, or, like, from then on, should probably be bandwidth well invested in infrastructure. So data engineering, making sure you have some level of taking the Kafka logs, the raw, even Sumo logs or whatever log that you have.
Working with your engineering team and however rich that infrastructure is, to make sure everything is now getting into the warehouse and flowing through that. Someone who has built and worked with cloud infrastructure, and then picking the right cloud infrastructure. And that can be very expensive too. And most companies, assuming it's a start up or a smaller company, are not gonna build their own clouds. Rarely have I seen it — I mean, obviously, Amazon, but Uber was the only exception I've seen that has their own cloud and data infrastructure.
It can be very expensive. So someone who has worked with these cloud infrastructures, picking the right tools and vendors and setting that up so that the analysis can now be done faster, but also more reliably. And now you're starting to build that pipeline. Initially, for the first eight people, I would say, try to make sure they can also wear multiple hats. But now you can start over-indexing on one skill versus the other. And if they have infrastructure capabilities, that will help you a lot.
[00:20:12] Unknown:
Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days or even weeks. By the time errors have made their way into production, it's often too late and the damage is done. Datafold built automated regression testing to help data and analytics engineers deal with data quality in their pull requests. Datafold shows how a change in SQL code affects your data, both on a statistical level and down to individual rows and values before it gets merged to production. No more shipping and praying. You can now know exactly what will change in your database.
Datafold integrates with all major data warehouses as well as frameworks such as Airflow and dbt and seamlessly plugs into CI workflows. Visit dataengineeringpodcast.com/datafold today to book a demo with Datafold. As you are hiring and starting to grow the team beyond just the first one or two people, there are definitely certain domains that have a certain degree of requirement to understand the specifics of that problem space to be able to be effective working with the associated data. I know that your background is largely in FinTech, and I'm wondering if you could just talk to some of the types of understanding of that problem space that are most necessary to actually be able to work with that data, understand the ramifications of different types of transformations or some of the restrictions at a regulatory or organizational level that you need to keep in mind as far as, you know, what data to propagate where, what data to obfuscate at different stages of the life cycle, and just some of the ways that the specific industry or problem domain that you're working with needs to be screened for in that hiring process to make sure that that data professional is going to be successful in that organization?
[00:22:05] Unknown:
That's a great question. So domain specific hiring, or rather skill set, is also important. In FinTech, what you're dealing with, especially when it comes to data, is customer data. And that typically includes payment information, their whole card information and ZIP code. These are considered PII fields, which is personally identifiable information, beyond other things. And there are laws around that: how you store them, how you encrypt them, and even access controls. Your organization might have people all the way from operations, sales, and marketing to engineers and product and data people. And even within that, there are risk professionals, and then marketing analytics, and then you have BI and whatnot. So every role needs to only be granted access to what they need, and not an overarching blanket read/write access, so to speak, because you're literally playing with sensitive information.
And there are laws around that. You take trainings regularly to make sure you're compliant with that and so on. So that's just the basic difference between storing, say, social media data, where you may not have real names and it's literally words, say tweets or whatnot, versus here, where you have transactions and money and bank information. Like, you know, if you save credit card information — a lot of websites, especially retail, will have, like, save my card information for next time, easy checkout and stuff. Right? So you might be saving that as well, and this is extremely sensitive.
And if there are any breaches or hacks, you have to report that, especially as a public company, but I believe also as a financial institution. So just having the knowledge that all this is important, what you are storing, and what the ramifications of a hack or leak are comes part and parcel of it. As you are
[00:24:18] Unknown:
recruiting for and screening people who are going to work in one of these domains that has that extra requirement of understanding the details of the data that you're working with, and the sort of regulations or the statistical requirements for being able to report effectively. In the event that you are talking to somebody who doesn't necessarily have that existing background, what are some of the ways that you can help them acquire that understanding post hire, or understand whether or not they will be able to effectively upskill into that arena?
[00:24:53] Unknown:
Yeah. I think they're maybe starting with orientation itself, you wanna, like, from emphasize from the very beginning that what are you dealing with when it comes to data over here. And then just, like, I've seen people, like, you know, just copy, paste, maybe an email thread, sometimes the whole 16 digit number of a credit card. Now that in itself may not be a PII field because there are other elements too, but that is a big no no and not allowed. So you have to, like, encrypt a lot of that. So simple things like that, which especially coming from different fields, they might just think, oh, this is just something that I found in the column called credit card and I copy pasted that. You have to be very careful about what's going in any form of message within the organization, let alone outside the organization.
And therefore, I think storing it might also have to be encrypted. On top of that, if you're in a risk team, and I've worked extensively in risk teams, you're doing manual reviews to understand the reason why a transaction is fraudulent. So you're looking at the profile of the user, and you have way more information than the user has probably knowingly shared with the organization, and you can't use it for any other purposes. So the risk team has the highest data access capabilities, I would say, but also the most sensitivity.
And the training should be geared towards that, because the account could just happen to be your friend or nemesis, and you can't do anything about that. Working at Uber, we had very strict policies: when you get into someone's account, first of all, you have to document why it was necessary. Maybe all you got was a UUID, and you went down the rabbit hole of figuring out why the fraud happened, and it happened to be a celebrity. You still have to report and document everything about why you even went there, or it would be an immediate termination. The consequences could be very high, so you are playing with fire there. But that was an extreme case; not every FinTech needs to do that. But data is important, and it's sensitive, and you can triangulate based on little facts, so just be mindful of that.
[00:27:31] Unknown:
On the other aspect of growing the team and scaling is the ways that you think about the organizational topologies and how you wanna structure the team, where with data professionals in particular, it's challenging because there have been some teams and some organizations that say, we want a centralized team where all of my data professionals live in 1 group. They all work on data across the entire organization, whatever that might mean. Whereas in other scenarios, they say, actually, we want to have maybe a core data platform team, but then we want to have data analysts or data scientists embedded in all of the different business units in the organization. And I'm wondering what you have seen in your own work of how you've approached that topological question and some of the different pressures that might pull or push you in 1 direction or the other?
[00:28:21] Unknown:
Yeah. I've seen both. In the pure model, if you have a chief analytics officer or chief data officer, which is rare, maybe it rolls into a CTO, there's emphasis on best practices and making sure your data skill set is rich and the bar is high. You can learn, because you're in that peer space of everyone working on similar things, so you learn from each other quickly and build up that skill set fast. I personally think it all depends on what you wanna build in your career and how you wanna progress. The other end of it: I've worked a lot in the GM based model, where a business unit has its product, engineers, data scientists, marketing, what have you.
And that's nice, because everyone has the same mission, and you come in with your focus area and concentrate on that. We have that right now, and what we do is all the DS people then meet, more like a hub and spoke model, to make sure we are sharing best practices and maintaining a bar while hiring. So that's more loosely tied, a dotted line, and the mission is very business focused. So it depends on whether it's a Databricks or Snowflake, that kind of pure tech SaaS organization where everyone is supporting the tech underneath it, or it's a Fintech or business organization where the business function drives it more.
And there's no right or wrong answer; I feel like it just depends. If it's a GM model, you have to understand that the path might cut off after a certain level in the DS function, and then your next step might be GM. Are you willing to take on that function? Are you more business focused? Because those are the conversations that will be happening at a tighter level. Versus, do you wanna stay in the technical track, where eventually you could become the chief data officer or CTO, and you wanna be around like minded data folks, enriching that knowledge, but you might be a little away from the business.
I don't think I've seen any downsides of either model. It just depends on what you wanna do with your career and where you see yourself. Like, the 2 axes, right? Do you wanna go deeper into the technology and keep learning newer features and functions? Or do you wanna apply that now, see how the business responds, help the business get better, and probably take on other roles and functions underneath you?
[00:31:20] Unknown:
And then, as far as that actual hiring approach, I'm wondering what are some of the useful strategies that you have found for being able to recruit and screen candidates, particularly given that data professionals are generally in fairly short supply, so there is going to be a lot of competition for their skills. Being considerate of the fact that you're not their only option, but also making sure that you don't just hire somebody because they have data experience. So understanding the nuance of: they might have some of the technical background, but they won't be a good fit because of the specific problem domain we're working in; or they have the domain expertise, but not as much technical expertise as I would want. Just understanding what are the ways to gain the attention of those individuals and then determine whether or not they will be a good fit for what you're looking for?
[00:32:17] Unknown:
Yeah. I think since everyone plays with data to a certain degree, anyone could be a data professional, or no 1 can; it's very vague that way. So I would say hire people for their strengths and what fits the particular use case. Especially, again, when starting smaller, you might wanna just get a go getter, someone who is willing to learn. If they have a basic understanding of how to get the job done and pull the data that needs to be pulled, that might be enough; your bar might be different there in terms of knowing different statistical functions or what have you. A good example is where we leveraged a lot of operations people who came with an understanding of where the fraudulent activity might be, knew the tools needed, and used those plus common sense and an understanding of the space, like a risk first model.
And you don't necessarily have to have a PhD in machine learning for that. There is a time and place for that, for specific functions. If you wanna build a model, now that you probably have a lot of data and you are at the point where you wanna build sophisticated models, keep them retrained and in production with the latest and greatest results and the right accuracy, then you don't just take someone and say, try it out. That needs a very specific skill set, people who have done it in the past. So for anyone who is hiring, and I am at the moment, so definitely putting in that plug: don't start only when you get the headcount or when you think about expanding the team. At any given moment, you might be the first person joining with an intention to grow the team.
So have a decent network and pipeline where you just know people and what their strengths are, so that you can plug them in when the use case comes about, and have that deep relationship. Maintain that relationship as well; that goes a long way. It could be your ex coworkers, your ex teammates, people you have worked with on a peer basis or just collaborated with.
[00:34:48] Unknown:
In the situation where you are hiring somebody who might be junior or mid level from a data professional perspective and you want to give them the opportunity to grow into a more senior role, or somebody who's maybe senior level and you wanna bring them up to the principal level, what are some of the ways that you have found to provide a safe space for learning, where they can make mistakes and fail at different tasks and use those as learning opportunities, so that they don't just stick to the core of what they know and instead feel supported in reaching outside of their comfort zone and growing into those capabilities?
[00:35:27] Unknown:
Yeah. So for that, you need to know your team really well, first of all. Establish that relationship beyond work to understand where they wanna take their career. What do they self identify as their strengths and weaknesses? From day 1, you need to understand those basic aspects and then see them in the work. You might have a different point of view than what they think of as their superpowers and kryptonite, so to speak. Then show that with data: oh, you thought this, but I think you have more to offer than you thought, or vice versa. And then work with them to give them more opportunities in the areas where they're good and don't need any hand holding.
And then more opportunities to strengthen their weaknesses, or coach them through those; that's the point where they might need a little bit more hand holding. I think that's how you grow people holistically and with their own buy in, versus just dumping something top down because you've been given a template of what it takes to go from L4 to L5 and L5 to L6 and so on. That plays a huge role. If they have bought in and they are ambitious and they see that career trajectory growing, half the battle is done; they are going to put in the work. And show them with the data. I like to say that data people are the most notorious for not measuring their own data, whether that be the number of work products, or how many good projects or products came out versus some that needed work, and so on. So do that meticulously, and then show them: this went really well, here is where you could use help, and here is why. Once you agree on that, the plan can be put forward. I would say, initially, when you're growing and you have a young team, young in terms of tenure and level, obviously they wanna grow and grow fast. But you have to set the right expectations about who gets to grow and what the stipulations are around that.
A smaller organization might have looser rules around that. But once you get to maturity, not everyone even wants to move up. They might wanna just deepen their understanding, or go more horizontal, like we talked about: they worked on 1 domain, now they wanna figure out the back end of it, or they worked on consumer and now they wanna work on merchant or platforms. So be open to those opportunities too. And that might mean you lose someone from your team. But think about it like you're building long term relationships by helping them figure out what they wanna do and finding that path for them.
So the answer could be varied. But if you have a continuous dialogue and it's a 2 way street, I don't think anything is difficult. And the same goes for communicating to your higher ups as to who the qualified candidates are and why they are the best. That work goes in not when you're pitching at the promo cycle; that work goes on 6 months prior, setting them up. As a manager, you are actively doing that. So there's upward and downward management in both of those, and you have to balance that really well.
[00:38:56] Unknown:
Unstruk is the DataOps platform for your unstructured data. The options for ingesting, organizing, and curating unstructured files are complex, expensive, and bespoke. Unstruk Data is changing that equation with their platform approach to managing your unstructured assets. Built to handle all of your real world data, from videos and images to 3D point clouds, geospatial records, and industry specific file formats, Unstruk streamlines your workflow by converting human hours into machine minutes and automatically alerting you to insights found in your dark data.
Unstruk handles data versioning, lineage tracking, duplicate detection, and consistency validation, as well as enrichment through sources including machine learning models, 3rd party data, and web APIs. Go to dataengineeringpodcast.com/unstruk today. That's u-n-s-t-r-u-k. And transform your messy collection of unstructured data files into actionable assets that power your business. Another interesting transition that a team might go through is the process of being acquired by another organization and dealing with that merger and integration step. And I know that that's something that you are either going through or just recently went through. So I'm curious if you can talk through some of the ways that influences the patterns and practices of how your team functions, and some of the lessons that you've learned about how data lives in an organization and how it can be merged with another organization effectively, so that people who are either planning for that eventuality or might end up in that situation can factor it into the ways they think about their data architectures?
[00:40:40] Unknown:
Yeah. So this is my recent lived experience; I can definitely talk about that. Expanding on the Fintech aspect, since it was 2 Fintechs coming together, on top of everything else there's heavy FTC regulation, and mergers and acquisitions are super hard, especially in the United States with the latest on that. So our acquisition, I should say, was announced late August or September last year, and there was just a lot in between before it was finalized on February 1st. We did announce 1 product the day of the official merger announcement, February 1st, which my team worked on: all online Square sellers can now accept Afterpay buy now, pay later.
And, obviously, the industry, the analysts, the street welcomed it, and that was the intent. Right? The day of the merger, you're already announcing a joint product. It's not easy to pull off. The only way we could do that was treating Square like we would have any third party and just having integration dialogues and so on. So without getting into the details of it, it was hard. Any lessons from that: pre acquisition, obviously, you can't do much, because you're not 1 company, or you would jeopardize the process. That was not only the message from everyone, especially legal, but I would relay the same thing: don't jeopardize your chances of the acquisition going through. But post acquisition, I would say, 1 of the learnings I wanna pass on is definitely get your data teams involved day 1, as soon as it's possible legally and compliance wise, with all those conditions applied.
Okay. Then, to be honest, the data infrastructure teams should be talking to each other, because day 1, you also wanna give your teams access to both sides of the data. Literally matching in 1 database would be ideal, but if not, have a connector, some way to marry those 2 and look at it holistically, because that's the whole intent. And, yeah, if you have similar systems, that helps. But these are the behind the scenes dialogues that should be happening: how do we do the migration? When do these contracts end? On an interim basis, are there tools? We were using Jupyter notebooks to stitch those 2 together and get past it; again, it goes back to just focusing on the solution.
This was my first time, especially being in a smaller company that got acquired, and there are just so many learnings. But 1 of them is: make sure you make data, data infrastructure, and a smooth transition of that a priority. Because, like you touched upon not too long ago, the requests start coming in. Not even day 1; day minus 1, they'll be like, oh, now we're 1 company, right? So what is the holistic view? What are our common customers? Obviously, everyone wants to know. And these were 2 public companies coming together, to make it even harder.
So the questions keep coming. Every data person knows that the question list is always longer than the answers you can give. But, yeah, just make sure you emphasize to the leaders, and the leaders also understand, that this is important. Making sure the 2 data systems talk to each other is equally as important as the acquisition going through, along with a smooth process, quick ETAs, and some path to the end goal.
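The interim "connector" approach described above, landing extracts from both companies in one scratch database and joining on a normalized key, can be sketched in a few lines. The table names, columns, and sample rows below are invented for illustration; in practice each side's warehouse export would feed this, and the join key would be whatever identifier the 2 orgs can agree on:

```python
import sqlite3

# One shared scratch database standing in for "some way to marry those 2".
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE square_customers (email TEXT, gmv_usd REAL)")
con.execute("CREATE TABLE afterpay_customers (email TEXT, bnpl_orders INTEGER)")
con.executemany("INSERT INTO square_customers VALUES (?, ?)",
                [("a@x.com", 120.0), ("b@y.com", 80.0), ("c@z.com", 45.0)])
con.executemany("INSERT INTO afterpay_customers VALUES (?, ?)",
                [("B@Y.COM", 3), ("c@z.com", 1), ("d@w.com", 5)])

# Normalize the join key before matching; the 2 orgs rarely store
# identifiers the same way (case, whitespace, formatting).
common = con.execute("""
    SELECT lower(trim(s.email)) AS key, s.gmv_usd, a.bnpl_orders
    FROM square_customers s
    JOIN afterpay_customers a
      ON lower(trim(s.email)) = lower(trim(a.email))
""").fetchall()
print(f"common customers: {len(common)}")  # common customers: 2
```

The point is less the tooling than the discipline: a single place where "what are our common customers?" has one answer, even before the real migration lands.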
[00:44:49] Unknown:
As far as that overall process of integrating the teams and the data systems, what are some of the impedance mismatches that you had to deal with? Maybe there were different expectations of how data work is done and how you track it, or mismatches in how the 2 organizations structure the definition of a given metric, where this is how you calculate a customer in the canonical sense for company A, and in company B, we actually do it slightly differently and bring in these other factors. How do you reconcile those different views of the relevant domain objects, or the ways the teams' work is organized, where maybe you're using Jira in 1 place and GitHub issues in the other, and integrate those different ways of working and thinking together?
[00:45:42] Unknown:
That's a whole set of so many things, I think, but you nailed it. It kind of starts with definitions. Initially, most of the meetings would go: what you call this, we call that on our side, so let's get the lingo right. But, obviously, you have to adopt the bigger company's lingo, so learn that quickly. I also feel like the way they cut their data, in terms of sizes, for example: what counts as small, medium, or large could be very different in each organization's internal definition, so to speak. Right?
So adjust to that, or call out the differences quickly. Little things, even: we talked about it offline, but Afterpay was an Australian company, so our road maps all went from July 1st to June 30th. And Block is a US listed company headquartered in San Francisco, so obviously it's January 1st to December 31st; calendar year versus fiscal year. Just those mismatches can change so much. Which Q1 are you talking about, calendar year or fiscal year? Little things like that, all the way to how the data is collected, what it's called, and when you present it. Using the right terminology matters too, because these insights are what get taken into next steps. There's a lot of confusion initially; no 1 can avoid that. But the sooner you get past it, just having a data dictionary and checking the definitions behind the scenes is immensely valuable. Because some estimate, god forbid, taken out of context can have bigger ramifications down the road. But it's unavoidable, because there's no universal truth to data, so to speak. And within every industry too, there will always be an internal company definition.
And the smaller the company, in fact, the more hard coding there is in how it's piped through, as we all know. And that just shows. It shows greatly when these kinds of events happen, but it's a learning opportunity to make it more streamlined and better.
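The calendar versus fiscal year confusion above, Afterpay's July to June year against Block's calendar year, is exactly the kind of definition worth pinning down in that shared data dictionary. A minimal sketch of a quarter-labeling helper follows; the function and the FY naming convention (naming the fiscal year after the calendar year it ends in, the common Australian style) are assumptions for illustration:

```python
from datetime import date

def quarter(d: date, fy_start_month: int = 1) -> str:
    """Label a date with its quarter, given the month the fiscal year starts.

    fy_start_month=7 models a July-June year (Afterpay style); the
    default of 1 is a plain calendar year (Block style).
    """
    # Quarter index relative to the fiscal year start.
    q = ((d.month - fy_start_month) % 12) // 3 + 1
    # A July-June year is named after the calendar year it ends in.
    fy = d.year + (1 if fy_start_month != 1 and d.month >= fy_start_month else 0)
    return f"FY{fy} Q{q}"

print(quarter(date(2021, 7, 15), fy_start_month=7))  # FY2022 Q1
print(quarter(date(2021, 7, 15)))                    # FY2021 Q3
```

The same July date lands in 2 different years and 2 different quarters depending on the convention, which is why "which Q1?" has to be answered explicitly in every report.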
[00:48:15] Unknown:
To your point too about establishing what the things are that we're talking about and what the appropriate definition is, that also factors into questions about how you onboard newcomers to your team, so that when they come in and start encountering internal jargon or references to these business objects and want to understand, okay, what are we even talking about, it's useful to have that business catalog with relevant documentation. You can point somebody to it and say, this is what we mean, these are the assumptions we're making in constructing this definition, and just be very explicit about those views, which are all too often implicit, of what it is you're actually talking about. 100%. I agree with that in theory,
[00:49:03] Unknown:
but I feel like the smaller the company, the more you put documentation and these added extra benefits on the back burner. And there are very few people intrinsically motivated to do that. Luckily, I have someone on my team who's truly great at it, but there are very few; the rest of us have to push ourselves to do it. So, theoretically, I agree that would be ideal. And now I'm seeing the difference with a bigger company having all these things ready: templates, orientation, whole checklists. Smaller companies don't have that; it's mostly word-of-mouth and pinging 100 people to get an answer, and then the same thing happens to the next new person, and so on. But, yeah, if it's in the culture; I've never worked at Stripe, but I hear great things, and the person on my team that I was talking about is ex-Stripe.
They just have documentation so much embedded in their culture,
[00:49:59] Unknown:
something to learn from, and that helps immensely in these kinds of situations. Absolutely. And I will admit to being 1 of the guilty parties, not documenting things as often as I should. We all are, I feel like. Absolutely.
[00:50:13] Unknown:
We have to incentivize that. I feel like the right incentive will motivate it. 1 good thing we had at Uber was a buddy system. If you assign someone as a buddy and they've done it multiple times, they just have their own checklist and things like that. And also some OKRs, like citizenship goals, you know? This could be 1 of those. Not a business metric, but it helps the culture in getting the team ramped up quickly. So, again, it comes down to culture and what leadership emphasizes. Absolutely.
[00:50:49] Unknown:
I've been finding that being in a fully remote environment, which most people either are currently in or have at least gone through in the recent past, helps provide the space and motivation to turn more of that implicit knowledge into something written. It's just that, all too frequently, where it is written is somewhere like Slack or email, where it's easy to get buried and lost. So 1 of the practices that my team is starting to move towards is being very intentional about not keeping those communications in Slack. Any time something turns into a conversation about, oh, how does this work, or why is this this way, or how do we want to approach this problem, if it goes beyond 2 or 3 responses in Slack, then put it into a more durable location for having that conversation. We've been using things like GitHub discussions, but having some sort of internal forum, or, some people have polarized opinions about wikis, but even just putting it into a wiki, gives you that canonical reference to say, this is the description, and these are the conversations we've had around it.
And then it also encourages people to pause and think through and be more deliberate about the ways that they're communicating rather than just the very rapid back and forth that Slack encourages.
[00:52:09] Unknown:
Yeah. No. 100%. I think there are so many tools available. It's just a matter of focusing on it and not putting it off for later. As you're doing it, the knowledge is fresh. So I highly encourage people who are doing it and building things for the first time: just dump it somewhere, and I'm happy to put the finesse and a cherry on top if needed. As long as it's in Confluence, Jira, Git, what have you, even a Notion page, which a lot of startups are using, anything helps. As long as it's on the cloud somewhere the whole company can access, it saves you effort later.
[00:52:47] Unknown:
And as long as there is 1, or at least a very small number, of places it will go, so that you don't have the cognitive burden, every time you want to write something down, of stopping to think, okay, which is the right context to write this in so that somebody else can find it later? Because as soon as you go into that space, the entire incentive to write things down dissipates, and you say, well, it's too hard to even figure out where to put it, so I'll just put it down as a to do and never get to the documentation.
[00:53:18] Unknown:
I think Notion has done a pretty good job at that. Absolutely.
[00:53:22] Unknown:
And so in your experience of working in the data ecosystem and both being a very early hire at different organizations for the data team and helping to grow and scale data teams and also going through the experience of being acquired and integrating with a larger organization, what are some of the most interesting or unexpected or challenging lessons that you've learned?
[00:53:46] Unknown:
I mean, first of all, I learned about myself that I definitely thrive in, like I said, the 0 to 1 space, something new, something that we are building. So I like that building aspect of things, just figuring it out. From my learnings, I figured out that people are motivated by different things, and so figuring out how you hire based on people's tendencies is also important. And perfection is not the goal in this particular environment, when you're the first of anything or trying to do something for the first time as a company.
You have to move fast, so just solving problems and unblocking people or hurdles should be the aim there. The data finesse, or the data tools and efficiencies, takes a little bit of a back seat; the mission takes the more critical seat. In my case, it was obviously in the Fintech space, whether that was risk or gift card growth or a card launch or merchant in the buy now, pay later space. But it has always been something in a space that I loved and wanted to learn and add more skills in. So that's important. Or at least, I would say, that was a learning for me.
I think we already discussed most of these things, and I'm conscious of not repeating myself. But on scaling, again, making sure you're putting the right jigsaw piece in the right space is important. If that's not the case, then go with the mindset of making the person understand what they're getting into, so it's a mutually beneficial space for the person coming into your team and growing from there. And have empathy as much as possible for your new hires, up, down, and sideways; it goes a long way. You're building relationships at the end of the day more than you're actually building product, which just happens. People forget and get overly into 1 or the other. But as long as you're mindful of the whole ecosystem, it will be a wonderful journey.
[00:56:13] Unknown:
Are there any other aspects of team, organization, or evolution, or growth, or ways to foster data as a core capability in an organization that we didn't discuss yet that you'd like to cover before we close out the show or any other pieces of advice that you wish somebody had given you early in your career?
[00:56:33] Unknown:
I would say data has to be at the center of and important to your decision making. A lot of people say those words, but they don't actually act on them. What I mean is you can't say you have a data driven decision making org but not give the headcount, not focus on your data infrastructure, or not put your data people in charge of the decision making. So make sure you join organizations that are in sync on that: first of all, is it a very data centric, data driven organization? And if so, are they showing that in their data teams and empowering them to do everything they can? That's how I would answer the first part of your question. And, second, slightly unrelated, something that I learned over the course of my career that I wish I knew earlier was to play the long game. I feel like early professionals and young professionals, especially, I have seen get overly caught up in little things: promotion, my manager, my team, but I was hired for this, my scope. I would say figure out what you want.
Or if you don't know that, then focus on figuring that out, and just play the long game. Focus on the long term goals, and everything else will just be noise at the end of the day. If you focus on that end goal and keep focusing on it, the distractions will minimize, and you will get what you want. Absolutely.
[00:58:18] Unknown:
Well, for anybody who wants to get in touch with you, follow along with the work that you're doing, or learn more about the types of roles that you're hiring for, I'll have you add your preferred contact information to the show notes. And as a final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today.
[00:58:37] Unknown:
I would say industry specific data management tooling; I haven't seen it. I always say that Fintech infrastructure is gonna be big, and this could be part of it: making sure you're not giving a generic tool to a domain specific company. That's where I would wanna see more focus and niche expertise tools coming out. And then, no particular product comes to mind, but if anyone figures out the question that we all hate, which we touched upon, the pull the x, or pull the data for x, requests, that can be automated and made better.
Then that will make every data professional's life so much easier. Their focus will be more on the rich insights. So, hopefully, that will happen sooner rather than later.
[00:59:33] Unknown:
Alright. Well, thank you very much for taking the time today to join me and share your thoughts on how to effectively establish, grow, and evolve a data team and some of the skills and capabilities that are useful at the different stages of that journey. I appreciate all the time and energy you've put into sharing your experience and all the work that you've done to grow and scale your own teams and organizations. I hope you have a good rest of your day. Thank you. Thanks for having me. You too. Thank you for listening. Don't forget to check out our other show, Podcast.__init__ at pythonpodcast.com, to learn about the Python language, its community, and the innovative ways it is being used.
And visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction and Guest Introduction
Trupti's Career Journey in Data
Traits for Success in Early Data Roles
Metrics for Measuring Success in Data Roles
Setting Expectations for Data Capabilities
Scaling the Data Team
Domain-Specific Knowledge in Fintech
Organizational Topologies for Data Teams
Recruiting and Screening Data Professionals
Providing Growth Opportunities for Data Professionals
Integrating Data Teams Post-Acquisition
Onboarding and Documentation Practices
Lessons Learned in Data Team Growth and Acquisitions
Final Advice and Closing Remarks