Summary
Building a data team is hard in any circumstance, but at a startup it can be even more challenging. The requirements are fluid, you probably don't have a lot of existing data talent to manage the hiring and onboarding, and there is a need to move fast. Ghalib Suleiman has been on both sides of this equation and joins the show to share his hard-won wisdom about how to start and grow a data team in the early days of company growth.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their extensive library of integrations enables you to automatically send data to hundreds of downstream tools. Sign up free at dataengineeringpodcast.com/rudderstack
- Your host is Tobias Macey and today I'm interviewing Ghalib Suleiman about challenges and strategies for building data teams in a startup
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you start by sharing your conception of the responsibilities of a data team?
- What are some of the common fallacies that organizations fall prey to in their first efforts at building data capabilities?
- Have you found it more practical to hire outside talent to build out the first data systems, or grow that talent internally?
- What are some of the resources you have found most helpful in training/educating the early creators and consumers of data assets?
- When there is no internal data talent to assist with hiring, what are some of the problems that manifest in the hiring process?
- What are the concepts that the new hire needs to know?
- How much does the hiring manager/interviewer need to know about those concepts to evaluate skill?
- What are the most critical skills for a first hire to have to start generating valuable output?
- As a solo data person, what are the uphill battles that they need to be prepared for in the organization?
- What are the rabbit holes that they should beware of?
- What are some of the tactical
- What are the most interesting, innovative, or unexpected ways that you have seen initial data hires tackle startup challenges?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working on starting and growing data teams?
- When is it more practical to outsource the data work?
Contact Info
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
- Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
- To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers
Links
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Sponsored By:
- Rudderstack: ![Rudderstack](https://files.fireside.fm/file/fireside-uploads/images/c/c6161a3f-a67b-48ef-b087-52f1f1573292/CKNV8HZ6.png) RudderStack provides all your customer data pipelines in one platform. You can collect, transform, and route data across your entire stack with its event streaming, ETL, and reverse ETL pipelines. RudderStack’s warehouse-first approach means it does not store sensitive information, and it allows you to leverage your existing data warehouse/data lake infrastructure to build a single source of truth for every team. RudderStack also supports real-time use cases. You can implement RudderStack SDKs once, then automatically send events to your warehouse and 150+ business tools, and you’ll never have to worry about API changes again. Visit [dataengineeringpodcast.com/rudderstack](https://www.dataengineeringpodcast.com/rudderstack) to sign up for free today, and snag a free T-Shirt just for being a Data Engineering Podcast listener.
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. Legacy CDPs charge you a premium to keep your data in a black box. RudderStack builds your CDP on top of your data warehouse, giving you a more secure and cost-effective solution. Plus, it gives you more technical controls so you can fully unlock the power of your customer data. Visit dataengineeringpodcast.com/rudderstack today to take control of your customer data. Your host is Tobias Macey, and today I'm interviewing Ghalib Suleiman about challenges and strategies for building data teams in a startup. So, Ghalib, can you start by introducing yourself? Sure. Hi, I'm Ghalib, the current CEO and cofounder of a company called Polytomic,
[00:00:52] Unknown:
which does ETL software. But I do have over a decade of experience in data. I started out in machine learning and then went on to work in analytics, and I've started, founded, and managed data teams at growing startups.
[00:01:08] Unknown:
And do you remember how you first got started working in data?
[00:01:12] Unknown:
Yeah. I was just a general software engineer who, goodness, this was back in 2010, entered the world of machine learning as applied to language processing. I worked there for a while, and then I moved to San Francisco in 2013 and joined a startup in early 2014. There were 25 people in a tiny office, and I joined as an I'll-do-anything engineer, but quickly realized there was no one taking care of data matters. I was the closest person to a data background, even though my background was in machine learning. "Well, help with analytics, please," was the message from the founders.
There was no Snowflake back then, but, you know, I set up a Redshift cluster, set up ETL pipelines, the whole little circus, and was an early customer of Looker as well. I just went down that whole path and grew a team. My team members then hired their own people, and so on and so forth, for 4 years until the company was about 350 people. So I saw all the chaos of going from 25 to 350 people from a data perspective. The company ended up selling for about $1 billion. It worked out well for everyone involved, but it was 4 years of every data mess you can imagine under the sun.
[00:02:22] Unknown:
And so in terms of the focus of the conversation today, we're discussing some of the complexities of dealing with data in a startup. Data is critical to ensuring that the startup is iterating in the right direction, but it's hard because you don't have a lot of people to throw at the problem. I'm wondering if you can start by sharing your conception of the responsibilities of a data team, and what some of the specifics of a startup environment are that color that answer?
[00:02:55] Unknown:
Some people include machine learning models and so on under the data team. I don't. Colloquially, I consider it a team that helps companies make the right decisions. It's a rare case where you'll know with certainty that you're making the right decision, but you can certainly look at evidence and weigh the probabilities of various paths. That's really the big one. I view them as the company's arm to consult before picking between multiple options when making decisions.
[00:03:27] Unknown:
And for early-stage or even mid-stage startups, or people who are just starting to explore what it means to use data to ask and answer questions effectively, if you don't have somebody on the founding team who has a data background, it can be very challenging to understand the proper scope and the things you need to think about when bringing data into your decision making. I'm curious if there are any common fallacies that you've seen organizations fall prey to in those efforts to build up that data capability?
[00:04:04] Unknown:
Yeah. I see fallacies on both sides. The newly minted data person who enters this environment has their own host of fallacies that they get married to, and the company that hired this person has its own set of fallacies. Beginning on the company side, one big fallacy is the assumption that if we bring in a data person, we will be a data-driven organization. There's an implicit claim that suddenly all decisions will be optimal ones, thanks to this magic word called data. There's a certain, I think, obedience towards a god that doesn't really exist. The cure for the fallacy is realizing that data is a tool, a guide, something that sheds light on an uncertain world, but it doesn't completely remove uncertainty.
As a grown-up running a division or a department, it's still on you to make decisions with some level of uncertainty, but you can couch them in data. That's a different view from simply assuming data will guide you: oh, yes, we obey the data. You never really have perfect data. Another fallacy the company can fall prey to is what I crudely call dashboard porn. That's where the data person comes in and someone asks, "Can I have a dashboard of the number of users per month?" This sounds fine and well. Someone sees a pretty dashboard, someone else makes a request, and now the company is drowning in about 78 different dashboards across only 3 departments.
The fallacy displayed by the data person in this scenario is thinking, well, if I meet everyone's requests, my job is done, not realizing that a lot of these requests perhaps aren't really crucial to the business. So there's a real need for a partnership between the two sides. It's on the data person to ask: what are your goals, what business impact do you expect from this data investigation, what decisions do you expect to make? And it's on the other side to employ the same thinking before asking the data person for random requests, dashboards especially. That's really a big one. And, again, perhaps crude, but perhaps half of Looker's revenue comes from dashboards that no one's really looking at. Everyone just wants to see pretty graphs and look good in their team meetings.
[00:06:21] Unknown:
Absolutely. And there's definitely been a recent trend, from what I've seen talking to folks, of moving away from dashboards as the default mode of delivery for the data being captured and analyzed. As you said, dashboards were kind of the be-all and end-all of business intelligence for a number of years, and now people are realizing that just because you have a dashboard doesn't mean you know what to do with it or that you understand what it's trying to tell you, and there's no way to just click a button and say, okay, do the thing you're telling me to do. That's where we're starting to get into the realm of reverse ETL, or operational analytics, or data activation, however you want to term it: having a concrete action step you can take from the data you're looking at, and feeding that back into the systems that feed into the decision making.
[00:07:12] Unknown:
Yeah. There are two aspects here. That's not to say dashboards are useless. The way I've come to think about it, so-called reverse ETL, as much as I hate that term, pushing data into systems, works for the particular departments involved. But think of the company CEO, especially in a startup: the CEO has no home system. The CEO doesn't live in Salesforce. They don't live in HubSpot and so on. If they want to know something, their way of seeing things is the dashboard in the BI tool, precisely because there's no home system. So dashboards do have real value for visibility at the CEO level. Again, it really just depends.
Whereas if it's something department specific, certainly just pipe it into their home system and move on.
[00:08:02] Unknown:
And then in terms of the actual team formation, talent acquisition, however you want to think about it, what are some of the best ways you have seen for people to go down that path of saying, okay, we know we need to start taking advantage of data, and then either identifying people internal to the organization who have that interest or capability, and deciding whether that's sufficient or whether you need to bring on someone external and go through the hiring process?
[00:08:36] Unknown:
The crucial bit in the beginning is to really understand whether you even have a problem to solve. Some people say, we need data expertise, and I've asked them, why? And they say, well, because we want to be data driven. But if there's no business problem to be solved, then perhaps hold off. Now, if you do have a business problem to be solved, I've seen it work quite well to hire someone internally who displays an affinity for this stuff. If the problem is heavy enough and important enough, no one's going to wait for a hire. Someone internally is going to step up. They may just learn SQL on the go to help solve the problem at hand.
Generally, it's hard to hire externally if you don't specify your problems up front, because then you don't know how to test people when you interview them. You can't pose a scenario to them if you have no scenario in mind yourself. The people I've seen thrive in this role tend to combine an investigative mind with one that cares about business impact, and the latter bit is crucial. You really don't want people who like technology for technology's sake in this role.
[00:09:52] Unknown:
And for the kind of shape of that initial data person, can you go into a greater level of sophistication on some of the ways to think about what that initial role looks like and what those responsibilities are?
[00:10:16] Unknown:
The big one is being able to communicate in plain English. Most of the job will revolve around communication, whether in the form of written reports or presentations at meetings. Someone who expects to sit in a corner and make plots all day probably is not going to fit here. The second, on the quantitative side, is someone who really knows to beware of small sample sizes before concluding things. I think everyone's experienced a situation in a startup where some manager sees 3 users doing something, panics, and says this should be the new direction for the company based on what these 3 users have done. So having a data person with an instinct for saying, "Hang on, this is such a tiny sample size, you really can't conclude much from this," is a big one.
And then, again, it's someone driven by actual business impact rather than analysis for analysis's sake. So: good communication, a fear of small sample sizes and knowing what to do about them, and someone driven by business impact who's comfortable having the conversations around how important this analysis is and what goal it's going to feed, to prevent, again, the dashboard explosion that ends up meaningless.
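The warning about small samples can be made concrete with a confidence interval around an observed rate. The sketch below is illustrative only: it assumes the question is a simple proportion (what share of users did something) and uses the Wilson score interval, one standard choice that the guest does not mention; the counts are made up.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a binomial proportion (z=1.96)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    # Centre is pulled toward 0.5 when n is small, unlike the naive p_hat.
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return (max(0.0, centre - half_width), min(1.0, centre + half_width))


# Hypothetical numbers: the same two-in-three rate seen in 3 users vs. 300 users.
for successes, n in [(2, 3), (200, 300)]:
    low, high = wilson_interval(successes, n)
    print(f"{successes}/{n}: observed {successes / n:.0%}, plausible range {low:.0%} to {high:.0%}")
```

With these made-up counts, the same rate gives a plausible range of roughly 21% to 94% for 3 users but roughly 61% to 72% for 300 users, which is the "hang on" argument in numbers.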
[00:11:29] Unknown:
And another critical element of forming that internal data capability is education. That goes both for the person doing the work, making sure they understand some of the specific challenges of working with data, how to use the tools, and what tools are available, and for the people who are consuming the data or asking questions of it, helping them understand the actual capabilities: what is reasonable to ask for now versus what you're only going to get after a few iteration cycles. I'm wondering if there are any tactical approaches you have taken or specific resources you found helpful to build up those different sides of the educational paradigm.
[00:12:17] Unknown:
Unfortunately, I don't have a good answer here that's better than simply having someone who's done it before, which is perhaps the cheapest answer one can give. One unfortunate side effect of the explosion in data vendors, and I speak to new data people regularly about this kind of thing, is a dizzying effect. A new data person goes around googling things. There are a million data vendors out there now, and half of them have their own marketing blog posts, advice blog posts, and so on.
And you can be misled, because ultimately there's an ulterior motive. Most people out there want to sell you something in data land, and it can be very distracting. So I haven't yet come across an honest online resource that truly advocates for, for example, keeping things simple to begin with. I just don't have a good answer here besides: find someone who's done it before, who can cut through all the vendor BS and keep things simple for you.
[00:13:13] Unknown:
Yeah. That's not always easy to come by either, because most of the people who are easy to find to answer those questions are going to be the vendors. Yeah. Exactly. Or, you know, sort of influencers paid by the vendors and so on. It's a terribly, I mean, shadowy situation. In data land, I think, in particular,
[00:13:28] Unknown:
the data industry, if we can call it that, perhaps after the Snowflake IPO, there was this explosion in funding. It's a lot. You're a new data person, you start googling, and everyone's won the SEO game. You're battling forces you just cannot win against if all you want is an honest answer. So, hence, I myself don't have a good answer here besides, hopefully, find someone who can give you an honest view. Yeah. My
[00:13:55] Unknown:
tongue-in-cheek answer is, well, just listen to all of the, I think, 300-some-odd hours of audio in this podcast feed.
[00:14:02] Unknown:
Oh, yes. Yeah. Maybe exclude the vendors, you know, if you can.
[00:14:06] Unknown:
Yep. And the other poignant challenge in building up that initial data capability, if you do decide to hire someone externally, particularly if you don't have any existing data talent, is getting the hiring manager to a point where they feel comfortable and confident going through the hiring process: drafting the job description, defining the responsibilities and skills, and assessing those skills effectively to understand whether the person you're talking to can do the job you're trying to hire them for. We'll dig a bit more into what happens after the hiring part, but just in that hiring process, how do you help the hiring team effectively find and evaluate potential candidates?
[00:14:54] Unknown:
Yeah. This is another hard one that's perhaps best answered through analogy. Being a company founder, you aren't an expert in every topic, and there's an analogy here. Say you're looking to hire your first head of marketing. Maybe you're a technical founder and you've never marketed even a lemonade stand in your life. What do you do? I think the same advice applies. Your investors will tell you, at least the good ones will, just find someone. Find someone you trust, who maybe then knows someone they trust who they consider good at this role, and effectively bring them on as a sort of consultant or have them advise you on what to do.
Because, again, for data in particular, there really isn't a great guide out there for keeping things simple minus any vendor BS. If you can't find someone directly, find someone you trust who may know someone. People end up being pretty well connected in tech. No matter where you are, someone's about 2 degrees away. And, again, it's perhaps a disheartening answer. I suppose I should write a blog post, but the problem is I'm a vendor now, so I can't be trusted. But, yeah, that's really what I'd lean on: find someone you trust who either knows this stuff directly or knows someone they endorse as a data person, and get advice from them. It does depend on your task at hand. It may be that you just don't need to hire a data person yet. You say, well, we need to hire a data team because we want 2 data points in Salesforce.
Well, I'll point you to a tool that can just do that, and off you go. Right? You can get back to work. Hence, it's a case-specific situation. Find someone you trust somehow, or someone you trust who in turn trusts someone else.
[00:16:30] Unknown:
I guess that's another interesting topic to dig into: are there any useful heuristics to understand whether you need a data hire, or whether you just need to understand the capabilities that are out there for tools that are
[00:16:47] Unknown:
targeted at people with my level of capacity and understanding? Yeah. The best heuristic I've found is whether questions are piling up and answers aren't. I remember the 25-person startup I mentioned, which ended up growing very quickly and was where I was most formatively. The company was named PlanGrid at the time, it was later bought by Autodesk, and it served the construction industry. The first query I ever had to run was one answering the question: where are our paying users coming from? Where are they located?
I ended up uncovering paying users in 65 different countries, and the company had no clue. Money was coming in. People were signing up and paying online. This is a concrete question, right? Where are users coming from, so we could perhaps direct some marketing budget to certain places. There's a business impact to this question. The question had no answer yet because there was no data person who'd assembled even a warehouse to cobble together the production database plus Stripe, for example. So that's my favorite heuristic: are there business questions that are important to the business? Important meaning they keep coming up and they just linger. They never disappear on their own because they need an answer.
If these questions keep piling up and the answers correspondingly don't, then go find a data person. We're spending money on ads, right? Which ad campaigns are performing the best is a classic one, of course. Well, the marketing team has no clue. Let's find someone.
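The "where are our paying users?" question is often a single join once production data and billing data sit in one place. The sketch below uses Python's built-in `sqlite3` in memory as a stand-in for a warehouse; the `users` and `payments` tables, their columns, and every row are hypothetical, with `payments` standing in for charges exported from a payment provider such as Stripe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical copy of the production users table
    CREATE TABLE users (id INTEGER PRIMARY KEY, country TEXT NOT NULL);
    -- Hypothetical billing export: one row per successful charge
    CREATE TABLE payments (user_id INTEGER NOT NULL, amount_cents INTEGER NOT NULL);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "US"), (2, "US"), (3, "AU"), (4, "CA"), (5, "AU")])
conn.executemany("INSERT INTO payments VALUES (?, ?)",
                 [(1, 5000), (1, 5000), (3, 9900), (4, 4900), (5, 9900)])

# The inner join drops users with no payments; DISTINCT avoids counting
# a user twice when they have several charges.
query = """
    SELECT u.country,
           COUNT(DISTINCT u.id) AS paying_users,
           SUM(p.amount_cents)  AS revenue_cents
    FROM users AS u
    JOIN payments AS p ON p.user_id = u.id
    GROUP BY u.country
    ORDER BY paying_users DESC, u.country
"""
rows = conn.execute(query).fetchall()
for country, paying_users, revenue_cents in rows:
    print(country, paying_users, revenue_cents)
conn.close()
```

The point of the sketch is scale, not tooling: at startup volumes, answering this class of question needs the two sources side by side and one query, not a platform.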
[00:18:13] Unknown:
And then in that hiring process, once you have drafted the job description, hopefully without everything in the kitchen sink, you know, requiring that they know everything about data for the past 15 years that has only been around for 5. How much actual kind of firsthand knowledge does that hiring manager or the interviewing team need to know to be able to understand whether or not the person they're talking to has those requisite skills or has the capability to do that work?
[00:18:45] Unknown:
Again, it's a tough one if you have no data background yourself and no data instincts. Suppose you can't find anyone who can test this person: you don't know anyone you trust, or they don't know anyone they trust who's data fluent. One heuristic, which applies to hiring in general, is to ask them to explain things and really dig in deeply until you get a satisfying answer and narrative to your questions. So you may ask them, well, what would you do to recommend where marketing allocates their ad spend budget, for example? And then they say, oh, yeah, I'll just talk to the marketing boss, get the answer from him, and then we're good to go. Well, that's clearly not good enough. But if they can really construct a narrative, that's different. "What are you doing today?", for example, is a great first question from someone interviewing for this sort of investigative role.
Someone who starts gathering the facts and then says, okay, based on these facts, there are 3 potential paths I'd go down, and this is how I'd go down them to figure out what the right path is. So really press the candidates. I think this applies to any role where you can't judge the expertise yourself; hiring for marketing is a good one to do this with. Just really drill down: how would you do x? How would you do y? Why would you do zed? And so on. Then use your instinct: well, I have a detailed picture here, and this seems plausible.
[00:20:09] Unknown:
The other fun question to ask in any job interview is, what are some of the things that have gone wrong, and how have you addressed them?
[00:20:17] Unknown:
Oh, yeah. That's a great one. If they give you an honest answer, then you can be fairly sure they're giving you decent answers to the rest of your questions. Because that implicitly also tests, I think, the ability to learn and grow, which is going to be essential. Data needs to morph as companies
[00:20:35] Unknown:
grow. Absolutely. I've got a systems operations background myself, so one of the fun questions to ask in those hiring interviews is, tell me about the time that you took down production.
[00:20:45] Unknown:
Oh, yeah. Yep. With engineers, one thing we ask is, give us your favorite or, you know, most hated bug you've caused in production. Absolutely. And
[00:20:55] Unknown:
so once you have gone through that hiring process, you select a candidate, you hire them. Because of the fact that you are building out your data estate from scratch, what are some of the useful onboarding tasks to get them, you know, up and running and effective, and some of the core skills that are most critical for them to be able to actually deliver on the initial set of tasks and goals.
[00:21:23] Unknown:
I think anyone new who comes in will want to know what's important. So your role is to give them an overview of the company and its needs, and then actually have onboarding tasks that give them quick wins that are also high impact. This should be pretty easy. If the company has no data people and you're the first or second person to come in, there are going to be very easy wins to achieve. It could be some silly dashboard for the CEO, or piping a few data points here and there. But prioritization is the big one. Rather than giving them a huge pie-in-the-sky project that can take 4 to 6 months when they come in, give them something that takes 3 or 4 days and connect them with the person who's waiting for that task.
That way they can feel the quick win.
[00:22:19] Unknown:
And then beyond those initial first days, as a first hire, the only person constituting the data team, what are some of the uphill battles that you're likely to face in the process of getting data collected, aggregated, doing the analysis? And then in particular, working with the business to ensure that the work that you're doing is on the most important things and on the deliverables that are going to have the greatest impact.
[00:22:49] Unknown:
Yeah. I think many of the uphill battles are actually caused by data people who come in with too enthusiastic a view that everyone's going to start listening to them because they are the data person. There's a self-inflicted uphill battle here where you think, well, no one in this company is going to make decisions before consulting me, the data person. You come in with this attitude, and of course this becomes an uphill battle, because the marketing guys are going to tell you, you know what? I'm good to go. I don't need you. And you're going to be outraged: what do you mean you don't need me? I'm a data person. You're flying blind. Trust me. There's an uphill battle there because, of course, what's happening is people are thinking, who is this person stepping on my turf? It's really worth not making that an uphill battle by not having that attitude. Most of the job is going to be a people job. You're going to have to sit with everyone, forge alliances, so to speak, and make yourself available to help rather than to steer them in their own jobs.
That's how you mitigate that uphill battle and keep it from being uphill at all. Some people will say, we have no data needs. You may disagree, but tough luck. Just acknowledge it and make yourself available to help with anything else. Maybe they have some small task and so on. Building trust can become an uphill battle if you let it, but if you simply acknowledge that ultimately they're the ones running their jobs and you are there to support them, you can make it much less of one.
[00:24:16] Unknown:
The other challenge, particularly for somebody who is tasked with doing everything in a particular domain, is avoiding rabbit holes that look like a valuable exercise to begin with but ultimately aren't going to get you to the end result you're looking for. I'm curious if there are any categories or specific rabbit holes that these people should watch out for, or experiences you've had of ending up at a dead end that looked like the road to delivery?
[00:24:44] Unknown:
I mean, I've been in this situation. It's it's actually harrowing, and I'm guessing mildly stressed just thinking of past situations. The rabbit holes, the big ones stem from you sort of making decisions on what's important. And it sounds crazy. It's just, well, why would this person make any decisions? They're not in charge of this function or that function. But, again, it's the data people. They're they are prone to this. They're gonna go, well, if I only produce this amazing analysis, the whole company will reorient to go east instead of west. I'm just gonna make an exact presentation. Everything's gonna be sorted out. We're good to go. And they go down this so called rabbit hole of, you know, crazy sophisticated analyses, plots, and graphs, and all this stuff. And then someone's curious, so they go, yes. Come into our meeting and present.
And they present with without having spoken to anyone as to whether this is even an important topic or whether anyone's even willing to change direction based on any evidence, if you haven't gathered a willingness to listen to data and haven't gathered proof of that, You will go down a rabbit hole that you'll end up frankly wasting everyone's time, especially yours, and you'll come back despondent and bitter. This isn't a data driven company after all. Why did I join these guys? Time to quit. All you had to do was simply ask people upfront. Do you care about this topic? Is this a direction that you're considering changing? Because I have some thoughts. And they may go, no. This is not on the table. This is a q 4 topic, you know, as they say in the corporates world or a q 3 topic or whatever.
Save yourself time. Skip the rabbit hole. But there's, again, an aspect where one's ego becomes a bit too big on the so-called data-driven quest, which can lead you to think you're leading the company, when really you are a supporting data person and no more. And there's nothing wrong with that. We're all in supporting roles in our own ways. But it really is that. There's just something about the data-driven role that makes people want to say, "I will change everyone's mind with a few plots and a great analysis."
[00:26:42] Unknown:
Yeah. Another rabbit hole or pitfall that we as engineers are all subject to at different points in time is thinking, "Well, the ecosystem says that if I'm going to build a proper data platform, then I need technologies A, B, C, and D so that I can do X, Y, and Z," when all you really need to do to answer the questions is run a SQL query on the application database, just to get something moving. Once you have that, you can start figuring out what you actually need in order to deliver on those questions reliably, rather than building the entire suite of platform tools before you can even start to answer any of them.
[00:27:25] Unknown:
Yep. And, yes, this is actually a great classic one. The production read replica database is a very underappreciated entity. There is nothing wrong with spinning up a read replica. You're a startup, right? So how many customers do you possibly have? A hundred, 200, 300? Who cares? 10,000? That's nothing. Spin up your read replica, run your one-off queries, answer questions, and just move on. It's absolutely true. This is perhaps another blog post to be written: the underappreciated aspects of read replicas. There's really no need for a data circus if your questions are basic. But the problem is that you also erode trust. You come in and go, "I cannot do my job."
"Trust me, I am a competent person. But at the same time, I cannot do my job until you give me a giant budget to go buy a crapload of tools and take four months to set them all up."
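To make the "read replica plus one-off query" idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a recommended setup: an in-memory SQLite database stands in for the production read replica, the `orders` table and its rows are hypothetical, and with a real replica you would connect through your database's own driver and credentials instead.

```python
import sqlite3


def make_demo_replica():
    """Build an in-memory SQLite database standing in for a read replica.

    The `orders` table and its rows are hypothetical sample data.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
        [(1, 20.0), (1, 5.0), (2, 12.5)],
    )
    conn.commit()
    return conn


def run_one_off_query(conn, sql, params=()):
    """Run a single query and return all rows as a list of tuples.

    The SELECT-only check is a simple convention guard, not a security
    boundary; a real read replica rejects writes on its own.
    """
    if not sql.lstrip().upper().startswith("SELECT"):
        raise ValueError("one-off queries against the replica should be SELECTs")
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    conn = make_demo_replica()
    # "How much has each customer spent?" answered with a single query.
    totals = run_one_off_query(
        conn,
        "SELECT customer_id, SUM(amount) FROM orders "
        "GROUP BY customer_id ORDER BY customer_id",
    )
    for customer_id, total in totals:
        print(customer_id, total)
```

The point of the sketch is the scale of the tooling: one connection and one query, with no pipeline or platform in between.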
[00:28:14] Unknown:
Yes. Absolutely. There's a lot to be said for just being able to get something up and running so that you can run a few queries and answer some questions, so that people have something to look at before you then say, "Okay, now we need to build out X, Y, and Z." That's especially true in a startup environment where you have one application. And as you said, it's probably not serving that many people, so anything more is the killing-a-fly-with-a-bazooka problem.
[00:28:37] Unknown:
Exactly. And this really falls under the umbrella of quick wins. Just get the quick wins and move on with your life. No one really cares how you did the job, but people will care about the results.
[00:28:48] Unknown:
Yeah. Yeah. In my current role, we started by just setting up a Redash server connected to all the application databases, and that got us where we needed to be for a while. Now we're at a point where we actually do need a proper data platform. We have too many applications, and Redash is falling over all the time, so let's actually build something that scales and works and that we can evolve more effectively. But, yeah, just get something that works first.
[00:29:14] Unknown:
The duct tape can take you quite a long way. And if it works for NASA, it can work for you.
[00:29:21] Unknown:
Yeah. And it can be quite informative when you do get to the point of, "Okay, I need to build a data platform," because you can see the problems you've been encountering, the shortcomings, the questions people are trying to answer, and some of the better ways for you to answer them effectively.
[00:29:35] Unknown:
Yep. Absolutely.
[00:29:37] Unknown:
And in a startup, hiring and growth, depending on where you are in your journey, can be of utmost importance or the last thing anybody wants to think about. I'm curious if there are any useful signals you've seen for when the solo data person needs to start advocating, "I actually need more people to help me get my job done," or, at the organizational level, signals that they need to start growing the data team and bringing in more diversified talent or specific skills.
[00:30:13] Unknown:
It's again a question of being overwhelmed with work. If you, as a sole data person, find yourself in frequent situations where the other person is pissed off because you haven't delivered stuff on time, or haven't gotten to their request yet, or you owe them a status update and forgot because you're so overwhelmed with other things, and you can see this path continuing for another three months, that's a signal to hire someone else. If you can say, "You know what? It's very clear this mess is going to last three days, and then I'll be over the hump and back to cruising altitude," then there's no need to hire. But if you can look three months into the future and say, "Yep, I'm going to keep pissing people off because I'm not delivering," go hire someone. And it applies on the other side too, if you're constantly being let down.
I've seen quite pernicious situations where the company sets up a mandate for a data team and hires its first data person. This person doesn't hire fast enough, so other teams start looking to hire their own analysts because they don't want to wait. Finance, marketing, actually even sales, with sales operations: they all just say, "Well, our requests aren't being met. We're just going to hire our own people." And if they do go ahead with that, and I've seen this happen, you end up with effectively five data teams in the company that aren't really talking to one another. Then the data team goes, "Well, what's our role exactly?" and they become a product analytics team and ditch all other aspects of their ostensibly wide mandate. So one has to be really careful and react quickly on hiring, using this heuristic. I think it's worked well.
[00:31:51] Unknown:
And the other avenue for scaling that capacity, which I think is often overlooked because it's seen as either not viable or too much up-front investment, is that maybe you don't actually need to add new people to the data team. Maybe what you really need to do is educate the other people in the organization to answer their own questions with the systems you already have in place.
[00:32:22] Unknown:
Yeah. Yeah. Absolutely. There can be an element of falling prey to the hero pedestal. You as the data person sort of get off on people coming to you repeatedly with question after request after question. Your main job, just like everyone's job, I think, is to make yourself redundant. Now, if your company is growing, you'll never get there, so don't fear for your job. And if your company isn't growing, then, well, maybe go find another company. But it behooves everyone, and again especially data people, to try to make themselves redundant. If you're running the same query over and over again, then just set up some sort of dashboard that gives people the answer without them bothering you. Just set up a cron job that sends an email. Something. Yes, a cron job emailing a CSV. Right? Great.
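As a sketch of the "cron job, email, CSV" setup, the Python below turns query results into a CSV and attaches it to an email using only the standard library (`csv`, `email.message`, `smtplib`). Everything specific here is an assumption for illustration: the sample rows, the addresses, the SMTP host, and the script path in the crontab comment are hypothetical, and by default the script only prints the message rather than sending it.

```python
import csv
import io
import smtplib
from email.message import EmailMessage

# Hypothetical crontab entry (runs every Monday at 07:00):
#   0 7 * * 1  /usr/bin/python3 /path/to/weekly_report.py


def rows_to_csv(header, rows):
    """Serialize a header row plus data rows to CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def build_report_email(csv_text, sender, recipients, subject="Weekly report"):
    """Create an email with a short body and the CSV as an attachment."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg["Subject"] = subject
    msg.set_content("This week's numbers are attached as a CSV.")
    msg.add_attachment(csv_text, subtype="csv", filename="report.csv")
    return msg


def send_email(msg, host="localhost", port=25):
    """Send via an SMTP relay; the host and port are assumptions."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)


def main(dry_run=True):
    # Hypothetical results; in practice these would come from a query
    # against the application database or its read replica.
    header = ["week", "signups"]
    rows = [("2024-W01", 42), ("2024-W02", 57)]
    msg = build_report_email(
        rows_to_csv(header, rows),
        sender="reports@example.com",
        recipients=["team@example.com"],
    )
    if dry_run:
        print(msg)
    else:
        send_email(msg)


if __name__ == "__main__":
    main()
```

Cron handles the scheduling and the script stays a plain function call, so there is nothing to operate beyond the machine it runs on.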
[00:33:01] Unknown:
I think that's one of the pitfalls we as technologists fall into: "Oh, well, we can't just use something like cron, because that's outdated. I need the new fancy thing. I need Airflow. I need Dagster," just to run the one task that's the only thing people actually care about.
[00:33:18] Unknown:
Yep. Yep. I think the pride is hurt a bit: "Oh, I've worked so hard in my career so that I could move on to these sexy tools." But the reality is, again, everyone crawls before walking. Organizations do too. And even if you're a sprinter, you're going to have to crawl with them for a bit.
[00:33:37] Unknown:
And particularly in a startup environment, where everything is rapid-fire and you're not necessarily moving in a linear direction, I think the other thing teams need to be thinking about, particularly if it's a solo individual, is: what are the opportunities for automation, even if that automation is just a cron job? Even if it is just leaving my laptop on overnight to make sure that this thing runs? What are the things you can set up so that you don't have to be the person doing it manually every single time? Because that will be death by a thousand cuts, and you will very quickly lose momentum and be stuck running the same thing over and over again.
[00:34:26] Unknown:
A million percent. Again, embarrassingly, and this was many years ago, I've left my laptop running overnight with no one knowing that it was the mechanism getting them the stuff they wanted. But it's absolutely true. Really, just focus on the quick wins and couple that with automation. With automation coupled with a quick-win philosophy, low tech will get you going initially.
[00:34:41] Unknown:
Absolutely. And in your own experiences and in working with other people in similar circumstances, what are some of the most interesting or innovative or unexpected ways that you have seen some of these initial data hires tackle the challenges they face in these startup environments?
[00:34:50] Unknown:
Interesting. You know, I haven't actually seen anything I'd call innovative. I have seen winning approaches versus non-winning approaches, so, if you'll forgive me, maybe that's the question I'll answer. On the winning approach: again, I've employed the losing approach and certainly have the battle scars for that. Touching on something I said earlier, the effective, winning approaches are the people-based ones: people who come in and really understand that their role is one of a supporter, an augmenter of other teams, rather than someone driving the other teams.
Someone who comes in and spends their first days on the job not producing analyses, but meeting with department heads to understand the problems on their minds. That's the approach I've seen work best.
[00:35:42] Unknown:
And in your experience of being an early data person and in building and growing data teams in these fast moving environments, what are some of the most interesting or unexpected or challenging lessons that you've learned in the process?
[00:35:56] Unknown:
The big lesson I've had to overcome, which my ego hates, or hated, I suppose (I'm over it now, I'm old), was just how much of the job is not technology based. This was initially a very bitter pill to swallow, because I had come in expecting to swim in cutting-edge tools, fancy infrastructure, and so on. On something you touched on with the cron jobs: I'm not advocating raw cron jobs for everyone, but there's a certain mentality there that applies to the job more than people expect. There's really not much more to say on that. The fact that it's so people based, so relationship based, also means it's not so much technology based, or at least a lot less than people assume. And, again, this is hard, because you couple that with all the vendor content out there, and I've found that new data people today tend to have skewed expectations. They've never really had a data job; it's their first data job. They walk in, and they just expect fancy cutting-edge stuff that's set up by them.
And there's a certain despondency, I think, that one hits on realizing, "Oh, hang on. A weekly CSV is going to do the job here." Again, mismatched expectations, but that's a big one for me.
[00:37:08] Unknown:
And for teams and individuals who are on either side of this balance, whether being the first hire or trying to start these data capabilities, what are some of the alternatives they should be thinking about? Maybe it does make sense to just bring on a contractor for a couple of months to get you up and running and teach everybody how to fish, versus hiring a full-time person. What are some of the shortcuts or alternative paths that you have found useful, where bringing on a full-time data hire is actually not the right path, at least for right now?
[00:37:49] Unknown:
I think if it's unclear, if you haven't convinced yourself that there's enough work for a full-time person to spend the next three months on, then certainly bring in a contractor who's available on retainer for 20 hours a week. Where people really mess up is that they bring in contractors, and then the contractors leave. The contractors come in, set stuff up, and then, well, there's no one to maintain it, no one to build on it, no one to even use it or work around its quirks. The hazard with consultants, and at this point it's a personal opinion rather than fact, is that they have no investment in the outcome; it's not their jobs that are on the line. It's a part-time situation by definition.
They get paid to deliver a piece of work, and then they're usually gone. That's something one has to be very careful about, because institutional knowledge counts for a lot. In the data realm especially, knowing that your head of marketing is not really a fan of a particular way of communicating through an analysis is something no contractor will know. So as much as possible, I'd actually argue for hiring full-time people, unless it's extremely unclear, in which case have a contractor on retainer for 20 hours a week rather than someone who comes in, delivers infrastructure, and leaves.
[00:39:08] Unknown:
Are there any other aspects of being an early data hire at a startup, or of being a startup trying to onboard new data talent, that we didn't discuss yet that you'd like to cover before we close out the show?
[00:39:21] Unknown:
I think there's one, regarding how to organize your data team. Suppose you're a team of one and you've hired one more person, then a third, a fourth, and you're now a team of five or six. You will probably get involved in arguments where other teams say, "We're going to hire our own analysts." What will you tell them? It's always worth having a counterargument in mind, if you believe in one. I am a fan of the agency model, so to speak, or the ambassador model, where there is a central data team, but every other team (marketing, finance, and so on) gets an assigned ambassador, if you will, from the central data team.
A while ago, I was participating in hiring a VP of engineering, and one candidate said something that has stuck in my mind: "I need a designated throat to choke if things go wrong." And there's something very comforting, I think, for the other teams in having a familiar face. So certainly adopt a central model, but have an ambassador system where every team knows who their go-to person on the data team is.
[00:40:09] Unknown:
Alright. Well, for anybody who wants to get in touch with you and follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as the final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today.
[00:40:38] Unknown:
I think it's observability. There are companies working on this, but it's actually an indictment of all existing tools that we even need companies dedicated to data observability. But here we are. Maybe it's because we're a nascent industry, and these companies will be unneeded after a while. But that's the big one for me. There's a lot out there for the mechanics of running data operations, but actually knowing whether your data even makes sense, or whether something has gone wrong somewhere, even where no vendor is involved, is an area that is still very nascent. I'm quite interested in all the work going on there.
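The "does this data even make sense?" question can be approximated by hand long before adopting a dedicated observability product. The Python below is a minimal sketch under assumptions: rows are plain dictionaries, the column names (`email`, `updated_at`) and thresholds are hypothetical, and the three checks (row count, null rate, freshness) are common starting points rather than any particular vendor's feature set.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str


def check_row_count(rows, minimum):
    """Fail if a table or extract has suspiciously few rows."""
    n = len(rows)
    return CheckResult("row_count", n >= minimum, f"{n} rows (minimum {minimum})")


def check_null_rate(rows, column, max_rate):
    """Fail if too large a share of rows has no value in `column`."""
    if not rows:
        return CheckResult(f"null_rate:{column}", False, "no rows to check")
    nulls = sum(1 for row in rows if row.get(column) is None)
    rate = nulls / len(rows)
    return CheckResult(
        f"null_rate:{column}", rate <= max_rate, f"{rate:.0%} null (max {max_rate:.0%})"
    )


def check_freshness(rows, column, max_age, now=None):
    """Fail if the newest timestamp in `column` is older than `max_age`."""
    if now is None:
        now = datetime.now(timezone.utc)
    stamps = [row[column] for row in rows if row.get(column) is not None]
    if not stamps:
        return CheckResult(f"freshness:{column}", False, "no timestamps found")
    age = now - max(stamps)
    return CheckResult(f"freshness:{column}", age <= max_age, f"newest row is {age} old")


if __name__ == "__main__":
    # Hypothetical extract of a `users` table, checked at a fixed "now".
    now = datetime(2024, 1, 2, tzinfo=timezone.utc)
    users = [
        {"id": 1, "email": "a@example.com", "updated_at": datetime(2024, 1, 1, 12, tzinfo=timezone.utc)},
        {"id": 2, "email": None, "updated_at": datetime(2024, 1, 1, 13, tzinfo=timezone.utc)},
    ]
    results = [
        check_row_count(users, minimum=1),
        check_null_rate(users, "email", max_rate=0.25),
        check_freshness(users, "updated_at", max_age=timedelta(hours=12), now=now),
    ]
    for result in results:
        print("PASS" if result.passed else "FAIL", result.name, result.detail)
```

Run on a schedule and wired to an alert, checks like these catch missing, empty, or stale data; they will not detect subtler problems such as values that are present but wrong.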
[00:41:17] Unknown:
Alright. Well, thank you very much for taking the time today to join me and share your thoughts and experiences on data work at early-stage companies and the unique challenges that poses. It's definitely a situation that a lot of people are going to find themselves in, so it's great to have some of that context and shared experience. I appreciate you taking the time today, and I hope you enjoy the rest of your day.
[00:41:49] Unknown:
Yep. You too. Thanks, Tobias. Pleasure.
Thank you for listening. Don't forget to check out our other shows: Podcast.__init__, which covers the Python language, its community, and the innovative ways it is being used, and The Machine Learning Podcast, which helps you go from idea to production with machine learning. Visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you've learned something or tried out a project from the show, then tell us about it. Email hosts@dataengineeringpodcast.com with your story. And to help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.
Introduction and Guest Introduction
Responsibilities and Challenges of Data Teams in Startups
Common Fallacies in Building Data Capabilities
Moving Away from Dashboards to Operational Analytics
Forming and Growing Data Teams
Education and Resources for Data Teams
Hiring Process and Evaluating Candidates
Onboarding and Initial Tasks for Data Hires
Challenges and Pitfalls for Solo Data Hires
Avoiding Rabbit Holes and Overengineering
Scaling Data Teams and Capacity
Innovative Approaches and Lessons Learned
Alternatives to Full-Time Data Hires
Organizing Data Teams and Ambassador Model
Biggest Gap in Data Management Tools
Closing Remarks and Contact Information