Summary
Analytics projects fail all the time, resulting in lost opportunities and wasted resources. A number of factors contribute to that failure, and not all of them are under our control. However, many of them are, and as data engineers we can help keep our projects on the path to success. Eugene Khazin is the CEO of PrimeTSR, where he is tasked with rescuing floundering analytics efforts and ensuring that they provide value to the business. In this episode he reflects on the ways that data projects can be structured to provide a higher probability of success and utility, how data engineers can stay engaged throughout the project lifecycle, and how to salvage a failed project so that some value can be gained from the effort.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, and a 40Gbit public network, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. If you need global distribution, they’ve got that covered too with world-wide datacenters including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
- Managing and auditing access to your servers and databases is a problem that grows in difficulty alongside the growth of your teams. If you are tired of wasting your time cobbling together scripts and workarounds to give your developers, data scientists, and managers the permissions that they need, then it’s time to talk to our friends at strongDM. They have built an easy-to-use platform that lets you leverage your company’s single sign-on for your data platform. Go to dataengineeringpodcast.com/strongdm today to find out how you can simplify your systems.
- Alluxio is an open source, distributed data orchestration layer that makes it easier to scale your compute and your storage independently. By transparently pulling data from underlying silos, Alluxio unlocks the value of your data and allows for modern computation-intensive workloads to become truly elastic and flexible for the cloud. With Alluxio, companies like Barclays, JD.com, Tencent, and Two Sigma can manage data efficiently, accelerate business analytics, and ease the adoption of any cloud. Go to dataengineeringpodcast.com/alluxio today to learn more and thank them for their support.
- You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Dataversity, and the Open Data Science Conference. Go to dataengineeringpodcast.com/conferences to learn more and take advantage of our partner discounts when you register.
- Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers
- Your host is Tobias Macey and today I’m interviewing Eugene Khazin about the leading causes for failure in analytics projects
Interview
- Introduction
- How did you get involved in the area of data management?
- The term "analytics" has grown to mean many different things to different people, so can you start by sharing your definition of what is in scope for an "analytics project" for the purposes of this discussion?
- What are the criteria that you and your customers use to determine the success or failure of a project?
- I was recently speaking with someone who quoted a Gartner report stating an estimated failure rate of ~80% for analytics projects. Has your experience reflected this reality, and what have you found to be the leading causes of failure in your experience at PrimeTSR?
- As data engineers, what strategies can we pursue to increase the success rate of the projects that we work on?
- What contributing factors are beyond our control, and how can we help identify and surface them early in the lifecycle of a project?
- In the event of a failed project, what are the lessons that we can learn and fold into our future work?
- How can we salvage a project and derive some value from the efforts that we have put into it?
- What are some useful signals to identify when a project is on the road to failure, and steps that can be taken to rescue it?
- What advice do you have for data engineers to help them be more active and effective in the lifecycle of an analytics project?
Contact Info
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Links
- Prime TSR
- Descriptive, Predictive, and Prescriptive Analytics
- Azure Data Factory
- Azure Data Warehouse
- MuleSoft
- SSIS (SQL Server Integration Services)
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. When you're ready to build your next pipeline or want to test out the projects you hear about on the show, you'll need somewhere to deploy it. So check out our friends at Linode. With 200 gigabit private networking, scalable shared block storage, and a 40 gigabit public network, you've got everything you need to run a fast, reliable, and bulletproof data platform. And if you need global distribution, they've got that covered too with worldwide data centers, including new ones in Toronto and Mumbai. And for your machine learning workloads, they just announced dedicated CPU instances to ensure that you get the performance that you need.
Go to dataengineeringpodcast.com/linode, that's L-I-N-O-D-E, today to get a $20 credit and launch a new server in under a minute. And don't forget to thank them for their continued support of the show. And managing and auditing access to all of those servers and databases that you're running is a problem that grows in difficulty alongside the growth of your teams. If you're tired of wasting your time cobbling together scripts and workarounds to give your developers, data scientists, and managers the permissions that they need, then it's time to talk to our friends at strongDM. They have built an easy-to-use platform that lets you leverage your company's single sign-on for your data.
Go to dataengineeringpodcast.com/strongdm today to find out how you can simplify your systems. And Alluxio is an open source, distributed data orchestration layer that makes it easier to scale your compute and your storage independently. By transparently pulling data from underlying silos, Alluxio unlocks the value of your data and allows for modern, computation-intensive workloads to become truly elastic and flexible for the cloud. With Alluxio, companies like Barclays, JD.com, Tencent, and Two Sigma can manage data efficiently, accelerate business analytics, and ease the adoption of any cloud. Go to dataengineeringpodcast.com/alluxio today to learn more and thank them for their support.
You listen to this show to learn and stay up to date with what's happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers, you don't want to miss out on this year's conference season. We have partnered with organizations such as O'Reilly Media, Dataversity, and the Open Data Science Conference. Go to dataengineeringpodcast.com/conferences to learn more and take advantage of our partner discounts when you register. And to help other people find the show, please leave a review on iTunes and tell your friends and coworkers.
Your host is Tobias Macey. And today, I'm interviewing Eugene Khazin about the leading causes for failure in analytics projects. So, Eugene, could you start by introducing yourself? Sure. So I am currently a partner in a technology consulting firm
[00:02:53] Unknown:
called PrimeTSR. I started it about 6 years ago with my partner, Josh Davidson. But going all the way back to 1998, I started out as a database developer cranking out reports for a marketing firm. Then I became a database administrator and eventually moved into data architecture. I tended to work all over the place in terms of the types of companies I worked with. I worked with some large, big-brand companies like Motorola, for example, and then with some tiny startups. But the unifying thing about all those companies that I was lucky enough to work with was that they were dealing with datasets that were large for the time. You know, let's say 15 years ago, 4 gigabytes was a large dataset.
And, you know, I got lucky to work with all these large datasets. And so fast-forwarding to 12 years ago, I saw a general gap in the marketplace. I noticed that working with data at scale was hard for companies. They struggled with it. And so I started my first consulting firm. I became a consultant myself, helping clients deal with their data at scale: with their data platforms, processing the data, managing it, and just general data engineering types of activities. And so today, PrimeTSR has a large focus on that, on analytics and data engineering, and it all kinda goes back to my passion for building scalable data platforms.
[00:04:22] Unknown:
And so as I mentioned at the open, we're talking about the overall scope of failure in analytics projects. But because of the prevalence of analytics and business intelligence and machine learning these days, the term analytics itself has grown to mean so many different things to so many different people. So I'm wondering if you can start by providing the definition that you tend to use and how you want to frame the idea of what an analytics project encompasses for the purpose of this discussion. Yeah. You know, being consultants, we tend to see all sorts of things that people call analytics,
[00:04:58] Unknown:
starting all the way back with traditional descriptive, just operational reporting types of workloads and, you know, progressing to decision support systems and then getting to more advanced predictive and AI. I think all of these, in my opinion, have a right to call themselves analytics, but it's really about taking a pragmatic look at the maturity of your organization and deciding where on that scale you should be operating. And there are two measuring questions that I would say any business person or any company that's trying to get into analytics should ask itself. First, do we, as a company, really have some good use cases for more advanced analytics, or should we still be focusing on descriptive stuff? And second, even if we did come up with good analytical use cases, do we really have the ability to get all the data that we need to satisfy those use cases? And, you know, this isn't necessarily a technology problem, but many companies I see struggle with these concepts, trying to bite off too much. And the other sort of open question
[00:05:58] Unknown:
is how you define failure in the context of these projects because different people might have different ideas of what constitutes success or failure at the outset, and also that definition may change over time as they're progressing through building and releasing the project or trying to maintain it once it's been launched to production. Yeah.
[00:06:18] Unknown:
You're absolutely right. It depends on whether you are a glass-half-full or glass-half-empty kind of person. It's all really about the outcomes. And I think one of the criteria for judging success or failure, the best way to do this, is to come up with objective measures. How would you measure your success or failure? And, you know, I'll give you an example from our methodology at Prime TSR. When we go execute a project with a client, before we even get started, before any paperwork gets signed, we sit down with the client, and we ask them to think about concrete, numeric success criteria that could be used to measure that success or failure. For instance, I was just with a client that's a large hospital, and we talked about how they wanna introduce analytics to reduce the number of visits to their ER. And so I asked, okay, so how do you measure the success of that project? The answer was, well, we can apply analytics, it will reduce the visits, and that's how we're gonna measure. And it took some probing and kind of prodding to come up with, okay, the real goal is that within a year after introducing the project, we wanna reduce our ER visits by 7%.
And that 7%, in my opinion, is critical because it removes the subjectivity of whether it was successful or not, because it's really hard to judge otherwise. And it's not just analytics, but analytics especially, because these are capital-intensive projects. They end up costing organizations a lot of money, and that
[00:07:47] Unknown:
inevitably causes some emotions. These are typically what I call career-impacting projects, good or bad, depending on your situation. And it's worth pointing out too, from what you were talking about there, that the hospital said, oh, we have this goal, but once you start digging, they actually have a much more concrete business outcome that they're trying to drive towards. And that's worth noting for people who work in the consulting or freelance space: you don't necessarily want to be an order taker. You want to be driving toward providing actual value to your customers and figuring out the real goal, because otherwise they're not going to get the outcome that they were hoping for. So making sure that you have that understanding ahead of time and have those measurable outcomes definitely helps to ensure that both sides are happy at the end of the project.
[00:08:40] Unknown:
Absolutely. And like I said, because they're career-impacting projects, people get emotional.
[00:08:46] Unknown:
And, you know, that's not necessarily a good thing, especially when things are not going so well. And so I was actually recently speaking with someone else who quoted a recent Gartner report stating that there's an estimated failure rate on the order of 80% for most analytics projects, whether that's business intelligence or big data. And I'm wondering if your experience has reflected that reality, and in particular, what you have found to be the leading causes of failure in the projects that you've worked on at Prime TSR.
[00:09:14] Unknown:
It's funny that you say this, because we internally had that conversation recently as well, and I feel like 80% is honestly an understatement. The reality, though, is that it's not like it's simply 90 or 70. It's a bit more nuanced. The cycle I see a lot of projects go through is that they set off on a project, burn through the budget, and only accomplish 20 or 30% of what they were supposed to accomplish. So depending on how you look at it, you did demonstrate something out of the project. It just cost you a lot more than you expected. And so, depending again on the glass-half-full or glass-half-empty type of perspective, some companies would still call it a success, while others would call it a complete failure. So that viewpoint gets massaged depending on the executive that's looking at it, and we see people treating it differently at each company. But an interesting way to look at it is, what would be the symptoms that are causing these organizations to get there, to have those challenges?
It's such a high rate, 80 or 90%. You know, we can argue about the number, but nobody's questioning that it happens. It fails a lot. And, in my experience, it's not necessarily always a super technical problem. So, you know, on the technical side, as we all know as data engineers, data is hard to consume. Right? For whatever number of reasons: you've got a wide variety of data sources, the data may not be clean, not well organized, not available continuously. You often get, like, a data dump, but then you struggle with feeding that data in on a regular basis to really make what you're building valuable. So that's the technical piece. But you also often end up in situations where you present something to your stakeholders and you start seeing questions like, oh, where did this data come from? It should be coming from a different data source.
Or, you know, if you have the same dataset in multiple data sources, which one should we consider the source of truth? And so addressing that upfront, I think, goes a long way to preventing some of the problems. And then last on this topic, I think another common cause of failure that I see is that projects take on too much to tackle. They're trying to do what I call boiling the ocean, where these analytical use cases are sometimes grandiose, but they would take years to do. And executives with budgets are also humans, and they lose patience. And so you wanna chunk up your problem into manageable pieces so you can demonstrate progress
[00:12:02] Unknown:
some internal capabilities that, as you said, you deliver in bite-sized pieces where you can actually get some sort of feedback of, is this even moving us in the right direction, or am I totally off base with this? Because if you just, as you said, do the order taking of, you know, this is what I want, okay, I'm gonna go spend 6 months, a year, however long to build it, and then you deliver it, then either the business opportunity has moved on, or you're completely off base in how you were expecting the data to be, or the data sources have changed since you first began working on it. And then you have all of this expenditure and all of this capital tied up in the project, but there's no way to actually gain any real value back out of it, and you ultimately need to scrap it. Whereas if you do it in pieces and try to make it a composable process and a composable product, then there's a much higher probability that you're going to end up with something that's either successful by the original markers of success or at least useful in some other context for some other team or some other purpose.
Exactly. Yeah. That's well put. And so as data engineers who are involved in providing a lot of the foundational layers for these analytics projects, what are some of the strategies that we can pursue to increase the overall success rate of the projects that we work on and ensure that they don't get bogged down in this sort of long tail of trying to get something delivered that doesn't end up fulfilling the original business needs? Yeah. So piggybacking on what you said previously about not trying to boil the ocean,
[00:13:38] Unknown:
it's really important to find a small use case that you can go with. And you can really use that use case to incrementally find the right data elements that are necessary just for that use case. Use it as an opportunity to set up all the back-end plumbing and all the processing needed to work with that data. Once you have it working and you have something to demonstrate, you can claim your initial success, and it goes a long way. So that's one. The other is that generally, as data engineers, and I tend to tell this to my teams, we are in the business of reducing friction, of minimizing the friction of getting to the data. Right? What can we do to make the data as consumable, as useful as possible?
Because once you get it to that spot, there are far more use cases popping up all over the place that wanna get access to that data, that wanna pitch in with budgets and try to leverage the analytics, which in the end, since we are all champions of analytics, is good for everybody. It's good for the company. So summarizing, minimizing the friction of access to the data and finding a small use case to work through and incrementally show value are the two main tactics to help data engineers increase the success of their projects.
[00:14:56] Unknown:
And also as data engineers, it's easy to get mired down in just trying to maintain the status quo and keep systems running day to day. But what are some ways that we can engage with the business earlier on in a project to ensure that we're able to provide our insights and our knowledge of what's possible, of what we're capable of institutionally and organizationally but also from a technical perspective, and maybe identify some ways that we can use existing infrastructure rather than having to embark on an entirely new project for providing some analytics capabilities?
[00:15:34] Unknown:
Yeah. So as an engineer, you are already, as you mentioned, working day to day with the data. You know what it takes to access the data, and in bigger organizations, that red tape process is often, you know, like 50% of the effort. So understanding that process and explaining to the business what it's going to take, and the best kinds of shortcuts or approaches to get to the data, massage it, process it, and make it consumable, I think, goes a long way and shows a lot of value to the business. Because oftentimes, and I see this working with executives, there is a huge gap between a leader coming up with a grand strategic idea for another analytical use case or some dashboard, and the layers in between before it gets down to the engineer that's working with the data, who's accessing it. You know, the effort that it takes is often really hard to conceptualize. So as engineers, if we can expose this information to the upper levels, we ourselves become much more useful to the organization, and we're also just generally
[00:16:42] Unknown:
speeding up the projects or making them more realistic. And in your experience of working with your clients and working with your teams to try and produce these successful projects, what have been some of the components or aspects of the project that are either outside of your control, or that your data engineers have difficulty with as far as providing useful feedback or being able to understand the broader context of what's necessary? Maybe any sort of knowledge gaps or areas of improvement that people working in the data engineering space should be thinking about as they progress within their careers and as the industry evolves? So I think one thing, and we slightly touched on this earlier, is about the data being trusted. So identifying
[00:17:30] Unknown:
the source of truth is almost like a data governance problem, not so much data engineering, but data engineers run into it every day. Which data source is considered the source of truth for a specific element is a really hard problem to solve for many larger organizations. And so as a data engineer, because you're working with this data every day, getting an understanding of who you need to talk to to get access to the data, where this data is residing, and trying to get everybody on the same page to establish that source of truth for a specific data element, I think, is key. And it's something that engineers are uniquely positioned to do because, again, they work with this data every single day.
And then second, I think, data availability is often beyond our control. You run into things like compliance. On its own, you can run reports against a single data source, and that's fine. But you come up with a more interesting analytical use case when you need to blend your data, and all of a sudden that presents some security problem or some compliance issue that you get flagged on. And so keeping your eyes open for these things, understanding what the governance and compliance requirements are within the organization or your industry, also really helps identify red flags that may stop the project in its tracks, let's say, 3 months after it gets started. And in the event that there is a project that ultimately results in failure,
[00:18:55] Unknown:
what are some of the lessons that you can learn and fold into future work? And do you have any specific examples of work that you've done to provide some sort of broader context, or
[00:19:08] Unknown:
specific examples that might be useful as well? Yeah. So I think regardless of whether your project is a success or a failure, as an engineer, you tend to spend a lot of time digging through the data, working with the sources, understanding your data. And so inevitably, that feeds nicely into the next project. It helps position the next project and make it more successful. An example would be a project we recently completed for a large insurance company. As consultants, as much as we would love to, we don't always come in at the beginning of a project. We are often brought in when there are problems. And so we came in, and the project was about reducing the amount of fraud in insurance claims. So they wanted to run some analytics and some risk models against the data to be able to detect fraud. And the initial failure of the project was tied to the data being so poorly structured, because it came from a mainframe. And so we took a fresh cut, but we were also able to absorb the team that worked on the initial project. And even though they spent a year working with this data, kinda banging their heads against the wall because mainframes are hard to deal with these days, they were really super valuable to us and to the success of the project at the end, because they knew how to get to the mainframe. They knew how to pull the data out, and they had learned the gotchas and all these kinda hidden things that come up that we would never have known if we had started the project fresh like they did a while back. We brought in our additional skills, so we were able to process the data faster than they could with the skill set they were lacking initially. But they had that subject matter expertise about the data from the initial unsuccessful attempt at the project, which really helped. So the bottom line is that it's not a lost effort when you spend time digging through your data, understanding it, and trying different options to make it useful.
[00:21:09] Unknown:
And that leads me into my next question about ways that you can salvage a project that is either on its way to failure or has already failed, whether it's trying to revive the original goal, or extracting some of the components or some of the work or knowledge that was gained in the effort to apply it to new projects, or abstracting it into something that is just broadly useful in the data industry, and any experiences that you've had along those lines?
[00:21:38] Unknown:
Yeah. I think, you know, as consultants, again, we often come in to what I like to refer to as zombie projects. They're already dead, but they're still walking. And we get brought in to try to revive them. And I think the trick is to be able to show some value in the shortest amount of time possible. So identify some small nugget of an outcome that would really bring value to your project sponsors. And then, and this is key, it's not just finding that outcome, but also quickly trying to visualize it using whatever tool you prefer for visualization. Because, you know, a picture is worth a thousand words. And when these executives start seeing things in a dashboard, you know, I've seen a lot of projects get additional runway just because they were able to demonstrate some sexy dashboards that were still kinda valuable. They weren't a throwaway; they showed the fruit of the labor that the engineers put into the project initially and all that budget that went in there. And so that tends to revive the projects and get you a little bit more runway to show value. It's still a fire drill, but it helps.
[00:22:49] Unknown:
And in your experience of working with these zombie projects, what have you found to be some of the common issues or common patterns, either architecturally or organizationally, that have led to these types of failures?
[00:23:03] Unknown:
You know, I would say the number one warning, the red flag for me, comes when I look at a project. I often come in to do, like, project QAs, so my first question is usually, you've been going for 6 months; when was the last time that you did a demo for your stakeholders? And, you know, sometimes people say, well, we do it fairly regularly. And other times, it's like, well, we've spent 6 months working on the back end. There is nothing to show to the end users yet because it's all back-end stuff: you're setting up the plumbing, getting the data in. And to me, this is a major red flag, because if your stakeholders go on without a demo, they start getting kinda antsy. Where is my budget going? What's happening? Right? They wanna see results continuously. So embracing this agile approach, demonstrating value quickly and visualizing it, is one of those key things that you wanna do frequently in your project.
And then the other one is paying attention to questions. Let's say you do a demo; really keep your eyes open for questions like, where is this data coming from? I thought it was supposed to come from another application. So when the users start questioning your data, you'd better have some good answers about where it's coming from, and why one source is considered the source of truth versus the other. Because nothing kills a project faster than lack of adoption. You can release the snazziest product ever with nice visualizations, but at the end, if your users don't trust the data that they see, it's pretty hard to make them use something that they don't wanna use. They'll go back to the old ways.
So really focus on making sure you garner the trust of your end users in the data that they see. And so in your experience
[00:25:05] Unknown:
of working with these different analytics projects and trying to build systems that are effective and productive, what are some of the common patterns or technical platforms that you have incorporated into your standard toolkit when engaging with these different clients?
[00:25:23] Unknown:
Yeah. So depending on the use case, we tend to go either streaming or micro-batching. In terms of the platforms, because we were born as a cloud company, we prefer cloud technologies. We are a large proponent of Azure and Azure Data Factory. We've done some of our largest implementations using Azure Data Factory and Azure Data Warehouse, allowing us to scale some fairly impressive workloads and process them very fast. So, you know, in that example that I gave earlier about extracting the data from the mainframe, the only way in that situation to scale and make the data available for analysis in the allotted time frame was to leverage Azure Data Warehouse and scale out the compute quickly to process the data in less than an hour, versus the roughly 3 days it took them when they initially tried to do this.
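As a rough, hypothetical illustration of that scale-out pattern, not code from the episode: with Azure SQL Data Warehouse, compute can be resized around a heavy load by changing the database's service objective. The sketch below assumes the pyodbc driver, and the server, database, credentials, and service objective values are all placeholders.

```python
# Hypothetical sketch: temporarily scaling up Azure SQL Data Warehouse compute
# for a heavy processing run, then scaling back down. All names are placeholders.
import pyodbc

conn_str = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=example-server.database.windows.net;"
    "DATABASE=master;"
    "UID=example_admin;PWD=example_password"
)

# ALTER DATABASE cannot run inside a transaction, so enable autocommit.
conn = pyodbc.connect(conn_str, autocommit=True)
cursor = conn.cursor()

# Request a larger service objective before the batch run. The statement
# returns quickly; the resize itself completes asynchronously.
cursor.execute("ALTER DATABASE example_dw MODIFY (SERVICE_OBJECTIVE = 'DW1000c')")

# ... run the heavy extract and processing workload here ...

# Scale back down afterwards to keep costs under control.
cursor.execute("ALTER DATABASE example_dw MODIFY (SERVICE_OBJECTIVE = 'DW200c')")
conn.close()
```

The point is less the exact statements than the elasticity: paying for large compute only during the hour the workload actually needs it.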
So we're big fans of the cloud technologies, and streaming or micro-batching are kind of the standard patterns for us. And then we use some technology to process the data. Sometimes it's custom dev, but often clients have applications that they've invested in, for example, something like MuleSoft. We also happen to be a MuleSoft partner, so we have the skills,
[00:26:54] Unknown:
and we tend to leverage that platform a lot. And working as a consultant in the data engineering space, I'm wondering what you find to be the particular business challenges as far as identifying customers, raising awareness of the offerings that you have, and engaging with them to help educate them on the need for having this capability. And also, whether you are engaged in training their in-house staff to be able to take over the projects once they're complete, and any challenges that you encounter in that overall process?
[00:27:32] Unknown:
Yeah. So part of our engagement model is that we always wanna work in hybrid teams, because of what you just referred to: the problem of consultants coming in, building something, and then leaving, and the teams on the ground, you know, the client employees, being sort of stranded. They don't know how to use the platform. It's super common. We hear about it. You know, we come from big consultancies where a lot of times this is a common issue. So working as a hybrid team, when we come in, as part of our team we always have some engineers from the client side, and we have some of their business stakeholders that are able to explain how their business runs. That, in my opinion, is critical to the success of the project. And, you know, the consulting engagements are always about concrete outcomes that you define well.
But that's only part of it. The other part is really trying to understand the client's business, and then, when you introduce new tech, making sure that the people on the client side really understand the tech, that they're trained, that they can carry on and maintain the system and actually make enhancements to it in the future. And are there any particular
[00:28:51] Unknown:
areas of knowledge or technical capability that you find are particularly difficult to train them up on, or areas that are commonly lacking in the teams that you work with, just because it's not a concern that they've had to deal with, so they haven't really gained the technical aptitude in it? So, if I go broadly,
[00:29:13] Unknown:
I think cloud technologies and cloud-native technologies are still a tough thing for people to wrap their minds around. And we see that especially in more established, kinda brick-and-mortar organizations that still have their physical data centers and are very much into big hardware. The staff that they have often just don't know how to operate in the cloud, don't know how to think about the cloud. So that's the broader context. And then digging in, in the analytics space, we tend to see people focusing on large batches, large ETL, and trying to switch them to micro-batching and streaming is a hard concept to teach.
Right? A lot of times, these people, let's take the Microsoft world, have been developing SSIS packages all their lives. And all of a sudden, you're telling them that we are introducing a streaming architecture, and it's a completely different paradigm and a different skill set. And it's hard. No question about it. But, you know, this is the future. And if the leadership of the organization wants to achieve the goals that they set for the teams, they have to learn.
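To make that paradigm shift concrete, here is a minimal, hypothetical PySpark sketch, not something from the episode: instead of one nightly SSIS-style load, a Spark Structured Streaming job picks up whatever files have landed every five minutes as a micro-batch. The paths and schema are invented for illustration.

```python
# Hypothetical micro-batch sketch using Spark Structured Streaming.
# Paths and schema are placeholders, not from a real project.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("micro_batch_example").getOrCreate()

# Continuously watch a landing folder for newly arriving JSON files.
claims = (
    spark.readStream
    .schema("claim_id STRING, amount DOUBLE, submitted_at TIMESTAMP")
    .json("/landing/claims/")
)

# Instead of one big nightly batch, process whatever has arrived every
# five minutes and append it to the curated zone.
query = (
    claims.writeStream
    .trigger(processingTime="5 minutes")
    .outputMode("append")
    .format("parquet")
    .option("path", "/curated/claims/")
    .option("checkpointLocation", "/checkpoints/claims/")
    .start()
)
query.awaitTermination()
```

The mental shift for a batch-ETL developer is that the pipeline is always running, and the trigger interval, not a scheduler, decides how often work happens; shrinking the interval moves the same job toward near-real-time without rewriting it.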
[00:30:42] Unknown:
And what are some of the overall industry trends in analytics or data platform technologies, or just the general business awareness of data as a first-order concern,
[00:30:56] Unknown:
that you are most excited about or keeping close track of? Well, you know, I'll give you an example. I just came from the HIMSS conference, which is a large healthcare informatics conference in Orlando, with something like 45,000 people. And the biggest topic that everybody wanted to talk about, and this is a change, it hasn't been that way in the last few years, but this year the whole topic was AI and advanced analytics and machine learning. So that was the top header. And a level below, people's concerns were: how do we get the right data? How do we get it prepped?
So all the things that data engineering works on and addresses. And it's exciting to me because we are finally seeing the business really starting to understand and see the value in these advanced concepts. And, you know, there are lots of good vendors that are helping them understand that through tooling. But probably a big part of why they're getting there is because the competition is pushing, kinda stepping on their toes. And, also, a lot of the low-hanging fruit that they could pick without applying analytics has been picked. And so now you have to go a level deeper. You have to apply analytics to squeeze out additional margins and efficiencies, and just generally to get ahead of your competition.
You have to use analytics these days. And it's super exciting to me because that's my field. And
[00:32:27] Unknown:
do you have any particular advice for data engineers in terms of resources, or just general research or knowledge acquisition that they could pursue, to help them be more active and effective in the overall life cycle of an analytics project, and to help them stay up to date with current industry trends and the ways that they can be architecting and building these types of systems?
[00:32:53] Unknown:
You know, I'll answer it in a less technically focused way, because engineers are really good about picking up additional skills and training related to the engineering, to the technical stuff. But what I see as a major gap, as a problem, is that engineers are sort of heads-down, deep in solving technical problems, without seeing the big picture of what the project is supposed to do and what business value it is supposed to deliver for the organization. And so I think it's critical, and it helps people's careers. I see it often.
Those engineers that take an interest in the big-picture view of the project, in how it's impacting the business and how it's impacting their customers, get a lot further in their careers than the ones that are just heads-down hacking away. Nothing wrong with that, but you need to tie what you're doing to business value.
[00:34:00] Unknown:
And are there any other aspects of the work that you're doing at Prime TSR, or your experience of working with or rescuing these failed or zombie projects, that we didn't discuss yet that you'd like to cover before we close out the show? I think,
[00:34:16] Unknown:
you know, as another aspect, the big, hard thing that a lot of companies are just starting to scratch the surface on is unstructured data. The majority of the data that they're processing is still coming from relational databases. But as AI and machine learning become more prevalent, these are very data-hungry algorithms, and the data that's needed is often unstructured. And so I feel like we are just starting to scratch the surface, but we are getting in there. And we see a lot of value in being able to deal with unstructured data, and eventually also things like voice and image processing, because those are going to become the hot topics and bring value to organizations, pushing the analytics use cases further and further.
[00:35:14] Unknown:
And for anybody who wants to get in touch with you or follow along with the work that you're doing, I'll have you add your preferred contact information to the show notes. And as a final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today. Yeah. You know, I just mentioned voice and image processing. I think the tooling there is still
[00:35:37] Unknown:
not where it needs to be to really process it. We are seeing some advances in NLP, but it's definitely not where it needs to be, in my opinion. But I'm confident that it will catch up over the next few years. You know, everyone has an Amazon Alexa in their house nowadays, and I'm sure Amazon will try to leverage that voice data and build some products to somehow harvest it. And so this will be the trigger, and then the adoption will come into play in a couple of years. Alright. Well, thank you very much for taking the time today to share your experience
[00:36:10] Unknown:
of working with these projects in various states of failure or success. It's definitely useful to get that broad perspective. So I appreciate that, and I hope you enjoy the rest of your day. Thanks, Tobias. Have a good day
[00:36:30] Unknown:
as well.
Introduction and Guest Introduction
Defining Analytics Projects
Measuring Success and Failure
Common Causes of Failure
Strategies for Success
Challenges and Knowledge Gaps
Learning from Failures
Common Issues in Zombie Projects
Industry Trends and Exciting Developments
Advice for Data Engineers