Data Engineering Podcast


This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

Support the show!

25 February 2020

Shining A Light on Shadow IT In Data And Analytics - E121

Summary

Misaligned priorities across business units can lead to tensions that drive members of the organization to build data and analytics projects without the guidance or support of engineering or IT staff. The availability of cloud platforms and managed services makes this a viable option, but can lead to downstream challenges. In this episode Sean Knapp and Charlie Crocker share their experiences of working in and with companies that have dealt with shadow IT projects and the importance of enabling and empowering the use and exploration of data and analytics. If you have ever been frustrated by seemingly draconian policies or struggled to align everyone on your supported platform, then this episode will help you gain some perspective and set you on a path to productive collaboration.

Announcements

  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With 200Gbit private networking, scalable shared block storage, a 40Gbit public network, fast object storage, and a brand new managed Kubernetes platform, you’ve got everything you need to run a fast, reliable, and bullet-proof data platform. And for your machine learning workloads, they’ve got dedicated CPU and GPU instances. Go to dataengineeringpodcast.com/linode today to get a $20 credit and launch a new server in under a minute. And don’t forget to thank them for their continued support of this show!
  • Are you spending too much time maintaining your data pipeline? Snowplow empowers your business with a real-time event data pipeline running in your own cloud account without the hassle of maintenance. Snowplow takes care of everything from installing your pipeline in a couple of hours to upgrading and autoscaling so you can focus on your exciting data projects. Your team will get the most complete, accurate and ready-to-use behavioral web and mobile data, delivered into your data warehouse, data lake and real-time streams. Go to dataengineeringpodcast.com/snowplow today to find out why more than 600,000 websites run Snowplow. Set up a demo and mention you’re a listener for a special offer!
  • You listen to this show to learn and stay up to date with what’s happening in databases, streaming platforms, big data, and everything else you need to know about modern data management. For even more opportunities to meet, listen, and learn from your peers you don’t want to miss out on this year’s conference season. We have partnered with organizations such as O’Reilly Media, Corinium Global Intelligence, ODSC, and Data Council. Upcoming events include the Software Architecture Conference in NYC, Strata Data in San Jose, and PyCon US in Pittsburgh. Go to dataengineeringpodcast.com/conferences to learn more about these and other events, and take advantage of our partner discounts to save money when you register today.
  • Your host is Tobias Macey and today I’m interviewing Sean Knapp and Charlie Crocker about shadow IT in data and analytics

Interview

  • Introduction
  • How did you get involved in the area of data management?
  • Can you start by sharing your definition of shadow IT?
  • What are some of the reasons that members of an organization might start building their own solutions outside of what is supported by the engineering teams?
    • What are some of the roles in an organization that you have seen involved in these shadow IT projects?
  • What kinds of tools or platforms are well suited for being provisioned and managed without involvement from the platform team?
    • What are some of the pitfalls that these solutions present as a result of their initial ease of use?
  • What are the benefits to the organization of individuals or teams building and managing their own solutions?
  • What are some of the risks associated with these implementations of data collection, storage, management, or analysis that have no oversight from the teams typically tasked with managing those systems?
    • What are some of the ways that compliance or data quality issues can arise from these projects?
  • Once a project has been started outside of the approved channels it can quickly take on a life of its own. What are some of the ways you have identified the presence of "unauthorized" data projects?
    • Once you have identified the existence of such a project, how can you revise its implementation to integrate it with the "approved" platform that the organization supports?
  • What are some strategies for removing the friction in the collection, access, or availability of data in an organization that can eliminate the need for shadow IT implementations?
  • What are some of the inherent complexities in data management which you would like to see resolved in order to reduce the tensions that lead to these bespoke solutions?

Contact Info

Parting Question

  • From your perspective, what is the biggest gap in the tooling or technology for data management today?

Links

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
