Data Engineering Podcast


This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

06 August 2022

Useful Lessons And Repeatable Patterns Learned From Data Mesh Implementations At AgileLab - E314


Summary

Data mesh is a frequent topic of conversation in the data community, with many debates about how and when to employ this architectural pattern. The team at Agile Lab has first-hand experience helping large enterprise organizations evaluate and implement their own data mesh strategies. In this episode Paolo Platter shares the lessons they have learned in that process, the Data Mesh Boost platform that they have built to reduce some of the boilerplate required to make it successful, and some of the considerations to make when deciding if a data mesh is the right choice for you.

Announcements

  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show!
  • Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan’s active metadata capabilities. Push information about data freshness and quality to your business intelligence, automatically scale up and down your warehouse based on usage patterns, and let the bots answer those questions in Slack so that the humans can focus on delivering real value. Go to dataengineeringpodcast.com/atlan today to learn more about how Atlan’s active metadata platform is helping pioneering data teams like Postman, Plaid, WeWork & Unilever achieve extraordinary things with metadata and escape the chaos.
  • Prefect is the modern Dataflow Automation platform for the modern data stack, empowering data practitioners to build, run and monitor robust pipelines at scale. Guided by the principle that the orchestrator shouldn’t get in your way, Prefect is the only tool of its kind to offer the flexibility to write code as workflows. Prefect specializes in glueing together the disparate pieces of a pipeline, and integrating with modern distributed compute libraries to bring power where you need it, when you need it. Trusted by thousands of organizations and supported by over 20,000 community members, Prefect powers over 100MM business critical tasks a month. For more information on Prefect, visit dataengineeringpodcast.com/prefect.
  • The only thing worse than having bad data is not knowing that you have it. With Bigeye’s data observability platform, if there is an issue with your data or data pipelines you’ll know right away and can get it fixed before the business is impacted. Bigeye lets data teams measure, improve, and communicate the quality of their data to company stakeholders. With complete API access, a user-friendly interface, and automated yet flexible alerting, you’ve got everything you need to establish and maintain trust in your data. Go to dataengineeringpodcast.com/bigeye today to sign up and start trusting your analyses.
  • Your host is Tobias Macey and today I’m interviewing Paolo Platter about Agile Lab’s lessons learned through helping large enterprises establish their own data mesh

Interview

  • Introduction
  • How did you get involved in the area of data management?
  • Can you share your experiences working with data mesh implementations?
  • What were the stated goals of project engagements that led to data mesh implementations?
  • What are some examples of projects where you explored data mesh as an option and decided that it was a poor fit?
  • What are some of the technical and process investments that are necessary to support a mesh strategy?
  • When implementing a data mesh what are some of the common concerns/requirements for building and supporting data products?
    • What is the general shape that a product will take in a mesh environment?
    • What are the features that are necessary for a product to be an effective component in the mesh?
  • What are some of the aspects of a data product that are unique to a given implementation?
  • You built a platform for implementing data meshes. Can you describe the technical elements of that system?
    • What were the primary goals that you were addressing when you decided to invest in building Data Mesh Boost?
  • How does Data Mesh Boost help in the implementation of a data mesh?
  • Code review is a common practice in the construction and maintenance of software systems. How does that activity map to data systems/products?
  • What are some of the challenges that you have encountered around CI/CD for data products?
    • What are the persistent pain points involved in supporting pre-production validation of changes to data products?
  • Beyond the initial work of building and deploying a data product there is the ongoing lifecycle management. How do you approach refactoring old data products to match updated practices/templates?
  • What are some of the indicators that tell you when an organization is at a level of sophistication that can support a data mesh approach?
  • What are the most interesting, innovative, or unexpected ways that you have seen Data Mesh Boost used?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on Data Mesh Boost?
  • When is Data Mesh (Boost) the wrong choice?
  • What do you have planned for the future of Data Mesh Boost?

Contact Info

Parting Question

  • From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

  • Thank you for listening! Don’t forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
  • To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers

Links

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
