Data Engineering Podcast


This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

05 September 2022

Introduce Climate Analytics Into Your Data Platform Without The Heavy Lifting Using Sust Global - E322

Summary

The global climate impacts everyone, and the rate of change introduces many questions that businesses need to consider. Getting answers to those questions is challenging, because the climate is a multidimensional and constantly evolving system. Sust Global was created to provide curated data sets for organizations to be able to analyze climate information in the context of their business needs. In this episode Gopal Erinjippurath discusses the data engineering challenges of building and serving those data sets, and how they are distilling complex climate information into consumable facts so you don’t have to be an expert to understand it.

Announcements

  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show!
  • Data stacks are becoming more and more complex. This brings infinite possibilities for data pipelines to break and a host of other issues, severely deteriorating the quality of the data and causing teams to lose trust. Sifflet solves this problem by acting as an overseeing layer to the data stack – observing data and ensuring it’s reliable from ingestion all the way to consumption. Whether the data is in transit or at rest, Sifflet can detect data quality anomalies, assess business impact, identify the root cause, and alert data teams on their preferred channels. All thanks to 50+ quality checks, extensive column-level lineage, and 20+ connectors across the Data Stack. In addition, data discovery is made easy through Sifflet’s information-rich data catalog with a powerful search engine and real-time health statuses. Listeners of the podcast will get $2000 to use as platform credits when signing up to use Sifflet. Sifflet also offers a 2-week free trial. Find out more at dataengineeringpodcast.com/sifflet today!
  • The biggest challenge with modern data systems is understanding what data you have, where it is located, and who is using it. Select Star’s data discovery platform solves that out of the box, with an automated catalog that includes lineage from where the data originated, all the way to which dashboards rely on it and who is viewing them every day. Just connect it to your database/data warehouse/data lakehouse/whatever you’re using and let them do the rest. Go to dataengineeringpodcast.com/selectstar today to double the length of your free trial and get a swag package when you convert to a paid plan.
  • Data teams are increasingly under pressure to deliver. According to a recent survey by Ascend.io, 95% of respondents reported being at or over capacity. With 72% of data experts reporting demands on their team going up faster than they can hire, it’s no surprise they are increasingly turning to automation. In fact, while only 3.5% report having current investments in automation, 85% of data teams plan on investing in automation in the next 12 months. 85%!!! That’s where our friends at Ascend.io come in. The Ascend Data Automation Cloud provides a unified platform for data ingestion, transformation, orchestration, and observability. Ascend users love its declarative pipelines, powerful SDK, elegant UI, and extensible plug-in architecture, as well as its support for Python, SQL, Scala, and Java. Ascend automates workloads on Snowflake, Databricks, BigQuery, and open source Spark, and can be deployed in AWS, Azure, or GCP. Go to dataengineeringpodcast.com/ascend and sign up for a free trial. If you’re a data engineering podcast listener, you get credits worth $5,000 when you become a customer.
  • Your host is Tobias Macey and today I’m interviewing Gopal Erinjippurath about his work at Sust Global building data sets from geospatial and satellite information to power climate analytics

Interview

  • Introduction
  • How did you get involved in the area of data management?
  • Can you describe what Sust Global is and the story behind it?
    • What audience(s) are you focused on?
  • Climate change is obviously a huge topic in the zeitgeist and has been growing in importance. What are the data sources that you are working with to derive climate information?
    • What role do you view Sust Global having in addressing climate change?
  • How are organizations using your climate information assets to inform their analytics and business operations?
    • What are the types of questions that they are asking about the role of climate (present and future) for their business activities?
    • How can they use the climate information that you provide to understand their impact on the planet?
  • What are some of the educational efforts that you need to undertake to ensure that your end-users understand the context and appropriate semantics of the data that you are providing? (e.g. concepts around climate science, statistically meaningful interpretations of aggregations, etc.)
  • Can you describe how you have architected the Sust Global platform?
    • What are some examples of the types of data workflows and transformations that are necessary to maintain your customer-facing services?
  • How have you approached the question of modeling for the data that you provide to end-users to make it straightforward to integrate and analyze the information?
    • What is your process for determining relevant granularities of data and normalizing scales? (e.g. time and distance)
  • What is involved in integrating with the Sust Global platform and how does it fit into the workflow of data engineers/analysts/data scientists at your customer organizations?
  • Any analytical task is an exercise in story-telling. What are some of the techniques that you and your customers have found useful to make climate data relatable and understandable?
    • What are some of the challenges involved in mapping between micro and macro level insights and translating them effectively for the consumer?
  • How do the increasing sensor capabilities and scale of coverage manifest in your data?
    • How do you account for increasing coverage when analyzing across longer historical time scales?
  • How do you balance the need to build a sustainable business with the importance of access to the information that you are working with?
  • What are the most interesting, innovative, or unexpected ways that you have seen Sust Global used?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on Sust Global?
  • When is Sust the wrong choice?
  • What do you have planned for the future of Sust Global?

Contact Info

Parting Question

  • From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

  • Thank you for listening! Don’t forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
  • To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers

Links

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
