Data Engineering Podcast


This show goes behind the scenes for the tools, techniques, and difficulties associated with the discipline of data engineering. Databases, workflows, automation, and data manipulation are just some of the topics that you will find here.

452 Episodes

Building a Data Vision Board: A Guide to Strategic Planning - E451

Summary In this episode of the Data Engineering Podcast Lior Barak shares his insights on developing a three-year strategic vision for data management. He discusses the importance of having a strategic plan for data, highlighting the need for data teams to focus on impact rather than just enablement. He introduces the concept of a "data vision…

23 December 2024 | 00:49:59


How Orchestration Impacts Data Platform Architecture - E450

Summary The core task of data engineering is managing the flows of data through an organization. Ensuring that those flows execute on schedule and without error is the role of the data orchestrator. Which orchestration engine you choose impacts the ways that you architect the rest of your data platform. In this episode Hugo Lu shares his…

16 December 2024 | 00:59:39


An Exploration Of The Impediments To Reusable Data Pipelines - E449

Summary In this episode of the Data Engineering Podcast the inimitable Max Beauchemin talks about reusability in data pipelines. The conversation explores the "write everything twice" problem, where similar pipelines are built without code reuse, and discusses the challenges of managing different SQL dialects and relational databases. Max also…

08 December 2024 | 00:51:32


The Art of Database Selection and Evolution - E448

Summary In this episode of the Data Engineering Podcast Sam Kleinman talks about the pivotal role of databases in software engineering. Sam shares his journey into the world of data and discusses the complexities of database selection, highlighting the trade-offs between different database architectures and how these choices affect system design,…

01 December 2024 | 00:59:56


Bridging Code and UI in Data Orchestration with Kestra - E447

Summary In this episode of the Data Engineering Podcast, Anna Geller talks about the integration of code and UI-driven interfaces for data orchestration. Anna defines data orchestration as automating the coordination of workflow nodes that interact with data across various business functions, discussing how it goes beyond ETL and analytics to…

26 November 2024 | 00:44:30


Streaming Data Into The Lakehouse With Iceberg And Trino At Going - E446

In this episode, I had the pleasure of speaking with Ken Pickering, VP of Engineering at Going, about the intricacies of streaming data into a Trino and Iceberg lakehouse. Ken shared his journey from product engineering to becoming deeply involved in data-centric roles, highlighting his experiences in ecommerce and InsurTech. At Going, Ken leads…

18 November 2024 | 00:39:49


An Opinionated Look At End-to-end Code Only Analytical Workflows With Bruin - E445

Summary The challenges of integrating all of the tools in the modern data stack have led to a new generation of tools that focus on a fully integrated workflow. At the same time, there have been many approaches to how much of the workflow is driven by code vs. not. Burak Karakan is of the opinion that a fully integrated workflow that is driven…

11 November 2024 | 00:56:11


Feldera: Bridging Batch and Streaming with Incremental Computation - E444

Summary In this episode of the Data Engineering Podcast, the creators of Feldera talk about their incremental compute engine designed for continuous computation of data, machine learning, and AI workloads. The discussion covers the concept of incremental computation, the origins of Feldera, and its unique ability to handle both streaming and batch…

04 November 2024 | 00:47:36


Accelerate Migration Of Your Data Warehouse with Datafold's AI Powered Migration Agent - E443

Summary Gleb Mezhanskiy, CEO and co-founder of Datafold, joins Tobias Macey to discuss the challenges and innovations in data migrations. Gleb shares his experiences building and scaling data platforms at companies like Autodesk and Lyft, and how these experiences inspired the creation of Datafold to address data quality issues across teams. He…

27 October 2024 | 00:48:50


Bring Vector Search And Storage To The Data Lake With Lance - E442

Summary The rapid growth of generative AI applications has prompted a surge of investment in vector databases. While there are numerous engines available now, Lance is designed to integrate with data lake and lakehouse architectures. In this episode Weston Pace explains the inner workings of the Lance format for table definitions and file storage,…

20 October 2024 | 00:58:01