Summary
Data engineering is a large and growing subject, with new technologies, specializations, and "best practices" emerging at an accelerating pace. This podcast does its best to explore this fractal ecosystem, and has been at it for the past 5+ years. In this episode Joe Reis, founder of Ternary Data and co-author of "Fundamentals of Data Engineering", turns the tables and interviews the host, Tobias Macey, about his journey into podcasting, how he runs the show behind the scenes, and the other things that occupy his time.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show!
- RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their state-of-the-art reverse ETL pipelines enable you to send enriched data to any cloud tool. Sign up free… or just get the free t-shirt for being a listener of the Data Engineering Podcast at dataengineeringpodcast.com/rudder.
- Data teams are increasingly under pressure to deliver. According to a recent survey by Ascend.io, 95% in fact reported being at or over capacity. With 72% of data experts reporting demands on their team going up faster than they can hire, it’s no surprise they are increasingly turning to automation. In fact, while only 3.5% report having current investments in automation, 85% of data teams plan on investing in automation in the next 12 months. 85%!!! That’s where our friends at Ascend.io come in. The Ascend Data Automation Cloud provides a unified platform for data ingestion, transformation, orchestration, and observability. Ascend users love its declarative pipelines, powerful SDK, elegant UI, and extensible plug-in architecture, as well as its support for Python, SQL, Scala, and Java. Ascend automates workloads on Snowflake, Databricks, BigQuery, and open source Spark, and can be deployed in AWS, Azure, or GCP. Go to dataengineeringpodcast.com/ascend and sign up for a free trial. If you’re a data engineering podcast listener, you get credits worth $5,000 when you become a customer.
- Your host is Tobias Macey and today we’re flipping the script. Joe Reis of Ternary Data will be interviewing me about my time as the host of this show and my perspectives on the data ecosystem
Interview
- Introduction
- How did you get involved in the area of data management?
- Now I’ll hand it off to Joe…
Joe’s Notes
- You do a lot of podcasts. Why? Podcast.__init__ started in 2015, and your first episode of the Data Engineering Podcast was published January 14, 2017. Walk us through the start of these podcasts.
- Why not a data science podcast? Why DE?
- You’ve published 306 episodes of the Data Engineering Podcast, plus 370 for Podcast.__init__, and now you’ve got a new ML podcast. How have you kept the motivation over the years?
- What’s the process for the show (finding guests, topics, etc….recording, publishing)? It’s a lot of work. Walk us through this process.
- You’ve done a ton of shows and have a lot of context with what’s going on in the field of both data engineering and Python. What have been some of the major evolutions of topics you’ve covered?
- What’s been the most counterintuitive show or interesting thing you’ve learned while producing the show?
- How do you keep current with the data engineering landscape?
- You’ve got a very unique perspective on data engineering, having interviewed countless top people in the field. What are the big trends you see in data engineering over the next 3 years?
- What do you do besides podcasting? Is this your only gig, or do you do other work?
- What’s next?
Contact Info
Closing Announcements
- Thank you for listening! Don’t forget to check out our other shows. The Data Engineering Podcast covers the latest on modern data management. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used.
- Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
- If you’ve learned something or tried out a project from the show then tell us about it! Email hosts@themachinelearningpodcast.com with your story.
- To help other people find the show please leave a review on iTunes and tell your friends and co-workers
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Links
- Podcast.__init__
- The Machine Learning Podcast
- Ternary Data
- Fundamentals of Data Engineering book (affiliate link)
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. When you're ready to build your next pipeline or want to test out the projects you hear about on the show, you'll need somewhere to deploy it. So check out our friends at Linode. With their new managed database service, you can launch a production ready MySQL, Postgres, or MongoDB cluster in minutes with automated backups, 40 gigabit connections from your application hosts, and high throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don't forget to thank them for their continued support of this show.
Data teams are increasingly under pressure to deliver. According to a recent survey by Ascend.io, 95% reported being at or over capacity. With 72% of data experts reporting demands on their team going up faster than they can hire, it's no surprise they are increasingly turning to automation. In fact, while only 3.5% report having current investments in automation, 85% of data teams plan on investing in automation in the next 12 months. That's where our friends at Ascend.io come in. The Ascend Data Automation Cloud provides a unified platform for data ingestion, transformation, orchestration, and observability. Ascend users love its declarative pipelines, powerful SDK, elegant UI, and extensible plug-in architecture, as well as its support for Python, SQL, Scala, and Java. Ascend automates workloads on Snowflake, Databricks, BigQuery, and open source Spark and can be deployed in AWS, Azure, or GCP.
Go to dataengineeringpodcast.com/ascend and sign up for a free trial. If you're a Data Engineering Podcast listener, you get credits worth $5,000 when you become a customer. Your host is Tobias Macey, and today we're flipping the script. I've got Joe Reis of Ternary Data who's gonna be interviewing me about my time as the host of this show and my perspectives on the data ecosystem. So, Joe, can you start by introducing yourself? Hey. I'm Joe Reis, recovering data scientist,
[00:02:08] Unknown:
co-founder and CEO of Ternary Data, and author of Fundamentals of Data Engineering. And do you remember how you first got started working in data? I've only worked in data for over 20 years now. Yeah. So
[00:02:20] Unknown:
in some capacity or another, data's always been something I've done. So this time around, you're actually gonna be interviewing me, so I will hand the reins off to you and take it away.
[00:02:30] Unknown:
Cool. Well, your host is Joe Reis, and I'm here to interview Tobias Macey today. So welcome, Tobias, to your own show. So like we kinda joked earlier, you know, long time listener, first time caller. So thanks for having me on to, I guess, flip the script. You know, to kinda get things kicked off here, I mean, you do a lot of podcasts. I think Podcast.__init__ started back in 2015. The Data Engineering Podcast was, you know, January 2017, and now you got a new machine learning podcast. I think you're 2 episodes into that. Walk us through kind of the origins of each of these podcasts. Like, why did you get started in podcasting?
[00:03:02] Unknown:
It started really because I was waiting for somebody else to do it, and nobody did. So I said, okay. Fine. But going back a little further, I had been listening to podcasts for a number of years. I actually got started working in the tech field around the same time as I was going to school for computer engineering. So while I was getting my degree, I was starting to listen to podcasts about different areas in tech. Started working in the field as a sysadmin back in, I wanna say, the 2009, 2010 time frame. And so podcasts were a great way for me to learn more about the space, gain new skills, understand context about the things that I was working on. And I don't remember exactly when I really started getting involved with the Python language, but when I did start using it kind of in anger for my day to day work, it really kind of stuck with me. So my actual first experience was deploying and managing Ruby on Rails applications in my sysadmin job.
And so I had actually gotten exposed to a lot of Ruby work. I had done some work in Ruby. And then there was 1 occasion where I was just doing a simple admin interface on top of a database, and there was a project called, blanking on the name, but it was some sort of Ruby admin library. And I did some work with it. It didn't really do what I wanted it to do. And so then I just said, oh, well, I hear this Django thing has a good admin interface. I'll give it a shot. Set up a simple Django app, started using the admin there. And a lot of what I had learned about Rails translated really easily to Django because they're both MVC apps, and so that's when I really started getting into Python.
The quote that a lot of people say is that it fit my brain, and so I really got involved in that. That became my default tool chain. And so I listened through all of the backlog of all of the different Python podcasts that I could find, and none of them were producing any new episodes. And I think at the time that I finished listening through all the back catalogs, the newest episode on any of them was maybe a year or 2 old. So I was waiting for somebody to start a show focused on Python. In the meantime, I had been listening a lot to another Ruby language podcast just because it covered a lot of general software engineering topics, and I was hoping that there would be something to fill that gap for Python. And eventually, I gave up and said, okay. Well, nobody else is doing it. I guess it has to be me.
And so I started doing the groundwork of getting set up to run the podcast. Actually started it up with a cohost because I wanted to do more of a panel style show. So my cohost was with the show for maybe the first 2 years or so of the podcast. But the funny thing is that 2 days before I published my first episode of Podcast.__init__, Mike Kennedy published his first episode of Talk Python To Me. So the show that I had been waiting for for so long that I eventually gave up and said I have to start my own show actually came at the same time that mine did. So we've actually been running our podcasts in parallel for the same amount of time. Got started in podcasting just because nobody else would do it for me. That's interesting you mentioned that too. I've been listening to podcasts since the 2000s as well, and there weren't a lot of podcasts in general,
[00:06:14] Unknown:
I think, let alone very tech related podcasts, you know. And so yeah. I mean, good on you for, I guess, taking the reins of it because that's about what you had to do if you wanted to get a podcast. Apparently, Michael Kennedy also felt the same way. But, I mean, they're both great podcasts. So what is it? Talk Python To Me? Yep. Yeah. I mean, I listened to both of them. Fantastic. So so thanks. And then data engineering, you know, that was a couple years later. What happened between 2015 and 2017?
[00:06:38] Unknown:
Yeah. So I had been running the podcast for a couple of years. It was seeing some success. I managed to pick up my first sponsor and then a few other sponsors. So it was worth the time and effort that I was putting into it because of that fact. And my wife actually has been helping me all the way along, and she had been doing a lot of editing on Podcast.__init__. And she actually suggested, because I was doing well with Podcast.__init__ and I seemed to be enjoying it, why don't you try another show? And so I thought about it for a while and said, okay. Well, what else is there that isn't really being addressed?
And in the 2017 time frame, I had been working for a while for a company where I was actually doing a decent amount of data engineering work. My actual job title, I think, was maybe DevOps or back end engineer or something like that. But I was effectively building an event stream application for loading data into BigQuery, which, that's a whole rant aside right there about why that was the wrong tool for the job, but it came down as a mandate. The project was already started when I joined, and they said, okay. Build this thing. So I made it work. But the point is, I was doing a lot of work in data.
Data science was gaining a lot of attention and mindshare. It was, you know, the hottest job of the 21st century, and so there were maybe a dozen or so podcasts at least that I knew about that focused on data science, but nothing that I knew about or could find that touched on data engineering. And so similar to the situation with Python where there weren't any shows addressing it, I said, okay. Well, I guess I'll start a show about data engineering so that somebody's talking about it and so that I can learn more about it, which, really, that's 1 of the big motivators for me across all of my podcasts is that I treat myself as the proxy for the audience where it's something that I say, if I wanna learn about something, then I go out and I find somebody who is an expert in the field or who built the project that I wanna use, and I try to get them on the show so I can learn more about it. And so that's a big part of how I've shaped the overall kind of tenor of all of my shows is using myself as a proxy for the audience to say, is this something that I find interesting that as an engineer, is this something that I find useful as an engineer? Like, is this a show that I would wanna listen to if I weren't the host? And so when I started the data engineering podcast, like I said, there weren't really any shows focused on data engineering. There were tons about data science, but none that talked about the actual, you know, the grunt work and the technologies and all of the complexity that goes into getting the data ready for the data scientists because that was around the time that there was really starting to be that tipping point of companies hiring data scientists and realizing that just throwing a data scientist at the problem isn't gonna fix anything for them. And, actually, the data scientists were getting burnt out because they had to do all the data engineering work. 
And I think it was maybe around the same time that Maxime Beauchemin was publishing his articles about the rise and fall of the data engineer, and he was actually 1 of the first guests that I had on the podcast to talk about that. And so it was just a good time where data engineering was nascent. It was emerging as an actual job description.
It was something that people were starting to focus on and, you know, time has kind of proven me right in that it is an actual worthwhile subject material, where the past 5 years has seen a massive explosion in the amount of attention and investment that has gone into data engineering as a job description and as an industry category.
[00:10:00] Unknown:
Those are really good insights. I kinda wanna dive into this a bit because it seems like you have really good instincts on timing as well as to what's potentially gonna be hot. Right? So Python back in 2015, for example. For the audience, you gotta remember, Python wasn't always the most widely used language. In fact, in the early days, it was considered kind of a toy. Right? So in the 2000s and 2010s, you know, if you were in data, the language was definitely not gonna be Python. But, you know, I think it was around 2015 that it really started kinda hitting an apex. I started a Python meetup here in Salt Lake where I live. And it felt like around 2014, 2015 is when things really started to take off with Python.
But then fast forward a couple years to data engineering. At that time, you know, it was coming onto everyone's radar, but I think it seemed like a very contrarian bet at that time. The rational thing to do or the popular thing to do would have been to start a data science podcast, certainly not a data engineering podcast. Do you feel like you have good instincts, are able to keep a pulse on sort of when trends are about to surface
[00:11:07] Unknown:
in technology and data? I don't think it's really even just a matter of trying to focus on the trends. It's really just going back to using myself as that proxy about what is it that I find interesting? What do I wanna learn about? I'm not really aiming for something that says, okay. What is the best way that I can get to, you know, the front page of Hacker News or the top of the Apple Podcasts list or whatever it is. It's what are the ways that I can further my own understanding, learn in ways that are useful to me in my career, and how can I benefit other people in the process? So I'm not necessarily looking for what's the splashiest, what's the fastest way to gain popularity.
I'm looking for what is the thing that needs to be understood.
[00:11:47] Unknown:
Yeah. I guess you would have started a web 3 or 4 podcast at this point. So
[00:11:51] Unknown:
Yeah. Bitcoin podcast. Yeah. And going back to the Python podcast too, I didn't start it because of anything having to do with the rise in popularity of it as a data science language. At that time, my area of focus was actually still more in the web application space. I was building Django and Flask applications. That was where I was kind of working at the time. And so my focus was just on how do I learn more about this language and the community that I'm investing in, and how do I understand more about some of the, you know, trends that are happening there? How do I learn about the tools that I am using for my own day job? And then it was just kind of after that, by happy coincidence, that Python continued to be, you know, on the rise and kind of catapulted to the forefront of data science and the data engineering ecosystem. They're kind of intertwined in a lot of ways too. Absolutely. Whereas Python for web dev, I mean, it's still going strong, but, you know, the attraction's definitely all data, at least where I sit. Yeah. It's funny because it used to be, you know, when I first started the show, all of the newsletters were about, you know, oh, here's the latest Django plug in. Here's the latest, you know, Celery or whatever.
Now every time you look at a Python newsletter, it's here's another, you know, plugin for PyTorch or a TensorFlow library or a new machine learning framework or something along those lines.
[00:13:04] Unknown:
That's awesome. So so you published, like, 306 episodes of the Data Engineering Podcast, 370 episodes of Podcast.__init__, then you got a new machine learning podcast. How do you find the time to do all this? And then walk us through your process of creating shows.
[00:13:20] Unknown:
So as far as finding the time, a little bit of it is just maybe an overactive work ethic. But I'm fortunate to have a job that supports me in the work that I do with the podcast. So I don't have to do everything at late hours of the night. I'm able to schedule both my primary work and the podcasts to be able to work together, so I don't have to kind of fight that scheduling aspect. And, also, I've just gotten a lot of systems in place that make it reasonable. So, actually, recently, as I was getting ready to launch the machine learning podcast and trying to make that final decision of, is this something that I want to do? Is this something that I have time to do? I started tracking my time about how much time am I actually spending each week on getting the podcasts out. And with 1 episode a week of Podcast.__init__ and 2 episodes a week of data engineering, I was spending about 10 hours a week on podcasts.
And so that's a manageable amount of time. And that was also actually while I was doing some of the prep work to get the machine learning podcast up and running. So 10 hours a week across all 3 podcasts. So it's a manageable amount of time, and it's a lot less than I used to spend when I was doing consulting. So when I was early in my career, I was working as a software engineer and then a DevOps engineer, and I was also doing consulting work on the side because I needed to make some extra income. It was a way that I was able to also improve my skill set, learn more about different areas that I might not work on in my day job. And so there was maybe an entire year actually where I was spending 30 plus hours a week doing consulting on top of working full time. And also going back to when I was getting my degree and starting to work in tech, that was another situation where I was working full time at a job and going to school full time. So as long as I can think of, really, I've been doing kind of at least 2 things at once. So it just seems kind of natural that I have something that I'm doing in addition to my day job that's not just sitting back and relaxing. I don't really know that I would know what to do with my time if that was all I was doing. You can make a podcast about relaxing, I guess.
[00:15:29] Unknown:
So scratch your own itch, I guess. I do podcasts as well. And I know that when I started out, I made a ton of mistakes, whether it's, like, just crappy audio for whatever reason. You know, in my case, the shows are sometimes live. So cameras, you know, getting all screwed up, everything in between. Walk me through some of the, I guess, maybe the early regrets that you ran into doing podcasts. And especially back in the day when this definitely wasn't, like, a popular medium like it is now?
[00:15:55] Unknown:
Yeah. So definitely lots of mistakes. So, fortunately, I made most of them in the early days of Podcast.__init__. So by the time I got around to doing the data engineering podcast and now the machine learning podcast, I had worked through a lot of the kinks, built up the processes. But, yeah, I mean, early on, I had gotten, you know, 1 of the cheapest mics I could get that was reasonable, but I didn't actually set it up properly. So I had it plugged in. I thought I was recording with the mic instead of my laptop mic, but for the first few episodes, it was still just my laptop mic, and so the audio was tinny. I actually went back maybe a year or 2 ago to try and listen to some of the audio of the very first interview I did on Podcast.__init__, and just the delivery was so forced and awkward.
It was very painful to try and listen to it. And I apologize to Thomas Hatch, who was the first guest on that show. He was very gracious to take his time to help me with that. But yeah. Was it like a Between Two Ferns episode, or, like, what was it? No. I was too scripted in the layout. So, you know, I had all the questions laid out, and I was just very mechanical. I hadn't really found my voice, found my cadence, you know, figured out how to actually be an interviewer. I was an engineer. I was just trying to talk to another engineer, but I didn't really know how to actually make it sound or seem natural or figure out the flow. So that's definitely 1 of the things that, beyond just helping with the success of the podcast, has been beneficial for my own purposes in my own life: figuring out how do I actually manage a conversation.
[00:17:20] Unknown:
It's really subtle. Right? I think I ran into the same things too when I started podcasting, where you have this idea of what a podcast should sound like, but then when you sit down to record with another person, it's a much different story, and you just gotta do it. Absolutely. So you mentioned audio, kind of forcing a script.
[00:17:36] Unknown:
What are some other areas that you found that you could improve your podcasting game? 1 of the things that I definitely built up is figuring out, like, what are some of the rough edges that you can smooth over in the guest experience? Like, how can you make it easy for the guests to say yes? How do you make it easy for the guests to participate and not have to invest a lot of their own time? Just kind of show up, talk for an hour, and be done with it. So just that overall process of helping to set the expectations. So at the beginning, it was just a lot of, you know, typing the same email or variations of the same email over and over again. And so then saying, oh, well, actually, I can just copy and paste this template and then just fill in these couple of fields, and I'm good to go. So how do I kind of scale my own time without having to really think through a lot of the processes? How do I make it automatic? How do I help the guests?
So things like scheduling tools, I've gone through, at this point, probably 5 or 6 different scheduling tools. Couple of them, I switched over because the 1 I was using actually got acquired or shut down. A couple of them, I switched over because it just was clunky and not really easy to use. So the tool I use now is actually something called SavvyCal, and it's been great. So it actually lets me say, you know, here's a link. I just sent it to somebody, and they can pick a time that fits their schedule. I can set a limit on the number of times that somebody can set a particular event type in my calendar within a given week or on a given day. So I can say, for data engineering, I'm only gonna allow up to 3 interviews per week, only 1 per day. So I don't have to worry about my schedule getting completely overloaded with the podcasts. And so then I can just say, okay. I need to get something scheduled. Here's a link. You pick your time. I don't have to worry about it. It just shows up on my calendar, and I show up and get it done.
Another thing that I've really kind of figured out is, you know, what are the interesting questions to ask? How do I figure out the flow of the podcast? So that's something that's definitely just kind of come about through the typical kind of programmer exercise of do something enough times until you figure out what is the common abstraction. So there's just a pattern that has kind of fallen out of the podcast that works well where, you know, I start off with, you know, who are you? What is this thing we're talking about? Why does it matter? Why does anybody care? How does it work? How do I use it? And, you know, what are some of the things that you learned in the process? That's kind of the general story arc that I've fallen into, and it fits pretty much everything that, you know, I've talked about on the show. And it's easy to adapt to specific topics or specific tools or specific guests. It's just, yeah, just building up the systems, both mentally and in terms of the actual, you know, processes and tools that I've got. You know, definitely, constantly room for improvement. You know, I'm actually in the process of starting to tinker with and design a specific web application that will help with automating more of those pieces and manage some of the hosting aspects. But,
[00:20:28] Unknown:
again, you know, too many spinning plates, and maybe I'll get to it. Maybe I won't. If you ever need a beta tester, let me know. So, yeah, it is 1 of these things where I think the perception of podcasting is that, you know, you just get a couple people together and you hit record. But as you say, there's a lot that goes on behind the scenes: scheduling, logistics, you know, booking, rebooking, all this stuff. Right? The thing I like about the data engineering podcast in particular is that when I listen to episodes, there's definitely a formula to it, and it's predictable. But the thing is you do a really good job of balancing kind of the guardrails you put in place with the technical acumen. Like, your questions, you know, we'll get into that a bit, but, you know, the questions are very, you know, technically astute. So it's not like you're just saying, hey, tell me about x, y, and z. Like, why is that important? It seems like you really have a knack for diving into very technically detailed types of questions that wouldn't seem obvious, I suppose, for someone listening to that stuff for the first time. What I mean by that is, like, you know, I was listening to, actually, today's podcast. Maintain Your Data Engineers' Sanity By Embracing Automation was the 1 I listened to here. But the types of questions you're asking, I mean, it had both historical and technical context that I think was pretty impressive given the amount of interviews that you do. And so I guess, you know, kinda given that notion, how do you stay on top of the developments in data engineering?
[00:21:50] Unknown:
There are kind of a few different aspects to it. So 1 of the things that helps is, you know, I've been working in the engineering space for, what, 11 years now? 12 years? Something like that. And the combination of working in the field, getting my degree in the field, and doing a lot of consulting work as well gives me a very wide breadth of understanding, of getting deep into the actual nitty gritty and bugs and challenges of doing the engineering work, and then just also reading about and learning about what are the, I guess, fundamental lessons that hold true across different contexts and technologies and use cases, and not spending as much of my focus on what is the latest shiny tool. You know, what are the actual things that are true no matter which tool you're using is really the most important thing for any engineer to actually gain understanding of as they progress in their career from junior dev to mid level. I think that that's definitely 1 of the hallmarks of being able to call yourself a senior engineer is understanding those fundamentals. And so both in my actual day job and my engineering work as somebody who's working in the field and also in my work of trying to understand the context and the technologies of what I'm interviewing people about.
I look to, you know, what are the foundational aspects of these technologies? So that way, I don't have to try and relearn or learn at a high level everything that I'm talking about because, you know, whether I'm talking about the latest data quality tool or distributed systems technologies or a database engine or a front end framework, there are elements of those technologies that are going to be the same no matter what, and there are elements of working with those systems that are gonna be the same no matter what. And so because I've gained that context and gained that understanding both through a lot of conversations, a lot of self study, a lot of work, you know, hands on keyboard work, I've just gained a useful intuition into, you know, how do things work. And so going back to the data engineering podcast, and when I started it, I used myself as a proxy for the audience of this is something I wanna learn about. So I had been doing work with data, but even today, I still don't really call myself a data engineer. I mean, I know a lot about the space, but there's always also that aspect of impostor syndrome of, you know, yes, I know a lot about data and data engineering, but I'm not the person who first built Kafka or whatever it is. But when I first started the data engineering podcast, I was mostly coming from the background of a sysadmin and software engineer and starting to figure out what is data engineering, how does it work, how does this play into the broader technological ecosystem.
And so I just learned a lot by asking people questions and trying to, you know, prepare for shows. So another thing that goes into kind of the behind the scenes aspect of running the podcast is whenever I run an interview, I actually prepare a list of questions beforehand. So I don't just come into it blind and say, okay. Here's a topic. Let's see what happens. I actually say, okay. Here's the list of questions. These are the things I wanna talk about and understand. I send that to the guest so that they have an opportunity to say, okay. Well, that question actually doesn't make sense to talk about, but here's something over here that might be more interesting. These days, there's not as much of that because I've gained that intuition of, you know, what are the interesting things to discuss.
Even to this day, I'm still a little surprised when guests come on and say, oh, I'm, you know, really impressed with the questions that you asked, you know, the level of detail of these questions, because, again, you know, imposter syndrome. I always feel like I'm still a neophyte, but just through a lot of work in the trenches, study, and repetition, I've gained that kind of understanding of how things work, you know, at the foundational levels, to be able to go from whatever we're talking about at the time and then say, okay. Well, let's play that back to either some of my own experience or a past episode or a specific
[00:25:48] Unknown:
area of curiosity that I have to say, okay. Well, what about this aspect of whatever it is that you're building? That's really interesting. And, you know, kind of going back to 2017, it wasn't like there was a lot of information on data engineering at the time. Right? It was still, I think, a new term and a discipline that was still forming. You talk a lot about building what I'll call mental models. Right? Mental models around data engineering. I think it'd be beneficial for the audience to really understand what are the key components of the data engineering mental model that you've constructed.
[00:26:20] Unknown:
Oh, man. That is a deep and broad topic. So for the next 5 hours, yeah, we'll be discussing. Right. Let me give my PhD dissertation. No. I think the critical things to understand, really, the things that everything kind of boils down to, are: something needs to be stored somewhere. There needs to be some information about what you're storing. You need to understand why you're storing it and how, you need to understand what has been done to it or what do I need to do to it, and then who is going to use it at the end of the day. Just kind of thinking on my feet here, I think those are kind of the core principles.
And then from there, you can dig into, you know, more detailed aspects of, okay, well, today I'm talking about Spark. Well, okay, well, when do I wanna use Spark Structured Streaming? When do I want to use Spark in batch mode? When do I wanna use Spark SQL? Why do I wanna use Spark SQL instead of Snowflake? Well, what about Trino? How do things like the Iceberg table format play between Spark systems and Trino systems? So there's the fundamental aspect of what are the building blocks, and then the high level aspect of what are all the pieces that are in the ecosystem and, you know, what are the roles that they play? And that piece has really just come about from talking to people about it for the past 5 years, and for the past year plus, it's been at least twice a week. So just a lot of conversations.
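As an aside for readers, the core facets listed in this answer (something is stored somewhere, there is information about what it is, a reason and method for storing it, a record of what has been done to it, and a set of consumers) can be sketched as a simple record type. This is purely illustrative; the class and field names below are invented for this sketch and are not from any tool discussed in the episode:

```python
from dataclasses import dataclass, field

# Hypothetical record capturing the five facets named in the conversation:
# where the data lives, what it is, why it's kept, what's been done to it,
# and who uses it at the end of the day.
@dataclass
class DatasetRecord:
    location: str                                  # something stored somewhere
    schema: dict                                   # information about what you're storing
    purpose: str                                   # why you're storing it, and how
    lineage: list = field(default_factory=list)    # what has been done to it
    consumers: list = field(default_factory=list)  # who is going to use it

orders = DatasetRecord(
    location="s3://warehouse/orders/",
    schema={"order_id": "string", "amount": "decimal"},
    purpose="daily revenue reporting",
    lineage=["ingested from app DB", "deduplicated", "currency-normalized"],
    consumers=["analytics team", "finance dashboard"],
)
print(orders.consumers[0])  # -> analytics team
```

Real metadata catalogs, like the ones discussed later in the episode, track far more than this, but the five facets make a reasonable mental checklist.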
[00:27:48] Unknown:
I find it interesting. It's kind of standing on the shoulders of giants, to use a cliche, but there is a lot to be said for, you know, talking to experts often. I know my personal experience podcasting has meant, like, it's been an accelerated learning curve. There's no way you could get these insights, I would say, you know, just reading articles or blogs. You might. But, you know, just talking to the people themselves who have built a lot of these systems, it's hard to replace that knowledge. And it's also hard, you know, without doing that, to ask questions of these people as they come up. So I can definitely empathize with that, kind of learning through osmosis in a weird way. Yeah.
[00:28:21] Unknown:
And in some ways too, I'm kind of starting it all over again with the machine learning podcast because I don't have a strong mathematical background. I mean, I understand mathematics. I understand the sort of mental models that go along with it. I understand conceptually what machine learning is doing. But if you asked me to say, okay. Well, you know, write out the equation for doing a logistic regression on a, you know, set of data. I wouldn't even know where to start. So the Machine Learning podcast is another opportunity for me to kind of jump in head first, dive into the deep end of an area that I know precious little about but find interesting and valuable. And using myself as the proxy for the audience, figure out how do I explore and find my way through this space. And so that's, I think, a big part of why I've been successful, especially in the data engineering podcast and hopefully in the machine learning podcast going forward is that I'm not starting from that space of, I am an absolute expert. I built all these systems from the ground up. I have a PhD in distributed systems theory. It's just, I'm an engineer.
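For readers who, like the host, wouldn't know where to start: the logistic regression he alludes to is compact enough to write down. The model estimates P(y = 1 | x) as sigmoid(w·x + b), and the weights are fit by gradient descent on the log loss. Here is a toy, dependency-free sketch (an illustration added for this write-up, not something from the episode):

```python
import math

def sigmoid(z):
    """Logistic function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """P(y=1 | x) = sigmoid(w . x + b) for one feature vector x."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def gradient_step(w, b, X, y, lr=0.1):
    """One batch gradient-descent step on the log loss."""
    n = len(X)
    errors = [predict(w, b, x) - yi for x, yi in zip(X, y)]
    w = [wi - lr * sum(e * x[i] for e, x in zip(errors, X)) / n
         for i, wi in enumerate(w)]
    b = b - lr * sum(errors) / n
    return w, b

# Toy data: the label is 1 whenever the single feature is positive.
X, y = [[-2.0], [-1.0], [1.0], [2.0]], [0, 0, 1, 1]
w, b = [0.0], 0.0
for _ in range(1000):
    w, b = gradient_step(w, b, X, y)
print(predict(w, b, [2.0]) > 0.9)  # the fitted model is confident on class 1
```

The point of the sketch is that the equation itself is small; the hard-won expertise is in knowing when this model is the right tool, which is exactly the pragmatic angle described next.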
[00:29:23] Unknown:
I wanna figure this out, and I'm taking everybody else along for the ride. It's gonna be really interesting. I think you're approaching it from, I think, a really mindful place too. On the topic of the machine learning podcast. Like, what are gonna be some of the major differences between that podcast and other machine learning podcasts out there? I think the main thing that I'm trying to bring to the table is
[00:29:43] Unknown:
pragmatism and practicality, which I think is a big hallmark of what I've been able to bring to the data engineering podcast. It's just, you know, again, I don't have a PhD. I'm not an expert in everything that I'm talking about. I'm just somebody with a lot of curiosity and enough understanding to be able to ask the questions. And so in the machine learning space, you know, a lot of the podcasts that are out there are very focused on theoreticians or data scientists or people who are already experts in a lot of the mathematical concepts that go into it. I'm just coming at it as an engineer saying, okay, well, how do I make sense of this thing? What is machine learning useful for? When is it the wrong choice? How do I make use of it? How do I support teams who are building machine learning models? How do I take machine learning from, I have this great idea, to, I've delivered this thing, and I'm running it in production, and now I need to make sure that I can keep it running in production? And then figuring out, how do I even understand what is a machine learning problem? You know? Because a lot of times people will say, oh, machine learning. Well, okay, I'll just throw machine learning at it, and everything will be great. Or I'll throw deep learning at it, and I've got this pile of data and magic.
And just trying to bring a kind of grounded approach to it, to say, machine learning is a tool just like every other tool. It has its benefits. It has its drawbacks. Here's how we can take a journey together to understand what are the benefits and limitations of this and what are the things that are still being figured out. You know? So with the data engineering podcast, at the outset, what I said is, this is a podcast focused on data engineering and data engineers, and I am not trying to address the data scientist audience. So I've had a lot of people come to me saying, oh, hey, I wanna talk to you about this machine learning topic, or I wanna talk to you about this data science topic. And I say, this isn't the right avenue for that. So my litmus test for, is this a topic for the data engineering podcast, is, you know, is this something that a data engineer would do, or is it something a data scientist would do? So I wanna talk about everything up to the point where I say, okay, this data is ready for the data scientist to use.
And if it's beyond that point, then it goes on to my other show. So till now, a lot of that has gone to Podcast.__init__. Going forward, a lot of that will go to the machine learning podcast, which is even part of the reason I started it, is that I was getting a lot of inbound traffic to say, hey, I wanna talk about this, you know, and it's more ML focused or MLOps focused. And for a little while, I was trying to push the Python podcast in that direction, to say, okay, well, I'll just make it the Python machine learning podcast, but it was a little awkward. There was too much history where that wasn't the primary focus, and a lot of the audience was starting to say, well, what are you doing here? So I split it out into its own new show. That way, it's easier for people to say, yes, this is something that I'm interested in. I wanna learn more about the machine learning aspect. Really trying to focus on machine learning as a practical and useful tool in the toolbox and not spend all of my time focused on, you know, what is the latest theoretical research?
You know, I'll probably dig into some aspects of theory or some of the research, but more from the perspective of, okay, well, how is this going to help me next year or 5 years from now, and not just, you know, this is useful just for the sake of pure research.
[00:33:01] Unknown:
It's interesting timing too. You know, you talked about why you didn't start a data science podcast back in 2017. Right? And data engineering really came up as a response to, you know, data scientists not having the proper foundations to do their job. Fast forward 5 years, 2022, I could argue that's no longer really the case. Right? And I would say in large part because data engineers helped facilitate the success of data scientists, and practices matured along the way and so forth. Now, the type of podcast that you would have done in 2017 on data science is not the same type of podcast you'd be doing now. Right? People are actually doing machine learning in production, you know, I think much more successfully than they were back then. And so it seems like really good timing yet again for a podcast from Tobias Macey. With that said, given your experience between data science topics covered in Podcast.__init__ and then the data engineering podcast, where are you starting to see sort of the intersections of data engineering and machine learning?
[00:33:58] Unknown:
I think that the most obvious place is in this emerging category and topic of MLOps, or how do I take this machine learning model and operationalize it and make it a reliable piece of my business, a reliable piece of my applications. Back in 2017, data science was, I think, where analytics engineering is today from the perspective of what the output was. You know, you would hire a data scientist because you wanna understand what is happening in this information that I have. And a lot of that work has been pushed into the space of analytics engineering through tools like dbt and the data engineering systems and platforms that we've built out to make that a possible kind of career choice and career direction.
And a lot of the, you know, mathematical modeling in data science has moved into the machine learning aspect. And also the biggest difference that has happened in that time frame is deep learning has had a huge impact on the industry, where it used to be, you know, data science was, okay, I need to figure out the difference between, you know, gradient descent or a random forest or k nearest neighbors and which 1 do I use when, to, okay, I'm going to take a deep learning model, but, you know, am I going to use BERT, a language transformer, or am I going to use YOLO for an image recognition use case? So there's been a lot of investment in, you know, the big tech firms and in research and academia into making machine learning and deep learning a more tractable problem and 1 that isn't solely restricted to people who have those PhDs. So it is a problem now where I can take a tool like Ludwig and feed it a bunch of data and get something useful out of it without having to have that PhD. You know, I can just be a data engineer. I've got access to the data. I've got a couple of hours on my hands. I can see, is this something useful? And I can say, hey, this looks like it might be worth investing more in, and hand it off to the data science team, for instance.
I think it's just become a much more accessible and approachable area of effort, whereas before, it was only really viable for businesses who had the money and staffing to be able to support putting a lot of time and money into that effort.
[00:36:12] Unknown:
Oh, for sure. I guess from your perspective being the podcast producer, how are you going to manage context switching now between 3 somewhat related, but obviously different topics?
[00:36:23] Unknown:
It's interesting that you bring up that question, because there have been a couple of times where I was recording an interview for the machine learning podcast and started to find myself, you know, asking questions in a certain direction, or thinking about asking a question in a certain direction, that would be where I would push it if I was on the data engineering podcast, of, like, oh, well, okay, well, how does this infrastructure work? How does this work at the bits and bytes level? And then remembering, oh, wait, that's not the audience I'm trying to talk to right now. I'm trying to talk to the person who's the machine learning engineer. They don't care necessarily about all the infrastructure and the automation that's happening under the covers. They wanna know, how does this help me do my job? And that's 1 of the things that I try to use as my kind of weather vane of, you know, am I delivering on what I aim to deliver with this episode or with this show: you know, who am I trying to help?
What are the problems and the challenges that the person who is likely to listen to this podcast is trying to tackle right now? So for data engineers, it's, I need to figure out what this data is, why it's useful, who's using it, how to make it reliable, how to make it maintainable, how to test it. So it's, you know, a lot of the very mechanical and automatable aspects, and not so much at the statistical level. You know, there's still some of that too. And then for the machine learning audience, it's, you know, how do I take this framework or this tool to be able to build a machine learning model, or be able to run a machine learning model, or understand, is this a problem that's worth solving with machine learning, or what does production mean for this machine learning model? Because a lot of times you say, oh, running machine learning in production, and the automatic thing that you think about is, okay, it's running in a web server somewhere, you know, serving API requests in real time, but that's not always the case. Sometimes production for machine learning just means, I need to be able to understand the context of this data that I'm looking at, so I'm actually just going to manually trigger an execution of this model to see what the output is.
[00:38:10] Unknown:
And that's production for some people. It's not serving 10,000 requests per second for some high traffic use case. It's interesting. Kinda bringing it back to the data engineering podcast. I mean, you've been doing this for 5 years now. The industry's changed a ton. Right? What have been some of the major evolutions of the topics that you've covered over the years?
[00:38:30] Unknown:
So many. The things that come to mind first are, when I first started, the Hadoop ecosystem was still relevant, I guess, is a good way to put it. Might be a bit harsh, but, you know, Hadoop and a lot of the surrounding technologies have really, you know, faded out of the day to day conversations. There are definitely still companies using it. It's definitely still providing a lot of value, but there's not as much activity there. So, you know, in 2017, when I started, there were still new products and new projects being introduced that tied into the Hadoop ecosystem. And I think S3 really ate Hadoop's lunch.
And then MapReduce as a paradigm, as a, you know, engineering approach, has faded with things like Spark and with things like Trino and cloud data warehouses and just the availability of scalable, massively parallel compute that was harder to come by when Hadoop was really on the rise. So I came in with the data engineering podcast right as Hadoop was starting its descent, I'd say. So now there's not really much conversation happening around that ecosystem. We do still have a lot of legacies of the work that went into that. I think 1 of the things that comes to mind most prominently there is the Hive metastore is still the de facto way to understand what is the set of tables that I have in S3 or in object storage or what have you. But even that is starting to get superseded by some of the other products that are out there. So AWS Glue is, you know, for people running in Amazon, a very reasonable service to use in place of running your own metastore.
Dremio is trying to push it a bit farther with their Arctic product to say, we don't actually need the Hive metastore at all. We're going to have this metastore that works with Iceberg tables and has, you know, atomic commits across tables because of the work that they're doing with Project Nessie. So I think we're at another transition point right now. But over the past 5 years, data quality, data monitoring, data observability, you know, data reliability engineering, whatever name you wanna give it, has gained a lot of ground.
Now that people are able to build systems, you know, they don't have to be an expert in distributed systems to be able to get a distributed system up and running. They can use a cloud service provider, or there are frameworks available or Helm charts for Kubernetes environments where you can say, okay. I can get this up and running, and I don't have to work on it for the next 6 months. It's freed up a lot of cycles to be able to invest in some of these higher order considerations like data quality, data cataloging, metadata management. Those are huge topics right now. You know, to begin with, it was really just metadata management for a data catalog to understand what data do I even have, who's using it for what purposes.
So Amundsen was 1 of the front runners there. Now we've got, you know, next generation catalogs like DataHub and OpenMetadata and Atlan that are trying to push that even further to say, okay, I've got metadata about everything in my system. It's not just, what datasets do I have? It's, what datasets do I have? How did they get there? What was done to them? You know, what is the full lineage of them? How do I then take the signals about changes in this metadata to drive automation, to then do more things to that dataset? So moving into, you know, what people are trying to call active metadata. So being able to actually use metadata as a driving force for your data platform so that you, as a data engineer, can, you know, get network effects and, I hate to use the term, but synergies in your system, to be able to not have to do as many of the, you know, very rote tasks. You know, you don't have to engage in as much toil. You can actually start to treat your overall data platform as a holistic system and work at a higher level and start to figure out, okay, well, how do I actually put myself in a position where I'm working to drive the business value, not just, I'm fighting to put out this fire and make sure that these files go from over here to over there on time.
[00:42:32] Unknown:
Makes a lot of sense. It matches what I've seen over the years too, which is higher levels of abstraction, and data engineering is becoming more enterprisey. These topics of data management, they weren't happening the way they are now, you know, a few years ago. It felt like the conversation moved from, hey, I've got big data stuff, to, now it's just, I've got data. So it's been interesting to watch.
[00:42:58] Unknown:
It's time to make sense of today's data tooling ecosystem. Go to dataengineeringpodcast.com/rudder to get a guide that will help you build a practical data stack for every phase of your company's journey to data maturity. The guide includes architectures and tactical advice to help you progress through 4 stages: starter, growth, machine learning, and real time. Go to dataengineeringpodcast.com/rudder today to drop the modern data stack and use a practical data engineering framework. I think 1 of the interesting aspects too is that you don't really hear the term big data as much. Before, it was a badge of honor. Like, everything had to be big data or you're not doing it right. Now it's just the data, because the data is what's actually valuable. And it used to be too, you know, when Hadoop was on the rise, it was just store everything. Maybe someday it'll be useful. And now, with more regulations and with the, you know, rising use of cloud and the pay-per-use that can actually get quite expensive, it's, okay, well, I'm actually only going to store the data because I know I need it for something, not just because it might be useful. And so starting to be more deliberate in understanding what data you're collecting and how you're using it and for what purposes.
We're definitely now in an era of starting to add more polish and focus on user experience for the different data tools and the systems that we're using. Whereas up till maybe 2 years ago, I'd even say, or maybe even last year, it was really just, how do we get all the tools in place to be able to do things with data, period. And now it's, okay, well, now how do I make that easier for data engineers, and how do I make that approachable for people who don't want to or don't need to understand all of the complexities that go into making this a reliable system?
[00:44:44] Unknown:
Zooming out, given your very unique perspective on data engineering, having interviewed countless people in the field, what are some of the big trends that you see over the next 3 years in data engineering?
[00:44:55] Unknown:
I think 1 trend is we're gonna start to see some consolidation of a lot of the explosion that happened, especially over the past 2 years. You know, the past 2 years saw massive amounts of investment from venture capital that has allowed a lot of people to be able to throw ideas out there, see what sticks. We're just now starting to see some of that consolidation. I think we're gonna see things in the metadata category start to coalesce, you know, whether it's data catalog tools or data lineage tools or data governance tools. Data governance is gaining a lot more attention. You know, it's always been a thing, but for a long time, I think it was relegated to the, quote, unquote, enterprise, and now everybody's realizing, oh, shoot, I really need to focus on this. This is important. I gotta get this right. So data governance; really focusing on user experience and making data accessible to nontechnical users, or people who don't want to have to invest in understanding everything about the system, people who just wanna get their job done; and really starting to bring application engineers and software engineers into the overall conversation about data in an organization, where there was the DevOps, quote, unquote, revolution over the past 10 to 15 years that brought software engineers and systems administrators more in alignment.
And now we're starting to go through that same process with, you know, software and systems engineers and data engineers and data scientists, and how do we actually smooth that transition from, I'm generating or collecting data in this line of business product, and now I actually need to be able to use that data and bring it full cycle back to that product. And so bringing those application and product engineers into the same space as the data engineers and having conversations about how do we actually smooth this transition. So, you know, for a long time, it was, the data engineer said, okay, well, there's a database somewhere. It has some tables. Now I need to go spend the next month figuring out what tables are there, how they're populated, why they're populated with this information, what does it mean, you know, who's using it, and then, you know, rip that out of the application database into some other context where now I can do a bunch of expensive transformations and try to recreate the semantic understanding that was originally created in this application, and just, you know, redo a lot of the work that was already done, and then hand it off to a data scientist to, you know, add a bit more polish to that and gain more semantic understanding from the business perspective. And starting to bring everybody together to say, okay, this application is generating data about customers who are using my service.
This is the type of data that I'm collecting, and, you know, starting to propagate that semantic context to the data engineers to say, okay, I don't have to rip it out of the database. I can now start to build this contract, whether it's an API or, you know, a Kafka stream or what have you, where the application engineers are part of that conversation and they say, okay, I am going to push this information to you. Here is the information about why that information is being generated. And then the data engineers can take that and say, okay, I'm going to populate my data catalog with that context, with this information, so that the data scientists can then be effective, so we don't have to spend these cycles, you know, doing the expensive work of recreating that knowledge that is being kind of discarded at each of these hand off points, and just starting to bring that in as a first class concern of the data as it's collected.
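The producer-owned contract described here can be sketched as a small record that an application team publishes alongside its event stream, so downstream engineers and the catalog inherit the semantics instead of reverse-engineering them. Every name and field below is hypothetical, invented purely for illustration:

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical sketch of a producer-owned "data contract": the application
# team declares what it emits, over what channel, and why each field exists,
# so data engineers don't have to rip tables out of the source database.
@dataclass
class DataContract:
    producer: str   # the owning application team
    channel: str    # delivery mechanism, e.g. an API endpoint or Kafka topic
    schema: dict    # field name -> type
    semantics: dict # field name -> why it exists / what it means

signup_events = DataContract(
    producer="checkout-service",
    channel="kafka://events.customer_signup.v1",
    schema={"customer_id": "string", "plan": "string", "signed_up_at": "timestamp"},
    semantics={"plan": "tier chosen at signup; drives billing and churn models"},
)

# The contract serializes cleanly, so it could be registered in a data catalog.
print(json.dumps(asdict(signup_events), indent=2))
```

The design point is that the producer, not the data engineer, fills in the semantics, which is exactly the hand-off-without-knowledge-loss idea being described.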
[00:48:29] Unknown:
So kind of closing out, what do you do besides podcasting?
[00:48:33] Unknown:
So as I mentioned a couple of times throughout this conversation, this isn't my primary gig. Currently, well, for the past 6 and a half years now, I've actually been working at MIT, the Massachusetts Institute of Technology, in their Open Learning department. I run the platform and DevOps team for that group. So I do a lot of cloud automation. I write software. I debug software. Right now, I'm actually focused on building out our data platform, so using a lot of the lessons I've learned over the past 5 years of the data engineering podcast to try and get things right the first time without having to make the same mistakes that I've already learned about from people who have already been there. So trying to figure out how do I take all of that knowledge that I've gained over the past 5 years, condense it into something that is useful for our purposes, and then, you know, build a road map that'll get us to a useful state.
Obviously, we're gonna make our own mistakes along the way, but trying to prevent some of that because of the knowledge that I've gained through this show. That's pretty cool. It's like you got the list of cheat codes.
[00:49:40] Unknown:
So
[00:49:41] Unknown:
Yep. The Konami codes for data.
[00:49:44] Unknown:
That's so awesome.
[00:49:45] Unknown:
Well, Tobias, it's been a pleasure talking to you. Yeah. Definitely been great having you flip the script and interview me for my podcast. So, usually, I close out by asking, from your perspective, what do you see as the biggest gap in the tooling or technology for data management today? So maybe I'll ask you that question, and then I'll see if I can come up with something to answer that as well.
[00:50:05] Unknown:
I definitely feel like there's a weird pendulum point in data right now, in data engineering. I feel like there's not just a consolidation of companies and tools that's about to happen, but also practices. And I'll walk you through this. So something that's been on my mind a lot lately has been kind of, quote, old school techniques like data modeling, some governance pieces as well. What I'm most interested in right now is sort of, what's the next phase of data modeling? If you look back in history, tell me the last, I guess, great evolution of data modeling. I think it might have been Data Vault back in 2000 or something like that. And I think that, you know, Kimball, Inmon, and similar approaches are great, but they're very much suited for, you know, a batch oriented world. Data Vault could arguably work with streaming. But sort of, what's next, really? Right? How do you tie domains and business logic together, not just across structured data, but across unstructured data, across streams, graphs? You know, so what's happened, I think, along the way is, like, data modeling has been sort of this mainstay.
People still do it, but technology and approaches and architectures and systems have rapidly changed over the last 20, 30 years. And so to me, the biggest gap, at least the 1 that's top of mind for me, I'm sure there's, like, a million others, but the biggest gap that I see right now is really, you know, what comes next for data modeling. Some people would argue that everything's fine, you don't need to do anything. I would counter that argument.
[00:51:24] Unknown:
So what do you think? The data modeling question is definitely an interesting 1, and 1 that I come back to a number of times with different guests, trying to understand what is the state of data modeling. Like you said, we had Kimball and Inmon, where we had, you know, the star or snowflake schema, and we had the data vault. And those have been the ways you do it, or you just put everything in a big wide table, which is another way that people address it now. You know? Or you just say, I'm just going to put together a table using my dbt, and I don't care how expensive it is to recompute. People need to be sort of conscious, you know, as data becomes more democratized, of being very deliberate in that modeling aspect. You know, a lot of people will say, oh, I can, you know, have my entire data stack up in an afternoon with a credit card, so I'm just going to get started and see what happens. And then, you know, people will kind of work themselves into a corner where they've got this giant mountain of tech debt that they have to try and, you know, dig themselves out from underneath. And so just really pushing for being very deliberate and considered in the way that you approach building out data capabilities, and not just falling into the pit of saying, oh, well, I can just do it in an agile manner. You know, agile is definitely very useful for data, but you do need to have that initial planning phase to understand, where am I trying to get to, before I just kind of see where I end up. For sure. And it seems like these discussions around data modeling that I'm seeing recently,
[00:53:04] Unknown:
you know, it seems to be very much a reaction to what you just described where maybe these systems, they certainly allow you to get in, you know, very easily, but it's like a bear trap. Right? It's kinda hard to wiggle your way out once you're in it. So it's always about trade offs. Right? So when you're talking about data architectures, things that are, I think, easy to use, the trade off of that is, yeah, it's easy to use, which means you might overlook some pretty important things to your point, data modeling being 1 of them. I'm sure there's a lot of other things. But especially, you know, where it ties into machine learning is especially where I'm interested.
And so, you know, up to now, data modeling has very much been an analytical exercise. But tying your data from point A to Z within domains, across domains, whatever, is an area that I've personally been nerding out on quite a bit. Absolutely. Yeah. And on the machine learning aspect, you know, with data engineering and analytical approaches, there's always the problem of garbage in, garbage out.
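One way to picture the garbage-in, garbage-out guardrail being described is a validation gate that rejects bad records before they ever reach a model. This is a hedged sketch; the field names, allowed regions, and amount threshold are all invented for illustration:

```python
def validate_record(record):
    """Return a list of problems; an empty list means the record is clean."""
    problems = []
    # Guard against missing or out-of-range numeric values.
    if record.get("amount") is None:
        problems.append("missing amount")
    elif not (0 <= record["amount"] <= 10_000):
        problems.append("amount out of range")
    # Guard against unknown categorical values.
    if record.get("region") not in {"US", "EU", "APAC"}:
        problems.append("unknown region")
    return problems

records = [
    {"amount": 25.0, "region": "US"},    # clean
    {"amount": None, "region": "EU"},    # garbage: missing value
    {"amount": 99.0, "region": "Mars"},  # garbage: unknown category
]
# Only validated records flow downstream to training or inference.
clean = [r for r in records if not validate_record(r)]
print(len(clean))  # 1
```

The point is where the check sits: upstream of the model, so the force multiplier amplifies signal rather than the trash heap.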
[00:53:59] Unknown:
And because machine learning is such a force multiplier, it's garbage in, an entire trash heap out. And so you need to be very cautious about how you're managing the data, especially as you're feeding it into machine learning, because it can either be a huge force multiplier that benefits you, or it can be something that takes that mountain of technical debt that you've got and multiplies it by a thousand. Oh, for sure. And especially given that I think what you're also gonna be seeing is, like, tighter feedback loops between application,
[00:54:30] Unknown:
analytics, and machine learning. As real time comes more into vogue, and I think that's the next big progression too, is just the accessibility of real time. Just like cloud data warehouses democratized what were once multimillion dollar contracts for very good, but very expensive, hardware and brought that down to the masses, you're gonna see the same thing with streaming. What that means is things are gonna get faster, which means you can either do smart things faster or dumb things faster. So Absolutely.
[00:54:55] Unknown:
Hopefully, smart things, but Yeah. We'll see. And that's where a lot of the data quality work in the data engineering space, but also in the machine learning space, where people are starting to invest in testing and validating the models, becomes really important. Because as you speed up these feedback loops and iteration cycles, you need to know as you're doing it whether you're going in the right direction or not. Because if you're just going, and you wait until you've gotten to where you think you're supposed to be before you check, you might find that you thought you were in Albuquerque, but you're actually in Portland, Maine. That's awesome. Well, thank you. I definitely look forward to having further conversations with you. So thank you very much for taking the time today to join me and interview me for my podcast, and I hope you have a good rest of your day. Yeah. Thanks, Tobias.
[00:55:52] Unknown:
Thank you very much.
[00:55:54] Unknown:
Thank you for listening. Don't forget to check out our other shows, the Data Engineering Podcast, which covers the latest on modern data management, and the Machine Learning Podcast, which helps you go from idea to production with machine learning. Visit the site at pythonpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. And if you learned something or tried out a project from the show, then tell us about it. Email hosts@pythonpodcast.com with your story. And to help other people find the show, please leave a review on Apple Podcasts and tell your friends and coworkers.
Introduction and Current Data Team Challenges
Guest Introduction: Joe Reis
Origins of the Podcasts
Instincts and Trends in Technology
Managing Multiple Podcasts
Early Mistakes and Lessons Learned
Improving the Podcasting Process
Staying Updated in Data Engineering
Key Components of Data Engineering
Machine Learning Podcast: Goals and Differences
Managing Context Switching Between Podcasts
Evolution of Data Engineering Topics
Future Trends in Data Engineering
Current Role and Work at MIT
Biggest Gaps in Data Management Tooling