Summary
Data Engineering is a broad and constantly evolving topic, which makes it difficult to teach in a concise and effective manner. Despite that, Daniel Molnar and Peter Fabian started the Pipeline Academy to do exactly that. In this episode they reflect on the lessons that they learned while teaching the first cohort of their bootcamp how to be effective data engineers. By focusing on the fundamentals, and making everyone write code, they were able to build confidence and impart the importance of context for their students.
Announcements
- Hello and welcome to the Data Engineering Podcast, the show about modern data management
- When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
- Are you bored with writing scripts to move data into SaaS tools like Salesforce, Marketo, or Facebook Ads? Hightouch is the easiest way to sync data into the platforms that your business teams rely on. The data you’re looking for is already in your data warehouse and BI tools. Connect your warehouse to Hightouch, paste a SQL query, and use their visual mapper to specify how data should appear in your SaaS systems. No more scripts, just SQL. Supercharge your business teams with customer data using Hightouch for Reverse ETL today. Get started for free at dataengineeringpodcast.com/hightouch.
- Atlan is a collaborative workspace for data-driven teams, like Github for engineering or Figma for design teams. By acting as a virtual hub for data assets ranging from tables and dashboards to SQL snippets & code, Atlan enables teams to create a single source of truth for all their data assets, and collaborate across the modern data stack through deep integrations with tools like Snowflake, Slack, Looker and more. Go to dataengineeringpodcast.com/atlan today and sign up for a free trial. If you’re a data engineering podcast listener, you get credits worth $3000 on an annual subscription
- Your host is Tobias Macey and today I’m interviewing Daniel Molnar and Peter Fabian about the lessons that they learned from their first cohort at the Pipeline data engineering academy
Interview
- Introduction
- How did you get involved in the area of data management?
- Can you start by sharing the curriculum and learning goals for the students?
- How did you set a common baseline for all of the students to build from throughout the program?
- What was your process for determining the structure of the tasks and the tooling used?
- What were some of the topics/tools that the students had the most difficulty with?
- What topics/tools were the easiest to grasp?
- What are some difficulties that you encountered while trying to teach different concepts?
- How did you deal with the tension of teaching the fundamentals while tying them to toolchains that hiring managers are looking for?
- What are the successes that you had with this cohort and what changes are you making to your approach/curriculum to build on them?
- What are some of the failures that you encountered and what lessons have you taken from them?
- How did the pandemic impact your overall plan and execution of the initial cohort?
- What were the skills that you focused on for interview preparation?
- What level of ongoing support/engagement do you have with students once they complete the curriculum?
- What are the most interesting, innovative, or unexpected solutions that you saw from your students?
- What are the most interesting, unexpected, or challenging lessons that you have learned while working with your first cohort?
- When is a bootcamp the wrong approach for skill development?
- What do you have planned for the future of the Pipeline Academy?
Contact Info
Parting Question
- From your perspective, what is the biggest gap in the tooling or technology for data management today?
Links
- Pipeline Academy
- Scikit
- Pandas
- Urchin
- Kafka
- Three "C"s – Context, Confidence, and Code
- Prefect
- Great Expectations
- Docker
- Kubernetes
- Become a Data Engineer On A Shoestring
- James Mickens
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Hello, and welcome to the Data Engineering Podcast, the show about modern data management. Have you ever woken up to a crisis because a number on a dashboard is broken and no one knows why? Or sent out frustrating Slack messages trying to find the right dataset? Or tried to understand what a column name means? Our friends at Atlan started out as a data team themselves and faced all this collaboration chaos. They started building Atlan as an internal tool for themselves. Atlan is a collaborative workspace for data-driven teams, like GitHub for engineering or Figma for design teams. By acting as a virtual hub for data assets ranging from tables and dashboards to SQL snippets and code, Atlan enables teams to create a single source of truth for all their data assets and collaborate across the modern data stack through deep integrations with tools like Snowflake, Slack, Looker, and more.
Go to dataengineeringpodcast.com/atlan today. That's a t l a n, and sign up for a free trial. If you're a data engineering podcast listener, you get credits worth $3,000 on an annual subscription. When you're ready to build your next pipeline and want to test out the projects you hear about on the show, you'll need somewhere to deploy it. So check out our friends over at Linode. With their managed Kubernetes platform, it's now even easier to deploy and scale your workflows or try out the latest Helm charts from tools like Pulsar, Pachyderm, and Dagster. With simple pricing, fast networking, object storage, and worldwide data centers, you've got everything you need to run a bulletproof data platform.
Go to dataengineeringpodcast.com/linode today. That's l i n o d e, and get a $100 credit to try out a Kubernetes cluster of your own. And don't forget to thank them for their continued support of this show. Your host is Tobias Macey. And today, I'm interviewing Daniel Molnar and Peter Fabian about the lessons that they learned from the first cohort at the Pipeline Data Engineering Academy. So, Daniel, you've been on here before, but if you can just introduce yourself again for the folks who haven't listened to that episode. Thank you for having me. My name is Daniel.
[00:02:12] Unknown:
I like to call myself the data janitor because I've spent so much time doing data, almost 12 years now, and I realized that most of the time what I'm doing is actually janitorship. I'm trying to shovel things from A to B. Although people call me an engineer or scientist or analyst, I actually just try to make sure that the pipes are running. Yes. Together with my lovely friend, Peter, I'm running Pipeline Data Engineering Academy,
[00:02:39] Unknown:
the first data engineering boot camp. And, Peter, if you can introduce yourself as well.
[00:02:44] Unknown:
Thank you, Tobias. Hi, everyone. My name is Peter, and I'm the counterpart to Daniel when it comes to running Pipeline Data Engineering Academy. I'm originally from Hungary, and my background is pretty much in everything digital: innovation and tech. That's what I did for the last 15 years before we embarked on this journey last year of founding Pipeline Academy and running the first data engineering boot camp ever. Very interesting journey.
[00:03:17] Unknown:
And so as I mentioned, Daniel, you've been on before, talking about your plans before the Pipeline Academy started and your experiences as the data janitor. So going back to you, Daniel, do you remember how you first got involved in the area of data management? Yeah. I think it was by mistake.
[00:03:33] Unknown:
Most likely like everybody else; nobody was born into it. When I was going to school and when I got my jobs, there were no such titles as that. I mean, there were some people who were doing, like, data warehousing, but that seemed to belong to a different century. And I think I really got on board again when this whole thing happened, that you can have things running on your laptop, running Python, running scikit and pandas and things. It felt just like the time to get back to this. Let's make something really work with this. It must have been, like, 12 years ago maybe. I think that was a time when you could still buy Google Analytics, because it was not called Google Analytics. It was called Urchin. It was, like, a separate product, but, yeah, let's not talk about that.
[00:04:21] Unknown:
And, Peter, do you remember how you got involved in data management?
[00:04:24] Unknown:
I very much do. Almost 10 years ago, I joined a consultancy that was focusing on building various innovative experiences for larger clients. This basically means IoT products in various formats, whether it's smart home products or automotive products and services. You can imagine that it's always a combination of hardware and software. And while building these MVPs, and final products as well, you could see that more and more data was being generated by humans and by machines, and captured via various sensors. And the more data was generated, the more meaningful it became, and the more people and tools were needed to interpret it in order to generate business value. And I think this is the culmination of where we are when it comes to that. The last 6 years in data science, the last couple of years in data engineering, it's becoming bigger and bigger because there is value in that data. And this is how I got firsthand experience with, yeah, more and more data and data management.
[00:05:32] Unknown:
And so, Daniel, you've been on the show before talking about the Pipeline Academy before you launched the first cohort as you were planning for it and talking about some of your goals for it. But now that you've gone through the first cohort, I'm wondering if you can just start by sharing some of the highlights of the curriculum that you put together and the overall learning goals that you had for the students that went through it.
[00:05:52] Unknown:
We tried to build this curriculum from our own greedy perspective. Like, Peter was sitting in a hiring manager position. I was sitting in a hiring manager position before. We wanted to have, like, a rundown of the topics that we really cared about when we were looking for data engineers. When starting out the boot camp, we really had to focus on tools, baseline tools that enable us to use other tools. So we focus on Python, SQL, the command line, and some kind of version control, especially GitHub. Like, these are the things that you have to have. These are like the hammer and nail: you have to have some kind of proficiency with them to be able to navigate this landscape.
And we start out our boot camp with, like, 2 weeks of prep, making sure that everybody's up to speed, revving up their understanding of this tooling, so that in the coming 9 weeks we can cover 9 major topics that we believe are of utmost importance in this field. So we start with data acquisition: how should I get data? Logs, databases, APIs, third parties, scraping; what are the entities that we actually try to get hold of? Then we go through telemetry: how do we measure these things that are out there, wherever they are, on the web or on mobile; how streams and Kafka and queues get into this game, and why we use them.
And finally, in these first 3 topics, do we do ETL or ELT or LTE or whatever people do these days? So really, what is the tooling to have these automated processes: getting data, capturing it, putting it in another place? That place gets covered in the next 3 weeks, which is data warehousing. Sometimes it's just storage because you want to run your queries on files, but sometimes you want actual data warehouses, when you do care about data modeling, good old school Kimball stuff. Sometimes you focus more on the computation part, so you care much more about things like Spark or Dask, when you really want to do complicated things, something that is harder to get done in SQL.
Then we touch on the first type of product that data teams deliver: BI. BI tooling, data quality, data lineage, getting an understanding of why this metadata craze is back on us. And then we can go to the final block, 3 weeks covering how the clouds work, what typical data stack setups look like, how much DevOps we have to understand to be able to put our things in production, what we should do about Docker and Kubernetes, and CI/CD, which we can achieve with GitHub Actions these days, a very nice tool. And finally, how to serve data to different audiences, like machines or humans, via APIs, via feature stores, putting together something that's more complicated than a typical BI solution, and how to handle bottlenecks, how to do optimization.
Finally, one has to deliver a portfolio project that combines all these domains into one solid thing that can be put on GitHub and used in job applications. And the 3 things we are striving for: we don't really like acronyms, because we have too many of them anyway, but we came up with these 3 C's, which we deliver to our graduates. The first is context. This is an ever changing field. I mean, this podcast covers another angle every week. Keeping up with the Joneses is a big job. So you have to be able to handle this situation. How do you evaluate the landscape? How do you make a decision? Which tool to go for and which not? What is the solution for your problem? So, context. Second, confidence.
Because there's so much marketing happening out there, there's so much push on different tooling, different trends, new product lines, you have to have a solid core to make sure that you can evaluate the possibilities that you might have. The third is code. We push you through these weeks. Every week, you have to do a lot of practical things. You have to do coding. You have to build a project that touches on all of these domains. So you have to have a code base as a deliverable. This is the curriculum. Very practical. Very pragmatic. Lots of teamwork,
[00:10:20] Unknown:
lots of coding, getting your hands dirty. And I think a couple of things to add here. The curriculum is basically a product of our understanding of the market demand and, of course, strong collaboration with, especially, the Berlin tech ecosystem, and feedback from stakeholders from various scale ups, startups, and larger companies here. So basically, the curriculum should support you on your journey of finding a job as a data engineer, or leveling up in whatever niche you're in, or whatever goals you wanna achieve within the realm of data or software engineering. And the way this curriculum has been designed, and is actually being delivered as an experience during the 12 weeks of the course, is very much based on the way Daniel and I operate in general when we work. And these became our values, or disclosed fundamentals, of how we operate. We're trying to be as transparent as possible in what we do and how we do things, how we make decisions.
This can be checked out on our website, on our blog. We are very pragmatic, so we focus only on things that can be applied at work, that are useful for you when making actual decisions as a data engineer. I think our participants, our students, have an enormous responsibility for the future when it comes to sustainability. It's like, whenever I just imagine an architect today not knowing how sustainable materials and sustainable methodologies are being used to build a house or building, it's unimaginable. And today's systems, data infrastructure, are mostly built without taking into consideration how much energy, how much labor, how much money they will consume.
And very often when we do consulting, we see the results of that. I think it's essential for students to learn about the basics of how to deal with that, and how to consider this when making decisions about the infrastructure. And again, I cannot emphasize this enough: collaboration. Collaboration, not just between, you know, teacher and student, but among students as peers, not just in the cohort, but also afterwards. I think interacting with others via Zoom or whatever tools you use for messaging and communication nowadays is even more important, because what we are seeing is that most positions, or a lot of positions, data engineer positions, are turning into partially or totally remote positions. So our graduates will likely communicate even more than imagined 2 years ago via online tools,
[00:13:11] Unknown:
via video conferences, and so on. So we're kind of preparing them for that as well. A couple of things to pull out of there. One is that you mentioned that you collaborated with some of the local startups and businesses in your area. And I'm curious if you had any agreements going into this of sort of preferred placement within those companies for your graduates, and just the process that you went through to be able to help the graduates of the academy find a job afterwards and be able to directly apply their skills?
[00:13:41] Unknown:
Yes. Applying their skills and basically proving that what they learn is not just useful, but actually transformational when it comes to their careers, is one of the key points that we're making throughout the 12 weeks and also after that. What I mean by that is that part of my personal responsibility at Pipeline Academy is to coach the participants for the interview process, especially focusing on the soft skills side, and to give them an idea of what kind of potential they have after they've graduated and after they've spent a couple of years in the realm of data. The interesting part in picking the right companies to work with is very much based on our experiences, our personal network, and our idea of what it means to work at a company on a, how do I put this very politely, on a long term, mentally and socially sustainable basis. We're trying to filter and inform our students about best practices, companies that are great to work for, and companies that are not necessarily great to work for. It's part of the experience, and if you're entering the realm of data, I think this kind of guidance is definitely
[00:14:57] Unknown:
or can be really, really helpful to navigate this world. It's very nice to hear that, you know, our students already started applying for jobs, some of them during the cohort. And sometimes, what they learned yesterday was very important in the interview that they had today. And sometimes they complain: why didn't we learn about this yesterday? Because I could have answered those questions, really. So it's good to see that, even if they apply to a company that we don't know about. This is kind of a justification that, you know, what you're learning is real, because people are asking about it. Yes.
[00:15:36] Unknown:
And just to finish my thought regarding working with organizations who are gonna hire our participants and our graduates: it was amazing to see that participants who started applying for positions, and actively interviewing for various different roles, during our first cohort have received multiple job offers. And it was also very interesting to witness the decision making process, how they give us feedback about the interview process, and how what we teach is very much applicable and very much real, in terms of, oh, this was a thing that popped up in my interview. Oh, it's good that we've spoken about files that are bigger than 1 terabyte and how to move them from A to B, because this was the fun part of the question. So it's very much feedback for us as well that what we are doing and the way we selected
[00:16:29] Unknown:
what should be taught in the course is useful, and it's helping them to find a job. Because your participants are all coming from different backgrounds and different experiences, you know, no matter what job you're applying to and what experiences you've had, you're never going to have 100% overlap between the tools that you've worked with and the tools that you're going to work with. And so because you're trying to move everybody along through the same experience, I'm wondering how you approached setting a common baseline for everybody so that they had a good launching off point for the rest of the program.
[00:17:01] Unknown:
First, again, reflecting that we believe that Python is not the best thing for anything, but it's good enough for most things. SQL has been with us for, like, decades. It will stay. And if you can solve something with a one-liner, you should solve it with a one-liner. Besides that, you should be able to open a pull request, communicate through issues, report a bug, so these are the baseline. On the other hand, we try to pick some of the upcoming tools also, because hiring managers are definitely looking for, like, specific tooling that you should have, and an understanding of how it works. So we had to make some decisions, like, yeah, I know, Prefect is cool, or we see, like, a lot of Great Expectations projects popping up wherever we look. So let's include those tools because we see the value they provide.
And we know that this is a hot thing, and most likely it's gonna stay hot for one or two years, and you know, you never know how long. That's also an interesting part of the course, that in week 3 you start to learn about how to evaluate a tool. At that moment, it can be a bit disturbing, because how should I know, and why do I have to do this? And it's like, this is why. And as you go through all the domains, by weeks 6, 7, 8, you get the hang of it. Yes. This is part of the job. Like, tools come and go. I should be able to look at them, and I should be able to make a decision, or at least cover my back on why I chose that one for this extra special role. So it's a very delicate balance between things that are kind of timeless, if you can say anything like that in our industry, and things that are really just popping up every day. And, oh, that company got, like, 80 mil.
They should be around for 2 years. Is that good? I don't know yet. So it's a hard choice, and it also gave us, and the students, another window into reality: Daniel decided this goes into the curriculum, we're gonna do this with it, but the product changed, like, from yesterday to today. It's like a living organism. That's one of the other hard things: you can't go out and say, like, this is a hard fact about tool X. It happened to us, definitely, like, less than a dozen times that our graduates opened issues on live projects on GitHub saying, like, hey, man, I did everything. Really. This just doesn't work. So that's also one level of confidence, that you have to say, like, I'm doing data engineering for, like, 3 weeks now.
Opening an issue on project x that's
[00:19:40] Unknown:
really, I can solve this problem. And I think this very much connects to the idea of confidence that our graduates have after the course. What I mean by that is that if you take a look at a data engineer job description today, it will be littered with tools that you're not familiar with, and it can very quickly become intimidating. And after 12 weeks, you're gonna go, like, okay, I know this, this, this, this tool. I've touched this, this, this tool. I'm not perfect at this, but I've seen it. I know what it does, and whatever you give me, I'm confident that I'm gonna learn it in, let's say, a couple of days, a couple of weeks, whatever. The students are confident that they can get it. And after that, the whole description of the position becomes less intimidating. It becomes like, oh, I can manage this. And this is what you wanna have in an interview process, regardless of whether you're gonna go in as a junior, or you're actually gonna be, let's say, head of data or in an engineering leadership position leading multiple data engineers, and just wanna understand the real landscape. What's there? How to select tools? How to select the right process?
The right team setup and so on. And this is the interesting part, really: when you look at the weeks one after another, there are huge differences. And last time, I think it was week number 8, Daniel. I'm looking at you. I can't remember anymore. But I think it was week number 8. It's just funny, because both you and I were on mute in a Zoom call. And there was just a chit chat amongst the participants, like 6, 8 people in the room at that point. And they were discussing some issue or challenge that they'd had during an assignment in the morning. At week 8, I was looking up from my computer. I was like, holy hell.
This is a data engineering conversation that I don't get. I don't understand it, because they're so deeply into the nitty gritty of how to do things. And I was thinking about the same people 7 weeks ago. It's like, okay. That kind of seems to be working. Okay, this sounds like humble bragging and I'm sorry for it, but this is one of the proudest moments that I've had during the 12 weeks. So, yeah, I'm kinda proud of that.
[00:21:53] Unknown:
And just to give some context for people who aren't familiar with the group that you're doing. And I'm just wondering what was the size of the cohort, and if you can give any sort of feel for the range of backgrounds and the sort of range of diversity, either in ages or, you know, demographics, things like
[00:22:11] Unknown:
that? Totally. A couple of hard facts. So the course is 12 weeks full time, from 9 to 5. Currently, we are in an online hybrid mode, leveraging mostly video calls. But, of course, we have a campus where we can have students visiting, in accordance with lockdown and corona regulations, with social distancing in place. 12 weeks, not more than 12 people in the room. That's set in stone because we wanna maintain the quality and wanna have enough access and time for all the participants and their individual needs. We're pretty much hands on.
So we selected 9 people for our founding cohort, and 9 people from 9 different countries. Just to give you also statistics on diversity, it was 3 females, 6 males, not all of them actually living in Germany, from, again, 9 different countries. And their professional backgrounds were super interesting, because we had everything from an airline pilot, basically, to an IT system admin. What I mean by that is that, compared to what we'd anticipated, it was mind blowing to watch that data engineering, or the stuff that they learn, can really work as a kind of magic serum, because it can amplify, like a superpower, almost any relevant skill that you've had in the realm of software engineering or data in the past. What do I mean by that?
We had 1 super talented lady who had multiple years of experience as a QA engineer. So she was very much familiar with processes, reporting bugs, and so on. These are super handy things. We had a person who was more on the process and governance side of data in the past, and now she has a much more in-depth experience and understanding of the infrastructure and the infrastructure decisions that need to be made. We have data scientists who are coming to productionize machine learning models and focus on actually putting all the math and statistics into prod, finally.
We have generalists who have been product managers, product owners, and are moving towards the world of becoming a data product owner. And for us, it's also a learning curve to see that although you would think that it is a niche, it's a growing niche. And the interesting part is that if you're a product owner for web, you don't necessarily understand the processes that are inherent to working with data tools, with data products, managing stakeholders, managing the stakeholder expectations, managing a data team. So these are kind of the things that they pick up during this process, and this is how they level up within their professional work. They won't be hands on data engineers. We expect them to code, though, during the 12 weeks. So nobody gets preferential treatment. Everybody has to go through the same drill. The way you apply it afterwards is very, very diverse. And it's very interesting to see that. Just to give you an example, analytics engineering is something very promising, as we've seen, and something that is growing. The demand is definitely growing on the job market. Analytics engineers are not just... Daniel, as far as I remember, there was this article that defined an analytics engineer as a pissed off data engineer or pissed off data analyst. Right? Pissed off data analyst. The data analyst who got to the point of saying, basically, I'm gonna build my own pipelines to get my own data and analyze it. So, basically, imagine a data analyst who learns the basics, the fundamentals of ETL and data engineering to move data, and this is something extremely promising and interesting. And we are seeing that in the last 6 years, multiple Berlin based unicorns started hiring extensively for these kinds of roles. It's great to see that the analysts and data scientists coming to our school
[00:26:21] Unknown:
and learning that are more than well set up to get these roles and to be hired for these roles. Before we go too much further, you touched on the impacts of COVID on how you actually conducted the cohort. And I'm wondering what types of shifts you had to go through, where you were originally planning on doing it entirely in person and then, obviously, the pandemic hit, and just how you were able to adapt the overall approach and the experience, and anything that you feel was sort of the pros and cons of that situation. You know, what went well as a result of COVID, the lockdown and having to go fully remote, and what are some of the things that you think that the students missed out on by not being physically in the same space?
[00:27:02] Unknown:
Very good question, because we spent quite some time figuring out when to start and how to start, and we were very wary of, like, doing a fully online thing. I'm not a big believer. Like, we did our summer camp last year as, I don't know, a kind of dry run of how it works, and it worked fine. But the major feedback was that this should happen in person. Yeah. COVID didn't go away, so we had to start off somehow. And I believe that building rapport and building community was harder, but it happened. I really like that we could emulate, like, not forcefully, but naturally, that, you know, there's Slack, there's an open PR on GitHub, you have to react to the thing, everything goes to that Slack, so you find it there. So it felt, like, natural.
And we have to use these tools, because everybody's sitting in their own corner, and we should be able to talk about our problems in a small box, interacting and using whiteboards or whatnot. So this type of limitation, I think, became like a catalyst for it. I really have to try harder to make, you know, my bug report proficient, or figure out how I can show you what I should be showing you. So emulating real life engineering experiences during COVID kind of came naturally to this one. We try to have a good mix of, you know, focus time, when you really have to do something, this is your own thing, you have to deliver.
And have teamwork also part of methodology. Sometimes you have to work together, like, very different setups, have to do research together, have to explain to someone else what you're doing. And we also kind of came up with these ideas sometimes during the run up. Like, we had this concept of explain to Peter, for instance, which went very, very well. Like, you know, I was trying to push these crazy agendas and all these architectural whatnots every week, not a topic. And then you have to be able to tell it to someone who's not a data engineer, who might be your stakeholder, your manager, or someone sitting in a different team, something that you just learned yesterday. How can you explain it? How can you make the other person draw the thing on a whiteboard while you're sitting on Zoom in another place?
This is like real life these days if you work in this industry. And it's not easy to do. Like, you have to learn how to do it. You have to practice it. What's your take on this 1, Peter?
[00:29:39] Unknown:
Yes. If it was not clear to everyone, I am not a data engineer. And in this particular case, this comes in handy because I can play the naive product owner who comes in with super naive questions every morning. And I kind of ask for fairly complicated concepts to be explained to me in the morning while probably you're just having your first coffee, and I wanna understand it in plain and simple English. It's a fun exercise, at least for me, because I learn a lot. But it's also an important one for our students because this is kind of what an interview feels like oftentimes. There's a question and there's a naive person, or seemingly naive person, sitting in front of you asking, okay, can you explain this to me? And if you do this every morning for 3 months, it's kind of becoming something simple and something super natural to you. So I hope, and I know, that our lovely students experienced this and had fun in the morning explaining stuff to me as well, but they actually reported that it was useful. And this is something that is essential for your future job because you're gonna have to interface and interact with so many stakeholders who speak different languages and have a different understanding of what data is and how it should be managed and handled and what it's good for. So, yeah, it's fun.
Explain to Peter is definitely staying as an agenda item in our schedule.
[00:31:07] Unknown:
And then in terms of just the course curriculum and the experiences that you had as the leaders of the cohort and that the students had trying to go through it, I'm wondering what were some of the kind of conceptual stumbling blocks or difficulties that you experienced with the tooling from either direction.
[00:31:25] Unknown:
Yes. I think the challenge here is that data engineering is very wide. Just the sheer volume of acronyms and definitions. And believe me, throughout these weeks, you will find acronyms that are actually the same but mean a different thing 3 weeks later. So this is something that I always tell them, please keep notes. I know it's 2021, but write notes. It's a lot of things that we're going to cover. I would say that Kafka, Docker, and Kubernetes are not something natural to people, and I feel for that. So you really have to figure out why on earth we are doing this. What are the ups and downs of it?
How deep should I really get in there? One thing is understanding the concept. The second thing is to, you know, be able to use it for my own thing. And the third is, like, wow, all these crazy things are happening at the same time. Should I really understand it? What should I grasp of this thing? So it was very nice to hear after, I don't know, 4, 5, 6, 8 weeks, when we get to the more complicated stuff, that some of the people said, like, I will do it with a makefile. And I'm talking about people who did not know what a makefile was before the course.
So somehow it's about simplicity and trusting yourself that if you can understand something, then it's most likely there. But if you look at, like, a Docker Compose and it starts, like, 7 different things, then you start thinking, like, is it really necessary? Or why are these things flying all over the place? And it also reflected on some of the tooling that they had chosen for the portfolio projects. When you evaluate Redshift and BigQuery and Snowflake and play around with them, and at the end of the day you say, like, SQLite.
I'm fine with SQLite. That's kind of a mature evaluation process, that you can use all these things. But, like, if I have to deliver something by tomorrow, let's do it now.
[00:33:39] Unknown:
Are you bored with writing scripts to move data into SaaS tools like Salesforce, Marketo, or Facebook Ads? Hightouch is the easiest way to sync data into the platforms that your business teams rely on. The data you're looking for is already in your data warehouse and BI tools. Connect your warehouse to Hightouch, paste a SQL query, and use their visual mapper to specify how data should appear in your SaaS systems. No more scripts, just SQL. Supercharge your business teams with customer data using Hightouch for Reverse ETL today. Get started for free at dataengineeringpodcast.com/hightouch. As far as some of the kind of notable successes, I know you mentioned, you know, week 8 of the group digging into the nitty gritty details of some problem they were trying to solve. But what were some of the lessons that you learned from going through and some of the wins that you found throughout that you're planning on investing in for future cohorts?
And what were some of the pieces that you discovered didn't really work either because of the fact that it was all virtual or just because it was just too much at once or it didn't all mesh together that you're planning on removing and, you know, that you're able to take a lesson from and learn for future cohorts that you work with?
[00:34:49] Unknown:
I think making data engineering accessible is kind of the thing that is on our walls and kind of our mantra that we keep repeating. Even if someone doesn't end up as a participant of our course, we wanna make sure that we give some help, some support, some guidance when it comes to getting started with learning it. Because, just like when coming up with the curriculum and designing the curriculum, the tough part is what to leave in and what to leave out. So actually, when you go through the course, you have something that you can work with, that you can leverage. Seeing people finding jobs during the cohort already, and seeing what kind of interviews and offers our students receive, is very, very rewarding, and it's kind of like the best positive feedback that you can get, because this is the outcome that we designed for, that we want to achieve.
So that's definitely a plus. I think what we've learned is that, although we were reluctant in the first place to go online or in a hybrid mode, working online and having the course online allows more people to join, more people from various geographies to actually access the course. And in these terms, it's transformational as well, because we wanna find the right balance between making it accessible for people coming from, let's say, Singapore or the West Coast. At the same time, we wanna make sure that whoever is local and whoever is in Berlin gets access to us directly. We can have one-on-one sessions, because both Daniel and I really like interacting with people and having people around. And there's just this magic of camaraderie that is difficult to replicate online, but after having done the whole course in this setting, I do believe that it's possible. It's not the same experience. It's a different experience than being there in person in a classroom, but it's definitely working.
When it comes to the realities and failure, and failure is such a hard word, in general, I think what we've learned and what we're gonna iterate on is that we need to manage expectations. Even though we're saying that we want to make this course accessible for everyone, it requires pre-work, and we wanna make sure that whoever comes to the course knows that. They also need to know what they're signing up for. So what a data engineer does, what their daily work is gonna look like once they're done and once they've found a junior data engineering position at company X. And I think this is extremely important, because you can only have motivated and dedicated people, or the right people with the right dedication and right motivation, if they know what's coming towards them. And this is why we are seeing that the course works best for people who have put in some effort or already have experience working with Python and SQL, coming either from the data angle or from the software engineering angle.
And if you're a total rookie, we usually recommend you do a couple of courses, spend a couple of months becoming more familiar with coding in general, with Python, with SQL. And after that, you can definitely join the course, and we can make this work for you. And we are supporting a couple of people who are now ramping up to one of our future cohorts because they're interested in becoming a data engineer. So I think our constructivist approach, pedagogical approach, is something that's kinda different and something that I would definitely highlight as something where I'm saying expectation management is required. We are not a boot camp where the process goes like, okay. I'm gonna show you how to create a blue rectangle on a website.
This is how it's done. Please do it after me. And now your task is gonna be, let's do green rectangle in the afternoon. I'm oversimplifying and exaggerating. I don't mean it in a bad way at all. What we took was the constructivist approach. What we force our students to do is go out of their comfort zone, explore themselves. The assignments that they receive are very much about trying to figure out what they can do and where they hit a wall. Daniel, I think you can speak to that even better than I. Absolutely. And it comes with a lot of surprises.
[00:39:17] Unknown:
When, you know, when you hand out, like, a Kafka assignment, somebody randomly says, like, yeah, by the way, I just pulled up this Node.js thingy. Woah. Woah. Woah. It flies all over the place. We don't have to use this tool, but if it does the job, I'm fine with that, totally. There are some things that people do deliver if they don't know that it's hard. We have a graduate who started, like, the data engineering project. By the way, it became like a full stack thing because while she was at it, she just threw in the front end, a bit of a back end. I don't know. It just works together.
Whatever. So it's good to see that people can roam a bit. It can feel scary. So we definitely like to throw people in the water and start talking about swimming when they manage to get out. But, I mean, it's a hard balance again. You can't push people too far, but they have to hit the wall. They have to learn the experience of hitting the wall, because that's what you do. Like, you know, we keep trying until it works, then nobody touches it. It's a lot of engineering here and there. Let me get back to this one. So we also realized we are being like a data engineering boot camp. Engineering is not something trivial.
And I'm talking about simple stuff. Again, how to report a bug? How to tell me what you did, what you tried, what was the second thing, and then when you hit the wall, what's the context, can I run it, make a pull request, how do I reproduce the bug? These things are really, like, how life works. How to clean up your mess, yes, you have to go back. I don't think that you have to write documentation, but a README wouldn't hurt. These kinds of things, like, we have to get into this model. So I think it's important. And I would be super happy if I knew about more, like, engineering boot camps. This is not about, like, language. It's not about, like, algorithms.
It's about processes and this kind of best practices, how we can, again, collaborate. It's not just like my thing, but really getting to the point when, week x, 3, 4, I don't know, living in it becomes natural. Like, you know that things are there, that's a pull request. I got an issue. Let's talk about this one. Please don't push to master, because I will revert it. These kinds of things, like, you know, how you learn. We learn a lot from them. And this is like the magic of it, that everybody had, like, super different backgrounds, different expectations, different, I like to call these ones, like, really superpowers.
Like, they know more about specific things than I do. I'm sure that's the way it goes. So how to handle this, you know, school setup, the authority setup. Like, when you walk in, first week, you're like, no, I don't know nothing about this. Just tell me what I should do. And when we get to demo day, because everybody really had to deliver, like, a portfolio project, then everybody is equal in the room. Like, this is your project. You own it. You can talk about it. You know your decisions. You know your failures. You know your rabbit holes, and we can talk about this. So that's an interesting metamorphosis, one might even say.
[00:42:35] Unknown:
Once the students were able to go through the program and you were starting to get to the end and you were trying to prepare them for actually moving into the workforce, I guess, what were some of the core areas of focus that you had during some of the interview prep and helping them understand these are the skills that people are hiring for. These are the things that you need to think about as you're going through the process. These are some of the things that you need to be considering to be able to integrate with the team and figure out how to kind of make your voice heard, how to work well with the rest of the, you know, organization, and, you know, the cognitive flexibility that comes with, you know, I've figured out how to use this tool. I'm comfortable with it. This is what I just wanna do my whole career with, and then coming in and saying, no. We actually use tools x, y, and z instead. Now you need to figure it out.
[00:43:25] Unknown:
So to answer your question, there's a structured approach to learning the hard skills and preparing for the exercises that you're gonna have in the interview process. What I mean by that is that our students or participants actually do the take home tests that some of the major companies hiring are handing out to you, or ones very similar to them. We do stress tests. So you have to do an exercise, for example, under time pressure. You have exercises and mock interviews in various settings in order to prepare for the questions that you are going to receive at various stages of the interview process.
And I think the rest is basically ingrained in the curriculum and our approach itself. So the fact that we are cleaning up our mess every Monday morning, meaning that nothing from last week should be untouched or just left the way it was, it's something that's very much taken from the actual work life. So there are expectations towards you, and we kind of mimic them in the cohort as well. Our daily structure pretty much mimics your day at a tech company, with a regular stand up in the morning and a closer in the afternoon to debrief. The way you get assignments is very much in the format of a ticket that you could receive.
Coming back to the right way of answering this question. The additional thing that we have experienced during our first cohort was that people started actively sharing their interview experiences with each other. And once you start hearing from your peers about, hey, this is how I solved it, and this is how you could have solved it, and, oh, I did not know the answer to this question, but now I know the answer because I looked it up. And this company was actually really nice during the interview process versus the other one, who were, let's say, unprofessional. These are the experiences, and when they start sharing them, it becomes more natural and less intimidating. Because again, interview processes can be intimidating, especially if you're going for a role you've never applied for before.
And with our graduates, it's most often the case. So, on preparation, I'm coming back to Explain to Peter because I just love it so much. But my point is really that these kinds of exercises really prepare you for whatever is thrown your way during the interview process. And what I meant with the optimal outcome is an increased level of confidence in you, in your skills, in your competences, and that whatever is gonna be thrown your way, you're gonna solve it somehow. It's an engineering exercise. It's a tool. You can learn it. It's nothing new. Data goes from a to b. And if that's your attitude in an interview process, I think you're halfway there because that's what not just companies are looking for, but that's the attitude that gets you a job. That's what I think at least.
[00:46:19] Unknown:
And as you were going through the cohort and working with your students and maybe particularly as they were presenting their capstone projects, what are some of the most interesting or innovative or unexpected solutions that you saw them pull together?
[00:46:33] Unknown:
I mentioned some projects in this conversation, but let me mention another one. Our lovely graduate was thinking that the proper way to run a data product with ETL and the web interface is to run it on his Raspberry Pi at home. And, like, yeah, I will reverse proxy the whole thing type of stuff, which you wouldn't expect. I mean, it can be done, but it's not trivial if you, like, never heard of this thing before. And it's like, yeah. You can connect to my home computer. You can do it. So, again, the attitude that I can push this cart forward, and if I don't look at it this way, then, oh, this is gonna be super hard.
Maybe it's not. And in general, I was pretty much surprised at how ingrained it became for them, you know, sustainability in this very down to earth way, that if I can solve it with a stick, I will solve it with a stick. Can I run my capstone project for a year for free? I will do that. So, like, sometimes it's okay to spend $5 on a thing to make, you know, things fly, but I also took it very seriously. That's like, if I can ship it without spending a buck, I'm cool. There's nothing nice in a world where you can, like, pop up hundreds of VMs in a sec and spend a lot of money forgetting to switch off Redshift while you're learning it.
So I like the approach.
[00:48:04] Unknown:
And were there any interesting tools or approaches that you learned about that you hadn't come across before, but that one of your students brought to you?
[00:48:13] Unknown:
Absolutely. Like, I think I'm kind of okay with unit testing in general and testing in general. But then I've been shown, like, a Python framework. I forgot the name. I have so many names, but, like, it's a hard thing because I have to keep in my mind all these things because they keep asking about them, and they come up with new tools every day. So they do. And, again, this is part of the game, that Daniel told us something, but you have to go and check. From time to time, I try to trick them, I have to be honest with you. So, like, just to get this, you know, out of the way, that maybe I'm not telling you the truth. I don't know. Like, you know, double check on me. You can run the benchmark. Like, you have everything in your realm of possibilities to run the benchmark. That was also one of the things which made me really proud, that my, you know, mania about benchmarking and actually measuring things somehow seeped into the mind, that, you know, don't trust no one. Like, they try to sell you something, and they're not your friends.
They're just selling a thing. And if it's nice, sure. But you have to be aware that it's really nice. You have to be able to evaluate it.
[00:49:24] Unknown:
That's pretty hilarious that you would feed them false information just to see if they could catch it.
[00:49:29] Unknown:
I'm a bad person.
[00:49:31] Unknown:
No. It's a useful skill to know. You know, there are people who are gonna come to you with false information, not necessarily because they're trying to deceive you, but just because they don't know any better. And you need to be able to know, okay. That sounds like it might not be correct, so I'm gonna double check it.
[00:49:42] Unknown:
And then you know what was very interesting, by the end of the course? You know, when it comes to these assignments and these exercises, very often they're designed to get you to a point where you hit a wall. And you need to hit that wall multiple times trying different approaches. And at some point, you arrive at asking yourself, does the wall need to be there? Okay. What if I just send a message to the owner of this website or whoever created this tool, asking whether the wall needs to be there? And they removed the wall, so now I'm good. And if you have this confidence, if you have this attitude of, okay, what if there were no wall? And what if I'm just a message away from that? And we've seen that actually being done, that, okay, can I wipe this API from whatever and make it work? And they made it work for one of our students. So it's the approach. It's the attitude that making things work does not necessarily follow a certain process or a certain rule, but it's your creativity that you put into it. And the way you solve problems is what actually generates the value.
[00:50:44] Unknown:
And in terms of your experiences of running the boot camp and working with this first cohort of students, what are some of the interesting or unexpected or challenging lessons that you learned in the process?
[00:50:54] Unknown:
One of the most important things was definitely about sustainability. We preach sustainability when building tools. We preach sustainability when it comes to making sure that your work life balance is in order. But Daniel and I, we both have to do this for ourselves as well. Starting a company in 2020 and making it work in 2021 definitely takes a toll on you. And what we need to make sure is that when we grow, when our company grows, it remains sustainable for our lives as well. Meaning that we remain healthy, we have fun at work, and we don't overdo it, like we did in the last couple of months. It was fun, but it's also very exhausting. So what we need to do and what we are going to focus on is to keep the balance and keep this going for the long run. Agree.
We also see this as an interesting setup because this boot camp is not for fresh grads who wanna get into front end. So we have people who have careers,
[00:52:00] Unknown:
who have experiences, and who still walk in the room on day one, you know. I wanna be a data engineer. Like, I mean, again, as I mentioned, you tell me. And what is the role? So this is not like an authoritarian position, but, like, how much more to, like, facilitate this learning process. So the goal of the game is not that I show you how to do it. What's the way to do it? We deliberately try to show things that can be solved in different ways. Like, I don't care if it's a makefile or if it's Prefect. It does the job. Maybe you like to do things in Python. Oh, fine. Maybe you wanna do things with jQuery and a makefile.
[00:52:40] Unknown:
All good. But I should enable you to do this
[00:52:45] Unknown:
and still provide you guidance. This is the way we kind of do it, but you have to hit the wall by yourself. So the end game is not that you fill out, like, a questionnaire and I tick the box just because you know the nice answers. But the outcome is that you have it in your head. So that's why we really are keen on this constructivist approach, that we don't transfer facts to your head, but they should come alive. You should feel on week 4, 5, 6 that you're overwhelmed, and why is Daniel not holding my hand, why am I suffering with this thing. But the outcome is not that, oh, Daniel knows everything. No. The outcome is that I can solve this thing after 12 weeks. And that's an interesting setup.
Interesting role to play or be in, maybe.
[00:53:38] Unknown:
Adding one more thing to that. The goal of the course is to get you a job or to enable you to improve on your career. Although there are a lot of courses popping up nowadays focusing more on lifestyle content, cohort based courses that are focusing on kind of non job or non career related skills and competences, we're not one of them. We're not in the certification business, we're in the education business. And this is why we're gonna make you work. It's something we knew before we got started, but it's very clear after the first cohort is done and has since graduated, that you have to work if you join Pipeline Academy, and it's gonna be intense, but it's very rewarding.
And the outcome that we are focusing on, and that's how we select the people who are joining the course, is very much people who want and are determined to upskill or learn data engineering and not just have another badge on their LinkedIn profiles.
[00:54:40] Unknown:
And on that note, what are the cases where a boot camp is the wrong approach for skill development and somebody might be better served either just doing some self directed learning, or taking a university course, or anything along those lines?
[00:54:54] Unknown:
I think the pandemic has given most of us a great opportunity to explore learning and learning methodologies that work for you. I myself have done this, and you can do this basically free of charge. You can learn, let's just talk about data at this point. You can use video based courses, text based courses, applications, various methodologies, short courses, intense courses, and boot camps. So first, I think it's important for someone to determine whether all the other alternatives work for you or not. Especially because those are financially much more accessible and require less time dedication from you. That's the general approach I would take when comparing other learning methodologies versus a boot camp. I think if you pick a boot camp, you need to know that it's gonna be intense.
It's exhausting to learn and sit in front of a computer for 12 weeks, and actually expand every single day on your knowledge. I think the next challenge, once you determine that a bootcamp is something for you, is to pick the right one. There are various websites that you can use to compare, to select, read reviews, and have an understanding of what they teach, and what kind of student experiences and outcomes you can work with. You have to be really mindful about taking a close look at the statistics that these institutions provide, which are public, what kind of placements and partners they have where you can work or should work after that, what kind of positions and outlook you have on your career once finishing.
And this is anything but straightforward. So I do recommend to everyone who is contemplating joining a boot camp: compare them, do interviews with all of them, talk to the people, check what the experience feels like, even write to former graduates to ask them, hey. Is this cool? Does this work? What can you do? Is this for me? And especially in our case, most of the people are at least vaguely familiar with the idea of data engineering and what to do with it after. It already serves as a filter. So if the question would sound like, is Pipeline Academy the right boot camp experience for you? Then the answer is, I think if you are dedicated and dedicate the time and the effort, and you're motivated enough to go into this world, we are definitely great partners
[00:57:21] Unknown:
to support you on that journey. It also happened to us that when we talk to people that they say, like, okay,
[00:57:26] Unknown:
you are looking for a data analyst boot camp. Yes. Sometimes we have to give the right advice to people, and we do give the right advice to people all the time. But it turned out in one of the first interview conversations with a few people that you are not looking for us. Go to them. Go to them. There are a couple here in Berlin that we can definitely recommend. So the selection process and picking a bootcamp is not straightforward. However, I also encourage everyone to think about return on investment. When it comes to the price of investment into a boot camp, regardless of whether it's in Europe or in the US, the costs are significant.
Period. But you have to look at what you can get out of it, especially if you're picking, I don't know, for example, joining a data science boot camp versus a web development or full stack boot camp. Is this something for you? Try it out. And this is why online courses are so amazing, because at least you can take a look whether this is something for you. And especially when it comes to coding, I think a lot of people realize that the excitement about the idea of learning coding is not necessarily there once you've done it for 6 months. And this is an amazing filter, free of charge. Put in the effort and see if it's for you.
[00:58:40] Unknown:
Once your students have completed the program, what's the level of ongoing engagement that you have with them as they start new jobs and get started in their careers and just the sort of level of support that you're able to offer after the boot camp is done for that cohort?
[00:58:56] Unknown:
I think after the boot camp, everybody needs to breathe for a week or so, and this is totally normal and healthy. However, we intend to and we do keep in touch very frequently with all of our students. Why? Because half of our participants from the first cohort intentionally did not get started with their job application process until the end of the cohort, which is a totally legitimate approach to take. First, you focus on learning, and then you focus on interviewing. Even though the course is over, we're gonna support them, of course, until they find a job. That's fact number 1. Number 2, we intend to keep them in our community, in our school's community, because we wanna see them grow. We wanna support them on their journeys.
And it's not like in these 12 weeks, all their questions have been answered. And if I think about data engineering as data science, let's say, 6 years ago, then I have to consider also the fact that our graduates are the future data engineering leadership, who are going to actually hire graduates of mine in 2, 3, 5 years. So having a very positive personal relationship, sitting 12 weeks in the same virtual room with these people, gets you also a very intimate personal connection. But it's also professional connections that I wanna see and I wanna see them placed in positive companies and growing even further after the boot camp.
[01:00:25] Unknown:
And as you get ready for the next cohort and you're planning how you're going to evolve or shift both with the changing climate of the pandemic as well as the lessons that you've learned from the first cohort, what are some of the things that you have planned for either the next cohort coming up or just the near to medium term future of Pipeline Academy itself?
[01:00:45] Unknown:
We still have, like, 2 cohorts running this year, so it's gonna be pretty busy. For the second half, we believe in organic growth for now. So what we do, we like this intimate thing, that we're not a factory, but really spend quite some time with them. And I think Peter is right. We kind of understand where they're coming from and where they really wanna go. So keeping up the quality is a major thing, but we got a lot of requests really from all over the world, not just from here. How can they join? How can they grow and learn? So we're trying to think about that. And also, not everybody has the time and means to, you know, cut out 3 months of their life and dedicate them to this boot camp.
Some kind of a part time course should be interesting. But that's like a definitely different format. Like, boot camp is an intense thing. So let's try to have this conversation again in a year or in 2 years' time. No crazy big ones.
[01:01:50] Unknown:
I would agree. I think what we are focusing on right now is healthy organic growth, to make sure that we keep up the quality that we envision for our work in general, and, of course, keep the expectations of our students towards us really high. And then there are all the other ideas that we have. So while we're not doing cohorts, in the meantime, we are focusing on supporting organizations, startups, especially around sustainability, with consulting, very often revamping their stack, their data stack or their data infrastructure, and supporting them in how to do this in the long run, and also with training.
So we're seeing that during the pandemic, the demand for data engineering did not change. We have tracked the open positions on various job sites during the pandemic, and it was like almost nothing happened. And in our contacts, various stakeholders from the Berlin ecosystem also reported that one of the first teams that they started hiring for again after the first hiring stops, let's say, last April, were the data engineering teams. Because although the numbers, for example, revenue numbers and site usage, were lower on an ecommerce website, the numbers have to be reported regardless. That's just a fact. So data engineering is kind of the essential work of data that has to be done. It's the infrastructure. It needs to work.
So supporting also these organizations with their recruitment processes and hiring processes, supporting them in designing team setups, goals. This is what we do in the meantime. And we wanna make sure that the handful of organizations we work with, we keep a very positive relationship with, and this goes on. But, yes, as Daniel mentioned, addressing international demand is something that we will tackle very soon.
[01:03:53] Unknown:
Are there any learning resources that you've put together for the purpose of the boot camp that you're planning on making more broadly available for people to do some sort of self directed access and be able to learn from some of the lessons that you've gained in the process?
[01:04:07] Unknown:
Absolutely. This is what we also try to focus on. We believe that our blog is helpful. So we have stuff that people really like, it seems, like how to start learning, again, data engineering on a shoestring, and the second part is coming out soon, again, reflecting on what we saw in the first cohort. So we're trying to give back to the community. And, hopefully, if we have a bit more time besides doing the teaching, yes, a bit of an organized knowledge base is coming towards your general direction. I wouldn't put a deadline on it because speed would kill me.
[01:04:53] Unknown:
True. Absolutely true. In addition to that, we also shared interviews with data engineers in the last year, so newcomers or people unfamiliar with this realm can get more access and insight into what you do every day as a data engineer. Every month, we publish a selection of articles and posts that are relevant and interesting for improving your skill set, your competences. And, again, what we are addressing with our blog is the difficult part of learning data engineering, and this is what we saw before we got started: it's difficult to pinpoint where to start, what to learn first, and how to access this, because the offers that you find are so broad and so weird, and the structure is what's missing. And this is what we're trying to put in front of people, because if you take a look at this particular blog post that Daniel is talking about, become a data engineer on a shoestring, the books and the courses included in there have been read, have been tested, have been tried, and various others have been tried as well. And that's why we are confident to say, this is the one you should pick. If this is your goal, then go with that one. So everything has been curated. I think there is value in that guidance for people who are new. Or, for example, we get a lot of feedback from data scientists who wanna go more towards infra. Yep. You know what? We do physical meetups.
Oh, we do physical meetups. Jesus Christ. Last October was, I think, our last one, with social distancing, but a real meetup in the park. And tomorrow, since the sun's out and Berlin has very low COVID numbers at this point, we're gonna go outside. We're gonna grab some beers, and we're gonna have a couple of data engineering experts doing some straight talk about work. Yeah. Hot seat on DataStax
[01:06:50] Unknown:
in cooperation with the lovely Data Council.
[01:06:53] Unknown:
Well, thank you both very much for taking the time today. For anybody who wants to get in touch with you and follow along and possibly apply to the boot camp, I'll have you add your preferred contact information to the show notes. And as a final question, I'd like to get your perspective on what you see as being the biggest gap in the tooling or technology that's available for data management today.
[01:07:14] Unknown:
The only person who comes to my mind is called James Mickens. I don't know if you like his presentations or not. I'm a big fan. I didn't have, like, a James Mickens shrine, although he's still alive. And I think in one of his presentations, he told us that America does not need more engineers. America needs less engineers. So what I mean by that is we keep building things. We like to build things. Everybody wants to build things. We're happy to change some LinkedIn titles from engineering manager to over engineering manager. And on one hand, I'm grateful to you, for instance, that you do your show, and we get to know more and more new things, what's out there. But I believe, from what we see with companies and organizations,
more often than not, it's not about tooling. It's not about technology. It's about culture, processes, the hard fact that there are no silver bullets. So for us, I think engineering is about solving a problem in a maintainable, sustainable way. And a lot of tools are just adding another problem or trying so hard to find a problem to be solved that they come from, like, the wrong direction. So if you can solve a problem with a one-liner in Bash, don't ask for a framework, and you don't have to apply a paradigm for everything. Maybe what you're looking for, again, is a SQLite file synchronized to S3. These kinds of things are what I mean by that.
But still reflecting, you know, on when we had our chat, like, a long time ago in a galaxy far, far away, the number of tools just keeps growing. Like, everybody's getting crazy and people are getting money thrown at them, like, a crazy amount of money. Like, I just can't believe my eyes. And the only thing I hope is that all of that money that they get is not just a sales budget, but actually some engineering maybe. I don't know. Yeah.
[01:09:33] Unknown:
Yeah. It's definitely a lot to keep up with, so I can agree with the sort of proliferation of tools. It'll be interesting to see what happens when the inevitable consolidation comes about, which ones actually are left standing. But I definitely appreciate the perspective there. And so thank you both for taking the time today to join me and share the work that you've been doing at the Pipeline Academy, and the experience that you've had going through your first cohort despite the pandemic. It's definitely an interesting problem domain that you're working in, helping people upskill into this crazy industry that we're working in. So I appreciate all the time and effort you're putting into that, and I hope you enjoy the rest of your day. Thank you very much for having us, Tobias. Thank you, Tobias.
Thank you for listening. Don't forget to check out our other show, Podcast.__init__ at pythonpodcast.com to learn about the Python language, its community, and the innovative ways it is being used. And visit the site at dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something from the show, please leave a review on iTunes and tell your friends and coworkers.
Introduction and Welcome
Interview with Daniel Molnar and Peter Fabian
Pipeline Academy Curriculum Highlights
Collaboration with Local Startups
Setting a Common Baseline for Students
Impact of COVID on the Bootcamp
Lessons Learned and Future Plans
Preparing Students for the Workforce
Interesting Projects and Tools
Choosing the Right Learning Path
Future of Pipeline Academy
Learning Resources and Community Engagement
Biggest Gaps in Data Management Tooling
Closing Remarks